Category:TXR: Difference between revisions

Partial rewrite of intro paragraphs.
(→‎Simple Query: Obsolete, dangling text removed.)
(Partial rewrite of intro paragraphs.)
Line 2:
|site=http://www.nongnu.org/txr/}}
 
TXR is a new text extraction language implemented in [[C]], running on POSIX platforms such as [[Linux]] and [[Mac OS X]], and on Windows via [[Cygwin]] as well as more natively thanks to [[MinGW]].
 
TXR started as a language for "reversing here-documents": evaluating a template of text containing variables, plus useful pattern matching directives, against some body of text, and binding the matching pieces of the text to variables. The variable bindings were output in POSIX shell variable assignment syntax, allowing for shell code like:
 
<code>eval $(txr <txr-program> <args> ...)</code>
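The mechanism behind this idiom can be sketched without TXR itself. In the minimal example below, <code>emit_bindings</code> is a hypothetical stand-in for a <code>txr</code> invocation; the variable names and the exact quoting of the emitted assignments are assumptions made for illustration, not TXR's documented output format.

```shell
# emit_bindings is a hypothetical stand-in for "txr <txr-program> <args> ...".
# It prints variable bindings in POSIX shell assignment syntax, one per line.
emit_bindings() {
  printf 'first="%s"\n' "Four"
  printf 'second="%s"\n' "score"
}

# Command substitution captures the printed assignments;
# eval then executes them, defining the variables in the current shell.
eval "$(emit_bindings)"

# The extracted values are now ordinary shell variables.
printf '%s %s\n' "$first" "$second"
```

Quoting the command substitution, as done here, keeps the newlines between assignments intact when the text is passed to <code>eval</code>.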
 
TXR remains close to these roots: its main language is the pattern-based text extraction notation well suited for matching large regions of entire text documents.
 
About the <code>@</code> character: it serves as a multi-level escape in TXR. In the fundamental TXR syntax, which is literal text, this character signals that the object which follows is a variable or directive. Inside a directive, the character indicates that the object which follows is a TXR Lisp expression to be evaluated as such, rather than according to the expression evaluation rules of the pattern language. (It is also possible for TXR Lisp code to give this character additional meanings, since inside TXR Lisp the notation expands to Lisp syntax: <code>@foo</code> denotes <code>(sys:var foo)</code>, and <code>@(foo ...)</code> denotes <code>(sys:expr (foo ...))</code>. This may come in handy in any context which needs to separate meta-variables and meta-expressions from variables and expressions.)
 
Thus, the source of a TXR query is literal text except for directives and variables preceded by the <code>@</code> character.
 
In this pattern language, computation evolves by textual pattern matching with implicit backtracking. Non-pattern matching activities are embedded into a pattern matching paradigm. For instance, the line
 
<pre>Four score and seven years ago,</pre>
Line 17 ⟶ 26:
 
The success of a directive means that computation proceeds to the next directive (and, if this is a pattern match, the input position advances). Failure means that the enclosing query fails, triggering back-tracking behaviors and possibly failure of the entire query.
 
A bizarre feature of TXR is that the directives like <code>@(collect)</code> are independent pieces of Lisp. But, they are also de-facto "tokens" in a block-structure language. For instance <code>@(collect)</code> starts a block, which must be terminated by <code>@(end)</code>. Inside <code>@(collect)</code> there can be additional syntax, such as <code>@(collect :gap 0 :vars (a b c))</code>.
 
==Extremely Simple Query==