Category:TXR: Difference between revisions

|site=http://www.nongnu.org/txr/}}


TXR is a new language implemented in [[C]]. It runs on POSIX platforms such as [[Linux]] and [[Mac OS X]], and on Windows via [[Cygwin]], as well as more natively thanks to [[MinGW]].


TXR started as a language for "reversing here-documents": evaluating a template of text containing variables, plus useful pattern matching directives, against some body of text, and binding the variables to the pieces of text they match. The variable bindings were output in POSIX shell variable assignment syntax, allowing for shell code like


<code>eval $(txr <txr-program> <args> ...)</code>

TXR remains close to these roots: its main language is the pattern-based text extraction notation, which is well suited for matching large regions of text, up to entire documents.
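The "reversing here-documents" idea can be illustrated with a small Python sketch. It is not TXR's implementation: the helper names <code>template_to_regex</code>, <code>reverse_heredoc</code> and <code>as_shell_assignments</code> are hypothetical, templates are limited to literal text and <code>@name</code> variables, and in this sketch a variable that appears twice must match the same text both times. The bindings are printed as shell assignments, the kind of output an <code>eval $(...)</code> line would consume.

```python
import re
import shlex

# A toy variable reference: @ followed by an identifier.
VAR = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")

def template_to_regex(template):
    """Compile a toy template such as "@a score and @b years ago," to a regex.

    Literal text is escaped; the first occurrence of @name becomes a named,
    lazily matching group, and later occurrences must repeat the same text.
    """
    parts, seen, pos = [], set(), 0
    for m in VAR.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")      # already bound: must match again
        else:
            parts.append(f"(?P<{name}>.*?)")  # unbound: capture some text
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))

def reverse_heredoc(template, text):
    """Return the variable bindings if text matches template, else None."""
    m = template_to_regex(template).fullmatch(text)
    return m.groupdict() if m else None

def as_shell_assignments(bindings):
    """Render bindings in POSIX shell assignment syntax, suitable for eval."""
    return "\n".join(f"{k}={shlex.quote(v)}" for k, v in bindings.items())

if __name__ == "__main__":
    b = reverse_heredoc("@a score and @b years ago,",
                        "Four score and seven years ago,")
    print(as_shell_assignments(b))  # prints a=Four and b=seven on two lines
```

Values containing spaces or shell metacharacters are quoted by <code>shlex.quote</code>, so the output stays safe to pass to <code>eval</code>.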

About the <code>@</code> character: it serves as a multi-level escape in TXR. In the fundamental TXR syntax, which is literal text, it signals that the object which follows is a variable or a directive. Inside a directive, it indicates that the object which follows is a TXR Lisp expression, to be evaluated as such rather than according to the expression evaluation rules of the pattern language. (TXR Lisp code can also give this character additional meanings, because inside TXR Lisp the notation expands to ordinary Lisp syntax: <code>@foo</code> denotes <code>(sys:var foo)</code>, and <code>@(foo ...)</code> denotes <code>(sys:expr (foo ...))</code>. This can come in handy in any context that needs to distinguish meta-variables and meta-expressions from ordinary variables and expressions.)
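The read-time expansion of the <code>@</code> prefix can be modeled in a few lines of Python. This is a toy under stated assumptions: symbols are represented as Python strings and compound forms as tuples, <code>read_at</code> and <code>to_lisp</code> are hypothetical names, it assumes <code>@(foo ...)</code> wraps the whole form, and only the two cases named in the paragraph (a symbol, or a parenthesized form) are handled.

```python
def read_at(obj):
    """Model what @obj expands to; only the two described cases are handled.

    Assumption: a Python str stands for a Lisp symbol and a tuple stands
    for a compound (parenthesized) form.
    """
    if isinstance(obj, str):       # @foo: obj is a symbol
        return ("sys:var", obj)
    if isinstance(obj, tuple):     # @(foo ...): obj is a compound form
        return ("sys:expr", obj)
    raise ValueError("only symbols and compound forms are modeled")

def to_lisp(form):
    """Print a nested tuple structure in Lisp list notation."""
    if isinstance(form, tuple):
        return "(" + " ".join(to_lisp(x) for x in form) + ")"
    return form

if __name__ == "__main__":
    print(to_lisp(read_at("foo")))              # (sys:var foo)
    print(to_lisp(read_at(("foo", "a", "b"))))  # (sys:expr (foo a b))
```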

Thus, the source of a TXR query is literal text except for directives and variables preceded by the <code>@</code> character.
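To make that structure concrete, here is a Python sketch that splits one line of a query into literal text, variables and directives. It is a simplification: only <code>@name</code> variables and parenthesized <code>@(...)</code> directives with balanced parentheses are recognized, other TXR syntax is not modeled, and the function name <code>tokenize</code> is hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(line):
    """Split one line of a toy TXR-like query into (kind, text) tokens.

    kind is "text" for literal text, "var" for @name, and "directive" for
    @( ... ); the directive text keeps its parentheses but drops the @.
    """
    tokens, text, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch != "@":              # ordinary character: part of literal text
            text.append(ch)
            i += 1
            continue
        if text:                   # flush pending literal text
            tokens.append(("text", "".join(text)))
            text = []
        if line.startswith("(", i + 1):
            # Directive: scan to the matching close parenthesis.
            depth, j = 0, i + 1
            while j < len(line):
                if line[j] == "(":
                    depth += 1
                elif line[j] == ")":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if depth != 0:
                raise SyntaxError(f"unbalanced directive at column {i}")
            tokens.append(("directive", line[i + 1:j + 1]))
            i = j + 1
        else:
            # Variable: @ followed by an identifier.
            m = IDENT.match(line, i + 1)
            if not m:
                raise SyntaxError(f"stray @ at column {i}")
            tokens.append(("var", m.group()))
            i = m.end()
    if text:
        tokens.append(("text", "".join(text)))
    return tokens

if __name__ == "__main__":
    print(tokenize("Four score and @n years ago,"))
    # [('text', 'Four score and '), ('var', 'n'), ('text', ' years ago,')]
```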

In this pattern language, computation evolves by textual pattern matching with implicit backtracking. Activities other than pattern matching are embedded into the pattern-matching paradigm. For instance, the line


<pre>Four score and seven years ago,</pre>


The success of a directive means that computation proceeds to the next directive (and, if this is a pattern match, the input position advances). Failure means that the enclosing query fails, triggering back-tracking behaviors and possibly failure of the entire query.
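The success/failure rule can be sketched as a tiny recursive matcher in Python. Query elements are either literal lines or <code>SKIP</code>, a hypothetical stand-in for a directive that may consume any number of input lines. When a later element fails, control returns to the most recent <code>SKIP</code> and its next choice is tried, which is backtracking in miniature. In this sketch, input left over after the last query element is simply ignored. It only illustrates the control flow and is not how TXR is implemented.

```python
SKIP = object()  # hypothetical directive: skip zero or more input lines

def match(query, lines, qi=0, li=0):
    """Return True if query[qi:] matches lines starting at index li.

    A literal string succeeds only if it equals the current input line, and
    success advances the input position by one. SKIP is a choice point: it
    tries every possible number of skipped lines, so when a later element
    fails, the next alternative is tried (backtracking).
    """
    if qi == len(query):
        return True  # query exhausted: success (leftover input is ignored)
    item = query[qi]
    if item is SKIP:
        return any(match(query, lines, qi + 1, k)
                   for k in range(li, len(lines) + 1))
    if li < len(lines) and lines[li] == item:
        return match(query, lines, qi + 1, li + 1)
    return False  # failure propagates back to the nearest choice point

if __name__ == "__main__":
    text = ["Four score and seven years ago,", "our fathers brought forth"]
    print(match([SKIP, "our fathers brought forth"], text))  # True
```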

A bizarre feature of TXR is that directives such as <code>@(collect)</code> are independent pieces of Lisp syntax, yet they also act as de facto "tokens" in a block-structured language. For instance, <code>@(collect)</code> starts a block, which must be terminated by <code>@(end)</code>. The <code>@(collect)</code> directive itself can carry additional syntax, such as <code>@(collect :gap 0 :vars (a b c))</code>.
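The block-structure aspect can be illustrated with a Python sketch that uses a stack to check that every <code>@(collect)</code> is closed by an <code>@(end)</code>. To keep the example small, only <code>collect</code> is treated as a block opener (TXR has other block directives), the input is a list of directive texts with the leading <code>@</code> removed, and <code>check_blocks</code> is a hypothetical name.

```python
BLOCK_OPENERS = {"collect"}  # assumption: only @(collect) is modeled here

def check_blocks(directives):
    """Check that each block-opening directive is closed by a matching (end).

    directives is a list of directive texts such as "(collect :gap 0)" or
    "(end)". Returns the maximum nesting depth; raises SyntaxError on a
    stray or missing (end).
    """
    stack, max_depth = [], 0
    for pos, d in enumerate(directives):
        words = d.strip("()").split()
        head = words[0] if words else ""   # directive name, e.g. "collect"
        if head in BLOCK_OPENERS:
            stack.append((head, pos))
            max_depth = max(max_depth, len(stack))
        elif head == "end":
            if not stack:
                raise SyntaxError(f"@(end) at position {pos} closes nothing")
            stack.pop()
    if stack:
        head, pos = stack[-1]
        raise SyntaxError(f"@({head}) at position {pos} is never closed")
    return max_depth

if __name__ == "__main__":
    print(check_blocks(["(collect :gap 0 :vars (a b c))",
                        "(collect)", "(end)", "(end)"]))  # 2
```

Keyword arguments such as <code>:gap 0</code> do not affect the check; only the directive name at the head of the form matters.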


==Extremely Simple Query==