Inverted index: Difference between revisions

 
</lang>
 
=={{header|jq}}==
 
The first part of this section presents the core functions for computing an inverted index and for searching it.
These functions work with jq 1.4 and later versions, and possibly with earlier ones.
 
The second part shows how to accomplish the interactive task using a version of jq that supports 'input' and 'input_filename' (e.g. jq 1.5).
 
'''Part 1: inverted_index and search'''
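<lang jq># Input: an array of pairs [ doc, array_of_distinct_words ].
# Output: a lookup table of the form { word: array_of_docs }.
def inverted_index:
  reduce .[] as $pair
    ({};
     $pair[0] as $doc
     | reduce $pair[1][] as $word
         (.; .[$word] += [$doc]));
 
# Input: a lookup table as produced by inverted_index.
# Output: the array of docs that contain every word in the array `words`.
def search(words):
  # Emit the items of `that` that also occur in the input array.
  def overlap(that): . as $this
    | reduce that[] as $item ([]; if $this|index($item) then . + [$item] else . end);
  . as $dict
  | if (words|length) == 0 then []
    # `// []` treats a word that is absent from the table as matching no docs.
    else reduce words[1:][] as $word
      ( $dict[words[0]] // []; overlap( $dict[$word] // [] ) )
    end ; </lang>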
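
To make the data shapes concrete, the following is a minimal non-interactive sketch that builds the table from three hypothetical [ doc, words ] pairs and runs a few searches. The file names and word lists are assumptions chosen for illustration. The Part 1 definitions are repeated inline, with a `// []` guard on the lookups, so that the command is self-contained.

<lang sh>result=$(jq -n -c '
  def inverted_index:
    reduce .[] as $pair
      ({}; $pair[0] as $doc
           | reduce $pair[1][] as $word (.; .[$word] += [$doc]));

  def search(words):
    def overlap(that): . as $this
      | reduce that[] as $item ([]; if $this|index($item) then . + [$item] else . end);
    . as $dict
    | if (words|length) == 0 then []
      else reduce words[1:][] as $word
        ( $dict[words[0]] // []; overlap( $dict[$word] // [] ) )
      end;

  # Hypothetical input: one [ doc, distinct_words ] pair per document.
  [ ["T0.txt", ["is","it","what"]],
    ["T1.txt", ["is","it","what"]],
    ["T2.txt", ["a","banana","is","it"]] ]
  | inverted_index
  | search(["is"]),            # docs containing "is"
    search(["is","banana"]),   # docs containing both words
    search(["is","zzz"])       # "zzz" occurs in no doc
')
printf '%s\n' "$result"</lang>

With these pairs, the three searches should print ["T0.txt","T1.txt","T2.txt"], ["T2.txt"] and [], one per line.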
 
'''Part 2: Interactive Search'''
 
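This part presents a solution to the task that uses two
invocations of jq: the first parses the input files, and the second
builds the index and runs the interactive search. The program file
(Inverted_index.jq in the example) is assumed to contain the Part 1
definitions followed by the code shown here. If your shell does not
support process substitution, <(...), you can write the parsed output
to a temporary file and pass that file instead.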
 
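<lang jq># Print a prompt, read one JSON value, print the search result, and repeat.
# Any input other than a string or an array (e.g. 0) ends the loop.
def prompt_search:
  "Enter a string or an array of strings to search for, quoting each string, or 0 to exit:",
  ( (input | if type == "array" then . elif type == "string" then [.] else empty end) as $in
    | search($in), prompt_search ) ;
 
# $in is the array of [ doc, words ] pairs supplied on the command line.
$in | inverted_index | prompt_search</lang>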
 
'''Example''':
<lang sh>
$ jq -r -c -n --argfile in <(jq -R 'split(" ") | select(length>0) | [input_filename, unique]' T?.txt) -f Inverted_index.jq
Enter a string or an array of strings to search for, quoting each string, or 0 to exit:
"is"
["T0.txt","T1.txt","T2.txt"]
Enter a string or an array of strings to search for, quoting each string, or 0 to exit:
["is", "banana"]
["T2.txt"]
Enter a string or an array of strings to search for, quoting each string, or 0 to exit:
0
$</lang>
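
For shells without <(...), the temporary-file variant mentioned above could look like the sketch below. The scratch directory, the name parsed.json, and the three document contents are hypothetical and serve only to make the snippet self-contained. The -c option is added to the parsing step so that each parsed pair is shown on one line.

<lang sh># Hypothetical scratch directory and document contents, for illustration only.
dir=$(mktemp -d)
printf 'it is what it is\n' > "$dir/T0.txt"
printf 'what is it\n'       > "$dir/T1.txt"
printf 'it is a banana\n'   > "$dir/T2.txt"

# Step 1: parse the documents into a temporary file instead of using <(...).
parsed="$dir/parsed.json"
(cd "$dir" && jq -c -R 'split(" ") | select(length>0) | [input_filename, unique]' T?.txt) > "$parsed"
cat "$parsed"   # one [ doc, distinct_words ] pair per line

# Step 2 (interactive, so left as a comment): pass the temporary file where
# the process substitution appeared in the example above.
#   jq -r -c -n --argfile in "$parsed" -f Inverted_index.jq

# Remove "$dir" when finished.</lang>

The temporary file holds exactly what the inner jq invocation would otherwise stream through <(...), so the second invocation is unchanged apart from the file argument.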
 
=={{header|OCaml}}==