Web scraping

If the web page changes too much, the query will fail to match. TXR will then print the word "false" and terminate with a failed exit status. This is preferable to a false-positive match that prints a wrong result (e.g. random garbage from some line of HTML that merely happens to contain the string UTC).


<lang txr>@(next @(open-command "wget -c http://tycho.usno.navy.mil/cgi-bin/timer.pl -O - 2> /dev/null"))
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final"//EN>
<html>