Web scraping

{{omit from|Retro|Does not have network access.}}
{{omit from|ZX Spectrum Basic|Does not have network access.}}

;Task:
Create a program that downloads the time from this URL:   [http://tycho.usno.navy.mil/cgi-bin/timer.pl http://tycho.usno.navy.mil/cgi-bin/timer.pl]   and then prints the current UTC time by extracting just the UTC time from the web page's [[HTML]].


<!-- As of March 2014, the page is available
{{task|Networking and Web Interaction}}


The page http://tycho.usno.navy.mil/cgi-bin/timer.pl has not been available since July 2011.
The relevant part of that page source looked like this:

<pre>
...
...
</pre>

End of comment -->

If possible, only use libraries that come at no ''extra'' monetary cost with the programming language and that are widely available and popular, such as [http://www.cpan.org/ CPAN] for Perl or [[Boost]] for C++.
<br><br>
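The task splits into two steps: download the page, then pull the UTC time out of its HTML. The sketch below separates those steps so the extraction can be exercised without the (now offline) URL. It relies only on Python's standard library (<code>urllib.request</code> and <code>re</code>), in keeping with the guideline above. The names <code>extract_utc</code>, <code>fetch_utc</code> and <code>UTC_PATTERN</code> are hypothetical. The assumed line format (for example <code>&lt;BR&gt;Jul. 27, 22:57:22 UTC</code>, with the time followed directly by "UTC") is a guess, because the page source shown above is elided.

```python
import re
import urllib.request
from typing import Optional

# Task URL; the task notes say it has been offline since July 2011.
URL = "http://tycho.usno.navy.mil/cgi-bin/timer.pl"

# Assumption: the UTC line looks like "<BR>Jul. 27, 22:57:22 UTC ...",
# i.e. an HH:MM:SS time followed directly by the word "UTC".
# Lines for other zones carry AM/PM before the zone name, so they do not match.
UTC_PATTERN = re.compile(r"(\d{1,2}:\d{2}:\d{2})\s+UTC\b")


def extract_utc(html: str) -> Optional[str]:
    """Return the first HH:MM:SS time tagged as UTC in the HTML, or None."""
    match = UTC_PATTERN.search(html)
    return match.group(1) if match else None


def fetch_utc(url: str = URL, timeout: float = 10.0) -> Optional[str]:
    """Download the page and extract the UTC time (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_utc(html)


if __name__ == "__main__":
    # Offline demonstration on a hand-written sample line (assumed format).
    sample = "<BR>Jul. 27, 22:57:22 UTC\t\tUniversal Time\n"
    print(extract_utc(sample))
```

Against a live page with the same layout, <code>fetch_utc()</code> would perform both steps. A regular expression is used here because the target is a single, predictably formatted line. A page with less regular markup would call for a real HTML parser, such as the standard library's <code>html.parser</code>.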


=={{header|8th}}==