Web scraping
 
;Task:
Create a program that downloads the time from this URL:   [http://tycho.usno.navy.mil/cgi-bin/timer.pl http://tycho.usno.navy.mil/cgi-bin/timer.pl]   and then prints the current UTC time by extracting just the UTC time from the web page's [[HTML]]. Alternatively, if the above URL is not working, grab the first date/time off this page's talk page.
 
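Before the individual language entries, here is a minimal, language-neutral sketch of the general approach (shown in Python purely for illustration): fetch the page, then pull the first date/time out of the returned HTML with a regular expression. It assumes the talk-page fallback URL and its "hh:mm, d Month yyyy (UTC)" signature format, since the Naval Observatory address is no longer reachable.

<syntaxhighlight lang="python">import re
import urllib.request

# Fallback URL suggested by the task; the original timer.pl page is gone.
URL = "https://rosettacode.org/wiki/Talk:Web_scraping"

with urllib.request.urlopen(URL) as page:
    html = page.read().decode("utf-8", errors="replace")

# First "hh:mm, d Month yyyy (UTC)" wiki signature on the page.
match = re.search(r"\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)", html)
print(match.group(0) if match else "No date/time found.")</syntaxhighlight>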
 
=={{header|8th}}==
<langsyntaxhighlight lang="forth">\ Web-scrape sample: get UTC time from the US Naval Observatory:
: read-url \ -- s
"http://tycho.usno.navy.mil/cgi-bin/timer.pl" net:get
 
get-time bye
</syntaxhighlight>
{{output}}
<pre>14:08:20</pre>
=={{header|Ada}}==
{{libheader|AWS}}
<syntaxhighlight lang="ada">with AWS.Client, AWS.Response, AWS.Resources, AWS.Messages;
with Ada.Text_IO, Ada.Strings.Fixed;
use Ada, AWS, AWS.Resources, AWS.Messages;
end if;
end loop;
end Get_UTC_Time;</syntaxhighlight>
 
=={{header|ALGOL 68}}==
{{works with|ALGOL 68G|Any - tested with release [http://sourceforge.net/projects/algol68/files/algol68g/algol68g-1.18.0/algol68g-1.18.0-9h.tiny.el5.centos.fc11.i386.rpm/download 1.18.0-9h.tiny]}}
{{wont work with|ELLA ALGOL 68|Any (with appropriate job cards) - tested with release [http://sourceforge.net/projects/algol68/files/algol68toc/algol68toc-1.8.8d/algol68toc-1.8-8d.fc9.i386.rpm/download 1.8-8d] - due to extensive use of ''grep in string'' and ''http content''}}
<langsyntaxhighlight lang="algol68">STRING
domain="tycho.usno.navy.mil",
page="cgi-bin/timer.pl";
done: SKIP
ELSE raise error (strerror (rc))
FI</syntaxhighlight>

=={{header|App Inventor}}==
App Inventor has powerful text and list processing language blocks that simplify text scraping.<br>
This is how the code would look if it could be typed:<br>
<langsyntaxhighlight lang="dos">
when ScrapeButton.Click do
set ScrapeWeb.Url to SourceTextBox.Text
set Right to select list item (list: get Left, index: 2)
set ResultLabel.Text to select list item (list: split at first (text:get Right, at: PostTextBox.Text), index: 1)
</syntaxhighlight>
 
[https://lh5.googleusercontent.com/-lFRRPzsi5N0/UvBM9E_gZMI/AAAAAAAAKBQ/AuqDPGlwXNg/s1600/composite.png A picture of the graphical program]
 
=={{header|AppleScript}}==
''"Alternatively, if the above url is not working, grab the first date/time off this page's talk page."'' At the time of posting, the first date/time on the talk page is in Michael Mol's invitation to RC's Slack. The code here's aimed specifically at the ''first'' date on a page (but can be modified) and doesn't include error handling.
 
<syntaxhighlight lang="applescript">use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
 
on firstDateonWebPage(URLText)
set |⌘| to current application
set pageURL to |⌘|'s class "NSURL"'s URLWithString:(URLText)
-- Fetch the page HTML as data.
-- The Xcode documentation advises against using dataWithContentsOfURL: over a network,
-- but I'm guessing this applies to downloading large files rather than a Web page's HTML.
-- If in doubt, the HTML can be fetched as text instead and converted to data in house.
set HTMLData to |⌘|'s class "NSData"'s dataWithContentsOfURL:(pageURL)
-- Or:
(*
set {HTMLText, encoding} to |⌘|'s class "NSString"'s stringWithContentsOfURL:(pageURL) ¬
usedEncoding:(reference) |error|:(missing value)
set HTMLData to HTMLText's dataUsingEncoding:(encoding)
*)
-- Extract the page's visible text from the HTML.
set straightText to (|⌘|'s class "NSAttributedString"'s alloc()'s initWithHTML:(HTMLData) ¬
documentAttributes:(missing value))'s |string|()
-- Use an NSDataDetector to locate the first date in the text. (It's assumed here there'll be one.)
set dateDetector to |⌘|'s class "NSDataDetector"'s dataDetectorWithTypes:(|⌘|'s NSTextCheckingTypeDate) ¬
|error|:(missing value)
set matchRange to dateDetector's rangeOfFirstMatchInString:(straightText) options:(0) ¬
range:({0, straightText's |length|()})
-- Return the date text found.
return (straightText's substringWithRange:(matchRange)) as text
end firstDateonWebPage
 
firstDateonWebPage("https://www.rosettacode.org/wiki/Talk:Web_scraping")</syntaxhighlight>
 
{{output}}
<syntaxhighlight lang="applescript">"20:59, 30 May 2020"</syntaxhighlight>
 
=={{header|AutoHotkey}}==
<syntaxhighlight lang="autohotkey">UrlDownloadToFile, http://tycho.usno.navy.mil/cgi-bin/timer.pl, time.html
FileRead, timefile, time.html
pos := InStr(timefile, "UTC")
msgbox % time := SubStr(timefile, pos - 9, 8)</syntaxhighlight>
 
=={{header|AWK}}==
 
This is inspired by [http://www.gnu.org/software/gawk/manual/gawkinet/html_node/GETURL.html#GETURL GETURL] example in the manual for gawk.
 
<tt><syntaxhighlight lang="awk">#! /usr/bin/awk -f
 
BEGIN {
purl = "/inet/tcp/0/tycho.usno.navy.mil/80"
ORS = RS = "\r\n\r\n"
print "GET /cgi-bin/timer.pl HTTP/1.0" |& purl
purl |& getline header
while ( (purl |& getline ) > 0 )
{
split($0, a, "\n")
for(i=1; i <= length(a); i++)
{
if ( a[i] ~ /UTC/ )
{
sub(/^<BR>/, "", a[i])
printf "%s\n", a[i]
}
}
}
close(purl)
}</syntaxhighlight></tt>
 
=={{header|BBC BASIC}}==
{{works with|BBC BASIC for Windows}}
Note that the URL cache is cleared so the code works correctly if run more than once.
<langsyntaxhighlight lang="bbcbasic"> SYS "LoadLibrary", "URLMON.DLL" TO urlmon%
SYS "GetProcAddress", urlmon%, "URLDownloadToFileA" TO UDTF%
SYS "LoadLibrary", "WININET.DLL" TO wininet%
IF INSTR(text$, "UTC") PRINT MID$(text$, 5)
UNTIL EOF#file%
CLOSE #file%</syntaxhighlight>
 
=={{header|C}}==
There is no proper error handling.
 
<langsyntaxhighlight lang="c">#include <stdio.h>
#include <string.h>
#include <curl/curl.h>
regfree(&cregex);
return 0;
}</syntaxhighlight>
 
=={{header|C sharp|C#}}==
<syntaxhighlight lang="csharp">class Program
{
static void Main(string[] args)
{
WebClient wc = new WebClient();
Stream myStream = wc.OpenRead("http://tycho.usno.navy.mil/cgi-bin/timer.pl");
string html = "";
using (StreamReader sr = new StreamReader(myStream))
{
while (sr.Peek() >= 0)
{
html = sr.ReadLine();
if (html.Contains("UTC"))
{
break;
}
}
}
Console.WriteLine(html.Remove(0, 4));
 
Console.ReadLine();
}
}
</syntaxhighlight>
 
=={{header|C++}}==
{{works with|Visual Studio| 2010 Express Edition with boost-1.46.1 from boostpro.com}}
{{works with|gcc|4.5.2 with boost-1.46.1, compiled with -lboost_regex -lboost_system -lboost_thread}}
<langsyntaxhighlight lang="cpp">#include <iostream>
#include <string>
#include <boost/asio.hpp>
}
}
}</syntaxhighlight>
 
=={{header|Caché ObjectScript}}==
 
<langsyntaxhighlight lang="cos">
Class Utils.Net [ Abstract ]
{
 
}
</syntaxhighlight>
=={{header|Ceylon}}==
Don't forget to import ceylon.uri and ceylon.http.client in your module.ceylon file.
<langsyntaxhighlight lang="ceylon">import ceylon.uri {
parse
}
.filter((String element) => element.contains("UTC"))
.first
?.substring(4, 21);</syntaxhighlight>
 
=={{header|Clojure}}==
Clojure 1.2:
 
<langsyntaxhighlight lang="clojure">
(second (re-find #" (\d{1,2}:\d{1,2}:\d{1,2}) UTC" (slurp "http://tycho.usno.navy.mil/cgi-bin/timer.pl")))
</syntaxhighlight>
 
=={{header|CoffeeScript}}==
{{works with|node.js}}
 
<langsyntaxhighlight lang="coffeescript">
http = require 'http'
 
wget CONFIG.host, CONFIG.path, (data) ->
scrape_tycho_ust_time data
</syntaxhighlight>

=={{header|Common Lisp}}==
{{libheader|DRAKMA}}
 
<langsyntaxhighlight lang="lisp">BOA> (let* ((url "http://tycho.usno.navy.mil/cgi-bin/timer.pl")
(regexp (load-time-value
(cl-ppcre:create-scanner "(?m)^.{4}(.+? UTC)")))
(when start
(subseq data (aref start-regs 0) (aref end-regs 0)))))
"Aug. 12, 04:29:51 UTC"</langsyntaxhighlight>
 
Another Common Lisp solution
<langsyntaxhighlight lang="lisp">CL-USER> (cl-ppcre:do-matches-as-strings
(m ".*<BR>(.*)UTC.*"
(drakma:http-request "http://tycho.usno.navy.mil/cgi-bin/timer.pl"))
(print (cl-ppcre:regex-replace "<BR>(.*UTC).*" m "\\1")))
"Jul. 13, 06:32:01 UTC"</langsyntaxhighlight>
 
=={{header|D}}==
<langsyntaxhighlight lang="d">void main() {
import std.stdio, std.string, std.net.curl, std.algorithm;
 
if (line.canFind(" UTC"))
line[4 .. $].writeln;
}</syntaxhighlight>
 
=={{header|Delphi}}==
There are a number of ways to do this with Delphi using any one of a number of free/open source TCP/IP component suites such as, for example, ICS, Synapse and Indy (which ships with Delphi anyway). However, I thought it would be interesting to do this using the Winsock API direct.
 
<syntaxhighlight lang="delphi">
 
program WebScrape;
end.
 
</syntaxhighlight>
 
 
Example using Indy's IdHTTP component.
 
<syntaxhighlight lang="delphi">program ReadUTCTime;
 
{$APPTYPE CONSOLE}
lReader.Free;
end;
end.</syntaxhighlight>
 
=={{header|E}}==
 
<langsyntaxhighlight lang="e">interp.waitAtTop(when (def html := <http://tycho.usno.navy.mil/cgi-bin/timer.pl>.getText()) -> {
def rx`(?s).*>(@time.*? UTC).*` := html
println(time)
})</syntaxhighlight>
 
=={{header|Erlang}}==
 
Using regular expressions:
<langsyntaxhighlight lang="erlang">-module(scraping).
-export([main/0]).
-define(Url, "http://tycho.usno.navy.mil/cgi-bin/timer.pl").
{ok, {_Status, _Header, HTML}} = httpc:request(?Url),
{match, [Time]} = re:run(HTML, ?Match, [{capture, all_but_first, binary}]),
io:format("~s~n",[Time]).</langsyntaxhighlight>
 
=={{header|F_Sharp|F#}}==
This code is asynchronous - it will not block any threads while it waits on a response from the remote server.
<langsyntaxhighlight lang="fsharp">
open System
open System.Net
|> Async.RunSynchronously
|> printfn "%s"
</syntaxhighlight>
 
=={{header|Factor}}==
<langsyntaxhighlight lang="factor">USING: http.client io sequences ;
 
"http://tycho.usno.navy.mil/cgi-bin/timer.pl" http-get nip
[ "UTC" swap start [ 9 - ] [ 1 - ] bi ] keep subseq print</langsyntaxhighlight>
 
=={{header|Forth}}==
{{works with|GNU Forth|0.7.0}}
<langsyntaxhighlight lang="forth">include unix/socket.fs
 
: extract-time ( addr len type len -- time len )
s\" \r\n\r\n" search 0= abort" can't find headers!" \ skip headers
s" UTC" extract-time type cr
close-socket</syntaxhighlight>
 
=={{header|FunL}}==
<langsyntaxhighlight lang="funl">import io.Source
 
case Source.fromURL( 'http://tycho.usno.navy.mil/cgi-bin/timer.pl', 'UTF-8' ).getLines().find( ('Eastern' in) ) of
Some( time ) -> println( time.substring(4) )
None -> error( 'Eastern time not found' )</syntaxhighlight>
 
 
=={{header|Gambas}}==
<langsyntaxhighlight lang="gambas">Public Sub Main()
Dim sWeb, sTemp, sOutput As String 'Variables
 
Print Mid(sOutput, 5) 'Print the result without the '<BR>' tag
 
End</syntaxhighlight>
 
=={{header|Go}}==
<langsyntaxhighlight lang="go">package main
 
import (
// there is a human readable time in there somewhere.
fmt.Println(us)
}</syntaxhighlight>
 
=={{header|Groovy}}==
<langsyntaxhighlight lang="groovy">def time = "unknown"
def text = new URL('http://tycho.usno.navy.mil/cgi-bin/timer.pl').eachLine { line ->
def matcher = (line =~ "<BR>(.+) UTC")
}
}
println "UTC Time was '$time'"</langsyntaxhighlight>
{{out}}
<pre>UTC Time was 'Feb. 26, 11:02:30'</pre>
=={{header|Haskell}}==
Using package HTTP-4000.0.8 from [http://hackage.haskell.org/packages/hackage.html HackageDB]
<syntaxhighlight lang="haskell">import Data.List
import Network.HTTP (simpleHTTP, getResponseBody, getRequest)
 
 
readUTC = simpleHTTP (getRequest tyd)>>=
fmap ((!!2).head.dropWhile ("UTC"`notElem`).map words.lines). getResponseBody>>=putStrLn</syntaxhighlight>
Usage in GHCi:
<syntaxhighlight lang="haskell">*Main> readUTC
08:30:23</syntaxhighlight>
 
== Icon and Unicon ==
Icon has the capability to read web pages using the external function cfunc. The Unicon messaging extensions are more succinct.
==={{header|Unicon}}===
<syntaxhighlight lang="unicon">procedure main()
m := open(url := "http://tycho.usno.navy.mil/cgi-bin/timer.pl","m") | stop("Unable to open ",url)
every (p := "") ||:= |read(m) # read the page into a single string
 
map(p) ? ( tab(find("<br>")), ="<br>", write("UTC time=",p[&pos:find(" utc")])) # scrape and show
end</syntaxhighlight>
 
=={{header|J}}==
<langsyntaxhighlight lang="j"> require 'web/gethttp'
 
_8{. ' UTC' taketo gethttp 'http://tycho.usno.navy.mil/cgi-bin/timer.pl'
04:32:44</syntaxhighlight>
 
The <code>[[J:Addons/web/gethttp|web/gethttp]]</code> addon uses Wget on Linux or Windows (J ships with Wget on Windows) and cURL on the Mac.
 
=={{header|Java}}==
The http://tycho.usno.navy.mil/cgi-bin/timer.pl address is no longer available, although the parsing of the text is incredibly simple.
<syntaxhighlight lang="java">
String scrapeUTC() throws URISyntaxException, IOException {
String address = "http://tycho.usno.navy.mil/cgi-bin/timer.pl";
URL url = new URI(address).toURL();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()))) {
Pattern pattern = Pattern.compile("^.+? UTC");
Matcher matcher;
String line;
while ((line = reader.readLine()) != null) {
matcher = pattern.matcher(line);
if (matcher.find())
return matcher.group().replaceAll("<.+?>", "");
}
}
return null;
}
</syntaxhighlight>
I'm using a cached page and get the following output.
<pre>
Jun. 25, 17:59:15 UTC
</pre>
<br />
 
Alternatively, using Java 8, with the new web address given in the task description.
<syntaxhighlight lang="java">
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public final class WebScraping {

    public static void main(String[] aArgs) {
        try {
            URI uri = new URI("https://www.rosettacode.org/wiki/Talk:Web_scraping").parseServerAuthority();
            URL address = uri.toURL();
            HttpURLConnection connection = (HttpURLConnection) address.openConnection();
            BufferedReader reader = new BufferedReader( new InputStreamReader(connection.getInputStream()) );
            final int responseCode = connection.getResponseCode();
            System.out.println("Response code: " + responseCode);

            String line;
            while ( ! ( line = reader.readLine() ).contains("UTC") ) {
                /* Empty block */
            }

            final int index = line.indexOf("UTC");
            System.out.println(line.substring(index - 16, index + 4));
            reader.close();
            connection.disconnect();
        } catch (IOException ioe) {
            System.err.println("Error connecting to server: " + ioe.getCause());
        } catch (URISyntaxException use) {
            System.err.println("Unable to connect to URI: " + use.getCause());
        }
    }
}
</syntaxhighlight>
{{ out }}
<pre>
Response code: 200
25 August 2022 (UTC)
</pre>
 
=={{header|JavaScript}}==
Due to browser cross-origin restrictions, this script will probably not work in other domains.
 
<langsyntaxhighlight lang="javascript">var req = new XMLHttpRequest();
req.onload = function () {
var re = /[JFMASOND].+ UTC/; //beginning of month name to 'UTC'
};
req.open('GET', 'http://tycho.usno.navy.mil/cgi-bin/timer.pl', true);
req.send();</syntaxhighlight>
 
=={{header|jq}}==
 
Currently jq does not have built-in curl support, but jq is intended to work seamlessly with other command-line utilities, so we present a simple solution to the problem in the form of a three-line script:
<langsyntaxhighlight lang="sh">#!/bin/bash
curl -Ss 'http://tycho.usno.navy.mil/cgi-bin/timer.pl' |\
jq -R -r 'if index(" UTC") then .[4:] else empty end'</syntaxhighlight>
{{out}}
<langsyntaxhighlight lang="sh">$ ./Web_scraping.jq
Apr. 21, 05:19:32 UTC Universal Time</syntaxhighlight>
 
=={{header|Julia}}==
I'm using the <code>Requests.jl</code> package for this solution. Note that I used a slightly different URL after finding that the one specified in the task description is deprecated (though it still works).
<syntaxhighlight lang="julia">using Requests, Printf
function getusnotime()
println("Failed to fetch UNSO time:\n", t)
end
</syntaxhighlight>
 
 
By breaking the regular expression (using <code>&lt;BR&gt;(.*UTd)</code>)
<langsyntaxhighlight lang="html">
Failed to fetch UNSO time:
raw html:
 
</body></html>
</syntaxhighlight>
 
=={{header|Kotlin}}==
<langsyntaxhighlight lang="scala">// version 1.1.3
 
import java.net.URL
}
sc.close()
}</syntaxhighlight>
 
 
=={{header|Lasso}}==
<syntaxhighlight lang="lasso">/* have to be used
local(raw_htmlstring = '<TITLE>What time is it?</TITLE>
<H2> US Naval Observatory Master Clock Time</H2> <H3><PRE>
Line 1,153 ⟶ 1,239:
// added bonus showing how parsed string can be converted to date object
local(mydate = date(#datepart_txt, -format = `MMM'.' dd',' HH:mm:ss`))
#mydate -> format(`YYYY-MM-dd HH:mm:ss`)</syntaxhighlight>
Result:
Jul. 27, 22:57:22
 
=={{header|Liberty BASIC}}==
<langsyntaxhighlight lang="lb">if DownloadToFile("http://tycho.usno.navy.mil/cgi-bin/timer.pl", DefaultDir$ + "\timer.htm") = 0 then
open DefaultDir$ + "\timer.htm" for input as #f
html$ = lower$(input$(#f, LOF(#f)))
Line 1,182 ⟶ 1,268:
DownloadToFile as ulong '0=success
close #url
end function</langsyntaxhighlight>
 
Because Liberty has to do web operations in ways like this, calling Windows DLLs, there is the [[#Run BASIC|Run BASIC]] variant of LB, in which the task becomes a one-liner.
Line 1,189 ⟶ 1,275:
{{libheader|LuaSocket}}
The web page is split on the HTML line break tags. Each line is checked for the required time zone code. Once it is found, we return the instance within that line of three numbers separated by colons - I.E. the time.
<syntaxhighlight lang="lua">
<lang Lua>
local http = require("socket.http") -- Debian package is 'lua-socket'
 
Line 1,205 ⟶ 1,291:
print(scrapeTime(url, "UTC"))
 
</syntaxhighlight>
</lang>
The task description states "just the UTC time" but of course we could return the whole line including the zone name and date if required.
 
=={{header|M2000 Interpreter}}==
<syntaxhighlight lang="m2000 interpreter">
Module Web_scraping {
Print "Web scraping"
function GetTime$(a$, what$="UTC") {
document a$ ' change string to document
find a$, what$ ' place data to stack
Read find_pos
if find_pos>0 then
read par_order, par_pos
b$=paragraph$(a$, par_order)
k=instr(b$,">")
if k>0 then if k<par_pos then b$=mid$(b$,k+1) :par_pos-=k
k=rinstr(b$,"<")
if k>0 then if k>par_pos then b$=Left(b$,k-1)
=b$
end if
}
declare msxml2 "MSXML2.XMLHTTP.6.0"
rem print type$(msxml2)="IXMLHTTPRequest"
Url$ = "http://tycho.usno.navy.mil/cgi-bin/timer.pl"
try ok {
method msxml2, "Open", "GET", url$, false
method msxml2,"Send"
with msxml2,"responseText" as txt$
Print GetTime$(txt$)
}
If error or not ok then Print Error$
declare msxml2 nothing
}
Web_scraping
</syntaxhighlight>
 
=={{header|Maple}}==
<syntaxhighlight lang="maple">text := URL:-Get("http://tycho.usno.navy.mil/cgi-bin/timer.pl"):
printf(StringTools:-StringSplit(text,"<BR>")[2]);</syntaxhighlight>
{{Out|Output}}
<pre>May. 16, 20:17:28 UTC Universal Time</pre>
 
=={{header|Mathematica}}/{{header|Wolfram Language}}==
<syntaxhighlight lang="mathematica">test = StringSplit[Import["http://tycho.usno.navy.mil/cgi-bin/timer.pl"], "\n"];
Extract[test, Flatten@Position[StringFreeQ[test, "UTC"], False]]</syntaxhighlight>
 
=={{header|MATLAB}} / {{header|Octave}}==
<syntaxhighlight lang="matlab">s = urlread('http://tycho.usno.navy.mil/cgi-bin/timer.pl');
ix = [findstr(s,'<BR>'), length(s)+1];
for k = 2:length(ix)
disp(tok);
end;
end;</syntaxhighlight>
 
=={{header|Microsoft Small Basic}}==
<syntaxhighlight lang="vb">
'Entered by AykayayCiti -- Earl L. Montgomery
url_name = "http://tycho.usno.navy.mil/cgi-bin/timer.pl"
url_data = Network.GetWebPageContents(url_name)
find = "UTC"
' the length from the UTC to the time is -18 so we need
' to subtract from the UTC position
pos = Text.GetIndexOf(url_data,find)-18
result = Text.GetSubText(url_data,pos,(18+3)) 'plus 3 to add the UTC
TextWindow.WriteLine(result)
 
'you can eleminate a line of code by putting the
' GetIndexOf insde the GetSubText
'result2 = Text.GetSubText(url_data,Text.GetIndexOf(url_data,find)-18,(18+3))
'TextWindow.WriteLine(result2)</syntaxhighlight>
{{out}}
<pre>
Mar. 19, 04:19:34 UTC
Press any key to continue...
</pre>
 
=={{header|mIRC Scripting Language}}==
<langsyntaxhighlight lang="mirc">alias utc {
sockclose UTC
sockopen UTC tycho.usno.navy.mil 80
sockread %UTC
}
}</syntaxhighlight>
 
=={{header|NetRexx}}==
<syntaxhighlight lang="netrexx">/* NetRexx */
options replace format comments java crossref symbols binary
 
method isFalse() public constant returns boolean
return \isTrue()
</syntaxhighlight>
 
 
=={{header|Nim}}==
Using the Rosetta Code talk page URL.
<syntaxhighlight lang="nim">import httpclient, strutils

var client = newHttpClient()

var res: string
for line in client.getContent("https://rosettacode.org/wiki/Talk:Web_scraping").splitLines:
  let k = line.find("UTC")
  if k >= 0:
    res = line[0..(k - 3)]
    let k = res.rfind("</a>")
    res = res[(k + 6)..^1]
    break
echo if res.len > 0: res else: "No date/time found."</syntaxhighlight>
 
{{out}}
<pre>20:59, 30 May 2020</pre>
 
=={{header|Objeck}}==
<langsyntaxhighlight lang="objeck">
use Net;
use IO;
}
}
</syntaxhighlight>
 
=={{header|OCaml}}==
 
<langsyntaxhighlight lang="ocaml">let () =
let _,_, page_content = make_request ~url:Sys.argv.(1) ~kind:GET () in
 
let str = Str.global_replace (Str.regexp "<BR>") "" str in
print_endline str;
;;</syntaxhighlight>
 
There are libraries for this, but it's rather interesting to see how to use a socket to achieve this, so see the implementation of the above function <tt>make_request</tt> on [[Web_Scraping/OCaml|this page]].

=={{header|ooRexx}}==
As an alternative, the supplied rxSock socket library could be used.
 
<syntaxhighlight lang="oorexx">
/* load the RexxcURL library */
Call RxFuncAdd 'CurlLoadFuncs', 'rexxcurl', 'CurlLoadFuncs'
utcTime = content~substr(content~lastpos('<BR>') + 4)
say utcTime
end</syntaxhighlight>
 
=={{header|Oz}}==
<langsyntaxhighlight lang="oz">declare
[Regex] = {Module.link ['x-oz://contrib/regex']}
 
Url = "http://tycho.usno.navy.mil/cgi-bin/timer.pl"
in
{System.showInfo {GetDateString {GetPage Url}}}</syntaxhighlight>
 
=={{header|Peloton}}==
English dialect, short form, using integrated Rexx pattern matcher:
<langsyntaxhighlight lang="html"><@ DEFAREPRS>Rexx Parse</@>
<@ DEFPRSLIT>Rexx Parse|'<BR>' UTCtime 'UTC'</@>
<@ LETVARURL>timer|http://tycho.usno.navy.mil/cgi-bin/timer.pl</@>
<@ ACTRPNPRSVAR>Rexx Parse|timer</@>
<@ SAYVAR>UTCtime</@></syntaxhighlight>
 
English dialect, padded variable-length form:
<langsyntaxhighlight lang="html"><# DEFINE WORKAREA PARSEVALUES>Rexx Parse</#>
<# DEFINE PARSEVALUES LITERAL>Rexx Parse|'<BR>' UTCtime 'UTC'</#>
<# LET VARIABLE URLSOURCE>timer|http://tycho.usno.navy.mil/cgi-bin/timer.pl</#>
<# ACT REPLACEBYPATTERN PARSEVALUES VARIABLE>Rexx Parse|timer</#>
<# SAY VARIABLE>UTCtime</#></syntaxhighlight>
 
English dialect, padded short form, using string functions AFT and BEF:
<langsyntaxhighlight lang="html"><@ SAY AFT BEF URL LIT LIT LIT >http://tycho.usno.navy.mil/cgi-bin/timer.pl| UTC|<BR></@></langsyntaxhighlight>
 
=={{header|Perl}}==
{{libheader|LWP}}
<langsyntaxhighlight lang="perl">use LWP::Simple;
 
my $url = 'http://tycho.usno.navy.mil/cgi-bin/timer.pl';
get($url) =~ /<BR>(.+? UTC)/
and print "$1\n";</langsyntaxhighlight>
 
=={{header|Phix}}==
{{libheader|Phix/libcurl}}
<!--<syntaxhighlight lang="phix">(notonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\web_scrape.exw
-- ===========================
--</span>
<span style="color: #008080;">without</span> <span style="color: #008080;">js</span> <span style="color: #000080;font-style:italic;">-- (libcurl)</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">\</span><span style="color: #000000;">libcurl</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
<span style="color: #008080;">include</span> <span style="color: #000000;">builtins</span><span style="color: #0000FF;">\</span><span style="color: #004080;">timedate</span><span style="color: #0000FF;">.</span><span style="color: #000000;">e</span>
<span style="color: #004080;">object</span> <span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">curl_easy_perform_ex</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"https://rosettacode.org/wiki/Talk:Web_scraping"</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">if</span> <span style="color: #004080;">string</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">split</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #008000;">'\n'</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">for</span> <span style="color: #000000;">i</span><span style="color: #0000FF;">=</span><span style="color: #000000;">1</span> <span style="color: #008080;">to</span> <span style="color: #7060A8;">length</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">do</span>
<span style="color: #008080;">if</span> <span style="color: #008080;">not</span> <span style="color: #7060A8;">match</span><span style="color: #0000FF;">(</span><span style="color: #008000;">`&lt;div id="siteNotice"&gt;`</span><span style="color: #0000FF;">,</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])</span> <span style="color: #008080;">then</span> <span style="color: #000080;font-style:italic;">-- (24/11/22, exclude notice)</span>
<span style="color: #004080;">integer</span> <span style="color: #000000;">k</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">match</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"UTC"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">])</span>
<span style="color: #008080;">if</span> <span style="color: #000000;">k</span> <span style="color: #008080;">then</span>
<span style="color: #004080;">string</span> <span style="color: #000000;">line</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">i</span><span style="color: #0000FF;">]</span> <span style="color: #000080;font-style:italic;">-- (debug aid)</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #000000;">line</span><span style="color: #0000FF;">[</span><span style="color: #000000;">1</span><span style="color: #0000FF;">..</span><span style="color: #000000;">k</span><span style="color: #0000FF;">-</span><span style="color: #000000;">3</span><span style="color: #0000FF;">]</span>
<span style="color: #000000;">k</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">rmatch</span><span style="color: #0000FF;">(</span><span style="color: #008000;">"&lt;/a&gt;"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span>
<span style="color: #000000;">res</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">trim</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">[</span><span style="color: #000000;">k</span><span style="color: #0000FF;">+</span><span style="color: #000000;">5</span><span style="color: #0000FF;">..$])</span>
<span style="color: #008080;">exit</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">for</span>
<span style="color: #0000FF;">?</span><span style="color: #000000;">res</span>
<span style="color: #008080;">if</span> <span style="color: #004080;">string</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)</span> <span style="color: #008080;">then</span>
<span style="color: #004080;">timedate</span> <span style="color: #000000;">td</span> <span style="color: #0000FF;">=</span> <span style="color: #7060A8;">parse_date_string</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span> <span style="color: #0000FF;">{</span><span style="color: #008000;">"hh:mm, d Mmmm yyyy"</span><span style="color: #0000FF;">})</span>
<span style="color: #0000FF;">?</span><span style="color: #7060A8;">format_timedate</span><span style="color: #0000FF;">(</span><span style="color: #000000;">td</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"Dddd Mmmm ddth yyyy h:mpm"</span><span style="color: #0000FF;">)</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<span style="color: #008080;">else</span>
<span style="color: #0000FF;">?{</span><span style="color: #008000;">"some error"</span><span style="color: #0000FF;">,</span><span style="color: #000000;">res</span><span style="color: #0000FF;">,</span><span style="color: #7060A8;">curl_easy_strerror</span><span style="color: #0000FF;">(</span><span style="color: #000000;">res</span><span style="color: #0000FF;">)}</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">if</span>
<!--</syntaxhighlight>-->
{{out}}
(From/as per talk page)
<pre>
"20:53, 20 August 2008"
"Wednesday August 20th 2008 8:53pm"
</pre>
 
=={{header|PHP}}==
By iterating over each line:
 
<syntaxhighlight lang="php"><?
 
$contents = file('http://tycho.usno.navy.mil/cgi-bin/timer.pl');
echo subStr($line, 4, $pos - 4); //Prints something like "Dec. 06, 16:18:03"
break;
}</syntaxhighlight>
 
By [[regular expressions]] ({{works with|PHP|4.3.0}}):
 
<syntaxhighlight lang="php"><?
 
echo preg_replace(
Line 1,489 ⟶ 1,650:
file_get_contents('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
);
</syntaxhighlight>
 
=={{header|PicoLisp}}==
<syntaxhighlight lang="picolisp">(load "@lib/http.l")
 
(client "tycho.usno.navy.mil" 80 "cgi-bin/timer.pl"
(when (from "<BR>")
(pack (trim (till "U"))) ) )</langsyntaxhighlight>
{{out}}
<pre>-> "Feb. 19, 18:11:37"</pre>
 
=={{header|PowerShell}}==
<langsyntaxhighlight lang="powershell">$wc = New-Object Net.WebClient
$html = $wc.DownloadString('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
$html -match ', (.*) UTC' | Out-Null
Write-Host $Matches[1]</syntaxhighlight>
 
===fyi===
.NET provides a property named '''UtcNow''':
<syntaxhighlight lang="powershell">
[System.DateTime]::UtcNow
</syntaxhighlight>
{{Out}}
<pre>
</pre>
I am currently in the Pacific timezone:
<syntaxhighlight lang="powershell">
[System.DateTime]::Now
</syntaxhighlight>
 
=={{header|PureBasic}}==
<syntaxhighlight lang="purebasic">URLDownloadToFile_( #Null, "http://tycho.usno.navy.mil/cgi-bin/timer.pl", "timer.htm", 0, #Null)
ReadFile(0, "timer.htm")
While Not Eof(0) : Text$ + ReadString(0) : Wend
MessageRequester("Time", Mid(Text$, FindString(Text$, "UTC", 1) - 9 , 8))</langsyntaxhighlight>
 
=={{header|Python}}==
<langsyntaxhighlight lang="python">import urllib
page = urllib.urlopen('http://tycho.usno.navy.mil/cgi-bin/timer.pl')
for line in page:
print line.strip()[4:]
break
page.close()</syntaxhighlight>
{{out}}
<pre>Aug. 12, 15:22:08 UTC Universal Time</pre>

=={{header|R}}==
Read the page as lines, find the line containing the string "UTC", then extract the portion of that string that is the date.
 
<syntaxhighlight lang="r">
all_lines <- readLines("http://tycho.usno.navy.mil/cgi-bin/timer.pl")
utc_line <- grep("UTC", all_lines, value = TRUE)
matched <- regexpr("(\\w{3}.*UTC)", utc_line)
utc_time_str <- substring(utc_line, matched, matched + attr(matched, "match.length") - 1L)
</syntaxhighlight>
 
The last three lines can be made simpler by using {{libheader|stringr}}
 
<syntaxhighlight lang="r">
library(stringr)
utc_line <- all_lines[str_detect(all_lines, "UTC")]
utc_time_str <- str_extract(utc_line, "\\w{3}.*UTC")
</syntaxhighlight>
 
Finally, the date and time must be parsed and printed in the desired format.
 
<syntaxhighlight lang="r">
utc_time <- strptime(utc_time_str, "%b. %d, %H:%M:%S UTC")
strftime(utc_time, "%A, %d %B %Y, %H:%M:%S")
</syntaxhighlight>
 
Friday, 13 May 2011, 15:12:20
First, retrieve the web page. See [[HTTP_Request]] for more options with this.
 
<syntaxhighlight lang="r">
library(RCurl)
web_page <- getURL("http://tycho.usno.navy.mil/cgi-bin/timer.pl")
</syntaxhighlight>
 
Now parse the HTML code into a tree and retrieve the pre node that contains the interesting bit. Without XPath, the syntax is quite clunky.
 
<syntaxhighlight lang="r">
library(XML)
page_tree <- htmlTreeParse(web_page)
times_node <- times_node[names(times_node) == "text"]
time_lines <- sapply(times_node, function(x) x$value)
</syntaxhighlight>
 
Here, XPath simplifies things a little bit.
 
<syntaxhighlight lang="r">
page_tree <- htmlTreeParse(web_page, useInternalNodes = TRUE)
times_node <- xpathSApply(page_tree, "//pre")[[1]]
times_node <- times_node[names(times_node) == "text"]
time_lines <- sapply(times_node, function(x) as(x, "character"))
</syntaxhighlight>
 
Either way, the solution proceeds from here as in the regex method.
<syntaxhighlight lang="r">
utc_line <- time_lines[str_detect(time_lines, "UTC")]
#etc.
</syntaxhighlight>
 
=={{header|Racket}}==
 
<syntaxhighlight lang="racket">
#lang racket
(require net/url)
((compose1 car (curry regexp-match #rx"[^ <>][^<>]+ UTC")
port->string get-pure-port string->url)
"httphttps://tycho.usno.navy.mil/cgi-bin/timer.pl")
</syntaxhighlight>
 
=={{header|Raku}}==
(formerly Perl 6)
<syntaxhighlight lang="raku" line># 20210301 Updated Raku programming solution
 
use HTTP::Client; # https://github.com/supernovus/perl6-http-client/
 
#`[ Site inaccessible since 2019 ?
my $site = "http://tycho.usno.navy.mil/cgi-bin/timer.pl";
HTTP::Client.new.get($site).content.match(/'<BR>'( .+? <ws> UTC )/)[0].say
# ]
 
my $site = "https://www.utctime.net/";
my $matched = HTTP::Client.new.get($site).content.match(
/'<td>UTC</td><td>'( .*Z )'</td>'/
)[0];
 
say $matched;
#$matched = '12321321:412312312 123';
with DateTime.new($matched.Str) {
say 'The fetch result seems to be of a valid time format.'
} else {
CATCH { put .^name, ': ', .Str }
}</syntaxhighlight>
 
Note that the string between '<' and '>' refers to regex tokens, so to match a literal '&lt;BR&gt;' you need to quote it, while <ws> refers to the built-in token whitespace.
Also, whitespace is ignored by default in Raku regexes.
{{out}}
<pre>
「2021-03-01T17:02:37Z」
The fetch result seems to be of a valid time format.
</pre>
 
=={{header|REBOL}}==
<syntaxhighlight lang="rebol">REBOL [
Title: "Web Scraping"
URL: http://rosettacode.org/wiki/Web_Scraping
parse html [thru <br> copy current thru "UTC" to end]
 
print ["Current UTC time:" current]</langsyntaxhighlight>
 
=={{header|Ruby}}==
A verbose example for comparison
 
<langsyntaxhighlight lang="ruby">require "open-uri"
 
open('http://tycho.usno.navy.mil/cgi-bin/timer.pl') do |p|
end
end
</syntaxhighlight>
 
A more concise example
 
<langsyntaxhighlight lang="ruby">require 'open-uri'
puts URI.parse('http://tycho.usno.navy.mil/cgi-bin/timer.pl').read.match(/ (\d{1,2}:\d{1,2}:\d{1,2}) UTC/)[1]
</syntaxhighlight>
 
=={{header|Run BASIC}}==
<syntaxhighlight lang="runbasic">print word$(word$(httpget$("http://tycho.usno.navy.mil/cgi-bin/timer.pl"),1,"UTC"),2,"<BR>")</syntaxhighlight>
{{out}}
<pre>May. 09, 16:13:44</pre>
 
=={{header|Rust}}==
{{trans|Raku}}
 
<syntaxhighlight lang="rust">// 202100302 Rust programming solution
 
use std::io::Read;
use regex::Regex;
 
fn main() {
 
let client = reqwest::blocking::Client::new();
let site = "https://www.utctime.net/";
let mut res = client.get(site).send().unwrap();
let mut body = String::new();
 
res.read_to_string(&mut body).unwrap();
 
let re = Regex::new(r#"<td>UTC</td><td>(.*Z)</td>"#).unwrap();
let caps = re.captures(&body).unwrap();
 
println!("Result : {:?}", caps.get(1).unwrap().as_str());
}</syntaxhighlight>
{{out}}
<pre>
Result : "2021-03-02T16:27:02Z"
</pre>
 
=={{header|Scala}}==
 
<langsyntaxhighlight lang="scala">
import scala.io.Source
 
}
}
</syntaxhighlight>
 
=={{header|Scheme}}==
{{works with|Guile}}
<langsyntaxhighlight lang="scheme">; Use the regular expression module to parse the url
(use-modules (ice-9 regex) (ice-9 rdelim))
 
; Display result
(display time)
(newline)</syntaxhighlight>
 
=={{header|Seed7}}==
<langsyntaxhighlight lang="seed7">$ include "seed7_05.s7i";
include "gethttp.s7i";
 
end if;
end if;
end func;</syntaxhighlight>
 
=={{header|Sidef}}==
<langsyntaxhighlight lang="ruby">var ua = frequire('LWP::Simple');
var url = 'http://tycho.usno.navy.mil/cgi-bin/timer.pl';
var match = /<BR>(.+? UTC)/.match(ua.get(url));
say match[0] if match;</syntaxhighlight>
{{out}}
<pre>
Line 1,758 ⟶ 1,978:
</pre>
 
=={{header|Standard ML}}==
Done in PolyML, needs fetch and sed. Basically the same as the function in [[PPM_conversion_through_a_pipe#Standard_ML]].
 
<syntaxhighlight lang="standard ml">
val getTime = fn url =>
let
val fname = "/tmp/fConv" ^ (String.extract (Time.toString (Posix.ProcEnv.time()),7,NONE) );
val shellCommand = " fetch -o - \""^ url ^"\" | sed -ne 's/^.*alt=.Los Angeles:\\(.* (Daylight Saving)\\).*$/\\1/p' " ;
val me = ( Posix.FileSys.mkfifo
(fname,
Posix.FileSys.S.flags [ Posix.FileSys.S.irusr,Posix.FileSys.S.iwusr ]
) ;
Posix.Process.fork ()
)
in
if (Option.isSome me) then
let
val fin =TextIO.openIn fname
in
( Posix.Process.sleep (Time.fromReal 0.5) ;
TextIO.inputLine fin before
(TextIO.closeIn fin ; OS.FileSys.remove fname )
)
end
else
( OS.Process.system ( shellCommand ^ " > " ^ fname ^ " 2>&1 " ) ;
SOME "" before OS.Process.exit OS.Process.success
)
end;
 
print ( valOf (getTime "http://www.time.org"));
</syntaxhighlight>
output
<syntaxhighlight lang="sml">11:12 AM (PDT) (Daylight Saving)</syntaxhighlight>
 
=={{header|Tcl}}==
===<tt>http</tt> and regular expressions===
<langsyntaxhighlight lang="tcl">package require http
 
set request [http::geturl "http://tycho.usno.navy.mil/cgi-bin/timer.pl"]
if {[regexp -line {<BR>(.* UTC)} [http::data $request] --> utc]} {
puts $utc
}</syntaxhighlight>
 
===curl(1) and list operations===
Considering the web resource returns tabular data wrapped in a <tt>&lt;PRE&gt;</tt> tag, you can use Tcl's list processing commands to process its contents.
<langsyntaxhighlight lang="tcl">set data [exec curl -s http://tycho.usno.navy.mil/cgi-bin/timer.pl]
puts [lrange [lsearch -glob -inline [split $data <BR>] *UTC*] 0 3]</syntaxhighlight>
 
=={{header|ToffeeScript}}==
<syntaxhighlight lang="coffeescript">e, page = require('request').get! 'http://tycho.usno.navy.mil/cgi-bin/timer.pl'
l = line for line in page.body.split('\n') when line.indexOf('UTC')>0
console.log l.substr(4,l.length-20)</syntaxhighlight>
 
=={{header|TUSCRIPT}}==
<langsyntaxhighlight lang="tuscript">
$$ MODE TUSCRIPT
SET time = REQUEST ("http://tycho.usno.navy.mil/cgi-bin/timer.pl")
SET utc = FILTER (time,":*UTC*:",-)
</syntaxhighlight>
 
 
=={{header|TXR}}==
If the web page changes too much, the query will fail to match. TXR will print the word "false" and terminate with a failed exit status. This is preferable to finding a false positive match and printing a wrong result. (E.g. any random garbage that happened to be in a line of HTML accidentally containing the string UTC).
 
<langsyntaxhighlight lang="txr">@(next @(open-command "wget -c http://tycho.usno.navy.mil/cgi-bin/timer.pl -O - 2> /dev/null"))
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final"//EN>
<html>
@MO-@DD @HH:@MM:@SS @PM @TZ
@ (end)
@(end)</syntaxhighlight>
 
Sample run:<pre>$ txr navytime.txr
</pre>

Skip stuff until a line beginning with <code>&lt;BR&gt;</code> has some stuff before "UTC", and capture that stuff:
 
<langsyntaxhighlight lang="txr">@(next @(open-command "wget -c http://tycho.usno.navy.mil/cgi-bin/timer.pl -O - 2> /dev/null"))
@(skip)
<BR>@time@\ UTC@(skip)
@(output)
@time
@(end)</syntaxhighlight>
 
=={{header|UNIX Shell}}==
This solution uses 'curl' and the standard POSIX command 'sed'.
 
<langsyntaxhighlight lang="bash">#!/bin/sh
curl -s http://tycho.usno.navy.mil/cgi-bin/timer.pl |
sed -ne 's/^<BR>\(.* UTC\).*$/\1/p'</syntaxhighlight>
 
This solution uses tcsh, wget and awk.
 
<langsyntaxhighlight lang="tcsh">#!/usr/bin/tcsh -f
set page = `wget -q -O- "http://tycho.usno.navy.mil/cgi-bin/timer.pl"`
echo `awk -v s="${page[22]}" 'BEGIN{print substr(s,5,length(s))}'` ${page[23]} ${page[24]}</syntaxhighlight>
 
=={{header|Ursala}}==
This works by launching the wget command in a separate process and capturing its output.
The program is compiled to an executable command.
<syntaxhighlight lang="ursala">#import std
#import cli
 
<.file$[contents: --<''>]>+ -+
@hm skip/*4+ ~=(9%cOi&)-~l*+ *~ ~&K3/'UTC',
(ask bash)/0+ -[wget -O - http://tycho.usno.navy.mil/cgi-bin/timer.pl]-!+-</syntaxhighlight>
Here is a bash session.
<pre>$ whatime
</pre>

=={{header|VBA}}==
Note: for this example I altered the VBScript solution.
<langsyntaxhighlight lang="vb">Rem add Microsoft VBScript Regular Expression X.X to your Tools References
 
Function GetUTC() As String
Debug.Print ReturnValue
MsgBox (ReturnValue)
End Sub</syntaxhighlight>
 
 
 
=={{header|VBScript}}==
<langsyntaxhighlight lang="vb">Function GetUTC() As String
Url = "http://tycho.usno.navy.mil/cgi-bin/timer.pl"
With CreateObject("MSXML2.XMLHTTP.6.0")
 
WScript.StdOut.Write GetUTC
WScript.StdOut.WriteLine</syntaxhighlight>
 
{{Out}}
<pre>
Apr. 21, 21:02:03 UTC Universal Time
</pre>
 
 
 
=={{header|Visual Basic .NET}}==
New, .NET way with StringReader:
<langsyntaxhighlight lang="vbnet">Imports System.Net
Imports System.IO
Dim client As WebClient = New WebClient()
Console.WriteLine(time(0))
End If
End While</syntaxhighlight>
 
Alternative, old fashioned way using VB "Split" function:
<langsyntaxhighlight lang="vbnet">Imports System.Net
Dim client As WebClient = New WebClient()
Dim content As String = client.DownloadString("http://tycho.usno.navy.mil/cgi-bin/timer.pl")
Console.WriteLine(time(0))
End If
Next</syntaxhighlight>
 
=={{header|V (Vlang)}}==
<syntaxhighlight lang="Zig">
import net.http
import net.html
 
fn main() {
resp := http.get("https://www.utctime.net") or {println(err) exit(-1)}
html_doc := html.parse(resp.body)
utc := html_doc.get_tag("table").str().split("UTC</td><td>")[1].split("</td>")[0]
rfc_850 := html_doc.get_tag("table").str().split("RFC 850</td><td>")[1].split("</td>")[0]
println(utc)
println(rfc_850)
}
</syntaxhighlight>
 
{{out}}
<pre>
2023-06-06T12:08:01Z
Tuesday, 06-Jun-23 12:08:01 UTC
</pre>
 
=={{header|Wren}}==
{{libheader|libcurl}}
{{libheader|Wren-pattern}}
An embedded program so we can ask the C host to download the page for us. This task's talk page is being used for this purpose as the original URL no longer works.
 
The code is based in part on the C example, though as we don't have regex, we use our Pattern module to identify the first occurrence of a UTC date/time after the site notice.
<syntaxhighlight lang="wren">/* Web_scraping.wren */
 
import "./pattern" for Pattern
 
var CURLOPT_URL = 10002
var CURLOPT_FOLLOWLOCATION = 52
var CURLOPT_WRITEFUNCTION = 20011
var CURLOPT_WRITEDATA = 10001
 
var BUFSIZE = 16384 * 4
 
foreign class Buffer {
construct new(size) {}
 
// returns buffer contents as a string
foreign value
}
 
foreign class Curl {
construct easyInit() {}
 
foreign easySetOpt(opt, param)
 
foreign easyPerform()
 
foreign easyCleanup()
}
 
var buffer = Buffer.new(BUFSIZE)
var curl = Curl.easyInit()
curl.easySetOpt(CURLOPT_URL, "https://rosettacode.org/wiki/Talk:Web_scraping")
curl.easySetOpt(CURLOPT_FOLLOWLOCATION, 1)
curl.easySetOpt(CURLOPT_WRITEFUNCTION, 0) // write function to be supplied by C
curl.easySetOpt(CURLOPT_WRITEDATA, buffer)
 
curl.easyPerform()
curl.easyCleanup()
 
var html = buffer.value
var ix = html.indexOf("(UTC)")
ix = html.indexOf("(UTC)", ix + 1) // skip the site notice
if (ix == -1) {
System.print("UTC time not found.")
return
}
var p = Pattern.new("/d/d:/d/d, #12/d +1/a =4/d")
var m = p.find(html[(ix - 30).max(0)...ix])
System.print(m.text)</syntaxhighlight>
<br>
We now embed this in the following C program, compile and run it.
<syntaxhighlight lang="c">/* gcc Web_scraping.c -o Web_scraping -lcurl -lwren -lm */
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <curl/curl.h>
#include "wren.h"
 
/* C <=> Wren interface functions */
 
char *url, *read_file, *write_file;
 
size_t bufsize;
size_t lr = 0;
 
size_t filterit(void *ptr, size_t size, size_t nmemb, void *stream) {
if ((lr + size*nmemb) > bufsize) return bufsize;
memcpy(stream+lr, ptr, size * nmemb);
lr += size * nmemb;
return size * nmemb;
}
 
void C_bufferAllocate(WrenVM* vm) {
bufsize = (int)wrenGetSlotDouble(vm, 1);
wrenSetSlotNewForeign(vm, 0, 0, bufsize);
}
 
void C_curlAllocate(WrenVM* vm) {
CURL** pcurl = (CURL**)wrenSetSlotNewForeign(vm, 0, 0, sizeof(CURL*));
*pcurl = curl_easy_init();
}
 
void C_value(WrenVM* vm) {
const char *s = (const char *)wrenGetSlotForeign(vm, 0);
wrenSetSlotString(vm, 0, s);
}
 
void C_easyPerform(WrenVM* vm) {
CURL* curl = *(CURL**)wrenGetSlotForeign(vm, 0);
curl_easy_perform(curl);
}
 
void C_easyCleanup(WrenVM* vm) {
CURL* curl = *(CURL**)wrenGetSlotForeign(vm, 0);
curl_easy_cleanup(curl);
}
 
void C_easySetOpt(WrenVM* vm) {
CURL* curl = *(CURL**)wrenGetSlotForeign(vm, 0);
CURLoption opt = (CURLoption)wrenGetSlotDouble(vm, 1);
if (opt < 10000) {
long lparam = (long)wrenGetSlotDouble(vm, 2);
curl_easy_setopt(curl, opt, lparam);
} else if (opt < 20000) {
if (opt == CURLOPT_WRITEDATA) {
char *buffer = (char *)wrenGetSlotForeign(vm, 2);
curl_easy_setopt(curl, opt, buffer);
} else if (opt == CURLOPT_URL) {
const char *url = wrenGetSlotString(vm, 2);
curl_easy_setopt(curl, opt, url);
}
} else if (opt < 30000) {
if (opt == CURLOPT_WRITEFUNCTION) {
curl_easy_setopt(curl, opt, &filterit);
}
}
}
 
WrenForeignClassMethods bindForeignClass(WrenVM* vm, const char* module, const char* className) {
WrenForeignClassMethods methods;
methods.allocate = NULL;
methods.finalize = NULL;
if (strcmp(module, "main") == 0) {
if (strcmp(className, "Buffer") == 0) {
methods.allocate = C_bufferAllocate;
} else if (strcmp(className, "Curl") == 0) {
methods.allocate = C_curlAllocate;
}
}
return methods;
}
 
WrenForeignMethodFn bindForeignMethod(
WrenVM* vm,
const char* module,
const char* className,
bool isStatic,
const char* signature) {
if (strcmp(module, "main") == 0) {
if (strcmp(className, "Buffer") == 0) {
if (!isStatic && strcmp(signature, "value") == 0) return C_value;
} else if (strcmp(className, "Curl") == 0) {
if (!isStatic && strcmp(signature, "easySetOpt(_,_)") == 0) return C_easySetOpt;
if (!isStatic && strcmp(signature, "easyPerform()") == 0) return C_easyPerform;
if (!isStatic && strcmp(signature, "easyCleanup()") == 0) return C_easyCleanup;
}
}
return NULL;
}
 
static void writeFn(WrenVM* vm, const char* text) {
printf("%s", text);
}
 
void errorFn(WrenVM* vm, WrenErrorType errorType, const char* module, const int line, const char* msg) {
switch (errorType) {
case WREN_ERROR_COMPILE:
printf("[%s line %d] [Error] %s\n", module, line, msg);
break;
case WREN_ERROR_STACK_TRACE:
printf("[%s line %d] in %s\n", module, line, msg);
break;
case WREN_ERROR_RUNTIME:
printf("[Runtime Error] %s\n", msg);
break;
}
}
 
char *readFile(const char *fileName) {
FILE *f = fopen(fileName, "r");
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
rewind(f);
char *script = malloc(fsize + 1);
fread(script, 1, fsize, f);
fclose(f);
script[fsize] = 0;
return script;
}
 
static void loadModuleComplete(WrenVM* vm, const char* module, WrenLoadModuleResult result) {
if( result.source) free((void*)result.source);
}
 
WrenLoadModuleResult loadModule(WrenVM* vm, const char* name) {
WrenLoadModuleResult result = {0};
if (strcmp(name, "random") != 0 && strcmp(name, "meta") != 0) {
result.onComplete = loadModuleComplete;
char fullName[strlen(name) + 6];
strcpy(fullName, name);
strcat(fullName, ".wren");
result.source = readFile(fullName);
}
return result;
}
 
int main(int argc, char **argv) {
WrenConfiguration config;
wrenInitConfiguration(&config);
config.writeFn = &writeFn;
config.errorFn = &errorFn;
config.bindForeignClassFn = &bindForeignClass;
config.bindForeignMethodFn = &bindForeignMethod;
config.loadModuleFn = &loadModule;
WrenVM* vm = wrenNewVM(&config);
const char* module = "main";
const char* fileName = "Web_scraping.wren";
char *script = readFile(fileName);
WrenInterpretResult result = wrenInterpret(vm, module, script);
switch (result) {
case WREN_RESULT_COMPILE_ERROR:
printf("Compile Error!\n");
break;
case WREN_RESULT_RUNTIME_ERROR:
printf("Runtime Error!\n");
break;
case WREN_RESULT_SUCCESS:
break;
}
wrenFreeVM(vm);
free(script);
return 0;
}</syntaxhighlight>
 
{{out}}
<pre>
20:53, 20 August 2008
</pre>
 
=={{header|Xidel}}==
http://videlibri.sourceforge.net/xidel.html
<syntaxhighlight lang="sh">$ xidel -s "https://www.rosettacode.org/wiki/Talk:Web_scraping" -e '
//p[1]/text()[last()]
'
20:53, 20 August 2008 (UTC)
 
$ xidel -s "https://www.rosettacode.org/wiki/Talk:Web_scraping" -e '
x:parse-dateTime(//p[1]/text()[last()],'hh:nn, dd mmmm yyyy')
'
2008-08-20T20:53:00
 
$ xidel -s "https://www.rosettacode.org/wiki/Talk:Web_scraping" -e '
adjust-dateTime-to-timezone(
x:parse-dateTime(//p[1]/text()[last()],'hh:nn, dd mmmm yyyy'),
dayTimeDuration('PT0H')
)
'
$ xidel -s "https://www.rosettacode.org/wiki/Talk:Web_scraping" -e '
x:parse-dateTime(//p[1]/text()[last()],'hh:nn, dd mmmm yyyy')
! adjust-dateTime-to-timezone(.,dayTimeDuration('PT0H'))
'
$ xidel -s "https://www.rosettacode.org/wiki/Talk:Web_scraping" -e '
x:parse-dateTime(//p[1]/text()[last()],'hh:nn, dd mmmm yyyy')
=> adjust-dateTime-to-timezone(dayTimeDuration('PT0H'))
'
2008-08-20T20:53:00Z</syntaxhighlight>
 
=={{header|zkl}}==
<langsyntaxhighlight lang="zkl">const HOST="tycho.usno.navy.mil", PORT=80, dir="/cgi-bin/timer.pl";
get:="GET %s HTTP/1.0\r\nHost: %s:%s\r\n\r\n".fmt(dir,HOST,PORT);
server:=Network.TCPClientSocket.connectTo(HOST,PORT);
re:=RegExp(0'|.*(\d\d:\d\d:\d\d)|); // get time
re.search(line);
re.matched[1].println();</syntaxhighlight>