URL parser
You are encouraged to solve this task according to the task description, using any language you may know.
URLs are very common strings with a simple syntax:
scheme://[username:password@]domain[:port]/path?query_string#fragment_id
This task (which has nothing to do with URL encoding or URL decoding) is to parse a well-formed URL to retrieve the relevant information: scheme, domain, path, ...
According to the standards, the characters [!*'();:@&=+$,/?%#[]] only need to be percent-encoded in case of possible confusion, so be wary of naive splits and regular expressions. Note also that the path, query and fragment are case sensitive, even if the scheme and domain are not.
The way the returned information is provided (set of variables, array, structure, record, object, ...) is language dependent and left to the programmer, but the code should be clear enough to reuse.
Extra credit is given for clear error diagnostics.
- Here is the official standard: https://tools.ietf.org/html/rfc3986,
- and here a simpler BNF http://www.w3.org/Addressing/URL/5_URI_BNF.html.
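As an illustration of the generic syntax above, here is a minimal Python sketch. It assumes that the regular expression given in Appendix B of RFC 3986 is an acceptable first-pass splitter for well-formed input. The names split_uri, URI_RE and SCHEME_RE are invented for this example. Only the scheme is lower-cased; path, query and fragment are left untouched because they are case sensitive. A ValueError with a specific message is raised when no valid scheme is present.

```python
import re

# Generic URI regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)
# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )  (RFC 3986, section 3.1)
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def split_uri(uri):
    """Split a URI into its five generic components, or raise ValueError."""
    # Every group in URI_RE is optional, so the match itself always succeeds.
    match = URI_RE.match(uri)
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    if scheme is None:
        raise ValueError("missing scheme in %r" % uri)
    if not SCHEME_RE.match(scheme):
        raise ValueError("invalid scheme %r in %r" % (scheme, uri))
    return {
        "scheme": scheme.lower(),  # schemes are case-insensitive
        "authority": authority,    # None when there is no '//' part
        "path": path,              # case-sensitive, left untouched
        "query": query,            # None when there is no '?'
        "fragment": fragment,      # None when there is no '#'
    }


if __name__ == "__main__":
    print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
```

Splitting the authority further (user information, host, port) is deliberately left out of this sketch.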
Test cases
According to T. Berners-Lee
foo://example.com:8042/over/there?name=ferret#nose should parse into:
- scheme = foo
- domain = example.com
- port = :8042
- path = over/there
- query = name=ferret
- fragment = nose
urn:example:animal:ferret:nose should parse into:
- scheme = urn
- path = example:animal:ferret:nose
Other URLs that must be parsed include:
- jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
- ftp://ftp.is.co.za/rfc/rfc1808.txt
- http://www.ietf.org/rfc/rfc2396.txt#header1
- ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
- mailto:John.Doe@example.com
- news:comp.infosystems.www.servers.unix
- tel:+1-816-555-1212
- telnet://192.0.2.16:80/
- urn:oasis:names:specification:docbook:dtd:xml:4.1.2
Elixir
<lang elixir>test_cases = [
  "foo://example.com:8042/over/there?name=ferret#nose",
  "urn:example:animal:ferret:nose",
  "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
  "ftp://ftp.is.co.za/rfc/rfc1808.txt",
  "http://www.ietf.org/rfc/rfc2396.txt#header1",
  "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
  "mailto:John.Doe@example.com",
  "news:comp.infosystems.www.servers.unix",
  "tel:+1-816-555-1212",
  "telnet://192.0.2.16:80/",
  "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
  "ssh://alice@example.com",
  "https://bob:pass@example.com/place",
  "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"
]

Enum.each(test_cases, fn str ->
  IO.puts "\n#{str}"
  IO.inspect URI.parse(str)
end)</lang>
- Output:
foo://example.com:8042/over/there?name=ferret#nose
%URI{authority: "example.com:8042", fragment: "nose", host: "example.com", path: "/over/there", port: 8042, query: "name=ferret", scheme: "foo", userinfo: nil}

urn:example:animal:ferret:nose
%URI{authority: nil, fragment: nil, host: nil, path: "example:animal:ferret:nose", port: nil, query: nil, scheme: "urn", userinfo: nil}

jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
%URI{authority: nil, fragment: nil, host: nil, path: "mysql://test_user:ouupppssss@localhost:3306/sakila", port: nil, query: "profileSQL=true", scheme: "jdbc", userinfo: nil}

ftp://ftp.is.co.za/rfc/rfc1808.txt
%URI{authority: "ftp.is.co.za", fragment: nil, host: "ftp.is.co.za", path: "/rfc/rfc1808.txt", port: 21, query: nil, scheme: "ftp", userinfo: nil}

http://www.ietf.org/rfc/rfc2396.txt#header1
%URI{authority: "www.ietf.org", fragment: "header1", host: "www.ietf.org", path: "/rfc/rfc2396.txt", port: 80, query: nil, scheme: "http", userinfo: nil}

ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
%URI{authority: "2001:db8::7", fragment: nil, host: "2001:db8::7", path: "/c=GB", port: 389, query: "objectClass=one&objectClass=two", scheme: "ldap", userinfo: nil}

mailto:John.Doe@example.com
%URI{authority: nil, fragment: nil, host: nil, path: "John.Doe@example.com", port: nil, query: nil, scheme: "mailto", userinfo: nil}

news:comp.infosystems.www.servers.unix
%URI{authority: nil, fragment: nil, host: nil, path: "comp.infosystems.www.servers.unix", port: nil, query: nil, scheme: "news", userinfo: nil}

tel:+1-816-555-1212
%URI{authority: nil, fragment: nil, host: nil, path: "+1-816-555-1212", port: nil, query: nil, scheme: "tel", userinfo: nil}

telnet://192.0.2.16:80/
%URI{authority: "192.0.2.16:80", fragment: nil, host: "192.0.2.16", path: "/", port: 80, query: nil, scheme: "telnet", userinfo: nil}

urn:oasis:names:specification:docbook:dtd:xml:4.1.2
%URI{authority: nil, fragment: nil, host: nil, path: "oasis:names:specification:docbook:dtd:xml:4.1.2", port: nil, query: nil, scheme: "urn", userinfo: nil}

ssh://alice@example.com
%URI{authority: "alice@example.com", fragment: nil, host: "example.com", path: nil, port: nil, query: nil, scheme: "ssh", userinfo: "alice"}

https://bob:pass@example.com/place
%URI{authority: "bob:pass@example.com", fragment: nil, host: "example.com", path: "/place", port: 443, query: nil, scheme: "https", userinfo: "bob:pass"}

http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
%URI{authority: "example.com", fragment: nil, host: "example.com", path: "/", port: 80, query: "a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64", scheme: "http", userinfo: nil}
Go
This uses Go's standard net/url package. The source code for this package (excluding tests) is in a single file of ~720 lines.
<lang go>package main

import (
	"fmt"
	"log"
	"net"
	"net/url"
)

func main() {
	for _, in := range []string{
		"foo://example.com:8042/over/there?name=ferret#nose",
		"urn:example:animal:ferret:nose",
		"jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
		"ftp://ftp.is.co.za/rfc/rfc1808.txt",
		"http://www.ietf.org/rfc/rfc2396.txt#header1",
		"ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
		"mailto:John.Doe@example.com",
		"news:comp.infosystems.www.servers.unix",
		"tel:+1-816-555-1212",
		"telnet://192.0.2.16:80/",
		"urn:oasis:names:specification:docbook:dtd:xml:4.1.2",

		"ssh://alice@example.com",
		"https://bob:pass@example.com/place",
		"http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64",
	} {
		fmt.Println(in)
		u, err := url.Parse(in)
		if err != nil {
			log.Println(err)
			continue
		}
		// Report inputs that do not round-trip unchanged.
		if in != u.String() {
			fmt.Printf("Note: reassembles as %q\n", u)
		}
		printURL(u)
	}
}

// printURL prints every non-empty component of u.
func printURL(u *url.URL) {
	fmt.Println(" Scheme:", u.Scheme)
	if u.Opaque != "" {
		fmt.Println(" Opaque:", u.Opaque)
	}
	if u.User != nil {
		fmt.Println(" Username:", u.User.Username())
		if pwd, ok := u.User.Password(); ok {
			fmt.Println(" Password:", pwd)
		}
	}
	if u.Host != "" {
		if host, port, err := net.SplitHostPort(u.Host); err == nil {
			fmt.Println(" Host:", host)
			fmt.Println(" Port:", port)
		} else {
			fmt.Println(" Host:", u.Host)
		}
	}
	if u.Path != "" {
		fmt.Println(" Path:", u.Path)
	}
	if u.RawQuery != "" {
		fmt.Println(" RawQuery:", u.RawQuery)
		m, err := url.ParseQuery(u.RawQuery)
		if err == nil {
			for k, v := range m {
				fmt.Printf(" Key: %q Values: %q\n", k, v)
			}
		}
	}
	if u.Fragment != "" {
		fmt.Println(" Fragment:", u.Fragment)
	}
}</lang>
- Output:
foo://example.com:8042/over/there?name=ferret#nose
 Scheme: foo
 Host: example.com
 Port: 8042
 Path: /over/there
 RawQuery: name=ferret
 Key: "name" Values: ["ferret"]
 Fragment: nose
urn:example:animal:ferret:nose
 Scheme: urn
 Opaque: example:animal:ferret:nose
jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
 Scheme: jdbc
 Opaque: mysql://test_user:ouupppssss@localhost:3306/sakila
 RawQuery: profileSQL=true
 Key: "profileSQL" Values: ["true"]
ftp://ftp.is.co.za/rfc/rfc1808.txt
 Scheme: ftp
 Host: ftp.is.co.za
 Path: /rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt#header1
 Scheme: http
 Host: www.ietf.org
 Path: /rfc/rfc2396.txt
 Fragment: header1
ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
 Scheme: ldap
 Host: [2001:db8::7]
 Path: /c=GB
 RawQuery: objectClass=one&objectClass=two
 Key: "objectClass" Values: ["one" "two"]
mailto:John.Doe@example.com
 Scheme: mailto
 Opaque: John.Doe@example.com
news:comp.infosystems.www.servers.unix
 Scheme: news
 Opaque: comp.infosystems.www.servers.unix
tel:+1-816-555-1212
 Scheme: tel
 Opaque: +1-816-555-1212
telnet://192.0.2.16:80/
 Scheme: telnet
 Host: 192.0.2.16
 Port: 80
 Path: /
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
 Scheme: urn
 Opaque: oasis:names:specification:docbook:dtd:xml:4.1.2
ssh://alice@example.com
 Scheme: ssh
 Username: alice
 Host: example.com
https://bob:pass@example.com/place
 Scheme: https
 Username: bob
 Password: pass
 Host: example.com
 Path: /place
http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
 Scheme: http
 Host: example.com
 Path: /
 RawQuery: a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
 Key: "a" Values: ["1"]
 Key: "b" Values: ["2 2"]
 Key: "c" Values: ["3" "4"]
 Key: "d" Values: ["encoded"]
J
As most errors are contextual (e.g. invalid authority, invalid path, unrecognized scheme), we shall defer error testing to the relevant consumers. This might offend some on the grounds of temporary safety, but consumers already bear responsibility to parse and validate their relevant uri element(s).
Our parsing strategy is fixed format recursive descent. (Please do not criticize this on efficiency grounds without first investigating the implementations of other parsers.)
Implementation:
<lang J>split=:1 :0
  ({. ; ] }.~ 1+[)~ i.&m
)

uriparts=:3 :0
  'server fragment'=. '#' split y
  'sa query'=. '?' split server
  'scheme authpath'=. ':' split sa
  scheme;authpath;query;fragment
)

queryparts=:3 :0
  (0<#y)#<;._1 '?',y
)

authpathparts=:3 :0
  if. '//' -: 2{.y do.
    split=. <;.1 y
    (}.1{::split);;2}.split
  else.
    '';y
  end.
)

authparts=:3 :0
  if. '@' e. y do.
    'userinfo hostport'=. '@' split y
  else.
    hostport=. y [ userinfo=.''
  end.
  if. '[' = {.hostport do.
    'host_t port_t'=. ']' split hostport
    assert. (0=#port_t)+.':'={.port_t
    (':' split userinfo),(host_t,']');}.port_t
  else.
    (':' split userinfo),':' split hostport
  end.
)

taskparts=:3 :0
  'scheme authpath querystring fragment'=. uriparts y
  'auth path'=. authpathparts authpath
  'user creds host port'=. authparts auth
  query=. queryparts querystring
  export=. ;:'scheme user creds host port path query fragment'
  (#~ 0<#@>@{:"1) (,. do each) export
)</lang>
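For readers who do not read J, the fixed-order splitting that uriparts performs can be sketched in Python. This is an illustrative translation under the assumption that each delimiter is split at its first occurrence only; it is not part of the J solution, and split_once is a hypothetical helper standing in for the split adverb.

```python
def split_once(text, sep):
    """Return (before, after) around the first occurrence of sep.

    When sep is absent, the whole text is 'before' and 'after' is empty.
    """
    head, _, tail = text.partition(sep)
    return head, tail


def uriparts(uri):
    """Peel components off in a fixed order: fragment, query, then scheme."""
    # The fragment goes first because it may itself contain '?' or ':'.
    server, fragment = split_once(uri, "#")
    # The query goes next because it may contain ':'.
    sa, query = split_once(server, "?")
    # What remains is "scheme:authority-and-path".
    scheme, authpath = split_once(sa, ":")
    return scheme, authpath, query, fragment


if __name__ == "__main__":
    print(uriparts("foo://example.com:8042/over/there?name=ferret#nose"))
    # ('foo', '//example.com:8042/over/there', 'name=ferret', 'nose')
```

The order of the three splits is the design point: reversing it would cut a URL such as the jdbc example at the wrong ':'.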
Task examples:
<lang j>   taskparts 'foo://example.com:8042/over/there?name=ferret#nose'
┌────────┬─────────────┐
│scheme  │foo          │
├────────┼─────────────┤
│host    │example.com  │
├────────┼─────────────┤
│port    │8042         │
├────────┼─────────────┤
│path    │/over/there  │
├────────┼─────────────┤
│query   │┌───────────┐│
│        ││name=ferret││
│        │└───────────┘│
├────────┼─────────────┤
│fragment│nose         │
└────────┴─────────────┘

   taskparts 'urn:example:animal:ferret:nose'
┌──────┬──────────────────────────┐
│scheme│urn                       │
├──────┼──────────────────────────┤
│path  │example:animal:ferret:nose│
└──────┴──────────────────────────┘

   taskparts 'jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true'
┌──────┬──────────────────────────────────────────────────┐
│scheme│jdbc                                              │
├──────┼──────────────────────────────────────────────────┤
│path  │mysql://test_user:ouupppssss@localhost:3306/sakila│
├──────┼──────────────────────────────────────────────────┤
│query │┌───────────────┐                                 │
│      ││profileSQL=true│                                 │
│      │└───────────────┘                                 │
└──────┴──────────────────────────────────────────────────┘

   taskparts 'ftp://ftp.is.co.za/rfc/rfc1808.txt'
┌──────┬────────────────┐
│scheme│ftp             │
├──────┼────────────────┤
│host  │ftp.is.co.za    │
├──────┼────────────────┤
│path  │/rfc/rfc1808.txt│
└──────┴────────────────┘

   taskparts 'http://www.ietf.org/rfc/rfc2396.txt#header1'
┌────────┬────────────────┐
│scheme  │http            │
├────────┼────────────────┤
│host    │www.ietf.org    │
├────────┼────────────────┤
│path    │/rfc/rfc2396.txt│
├────────┼────────────────┤
│fragment│header1         │
└────────┴────────────────┘

   taskparts 'ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two'
┌──────┬─────────────────────────────────┐
│scheme│ldap                             │
├──────┼─────────────────────────────────┤
│host  │[2001:db8::7]                    │
├──────┼─────────────────────────────────┤
│path  │/c=GB                            │
├──────┼─────────────────────────────────┤
│query │┌───────────────────────────────┐│
│      ││objectClass=one&objectClass=two││
│      │└───────────────────────────────┘│
└──────┴─────────────────────────────────┘

   taskparts 'mailto:John.Doe@example.com'
┌──────┬────────────────────┐
│scheme│mailto              │
├──────┼────────────────────┤
│path  │John.Doe@example.com│
└──────┴────────────────────┘

   taskparts 'news:comp.infosystems.www.servers.unix'
┌──────┬─────────────────────────────────┐
│scheme│news                             │
├──────┼─────────────────────────────────┤
│path  │comp.infosystems.www.servers.unix│
└──────┴─────────────────────────────────┘

   taskparts 'tel:+1-816-555-1212'
┌──────┬───────────────┐
│scheme│tel            │
├──────┼───────────────┤
│path  │+1-816-555-1212│
└──────┴───────────────┘

   taskparts 'telnet://192.0.2.16:80/'
┌──────┬──────────┐
│scheme│telnet    │
├──────┼──────────┤
│host  │192.0.2.16│
├──────┼──────────┤
│port  │80        │
├──────┼──────────┤
│path  │/         │
└──────┴──────────┘

   taskparts 'urn:oasis:names:specification:docbook:dtd:xml:4.1.2'
┌──────┬───────────────────────────────────────────────┐
│scheme│urn                                            │
├──────┼───────────────────────────────────────────────┤
│path  │oasis:names:specification:docbook:dtd:xml:4.1.2│
└──────┴───────────────────────────────────────────────┘</lang>
Note that the path of the example jdbc uri is itself a uri which may be parsed:
<lang J>   taskparts 'mysql://test_user:ouupppssss@localhost:3306/sakila'
┌──────┬──────────┐
│scheme│mysql     │
├──────┼──────────┤
│user  │test_user │
├──────┼──────────┤
│pass  │ouupppssss│
├──────┼──────────┤
│host  │localhost │
├──────┼──────────┤
│port  │3306      │
├──────┼──────────┤
│path  │/sakila   │
└──────┴──────────┘</lang>
Also, examples borrowed from the Go implementation:
<lang J>   taskparts 'ssh://alice@example.com'
┌──────┬───────────┐
│scheme│ssh        │
├──────┼───────────┤
│user  │alice      │
├──────┼───────────┤
│host  │example.com│
└──────┴───────────┘

   taskparts 'https://bob:pass@example.com/place'
┌──────┬───────────┐
│scheme│https      │
├──────┼───────────┤
│user  │bob        │
├──────┼───────────┤
│creds │pass       │
├──────┼───────────┤
│host  │example.com│
├──────┼───────────┤
│path  │/place     │
└──────┴───────────┘

   taskparts 'http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64'
┌──────┬───────────────────────────────────────────┐
│scheme│http                                       │
├──────┼───────────────────────────────────────────┤
│host  │example.com                                │
├──────┼───────────────────────────────────────────┤
│path  │/                                          │
├──────┼───────────────────────────────────────────┤
│query │┌─────────────────────────────────────────┐│
│      ││a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64││
│      │└─────────────────────────────────────────┘│
└──────┴───────────────────────────────────────────┘</lang>
Note that escape decoding is left to the consumer (as well as decoding things like '+' as a replacement for the space character and determining the absolute significance of relative paths and the details of ip address parsing and so on...). This seems like a good match to the hierarchical nature of uri parsing. See URL decoding for an implementation of escape decoding.
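As a hedged illustration of what such a consumer might do, the following Python sketch decodes a single component after it has been split out of the URL. decode_component is a name made up for this example; unquote and unquote_plus are the standard library functions from urllib.parse. The assumption here is that '+' means a space only where the consumer says so (typically in form-style query values), so that behaviour is opt-in.

```python
from urllib.parse import unquote, unquote_plus


def decode_component(raw, plus_as_space=False):
    """Percent-decode one URI component that has already been split out.

    Decoding must happen after splitting: an encoded '&' or '=' inside a
    value would otherwise be mistaken for a delimiter.
    """
    if plus_as_space:
        return unquote_plus(raw)  # also turns '+' into ' '
    return unquote(raw)           # leaves '+' alone


if __name__ == "__main__":
    print(decode_component("%65%6e%63%6F%64%65%64"))      # encoded
    print(decode_component("2+2", plus_as_space=True))    # 2 2
    print(decode_component("2+2"))                        # 2+2
```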
Note that taskparts was engineered specifically for the requirements of this task -- in idiomatic use you should instead expect to call the relevant ____parts routines directly as illustrated by the first four lines of taskparts.
Note that w3c recommends a handling for query strings which differs from that of RFC-3986. For example, the use of ; as replacement for the & delimiter, or the use of the query element name as the query element value when the = delimiter is omitted from the name/value pair. We do not implement that here, as it's not a part of this task. But that sort of implementation could be achieved by replacing the definition of queryparts. And, of course, other treatments of query strings are also possible, should that become necessary...
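To make the alternative query handling described above concrete, here is a small Python sketch. It implements only the two rules as stated in the preceding paragraph (';' accepted alongside '&', and the name reused as the value when '=' is missing); it is not checked against any particular W3C document, and query_pairs is a hypothetical name. Percent-decoding is intentionally not performed.

```python
import re


def query_pairs(query):
    """Split a raw query string into (name, value) pairs.

    Both '&' and ';' are accepted as pair delimiters. A pair without '='
    uses its name as its value.
    """
    pairs = []
    for item in re.split("[&;]", query):
        if not item:
            continue  # skip empty segments such as the one in "a=1&&b=2"
        name, equals, value = item.partition("=")
        if not equals:
            value = name
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    print(query_pairs("a=1;b=2&flag"))
    # [('a', '1'), ('b', '2'), ('flag', 'flag')]
```

A list of pairs is returned rather than a dictionary so that repeated names, as in objectClass=one&objectClass=two, are preserved in order.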
Julia
This solution uses Julia's URIParser package. The detailview function shows all of the non-empty components of the URI object created by this parser. No attempt is made to further parse more complex components, e.g. query or userinfo. Error detection is limited to indicating whether a string is parsable as a URI and providing a hint as to whether the URI is valid (according to this package's isvalid function).
<lang Julia>
using URIParser

const FIELDS = names(URI)

function detailview(uri::URI, indentlen::Int=4)
    indent = " "^indentlen
    s = String[]
    for f in FIELDS
        d = string(getfield(uri, f))
        !isempty(d) || continue
        f != :port || d != "0" || continue
        push!(s, @sprintf("%s%s: %s", indent, string(f), d))
    end
    join(s, "\n")
end

test = ["foo://example.com:8042/over/there?name=ferret#nose",
        "urn:example:animal:ferret:nose",
        "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true",
        "ftp://ftp.is.co.za/rfc/rfc1808.txt",
        "http://www.ietf.org/rfc/rfc2396.txt#header1",
        "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two",
        "mailto:John.Doe@example.com",
        "news:comp.infosystems.www.servers.unix",
        "tel:+1-816-555-1212",
        "telnet://192.0.2.16:80/",
        "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
        "This is not a URI!",
        "ssh://alice@example.com",
        "https://bob:pass@example.com/place",
        "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64"]

isfirst = true
for st in test
    if isfirst
        isfirst = false
    else
        println()
    end
    println("Attempting to parse\n \"", st, "\" as a URI:")
    uri = try
        URI(st)
    catch
        println("URIParser failed to parse this URI, is it OK?")
        continue
    end
    print("This URI is parsable ")
    if isvalid(uri)
        println("and appears to be valid.")
    else
        println("but may be invalid.")
    end
    println(detailview(uri))
end
</lang>
- Output:
Attempting to parse
 "foo://example.com:8042/over/there?name=ferret#nose" as a URI:
This URI is parsable but may be invalid.
    schema: foo
    host: example.com
    port: 8042
    path: /over/there
    query: name=ferret
    fragment: nose
    specifies_authority: true

Attempting to parse
 "urn:example:animal:ferret:nose" as a URI:
This URI is parsable and appears to be valid.
    schema: urn
    path: example:animal:ferret:nose
    specifies_authority: false

Attempting to parse
 "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true" as a URI:
This URI is parsable but may be invalid.
    schema: jdbc
    path: mysql://test_user:ouupppssss@localhost:3306/sakila
    query: profileSQL=true
    specifies_authority: false

Attempting to parse
 "ftp://ftp.is.co.za/rfc/rfc1808.txt" as a URI:
This URI is parsable and appears to be valid.
    schema: ftp
    host: ftp.is.co.za
    path: /rfc/rfc1808.txt
    specifies_authority: true

Attempting to parse
 "http://www.ietf.org/rfc/rfc2396.txt#header1" as a URI:
This URI is parsable and appears to be valid.
    schema: http
    host: www.ietf.org
    path: /rfc/rfc2396.txt
    fragment: header1
    specifies_authority: true

Attempting to parse
 "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two" as a URI:
This URI is parsable and appears to be valid.
    schema: ldap
    host: 2001:db8::7
    path: /c=GB
    query: objectClass=one&objectClass=two
    specifies_authority: true

Attempting to parse
 "mailto:John.Doe@example.com" as a URI:
This URI is parsable and appears to be valid.
    schema: mailto
    path: John.Doe@example.com
    specifies_authority: false

Attempting to parse
 "news:comp.infosystems.www.servers.unix" as a URI:
This URI is parsable and appears to be valid.
    schema: news
    path: comp.infosystems.www.servers.unix
    specifies_authority: false

Attempting to parse
 "tel:+1-816-555-1212" as a URI:
This URI is parsable and appears to be valid.
    schema: tel
    path: +1-816-555-1212
    specifies_authority: false

Attempting to parse
 "telnet://192.0.2.16:80/" as a URI:
This URI is parsable and appears to be valid.
    schema: telnet
    host: 192.0.2.16
    port: 80
    path: /
    specifies_authority: true

Attempting to parse
 "urn:oasis:names:specification:docbook:dtd:xml:4.1.2" as a URI:
This URI is parsable and appears to be valid.
    schema: urn
    path: oasis:names:specification:docbook:dtd:xml:4.1.2
    specifies_authority: false

Attempting to parse
 "This is not a URI!" as a URI:
URIParser failed to parse this URI, is it OK?

Attempting to parse
 "ssh://alice@example.com" as a URI:
This URI is parsable but may be invalid.
    schema: ssh
    host: example.com
    userinfo: alice
    specifies_authority: true

Attempting to parse
 "https://bob:pass@example.com/place" as a URI:
This URI is parsable and appears to be valid.
    schema: https
    host: example.com
    path: /place
    userinfo: bob:pass
    specifies_authority: true

Attempting to parse
 "http://example.com/?a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64" as a URI:
This URI is parsable and appears to be valid.
    schema: http
    host: example.com
    path: /
    query: a=1&b=2+2&c=3&c=4&d=%65%6e%63%6F%64%65%64
    specifies_authority: true
Racket
Links: url structure in Racket documentation.
<lang racket>#lang racket/base
(require racket/match net/url)
(define (debug-url-string U)
  (match-define (url s u h p pa? (list (path/param pas prms) ...) q f) (string->url U))
  (printf "URL: ~s~%" U)
  (printf "-----~a~%" (make-string (string-length (format "~s" U)) #\-))
  (when #t (printf "scheme: ~s~%" s))
  (when u (printf "user: ~s~%" u))
  (when h (printf "host: ~s~%" h))
  (when p (printf "port: ~s~%" p))
  ;; From documentation link in text:
  ;; > For Unix paths, the root directory is not included in `path';
  ;; > its presence or absence is implicit in the path-absolute? flag.
  (printf "path-absolute?: ~s~%" pa?)
  (printf "path bits: ~s~%" pas)
  ;; prms will often be a list of lists. this will print iff
  ;; one of the inner lists is not null
  (when (memf pair? prms)
    (printf "param bits: ~s [interleaved with path bits]~%" prms))
  (unless (null? q) (printf "query: ~s~%" q))
  (when f (printf "fragment: ~s~%" f))
  (newline))

(for-each
 debug-url-string
 '("foo://example.com:8042/over/there?name=ferret#nose"
   "urn:example:animal:ferret:nose"
   "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true"
   "ftp://ftp.is.co.za/rfc/rfc1808.txt"
   "http://www.ietf.org/rfc/rfc2396.txt#header1"
   "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two"
   "mailto:John.Doe@example.com"
   "news:comp.infosystems.www.servers.unix"
   "tel:+1-816-555-1212"
   "telnet://192.0.2.16:80/"
   "urn:oasis:names:specification:docbook:dtd:xml:4.1.2"))</lang>
- Output:
URL: "foo://example.com:8042/over/there?name=ferret#nose"
---------------------------------------------------------
scheme: "foo"
host: "example.com"
port: 8042
path-absolute?: #t
path bits: ("over" "there")
query: ((name . "ferret"))
fragment: "nose"

URL: "urn:example:animal:ferret:nose"
-------------------------------------
scheme: "urn"
path-absolute?: #f
path bits: ("example:animal:ferret:nose")

URL: "jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true"
------------------------------------------------------------------------------
scheme: "jdbc"
path-absolute?: #f
path bits: ("mysql:" "" "test_user:ouupppssss@localhost:3306" "sakila")
query: ((profileSQL . "true"))

URL: "ftp://ftp.is.co.za/rfc/rfc1808.txt"
-----------------------------------------
scheme: "ftp"
host: "ftp.is.co.za"
path-absolute?: #t
path bits: ("rfc" "rfc1808.txt")

URL: "http://www.ietf.org/rfc/rfc2396.txt#header1"
--------------------------------------------------
scheme: "http"
host: "www.ietf.org"
path-absolute?: #t
path bits: ("rfc" "rfc2396.txt")
fragment: "header1"

URL: "ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two"
----------------------------------------------------------------
scheme: "ldap"
host: "[2001"
path-absolute?: #f
path bits: ("db8::7]" "c=GB")
query: ((objectClass . "one") (objectClass . "two"))
IPv6 URL address parses incorrectly. See issue https://github.com/plt/racket/issues/980
URL: "mailto:John.Doe@example.com"
----------------------------------
scheme: "mailto"
path-absolute?: #f
path bits: ("John.Doe@example.com")

URL: "news:comp.infosystems.www.servers.unix"
---------------------------------------------
scheme: "news"
path-absolute?: #f
path bits: ("comp.infosystems.www.servers.unix")

URL: "tel:+1-816-555-1212"
--------------------------
scheme: "tel"
path-absolute?: #f
path bits: ("+1-816-555-1212")

URL: "telnet://192.0.2.16:80/"
------------------------------
scheme: "telnet"
host: "192.0.2.16"
port: 80
path-absolute?: #t
path bits: ("")

URL: "urn:oasis:names:specification:docbook:dtd:xml:4.1.2"
----------------------------------------------------------
scheme: "urn"
path-absolute?: #f
path bits: ("oasis:names:specification:docbook:dtd:xml:4.1.2")
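The ldap result above, with host "[2001", is the kind of outcome one gets when an authority is cut at its first ':'. As a hedged illustration (not a description of how Racket's net/url works internally), a bracket-aware split can be sketched in Python as follows. split_authority is a hypothetical helper, and it assumes the authority has already been isolated from the rest of the URL.

```python
def split_authority(authority):
    """Split '[userinfo@]host[:port]' into (userinfo, host, port).

    Missing parts are returned as None. A bracketed IP literal such as
    '[2001:db8::7]' is kept whole, brackets included.
    """
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None
    if hostport.startswith("["):
        # IP literal: the host runs up to and including the closing bracket.
        end = hostport.find("]")
        if end == -1:
            raise ValueError("unterminated '[' in authority %r" % authority)
        host, rest = hostport[:end + 1], hostport[end + 1:]
        if rest and not rest.startswith(":"):
            raise ValueError("unexpected text after ']' in %r" % authority)
        port = rest[1:] or None
    else:
        host, _, port = hostport.partition(":")
        port = port or None
    return userinfo, host, port


if __name__ == "__main__":
    print(split_authority("[2001:db8::7]"))
    print(split_authority("test_user:ouupppssss@localhost:3306"))
```

The port is returned as a string so that the caller decides how to report a non-numeric value.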
Tcl
Tcllib's uri package already knows how to decompose many kinds of URIs. The implementation is a quite readable example of this kind of parsing. For this task, we'll use it directly.
Schemes can be added with uri::register, but the rules for this task assume HTTP-style decomposition for unknown schemes, which is done below by reaching into the documented interfaces $::uri::schemes and uri::SplitHttp.
For some URI types (such as urn, news, mailto), this provides more information than the task description demands, which is simply to parse them all as HTTP URIs.
The uri package doesn't presently handle IPv6 syntax as used in the example: a bug report and patch will be submitted presently.
<lang Tcl>package require uri
package require uri::urn

# a little bit of trickery to format results:
proc pdict {d} {
    array set \t $d
    parray \t
}

proc parse_uri {uri} {
    regexp {^(.*?):(.*)$} $uri -> scheme rest
    if {$scheme in $::uri::schemes} {
        # uri already knows how to split it:
        set parts [uri::split $uri]
    } else {
        # parse as though it's http:
        set parts [uri::SplitHttp $rest]
        dict set parts scheme $scheme
    }
    dict filter $parts value ?* ;# omit empty sections
}

set tests {
    foo://example.com:8042/over/there?name=ferret#nose
    urn:example:animal:ferret:nose
    jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
    ftp://ftp.is.co.za/rfc/rfc1808.txt
    http://www.ietf.org/rfc/rfc2396.txt#header1
    ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
    mailto:John.Doe@example.com
    news:comp.infosystems.www.servers.unix
    tel:+1-816-555-1212
    telnet://192.0.2.16:80/
    urn:oasis:names:specification:docbook:dtd:xml:4.1.2
}

foreach uri $tests {
    puts \n$uri
    pdict [parse_uri $uri]
}</lang>
- Output:
foo://example.com:8042/over/there?name=ferret#nose
(fragment) = nose
(host) = example.com
(path) = over/there
(port) = 8042
(query) = name=ferret
(scheme) = foo

urn:example:animal:ferret:nose
(nid) = example
(nss) = animal:ferret:nose
(scheme) = urn

jdbc:mysql://test_user:ouupppssss@localhost:3306/sakila?profileSQL=true
(path) = mysql://test_user:ouupppssss@localhost:3306/sakila
(query) = profileSQL=true
(scheme) = jdbc

ftp://ftp.is.co.za/rfc/rfc1808.txt
(host) = ftp.is.co.za
(path) = rfc/rfc1808.txt
(scheme) = ftp

http://www.ietf.org/rfc/rfc2396.txt#header1
(fragment) = header1
(host) = www.ietf.org
(path) = rfc/rfc2396.txt
(scheme) = http

ldap://[2001:db8::7]/c=GB?objectClass=one&objectClass=two
(host) = [2001
(scheme) = ldap

mailto:John.Doe@example.com
(host) = example.com
(scheme) = mailto
(user) = John.Doe

news:comp.infosystems.www.servers.unix
(newsgroup-name) = comp.infosystems.www.servers.unix
(scheme) = news

tel:+1-816-555-1212
(path) = +1-816-555-1212
(scheme) = tel

telnet://192.0.2.16:80/
(host) = 192.0.2.16
(port) = 80
(scheme) = telnet

urn:oasis:names:specification:docbook:dtd:xml:4.1.2
(nid) = oasis
(nss) = names:specification:docbook:dtd:xml:4.1.2
(scheme) = urn
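The dispatch used by parse_uri (a scheme-specific splitter when one is registered, an HTTP-style split otherwise, then empty parts dropped) can be sketched in Python. All names here (SPLITTERS, split_urn, split_generic, parse_uri) are invented for the illustration and do not correspond to Tcllib's API; the generic splitter is a simplified stand-in that does not separate user, host and port.

```python
def split_urn(rest):
    """Scheme-specific splitter for 'urn:<nid>:<nss>'."""
    nid, _, nss = rest.partition(":")
    return {"nid": nid, "nss": nss}


def split_generic(rest):
    """HTTP-style split of everything after '<scheme>:'."""
    rest, _, fragment = rest.partition("#")
    rest, _, query = rest.partition("?")
    host, path = "", rest
    if rest.startswith("//"):
        host, slash, tail = rest[2:].partition("/")
        path = slash + tail
    return {"host": host, "path": path, "query": query, "fragment": fragment}


# Registry of scheme-specific splitters (hypothetical); any scheme that is
# not listed falls back to split_generic.
SPLITTERS = {"urn": split_urn}


def parse_uri(uri):
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError("no scheme found in %r" % uri)
    splitter = SPLITTERS.get(scheme.lower(), split_generic)
    parts = dict(splitter(rest), scheme=scheme)
    # Omit empty sections, as the Tcl version does with 'dict filter'.
    return {key: value for key, value in parts.items() if value}


if __name__ == "__main__":
    for uri in ("urn:example:animal:ferret:nose", "tel:+1-816-555-1212"):
        print(uri, parse_uri(uri))
```

Keeping the registry as plain data means a new scheme can be supported by adding one entry, which mirrors the role the text gives to uri::register.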