URL decoding
You are encouraged to solve this task according to the task description, using any language you may know.
This task (the reverse of URL encoding) is to provide a function or mechanism to convert a url-encoded string into its original unencoded form.
Example
The encoded string "http%3A%2F%2Ffoo%20bar%2F" should be reverted to the unencoded form "http://foo bar/".
AutoHotkey
<lang AutoHotkey>encURL := "http%3A%2F%2Ffoo%20bar%2F"
SetFormat, Integer, hex
Loop Parse, encURL
   If A_LoopField = `%
      reading := 2, read := ""
   else if reading
   {
      read .= A_LoopField, --reading
      if not reading
         out .= Chr("0x" . read)
   }
   else out .= A_LoopField
MsgBox % out ; http://foo bar/
</lang>
AWK
<lang AWK># syntax: GAWK -f URL_DECODING.AWK
BEGIN {
    str = "http%3A%2F%2Ffoo%20bar%2F" # "http://foo bar/"
    printf("%s\n",str)
    while (match(str,/%/)) {
      L = substr(str,1,RSTART-1) # chars to left of "%"
      M = substr(str,RSTART+1,2) # 2 chars to right of "%"
      R = substr(str,RSTART+3)   # chars to right of "%xx"
      str = sprintf("%s%c%s",L,hex2dec(M),R)
    }
    printf("%s\n",str)
    exit(0)
}
function hex2dec(s,  num) {
    num = index("0123456789ABCDEF",toupper(substr(s,length(s)))) - 1
    sub(/.$/,"",s)
    return num + (length(s) ? 16*hex2dec(s) : 0)
}
</lang>
output:
http%3A%2F%2Ffoo%20bar%2F
http://foo bar/
C
<lang c>#include <stdio.h>
#include <string.h>

/* static so the inline definition also links under C99 rules */
static inline int ishex(int x)
{
	return (x >= '0' && x <= '9') ||
	       (x >= 'a' && x <= 'f') ||
	       (x >= 'A' && x <= 'F');
}

/* Returns the decoded length (counting the terminating NUL),
   or -1 on a malformed escape. Pass dec = 0 to only measure. */
int decode(char *s, char *dec)
{
	char *o, *end = s + strlen(s);
	int c;

	for (o = dec; s <= end; o++) {
		c = *s++;
		if (c == '+') c = ' ';
		else if (c == '%' && (!ishex(*s++) ||
		                      !ishex(*s++) ||
		                      !sscanf(s - 2, "%2x", &c)))
			return -1;

		if (dec) *o = c;
	}

	return o - dec;
}

int main()
{
	char url[] = "http%3A%2F%2ffoo+bar%2fabcd";
	char out[sizeof(url)];

	printf("length: %d\n", decode(url, 0));
	puts(decode(url, out) < 0 ? "bad string" : out);

	return 0;
}</lang>
Delphi
<lang Delphi>program URLEncoding;
{$APPTYPE CONSOLE}
uses IdURI;
begin
Writeln(TIdURI.URLDecode('http%3A%2F%2Ffoo%20bar%2F'));
end.</lang>
Go
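<lang go>package main

import (
	"fmt"
	"net/url"
	"os"
)

func main() {
	// url.QueryUnescape replaces the pre-Go 1 http.URLUnescape;
	// it also converts "+" to a space.
	decoded, err := url.QueryUnescape("http%3A%2F%2Ffoo%20bar%2F")
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	fmt.Println(decoded)
}</lang>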
Icon and Unicon
<lang Icon>link hexcvt

procedure main()
ue := "http%3A%2F%2Ffoo%20bar%2F"
ud := decodeURL(ue) | stop("Improperly encoded string ",image(ue))
write("encoded = ",image(ue))
write("decoded = ",image(ud))
end

procedure decodeURL(s)              #: decode URL/URI encoded data
static de
initial {                           # build lookup table for everything
   de := table()
   every de[hexstring(ord(c := !string(&ascii)),2)] := c
   }

c := ""
s ? until pos(0) do                 # decode every %xx or fail
   c ||:= if ="%" then \de[move(2)] | fail
          else move(1)
return c
end</lang>
Output:
encoded = "http%3A%2F%2Ffoo%20bar%2F"
decoded = "http://foo bar/"
J
J does not have a native urldecode (until version 7 when jhs includes a jurldecode).
Here is an implementation:
<lang j>require'strings convert'
urldecode=: rplc&(;"_1&a."2(,:tolower)'%',.hfd i.#a.)</lang>
Example use:
<lang j>   urldecode 'http%3A%2F%2Ffoo%20bar%2F'
http://foo bar/</lang>
Note that a minor efficiency improvement is possible by eliminating duplicated escape codes:
<lang j>urldecode=: rplc&(~.,/;"_1&a."2(,:tolower)'%',.hfd i.#a.)</lang>
Java
<lang java>import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class Main {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String encoded = "http%3A%2F%2Ffoo%20bar%2F";
        String normal = URLDecoder.decode(encoded, "utf-8");
        System.out.println(normal);
    }
}</lang>
Output:
http://foo bar/
JavaScript
<lang javascript>decodeURIComponent("http%3A%2F%2Ffoo%20bar%2F")</lang>
NetRexx
<lang NetRexx>/* NetRexx */
options replace format comments java crossref savelog symbols nobinary

url = [ -
  'http%3A%2F%2Ffoo%20bar%2F', -
  'mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E', -
  '%6D%61%69%6C%74%6F%3A%22%49%72%6D%61%20%55%73%65%72%22%20%3C%69%72%6D%61%2E%75%73%65%72%40%6D%61%69%6C%2E%63%6F%6D%3E' -
  ]

loop u_ = 0 to url.length - 1
  say url[u_]
  say DecodeURL(url[u_])
  say
  end u_

return

method DecodeURL(arg) public static

  Parse arg encoded
  decoded = ''
  PCT = '%'

  loop label e_ while encoded.length() > 0
    parse encoded head (PCT) +1 code +2 tail
    decoded = decoded || head
    select
      when code.strip('T').length() = 2 & code.datatype('X') then do
        code = code.x2c()
        decoded = decoded || code
        end
      when code.strip('T').length() \= 0 then do
        decoded = decoded || PCT
        tail = code || tail
        end
      otherwise do
        nop
        end
      end
    encoded = tail
    end e_

  return decoded
</lang>
Output:
http%3A%2F%2Ffoo%20bar%2F
http://foo bar/

mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E
mailto:"Ivan Aim" <ivan.aim@email.com>

%6D%61%69%6C%74%6F%3A%22%49%72%6D%61%20%55%73%65%72%22%20%3C%69%72%6D%61%2E%75%73%65%72%40%6D%61%69%6C%2E%63%6F%6D%3E
mailto:"Irma User" <irma.user@mail.com>
Objective-C
<lang objc>NSString *encoded = @"http%3A%2F%2Ffoo%20bar%2F";
NSString *normal = [encoded stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@", normal);</lang>
Perl
<lang Perl>#!/usr/bin/perl -w
use strict ;
use URI::Escape ;

my $encoded = "http%3A%2F%2Ffoo%20bar%2F" ;
my $unencoded = uri_unescape( $encoded ) ;
print "The unencoded string is $unencoded !\n" ;</lang>
Perl 6
<lang Perl 6>use v6;
my $url = "http%3A%2F%2Ffoo%20bar%2F";
my regex url {
[ <text=&text> [\% <hex=&hex>]+ ]+ <text2=&text>?
}
my regex hex {
\w\w
}
my regex text {
\w+
}
$url ~~ /<url=&url>/;
my $dec_url; for $<url>.caps {
if .key eq "hex" {
$dec_url ~= :10("0x" ~ .value).chr;
} else {
$dec_url ~= .value;
}
}
say $dec_url;</lang>
PHP
<lang php><?php
$encoded = "http%3A%2F%2Ffoo%20bar%2F";
$unencoded = rawurldecode($encoded);
echo "The unencoded string is $unencoded !\n";
?></lang>
PicoLisp
: (ht:Pack (chop "http%3A%2F%2Ffoo%20bar%2F"))
-> "http://foo bar/"
PureBasic
<lang PureBasic>URL$ = URLDecoder("http%3A%2F%2Ffoo%20bar%2F")
Debug URL$ ; http://foo bar/</lang>
Python
<lang Python># Python 3; in Python 2 the same function was urllib.unquote
from urllib.parse import unquote

print(unquote("http%3A%2F%2Ffoo%20bar%2F"))</lang>
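For illustration, the mechanism behind the library call can be sketched by hand. This is a minimal sketch, assuming Python 3 and UTF-8 encoded input; the function name `url_decode` and its `plus_as_space` parameter are hypothetical and not part of `urllib`. Malformed escapes are copied through unchanged, which is one possible policy rather than a required one.

```python
def url_decode(text, plus_as_space=False, encoding="utf-8"):
    """Decode %XX escapes by hand; malformed escapes are left untouched.

    Hypothetical helper for illustration only, not a urllib function.
    """
    hex_digits = "0123456789abcdefABCDEF"
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        pair = text[i + 1:i + 3]
        if ch == "%" and len(pair) == 2 and all(c in hex_digits for c in pair):
            out.append(int(pair, 16))        # two hex digits become one byte
            i += 3
        elif ch == "+" and plus_as_space:
            out.append(0x20)                 # form-style encoding: '+' means space
            i += 1
        else:
            out.extend(ch.encode(encoding))  # ordinary character, copied as-is
            i += 1
    # Decode the collected bytes at the end so multi-byte sequences stay intact.
    return out.decode(encoding, errors="replace")


print(url_decode("http%3A%2F%2Ffoo%20bar%2F"))  # http://foo bar/
```

Collecting bytes first and decoding once at the end is a design choice that lets escapes such as `%C3%A9` combine into a single character instead of being decoded byte by byte.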
Retro
This is provided by the casket library (used for web app development).
<lang Retro>create buffer 32000 allot
{{
  create bit 5 allot
  : extract  ( $c-$a ) drop @+ bit ! @+ bit 1+ ! bit ;
  : render   ( $c-$n )
    dup '+ = [ drop 32 ] ifTrue
    dup 13 = [ drop 32 ] ifTrue
    dup 10 = [ drop 32 ] ifTrue
    dup '% = [ extract hex toNumber decimal ] ifTrue ;
  : <decode> (  $-$  ) repeat @+ 0; render ^buffer'add again ;
---reveal---
: decode ( $- ) buffer ^buffer'set <decode> drop ;
}}
"http%3A%2F%2Ffoo%20bar%2F" decode buffer puts</lang>
REXX
<lang REXX>/* Rexx */

Do
  X = 0
  url. = ''
  X = X + 1; url.0 = X; url.X = 'http%3A%2F%2Ffoo%20bar%2F'
  X = X + 1; url.0 = X; url.X = 'mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E'
  X = X + 1; url.0 = X; url.X = '%6D%61%69%6C%74%6F%3A%22%49%72%6D%61%20%55%73%65%72%22%20%3C%69%72%6D%61%2E%75%73%65%72%40%6D%61%69%6C%2E%63%6F%6D%3E'

  Do u_ = 1 to url.0
    Say url.u_
    Say DecodeURL(url.u_)
    Say
    End u_

  Return
End
Exit

DecodeURL:
Procedure
Do
  Parse Arg encoded
  decoded = ''
  PCT = '%'

  Do label e_ while encoded~length() > 0
    Parse Var encoded head (PCT) +1 code +2 tail
    decoded = decoded || head
    Select
      when code~strip('T')~length() = 2 & code~datatype('X') then Do
        code = code~x2c()
        decoded = decoded || code
        End
      when code~strip('T')~length() \= 0 then Do
        decoded = decoded || PCT
        tail = code || tail
        End
      otherwise Do
        Nop
        End
      End
    encoded = tail
    End e_

  Return decoded
End
Exit
</lang>
Output:
http%3A%2F%2Ffoo%20bar%2F
http://foo bar/

mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E
mailto:"Ivan Aim" <ivan.aim@email.com>

%6D%61%69%6C%74%6F%3A%22%49%72%6D%61%20%55%73%65%72%22%20%3C%69%72%6D%61%2E%75%73%65%72%40%6D%61%69%6C%2E%63%6F%6D%3E
mailto:"Irma User" <irma.user@mail.com>
Ruby
Use either CGI.unescape or URI.decode_www_form_component. Both methods also convert "+" to " ".
<lang ruby>require 'cgi'
puts CGI.unescape("http%3A%2F%2Ffoo%20bar%2F")
# => "http://foo bar/"</lang>
<lang ruby>require 'uri'
puts URI.decode_www_form_component("http%3A%2F%2Ffoo%20bar%2F")
# => "http://foo bar/"</lang>
URI.unescape (alias URI.unencode) still works on older Ruby versions, but it has been obsolete since Ruby 1.9.2 because of problems with its sibling URI.escape, and both methods were removed in Ruby 3.0.
Scala
<lang scala>import java.net._

val encoded = "http%3A%2F%2Ffoo%20bar%2F"
val decoded = URLDecoder.decode(encoded, "UTF-8")
println(decoded) // -> http://foo bar/</lang>
Tcl
This code takes care that any metacharacters in the input string cannot cause problems.
<lang tcl>proc urlDecode {str} {
    set specialMap {"[" "%5B" "]" "%5D"}
    set seqRE {%([0-9a-fA-F]{2})}
    set replacement {[format "%c" [scan "\1" "%2x"]]}
    set modStr [regsub -all $seqRE [string map $specialMap $str] $replacement]
    return [encoding convertfrom utf-8 [subst -nobackslash -novariable $modStr]]
}</lang>
Demonstrating:
<lang tcl>puts [urlDecode "http%3A%2F%2Ffoo%20bar%2F"]</lang>
Output:
http://foo bar/
TUSCRIPT
<lang tuscript>
$$ MODE TUSCRIPT
url_encoded="http%3A%2F%2Ffoo%20bar%2F"
BUILD S_TABLE hex=":%><:><2<>2<%:"
hex=STRINGS (url_encoded,hex), hex=SPLIT(hex)
hex=DECODE (hex,hex)
url_decoded=SUBSTITUTE(url_encoded,":%><2<>2<%:",0,0,hex)
PRINT "encoded: ", url_encoded
PRINT "decoded: ", url_decoded
</lang>
Output:
encoded: http%3A%2F%2Ffoo%20bar%2F
decoded: http://foo bar/