URL encoding
You are encouraged to solve this task according to the task description, using any language you may know.
- Task
Provide a function or mechanism to convert a provided string into URL encoding representation.
In URL encoding, special characters, control characters and extended characters are converted into a percent symbol followed by a two digit hexadecimal code, So a space character encodes into %20 within the string.
For the purposes of this task, every character except 0-9, A-Z and a-z requires conversion, so the following characters all require conversion by default:
- ASCII control codes (Character ranges 00-1F hex (0-31 decimal) and 7F (127 decimal).
- ASCII symbols (Character ranges 32-47 decimal (20-2F hex))
- ASCII symbols (Character ranges 58-64 decimal (3A-40 hex))
- ASCII symbols (Character ranges 91-96 decimal (5B-60 hex))
- ASCII symbols (Character ranges 123-126 decimal (7B-7E hex))
- Extended characters with character codes of 128 decimal (80 hex) and above.
- Example
The string "http://foo bar/
" would be encoded as "http%3A%2F%2Ffoo%20bar%2F
".
- Variations
- Lowercase escapes are legal, as in "
http%3a%2f%2ffoo%20bar%2f
". - Some standards give different rules: RFC 3986, Uniform Resource Identifier (URI): Generic Syntax, section 2.3, says that "-._~" should not be encoded. HTML 5, section 4.10.22.5 URL-encoded form data, says to preserve "-._*", and to encode space " " to "+". The options below provide for utilization of an exception string, enabling preservation (non encoding) of particular characters to meet specific standards.
- Options
It is permissible to use an exception string (containing a set of symbols that do not need to be converted). However, this is an optional feature and is not a requirement of this task.
- Related tasks
Ada
<lang Ada>with AWS.URL; with Ada.Text_IO; use Ada.Text_IO; procedure Encode is
Normal : constant String := "http://foo bar/";
begin
Put_Line (AWS.URL.Encode (Normal));
end Encode;</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
AutoHotkey
<lang AutoHotkey>MsgBox, % UriEncode("http://foo bar/")
- Modified from http://goo.gl/0a0iJq
UriEncode(Uri) { VarSetCapacity(Var, StrPut(Uri, "UTF-8"), 0) StrPut(Uri, &Var, "UTF-8") f := A_FormatInteger SetFormat, IntegerFast, H While Code := NumGet(Var, A_Index - 1, "UChar") If (Code >= 0x30 && Code <= 0x39 ; 0-9 || Code >= 0x41 && Code <= 0x5A ; A-Z || Code >= 0x61 && Code <= 0x7A) ; a-z Res .= Chr(Code) Else Res .= "%" . SubStr(Code + 0x100, -1) SetFormat, IntegerFast, %f% Return, Res }</lang>
Apex
<lang apex>EncodingUtil.urlEncode('http://foo bar/', 'UTF-8')</lang>
http%3A%2F%2Ffoo+bar%2F
AppleScript
<lang AppleScript>AST URL encode "http://foo bar/"</lang>
- Output:
"http%3A%2F%2Ffoo%20bar%2F"
Arturo
<lang arturo>url: "http://foo bar/"
print [encodeUrl url]</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
AWK
<lang awk>BEGIN { for (i = 0; i <= 255; i++) ord[sprintf("%c", i)] = i }
- Encode string with application/x-www-form-urlencoded escapes.
function escape(str, c, len, res) { len = length(str) res = "" for (i = 1; i <= len; i++) { c = substr(str, i, 1); if (c ~ /[0-9A-Za-z]/) #if (c ~ /[-._*0-9A-Za-z]/) res = res c #else if (c == " ") # res = res "+" else res = res "%" sprintf("%02X", ord[c]) } return res }
- Escape every line of input.
{ print escape($0) }</lang>
The array ord[]
uses idea from Character codes#AWK.
To follow the rules for HTML 5, uncomment the two lines that convert " " to "+", and use the regular expression that preserves "-._*".
BBC BASIC
<lang bbcbasic> PRINT FNurlencode("http://foo bar/")
END DEF FNurlencode(url$) LOCAL c%, i% WHILE i% < LEN(url$) i% += 1 c% = ASCMID$(url$, i%) IF c%<&30 OR c%>&7A OR c%>&39 AND c%<&41 OR c%>&5A AND c%<&61 THEN url$ = LEFT$(url$,i%-1) + "%" + RIGHT$("0"+STR$~c%,2) + MID$(url$,i%+1) ENDIF ENDWHILE = url$</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Bracmat
<lang bracmat>( ( encode
= encoded exceptions octet string . !arg:(?exceptions.?string) & :?encoded & @( !string : ? ( %@?octet ? & !encoded ( !octet : ( ~<0:~>9 | ~<A:~>Z | ~<a:~>z ) | @(!exceptions:? !octet ?) & !octet | "%" d2x$(asc$!octet) ) : ?encoded & ~ ) ) | str$!encoded ) & out$"without exceptions:
"
& out$(encode$(."http://foo bar/")) & out$(encode$(."mailto:Ivan")) & out$(encode$(."Aim <ivan.aim@email.com>")) & out$(encode$(."mailto:Irma")) & out$(encode$(."User <irma.user@mail.com>")) & out$(encode$(."http://foo.bar.com/~user-name/_subdir/*~.html")) & out$"
with RFC 3986 rules: "
& out$(encode$("-._~"."http://foo bar/")) & out$(encode$("-._~"."mailto:Ivan")) & out$(encode$("-._~"."Aim <ivan.aim@email.com>")) & out$(encode$("-._~"."mailto:Irma")) & out$(encode$("-._~"."User <irma.user@mail.com>")) & out$(encode$("-._~"."http://foo.bar.com/~user-name/_subdir/*~.html"))
); </lang>
- Output:
without exceptions: http%3A%2F%2Ffoo%20bar%2F mailto%3AIvan Aim%20%3Civan%2Eaim%40email%2Ecom%3E mailto%3AIrma User%20%3Cirma%2Euser%40mail%2Ecom%3E http%3A%2F%2Ffoo%2Ebar%2Ecom%2F%7Euser%2Dname%2F%5Fsubdir%2F%2A%7E%2Ehtml with RFC 3986 rules: http%3A%2F%2Ffoo%20bar%2F mailto%3AIvan Aim%20%3Civan.aim%40email.com%3E mailto%3AIrma User%20%3Cirma.user%40mail.com%3E http%3A%2F%2Ffoo.bar.com%2F~user-name%2F_subdir%2F%2A~.html
C
<lang c>#include <stdio.h>
- include <ctype.h>
char rfc3986[256] = {0}; char html5[256] = {0};
/* caller responsible for memory */ void encode(const char *s, char *enc, char *tb) { for (; *s; s++) { if (tb[*s]) sprintf(enc, "%c", tb[*s]); else sprintf(enc, "%%%02X", *s); while (*++enc); } }
int main() { const char url[] = "http://foo bar/"; char enc[(strlen(url) * 3) + 1];
int i; for (i = 0; i < 256; i++) { rfc3986[i] = isalnum(i)||i == '~'||i == '-'||i == '.'||i == '_' ? i : 0; html5[i] = isalnum(i)||i == '*'||i == '-'||i == '.'||i == '_' ? i : (i == ' ') ? '+' : 0; }
encode(url, enc, rfc3986); puts(enc);
return 0; }</lang>
C++
using Qt 4.6 as a library <lang cpp>#include <QByteArray>
- include <iostream>
int main( ) {
QByteArray text ( "http://foo bar/" ) ; QByteArray encoded( text.toPercentEncoding( ) ) ; std::cout << encoded.data( ) << '\n' ; return 0 ;
}</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
C#
<lang c sharp>using System;
namespace URLEncode {
internal class Program { private static void Main(string[] args) { Console.WriteLine(Encode("http://foo bar/")); }
private static string Encode(string uri) { return Uri.EscapeDataString(uri); } }
}</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Clojure
Using Java's URLEncoder: <lang clojure>(import 'java.net.URLEncoder) (URLEncoder/encode "http://foo bar/" "UTF-8")</lang>
- Output:
"http%3A%2F%2Ffoo+bar%2F"
ColdFusion
Common Lisp
<lang lisp>(defun needs-encoding-p (char)
(not (digit-char-p char 36)))
(defun encode-char (char)
(format nil "%~2,'0X" (char-code char)))
(defun url-encode (url)
(apply #'concatenate 'string (map 'list (lambda (char) (if (needs-encoding-p char) (encode-char char) (string char))) url)))
(url-encode "http://foo bar/")</lang>
- Output:
"http%3A%2F%2Ffoo%20bar%2F"
D
<lang d>import std.stdio, std.uri;
void main() {
writeln(encodeComponent("http://foo bar/"));
}</lang>
http%3A%2F%2Ffoo%20bar%2F
Elixir
<lang elixir>iex(1)> URI.encode("http://foo bar/", &URI.char_unreserved?/1) "http%3A%2F%2Ffoo%20bar%2F"</lang>
Erlang
Built in, http_uri:encode/1
accepts lists and binary:
1> http_uri:encode("http://foo bar/"). "http%3A%2F%2Ffoo%20bar%2F"
Unicode
If you are URL encoding unicode data though http_uri:encode/1
will produce incorrect results:
1> http_uri:encode("étanchéité d une terrasse"). "étanchéité%20d%20une%20terrasse"
You should use the built-in (non-documented) edoc_lib:escape_uri/1
instead:
1> edoc_lib:escape_uri("étanchéité d une terrasse"). "%c3%a9tanch%c3%a9it%c3%a9%20d%20une%20terrasse"
And for binary you will need to take care and use unicode:characters_to_{list,binary}/1
:
1> unicode:characters_to_binary(edoc_lib:escape_uri(unicode:characters_to_list(<<"étanchéité d une terrasse"/utf8>>))). <<"%c3%a9tanch%c3%a9it%c3%a9%20d%20une%20terrasse">>
F#
<lang fsharp>open System
[<EntryPoint>] let main args =
printfn "%s" (Uri.EscapeDataString(args.[0])) 0</lang>
- Output:
>URLencoding.exe "http://foo bar/" http%3A%2F%2Ffoo%20bar%2F
Factor
Factor's built-in URL encoder doesn't encode :
or /
. However, we can write our own predicate quotation that tells (url-encode)
what characters to exclude.
<lang factor>USING: combinators.short-circuit unicode urls.encoding.private ;
- my-url-encode ( str -- encoded )
[ { [ alpha? ] [ "-._~" member? ] } 1|| ] (url-encode) ;
"http://foo bar/" my-url-encode print</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Free Pascal
<lang pascal>function urlEncode(data: string): AnsiString; var
ch: AnsiChar;
begin
Result := ; for ch in data do begin if ((Ord(ch) < 65) or (Ord(ch) > 90)) and ((Ord(ch) < 97) or (Ord(ch) > 122)) then begin Result := Result + '%' + IntToHex(Ord(ch), 2); end else Result := Result + ch; end;
end;</lang>
Go
<lang go>package main
import (
"fmt" "net/url"
)
func main() {
fmt.Println(url.QueryEscape("http://foo bar/"))
}</lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
Groovy
<lang groovy> def normal = "http://foo bar/" def encoded = URLEncoder.encode(normal, "utf-8") println encoded </lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
Haskell
<lang Haskell>import qualified Data.Char as Char import Text.Printf
encode :: Char -> String encode c
| c == ' ' = "+" | Char.isAlphaNum c || c `elem` "-._~" = [c] | otherwise = printf "%%%02X" c
urlEncode :: String -> String urlEncode = concatMap encode
main :: IO () main = putStrLn $ urlEncode "http://foo bar/"</lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
Icon and Unicon
<lang Icon>link hexcvt
procedure main() write("text = ",image(u := "http://foo bar/")) write("encoded = ",image(ue := encodeURL(u))) end
procedure encodeURL(s) #: encode data for inclusion in a URL/URI static en initial { # build lookup table for everything
en := table() every en[c := !string(~(&digits++&letters))] := "%"||hexstring(ord(c),2) every /en[c := !string(&cset)] := c }
every (c := "") ||:= en[!s] # re-encode everything return c end </lang>
- Output:
text = "http://foo bar/" encoded = "http%3A%2F%2Ffoo%20bar%2F"
J
J has a urlencode in the gethttp package, but this task requires that all non-alphanumeric characters be encoded.
Here's an implementation that does that:
<lang j>require'strings convert' urlencode=: rplc&((#~2|_1 47 57 64 90 96 122 I.i.@#)a.;"_1'%',.hfd i.#a.)</lang>
Example use:
<lang j> urlencode 'http://foo bar/' http%3A%2F%2Ffoo%20bar%2F</lang>
Java
The built-in URLEncoder in Java converts the space " " into a plus-sign "+" instead of "%20": <lang java>import java.io.UnsupportedEncodingException; import java.net.URLEncoder;
public class Main {
public static void main(String[] args) throws UnsupportedEncodingException { String normal = "http://foo bar/"; String encoded = URLEncoder.encode(normal, "utf-8"); System.out.println(encoded); }
}</lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
JavaScript
Confusingly, there are 3 different URI encoding functions in JavaScript: escape()
, encodeURI()
, and encodeURIComponent()
. Each of them encodes a different set of characters. See this article and this article for more information and comparisons.
<lang javascript>var normal = 'http://foo/bar/';
var encoded = encodeURIComponent(normal);</lang>
jq
jq has a built-in function, @uri, for "percent-encoding". It preserves the characters that RFC 3968 mandates be preserved, but also preserves the following five characters: !'()*
To address the task requirement, therefore, we can first use @uri and then convert the exceptional characters. (For versions of jq with regex support, this could be done using gsub, but here we perform the translation directly.)
Note that @uri also converts multibyte characters to multi-triplets, e.g. <lang jq>"á" | @uri</lang> produces: "%C3%A1" <lang jq>def url_encode:
# The helper function checks whether the input corresponds to one of the characters: !'()* def recode: . as $c | [33,39,40,41,42] | index($c); def hex: if . < 10 then 48 + . else 55 + . end; @uri | explode # 37 ==> "%", 50 ==> "2" | map( if recode then (37, 50, ((. - 32) | hex)) else . end ) | implode;</lang>
Examples:
<lang jq>"http://foo bar/" | @uri</lang> produces: "http%3A%2F%2Ffoo%20bar%2F"
<lang jq>"http://foo bar/" | @uri == url_encode</lang> produces: true
To contrast the difference between "@uri" and "url_encode", we compare the characters that are unaltered:
<lang jq>[range(0;1024) | [.] | implode | if @uri == . then . else empty end] | join(null)</lang> produces: "!'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
<lang jq>[range(0;1024) | [.] | implode | if url_encode == . then . else empty end] | join(null)</lang> produces: "-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
Julia
<lang Julia> //version 1.0.1 import HTTP.URIs: escapeuri
dcd = "http://foo bar/" enc = escapeuri(dcd)
println(dcd, " => ", enc) </lang>
- Output:
http://foo bar/ => http%3A%2F%2Ffoo%20bar%2F
Kotlin
<lang scala>// version 1.1.2
import java.net.URLEncoder
fun main(args: Array<String>) {
val url = "http://foo bar/" println(URLEncoder.encode(url, "utf-8")) // note: encodes space to + not %20
}</lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
langur
The following should work with non-ASCII characters as well (assuming that's valid), with the s2b() function returning UTF-8 bytes from a string, and foldfrom() adding them all. The :X02 is an interpolation modifier, generating a hexadecimal code of at least 2 characters, padded with zeroes.
<lang langur>val .urlEncode = f(.s) replace(
.s, re/[^A-Za-z0-9]/, f(.s2) foldfrom(f(.s3, .b) $"\.s3;%\.b:X02;", ZLS, s2b(.s2)),
)
val .original = "https://some website.com/"
writeln .original writeln .urlEncode(.original)</lang>
From 0.6.8 to 0.8.3, you would use :X2 to pad with zeroes. With 0.8.4+, not starting the number with a 0 would cause it to pad with spaces.
- Output:
https://some website.com/ https%3A%2F%2Fsome%20website%2Ecom%2F
Lasso
<lang Lasso>bytes('http://foo bar/') -> encodeurl</lang> -> http%3A%2F%2Ffoo%20bar%2F
Liberty BASIC
<lang lb>
dim lookUp$( 256)
for i =0 to 256 lookUp$( i) ="%" +dechex$( i) next i
string$ ="http://foo bar/"
print "Supplied string '"; string$; "'" print "As URL '"; url$( string$); "'"
end
function url$( i$)
for j =1 to len( i$) c$ =mid$( i$, j, 1) if instr( "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", c$) then url$ =url$ +c$ else url$ =url$ +lookUp$( asc( c$)) end if next j
end function </lang>
Supplied string 'http://foo bar/' As URL 'http%3A%2F%2Ffoo%20bar%2F'
Lingo
Lingo implements old-school URL encoding (with spaces encoded as "+") out of the box:
<lang lingo>put urlencode("http://foo bar/") -- "http%3a%2f%2ffoo+bar%2f"</lang>
For RFC 3986 URL encoding (i.e. without the "+" fuss) a custom function is needed - which might call the above function and then just replace all "+" with "%20".
LiveCode
<lang LiveCode>urlEncode("http://foo bar/") -- http%3A%2F%2Ffoo+bar%2F</lang>
Lua
<lang lua>function encodeChar(chr) return string.format("%%%X",string.byte(chr)) end
function encodeString(str) local output, t = string.gsub(str,"[^%w]",encodeChar) return output end
-- will print "http%3A%2F%2Ffoo%20bar%2F" print(encodeString("http://foo bar/"))</lang>
M2000 Interpreter
<lang M2000 Interpreter> Module Checkit {
Function decodeUrl$(a$) { DIM a$() a$()=Piece$(a$, "%") if len(a$())=1 then =str$(a$):exit k=each(a$(),2) acc$=str$(a$(0)) While k { acc$+=str$(Chr$(Eval("0x"+left$(a$(k^),2)))+Mid$(a$(k^),3)) } =string$(acc$ as utf8dec) } Group Parse$ { all$, c=1 tc$="" Enum UrlType {None=0, RFC3986, HTML5} variation TypeData=("","-._~","-._*") Function Next { .tc$<=mid$(.all$,.c,1) .c++ =.tc$<>"" } Value { =.tc$ } Function DecodeOne$ { if .tc$="" then exit if .tc$ ~"[A-Za-z0-9]" then =.tc$ : exit If .tc$=" " Then =if$(.variation=.HTML5->"+","%20") :exit if instr(.TypeData#val$(.variation),.tc$)>0 then =.tc$ :exit ="%"+hex$(asc(.tc$), 1) } Function Decode$ { acc$="" .c<=1 While .Next() { acc$+=.DecodeOne$() } =acc$ } Set () { \\ using optional argument var=.None Read a$, ? var a$=chr$(string$(a$ as utf8enc)) .variation<=var .all$<=a$ .c<=1 } } \\ MAIN Parse$()="http://foo bar/" Print Quote$(Parse.Decode$()) Parse.variation=Parse.HTML5 Print Quote$(Parse.Decode$()) Parse.variation=Parse.RFC3986 Print Quote$(Parse.Decode$()) Parse$(Parse.RFC3986) ={mailto:"Irma User" <irma.user@mail.com>} Print Quote$(Parse.Decode$()) Parse$(Parse.RFC3986) ={http://foo.bar.com/~user-name/_subdir/*~.html} m=each(Parse.UrlType) while m { Parse.variation=eval(m) Print Quote$(Parse.Decode$()) Print decodeUrl$(Parse.Decode$()) }
} CheckIt </lang>
"http%3A%2F%2Ffoo%20bar%2F" "http%3A%2F%2Ffoo+bar%2F" "mailto%3A%22Irma%20User%22%20%3Cirma.user%40mail.com%3E" "http%3A%2F%2Ffoo%2Ebar%2Ecom%2F%7Euser%2Dname%2F%5Fsubdir%2F%2A%7E%2Ehtml" "http%3A%2F%2Ffoo.bar.com%2F~user-name%2F_subdir%2F%2A~.html" "http%3A%2F%2Ffoo.bar.com%2F%7Euser-name%2F_subdir%2F*%7E.html"
Maple
<lang maple>URL:-Escape("http://foo bar/");</lang>
- Output:
"http%3A%2F%2Ffoo%20bar%2F"
Mathematica
<lang mathematica>URLEncoding[url_] :=
StringReplace[url, x : Except[ Join[CharacterRange["0", "9"], CharacterRange["a", "z"], CharacterRange["A", "Z"]]] :> StringJoin[("%" ~~ #) & /@ IntegerString[ToCharacterCode[x, "UTF8"], 16]]]</lang>
Example use:
<lang mathematica>URLEncoding["http://foo bar/"]</lang>
- Output:
http%3a%2f%2ffoo%20bar%2f
MATLAB / Octave
<lang MATLAB>function u = urlencoding(s) u = ; for k = 1:length(s), if isalnum(s(k)) u(end+1) = s(k); else u=[u,'%',dec2hex(s(k)+0)]; end; end end</lang> Usage:
octave:3> urlencoding('http://foo bar/') ans = http%3A%2F%2Ffoo%20bar%2F
NetRexx
<lang NetRexx>/* NetRexx */ options replace format comments java crossref symbols nobinary
/* -------------------------------------------------------------------------- */
testcase() say say 'RFC3986' testcase('RFC3986') say say 'HTML5' testcase('HTML5') say return
/* -------------------------------------------------------------------------- */ method encode(url, varn) public static
variation = varn.upper opts = opts['RFC3986'] = '-._~' opts['HTML5'] = '-._*'
rp = loop while url.length > 0 parse url tc +1 url select when tc.datatype('A') then do rp = rp || tc end when tc == ' ' then do if variation = 'HTML5' then rp = rp || '+' else rp = rp || '%' || tc.c2x end otherwise do if opts[variation].pos(tc) > 0 then do rp = rp || tc end else do rp = rp || '%' || tc.c2x end end end end
return rp
/* -------------------------------------------------------------------------- */ method testcase(variation = ) public static
url = [ - 'http://foo bar/' - , 'mailto:"Ivan Aim" <ivan.aim@email.com>' - , 'mailto:"Irma User" <irma.user@mail.com>' - , 'http://foo.bar.com/~user-name/_subdir/*~.html' - ]
loop i_ = 0 to url.length - 1 say url[i_] say encode(url[i_], variation) end i_
return
</lang>
- Output:
http://foo bar/ http%3A%2F%2Ffoo%20bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma%20User%22%20%3Cirma%2Euser%40mail%2Ecom%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo%2Ebar%2Ecom%2F%7Euser%2Dname%2F%5Fsubdir%2F%2A%7E%2Ehtml RFC3986 http://foo bar/ http%3A%2F%2Ffoo%20bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan%20Aim%22%20%3Civan.aim%40email.com%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma%20User%22%20%3Cirma.user%40mail.com%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo.bar.com%2F~user-name%2F_subdir%2F%2A~.html HTML5 http://foo bar/ http%3A%2F%2Ffoo+bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan+Aim%22+%3Civan.aim%40email.com%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma+User%22+%3Cirma.user%40mail.com%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo.bar.com%2F%7Euser-name%2F_subdir%2F*%7E.html
NewLISP
<lang NewLISP>;; simple encoder
(define (url-encode str)
(replace {([^a-zA-Z0-9])} str (format "%%%2X" (char $1)) 0))
(url-encode "http://foo bar/")</lang>
Nim
<lang nim>import cgi
echo encodeUrl("http://foo/bar/")</lang>
Oberon-2
<lang oberon2> MODULE URLEncoding; IMPORT
Out := NPCT:Console, ADT:StringBuffer, URI := URI:String;
VAR
encodedUrl: StringBuffer.StringBuffer;
BEGIN
encodedUrl := NEW(StringBuffer.StringBuffer,512); URI.AppendEscaped("http://foo bar/","",encodedUrl); Out.String(encodedUrl.ToString());Out.Ln
END URLEncoding. </lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Objeck
<lang objeck> use FastCgi;
bundle Default {
class UrlEncode { function : Main(args : String[]) ~ Nil { url := "http://foo bar/"; UrlUtility->Encode(url)->PrintLine(); } }
} </lang>
Objective-C
<lang objc>NSString *normal = @"http://foo bar/"; NSString *encoded = [normal stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]; NSLog(@"%@", encoded);</lang>
The Core Foundation function CFURLCreateStringByAddingPercentEscapes()
provides more options.
<lang objc>NSString *normal = @"http://foo bar/"; NSString *encoded = [normal stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet alphanumericCharacterSet]]; NSLog(@"%@", encoded);</lang>
For encoding for various parts of the URL, the allowed character sets [NSCharacterSet URLUserAllowedCharacterSet]
, [NSCharacterSet URLPasswordAllowedCharacterSet]
, [NSCharacterSet URLHostAllowedCharacterSet]
, [NSCharacterSet URLPathAllowedCharacterSet]
, [NSCharacterSet URLQueryAllowedCharacterSet]
, or [NSCharacterSet URLFragmentAllowedCharacterSet]
are provided.
OCaml
Using the library ocamlnet from the interactive loop:
<lang ocaml>$ ocaml
- #use "topfind";;
- #require "netstring";;
- Netencoding.Url.encode "http://foo bar/" ;;
- : string = "http%3A%2F%2Ffoo+bar%2F"</lang>
ooRexx
The solution shown at Rexx (version 1) is a valid ooRexx program.
version 2 uses constructs not supported by ooRexx:
url.=; and $ as variable symbol
Perl
<lang perl>sub urlencode {
my $s = shift; $s =~ s/([^-A-Za-z0-9_.!~*'() ])/sprintf("%%%02X", ord($1))/eg; $s =~ tr/ /+/; return $s;
}
print urlencode('http://foo bar/')."\n"; </lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
<lang perl>use URI::Escape;
my $s = 'http://foo/bar/'; print uri_escape($s);</lang>
Use standard CGI module: <lang perl>use 5.10.0; use CGI;
my $s = 'http://foo/bar/'; say $s = CGI::escape($s); say $s = CGI::unescape($s);</lang>
Perl 6
<lang perl6>my $url = 'http://foo bar/';
say $url.subst(/<-alnum>/, *.ord.fmt("%%%02X"), :g);</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Phix
<lang Phix>-- -- demo\rosetta\encode_url.exw -- =========================== -- function nib(integer b)
return b+iff(b<=9?'0':'A'-10)
end function
function encode_url(string s, string exclusions="", integer spaceplus=0) string res = ""
for i=1 to length(s) do integer ch = s[i] if ch=' ' and spaceplus then ch = '+' elsif not find(ch,exclusions) and (ch<'0' or (ch>'9' and ch<'A') or (ch>'Z' and ch<'a') or ch>'z') then res &= '%' res &= nib(floor(ch/#10)) ch = nib(and_bits(ch,#0F)) end if res &= ch end for return res
end function
printf(1,"%s\n",{encode_url("http://foo bar/")})</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
PHP
<lang php><?php
$s = 'http://foo/bar/';
$s = rawurlencode($s);
?></lang>
There is also urlencode()
, which also encodes spaces as "+" signs
PicoLisp
<lang PicoLisp>(de urlEncodeTooMuch (Str)
(pack (mapcar '((C) (if (or (>= "9" C "0") (>= "Z" (uppc C) "A")) C (list '% (hex (char C))) ) ) (chop Str) ) ) )</lang>
Test:
: (urlEncodeTooMuch "http://foo bar/") -> "http%3A%2F%2Ffoo%20bar%2F"
Pike
<lang Pike>Protocols.HTTP.uri_encode( "http://foo bar/" );</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Powershell
<lang Powershell> [uri]::EscapeDataString('http://foo bar/')
http%3A%2F%2Ffoo%20bar%2F </lang>
PureBasic
<lang PureBasic>URL$ = URLEncoder("http://foo bar/")</lang>
Python
<lang python>import urllib
s = 'http://foo/bar/'
s = urllib.quote(s)</lang>
There is also urllib.quote_plus()
, which also encodes spaces as "+" signs
R
R has a built-in
<lang R>URLencode("http://foo bar/")</lang>
function, but it doesn't fully follow RFC guidelines, so we have to use another R package to accomplish the task:
<lang R> library(RCurl) curlEscape("http://foo bar/") </lang>
Racket
<lang racket>
- lang racket
(require net/uri-codec) (uri-encode "http://foo bar/") </lang>
REALbasic
Using the built-in encoding method, which doesn't permit exceptions: <lang vb>
Dim URL As String = "http://foo bar/" URL = EncodeURLComponent(URL) Print(URL)
</lang>
With optional exceptions. A "ParamArray" is an array of zero or more additional arguments passed by the caller: <lang vb> Function URLEncode(Data As String, ParamArray Exceptions() As String) As String
Dim buf As String For i As Integer = 1 To Data.Len Dim char As String = Data.Mid(i, 1) Select Case Asc(char) Case 48 To 57, 65 To 90, 97 To 122, 45, 46, 95 buf = buf + char Else If Exceptions.IndexOf(char) > -1 Then buf = buf + char Else buf = buf + "%" + Left(Hex(Asc(char)) + "00", 2) End If End Select Next Return buf
End Function
'usage Dim s As String = URLEncode("http://foo bar/") ' no exceptions Dim t As String = URLEncode("http://foo bar/", "!", "?", ",") ' with exceptions
</lang>
REXX
version 1
<lang REXX>/* Rexx */ do
call testcase say say RFC3986 call testcase RFC3986 say say HTML5 call testcase HTML5 say return
end exit
/* -------------------------------------------------------------------------- */ encode: procedure do
parse arg url, varn . parse upper var varn variation drop RFC3986 HTML5 opts. = opts.RFC3986 = '-._~' opts.HTML5 = '-._*'
rp = do while length(url) > 0 parse var url tc +1 url select when datatype(tc, 'A') then do rp = rp || tc end when tc == ' ' then do if variation = HTML5 then rp = rp || '+' else rp = rp || '%' || c2x(tc) end otherwise do if pos(tc, opts.variation) > 0 then do rp = rp || tc end else do rp = rp || '%' || c2x(tc) end end end end
return rp
end exit
/* -------------------------------------------------------------------------- */ testcase: procedure do
parse arg variation X = 0 url. = X = X + 1; url.0 = X; url.X = 'http://foo bar/' X = X + 1; url.0 = X; url.X = 'mailto:"Ivan Aim" <ivan.aim@email.com>' X = X + 1; url.0 = X; url.X = 'mailto:"Irma User" <irma.user@mail.com>' X = X + 1; url.0 = X; url.X = 'http://foo.bar.com/~user-name/_subdir/*~.html'
do i_ = 1 to url.0 say url.i_ say encode(url.i_, variation) end i_
return
end </lang>
- Output:
http://foo bar/ http%3A%2F%2Ffoo%20bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan%20Aim%22%20%3Civan%2Eaim%40email%2Ecom%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma%20User%22%20%3Cirma%2Euser%40mail%2Ecom%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo%2Ebar%2Ecom%2F%7Euser%2Dname%2F%5Fsubdir%2F%2A%7E%2Ehtml RFC3986 http://foo bar/ http%3A%2F%2Ffoo%20bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan%20Aim%22%20%3Civan.aim%40email.com%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma%20User%22%20%3Cirma.user%40mail.com%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo.bar.com%2F~user-name%2F_subdir%2F%2A~.html HTML5 http://foo bar/ http%3A%2F%2Ffoo+bar%2F mailto:"Ivan Aim" <ivan.aim@email.com> mailto%3A%22Ivan+Aim%22+%3Civan.aim%40email.com%3E mailto:"Irma User" <irma.user@mail.com> mailto%3A%22Irma+User%22+%3Cirma.user%40mail.com%3E http://foo.bar.com/~user-name/_subdir/*~.html http%3A%2F%2Ffoo.bar.com%2F%7Euser-name%2F_subdir%2F*%7E.html
version 2
<lang rexx>/*REXX program encodes a URL text, blanks ──► +, preserves -._* and -._~ */ url.=; url.1= 'http://foo bar/'
url.2= 'mailto:"Ivan Aim" <ivan.aim@email.com>' url.3= 'mailto:"Irma User" <irma.user@mail.com>' url.4= 'http://foo.bar.com/~user-name/_subdir/*~.html' do j=1 while url.j\==; say say ' original: ' url.j say ' encoded: ' URLencode(url.j) end /*j*/
exit /*stick a fork in it, we're all done. */ /*──────────────────────────────────────────────────────────────────────────────────────*/ URLencode: procedure; parse arg $,,z; t1= '-._~' /*get args, null Z.*/
skip=0; t2= '-._*' do k=1 for length($); _=substr($, k, 1) /*get a character. */ if skip\==0 then do; skip=skip-1 /*skip t1 or t2 ? */ iterate /*skip a character.*/ end select when datatype(_, 'A') then z=z || _ /*is alphanumeric ?*/ when _==' ' then z=z'+' /*is it a blank ?*/ when substr($, k, 4)==t1 |, /*is it t1 or t2 ?*/ substr($, k, 4)==t2 then do; skip=3 /*skip 3 characters*/ z=z || substr($, k, 4) end otherwise z=z'%'c2x(_) /*special character*/ end /*select*/ end /*k*/ return z</lang>
output when using the default input:
original: http://foo bar/ encoded: http%3A%2F%2Ffoo+bar%2F original: mailto:"Ivan Aim" <ivan.aim@email.com> encoded: mailto%3A%22Ivan+Aim%22+%3Civan%2Eaim%40email%2Ecom%3E original: mailto:"Irma User" <irma.user@mail.com> encoded: mailto%3A%22Irma+User%22+%3Cirma%2Euser%40mail%2Ecom%3E original: http://foo.bar.com/~user-name/_subdir/*~.html encoded: http%3A%2F%2Ffoo%2Ebar%2Ecom%2F%7Euser%2Dname%2F%5Fsubdir%2F%2A%7E%2Ehtml
Ruby
CGI.escape
encodes all characters except '-.0-9A-Z_a-z'.
<lang ruby>require 'cgi' puts CGI.escape("http://foo bar/").gsub("+", "%20")
- => "http%3A%2F%2Ffoo%20bar%2F"</lang>
Programs should not call URI.escape
(alias URI.encode
), because it fails to encode some characters. URI.escape
is obsolete since Ruby 1.9.2.
URI.encode_www_form_component
is a new method from Ruby 1.9.2. It obeys HTML 5 and encodes all characters except '-.0-9A-Z_a-z' and '*'.
<lang ruby>require 'uri' puts URI.encode_www_form_component("http://foo bar/").gsub("+", "%20")
- => "http%3A%2F%2Ffoo%20bar%2F"</lang>
Run BASIC
<lang runbasic>urlIn$ = "http://foo bar/"
for i = 1 to len(urlIn$)
a$ = mid$(urlIn$,i,1) if (a$ >= "0" and a$ <= "9") _ or (a$ >= "A" and a$ <= "Z") _ or (a$ >= "a" and a$ <= "z") then url$ = url$ + a$ else url$ = url$ + "%"+dechex$(asc(a$))
next i print urlIn$;" -> ";url$</lang>
http://foo bar/ -> http%3A%2F%2Ffoo%20bar%2F
Rust
<lang Rust>const INPUT: &str = "http://foo bar/"; const MAX_CHAR_VAL: u32 = std::char::MAX as u32;
fn main() {
let mut buff = [0; 4]; println!("{}", INPUT.chars() .map(|ch| { match ch as u32 { 0 ..= 47 | 58 ..= 64 | 91 ..= 96 | 123 ..= MAX_CHAR_VAL => { ch.encode_utf8(&mut buff); buff[0..ch.len_utf8()].iter().map(|&byte| format!("%{:X}", byte)).collect::<String>() }, _ => ch.to_string(), } }) .collect::<String>() );
}</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Scala
<lang scala>import java.net.{URLDecoder, URLEncoder}
import scala.compat.Platform.currentTime
object UrlCoded extends App {
val original = """http://foo bar/""" val encoded: String = URLEncoder.encode(original, "UTF-8")
assert(encoded == "http%3A%2F%2Ffoo+bar%2F", s"Original: $original not properly encoded: $encoded")
val percentEncoding = encoded.replace("+", "%20") assert(percentEncoding == "http%3A%2F%2Ffoo%20bar%2F", s"Original: $original not properly percent-encoded: $percentEncoding")
assert(URLDecoder.decode(encoded, "UTF-8") == URLDecoder.decode(percentEncoding, "UTF-8"))
println(s"Successfully completed without errors. [total ${currentTime - executionStart} ms]")
}</lang>
Seed7
The library encoding.s7i defines functions to handle URL respectively percent encoding. The function toPercentEncoded encodes every character except 0-9, A-Z, a-z and the characters '-', '.', '_', '~'. The function toUrlEncoded works like toPercentEncoded and additionally encodes a space with '+'. Both functions work for byte sequences (characters beyond '\255\' raise the exception RANGE_ERROR). To encode Unicode characters it is necessary to convert them to UTF-8 with striToUtf8 before.<lang seed7>$ include "seed7_05.s7i";
include "encoding.s7i";
const proc: main is func
begin writeln(toPercentEncoded("http://foo bar/")); writeln(toUrlEncoded("http://foo bar/"));
end func;</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F http%3A%2F%2Ffoo+bar%2F
Sidef
<lang ruby>func urlencode(str) {
str.gsub!(%r"([^-A-Za-z0-9_.!~*'() ])", {|a| "%%%02X" % a.ord}); str.gsub!(' ', '+'); return str;
}
say urlencode('http://foo bar/');</lang>
- Output:
http%3A%2F%2Ffoo+bar%2F
Tcl
<lang tcl># Encode all except "unreserved" characters; use UTF-8 for extended chars.
- See http://tools.ietf.org/html/rfc3986 §2.4 and §2.5
proc urlEncode {str} {
set uStr [encoding convertto utf-8 $str] set chRE {[^-A-Za-z0-9._~\n]}; # Newline is special case! set replacement {%[format "%02X" [scan "\\\0" "%c"]]} return [string map {"\n" "%0A"} [subst [regsub -all $chRE $uStr $replacement]]]
}</lang> Demonstrating: <lang tcl>puts [urlEncode "http://foo bar/"]</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F%E2%82%AC
TUSCRIPT
<lang tuscript> $$ MODE TUSCRIPT text="http://foo bar/" BUILD S_TABLE spez_char="::>/:</::<%:" spez_char=STRINGS (text,spez_char) LOOP/CLEAR c=spez_char c=ENCODE(c,hex),c=concat("%",c),spez_char=APPEND(spez_char,c) ENDLOOP url_encoded=SUBSTITUTE(text,spez_char,0,0,spez_char) print "text: ", text PRINT "encoded: ", url_encoded </lang>
- Output:
text: http://foo bar/ encoded: http%3A%2F%2Ffoo%20bar%2F
UNIX Shell
unfortunately ksh does not support "'$c"
syntax
<lang bash>function urlencode
{
typeset decoded=$1 encoded= rest= c=
typeset rest2= bug='rest2=${rest}'
if [[ -z ${BASH_VERSION} ]]; then # bug /usr/bin/sh HP-UX 11.00 typeset _decoded='xyz%26xyz' rest="${_decoded#?}" c="${_decoded%%${rest}}" if (( ${#c} != 1 )); then typeset qm='????????????????????????????????????????????????????????????????????????' typeset bug='(( ${#rest} > 0 )) && typeset -L${#rest} rest2="${qm}" || rest2=${rest}' fi fi
rest="${decoded#?}" eval ${bug} c="${decoded%%${rest2}}" decoded="${rest}"
while [[ -n ${c} ]]; do case ${c} in [-a-zA-z0-9.]) ;; ' ') c='+' ;; *) c=$(printf "%%%02X" "'$c") ;; esac
encoded="${encoded}${c}"
rest="${decoded#?}" eval ${bug} c="${decoded%%${rest2}}" decoded="${rest}" done
if [[ -n ${BASH_VERSION:-} ]]; then \echo -E "${encoded}" else print -r -- "${encoded}" fi } </lang>
VBScript
<lang VBScript>Function UrlEncode(url) For i = 1 To Len(url) n = Asc(Mid(url,i,1)) If (n >= 48 And n <= 57) Or (n >= 65 And n <= 90) _ Or (n >= 97 And n <= 122) Then UrlEncode = UrlEncode & Mid(url,i,1) Else ChrHex = Hex(Asc(Mid(url,i,1)))
For j = 0 to (Len(ChrHex) / 2) - 1
UrlEncode = UrlEncode & "%" & Mid(ChrHex,(2*j) + 1,2)
Next
End If Next End Function
WScript.Echo UrlEncode("http://foo baré/")</lang>
- Output:
http%3A%2F%2Ffoo%20bar%C3%A9%2F
XPL0
<lang XPL0>code Text=12; string 0; \use zero-terminated strings
func Encode(S0); \Encode URL string and return its address char S0; char HD, S1(80); \BEWARE: very temporary string space returned int C, I, J; [HD:= "0123456789ABCDEF"; \hex digits I:= 0; J:= 0; repeat C:= S0(I); I:= I+1;
if C>=^0 & C<=^9 ! C>=^A & C<=^Z ! C>=^a & C<=^z ! C=0 then [S1(J):= C; J:= J+1] \simply pass char to S1 else [S1(J):= ^%; J:= J+1; \encode char into S1 S1(J):= HD(C>>4); J:= J+1; S1(J):= HD(C&$0F); J:= J+1; ];
until C=0; return S1; ];
Text(0, Encode("http://foo bar/"))</lang>
- Output:
http%3A%2F%2Ffoo%20bar%2F
Yabasic
<lang Yabasic>sub encode_url$(s$, exclusions$, spaceplus)
local res$, i, ch$
for i=1 to len(s$) ch$ = mid$(s$, i, 1) if ch$ = " " and spaceplus then ch$ = "+" elsif not instr(esclusions$, ch$) and (ch$ < "0" or (ch$ > "9" and ch$ < "A") or (ch$ > "Z" and ch$ < "a") or ch$ > "z") then res$ = res$ + "%" ch$ = upper$(hex$(asc(ch$))) end if res$ = res$ + ch$ next i return res$
end sub
print encode_url$("http://foo bar/")</lang>
zkl
Using lib cURL: <lang zkl>var CURL=Import("zklCurl"); CURL.urlEncode("http://foo bar/") //--> "http%3A%2F%2Ffoo%20bar%2F"</lang>
- Programming Tasks
- Solutions by Programming Task
- String manipulation
- Encodings
- Ada
- AWS
- AutoHotkey
- Apex
- AppleScript
- AppleScript Toolbox
- Arturo
- AWK
- BBC BASIC
- Bracmat
- C
- C++
- C sharp
- Clojure
- ColdFusion
- Common Lisp
- D
- Elixir
- Erlang
- F Sharp
- Factor
- Free Pascal
- Go
- Groovy
- Haskell
- Icon
- Unicon
- Icon Programming Library
- J
- Java
- JavaScript
- Jq
- Julia
- Kotlin
- Langur
- Lasso
- Liberty BASIC
- Lingo
- LiveCode
- Lua
- M2000 Interpreter
- Maple
- Mathematica
- MATLAB
- Octave
- NetRexx
- NewLISP
- Nim
- Oberon-2
- Objeck
- Objective-C
- OCaml
- OoRexx
- Perl
- Perl 6
- Phix
- PHP
- PicoLisp
- Pike
- Powershell
- PureBasic
- Python
- R
- Racket
- REALbasic
- REXX
- Ruby
- Run BASIC
- Rust
- Scala
- Seed7
- Sidef
- Tcl
- TUSCRIPT
- UNIX Shell
- VBScript
- XPL0
- Yabasic
- Zkl