Tokenize a string
You are encouraged to solve this task according to the task description, using any language you may know.
Separate the string "Hello,How,Are,You,Today" by commas into an array (or list) so that each element of it stores a different word. Display the words to the 'user', in the simplest manner possible, separated by a period. To simplify, you may display a trailing period.
Ada
<ada>
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Text_Io; use Ada.Text_Io;

procedure Parse_Commas is
   Source_String : String := "Hello,How,Are,You,Today";
   Index_List : array(1..256) of Natural;
   Next_Index : Natural := 1;
begin
   Index_List(Next_Index) := 1;
   while Index_List(Next_Index) < Source_String'Last loop
      Next_Index := Next_Index + 1;
      Index_List(Next_Index) := 1 + Index(Source_String(Index_List(Next_Index - 1)..Source_String'Last), ",");
      if Index_List(Next_Index) = 1 then
         Index_List(Next_Index) := Source_String'Last + 2;
      end if;
      Put(Source_String(Index_List(Next_Index - 1)..Index_List(Next_Index)-2) & ".");
   end loop;
end Parse_Commas;
</ada>
ALGOL 68
main:(

  OP +:= = (REF FLEX[]STRING list, STRING item)VOID:(
    HEAP [LWB list: UPB list+1]STRING out;
    out[LWB list: UPB list]:=list;
    out[UPB out]:=item;
    list := out
  );

  PROC split = (STRING string, sep)[]STRING:(
    FLEX[1:0]STRING out;
    INT start := 1, pos;
    WHILE string in string(sep, pos, string[start:]) DO
      out +:= string[start:start+pos-2];
      start +:= pos + UPB sep - 1
    OD;
    IF start > LWB string THEN
      out +:= string[start:]
    FI;
    out
  );

  printf(($g"."$, split("Hello,How,Are,You,Today",","),$l$))
)
Output:
Hello.How.Are.You.Today.
C
This example uses the strtok() function to separate the tokens. This function is destructive (replacing token separators with '\0'), so we have to make a copy of the string (using strdup()) before tokenizing. strdup() is not part of ANSI C, but is available on most platforms. It can easily be implemented with a combination of strlen(), malloc(), and strcpy().
#include<string.h>
#include<stdio.h>
#include<stdlib.h>

int main(void)
{
	char *a[6];	/* five tokens plus the terminating NULL returned by strtok() */
	const char *s="Hello,How,Are,You,Today";
	int n=0, nn;

	char *ds=strdup(s);

	a[n]=strtok(ds, ",");
	while(a[n] && n<5) a[++n]=strtok(NULL, ",");

	for(nn=0; nn<n; ++nn) printf("%s.", a[nn]);
	putchar('\n');

	free(ds);

	return 0;
}
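The description above notes that strdup() can be built from strlen(), malloc(), and strcpy(). A minimal sketch of such a replacement follows. The name my_strdup is hypothetical (chosen so it does not clash with a platform's own strdup()), and returning NULL for a NULL argument is an assumption made here, not something any standard requires.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strdup(): returns a heap-allocated copy of s,
   or NULL if s is NULL or malloc() fails. The caller must free() the result. */
char *my_strdup(const char *s)
{
    char *copy;

    if (s == NULL)
        return NULL;                 /* assumption: treat NULL input as an error */
    copy = malloc(strlen(s) + 1);    /* +1 for the terminating '\0' */
    if (copy != NULL)
        strcpy(copy, s);
    return copy;
}
```

On a platform without strdup(), a call such as my_strdup(s) could take the place of strdup(s) in the program above; the copy is still released with free().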
C#
string str = "Hello,How,Are,You,Today"; string[] strings = str.Split(','); foreach (string s in strings) { Console.WriteLine (s + "."); }
C++
This is not the most efficient method as it involves redundant copies in the background, but it is very easy to use. In most cases it will be a good choice as long as it is not used as an inner loop in a performance critical system.
Note the Doxygen tags in the comments before the function, which describe the details of the interface.
#include <string>
#include <vector>
/// \brief convert input string into vector of string tokens
///
/// \note consecutive delimiters will be treated as single delimiter
/// \note delimiters are _not_ included in return data
///
/// \param str string to be parsed
/// \param delims list of delimiters.

std::vector<std::string> tokenize_str(const std::string & str,
                                      const std::string & delims=", \t")
{
  using namespace std;
  // Skip delims at beginning, find start of first token
  string::size_type lastPos = str.find_first_not_of(delims, 0);
  // Find next delimiter @ end of token
  string::size_type pos     = str.find_first_of(delims, lastPos);

  // output vector
  vector<string> tokens;

  while (string::npos != pos || string::npos != lastPos)
    {
      // Found a token, add it to the vector.
      tokens.push_back(str.substr(lastPos, pos - lastPos));
      // Skip delims.  Note the "not_of". this is beginning of token
      lastPos = str.find_first_not_of(delims, pos);
      // Find next delimiter at end of token.
      pos     = str.find_first_of(delims, lastPos);
    }

  return tokens;
}
Here is sample usage code:
#include <iostream>
int main() {
  using namespace std;
  string s("Hello,How,Are,You,Today");

  vector<string> v(tokenize_str(s));

  for (unsigned i = 0; i < v.size(); i++)
    cout << v[i] << ".";

  cout << endl;
  return 0;
}
D
writefln( "Hello,How,Are,You,Today".split(",").join(".") );
E
".".rjoin("Hello,How,Are,You,Today".split(","))
Forth
There is no standard string split routine, but it is easily written. The results are saved temporarily to the dictionary.
: split ( str len separator len -- tokens count )
  here >r 2swap
  begin
    2dup 2,             \ save this token ( addr len )
    2over search        \ find next separator
  while
    dup negate  here 2 cells -  +!  \ adjust last token length
    2over nip /string               \ start next search past separator
  repeat
  2drop 2drop
  r>  here over -   ( tokens length )
  dup negate allot           \ reclaim dictionary
  2 cells / ;                \ turn byte length into token count

: .tokens ( tokens count -- )
  1 ?do dup 2@ type ." ." cell+ cell+ loop 2@ type ;

s" Hello,How,Are,You,Today" s" ," split .tokens  \ Hello.How.Are.You.Today
Fortran
PROGRAM Example

  CHARACTER(23) :: str = "Hello,How,Are,You,Today"
  CHARACTER(5) :: word(5)
  INTEGER :: pos1 = 1, pos2, n = 0, i

  DO
    pos2 = INDEX(str(pos1:), ",")
    IF (pos2 == 0) THEN
       n = n + 1
       word(n) = str(pos1:)
       EXIT
    END IF
    n = n + 1
    word(n) = str(pos1:pos1+pos2-2)
    pos1 = pos2+pos1
  END DO

  DO i = 1, n
    WRITE(*,"(2A)", ADVANCE="NO") TRIM(word(i)), "."
  END DO

END PROGRAM Example
Haskell
The necessary operations are unfortunately not in the standard library (yet), but simple to write:
import qualified Data.List as List

splitBy :: (a -> Bool) -> [a] -> [[a]]
splitBy _ [] = []
splitBy f list = first : splitBy f (dropWhile f rest) where
  (first, rest) = break f list

-- splitRegex :: Regex -> String -> [String]   (provided by Text.Regex)

joinWith :: [a] -> [[a]] -> [a]
joinWith d xs = concat $ List.intersperse d xs
-- "concat $ intersperse" can be replaced with "intercalate" from the Data.List in GHC 6.8 and later
putStrLn $ joinWith "." $ splitBy (== ',') $ "Hello,How,Are,You,Today"

-- using regular expression to split (mkRegex takes a String, not a Char):
import Text.Regex
putStrLn $ joinWith "." $ splitRegex (mkRegex ",") $ "Hello,How,Are,You,Today"
Groovy
println 'Hello,How,Are,You,Today'.split(',').join('.')
Io
"Hello,How,Are,You,Today" split(",") join(".") println
J
   s=: 'Hello,How,Are,You,Today'
   ] t=: <;._1 ',',s
+-----+---+---+---+-----+
|Hello|How|Are|You|Today|
+-----+---+---+---+-----+
   ; t,&.>'.'
Hello.How.Are.You.Today.
   '.' (I.','=s)}s  NB. two steps combined
Hello.How.Are.You.Today
Java
There are multiple ways to tokenize a String in Java. The first is to split the String into an array of Strings with split(); the second is to use a StringTokenizer with a delimiter. The second way skips empty tokens: if two commas appear in a row, the array returned by split() contains an empty string at that position, whereas the StringTokenizer yields no empty token.
String toTokenize = "Hello,How,Are,You,Today";

//First way
String word[] = toTokenize.split(",");
for(int i=0; i<word.length; i++) {
    System.out.print(word[i] + ".");
}

//Second way (requires import java.util.StringTokenizer)
StringTokenizer tokenizer = new StringTokenizer(toTokenize, ",");
while(tokenizer.hasMoreTokens()) {
    System.out.print(tokenizer.nextToken() + ".");
}
JavaScript
alert( "Hello,How,Are,You,Today".split(",").join(".") );
MAXScript
output = ""
for word in (filterString "Hello,How,Are,You,Today" ",") do
(
    output += (word + ".")
)
format "%\n" output
OCaml
To split on a single-character separator: <ocaml>let rec split_char sep str =
try let i = String.index str sep in String.sub str 0 i :: split_char sep (String.sub str (i+1) (String.length str - i - 1)) with Not_found -> [str]</ocaml>
Splitting on a string separator using the regular expressions library: <ocaml>#load "str.cma";; let split_str sep str =
Str.split (Str.regexp_string sep) str</ocaml>
There is already a library function for joining: <ocaml>String.concat sep strings</ocaml>
Perl
As a one-liner without a trailing period. This is the most compact form, since no intermediate array has to be declared.
print join('.', split(/,/, "Hello,How,Are,You,Today"));
If you need to keep the array for later use (again without a trailing period):
my @words = split(/,/, "Hello,How,Are,You,Today"); print join('.', @words);
If you really want a trailing period, here is an example
my @words = split(/,/, "Hello,How,Are,You,Today"); print $_.'.' for (@words);
PHP
<?php $str = 'Hello,How,Are,You,Today'; echo implode('.', explode(',', $str)); ?>
Pop11
The natural solution in Pop11 uses lists.
There are built in libraries for tokenising strings, illustrated below, along with code that the user could create for the task.
First show the use of sysparse_string to break up a string and make a list of strings.
;;; Make a list of strings from a string using space as separator
lvars list;
sysparse_string('the cat sat on the mat') -> list;
;;; print the list of strings
list =>
** [the cat sat on the mat]
By giving it an extra parameter 'true' we can make it recognize numbers and produce a list of strings and numbers:
lvars list;
sysparse_string('one 1 two 2 three 3 four 4', true) -> list;
;;; print the list of strings and numbers
list =>
** [one 1 two 2 three 3 four 4]
;;; check that first item is a string and second an integer
isstring(list(1))=>
** <true>
isinteger(list(2))=>
** <true>
Now show some uses of the built in procedure sys_parse_string, which allows more options:
;;; Make pop-11 print strings with quotes
true -> pop_pr_quotes;
;;;
;;; Create a string of tokens using comma as token separator
lvars str='Hello,How,Are,You,Today';
;;;
;;; Make a list of strings by applying sys_parse_string
;;; to str, using the character `,` as separator (the default
;;; separator, if none is provided, is the space character).
lvars strings;
[% sys_parse_string(str, `,`) %] -> strings;
;;;
;;; print the list of strings
strings =>
** ['Hello' 'How' 'Are' 'You' 'Today']
If {% ... %} were used instead of [% ... %] the result would be a vector (i.e. array) of strings rather than a list of strings.
{% sys_parse_string(str, `,`) %} -> strings;
;;; print the vector
strings =>
** {'Hello' 'How' 'Are' 'You' 'Today'}
It is also possible to give sys_parse_string a 'conversion' procedure, which is applied to each of the tokens. E.g. it could be used to produce a vector of numbers, using the conversion procedure 'strnumber', which converts a string to a number:
lvars numbers;
{% sys_parse_string('100 101 102 103 99.9 99.999', strnumber) %} -> numbers;
;;; the result is a vector containing integers and floats,
;;; which can be printed thus:
numbers =>
** {100 101 102 103 99.9 99.999}
Using lower level pop-11 facilities to tokenise the string:
;;; Declare and initialize variables
lvars str='Hello,How,Are,You,Today';
;;; Iterate over string
lvars ls = [], i, j = 1;
for i from 1 to length(str) do
    ;;; If comma
    if str(i) = `,` then
       ;;; Prepend word (substring) to list
       cons(substring(j, i - j, str), ls) -> ls;
       i + 1 -> j;
    endif;
endfor;
;;; Prepend final word (if needed)
if j <= length(str) then
    cons(substring(j, length(str) - j + 1, str), ls) -> ls;
endif;
;;; Reverse the list
rev(ls) -> ls;
Since the task requires an array, we convert the list to a vector:
;;; Put list elements and length on the stack
destlist(ls);
;;; Build a vector from them
lvars ar = consvector();
;;; Display in a loop, putting trailing period
for i from 1 to length(ar) do
   printf(ar(i), '%s.');
endfor;
printf('\n');
We could also print directly from the list:
for i in ls do
    printf(i, '%s.');
endfor;
so the conversion to a vector serves only to satisfy the task formulation.
Python
text = "Hello,How,Are,You,Today"
tokens = text.split(',')
print('.'.join(tokens))
If you want to print each word on its own line:
for token in tokens:
    print(token)
or
print("\n".join(tokens))
or the one-liner
print('.'.join('Hello,How,Are,You,Today'.split(',')))
Raven
'Hello,How,Are,You,Today' ',' split '.' join print
Ruby
string = "Hello,How,Are,You,Today".split(',')
string.each do |w|
    print "#{w}."
end
puts "Hello,How,Are,You,Today".split(',').join('.')
Seed7
var array string: tokens is 0 times "";
tokens := split("Hello,How,Are,You,Today", ",");
Smalltalk
|array | array := 'Hello,How,Are,You,Today' subStrings: $,. array fold: [:concatenation :string | concatenation, '.', string ]
Some implementations also have a join: convenience method that allows the following shorter solution:
('Hello,How,Are,You,Today' subStrings: $,) join: '.'
The solution displaying a trailing period would be:
|array |
array := 'Hello,How,Are,You,Today' subStrings: $,.
array inject: '' into: [:concatenation :string | concatenation, string, '.' ]
Standard ML
val splitter = String.tokens (fn c => c = #","); val main = (String.concatWith ".") o splitter;
Test:
- main "Hello,How,Are,You,Today"
val it = "Hello.How.Are.You.Today" : string
Tcl
Generating a list from a string by splitting on a comma:
split string ,
Joining the elements of a list by a period:
join list .
Thus the whole thing would look like this:
puts [join [split "Hello,How,Are,You,Today" ,] .]
If you'd like to retain the list in a variable with the name "words", it would only be marginally more complex:
puts [join [set words [split "Hello,How,Are,You,Today" ,]] .]
UnixPipes
rtoken() {
   (IFS=' ' read A B ; echo $A; test -n "$B" && (echo $B | rtoken) ) ;
}
tokens() {
   (IFS=, ; read A ; echo $A | rtoken) ;
}
echo "Hello,How,Are,You,Today" | tokens