Tokenize a string: Difference between revisions

Content added Content deleted
(→‎{{header|Java}}: Updated the for-loop to use the proper variable name (words - not word).)
Line 875: Line 875:
;;</lang>
;;</lang>


But both of these perform an extraneous String.sub (and so one string allocation). For N tokens there will be (N - 2) unneeded allocations. To resolve this, here is a version which first collects the indices and then extracts the tokens:
But both of these perform an extraneous String.sub (and so one string allocation) each time, to generate the "rest of the string" that is passed to the next call. For N tokens there will be (N - 2) unneeded allocations. To resolve this, here is a version which keeps track of the index in the string at which to look next:


Content deleted:

<lang ocaml>let split_char sep str =
  let rec indices acc i =
    try
      let i = succ(String.index_from str i sep) in
      indices (i::acc) i
    with Not_found ->
      (String.length str + 1) :: acc
  in
  let is = indices [0] 0 in
  let rec aux acc = function
    | last::start::tl ->
        let w = String.sub str start (last-start-1) in
        aux (w::acc) (start::tl)
    | _ -> acc
  in
  aux [] is</lang>

Content added:

<lang ocaml>let split_char sep str =
  let string_index_from i =
    try Some (String.index_from str i sep)
    with Not_found -> None
  in
  let rec aux i acc = match string_index_from i with
    | Some i' ->
        let w = String.sub str i (i' - i) in
        aux (succ i') (w::acc)
    | None ->
        let w = String.sub str i (String.length str - i) in
        List.rev (w::acc)
  in
  aux 0 []</lang>
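
As a quick check of the index-tracking version, here is a minimal usage sketch. It repeats the definition so that it can be run on its own; the sample input string and the choice to join the tokens with periods are assumptions made for illustration only.

<lang ocaml>(* Index-tracking split, repeated here so the example is self-contained. *)
let split_char sep str =
  (* Position of the next separator at or after index i, if any. *)
  let string_index_from i =
    try Some (String.index_from str i sep)
    with Not_found -> None
  in
  let rec aux i acc = match string_index_from i with
    | Some i' ->
        (* Token runs from i up to, but not including, the separator. *)
        let w = String.sub str i (i' - i) in
        aux (succ i') (w::acc)
    | None ->
        (* No separator left: the rest of the string is the last token. *)
        let w = String.sub str i (String.length str - i) in
        List.rev (w::acc)
  in
  aux 0 []

(* Sample input, chosen for illustration. *)
let tokens = split_char ',' "Hello,How,Are,You,Today"

let () = print_endline (String.concat "." tokens)</lang>

Each token is extracted with exactly one String.sub, and no intermediate "rest of the string" is built. Note that empty fields are kept, so a trailing separator yields a final empty token.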


Splitting on a string separator using the regular expressions library: