Tokenize a string: Difference between revisions

(adding gap)
Line 555:
{{works with|Java|1.0+}}


There are multiple ways to tokenize a String in Java.


The first is by splitting the String into an array of Strings with the split method. The separator is a regular expression, which allows very flexible matching, but any character that has a special meaning in a regex (such as "." or "|") must be escaped.

{{works with|Java|1.4+}}
<lang java5>String toTokenize = "Hello,How,Are,You,Today";

// Each single comma is a separator, so consecutive commas yield empty tokens
String[] words = toTokenize.split(",");
// Use toTokenize.split(",+") instead to treat a run of commas as one separator
for (int i = 0; i < words.length; i++) {
    System.out.print(words[i] + ".");
}</lang>
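Because the separator is a regular expression, a delimiter such as "." behaves unexpectedly unless it is escaped. The following is a minimal sketch of that pitfall; the class name `SplitEscapeDemo` and the helper `splitLiteral` are illustrative names, not part of the original example, and `Pattern.quote` requires Java 1.5 or later.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitEscapeDemo {
    // Hypothetical helper: quotes the delimiter so regex metacharacters are matched literally.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String dotted = "Hello.How.Are.You.Today";
        // An unescaped "." matches every character, so only empty strings remain
        // and split() discards them all.
        System.out.println(dotted.split(".").length); // prints 0
        // Escaping by hand or via Pattern.quote treats the dot as a plain character.
        System.out.println(Arrays.toString(dotted.split("\\.")));
        System.out.println(Arrays.toString(splitLiteral(dotted, ".")));
    }
}
```

Quoting the delimiter is usually the safer choice when the separator comes from user input or configuration rather than from a literal in the code.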

The other way is to use StringTokenizer, which skips empty tokens: if two commas appear in a row, the array returned by split contains an empty string, but StringTokenizer produces no empty token. This approach takes more code, but it lets you retrieve tokens incrementally instead of all at once.

{{works with|Java|1.0+}}
<lang java5>// Requires: import java.util.StringTokenizer;
String toTokenize = "Hello,How,Are,You,Today";

StringTokenizer tokenizer = new StringTokenizer(toTokenize, ",");
// Tokens are produced one at a time; empty tokens are skipped
while (tokenizer.hasMoreTokens()) {
    System.out.print(tokenizer.nextToken() + ".");
}</lang>
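The difference in how the two approaches treat empty tokens can be seen by feeding both the same input with a doubled comma. This is only a sketch: the class name `EmptyTokenDemo`, the helper `tokenize`, and the sample input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class EmptyTokenDemo {
    // Hypothetical helper: collects StringTokenizer output into a list
    // so it can be compared with the array returned by split().
    static List<String> tokenize(String text, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Hello,,How,Are"; // note the doubled comma
        System.out.println(input.split(",").length);  // prints 4 (one token is empty)
        System.out.println(input.split(",+").length); // prints 3
        System.out.println(tokenize(input, ","));     // prints [Hello, How, Are]
    }
}
```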