Tokenize a string: Difference between revisions
Revision as of 04:08, 15 February 2007
You are encouraged to solve this task according to the task description, using any language you may know.
Separate the string "Hello,How,Are,You,Today" by commas into an array so that each index of the array stores a different word. Display the words to the 'user', in the simplest manner possible, separated by a period. To simplify, you may display a trailing period.
Ada
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
with Ada.Text_Io; use Ada.Text_Io;

procedure Parse_Commas is
   Source_String : String := "Hello,How,Are,You,Today";
   Index_List : array(1..256) of Natural;
   Next_Index : Natural := 1;
begin
   Index_List(Next_Index) := 1;
   while Index_List(Next_Index) < Source_String'Last loop
      Next_Index := Next_Index + 1;
      Index_List(Next_Index) := 1 + Index(Source_String(Index_List(Next_Index - 1)..Source_String'Last), ",");
      if Index_List(Next_Index) = 1 then
         Index_List(Next_Index) := Source_String'Last + 2;
      end if;
      Put(Source_String(Index_List(Next_Index - 1)..Index_List(Next_Index)-2) & ".");
   end loop;
end Parse_Commas;
C++
Standard: ANSI C++
Compiler: GCC g++ (GCC) 3.4.4 (cygming special)
Library: STL
This is not the most efficient method, as it involves redundant copies of the token strings in the background, but it is very easy to use. In most cases it will be a good choice, as long as it is not used in an inner loop of a performance-critical system.
Note the Doxygen tags in the comments before the function, which describe the details of its interface.
#include <string>
#include <vector>

/// \brief convert input string into vector of string tokens
///
/// \note consecutive delimiters will be treated as single delimiter
/// \note delimiters are _not_ included in return data
///
/// \param str string to be parsed
/// \param delims list of delimiters.
std::vector<std::string> tokenize_str(const std::string & str,
                                      const std::string & delims=", \t")
{
  using namespace std;
  // Skip delims at beginning.
  string::size_type lastPos = str.find_first_not_of(delims, 0);
  // Find the first delimiter after the start of the token.
  string::size_type pos = str.find_first_of(delims, lastPos);

  // output vector
  vector<string> tokens;

  while (string::npos != pos || string::npos != lastPos)
  {
    // Found a token, add it to the vector.
    tokens.push_back(str.substr(lastPos, pos - lastPos));
    // Skip delims. Note the "not_of"
    lastPos = str.find_first_not_of(delims, pos);
    // Find the next delimiter after the start of the new token.
    pos = str.find_first_of(delims, lastPos);
  }

  return tokens;
}
Here is sample usage code, assuming it is placed in the same file after tokenize_str:
#include <iostream>

int main()
{
  using namespace std;
  string s("Hello,How,Are,You,Today");
  vector<string> v(tokenize_str(s));

  // Print each token followed by a period.
  for (unsigned i = 0; i < v.size(); i++)
    cout << v[i] << ".";

  cout << endl;
  return 0;
}
Java
Compiler: JDK 1.0 and up for StringTokenizer; JDK 1.4 and up for String.split
There are multiple ways to tokenize a string in Java. The first uses String.split to split the string into an array of Strings; the second uses a StringTokenizer, which hands out the tokens one at a time as an Enumeration. The second way skips any empty token: if two commas appear in a row, the array returned by split contains an empty string, but the StringTokenizer object produces no empty string.
import java.util.StringTokenizer;

public class Tokenize {
    public static void main(String[] args) {
        String toTokenize = "Hello,How,Are,You,Today";

        // First way
        String word[] = toTokenize.split(",");
        for (int i = 0; i < word.length; i++) {
            System.out.print(word[i] + ".");
        }

        // Second way
        StringTokenizer tokenizer = new StringTokenizer(toTokenize, ",");
        while (tokenizer.hasMoreTokens()) {
            System.out.print(tokenizer.nextToken() + ".");
        }
    }
}
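The difference in empty-token handling described above can be checked with a short sketch. The class name EmptyTokenDemo, the helper methods viaSplit and viaTokenizer, and the input "Hello,,How" are hypothetical names chosen for illustration, and the code assumes Java 7 or later for the diamond operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class EmptyTokenDemo {
    // Hypothetical helper: tokens from String.split.
    // An empty string between two adjacent commas is kept.
    static List<String> viaSplit(String input) {
        return Arrays.asList(input.split(","));
    }

    // Hypothetical helper: tokens from StringTokenizer.
    // Adjacent commas never produce an empty token.
    static List<String> viaTokenizer(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, ",");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "Hello,,How"; // illustrative input with two adjacent commas
        System.out.println(viaSplit(input).size());     // 3 tokens, one of them empty
        System.out.println(viaTokenizer(input).size()); // 2 tokens
    }
}
```

Note that split also drops trailing empty strings by default, so the two approaches only differ for empty tokens at the start or in the middle of the input.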
JavaScript
Interpreter: Firefox 2.0
var str = "Hello,How,Are,You,Today";
var tokens = str.split(",");
alert( tokens.join(".") );
Perl
Interpreter: Perl any 5.X
As a one-liner without a trailing period. This is the most compact way of doing it, as you don't have to define an array:
print join('.', split(/,/, "Hello,How,Are,You,Today"));
If you need to keep an array for later use (again, no trailing period):
my @words = split(/,/, "Hello,How,Are,You,Today");
print join('.', @words);
If you really want a trailing period, here is an example:
my @words = split(/,/, "Hello,How,Are,You,Today");
print $_.'.' for (@words);
Python
Interpreter: Python 2.5
words = "Hello,How,Are,You,Today".split(',')
for word in words:
    print word
This prints each word on its own line. If we want to follow the task specification strictly, we join the array elements with a dot, then print the resulting string:
print '.'.join("Hello,How,Are,You,Today".split(','))
Or, to print each word on its own line with a single print statement, join with '\n' instead of '.':
print "\n".join("Hello,How,Are,You,Today".split(','))
Ruby
words = "Hello,How,Are,You,Today".split(',')
words.each do |w|
  print "#{w}."
end
Tcl
Generating a list from a string by splitting on a comma:
split string ,
Joining the elements of a list by a period:
join list .
Thus the whole thing would look like this:
puts [join [split "Hello,How,Are,You,Today" ,] .]
If you'd like to retain the list in a variable with the name "words", it would only be marginally more complex:
puts [join [set words [split "Hello,How,Are,You,Today" ,]] .]