Talk:Tokenize a string

As for the phrase 'domain-specific language', it is applied so widely that it has little real use, and it should not be confused with a [[wp:Alphabetical_list_of_programming_languages|programming language]] in this context. --[[User:Paddy3118|Paddy3118]] 05:27, 21 May 2009 (UTC)
::Moreover, <tt>tr</tt> is surely part of the [[:Category:UnixPipes]] paradigm, and it can also be used in shell scripts as part of the almost omnipresent coreutils package (see e.g. [[Change string case#UnixPipes]]). --[[User:ShinTakezou|ShinTakezou]] 11:34, 21 May 2009 (UTC)
Revisiting this old thread: I have no problem with <tt>tr</tt>. But whether or not it is a programming language, the <tt>tr</tt> example doesn't come close to meeting the task description. The task has two steps. The first is to identify the tokens by their delimiters and place them in a structured data entity for further processing. The second is to process that structured data to produce the desired output.
 
The <tt>tr</tt> example does neither of these. It is a one-step text substitution that acts only on delimiters, not on tokens, and it operates blindly even on a token-free string. For example, the string ",,,," contains zero tokens. Step 1 should produce an empty data structure, and Step 2 should output an empty string, since there are no tokens to separate with a period. Instead, <tt>tr</tt> will blindly produce "....".
 
As a Snobol user, I'm sensitive to languages where strings are a primary datatype and often stand in for arrays or other structures. But delimiter substitution is simply not tokenization. --[[User:Snoman|Snoman]] 20:15, 23 July 2010 (UTC)
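
The distinction drawn above can be sketched in a few lines of Python. This is only an illustration, not any of the task's submitted solutions: the function names <tt>substitute</tt>, <tt>tokenize</tt> and <tt>render</tt> are hypothetical, <tt>substitute</tt> merely approximates what a <tt>tr ',' '.'</tt> call does, and <tt>tokenize</tt> assumes the reading given in this thread, under which empty fields between delimiters do not count as tokens.

```python
def substitute(text, delim=",", sep="."):
    """Blind delimiter substitution, roughly what `tr ',' '.'` does."""
    return text.replace(delim, sep)


def tokenize(text, delim=","):
    """Step 1: collect the tokens into a list.

    Assumption: empty fields are not tokens, so they are dropped.
    """
    return [tok for tok in text.split(delim) if tok]


def render(tokens, sep="."):
    """Step 2: build the output from the structured data."""
    return sep.join(tokens)


if __name__ == "__main__":
    for s in ("Hello,How,Are,You,Today", ",,,,"):
        # Show both approaches side by side for each input.
        print(repr(substitute(s)), repr(render(tokenize(s))))
```

On the task's sample string both approaches give the same text, which is why the difference is easy to miss. On ",,,," the substitution yields "....", while the two-step version yields an empty list and then an empty string.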
 
==Unix Pipes and the shell==