Tokenize a string: Difference between revisions

Revision as of 14:45, 14 February 2024

4,905 bytes added, 3 months ago

Revision as of 15:16, 26 April 2023 (view source) Lanky79 (talk | contribs) (→‎{{header|EMal}})

Latest revision as of 14:45, 14 February 2024 (view source) PureFox (talk | contribs) m (→‎{{header|Wren}}: Changed to Wren S/H)
(15 intermediate revisions by 9 users not shown)
Line 1,003:
NEXT PRINT</syntaxhighlight>

==={{header|Chipmunk Basic}}===
Solutions [[#Applesoft BASIC|Applesoft BASIC]] and [[#Commodore BASIC|Commodore BASIC]] work without changes.

==={{header|Commodore BASIC}}===
Based on the AppleSoft BASIC version.
<syntaxhighlight lang="commodorebasic">10 REM TOKENIZE A STRING ... ROSETTACODE.ORG
~~10 REM TOKENIZE A STRING ... ROSETTACODE.ORG~~
20 T$ = "HELLO,HOW,ARE,YOU,TODAY"
30 GOSUB 200, TOKENIZE

Line 1,023 ⟶ 1,025:
260 N = N + 1
270 NEXT L
280 RETURN</syntaxhighlight>
~~</syntaxhighlight>~~

==={{header|FreeBASIC}}===
<syntaxhighlight lang="freebasic">sub tokenize( instring as string, tokens() as string, sep as string )

Line 1,058 ⟶ 1,060:
Print Left$(array$, (Len(array$) - 1))</syntaxhighlight>

==={{header|~~PowerBASIC~~MSX Basic}}===
The [[#Commodore BASIC|Commodore BASIC]] solution works without any changes.

==={{header|PowerBASIC}}===
PowerBASIC has a few keywords that make parsing strings trivial: <code>PARSE</code>, <code>PARSE$</code>, and <code>PARSECOUNT</code>. (<code>PARSE$</code>, not shown here, is for extracting tokens one at a time, while <code>PARSE</code> extracts all tokens at once into an array. <code>PARSECOUNT</code> returns the number of tokens found.)

Line 1,195 ⟶ 1,199:
=={{header|BQN}}==
Uses a splitting idiom from bqncrate.
<syntaxhighlight lang="bqn">Split ← ~~Split←~~(~~(⊢-˜~~+`×¬)⊸-∘=~~⊔⊢)~~ ⊔ ⊢ ~~(⊢-˜+`×¬)∘=⊔⊢~~
∾⟜'.'⊸∾´ ',' Split "Hello,How,Are,You,Today"</syntaxhighlight>
{{out}}
~~"Hello.How.Are.You.Today"</syntaxhighlight>~~
<pre>"Hello.How.Are.You.Today"</pre>

=={{header|Bracmat}}==

Line 1,657 ⟶ 1,662:
</syntaxhighlight>

=={{header|~~Dyalect~~dt}}==
<syntaxhighlight lang="dt">"Hello,How,Are,You,Today" "," split "."
join pl</syntaxhighlight>

=={{header|Dyalect}}==
<syntaxhighlight lang="dyalect">var str = "Hello,How,Are,You,Today"
var strings = str.Split(',')
print(values: strings, separator: ".")</syntaxhighlight>
{{out}}
<pre>Hello.How.Are.You.Today</pre>

Line 1,674 ⟶ 1,679:
=={{header|E}}==
<syntaxhighlight lang="e">".".rjoin("Hello,How,Are,You,Today".split(","))</syntaxhighlight>

=={{header|EasyLang}}==
<syntaxhighlight lang="easylang">
s$ = "Hello,How,Are,You,Today"
a$[] = strsplit s$ ","
for s$ in a$[]
  write s$ & "."
.
</syntaxhighlight>

=={{header|Elena}}==
ELENA 46.x:
<syntaxhighlight lang="elena">import system'routines;
import extensions;

Line 1,682 ⟶ 1,696:
public program()
{
    ~~var~~auto string := "Hello,How,Are,You,Today";
    string.splitBy:(",").forEach::(s)
    {
        console.print(s,".")

Line 2,088 ⟶ 2,102:
=={{header|JavaScript}}==
<syntaxhighlight lang="javascript">console.log(
~~{{works with|Firefox|2.0}}~~
    "Hello,How,Are,You,Today"
        .split(",")
        .join(".")
);</syntaxhighlight>A more advanced program to tokenise strings:<syntaxhighlight lang="javascript" line="1">
const Tokeniser = (function () {
    // No "g" flag here: a global regex keeps lastIndex between test() calls,
    // which would make repeated isNumber() checks unreliable.
    const numberRegex = /-?(\d+\.\d+|\d+\.|\.\d+|\d+)((e|E)(\+|-)?\d+)?/;
    return {
        settings: {
            operators: ["<", ">", "=", "+", "-", "*", "/", "?", "!"],
            separators: [",", ".", ";", ":", " ", "\t", "\n"],
            groupers: ["(", ")", "[", "]", "{", "}", '"', '"', "'", "'"],
            keepWhiteSpacesAsTokens: false,
            trimTokens: true
        },
        isNumber: function (value) {
            if (typeof value === "number") {
                return true;
            } else if (typeof value === "string") {
                return numberRegex.test(value);
            }
            return false;
        },
        closeGrouper: function (grouper) {
            if (this.settings.groupers.includes(grouper)) {
                return this.settings.groupers[this.settings.groupers.indexOf(grouper) + 1];
            }
            return null;
        },
        tokenType: function (char) {
            if (this.settings.operators.includes(char)) {
                return "operator";
            } else if (this.settings.separators.includes(char)) {
                return "separator";
            } else if (this.settings.groupers.includes(char)) {
                return "grouper";
            }
            return "other";
        },
        parseString: function (str) {
            if (typeof str !== "string") {
                if (str === null) {
                    return "null";
                }
                if (typeof str === "object") {
                    str = JSON.stringify(str);
                } else {
                    str = str.toString();
                }
            }
            let tokens = [], _tempToken = "";
            for (let i = 0; i < str.length; i++) {
                if (this.tokenType(_tempToken) !== this.tokenType(str[i]) || this.tokenType(str[i]) === "separator") {
                    if (_tempToken.trim() !== "") {
                        tokens.push(this.settings.trimTokens ? _tempToken.trim() : _tempToken);
                    } else if (this.settings.keepWhiteSpacesAsTokens) {
                        tokens.push(_tempToken);
                    }
                    _tempToken = str[i];
                    if (this.tokenType(_tempToken) === "separator") {
                        if (_tempToken.trim() !== "") {
                            tokens.push(this.settings.trimTokens ? _tempToken.trim() : _tempToken);
                        } else if (this.settings.keepWhiteSpacesAsTokens) {
                            tokens.push(_tempToken);
                        }
                        _tempToken = "";
                    }
                } else {
                    _tempToken += str[i];
                }
            }
            if (_tempToken.trim() !== "") {
                tokens.push(this.settings.trimTokens ? _tempToken.trim() : _tempToken);
            } else if (this.settings.keepWhiteSpacesAsTokens) {
                tokens.push(_tempToken);
            }
            return tokens.filter((token) => token !== "");
        }
    };
})();
</syntaxhighlight>Output:<syntaxhighlight lang="javascript">
Tokeniser.parseString("Hello,How,Are,You,Today");
// -> ['Hello', ',', 'How', ',', 'Are', ',', 'You', ',', 'Today']
~~<syntaxhighlight lang="javascript">alert( "Hello,How,Are,You,Today".split(",").join(".") );</syntaxhighlight>~~
</syntaxhighlight>

=={{header|jq}}==

Line 2,130 ⟶ 2,225:
"Hello.How.Are.You.Today"
</pre>

{{works with|ngn/k}}<syntaxhighlight lang=K>","\"Hello,How,Are,You,Today"
("Hello" "How" "Are" "You" "Today")</syntaxhighlight>

=={{header|Klingphix}}==

Line 2,397 ⟶ 2,499:
<syntaxhighlight lang="maxima">l: split("Hello,How,Are,You,Today", ",")$
printf(true, "~{~a~^.~}~%", l)$</syntaxhighlight>

A slightly different way
<syntaxhighlight lang="maxima">
split("Hello,How,Are,You,Today",",")$
simplode(%,".");
</syntaxhighlight>
{{out}}
<pre>
"Hello.How.Are.You.Today"
</pre>
=={{header|MAXScript}}==

Line 2,605 ⟶ 2,717:
+-----+---+---+---+-----+
u:=front content (cart t `.)
Hello.How.Are.You.Today</syntaxhighlight>

Or as a one-liner:
<syntaxhighlight lang="nial">
front content (cart (s eachall = `, cut s) `.)
</syntaxhighlight>

=={{header|Nim}}==

Line 3,359 ⟶ 3,478:
see substr("Hello,How,Are,You,Today", ",", ".")
</syntaxhighlight>

=={{header|RPL}}==
The program below fully complies with the task requirements, i.e. the input string is converted to a list of words, then the list is converted to a string.
{{works with|Halcyon Calc|4.2.8}}
{| class="wikitable"
! RPL code
! Comment
|-
|
≪ "}" + "{" SWAP + STR→ 1 OVER SIZE '''FOR''' j DUP j GET →STR 2 OVER SIZE 1 - SUB j SWAP PUT '''NEXT''' "" 1 3 PICK SIZE '''FOR''' j OVER j GET + '''IF''' OVER SIZE j ≠ '''THEN''' "." + '''END''' '''NEXT''' SWAP DROP ≫ '<span style="color:blue">'''TOKNZ'''</span>' STO
|
<span style="color:blue">'''TOKNZ'''</span> ''<span style="color:grey">( "word,word" → "word.word" )</span> '' convert string into list (words being between quotes) loop for each list item convert it to a string, remove quotes at beginning and end loop for each list item add item to output string if not last item, append "." clean stack return output string
|}
"Hello,How,Are,You,Today" <span style="color:blue">'''TOKNZ'''</span>
</pre>
'''Output:'''
<span style="color:grey"> 1:</span> "Hello.How.Are.You.Today"
If direct string-to-string conversion is allowed, then this one-liner for HP-48+ will do the job:
≪ 1 OVER SIZE '''FOR''' j '''IF''' DUP j DUP SUB "," == '''THEN''' j "." REPL '''END NEXT''' ≫ '<span style="color:blue">'''TOKNZ'''</span>' STO

=={{header|Ruby}}==

Line 3,859 ⟶ 4,016:
=={{header|Wren}}==
<syntaxhighlight lang="~~ecmascript~~wren">var s = "Hello,How,Are,You,Today"
var t = s.split(",").join(".") + "."
System.print(t)</syntaxhighlight>
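
The entries in this diff differ in one small detail: the Wren and EasyLang versions emit a period after the last word, while the short JavaScript and E versions do not. The sketch below shows both behaviours side by side in JavaScript. The function name <code>retokenize</code> and its parameters are hypothetical, introduced only for illustration, and are not part of any entry on the page.

```javascript
// Hypothetical helper (an assumption for illustration, not taken from any entry):
// split `text` on `inputSeparator`, then rejoin the tokens with `outputSeparator`.
// When `trailing` is true, one extra separator is appended after the last token,
// mirroring what the Wren entry does with `+ "."`.
function retokenize(text, inputSeparator = ",", outputSeparator = ".", trailing = false) {
    const tokens = text.split(inputSeparator);   // array of words
    const joined = tokens.join(outputSeparator); // words separated by the new separator
    return trailing ? joined + outputSeparator : joined;
}

console.log(retokenize("Hello,How,Are,You,Today"));
console.log(retokenize("Hello,How,Are,You,Today", ",", ".", true));
```

Keeping the trailing separator behind a flag means the same helper can reproduce either style of output.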