Compiler/lexical analyzer: Difference between revisions

For example, the following two program fragments are equivalent, and should produce the same token stream except for the line and column positions:

* <syntaxhighlight lang="c">if ( p /* meaning n is prime */ ) {
print ( n , " " ) ;
count = count + 1 ; /* number of primes found so far */
}</syntaxhighlight>
* <syntaxhighlight lang="c">if(p){print(n," ");count=count+1;}</syntaxhighlight>
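This equivalence can be checked mechanically. The following is a minimal, hypothetical Python sketch (not one of the task's solutions; the token classes are abbreviated for illustration): it skips whitespace and comments, classifies everything else, and confirms that both fragments above yield the same token sequence.

```python
import re

# Skipped forms (whitespace, /* ... */ comments) have no named group;
# everything else is captured as a (kind, text) token.
TOKEN = re.compile(
    r'\s+|/\*.*?\*/'            # skipped: whitespace and comments
    r'|(?P<id>[A-Za-z_]\w*)'    # identifiers and keywords
    r'|(?P<int>\d+)'            # integer literals
    r'|(?P<str>"[^"]*")'        # string literals
    r'|(?P<op>.)',              # any other single character
    re.S)

def tokens(src):
    """Return (kind, text) pairs, ignoring whitespace and comments."""
    return [(m.lastgroup, m.group())
            for m in TOKEN.finditer(src) if m.lastgroup]

a = '''if ( p /* meaning n is prime */ ) {
    print ( n , " " ) ;
    count = count + 1 ; /* number of primes found so far */
}'''
b = 'if(p){print(n," ");count=count+1;}'
assert tokens(a) == tokens(b)  # same stream, only positions differ
```

Only the token kinds and spellings are compared; as the task text says, the line and column positions are where the two fragments differ.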


;Complete list of token names
| style="vertical-align:top" |
Test Case 1:
<syntaxhighlight lang="c">/*
Hello world
*/
print("Hello, World!\n");</syntaxhighlight>


| style="vertical-align:top" |
| style="vertical-align:top" |
Test Case 2:
<syntaxhighlight lang="c">/*
Show Ident and Integers
*/
phoenix_number = 142857;
print(phoenix_number, "\n");</syntaxhighlight>


| style="vertical-align:top" |
| style="vertical-align:top" |
Test Case 3:
<syntaxhighlight lang="c">/*
All lexical tokens - not syntactically correct, but that will
have to wait until syntax analysis
/* character literal */ '\n'
/* character literal */ '\\'
/* character literal */ ' '</syntaxhighlight>


| style="vertical-align:top" |
| style="vertical-align:top" |
Test Case 4:
<syntaxhighlight lang="c">/*** test printing, embedded \n and comments with lots of '*' ***/
print(42);
print("\nHello World\nGood Bye\nok\n");
print("Print a slash n - \\n.\n");</syntaxhighlight>


| style="vertical-align:top" |


=={{header|Ada}}==
<syntaxhighlight lang="ada">with Ada.Text_IO, Ada.Streams.Stream_IO, Ada.Strings.Unbounded, Ada.Command_Line,
Ada.Exceptions;
use Ada.Strings, Ada.Strings.Unbounded, Ada.Streams, Ada.Exceptions;
when error : others => IO.Put_Line("Error: " & Exception_Message(error));
end Main;
</syntaxhighlight>
{{out}} Test case 3:
<pre>


As an addition, it emits a diagnostic if integer literals are too big.
<syntaxhighlight lang="algol68">BEGIN
# implement C-like getchar, where EOF and EOLn are "characters" (-1 and 10 resp.). #
INT eof = -1, eoln = 10;
OD;
output("End_Of_Input")
END</syntaxhighlight>


=={{header|ALGOL W}}==
<syntaxhighlight lang="algolw">begin
%lexical analyser %
% Algol W strings are limited to 256 characters in length so we limit source lines %
while nextToken not = tEnd_of_input do writeToken;
writeToken
end.</syntaxhighlight>
{{out}} Test case 3:
<pre>
(One point of note: the C "EOF" pseudo-character is detected in the following code by looking for a negative number. That EOF has to be negative and the other characters non-negative is implied by the ISO C standard.)
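That convention is easy to model outside C as well. The following Python sketch (an illustration added here, not part of the ATS entry) mimics getchar: real character codes are non-negative, so a negative value such as -1 is safely out of band for end of input.

```python
import io

EOF = -1  # out-of-band sentinel, as in C's <stdio.h>

def getchar(stream):
    """Return the next character code, or EOF (-1) at end of input."""
    ch = stream.read(1)
    return EOF if ch == '' else ord(ch)

s = io.StringIO("ab")
codes = [getchar(s), getchar(s), getchar(s)]
assert codes == [97, 98, EOF]
# Detecting end of input by sign, as described above:
assert all(c >= 0 for c in codes[:2]) and codes[2] < 0
```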


<syntaxhighlight lang="ats">(********************************************************************)
(* Usage: lex [INPUTFILE [OUTPUTFILE]]
If INPUTFILE or OUTPUTFILE is "-" or missing, then standard input
end

(********************************************************************)</syntaxhighlight>


{{out}}
=={{header|AWK}}==
Tested with gawk 4.1.1 and mawk 1.3.4.
<syntaxhighlight lang="awk">
BEGIN {
all_syms["tk_EOI" ] = "End_of_input"
}
}
</syntaxhighlight>
{{out|case=count}}
<b>
=={{header|C}}==
Tested with gcc 4.8.1 and later; compiles warning-free with -Wpedantic -pedantic -Wall -Wextra.
<syntaxhighlight lang="c">#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
run();
return 0;
}</syntaxhighlight>

{{out|case=test case 3}}
=={{header|C sharp|C#}}==
Requires C#6.0 because of the use of null coalescing operators.
<syntaxhighlight lang="csharp">
using System;
using System.IO;
}
}
</syntaxhighlight>

{{out|case=test case 3}}
=={{header|C++}}==
Tested with GCC 9.3.0 (g++ -std=c++17)
<syntaxhighlight lang="cpp">#include <charconv> // std::from_chars
#include <fstream> // file_to_string, string_to_file
#include <functional> // std::invoke
});
}
</syntaxhighlight>

{{out|case=test case 3}}
Using GnuCOBOL 2. By Steve Williams (with one change to get around a Rosetta Code code highlighter problem).

<syntaxhighlight lang="cobol"> >>SOURCE FORMAT IS FREE
*> this code is dedicated to the public domain
*> (GnuCOBOL) 2.3-dev.0
end-if
.
end program lexer.</syntaxhighlight>

{{out|case=test case 3}}
Lisp has a built-in reader and you can customize the reader by modifying its readtable. I'm also using Gray streams, an almost-standard feature of Common Lisp, for counting lines and columns.

<syntaxhighlight lang="lisp">(defpackage #:lexical-analyzer
(:use #:cl #:sb-gray)
(:export #:main))

(defun main ()
(lex *standard-input*))</syntaxhighlight>
{{out|case=test case 3}}
<pre> 5 16 KEYWORD-PRINT
{{trans|ATS}}

<syntaxhighlight lang="elixir">#!/bin/env elixir
# -*- elixir -*-

end ## module Lex

Lex.main(System.argv)</syntaxhighlight>

{{out}}




<syntaxhighlight lang="lisp">#!/usr/bin/emacs --script
;;
;; The Rosetta Code lexical analyzer in GNU Emacs Lisp.
(scan-text t))

(main)</syntaxhighlight>








<syntaxhighlight lang="erlang">#!/bin/env escript
%%%-------------------------------------------------------------------

%%% erlang-indent-level: 3
%%% end:
%%%-------------------------------------------------------------------</syntaxhighlight>




=={{header|Euphoria}}==
Tested with Euphoria 4.05.
<syntaxhighlight lang="euphoria">include std/io.e
include std/map.e
include std/types.e
end procedure

main(command_line())</syntaxhighlight>

{{out|case=test case 3}}
=={{header|Flex}}==
Tested with Flex 2.5.4.
<syntaxhighlight lang="c">%{
#include <stdio.h>
#include <stdlib.h>
} while (tok != tk_EOI);
return 0;
}</syntaxhighlight>

{{out|case=test case 3}}
=={{header|Forth}}==
Tested with Gforth 0.7.3.
<syntaxhighlight lang="forth">CREATE BUF 0 , \ single-character look-ahead buffer
CREATE COLUMN# 0 ,
CREATE LINE# 1 ,
THEN THEN ;
: TOKENIZE BEGIN CONSUME AGAIN ;
TOKENIZE</syntaxhighlight>

{{out}}


The author has placed this Fortran code in the public domain.
<syntaxhighlight lang="fortran">!!!
!!! An implementation of the Rosetta Code lexical analyzer task:
!!! https://rosettacode.org/wiki/Compiler/lexical_analyzer
end subroutine print_usage
end program lex</syntaxhighlight>

{{out}}
=={{header|FreeBASIC}}==
Tested with FreeBASIC 1.05
<syntaxhighlight lang="freebasic">enum Token_type
tk_EOI
tk_Mul
print : print "Hit any to end program"
sleep
system</syntaxhighlight>
{{out|case=test case 3}}
<b>
=={{header|Go}}==
{{trans|FreeBASIC}}
<syntaxhighlight lang="go">package main

import (
initLex()
process()
}</syntaxhighlight>

{{out}}
=={{header|Haskell}}==
Tested with GHC 8.0.2
<syntaxhighlight lang="haskell">import Control.Applicative hiding (many, some)
import Control.Monad.State.Lazy
import Control.Monad.Trans.Maybe (MaybeT, runMaybeT)
where (Just t, s') = runState (runMaybeT lexer) s
(txt, _, _) = s'
</syntaxhighlight>

{{out|case=test case 3}}
Global variables are avoided except for some constants that require initialization.

<syntaxhighlight lang="icon">#
# The Rosetta Code lexical analyzer in Icon with co-expressions. Based
# upon the ATS implementation.
procedure max(x, y)
return (if x < y then y else x)
end</syntaxhighlight>




Implementation:

<syntaxhighlight lang="j">symbols=:256#0
ch=: {{1 0+x[symbols=: x (a.i.y)} symbols}}
'T0 token' =: 0 ch '%+-!(){};,<>=!|&'
keep=. (tokens~:<,'''')*-.comments+.whitespace+.unknown*a:=values
keep&#each ((1+lines),.columns);<names,.values
}}</syntaxhighlight>


Test case 3:

<syntaxhighlight lang="j">
flex=: {{
'A B'=.y
21 28 Integer 92
22 27 Integer 32
23 1 End_of_input </syntaxhighlight>


Here, it seems expedient to retain a structured representation of the lexical result. As shown, it's straightforward to produce a "pure" textual result for a hypothetical alternative implementation of the syntax analyzer, but the structured representation will be easier to deal with.
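As a hedged illustration of that last point (the token names and layout below are invented for the sketch, not taken from the J code), rendering a structured token list into the flat textual form used by other entries is a single small step:

```python
# A structured token stream: (line, column, name, optional value).
tokens = [
    (4, 1, "Keyword_print", None),
    (4, 6, "LeftParen", None),
    (4, 7, "Integer", 42),
    (5, 1, "End_of_input", None),
]

def render(toks):
    """Flatten structured tokens into a textual line/column listing."""
    lines = []
    for line, col, name, value in toks:
        suffix = "" if value is None else " " + str(value)
        lines.append(f"{line:5} {col:7} {name}{suffix}")
    return "\n".join(lines)

out = render(tokens)
assert "Integer 42" in out
assert out.splitlines()[-1].strip().endswith("End_of_input")
```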


=={{header|Java}}==
<syntaxhighlight lang="java">
// Translated from python source

}
}
</syntaxhighlight>


=={{header|JavaScript}}==
{{incorrect|Javascript|Please show output. Code is identical to [[Compiler/syntax_analyzer]] task}}
<syntaxhighlight lang="javascript">
/*
Token: type, value, line, pos
l.printTokens()
})
</syntaxhighlight>


=={{header|Julia}}==
<syntaxhighlight lang="julia">struct Tokenized
startline::Int
startcol::Int
println(lpad(tok.startline, 3), lpad(tok.startcol, 5), lpad(tok.name, 18), " ", tok.value != nothing ? tok.value : "")
end
</syntaxhighlight>{{output}}<pre>
Line Col Name Value
5 16 Keyword_print
=={{header|kotlin}}==
{{trans|Java}}
<syntaxhighlight lang="kotlin">// Input: command line argument of file to process or console input. A two or
// three character console input of digits followed by a new line will be
// checked for an integer between zero and twenty-five to select a fixed test
System.exit(1)
} // try
} // main</syntaxhighlight>
{{out|case=test case 3: All Symbols}}
<b>


The first module is simply a table defining the names of tokens which don't have an associated value.
<syntaxhighlight lang="lua">-- module token_name (in a file "token_name.lua")
local token_name = {
['*'] = 'Op_multiply',
['putc'] = 'Keyword_putc',
}
return token_name</syntaxhighlight>


This module exports a function <i>find_token</i>, which attempts to find the next valid token from a specified position in a source line.
<syntaxhighlight lang="lua">-- module lpeg_token_finder
local M = {} -- only items added to M will be public (via 'return M' at end)
local table, concat = table, table.concat
end
return M</syntaxhighlight>
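The shape of such a <i>find_token</i> interface, i.e. match the token starting at a given position of a line and report where the next search should resume, can be sketched in Python (a loose analogue with an invented, much-reduced pattern set, not a translation of the Lua module):

```python
import re

# Ordered patterns: more specific classes first where prefixes overlap.
PATTERNS = [
    ("Integer", re.compile(r"\d+")),
    ("Identifier", re.compile(r"[A-Za-z_]\w*")),
    ("Op_assign", re.compile(r"=")),
    ("Semicolon", re.compile(r";")),
]

def find_token(line, pos):
    """Return (name, text, next_pos) for the token at pos, or None at end."""
    while pos < len(line) and line[pos].isspace():
        pos += 1
    if pos >= len(line):
        return None
    for name, pat in PATTERNS:
        m = pat.match(line, pos)
        if m:
            return name, m.group(), m.end()
    raise ValueError(f"invalid token at column {pos + 1}")

assert find_token("count = 1;", 0) == ("Identifier", "count", 5)
assert find_token("count = 1;", 5) == ("Op_assign", "=", 7)
```

Repeatedly calling it with the returned position walks the whole line, which is exactly how the lexer module below drives the finder.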


The <i>lexer</i> module uses <i>finder.find_token</i> to produce an iterator over the tokens in a source.
<syntaxhighlight lang="lua">-- module lexer
local M = {} -- only items added to M will be publicly available (via 'return M' at end)
local string, io, coroutine, yield = string, io, coroutine, coroutine.yield
-- M._INTERNALS = _ENV
return M
</syntaxhighlight>


This script uses <i>lexer.tokenize_text</i> to show the token sequence produced from a source text.

<syntaxhighlight lang="lua">lexer = require 'lexer'
format, gsub = string.format, string.gsub

-- etc.
end
</syntaxhighlight>


===Using only standard libraries===
This version replaces the <i>lpeg_token_finder</i> module of the LPeg version with this <i>basic_token_finder</i> module, altering the <i>require</i> expression near the top of the <i>lexer</i> module accordingly. Tested with Lua 5.3.5. (Note that <i>select</i> is a standard function as of Lua 5.2.)

<syntaxhighlight lang="lua">-- module basic_token_finder
local M = {} -- only items added to M will be public (via 'return M' at end)
local table, string = table, string

-- M._ENV = _ENV
return M</syntaxhighlight>


=={{header|M2000 Interpreter}}==
<syntaxhighlight lang="m2000 interpreter">
Module lexical_analyzer {
a$={/*
}
lexical_analyzer
</syntaxhighlight>

{{out}}




<syntaxhighlight lang="mercury">% -*- mercury -*-
%
% Compile with maybe something like:

:- func eof = int is det.
eof = -1.</syntaxhighlight>

{{out}}
Tested with Nim v0.19.4. Both examples are tested against all programs in [[Compiler/Sample programs]].
===Using string with regular expressions===
<syntaxhighlight lang="nim">
import re, strformat, strutils


echo input.tokenize.output
</syntaxhighlight>
===Using stream with lexer library===
<syntaxhighlight lang="nim">
import lexbase, streams
from strutils import Whitespace
echo &"({l.lineNumber},{l.getColNumber l.bufpos + 1}) {l.error}"
main()
</syntaxhighlight>


===Using nothing but system and strutils===
<syntaxhighlight lang="nim">import strutils

type
stdout.write('\n')
if token.kind == tokEnd:
break</syntaxhighlight>


=={{header|ObjectIcon}}==




<syntaxhighlight lang="objecticon"># -*- ObjectIcon -*-
#
# The Rosetta Code lexical analyzer in Object Icon. Based upon the ATS
write!([FileStream.stderr] ||| args)
exit(1)
end</syntaxhighlight>




(Much of the extra complication in the ATS comes from arrays being a linear type (whose "views" need tending), and from values of linear type having to be local to any function using them. This limitation could have been worked around, and arrays more similar to OCaml arrays could have been used, but at a cost in safety and efficiency.)

<syntaxhighlight lang="ocaml">(*------------------------------------------------------------------*)
(* The Rosetta Code lexical analyzer, in OCaml. Based on the ATS. *)


main ()

(*------------------------------------------------------------------*)</syntaxhighlight>

{{out}}
Note: for simplicity, we do not print the line and column position of each token.
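For comparison with the entries that do report positions, the omitted bookkeeping is small: the scanner keeps a current line and column and records them at the start of each token. A minimal illustrative Python sketch:

```python
def positions(src):
    """Record (char, line, column) for each non-space character."""
    line, col = 1, 1
    out = []
    for ch in src:
        if not ch.isspace():
            out.append((ch, line, col))
        if ch == "\n":
            line, col = line + 1, 1  # newline resets the column
        else:
            col += 1
    return out

assert positions("ab\n c") == [("a", 1, 1), ("b", 1, 2), ("c", 2, 2)]
```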


<syntaxhighlight lang="scheme">
(import (owl parse))

(if (null? (cdr stream))
(print 'End_of_input))))
</syntaxhighlight>


==== Testing ====

Testing function:
<syntaxhighlight lang="scheme">
(define (translate source)
(let ((stream (try-parse token-parser (str-iter source) #t)))
(if (null? (force (cdr stream)))
(print 'End_of_input))))
</syntaxhighlight>


====== Testcase 1 ======

<syntaxhighlight lang="scheme">
(translate "
/*
*/
print(\"Hello, World!\\\\n\");
")</syntaxhighlight>
{{Out}}
<pre>
====== Testcase 2 ======

<syntaxhighlight lang="scheme">
(translate "
/*
phoenix_number = 142857;
print(phoenix_number, \"\\\\n\");
")</syntaxhighlight>
{{Out}}
<pre>
====== Testcase 3 ======

<syntaxhighlight lang="scheme">
(translate "
/*
/* character literal */ '\\\\'
/* character literal */ ' '
")</syntaxhighlight>
{{Out}}
<pre>
====== Testcase 4 ======

<syntaxhighlight lang="scheme">
(translate "
/*** test printing, embedded \\\\n and comments with lots of '*' ***/
print(\"Print a slash n - \\\\\\\\n.\\\\n\");
")
</syntaxhighlight>
{{Out}}
<pre>
=={{header|Perl}}==

<syntaxhighlight lang="perl">#!/usr/bin/env perl

use strict;
($line, $col)
}
}</syntaxhighlight>

{{out|case=test case 3}}
===Alternate Perl Solution===
Tested on perl v5.26.1
<syntaxhighlight lang="perl">#!/usr/bin/perl

use strict; # lex.pl - source to tokens
1 + $` =~ tr/\n//, 1 + length $` =~ s/.*\n//sr, $^R;
}
printf "%5d %7d %s\n", 1 + tr/\n//, 1, 'End_of_input';</syntaxhighlight>


=={{header|Phix}}==
form. If required, demo\rosetta\Compiler\extra.e (below) contains some code that achieves the latter.
Code to print the human readable forms is likewise kept separate from any re-usable parts.
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\core.e
<span style="color: #008080;">return</span> <span style="color: #000000;">s</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
For running under pwa/p2js, we also have a "fake file/io" component:
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\js_io.e
<span style="color: #008080;">return</span> <span style="color: #000000;">EOF</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
The main lexer is also written to be reusable by later stages.
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\\rosetta\\Compiler\\lex.e
<span style="color: #008080;">return</span> <span style="color: #000000;">toks</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
Optional: use this if you need human-readable output/input at each (later) stage, so that you can use pipes.
<!--<syntaxhighlight lang="phix">-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\extra.e
Line 14,936: Line 14,936:
<span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">n_type</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">left</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">right</span><span style="color: #0000FF;">}</span>
<span style="color: #008080;">return</span> <span style="color: #0000FF;">{</span><span style="color: #000000;">n_type</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">left</span><span style="color: #0000FF;">,</span> <span style="color: #000000;">right</span><span style="color: #0000FF;">}</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<span style="color: #008080;">end</span> <span style="color: #008080;">function</span>
<!--</syntaxhighlight>-->
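The pipe-friendly idea is just a line-oriented serialization of the token stream: each stage writes one token per line and the next stage parses the lines back into tuples. A rough Python sketch (the exact field layout is an assumption, not taken from the Phix code):

```python
# One token per line: "line col name [value]" - a format simple enough that
# lexer, parser, and interpreter can run as separate processes joined by pipes.

def write_tokens(tokens):
    lines = []
    for ln, col, name, value in tokens:
        field = "" if value is None else " " + str(value)
        lines.append("%d %d %s%s" % (ln, col, name, field))
    return "\n".join(lines)

def read_tokens(text):
    tokens = []
    for row in text.splitlines():
        parts = row.split(None, 3)          # at most 4 fields; value may hold spaces
        ln, col, name = int(parts[0]), int(parts[1]), parts[2]
        value = parts[3] if len(parts) > 3 else None
        tokens.append((ln, col, name, value))
    return tokens
```

Round-tripping through `write_tokens`/`read_tokens` lets each stage be tested against saved text files as well as live pipes.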
Finally, a simple test driver for the specific task:
<!--<syntaxhighlight lang="phix">(phixonline)-->
<span style="color: #000080;font-style:italic;">--
-- demo\rosetta\Compiler\lex.exw
<span style="color: #000080;font-style:italic;">--main(command_line())</span>
<span style="color: #000000;">main</span><span style="color: #0000FF;">({</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #000000;">0</span><span style="color: #0000FF;">,</span><span style="color: #008000;">"test4.c"</span><span style="color: #0000FF;">})</span>
<!--</syntaxhighlight>-->
{{out}}
<pre>
=={{header|Prolog}}==
<syntaxhighlight lang="prolog">/*
Test harness for the analyzer, not needed if we are actually using the output.
*/
% anything else is an error
tok(_,_,L,P) --> { format(atom(Error), 'Invalid token at line ~d,~d', [L,P]), throw(Error) }.</syntaxhighlight>
{{out}}
<pre>
=={{header|Python}}==
Tested with Python 2.7 and 3.x
<syntaxhighlight lang="python">from __future__ import print_function
import sys
if tok == tk_EOI:
break</syntaxhighlight>
{{out|case=test case 3}}
=={{header|QB64}}==
Tested with QB64 1.5
<syntaxhighlight lang="vb">dim shared source as string, the_ch as string, tok as string, toktyp as string
dim shared line_n as integer, col_n as integer, text_p as integer, err_line as integer, err_col as integer, errors as integer
end
end sub
</syntaxhighlight>
{{out|case=test case 3}}
<b>
=={{header|Racket}}==
<syntaxhighlight lang="racket">
#lang racket
(require parser-tools/lex)
"TEST 5"
(display-tokens (string->tokens test5))
</syntaxhighlight>
=={{header|Raku}}==
{{works with|Rakudo|2016.08}}
<syntaxhighlight lang="raku" line>grammar tiny_C {
rule TOP { ^ <.whitespace>? <tokens> + % <.whitespace> <.whitespace> <eoi> }
my $tokenizer = tiny_C.parse(@*ARGS[0].IO.slurp);
parse_it( $tokenizer );</syntaxhighlight>
{{out|case=test case 3}}
<syntaxhighlight lang="ratfor">######################################################################
#
# The Rosetta Code scanner in Ratfor 77.
end
######################################################################</syntaxhighlight>
The following code implements a configurable (from a symbol map and keyword map provided as parameters) lexical analyzer.
<syntaxhighlight lang="scala">
package xyz.hyperreal.rosettacodeCompiler
}
</syntaxhighlight>
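The same configurability can be sketched in a few lines of Python: the lexer is a plain function parameterized by a keyword map and a symbol map, so the token vocabulary lives in data rather than code, and sorting symbols longest-first gives the longest-match behaviour the task requires (e.g. "<=" before "<"). Names below are illustrative, not the Scala API:

```python
# Table-driven tokenizer: keywords and multi-character symbols are data.
# Sorting symbol keys longest-first implements longest-match scanning.

def make_lexer(keywords, symbols):
    ordered = sorted(symbols, key=len, reverse=True)

    def lex(src):
        tokens, i = [], 0
        while i < len(src):
            if src[i].isspace():
                i += 1
                continue
            if src[i].isalpha():
                j = i
                while j < len(src) and src[j].isalnum():
                    j += 1
                word = src[i:j]
                tokens.append(keywords.get(word, "Identifier"))
                i = j
                continue
            for sym in ordered:                 # longest symbol wins
                if src.startswith(sym, i):
                    tokens.append(symbols[sym])
                    i += len(sym)
                    break
            else:
                raise ValueError("invalid character %r" % src[i])
        return tokens

    return lex
```

For example, `make_lexer({"if": "Keyword_if"}, {"<": "Op_less", "<=": "Op_lessequal"})` returns a lexer for which `lex("a <= b")` yields `["Identifier", "Op_lessequal", "Identifier"]`.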
=={{header|Scheme}}==

<syntaxhighlight lang="scheme">
(import (scheme base)
(scheme char)
(display-tokens (lexer (cadr (command-line))))
(display "Error: provide program filename\n"))
</syntaxhighlight>
{{out}}
<syntaxhighlight lang="sml">(*------------------------------------------------------------------*)
(* The Rosetta Code lexical analyzer, in Standard ML. Based on the ATS
and the OCaml. The intended compiler is Mlton or Poly/ML; there is
(* sml-indent-args: 2 *)
(* end: *)
(*------------------------------------------------------------------*)</syntaxhighlight>
{{libheader|Wren-fmt}}
{{libheader|Wren-ioutil}}
<syntaxhighlight lang="ecmascript">import "/dynamic" for Enum, Struct, Tuple
import "/str" for Char
import "/fmt" for Fmt
lineCount = lines.count
initLex.call()
process.call()</syntaxhighlight>
{{out}}
=={{header|Zig}}==
<syntaxhighlight lang="zig">
const std = @import("std");
return result.items;
}
</syntaxhighlight>