Talk:Compiler/lexical analyzer

 
: Thanks! :) I tried to be clever and dynamically construct a single regex (with one branch per token) to act as the scanner, since it's safe to assume that the Perl regex engine is more bug-free and better optimized than a <code>substr</code>-based scanner that I could have written by hand. But then I realized that there's no easy way to get the line and column number of a regex match, so I had to scan and accumulate those separately, which introduced overhead again. I wonder if the approach was still worth it, performance-wise. Not that a solution in an interpreted language like Perl could ever compete with the C solution, but it might be interesting to benchmark it against the Python solution for large input files... --[[User:Smls|Smls]] ([[User talk:Smls|talk]]) 17:06, 18 August 2016 (UTC)
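: For readers following along, here is a minimal Python sketch of the approach described above: one combined regex with a named branch per token, plus line and column counters accumulated separately from the matching. It is not the Perl solution itself. The token names and patterns in <code>TOKEN_SPEC</code> are a small hypothetical subset chosen for illustration (keywords, for instance, are simply scanned as identifiers).

```python
import re

# Hypothetical, deliberately incomplete token set (illustration only).
# Order matters: "<=" must be tried before "<".
TOKEN_SPEC = [
    ("Comment",      r"/\*.*?\*/"),
    ("Integer",      r"\d+"),
    ("Identifier",   r"[A-Za-z_]\w*"),
    ("Op_lessequal", r"<="),
    ("Op_less",      r"<"),
    ("Op_assign",    r"="),
    ("Punct",        r"[(){};,]"),
    ("Whitespace",   r"\s+"),
]

# One alternation branch per token, each in its own named group.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # let a comment span several lines
)


def tokenize(text):
    """Yield (line, column, token_name, lexeme) tuples, 1-based positions."""
    line, col, pos = 1, 1, 0
    while pos < len(text):
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected {text[pos]!r} at {line}:{col}")
        lexeme = m.group()
        if m.lastgroup not in ("Whitespace", "Comment"):
            yield line, col, m.lastgroup, lexeme
        # The regex engine reports only offsets, so line and column are
        # accumulated here from the text that was just consumed.
        newlines = lexeme.count("\n")
        if newlines:
            line += newlines
            col = len(lexeme) - lexeme.rfind("\n")
        else:
            col += len(lexeme)
        pos = m.end()
```

: The extra bookkeeping after each match is the overhead mentioned above: every lexeme, including whitespace and comments, is examined a second time to count newlines.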
 
== Simple benchmark ==
 
I ran some simple benchmarks, using a source file consisting of the following two programs, repeated over and over, until I got to 248,880 lines.
 
<lang c>
count = 1;
n = 1;
limit = 100;
while (n < limit) {
    k=3;
    p=1;
    n=n+2;
    while ((k*k<=n) && (p)) {
        p=n/k*k!=n;
        k=k+2;
    }
    if (p) {
        print(n, " ");
        count = count + 1;
    }
}
print(count, "\n");

{
/*
 This is an integer ascii Mandelbrot generator
 */
    left_edge   = -420;
    right_edge  =  300;
    top_edge    =  300;
    bottom_edge = -300;
    x_step      =    7;
    y_step      =   15;

    max_iter    =  200;

    y0 = top_edge;
    while (y0 > bottom_edge) {
        x0 = left_edge;
        while (x0 < right_edge) {
            y = 0;
            x = 0;
            the_char = ' ';
            i = 0;
            while (i < max_iter) {
                x_x = (x * x) / 200;
                y_y = (y * y) / 200;
                if (x_x + y_y > 800 ) {
                    the_char = '0' + i;
                    if (i > 9) {
                        the_char = '@';
                    }
                    i = max_iter;
                }
                y = x * y / 100 + y0;
                x = x_x - y_y + x0;
                i = i + 1;
            }
            putc(the_char);
            x0 = x0 + x_step;
        }
        putc('\n');
        y0 = y0 - y_step;
    }
}
</lang>
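
An input file of this kind can be produced with a few lines of Python. The sketch below is only one way to do it, not necessarily how the original file was made; the helper name <code>repeat_to_line_count</code> is hypothetical. As a sanity check, the listing as shown here is 60 lines long and 248,880 = 60 × 4,148, which would be consistent with whole repetitions of the listing.

```python
def repeat_to_line_count(block: str, target_lines: int) -> str:
    """Repeat the lines of `block` until exactly `target_lines` lines exist.

    The last repetition is truncated if the target is not a multiple of
    the block length (an assumption; the original post does not say).
    """
    lines = block.splitlines()
    if not lines:
        raise ValueError("block must contain at least one line")
    out = []
    while len(out) < target_lines:
        out.extend(lines)
    return "\n".join(out[:target_lines]) + "\n"
```

For example, reading the listing from a file and writing <code>repeat_to_line_count(listing, 248880)</code> to <code>big.t</code> would give a file of the stated length.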
 
I ran them as follows:
 
timer python lex.py big.t >foo.bar
 
So the startup time for Python, Perl, and Euphoria is also included in the timings.
 
All the output files were 1,101,601 lines in length.
 
I ran each test 3 times, and took the shortest run.
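
The same "best of three" measurement can be sketched in Python for anyone without a <code>timer</code> utility. This is an assumption-laden stand-in, not the tool used above: the function name <code>best_of</code> is hypothetical, and output is discarded rather than redirected to a file, so the numbers will not be directly comparable with the table below.

```python
import subprocess
import sys
import time


def best_of(cmd, runs=3):
    """Run `cmd` `runs` times and return the shortest wall-clock time in seconds.

    Interpreter startup is part of each measurement, as in the runs above.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is thrown away here; the original runs wrote it to a file.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python best_of.py python lex.py big.t
    print(f"{best_of(sys.argv[1:]):.2f}")
```

Taking the minimum rather than the mean reduces the influence of one-off disturbances such as a cold file cache on the first run.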
 
Here are the specs for my machine:
 
Windows 7, Service Pack 1, 64-bit
 
Intel Core i7-3720QM CPU @2.60GHz
 
 16.0 GB RAM (15.9 GB usable)
 
 
{| class="wikitable"
|-
! Implementation !! Time
|-
| C(1) || 1.08
|-
| Flex || 1.13
|-
| C || 1.34
|-
| Euphoria || 4.15
|-
| Perl || 8.36
|-
| Python || 9.24
|}
 
(1) In this variant of the C solution I replaced <code>getc(fp)</code> with <code>_fgetc_nolock(fp)</code> and added <code>setvbuf(fp, NULL, _IOFBF, 1024*1024)</code>.
 
To me, the Euphoria, Perl and Python times are '''very''' impressive.