
User:Thundergnat: Difference between revisions

→‎Language Stats:: update stats table
(→‎Language Stats:: an apology)
(→‎Language Stats:: update stats table)
Line 38:
Note that this is purely for my own entertainment and curiosity. I strenuously want to avoid people making efforts to optimize for one metric or another. Many of these are heavily influenced by a particular stylistic choice made by a prolific entry author. None of these are good or bad. Just observations.
 
Several things influence the different metrics, custom syntax highlighting for instance. Only one language does this in any meaningful way right now, but it is '''all in'''. The markup isn't filtered out though, so it skews the numbers to almost complete uselessness.
 
Comments aren't filtered out either, so entries that have lots of comments will have higher character counts. That isn't a bad thing though. Quite the opposite in fact. The whole point of Rosetta Code is to learn how different languages can accomplish the same task. Lots of comments is a good thing.
 
How the various metrics are calculated / what they mean. Each average is the sum of that item across every entry, divided by the total number of tasks with an entry, rounded to the nearest integer or percentage.
Line 51 ⟶ 49:
* Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc)
* Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
* Syntax highlighting - Which syntax highlighters does it use, in order from most to least common? (Note that some variation is expected, especially for tasks like [[Call a foreign-language function]], [[Rosetta Code/Find bare lang tags]] or, probably the largest source of oddball <nowiki><lang *></nowiki> tags, [[Rosetta Code/Fix_code_tags]].)
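
The definitions above could be sketched roughly as follows. This is only an illustration, not the script that produced the table: the function names, the choice to count whitespace as a share of all characters, the use of Python's Unicode-aware <code>isalnum</code>, and averaging per-entry percentages (rather than pooling the characters first) are all assumptions.

```python
from collections import Counter


def entry_metrics(text):
    """Per-entry counts for one task entry (hypothetical helper).

    Comments and markup are not filtered out. Percentages other than
    white space are taken over the non-white-space characters only.
    """
    non_ws = [ch for ch in text if not ch.isspace()]
    total = len(text)
    n = len(non_ws)
    # Assumption: "alphabetic or numeric" follows str.isalnum (Unicode-aware).
    alnum = sum(ch.isalnum() for ch in non_ws)
    non_ascii = sum(ord(ch) > 127 for ch in non_ws)
    return {
        "lines": len(text.splitlines()),
        "chars": total,
        # Assumption: white space is measured against all characters.
        "pct_whitespace": 100 * (total - n) / total if total else 0.0,
        "pct_alnum": 100 * alnum / n if n else 0.0,
        "pct_non_alnum": 100 * (n - alnum) / n if n else 0.0,
        "pct_non_ascii": 100 * non_ascii / n if n else 0.0,
    }


def language_averages(entries):
    """Average the per-entry metrics for one language.

    entries: list of (entry_text, highlighter_tag) pairs, one per task
    that has an entry in the language.
    """
    if not entries:
        raise ValueError("need at least one entry")
    per_entry = [entry_metrics(text) for text, _ in entries]
    count = len(per_entry)
    averages = {"task_count": count}
    for key in per_entry[0]:
        mean = sum(m[key] for m in per_entry) / count
        # Counts are rounded to integers, percentages to two decimals.
        averages[key] = round(mean) if key in ("lines", "chars") else round(mean, 2)
    # Highlighter tags ordered from most to least common; case is kept,
    # so "perl6" and "Perl6" are tallied separately.
    tags = Counter(tag for _, tag in entries)
    averages["highlighting"] = [tag for tag, _ in tags.most_common()]
    return averages
```

Because the highlighter tags are tallied exactly as written, spelling and case variants show up as separate items in the last column.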
 
{|class="wikitable sortable"
|+ As of 2022-02-13
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax <br>highlighting
|-
||Phix||1507||53||1695||32.69 %||78.35 %||21.65 %||0.03 %||Phix html5 csharp c r PL/I Phi phix csharp javascript
|-
||Wren||1499||55||1578||31.56 %||75.73 %||24.27 %||0.1 %||ecmascript c go javascript C foo bashbar xml pythonbash barpython baz
|-
||Julia||1479||33||976||27.82 %||77.1 %||22.9 %||0.27 %||julia Julia python ruby html5 pythonlua Julianhtml cpp html xml lb lua k
|-
||Raku||1461||30||885||28.87 %||69.31 %||30.69 %||0.57 %||perl6 c C xmlc shell xml html XML bash
|-
||Go||1455||77||1821||34.42 %||75.27 %||24.73 %||0.11 %||go Go unicon html bash xml c IWBASICxml thrift proto Unicon html5
|-
||Perl||1428||34||920||27.04 %||68.18 %||31.82 %||0.1 %||perl Perl bash perl6 Shell chtml5 latex html5 pythonc
|-
||Nim||1398||48||1326||24.65 %||75.52 %||24.48 %||1.12 %||Nim nim python Python ruby c C cNimrod $1 xml Nimrod
|-
||Python||1343||71||1996||30.72 %||75.47 %||24.53 %||0.04 %||python Python bash html5 cmd c cmd rubypython3 xml ebnf haskell python3 Shell
|-
||C||1193||78||1841||28.64 %||72.56 %||27.44 %||0.01 %||c C cpp bash XML csharpgo html5 shShell perl dhtml5 makesh Assembly gomake Shelld
|-
||REXX||1144||55||3358||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx ring rwxx resscobol sh cobol
|-
||Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c java HTML5 javahtml5 schemeGroovy groovy html5scheme xml Groovy
|-
||Java||1124||66||2062||33.41 %||78.56 %||21.44 %||0.01 %||java java5 Java Java5 c xml bash html5 xml foo make javacmd 12Java8 bar cmdjava shell12 Java8shell
|-
||Haskell||1110||44||1347||28.34 %||76.92 %||23.08 %||0.08 %||haskell Haskell uniconbash Unicontext Iconhaksell icon bashc html5 C sh haksell text c
|-
||J||1109||26||745||26.82 %||69.49 %||30.51 %||6.45 %||Mathematicaj mathematicaJ bash sh c foo wolfram%s barfxml barhtml5 bazmake "~~bnf x~~"C shbar MathemticaSNUSP baz
|-
||Mathematica||1106||11||409||17.52 %||70.72 %||29.28 %||0.04 %||jMathematica J bash shmathematica foo c %s make bnf baz C"~~ SNUSPx~~" bar xmlbarf html5sh
|-
||Racket||1090||33||1117||26.47 %||75.05 %||24.95 %||0.29 %||racket Racket scheme bash xml html5 rexx "Racket"C cmd bash C
|-
||Ruby||1087||32||824||26.23 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash html5 c foo html5 rust baz tcl bar rust
|-
||C++||1070||73||1990||29.23 %||72.68 %||27.32 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 textmake asm cmake text bash d make C asm
|-
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash r html5 csharp zkl"html5
|}
 
 
Second pass through after a lot of minor patches to the site and custom syntax highlighting filtering added. Should be a lot <strike>more accurate</strike> less inaccurate now. Numbers probably still don't ''mean'' anything, but they aren't quite such outright whoppers. J numbers are still way overstated due to the very common decision to just include the output inside the language tags rather than in a separate output section. Not really sure what, if anything, to do about it. I '''''really''''' don't want to take on trying to untangle that mess. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 21:29, 13 February 2022 (UTC)
 
 
<div style="padding:1em;background:#eeeeff;"><h3>Older commentary</h3>
Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it.
I expected Raku to have a higher percent of non-ASCII characters and was very surprised by J and REXX having so much. On closer investigation, J and REXX entries make heavy use of box line drawing characters... which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter but spelling nominally does. Though, to be fair, most of the syntax highlighting tags are very minor variations, so getting it wrong probably doesn't change much. There are a whole bunch of obvious typos in there too though. Sigh. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]), 08 February
 
:Sorry about the Phix syntax highlighting mess. I'd love to use standard Geshi, and in fact do on my own site, but waited 6 years and nothing happened. In a lovely world there would be a special page on RC containing the geshi highlighting files for all languages, that anyone could edit in the usual way with the usual single-click to undo any vandalism, and periodically (bi-annual would be plenty) some admin pushes updates into the geshi dir. I guess you could even have a <nowiki><lang PhixRC></nowiki> mechanism whereby "release candidate" geshi updates could be tried out on selected pages without risking damaging the whole site. Alternatively it should actually be fairly straightforward for a man of your talents to html-strip the Phix entries (just sayin), and/or I'm not above being tasked to go and clean up my own mess, which one day I still hope to be able to do (trust me, I too despise my own minor updates showing up as complete gobbledygook). I also wonder if it would be at all useful for the syntax highlighting column to contain (first/random) links to offending pages, so I could find out what page it thinks is using say "PL/1" or "Phi"? --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 10:04, 10 February 2022 (UTC)</div>