User:Thundergnat: Difference between revisions

;*[https://github.com/thundergnat/rc/tree/master Offsite repository of resource files for various tasks]
==Language Stats:==
As part of the weekly reports I generate (listed above), I have started to collect some statistics on languages that have a large number of entries. (For now, the cutoff is 1000.) A sufficiently large number of entries should average out outliers and give at least a ''reasonable'' overview of how different languages stack up.
Note that this is purely for my own entertainment and curiosity. I very much want to avoid people trying to optimize for one metric or another. Many of these numbers are heavily influenced by a particular stylistic choice made by a prolific entry author. None of them are good or bad. Just observations.
Several things influence the different metrics, custom syntax highlighting for instance. Only one language does this in any meaningful way right now, but it is '''all in'''. The markup isn't filtered out, though, so it skews that language's numbers to the point of being almost completely useless.
Comments aren't filtered out either, so entries that have lots of comments will have higher character counts. That isn't a bad thing, though. Quite the opposite, in fact. The whole point of Rosetta Code is to learn how different languages can accomplish the same task. Lots of comments are a good thing.
How the various metrics are calculated and what they mean: each average is the sum of that item over every entry, divided by the total number of tasks with an entry, rounded to the nearest integer or percentage.
* Task count - The number of tasks for which there is an entry in that language.
* Average lines per entry - The average number of lines in code blocks per entry. Multiple versions under the same task all count as lines per task. (If there are four versions and each has 25 lines, it counts as 100 for that task.)
* Average number of characters - Average number of characters inside <nowiki><lang *></lang></nowiki> blocks per task, including whitespace.
* Average whitespace - How many of the above characters are white space? (Including new line characters.)
* Average percent alpha-numerics - What percentage of the non-white space characters are alphabetic or numeric?
* Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric?
* Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
* Syntax highlighting - Which syntax highlighters do its entries use, in order from most to least common? (Note that some variation is expected, especially for tasks like [[Call a foreign-language function]] and [[Rosetta Code/Find bare lang tags]].)
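The definitions above can be sketched in a few lines of Python. This is only an illustration, not the script behind the reports: the function names, the regular expression, and the choice to count lines with <code>splitlines()</code> are all assumptions. It also assumes that "alphabetic or numeric" includes non-ASCII letters and digits, so the non-ASCII percentage overlaps the other two.

```python
import re

# Assumption: code lives in <lang name>...</lang> blocks, as described above.
LANG_BLOCK = re.compile(r"<lang\s+[^>]*>(.*?)</lang>", re.DOTALL | re.IGNORECASE)


def percent(part, whole):
    """Percentage of part in whole; 0.0 when whole is empty."""
    return 100.0 * part / whole if whole else 0.0


def entry_metrics(wikitext):
    """Raw metrics for one task entry; all versions under the task are pooled."""
    blocks = [m.group(1) for m in LANG_BLOCK.finditer(wikitext)]
    code = "".join(blocks)
    non_ws = [c for c in code if not c.isspace()]
    alnum = sum(c.isalnum() for c in non_ws)
    return {
        "lines": sum(len(b.splitlines()) for b in blocks),
        "chars": len(code),  # includes whitespace and newlines
        "pct_whitespace": percent(len(code) - len(non_ws), len(code)),
        "pct_alnum": percent(alnum, len(non_ws)),
        "pct_non_alnum": percent(len(non_ws) - alnum, len(non_ws)),
        "pct_non_ascii": percent(sum(ord(c) > 127 for c in non_ws), len(non_ws)),
    }


def language_averages(entries):
    """Sum each metric over every entry, then divide by the number of tasks."""
    metrics = [entry_metrics(text) for text in entries]
    return {key: sum(m[key] for m in metrics) / len(metrics) for key in metrics[0]}
```

Because comments and any highlighting markup sit inside the same blocks, a sketch like this counts them too, which matches the caveats above.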
{|class="wikitable sortable"
|+ As of 2022-02-08
|-
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %<br>Non-alphanumerics!!Avg. %<br>Non-ASCII!!Syntax highlighting
|-
|Phix||1503||55||10292||9.9 %||69.85 %||30.15 %||0 %||Phix html5 c r PL/I Phi phix csharp javascript
|-
|Wren||1493||55||1575||31.59 %||75.71 %||24.29 %||0.1 %||ecmascript c go javascript C foo bash xml python bar baz
|-
|Julia||1474||33||978||27.82 %||77.11 %||22.89 %||0.27 %||julia Julia ruby html5 python Julian cpp html xml lb lua k
|-
|Raku||1456||30||887||28.87 %||69.31 %||30.69 %||0.57 %||perl6 c C xml shell html XML bash
|-
|Go||1454||77||1829||34.4 %||75.29 %||24.71 %||0.11 %||go Go unicon html bash xml c IWBASIC thrift proto Unicon html5
|-
|Perl||1425||34||921||27.03 %||68.19 %||31.81 %||0.1 %||perl Perl bash perl6 Shell c latex html5 python
|-
|Nim||1398||49||1338||24.62 %||75.55 %||24.45 %||1.13 %||Nim nim python Python ruby C c $1 xml Nimrod
|-
|Python||1327||72||2016||30.7 %||75.48 %||24.52 %||0.04 %||python Python bash html5 c cmd ruby xml ebnf haskell python3 Shell
|-
|C||1193||78||1842||28.64 %||72.56 %||27.44 %||0.01 %||c C cpp bash XML csharp html5 sh perl d make Assembly go Shell
|-
|REXX||1143||55||3361||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx ring rwxx ress sh cobol
|-
|Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c HTML5 java scheme groovy html5 xml Groovy
|-
|Java||1123||66||2069||33.39 %||78.59 %||21.41 %||0.01 %||java java5 Java Java5 c bash html5 xml foo make java 12 bar cmd shell Java8
|-
|Haskell||1110||46||1389||28.32 %||76.91 %||23.09 %||0.08 %||haskell Haskell unicon Unicon Icon icon bash html5 C sh haksell text c
|-
|Mathematica||1106||11||409||17.52 %||70.72 %||29.28 %||0.04 %||Mathematica mathematica foo wolfram barf bar baz "~~ x~~" sh Mathemtica
|-
|J||1094||26||738||26.85 %||69.46 %||30.54 %||6.44 %||j J bash sh foo c %s make bnf baz C SNUSP bar xml html5
|-
|Racket||1090||33||1116||26.47 %||75.04 %||24.96 %||0.29 %||racket Racket scheme xml html5 rexx "Racket" cmd bash C
|-
|Ruby||1087||32||824||26.23 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash c foo html5 rust baz tcl bar
|-
|C++||1069||73||1987||29.25 %||72.66 %||27.34 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 text cmake bash d make C asm
|-
|zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash r html5 csharp zkl"
|}
Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it.
I expected Raku to have a higher percentage of non-ASCII characters and was very surprised that J and REXX have so many. On closer investigation, J and REXX entries make heavy use of box-drawing characters, which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter, but spelling nominally does. To be fair, though, most of the syntax highlighting differences are very minor variations, so getting one wrong probably doesn't change much. There are a whole bunch of obvious typos in there too, though. Sigh.
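For anyone curious how the syntax highlighting column could be tallied, here is a rough Python sketch. It is an assumption-laden illustration rather than the code behind the table: the function names and the regular expression are made up for the example. It ranks the directive names as written, and can also fold case first, since case doesn't matter to the highlighter.

```python
import re
from collections import Counter

# Assumption: the directive name is whatever follows "<lang " up to the ">".
LANG_TAG = re.compile(r"<lang\s+([^>]+)>", re.IGNORECASE)


def highlighter_ranking(entries, fold_case=False):
    """Directive names used across entries, most common first."""
    counts = Counter()
    for text in entries:
        for name in LANG_TAG.findall(text):
            name = name.strip()
            counts[name.lower() if fold_case else name] += 1
    # most_common() sorts by count; ties keep first-seen order.
    return [name for name, _ in counts.most_common()]


entries = [
    "<lang perl6>say 1</lang> <lang perl6>say 2</lang>",
    "<lang perl6>say 3</lang> <lang Perl6>say 4</lang> <lang c>int x;</lang>",
    "<lang c>int y;</lang> <lang C>int z;</lang>",
]
print(highlighter_ranking(entries))                  # names exactly as written
print(highlighter_ranking(entries, fold_case=True))  # case variants merged
```

Folding case collapses pairs like <code>julia</code> and <code>Julia</code> into one name, but real misspellings still show up as separate names.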

