User:Thundergnat: Difference between revisions

From Rosetta Code
(→‎Language Stats:: an apology)
(→‎Language Stats:: update stats table)
Line 38: Line 38:
Note that this is purely for my own entertainment and curiosity. I strenuously want to avoid people making efforts to optimize for one metric or another. Many of these are heavily influenced by a particular stylistic choice made by a prolific entry author. None of these are good or bad. Just observations.


Comments aren't filtered out, so entries that have lots of comments will have higher character counts. That isn't a bad thing though. Quite the opposite in fact. The whole point of Rosettacode is to learn how different languages can accomplish the same task. Lots of comments is a good thing.
Several things influence different metrics, custom syntax highlighting for instance. Only one language does this in any meaningful way right now but it is '''all in'''. The markup isn't filtered out though so it skews the numbers to almost complete unusefulness.

Comments aren't filtered out either, so entries that have lots of comments will have higher character counts. That isn't a bad thing though. Quite the opposite in fact. The whole point of Rosettacode is to learn how different languages can accomplish the same task. Lots of comments is a good thing.


How the various metrics are calculated / what they mean. Averages are the sum of all of that item in every entry divided by the total number of tasks with an entry, rounded to the nearest integer or percentage.
Line 51: Line 49:
* Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc)
* Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
* Syntax highlighting - What syntax highlighter does it use in order from most to least common. (Note that some variation is expected, especially for tasks like [[Call a foreign-language function]] and [[Rosetta Code/Find bare lang tags]].)
* Syntax highlighting - What syntax highlighter does it use in order from most to least common. (Note that some variation is expected, especially for tasks like [[Call a foreign-language function]], [[Rosetta Code/Find bare lang tags]] or probably the largest source of oddball <nowiki><lang *></nowiki> tags: [[Rosetta Code/Fix_code_tags]].)


{|class="wikitable sortable"
{|class="wikitable sortable"
|+ As of 2022-02-08
|+ As of 2022-02-13
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax highlighting
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax<br>highlighting
|-
|-
||Phix||1503||55||10292||9.9 %||69.85 %||30.15 %||0 %||Phix html5 c r PL/I Phi phix csharp javascript
||Phix||1507||53||1695||32.69 %||78.35 %||21.65 %||0.03 %||Phix html5 csharp c r phix javascript
|-
|-
||Wren||1493||55||1575||31.59 %||75.71 %||24.29 %||0.1 %||ecmascript c go javascript C foo bash xml python bar baz
||Wren||1499||55||1578||31.56 %||75.73 %||24.27 %||0.1 %||ecmascript c go javascript C foo bar xml bash python baz
|-
|-
||Julia||1474||33||978||27.82 %||77.11 %||22.89 %||0.27 %||julia Julia ruby html5 python Julian cpp html xml lb lua k
||Julia||1479||33||976||27.82 %||77.1 %||22.9 %||0.27 %||julia Julia python ruby html5 lua html cpp xml
|-
|-
||Raku||1456||30||887||28.87 %||69.31 %||30.69 %||0.57 %||perl6 c C xml shell html XML bash
||Raku||1461||30||885||28.87 %||69.31 %||30.69 %||0.57 %||perl6 C c shell xml html XML bash
|-
|-
||Go||1454||77||1829||34.4 %||75.29 %||24.71 %||0.11 %||go Go unicon html bash xml c IWBASIC thrift proto Unicon html5
||Go||1455||77||1821||34.42 %||75.27 %||24.73 %||0.11 %||go Go html bash c xml thrift proto html5
|-
|-
||Perl||1425||34||921||27.03 %||68.19 %||31.81 %||0.1 %||perl Perl bash perl6 Shell c latex html5 python
||Perl||1428||34||920||27.04 %||68.18 %||31.82 %||0.1 %||perl Perl bash Shell html5 latex c
|-
|-
||Nim||1398||49||1338||24.62 %||75.55 %||24.45 %||1.13 %||Nim nim python Python ruby C c $1 xml Nimrod
||Nim||1398||48||1326||24.65 %||75.52 %||24.48 %||1.12 %||Nim nim python Python ruby c C Nimrod $1 xml
|-
|-
||Python||1327||72||2016||30.7 %||75.48 %||24.52 %||0.04 %||python Python bash html5 c cmd ruby xml ebnf haskell python3 Shell
||Python||1343||71||1996||30.72 %||75.47 %||24.53 %||0.04 %||python Python bash html5 cmd c python3 xml ebnf Shell
|-
|-
||C||1193||78||1842||28.64 %||72.56 %||27.44 %||0.01 %||c C cpp bash XML csharp html5 sh perl d make Assembly go Shell
||C||1193||78||1841||28.64 %||72.56 %||27.44 %||0.01 %||c C cpp bash XML go Shell perl html5 sh Assembly make d
|-
|-
||REXX||1143||55||3361||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx ring rwxx ress sh cobol
||REXX||1144||55||3358||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx cobol sh
|-
|-
||Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c HTML5 java scheme groovy html5 xml Groovy
||Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c java HTML5 html5 Groovy groovy scheme xml
|-
|-
||Java||1123||66||2069||33.39 %||78.59 %||21.41 %||0.01 %||java java5 Java Java5 c bash html5 xml foo make java 12 bar cmd shell Java8
||Java||1124||66||2062||33.41 %||78.56 %||21.44 %||0.01 %||java java5 Java Java5 c xml bash html5 foo make cmd Java8 bar java 12 shell
|-
|-
||Haskell||1110||46||1389||28.32 %||76.91 %||23.09 %||0.08 %||haskell Haskell unicon Unicon Icon icon bash html5 C sh haksell text c
||Haskell||1110||44||1347||28.34 %||76.92 %||23.08 %||0.08 %||haskell Haskell bash text haksell c html5 sh
|-
|-
||Mathematica||1106||11||409||17.52 %||70.72 %||29.28 %||0.04 %||Mathematica mathematica foo wolfram barf bar baz "~~ x~~" sh Mathemtica
||J||1109||26||745||26.82 %||69.49 %||30.51 %||6.45 %||j J bash sh c foo %s xml html5 make bnf C bar SNUSP baz
|-
|-
||J||1094||26||738||26.85 %||69.46 %||30.54 %||6.44 %||j J bash sh foo c %s make bnf baz C SNUSP bar xml html5
||Mathematica||1106||11||409||17.52 %||70.72 %||29.28 %||0.04 %||Mathematica mathematica foo baz "~~ x~~" bar barf sh
|-
|-
||Racket||1090||33||1116||26.47 %||75.04 %||24.96 %||0.29 %||racket Racket scheme xml html5 rexx "Racket" cmd bash C
||Racket||1090||33||1117||26.47 %||75.05 %||24.95 %||0.29 %||racket Racket scheme bash xml html5 C cmd
|-
|-
||Ruby||1087||32||824||26.23 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash c foo html5 rust baz tcl bar
||Ruby||1087||32||824||26.23 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash html5 c foo baz tcl bar rust
|-
|-
||C++||1069||73||1987||29.25 %||72.66 %||27.34 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 text cmake bash d make C asm
||C++||1070||73||1990||29.23 %||72.68 %||27.32 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 make asm cmake text bash d C
|-
|-
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash r html5 csharp zkl"
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash r csharp html5
|}
|}




Second pass through after a lot of minor patches to the site and custom syntax highlighting filtering added. Should be a lot <strike>more accurate</strike> less inaccurate now. Numbers probably still don't ''mean'' anything, but they aren't quite as large outright whoppers. J numbers are still way overstated due to the very common decision to just include the output inside the language tags rather than in a separate output section. Not really sure what if anything to do about it (if anything.) I '''''really''''' don't want to take on trying to untangle that mess. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]) 21:29, 13 February 2022 (UTC)


<div style="padding:1em;background:#eeeeff;"><h3>Older commentary</h3>
Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it.
Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it.
I expected Raku to have a higher percent of non-ASCII characters and was very surprised by J and REXX having so much. On closer investigation, J and REXX entries make heavy use of box line drawing characters... which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter but spelling nominally does. Though, to be fair, most of the syntax highlighting are very minor variations, so getting it wrong probably doesn't change much. There are a whole bunch of obvious typos in there too though. Sigh.
I expected Raku to have a higher percent of non-ASCII characters and was very surprised by J and REXX having so much. On closer investigation, J and REXX entries make heavy use of box line drawing characters... which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter but spelling nominally does. Though, to be fair, most of the syntax highlighting are very minor variations, so getting it wrong probably doesn't change much. There are a whole bunch of obvious typos in there too though. Sigh. --[[User:Thundergnat|Thundergnat]] ([[User talk:Thundergnat|talk]]), 08 February


:Sorry about the Phix syntax highlighting mess. I'd love to use standard Geshi, and in fact do on my own site, but waited 6 years and nothing happened. In a lovely world there would be a special page on RC containing the geshi highlighting files for all languages, that anyone could edit in the usual way with the usual single-click to undo any vandalism, and periodically (bi-annual would be plenty) some admin pushes updates into the geshi dir. I guess you could even have a <nowiki><lang PhixRC></nowiki> mechanism whereby "release candidate" geshi updates could be tried out on selected pages without risking damaging the whole site. Alternatively it should actually be fairly straightforward for a man of your talents to html-strip the Phix entries (just sayin), and/or I'm not above being tasked to go and clean up my own mess, which one day I still hope to be able to do (trust me, I too despise my own minor updates showing up as complete gobbledeygook). I also wonder if it would be at all useful for the syntax highlighting column to contain (first/random) links to offending pages, so I could find out what page it thinks is using say "PL/1" or "Phi"? --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 10:04, 10 February 2022 (UTC)
:Sorry about the Phix syntax highlighting mess. I'd love to use standard Geshi, and in fact do on my own site, but waited 6 years and nothing happened. In a lovely world there would be a special page on RC containing the geshi highlighting files for all languages, that anyone could edit in the usual way with the usual single-click to undo any vandalism, and periodically (bi-annual would be plenty) some admin pushes updates into the geshi dir. I guess you could even have a <nowiki><lang PhixRC></nowiki> mechanism whereby "release candidate" geshi updates could be tried out on selected pages without risking damaging the whole site. Alternatively it should actually be fairly straightforward for a man of your talents to html-strip the Phix entries (just sayin), and/or I'm not above being tasked to go and clean up my own mess, which one day I still hope to be able to do (trust me, I too despise my own minor updates showing up as complete gobbledeygook). I also wonder if it would be at all useful for the syntax highlighting column to contain (first/random) links to offending pages, so I could find out what page it thinks is using say "PL/1" or "Phi"? --[[User:Petelomax|Pete Lomax]] ([[User talk:Petelomax|talk]]) 10:04, 10 February 2022 (UTC)</div>

Revision as of 21:29, 13 February 2022

{|class="wikitable"
|+ My Favorite Languages
!Language!!Proficiency
|-
||Perl||Moderately Proficient
|-
||Perl 6||Still seems like I use it all the time
|-
||Raku||Use it all the time
|}

License:

Any code which I have submitted to Rosettacode may be used under the Unlicense.

I would appreciate it if any use included a link back to the Rosettacode page from which it was obtained (though obviously I have no way to enforce that).


Reports:


Utilities:

Tampermonkey or Greasemonkey javascript applets


Resources:


Language Stats:

As part of the weekly reports I generate (listed above), I have started to collect some statistics on languages that have a large number of entries. (For now, the cutoff is 1000.) A sufficiently large number of entries should average out outliers and give at least a reasonable overview of how different languages stack up.

Note that this is purely for my own entertainment and curiosity. I strenuously want to avoid people making efforts to optimize for one metric or another. Many of these are heavily influenced by a particular stylistic choice made by a prolific entry author. None of these are good or bad. Just observations.

Comments aren't filtered out, so entries that have lots of comments will have higher character counts. That isn't a bad thing though. Quite the opposite in fact. The whole point of Rosettacode is to learn how different languages can accomplish the same task. Lots of comments is a good thing.

How the various metrics are calculated and what they mean: each average is the sum of that item over every entry, divided by the total number of tasks with an entry, rounded to the nearest integer or percentage.

  • Task count - The number of tasks for which there is an entry in that language.
  • Average lines per entry - The average number of lines in code blocks per entry. Multiple versions under the same task all count as lines per task. (If there are four versions and each has 25 lines, it counts as 100 for that task.)
  • Average number of characters - Average number of characters inside <lang *></lang> blocks per task, including whitespace.
  • Average percent whitespace - What percentage of the above characters are white space? (Including newline characters.)
  • Average percent alpha-numerics - What percentage of the non-white space characters are alphabetic or numeric?
  • Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc.)
  • Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
  • Syntax highlighting - Which syntax highlighter tags the entries use, in order from most to least common. (Note that some variation is expected, especially for tasks like Call a foreign-language function, Rosetta Code/Find bare lang tags or, probably the largest source of oddball <lang *> tags, Rosetta Code/Fix_code_tags.)
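
The definitions above can be sketched in code. This is only an illustration in Python, not the script that produced the table below. The names `entry_metrics` and `language_stats`, the regular expression for <lang *> blocks, and the choice to average the per-entry percentages (rather than pooling all characters first) are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed shape of a code block: <lang NAME> ... </lang>
LANG_BLOCK = re.compile(r"<lang\s+([^>]+)>(.*?)</lang>", re.DOTALL)


def pct(part, whole):
    """Percentage of part in whole, or 0.0 when whole is empty."""
    return 100.0 * part / whole if whole else 0.0


def entry_metrics(wikitext):
    """Measure all <lang> blocks of one task entry together."""
    blocks = LANG_BLOCK.findall(wikitext)
    code = "".join(body for _, body in blocks)
    non_ws = [ch for ch in code if not ch.isspace()]
    alnum = sum(ch.isalnum() for ch in non_ws)
    return {
        # Multiple versions under one task all add to that task's line count.
        "lines": sum(len(body.strip("\n").splitlines()) for _, body in blocks),
        "chars": len(code),  # whitespace included
        "pct_white": pct(len(code) - len(non_ws), len(code)),
        # The next three are relative to non-whitespace characters only.
        "pct_alnum": pct(alnum, len(non_ws)),
        "pct_non_alnum": pct(len(non_ws) - alnum, len(non_ws)),
        "pct_non_ascii": pct(sum(ord(ch) > 127 for ch in non_ws), len(non_ws)),
        "tags": Counter(tag.strip() for tag, _ in blocks),
    }


def language_stats(entries):
    """Average the per-entry metrics over every task that has an entry."""
    per_entry = [entry_metrics(text) for text in entries]
    n = len(per_entry)
    if n == 0:
        raise ValueError("need at least one entry")

    def mean(key):
        return sum(m[key] for m in per_entry) / n

    tags = Counter()
    for m in per_entry:
        tags.update(m["tags"])

    return {
        "task_count": n,
        "avg_lines": round(mean("lines")),
        "avg_chars": round(mean("chars")),
        "pct_white": round(mean("pct_white"), 2),
        "pct_alnum": round(mean("pct_alnum"), 2),
        "pct_non_alnum": round(mean("pct_non_alnum"), 2),
        "pct_non_ascii": round(mean("pct_non_ascii"), 2),
        # Tags are kept case-sensitive, ordered from most to least common.
        "highlighters": [tag for tag, _ in tags.most_common()],
    }
```

Calling `language_stats` with a list of entry texts for one language returns one table row's worth of numbers. Comments and any program output pasted inside the <lang *> tags are counted like any other characters, which is why they inflate the figures.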
{|class="wikitable sortable"
|+ As of 2022-02-13
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax<br>highlighting
|-
||Phix||1507||53||1695||32.69 %||78.35 %||21.65 %||0.03 %||Phix html5 csharp c r phix javascript
|-
||Wren||1499||55||1578||31.56 %||75.73 %||24.27 %||0.1 %||ecmascript c go javascript C foo bar xml bash python baz
|-
||Julia||1479||33||976||27.82 %||77.1 %||22.9 %||0.27 %||julia Julia python ruby html5 lua html cpp xml
|-
||Raku||1461||30||885||28.87 %||69.31 %||30.69 %||0.57 %||perl6 C c shell xml html XML bash
|-
||Go||1455||77||1821||34.42 %||75.27 %||24.73 %||0.11 %||go Go html bash c xml thrift proto html5
|-
||Perl||1428||34||920||27.04 %||68.18 %||31.82 %||0.1 %||perl Perl bash Shell html5 latex c
|-
||Nim||1398||48||1326||24.65 %||75.52 %||24.48 %||1.12 %||Nim nim python Python ruby c C Nimrod $1 xml
|-
||Python||1343||71||1996||30.72 %||75.47 %||24.53 %||0.04 %||python Python bash html5 cmd c python3 xml ebnf Shell
|-
||C||1193||78||1841||28.64 %||72.56 %||27.44 %||0.01 %||c C cpp bash XML go Shell perl html5 sh Assembly make d
|-
||REXX||1144||55||3358||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx cobol sh
|-
||Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c java HTML5 html5 Groovy groovy scheme xml
|-
||Java||1124||66||2062||33.41 %||78.56 %||21.44 %||0.01 %||java java5 Java Java5 c xml bash html5 foo make cmd Java8 bar java 12 shell
|-
||Haskell||1110||44||1347||28.34 %||76.92 %||23.08 %||0.08 %||haskell Haskell bash text haksell c html5 sh
|-
||J||1109||26||745||26.82 %||69.49 %||30.51 %||6.45 %||j J bash sh c foo %s xml html5 make bnf C bar SNUSP baz
|-
||Mathematica||1106||11||409||17.52 %||70.72 %||29.28 %||0.04 %||Mathematica mathematica foo baz "~~ x~~" bar barf sh
|-
||Racket||1090||33||1117||26.47 %||75.05 %||24.95 %||0.29 %||racket Racket scheme bash xml html5 C cmd
|-
||Ruby||1087||32||824||26.23 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash html5 c foo baz tcl bar rust
|-
||C++||1070||73||1990||29.23 %||72.68 %||27.32 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 make asm cmake text bash d C
|-
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash r csharp html5
|}


Second pass through, after a lot of minor patches to the site and with custom syntax highlighting filtering added. Should be a lot <strike>more accurate</strike> less inaccurate now. Numbers probably still don't mean anything, but they aren't quite such outright whoppers. J numbers are still way overstated due to the very common decision to just include the output inside the language tags rather than in a separate output section. Not really sure what, if anything, to do about it. I really don't want to take on trying to untangle that mess. --Thundergnat (talk) 21:29, 13 February 2022 (UTC)


Older commentary

Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it. I expected Raku to have a higher percent of non-ASCII characters and was very surprised by J and REXX having so much. On closer investigation, J and REXX entries make heavy use of box line drawing characters... which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter but spelling nominally does. Though, to be fair, most of the syntax highlighting directives are very minor variations, so getting one wrong probably doesn't change much. There are a whole bunch of obvious typos in there too though. Sigh. --Thundergnat (talk), 08 February

Sorry about the Phix syntax highlighting mess. I'd love to use standard Geshi, and in fact do on my own site, but waited 6 years and nothing happened. In a lovely world there would be a special page on RC containing the geshi highlighting files for all languages, that anyone could edit in the usual way with the usual single-click to undo any vandalism, and periodically (bi-annual would be plenty) some admin pushes updates into the geshi dir. I guess you could even have a <lang PhixRC> mechanism whereby "release candidate" geshi updates could be tried out on selected pages without risking damaging the whole site. Alternatively it should actually be fairly straightforward for a man of your talents to html-strip the Phix entries (just sayin), and/or I'm not above being tasked to go and clean up my own mess, which one day I still hope to be able to do (trust me, I too despise my own minor updates showing up as complete gobbledygook). I also wonder if it would be at all useful for the syntax highlighting column to contain (first/random) links to offending pages, so I could find out what page it thinks is using, say, "PL/1" or "Phi"? --Pete Lomax (talk) 10:04, 10 February 2022 (UTC)