User:Thundergnat: Difference between revisions

From Rosetta Code
(→‎Language Stats:: Update stats)
m (→‎Utilities I Wrote:: Update "language links" Tampermonkey script to deal with new (Mediawiki 1.39) page title formatting)
 
(10 intermediate revisions by the same user not shown)
Line 11: Line 11:




==Syntax highlighting and CSS guide==
==Reports:==
;* [[User:Thundergnat/Syntax_highlighting_and_CSS|Short guide to Rosetta Code syntax highlighting and CSS customization that I wrote]]


==Reports I Generate:==
;*[[Rosetta_Code/Rank_languages_by_popularity/Full_list|Rank languages by popularity]]
;*[[Rosetta_Code/Rank_languages_by_popularity/Full_list|Rank languages by popularity]]
;*[[Rosetta_Code/List_authors_of_task_descriptions/Full_list|List authors of task descriptions]]
;*[[Rosetta_Code/List_authors_of_task_descriptions/Full_list|List authors of task descriptions]]
Line 21: Line 25:




==Utilities:==
==Utilities I Wrote:==
'''[https://www.tampermonkey.net/ Tampermonkey]''' or '''[https://www.greasespot.net/ Greasemonkey]''' javascript applets
'''[https://www.tampermonkey.net/ Tampermonkey]''' or '''[https://www.greasespot.net/ Greasemonkey]''' javascript applets


;* [https://gist.github.com/thundergnat/c5a86a6d5e0018ac67bdea3fc48786a0#file-language_links-user-js Add language parameters to Category task entry links] - See [[Rosetta_Code:Village_Pump/Add_link_anchors_to_Language_Category_pages|this Village Pump page]] for details. Presently @ version 0.7 (2021/06/16)
;* [https://gist.github.com/thundergnat/c5a86a6d5e0018ac67bdea3fc48786a0#file-language_links-user-js Add language parameters to Category task entry links] - See [[Rosetta_Code:Village_Pump/Add_link_anchors_to_Language_Category_pages|this Village Pump page]] for details. Presently @ version 0.8 (2023/01/22)
;* [https://gist.github.com/thundergnat/5f7f36dc0cf303b110f6d7c6275fbb85#file-toggle_syntax_highlighting-user-js Syntax highlighting toggle] Toggle the task syntax highlighting off and on. Presently @ version 0.1 (2021/06/10)
;* [https://gist.github.com/thundergnat/5f7f36dc0cf303b110f6d7c6275fbb85#file-toggle_syntax_highlighting-user-js Syntax highlighting toggle] Toggle the task syntax highlighting off and on. Presently @ version 0.2 (2022/08/22)


==Resources I Host:==

==Resources:==
;*[https://github.com/thundergnat/rc/tree/master Offsite repository of resource files for various tasks]
;*[https://github.com/thundergnat/rc/tree/master Offsite repository of resource files for various tasks]


Line 44: Line 47:
* Task count - The number of tasks for which there is an entry in that language.
* Task count - The number of tasks for which there is an entry in that language.
* Average lines per entry - The average number of lines in code blocks per entry. Multiple versions under the same task all count as lines per task. (If there are four versions and each has 25 lines, it counts as 100 for that task.)
* Average lines per entry - The average number of lines in code blocks per entry. Multiple versions under the same task all count as lines per task. (If there are four versions and each has 25 lines, it counts as 100 for that task.)
* Average number of character - Average number of characters inside <nowiki><lang *></lang></nowiki> blocks per task including whitespace.
* Average number of characters - Average number of characters inside <nowiki><lang *></lang></nowiki> blocks per task including whitespace.
* Average whitespace - How many of the above characters are white space? (Including new line characters.)
* Average whitespace - How many of the above characters are white space? (Including new line characters.)
* Average percent alpha-numerics - What percentage of the non-white space characters are alphabetic or numeric?
* Average percent alpha-numerics - What percentage of the non-white space characters are alphabetic or numeric?
* Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc)
* Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc)
* Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
* Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
* Syntax highlighting - What syntax highlighter does it use in order from most to least common. (Note that some variation is expected, especially for tasks like [[Call a foreign-language function]], [[Rosetta Code/Find bare lang tags]] or probably the largest source of oddball <nowiki><lang *></nowiki> tags: [[Rosetta Code/Fix_code_tags]].)
* Syntax highlighting - What syntax highlighter does it use in order from most to least common; the highlighting specifier with how many times it was seen (in parenthesis). Note that some variation is expected, especially for tasks like [[Call a foreign-language function]], [[Rosetta Code/Find bare lang tags]] or probably the largest source of oddball markup tags: [[Rosetta Code/Fix_code_tags]]. In general, Pygments expects the lexer name to be all lower case.


{|class="wikitable sortable"
{|class="wikitable sortable"
|+ As of 2022-05-08
|+ As of 2022-09-11
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax<br>highlighting
!Language!!Task<br>Count!!Avg. #<br>Lines / Entry!!Avg. #<br>Characters!!Avg. %<br>White space!!Avg. %<br>Alphanumerics!!Avg. %Non<br>Alphanumerics!!Avg. %<br>Non-ASCII!!Syntax<br>highlighting
|-
|-
||Phix||1528||55||1765||32.73 %||78.29 %||21.71 %||0.04 %||Phix c csharp javascript r
||Wren||1571||57||1646||31.54 %||75.86 %||24.14 %||0.09 %||ecmascript(1638) c(65) go(14) javascript(8) python(2) xml(2) AutoHotkey(1) bash(1) perl(1) text(1)
|-
|-
||Wren||1526||56||1608||31.5 %||75.82 %||24.18 %||0.09 %||ecmascript c go javascript C foo bar bash baz python xml
||Phix||1570||55||1782||32.69 %||78.27 %||21.73 %||0.04 %||phix(1922) Phix(7) javascript(1)
|-
|-
||Julia||1501||33||981||27.78 %||77.07 %||22.93 %||0.28 %||julia Julia ruby html5 python cpp html lua xml
||Julia||1538||34||1011||27.91 %||77.21 %||22.79 %||0.27 %||julia(1817) text(4) Julia(3) ruby(3) html5(2) cpp(1) html(1) lua(1) xml(1)
|-
|-
||Raku||1482||30||891||28.93 %||69.32 %||30.68 %||0.57 %||perl6 bash c C html rust shell xml XML
||Raku||1518||30||895||28.78 %||69.35 %||30.65 %||0.59 %||raku(1958) text(7) bash(4) c(4) xml(2) html(1) rust(1) shell(1)
|-
|-
||Go||1480||77||1814||34.52 %||75.24 %||24.76 %||0.1 %||go Go html bash c xml html5 proto thrift vlang
||Go||1497||76||1809||34.56 %||75.23 %||24.77 %||0.1 %||go(1924) html(5) text(3) bash(2) c(2) xml(2) ecmascript(1) futurebasic(1) html5(1) proto(1) thrift(1)
|-
|-
||Perl||1443||34||919||27 %||68.18 %||31.82 %||0.1 %||perl Perl bash Shell c html5 latex
||Perl||1473||34||925||27.02 %||68.17 %||31.83 %||0.1 %||perl(2047) text(12) bash(5) shell(2) c(1) html5(1) latex(1) Perl(1)
|-
|-
||Nim||1401||48||1325||24.66 %||75.53 %||24.47 %||1.12 %||Nim nim python Python ruby c C $1 Nimrod xml
||Python||1406||72||2017||30.84 %||75.56 %||24.44 %||0.04 %||python(2669) text(13) bash(5) html5(5) c(2) cmd(2) python3(2) AutoHotkey(1) ebnf(1) perl(1) Python(1) qb64(1) shell(1) xml(1)
|-
|-
||Python||1361||72||2002||30.74 %||75.45 %||24.55 %||0.04 %||python Python bash html5 c cmd ebnf python3 Shell xml
||Nim||1402||48||1332||24.64 %||75.52 %||24.48 %||1.12 %||nim(1599) text(10) python(8) c(4) Nim(3) ruby(3) nimrod(1) xml(1)
|-
|-
||C||1206||78||1848||28.72 %||72.53 %||27.47 %||0.01 %||c C cpp bash XML Assembly d go html5 make perl sh Shell
||J||1275||28||780||26.93 %||69.73 %||30.27 %||6.53 %||j(2986) text(48) J(11) bash(7) c(4) sh(4) bnf(1) html5(1) make(1) python(1) snusp(1) xml(1)
|-
|-
||J||1146||27||758||27.05 %||69.43 %||30.57 %||6.4 %||j J bash sh %s c foo bar baz bnf C html5 make python SNUSP xml
||C||1215||79||1873||28.76 %||72.62 %||27.38 %||0.01 %||c(1656) text(48) cpp(11) bash(6) C(2) xml(2) assembly(1) cafe(1) d(1) go(1) html5(1) make(1) perl(1) sh(1) shell(1)
|-
|-
||REXX||1145||55||3356||40.33 %||68.44 %||31.56 %||8.48 %||rexx REXX Rexx cobol sh
||Mathematica||1177||12||422||17.56 %||70.77 %||29.23 %||0.04 %||mathematica(1485) text(35) Mathematica(8) wolfram language(2) mathematica (1) sh(1)
|-
|-
||Kotlin||1131||47||1363||33.21 %||77.62 %||22.38 %||0.04 %||scala kotlin Kotlin C c groovy Groovy HTML5 html5 java scheme xml
||REXX||1146||56||3373||40.33 %||68.47 %||31.53 %||8.45 %||rexx(1766) text(11) cobol(1) sh(1)
|-
|-
||Java||1131||66||2066||33.43 %||78.56 %||21.44 %||0.01 %||java java5 Java Java5 c bash html5 xml foo make bar cmd java 12 java8 Java8 shell
||Haskell||1138||45||1355||28.34 %||76.84 %||23.16 %||0.08 %||haskell(2089) text(7) bash(3) Haskell(3) c(1) html5(1) sh(1)
|-
|-
||Haskell||1121||45||1347||28.35 %||76.83 %||23.17 %||0.08 %||haskell Haskell bash c html5 sh text
||Java||1138||66||2077||33.52 %||78.56 %||21.44 %||0.01 %||java(1273) java5(223) text(6) c(4) bash(3) html5(3) xml(3) Java(2) java8(2) make(2) cmd(1) java 12(1) python(1) shell(1)
|-
|-
||Mathematica||1106||11||409||17.5 %||70.75 %||29.25 %||0.04 %||Mathematica mathematica foo Wolfram Language "~~ x~~" bar barf baz sh
||Kotlin||1132||47||1365||33.21 %||77.61 %||22.39 %||0.03 %||scala(1110) kotlin(61) c(4) groovy(2) html5(2) scheme(2) java(1) xml(1)
|-
|-
||C++||1099||73||1976||29.25 %||72.7 %||27.3 %||0.02 %||cpp Cpp C++ c CPP c++ sh html5 asm bash C cmake d make text
||C++||1115||73||1973||29.27 %||72.71 %||27.29 %||0.02 %||cpp(1511) c++(14) c(8) text(4) sh(3) html5(2) asm(1) bash(1) cmake(1) d(1) make(1)
|-
|-
||Ruby||1094||32||834||26.19 %||76.49 %||23.51 %||0.06 %||ruby Ruby bash c foo html5 bar baz rust tcl
||Ruby||1103||32||832||26.18 %||76.49 %||23.51 %||0.06 %||ruby(1588) bash(4) c(2) html5(2) Ruby(2) rust(1) tcl(1) text(1)
|-
|-
||Racket||1091||33||1139||26.49 %||75.09 %||24.91 %||0.29 %||racket Racket scheme bash C cmd html5 xml
||Racket||1089||34||1155||26.48 %||75.08 %||24.92 %||0.29 %||racket(1343) scheme(18) text(9) bash(1) c(1) cmd(1) html5(1) xml(1)
|-
|-
||FreeBASIC||1026||47||1227||30.71 %||82.32 %||17.68 %||0.03 %||freebasic FreeBASIC FreeBasic qbasic basic c Freebasic zxbasic
||FreeBASIC||1073||47||1232||30.87 %||82.41 %||17.59 %||0.03 %||freebasic(1128) basic(2) qbasic(2) c(1) text(1) zxbasic(1)
|-
|-
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl c bash csharp html5 r
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl(1728) bash(2) c(2) html5(1)
|-
||Sidef||1003||22||545||30.82 %||70.18 %||29.82 %||0.45 %||ruby(1251) sidef(4) shell(2) html5(1)
|}
|}



<div style="padding:1em;background:#eeeeff;"><h3>Older commentary</h3>
<div style="padding:1em;background:#eeeeff;"><h3>Older commentary</h3>

Latest revision as of 00:56, 23 January 2023

My Favorite Languages
Language Proficiency
Perl Moderately Proficient
Perl 6 Still seems like I use it all the time
Raku Use it all the time

License:

Any code which I have submitted to Rosettacode may be used under the Unlicense.

I would appreciate that any use includes a link back to the Rosettacode page from which it was obtained (but obviously would have no way to enforce that.)


Syntax highlighting and CSS guide


Reports I Generate:


Utilities I Wrote:

Tampermonkey or Greasemonkey javascript applets

Resources I Host:


Language Stats:

As part of the reports I generate weekly (listed above), I have started to collect some statistics on languages that have a large number of entries. (For now, the cutoff is 1000.) A sufficiently large number of entries should average out outliers and give at least a reasonable overview of how different languages stack up.

Note that this is purely for my own entertainment and curiosity. I strenuously want to avoid people making efforts to optimize for one metric or another. Many of these are heavily influenced by a particular stylistic choice made by a prolific entry author. None of these are good or bad. Just observations.

Comments aren't filtered out, so entries that have lots of comments will have higher character counts. That isn't a bad thing though. Quite the opposite in fact. The whole point of Rosettacode is to learn how different languages can accomplish the same task. Lots of comments is a good thing.

How the various metrics are calculated and what they mean: each average is the sum of that item over every entry, divided by the total number of tasks with an entry, rounded to the nearest integer or percentage.

  • Task count - The number of tasks for which there is an entry in that language.
  • Average lines per entry - The average number of lines in code blocks per entry. Multiple versions under the same task all count as lines per task. (If there are four versions and each has 25 lines, it counts as 100 for that task.)
  • Average number of characters - Average number of characters inside <lang *></lang> blocks per task including whitespace.
  • Average whitespace - How many of the above characters are white space? (Including new line characters.)
  • Average percent alpha-numerics - What percentage of the non-white space characters are alphabetic or numeric?
  • Average percent non-alpha-numerics - What percentage of the non-white space characters are not alphabetic or numeric? (Punctuation, symbols, etc)
  • Average percent non-ASCII - What percentage of the non-white space characters are not ASCII characters?
  • Syntax highlighting - Which syntax highlighters it uses, in order from most to least common: each highlighting specifier is followed by the number of times it was seen (in parentheses). Note that some variation is expected, especially for tasks like Call a foreign-language function, Rosetta Code/Find bare lang tags or probably the largest source of oddball markup tags: Rosetta Code/Fix_code_tags. In general, Pygments expects the lexer name to be all lower case.
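
The metric definitions above can be sketched for a single entry as follows. This is only an illustration, not the code that generates the report: the regular expression, the names `LANG_BLOCK`, `entry_metrics` and `average_lines`, and the choice to express whitespace as a percentage of all characters are assumptions. Python's `str.isalnum` is Unicode-aware, which may differ from how the report classifies characters.

```python
import re
from collections import Counter

# Hypothetical pattern for <lang NAME>...</lang> blocks (assumption: the
# real report's parser is not shown and may handle more edge cases).
LANG_BLOCK = re.compile(r"<lang\s+([^>]*)>(.*?)</lang>", re.DOTALL)


def entry_metrics(wikitext):
    """Compute the per-entry metrics for all <lang *> blocks in one entry."""
    blocks = LANG_BLOCK.findall(wikitext)
    code = "".join(body for _, body in blocks)

    # Multiple versions under one task all count toward the same entry.
    lines = sum(len(body.strip("\n").splitlines()) for _, body in blocks)
    chars = len(code)
    whitespace = sum(ch.isspace() for ch in code)

    # The remaining percentages are taken over non-whitespace characters only.
    visible = [ch for ch in code if not ch.isspace()]
    alnum = sum(ch.isalnum() for ch in visible)
    non_ascii = sum(ord(ch) > 127 for ch in visible)
    n = len(visible) or 1  # avoid dividing by zero on empty entries

    return {
        "lines": lines,
        "chars": chars,
        "pct_whitespace": 100 * whitespace / (chars or 1),
        "pct_alnum": 100 * alnum / n,
        "pct_non_alnum": 100 * (len(visible) - alnum) / n,
        "pct_non_ascii": 100 * non_ascii / n,
        # Specifiers are tallied exactly as written, so "python" and
        # "Python" are counted separately, as in the table.
        "highlighters": Counter(name for name, _ in blocks),
    }


def average_lines(entries):
    """Sum of lines over every entry divided by the number of entries."""
    metrics = [entry_metrics(e) for e in entries]
    return sum(m["lines"] for m in metrics) / len(metrics) if metrics else 0.0


sample = "<lang python>print('hi')\nx = 1\n</lang>\n<lang Python>y = 'é'</lang>"
m = entry_metrics(sample)
print(m["lines"], m["chars"], m["highlighters"].most_common())
```

Sorting the tally with `most_common()` gives the "most to least common" ordering used in the syntax highlighting column.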
{|class="wikitable sortable"
|+ As of 2022-09-11
!Language!!Task Count!!Avg. # Lines / Entry!!Avg. # Characters!!Avg. % White space!!Avg. % Alphanumerics!!Avg. % Non-Alphanumerics!!Avg. % Non-ASCII!!Syntax highlighting
|-
||Wren||1571||57||1646||31.54 %||75.86 %||24.14 %||0.09 %||ecmascript(1638) c(65) go(14) javascript(8) python(2) xml(2) AutoHotkey(1) bash(1) perl(1) text(1)
|-
||Phix||1570||55||1782||32.69 %||78.27 %||21.73 %||0.04 %||phix(1922) Phix(7) javascript(1)
|-
||Julia||1538||34||1011||27.91 %||77.21 %||22.79 %||0.27 %||julia(1817) text(4) Julia(3) ruby(3) html5(2) cpp(1) html(1) lua(1) xml(1)
|-
||Raku||1518||30||895||28.78 %||69.35 %||30.65 %||0.59 %||raku(1958) text(7) bash(4) c(4) xml(2) html(1) rust(1) shell(1)
|-
||Go||1497||76||1809||34.56 %||75.23 %||24.77 %||0.1 %||go(1924) html(5) text(3) bash(2) c(2) xml(2) ecmascript(1) futurebasic(1) html5(1) proto(1) thrift(1)
|-
||Perl||1473||34||925||27.02 %||68.17 %||31.83 %||0.1 %||perl(2047) text(12) bash(5) shell(2) c(1) html5(1) latex(1) Perl(1)
|-
||Python||1406||72||2017||30.84 %||75.56 %||24.44 %||0.04 %||python(2669) text(13) bash(5) html5(5) c(2) cmd(2) python3(2) AutoHotkey(1) ebnf(1) perl(1) Python(1) qb64(1) shell(1) xml(1)
|-
||Nim||1402||48||1332||24.64 %||75.52 %||24.48 %||1.12 %||nim(1599) text(10) python(8) c(4) Nim(3) ruby(3) nimrod(1) xml(1)
|-
||J||1275||28||780||26.93 %||69.73 %||30.27 %||6.53 %||j(2986) text(48) J(11) bash(7) c(4) sh(4) bnf(1) html5(1) make(1) python(1) snusp(1) xml(1)
|-
||C||1215||79||1873||28.76 %||72.62 %||27.38 %||0.01 %||c(1656) text(48) cpp(11) bash(6) C(2) xml(2) assembly(1) cafe(1) d(1) go(1) html5(1) make(1) perl(1) sh(1) shell(1)
|-
||Mathematica||1177||12||422||17.56 %||70.77 %||29.23 %||0.04 %||mathematica(1485) text(35) Mathematica(8) wolfram language(2) mathematica (1) sh(1)
|-
||REXX||1146||56||3373||40.33 %||68.47 %||31.53 %||8.45 %||rexx(1766) text(11) cobol(1) sh(1)
|-
||Haskell||1138||45||1355||28.34 %||76.84 %||23.16 %||0.08 %||haskell(2089) text(7) bash(3) Haskell(3) c(1) html5(1) sh(1)
|-
||Java||1138||66||2077||33.52 %||78.56 %||21.44 %||0.01 %||java(1273) java5(223) text(6) c(4) bash(3) html5(3) xml(3) Java(2) java8(2) make(2) cmd(1) java 12(1) python(1) shell(1)
|-
||Kotlin||1132||47||1365||33.21 %||77.61 %||22.39 %||0.03 %||scala(1110) kotlin(61) c(4) groovy(2) html5(2) scheme(2) java(1) xml(1)
|-
||C++||1115||73||1973||29.27 %||72.71 %||27.29 %||0.02 %||cpp(1511) c++(14) c(8) text(4) sh(3) html5(2) asm(1) bash(1) cmake(1) d(1) make(1)
|-
||Ruby||1103||32||832||26.18 %||76.49 %||23.51 %||0.06 %||ruby(1588) bash(4) c(2) html5(2) Ruby(2) rust(1) tcl(1) text(1)
|-
||Racket||1089||34||1155||26.48 %||75.08 %||24.92 %||0.29 %||racket(1343) scheme(18) text(9) bash(1) c(1) cmd(1) html5(1) xml(1)
|-
||FreeBASIC||1073||47||1232||30.87 %||82.41 %||17.59 %||0.03 %||freebasic(1128) basic(2) qbasic(2) c(1) text(1) zxbasic(1)
|-
||zkl||1011||19||688||17.98 %||69.69 %||30.31 %||0.03 %||zkl(1728) bash(2) c(2) html5(1)
|-
||Sidef||1003||22||545||30.82 %||70.18 %||29.82 %||0.45 %||ruby(1251) sidef(4) shell(2) html5(1)
|}


Older commentary

Second pass through, after a lot of minor patches to the site and the addition of custom syntax highlighting filtering. Should be a lot <s>more accurate</s> less inaccurate now. Numbers probably still don't mean anything, but they aren't quite such outright whoppers. J numbers are still way overstated due to the very common decision to just include the output inside the language tags rather than in a separate output section. Not really sure what to do about it (if anything.) I really don't want to take on trying to untangle that mess. --Thundergnat (talk) 21:29, 13 February 2022 (UTC)

Some observations: Phix numbers are completely bogus due to the custom syntax highlighting code polluting every entry. Eventually I'll look into filtering. This is a very preliminary first whack at it. I expected Raku to have a higher percent of non-ASCII characters and was very surprised by J and REXX having so much. On closer investigation, J and REXX entries make heavy use of box line drawing characters... which aren't ASCII. Syntax highlighting directives are all over the place. Case doesn't matter but spelling nominally does. Though, to be fair, most of the syntax highlighting directives are very minor variations, so getting them wrong probably doesn't change much. There are a whole bunch of obvious typos in there too though. Sigh. --Thundergnat (talk), 08 February

Sorry about the Phix syntax highlighting mess. I'd love to use standard Geshi, and in fact do on my own site, but waited 6 years and nothing happened. In a lovely world there would be a special page on RC containing the geshi highlighting files for all languages, that anyone could edit in the usual way with the usual single-click to undo any vandalism, and periodically (bi-annual would be plenty) some admin pushes updates into the geshi dir. I guess you could even have a <lang PhixRC> mechanism whereby "release candidate" geshi updates could be tried out on selected pages without risking damaging the whole site. Alternatively it should actually be fairly straightforward for a man of your talents to html-strip the Phix entries (just sayin), and/or I'm not above being tasked to go and clean up my own mess, which one day I still hope to be able to do (trust me, I too despise my own minor updates showing up as complete gobbledygook). I also wonder if it would be at all useful for the syntax highlighting column to contain (first/random) links to offending pages, so I could find out what page it thinks is using say "PL/1" or "Phi"? --Pete Lomax (talk) 10:04, 10 February 2022 (UTC)