User:Thundergnat/Syntax highlighting and CSS

From Rosetta Code


Syntax Highlighting

As of August 24, 2022, Rosetta Code has moved to the Miraheze wiki server farm, which has several implications for how the site looks. The largest by far is that the MediaWiki software has jumped to version 1.38.2 (a welcome upgrade). However, MediaWiki has dropped support for the GeSHi syntax highlighting system (which, fair enough, has not been well maintained for a while) and moved to the Pygments syntax highlighter. Pygments is pretty nice, but a few of its quirks bother me. I did a bunch of research into how it works, so I figured it would be useful to collect the information in one place for others.

The standard markup to highlight a block of source code in MediaWiki is

   <syntaxhighlight lang="languagename"> Code </syntaxhighlight>

where languagename is the lowercase name of the Pygments lexer to use for parsing. There is no default lexer; the lang parameter must have a value.

   <syntaxhighlight></syntaxhighlight>

is an error. If Pygments doesn't have a lexer for your language, or you don't know which one to use,

   lang="text"

is a safe, if boring, choice.

It is not absolutely necessary to enclose the lexer name in quotes, but it is highly recommended, and it can head off parsing confusion if other parameters are added.

Speaking of which: there is an option to add automatic line numbering to highlighted blocks. (See any Raku entry for examples.) Add the parameter 'line' to the opening tag to enable it.

   <syntaxhighlight lang="languagename" line> Code </syntaxhighlight> 

If there is no dedicated lexer for your language, a lexer for some other language will very often yield good results. The language name for the syntax highlighter does not need to match the language name in the header block.


Supported languages

A partial list of directly supported languages:

   ActionScript Ada Agda (incl. literate) Alloy AMPL ANTLR APL AppleScript Assembly (various) Asymptote Augeas AutoIt Awk
   BBC Basic Befunge BlitzBasic Boa Boo Boogie BrainFuck C, C++ (incl. dialects like Arduino) C# Chapel Charm++ CI Cirru
   Clay Clean Clojure CoffeeScript ColdFusion Common Lisp Component Pascal Coq Croc (MiniD) Cryptol (incl. Literate Cryptol)
   Crystal Cypher Cython D Dart DCPU-16 Delphi Dylan (incl. console) Eiffel Elm Emacs Lisp Email Erlang (incl. shell sessions)
   Ezhil Factor Fancy Fantom Fennel FloScript Forth Fortran FreeFEM++ F# GAP Gherkin (Cucumber) GLSL shaders Golo Gosu
   Groovy Haskell (incl. Literate Haskell) HLSL HSpec Hy IDL Idris (incl. Literate Idris) Igor Pro Io Jags Java JavaScript
   Jasmin Jcl Julia Kotlin Lasso (incl. templating) Limbo LiveScript Logtalk Logos Lua Mathematica Matlab Modelica Modula-2
   Monkey Monte MoonScript Mosel MuPad NASM Nemerle NesC NewLISP Nimrod Nit Notmuch NuSMV Objective-C Objective-J Octave
   OCaml Opa OpenCOBOL ParaSail Pawn PHP Perl 5 Pike Pony PovRay PostScript PowerShell Praat Prolog Python QBasic Racket
   Raku (a.k.a. Perl 6) REBOL Red Redcode Rexx Ride Ruby (incl. irb sessions) Rust S, S-Plus, R Scala Scdoc Scheme Scilab
   SGF Shell scripts (Bash, Tcsh, Fish) Shen Silver Slash Slurm Smalltalk SNOBOL Snowball Solidity SourcePawn Stan 
   Standard ML Stata Swift Swig SuperCollider Tcl Tera Term language TypeScript TypoScript USD Unicon Urbiscript Vala
   VBScript Verilog, SystemVerilog VHDL Visual Basic.NET Visual FoxPro Whiley Xtend XQuery Zeek Zephir Zig

Remember that the lexer name is (in general) the lowercase version of the language name. The Raku lexer is "raku".

The official (larger) list of languages supported by Pygments is available here.

Several languages that had separate lexers under GeSHi have been remapped to a common lexer under Pygments. You can continue to use the old name, but if you want to use the actual lexer name, the list is here.


Look and Feel

MediaWiki provides quite a bit of customization for layout and "look & feel" in the Appearance tab of your preferences page. I don't really care much for the default MediaWiki skin (Vector). While it looks nice, I think it wastes too much space in the interest of aesthetics. I find the Monobook skin much more to my liking. (Your opinion may vary and that's OK; it's just, like, my opinion, man!)


Customization

There are a few things about the default syntax highlighting that don't thrill me. Luckily, MediaWiki makes it easy to customize skins with CSS. You can customize each skin individually, globally across all Rosetta Code skins, or (!) globally across all Miraheze-hosted MediaWiki instances (probably not a great idea unless you are very sure of what you are doing).

To scratch my itch, I added some customizations to the Shared CSS settings.

The default background color for the syntax highlighted blocks is exactly the same as for the output <pre></pre> blocks: both are a light grey by default. While the grey isn't bad, I prefer that the code blocks look different from the output blocks, with a slightly lighter background color. Conveniently, every syntax highlighted code block has several CSS classes attached, so you can modify elements with pretty fine control. I want all of the code blocks to be distinct from the output blocks, so I added:

.mw-highlight pre {
  background-color: #fff9ee;
}

to the Shared CSS. The .mw-highlight selector finds the MediaWiki highlighted elements, the pre confines it to the <pre></pre> blocks, and the background-color is set to a light taupe(?).

Sort of like this.

That's great for large scale, general purpose modifications, but you can make very targeted changes as well.

The default syntax highlighting renders numbers in a medium grey. Kind of low contrast and hard to see IMO. I'd like to make them a more contrasty color and maybe bold so they stand out more. We need a little more information first. Checking the properties of syntax highlighted integers I see they belong to a class "mi"; decimal numbers are class "mf". What do those classes mean?

Tokens

The token.py file in Pygments lists all of the possible output tokens, which are converted to span classes in the MediaWiki highlighted text. The Numbers section lists all of the numeric tokens that the tokenizer may emit.

   Number:                        'm',
   Number.Bin:                    'mb',
   Number.Float:                  'mf',
   Number.Hex:                    'mh',
   Number.Integer:                'mi',
   Number.Integer.Long:           'il',
   Number.Oct:                    'mo'

The "mi" elements are integers, and "mf" are floating point numbers. The classifications in Pygments may not match up exactly with the numerics in your language. For instance, in Raku, 1.5 is a rational number, not a floating point number, but we can only work with what's available. The customization should probably include the other numerics too.

On a side note, speaking of working with what's available: the Raku lexer in Pygments gets octal numbers wrong. It treats a decimal number that starts with a zero (0) as octal, but that's a holdover from Perl. In Raku, an octal number is denoted by a leading 0o (much like binary is 0b and hexadecimal is 0x). I should probably file a bug report with Pygments.

So, I'd like to make the numbers black instead of grey, and bold so they stand out a bit. I really only want to affect the language I'm mostly interested in (Raku, if you hadn't guessed), so I want to apply my changes only to the Raku examples. Luckily, there is a class for each highlighting lexer.

For Raku the highlighted code class is .mw-highlight-lang-raku.
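As a small illustration of how the per-language class can be combined with other selectors, here is a sketch that changes the background of Raku code blocks only and leaves every other language alone. It assumes the language class sits on the same wrapper element as .mw-highlight, so that the <pre></pre> block is a descendant of it; the color value is just the one used earlier on this page.

```css
/* Sketch only: restyle the <pre> inside Raku-highlighted blocks.
   Assumes <pre> is a descendant of the element carrying the
   .mw-highlight-lang-raku class. */
.mw-highlight-lang-raku pre {
  background-color: #fff9ee;
}
```

The same pattern should presumably work for other lexers by swapping the lexer name at the end of the class, but that is an inference from the Raku case and not something checked here.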


Putting it all together

I've added these elements to my shared CSS file:

.mw-highlight-lang-raku .mi {
  font-weight: bold;
  color: black;
}

.mw-highlight-lang-raku .mf {
  font-weight: bold;
  color: black;
}

.mw-highlight-lang-raku .mh {
  font-weight: bold;
  color: black;
}

.mw-highlight-lang-raku .mo {
  font-weight: bold;
  color: black;
}

.mw-highlight-lang-raku .mb {
  font-weight: bold;
  color: black;
}

Now all of my numerics in Raku examples show up black and bold.

For example: 1 2.0 3e1 0x5 06 0b111 (bold black) instead of 1 2.0 3e1 0x5 06 0b111 (default grey)

Much easier for me to see. :-)
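The token list above also has two numeric classes that these rules do not cover: "m" (generic Number) and "il" (Number.Integer.Long). Whether the Raku lexer ever emits either one is not established here, so treat the following as an optional sketch in the same pattern, not as something known to be needed.

```css
/* Sketch only: the two remaining numeric token classes from token.py.
   It is an assumption that the Raku lexer emits these at all. */
.mw-highlight-lang-raku .m {
  font-weight: bold;
  color: black;
}

.mw-highlight-lang-raku .il {
  font-weight: bold;
  color: black;
}
```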

Theoretically I should be able to combine all of the numerics that share the same markup into a single rule, but it doesn't seem to work correctly with the site.
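For reference, this is what the combined form would look like in standard CSS: a comma-separated selector list sharing one declaration block. It is a sketch of the syntax only. It has not been verified against the site, and why the combined form misbehaved there is not known.

```css
/* Sketch only: standard CSS selector list, equivalent to the five
   separate rules. Note the language class is repeated in front of
   every token class; writing ".mw-highlight-lang-raku .mi, .mf"
   would style .mf in every language, not just Raku. */
.mw-highlight-lang-raku .mi,
.mw-highlight-lang-raku .mf,
.mw-highlight-lang-raku .mh,
.mw-highlight-lang-raku .mo,
.mw-highlight-lang-raku .mb {
  font-weight: bold;
  color: black;
}
```

One thing worth knowing if you try this: in a selector list, a single invalid selector causes the whole rule to be dropped, so a typo in any one line would make it look as if grouping does not work at all.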

You can look right at my shared CSS file any time you like.


Hope this may be helpful to somebody. --Thundergnat (talk) 20:30, 31 August 2022 (UTC)