Village Pump:Home/Syntax Highlighting ( archived 2009-06-18 )

Revision as of 00:54, 23 January 2009 by MikeMol (talk | contribs) (→‎Problem with Lisp: GeSHi update fixed it.)

I'm going to make some major changes to the Syntax Highlighting extension this weekend. Instead of denoting a block of C code as:

<C>(some code)</C>

Code will be denoted as:

<code lang="C">(some code)</code>

This will significantly clean up the MediaWiki extension namespace, and make formatting tricks with CSS easier. I'd rather add an attribute to <pre>, but that looks like it could be more complicated. --Short Circuit 05:52, 2 July 2008 (UTC)

Will we need to go and change all of the previous highlighting then? --Mwn3d 06:11, 2 July 2008 (UTC)
Yes. Both approaches will be supported for a little while, I expect, but the current system will definitely be phased out. --Short Circuit 23:42, 2 July 2008 (UTC)
Isn't it possible to access the raw files behind the Wiki, so one could run a simple replace operation on all of them? Also, is there any way to add syntax highlighting for languages for which it isn't supported yet? --Dirkt 11:07, 6 July 2008 (UTC)
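A minimal sketch of the "simple replace operation" asked about here, assuming the page text is available as a plain string. The tag list and the function name `migrate_tags` are hypothetical and only illustrate the idea; this is not how the site actually performed the migration.

```python
import re

# Hypothetical list of old-style language tags; the wiki used many more.
OLD_TAGS = ["c", "lisp", "php", "java"]

def migrate_tags(wikitext, tags=OLD_TAGS):
    """Rewrite <lang>...</lang> blocks as <code lang="lang">...</code>."""
    names = "|".join(re.escape(t) for t in tags)
    # \1 forces the closing tag to match the opening tag's name.
    pattern = re.compile(rf"<({names})>(.*?)</\1>", re.DOTALL | re.IGNORECASE)
    return pattern.sub(
        lambda m: '<code lang="%s">%s</code>' % (m.group(1), m.group(2)),
        wikitext,
    )

print(migrate_tags("<C>int x;</C> and <pre>untouched</pre>"))
```

Tags that are not in the list, such as <pre>, are left alone, so the rewrite could be run repeatedly while both syntaxes are still supported.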
I'm not sure about the raw files idea (it sounds easy enough), but the new languages idea gets a bit hairy. The latest GeSHi (v1.0.7.22) has support for 96 languages (counting things like "java" and "java5" as different languages). These languages don't necessarily overlap with the 103 here. Newer languages like Rhope and non-computer languages like TI-83 BASIC will probably never have highlighting. In order to add a new language, we would need to make a special PHP file for it with lists of the following:
  • All keywords that could be highlighted (in a jagged two-dimensional array, separated into groups which get different kinds of highlighting)
  • Style codes for each group of keywords, comments, escape characters, brackets, strings, numbers, function names, symbols, scripts, and regexes
  • URLs for each group (if you want keywords to link anywhere like javadocs for Java)
  • Quote characters
  • Symbols from the language (besides math operators)
  • Comment characters
It would also need a regular expression for comments. It would probably be best to talk to the people working on the GeSHi project about adding new languages. --Mwn3d 16:29, 7 July 2008 (UTC)
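To make the list above concrete, here is a rough sketch of the kind of data a language definition carries. Real GeSHi language files are PHP arrays; this Python dictionary only mirrors the listed pieces, and every field name, keyword, and style value in it is an illustrative assumption rather than GeSHi's actual format.

```python
# Hypothetical definition mirroring the pieces listed above (not GeSHi's real format).
HASKELL_SKETCH = {
    "lang_name": "Haskell",
    # Jagged two-level structure: group number -> keywords in that group.
    "keywords": {
        1: ["case", "of", "let", "in", "where", "data"],
        2: ["map", "filter", "foldr"],
    },
    # One style per keyword group, plus styles for other token kinds.
    "styles": {
        "keywords": {1: "color: #06c; font-weight: bold;", 2: "color: #060;"},
        "comments": "color: #888; font-style: italic;",
        "strings": "color: #c00;",
    },
    # Optional documentation URL per keyword group (empty means no link).
    "urls": {1: "", 2: ""},
    "quotemarks": ['"'],
    "symbols": ["->", "<-", "::", "=>"],
    "comment_single": ["--"],
    "comment_multi": {"{-": "-}"},
}

REQUIRED_FIELDS = ["keywords", "styles", "urls", "quotemarks", "symbols", "comment_single"]

def missing_fields(defn):
    """Return the required fields that the definition does not provide."""
    return [name for name in REQUIRED_FIELDS if name not in defn]

def unstyled_keyword_groups(defn):
    """Return keyword groups that have no matching style entry."""
    styled = defn.get("styles", {}).get("keywords", {})
    return [group for group in defn.get("keywords", {}) if group not in styled]

print(missing_fields(HASKELL_SKETCH), unstyled_keyword_groups(HASKELL_SKETCH))
```

A check like this would catch the most likely mistake when copying an existing definition: adding a keyword group without giving it a style.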

Adding new languages

Ok, I had a look at the source in the SVN. It doesn't seem too difficult to add new languages; basically all one has to do is copy the PHP files for one of the existing languages and change them to suit the new one. I guess I could make one for Haskell, say, in less than an hour. BTW, I cannot see 96 languages; the SVN just has C, Codeworker, C#, CSS, Delphi, Doxygen, Eiffel, HTML, Java, Javascript, PHP, QBasic, SQL, VHDL and Web3D. Unless I looked in the wrong place.

I'd like to propose the following:

  • Add a page about syntax highlighting, what languages are available, and what one must do to extend/change the existing highlighting.
  • Link that page from the homepage.
  • Make a copy of the php-files with the syntax highlighting code available via the Wiki. Then people can just grab the code that fits best, and turn it into code for a new language.
  • When they've done that, they should ask someone with admin rights to incorporate the changes. That shouldn't happen too frequently, so the workload for the admins should be tolerable. For security reasons, it's probably a bad idea to allow editing of "live" PHP code.
  • Make <code lang="xyz">...</code> act the same as <pre>...</pre> if there's no syntax highlighting definition for xyz. This would let people write syntax highlighting tags right now, instead of having to replace them all later when syntax highlighting for that language becomes available.
  • Once the php code has settled, one can submit it back upstream.
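The <pre> fallback proposed in the list above could look roughly like this. The registry `HIGHLIGHTERS` and the function `render_code` are hypothetical names for illustration; the actual MediaWiki extension is written in PHP and may be organized quite differently.

```python
import html

# Hypothetical registry: language id -> function that returns highlighted HTML.
HIGHLIGHTERS = {
    "c": lambda src: '<pre class="c">' + html.escape(src) + "</pre>",
}

def render_code(lang, source):
    """Highlight source if a definition for lang exists, else fall back to <pre>."""
    highlighter = HIGHLIGHTERS.get(lang.lower())
    if highlighter is None:
        # No definition yet: behave exactly like a plain <pre> block.
        return "<pre>" + html.escape(source) + "</pre>"
    return highlighter(source)

print(render_code("xyz", "a < b"))
```

With this behavior, pages tagged for a language that has no definition yet would start rendering with highlighting as soon as a definition is registered, with no further edits.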

Rationale:

  • Rosetta is the ideal place to write and test syntax highlighting. There are already many code examples available, and it will be immediately useful. It will also offer motivation to write syntax highlighting for more esoteric languages.
  • I'd like the "feedback loop" to be as short as possible, in true Wiki style. People contribute because they can immediately see the results. If one first has to contact the upstream developers, then wait until the next version of GeSHi comes out, then wait until it's installed at Rosetta, etc., I guess the motivation to do something will be pretty low. It's already bad that one has to ask one of the admins to "go live" with it, but I guess otherwise the security risk is just too great.
  • I offer to do all of the work outlined above myself as far as I am able to :-) So I need someone with admin rights to make a copy of the PHP scripts of the currently installed version available through the Wiki, but I can write the other Wiki pages etc.

Comments? --Dirkt 09:29, 8 July 2008 (UTC)

You can see all of 96 languages here: http://geshi.svn.sourceforge.net/viewvc/geshi/tags/RELEASE_1_0_7_22/geshi-1.0.X/src/geshi/
I guess you could try to just morph existing languages into new ones...I wouldn't want to do it, but I can add them if you make them. Maybe we should talk about it in the IRC channel. --Mwn3d 20:58, 8 July 2008 (UTC)
Ok, that's better :-) I guess those languages should be enough for most cases. BTW, is the new code/lang-tag already active? I cannot get it to render. While we're at it, the math-tag is also broken.
I also created/changed a couple of pages to document the syntax highlighting stuff. Please correct/update as appropriate. --Dirkt 10:26, 9 July 2008 (UTC)
We had a page on GeSHi already. The math tag was never installed. When I made the Formatting page before I told Short Circuit about it, but it wasn't very important back then. We don't deal much with math symbols more complicated than <sup> and <sub> anyway. --Mwn3d 14:18, 9 July 2008 (UTC)
Well, that page isn't particularly easy to find: Special:Whatlinkshere/Help:GeSHi and the two links there list no pages linking to it except the above link you gave. And if you don't know that GeSHi is installed in the first place (and what it is), the name doesn't help, either. IMHO, it's really better to name pages after their function. As for the math tags, some people (not me) did propose problems that need a substantial amount of math, and we have several problems that at least come near a moderate amount of math (like eigenvalues for matrices). So occasionally, the math tags would be useful. Actually, when involved with the first problem mentioned, I spent quite some time rewriting stuff into math tags, only to discover they don't work. I don't know how difficult it is to support them, but at least I would appreciate a clear decision, and easy-to-find information that tells you whether they are supported or not :-) --Dirkt 10:06, 10 July 2008 (UTC)

Problem with Lisp

There is something wrong with the highlighting of certain keywords in Lisp, for example: <lisp>defun</lisp> <lisp>list</lisp> <lisp>length</lisp> --Spoon! 09:03, 3 November 2008 (UTC)

This still hasn't been fixed as of December 31st, 2008.

This still isn't really our problem to fix. Check the discussion above. The syntax highlighter we use is from the open source project GeSHi. If someone could help us find a suitable replacement, that'd be a big help. --Mwn3d 01:00, 1 January 2009 (UTC)
Looks like the GeSHi update fixed it. --Short Circuit 00:54, 23 January 2009 (UTC)

Missing Python keywords

At least two keywords aren't getting highlighted for python syntax, 'any' and 'all'. I suspect that 'with' is also missing.
--64.238.49.65 00:46, 14 November 2008 (UTC)

The latest GeSHi release has the new Python keywords, builtins, and types defined. This would be the GeSHi version released on 25 Dec 2008. --Rldrenth 15:02, 2 January 2009 (UTC)
The new version of GeSHi is installed. --Short Circuit 00:54, 23 January 2009 (UTC)
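As a side note on the three names reported above: in current Python 3, 'with' is a reserved keyword, while 'any' and 'all' are built-in functions, so a highlighter would normally place them in different groups. A small check using only the standard library (the helper name `classify` is just for illustration):

```python
import builtins
import keyword

def classify(name):
    """Report whether a name is a reserved keyword, a builtin, or neither."""
    if keyword.iskeyword(name):
        return "keyword"
    if hasattr(builtins, name):
        return "builtin"
    return "other"

for name in ("with", "any", "all"):
    print(name, classify(name))
```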

Enhancement for C syntax hl

While adding some examples for C, I noticed the following oddities:

  • the parser is not case sensitive (C is!), since it highlighted If as the keyword if --ShinTakezou 14:25, 17 December 2008 (UTC)

<c>if</c> <c>If</c> <c>iF</c> <c>IF</c>

Fixed. I'll send the relevant changes upstream. --Short Circuit 00:53, 23 January 2009 (UTC)
  • the multiline preprocessor defines (using \ at the end of the line) are not handled

--ShinTakezou 14:25, 17 December 2008 (UTC)
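The case-sensitivity issue from the first item can be illustrated with a small sketch. It marks keywords with square brackets instead of real highlighting; the keyword list and the function name `mark_keywords` are assumptions made for the example and do not reflect GeSHi's implementation.

```python
import re

# Small illustrative subset of C keywords.
C_KEYWORDS = ["if", "else", "while", "return"]

def mark_keywords(source, keywords=C_KEYWORDS, case_sensitive=True):
    """Wrap each keyword occurrence in [brackets]."""
    flags = 0 if case_sensitive else re.IGNORECASE
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", flags)
    return pattern.sub(r"[\1]", source)

print(mark_keywords("if If iF IF"))                        # correct for C
print(mark_keywords("if If iF IF", case_sensitive=False))  # the reported behavior
```

For a case-sensitive language such as C, only the lowercase form should be treated as a keyword, which is what the default setting does here.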

It looks like GeSHi handles preprocessor directives by identifying the # as a single-line comment character. In order to use a single # for a multi-line comment, one would need to modify the custom regex field:

<php>'COMMENT_REGEXP' => array(1 => '/\/\/(?:\\\\\\\\|\\\\\\n|.)*$/m'),</php>

Adding another regex to the array to handle multiline preprocessor directives should fix that. My regex is rusty, though. --Short Circuit 00:53, 23 January 2009 (UTC)
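One possible shape for such a regex, sketched in Python rather than in GeSHi's PHP so it can be tried quickly. It treats a backslash immediately followed by a newline as part of the directive. The pattern is an untested-against-GeSHi assumption, not the fix that was sent upstream.

```python
import re

# A directive starts with optional indentation and '#', then runs to the end
# of the line; a backslash-newline pair continues it onto the next line.
DIRECTIVE = re.compile(r"^[ \t]*#(?:\\\n|.)*$", re.MULTILINE)

source = (
    "#define MAX(a, b) \\\n"
    "    ((a) > (b) ? (a) : (b))\n"
    "int x = MAX(1, 2);\n"
    "#include <stdio.h>\n"
)

matches = DIRECTIVE.findall(source)
for m in matches:
    print(repr(m))
```

The alternation tries the backslash-newline case first, so the continuation is consumed before the single-character wildcard gets a chance to stop at the end of the line.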

Update GeSHi

Can we get GeSHi updated? Also I created a Modula-3 language file for GeSHi and submitted it to their sourceforge forums, hopefully it will get put in SVN for their next release. --Mbishop 05:22, 22 January 2009 (UTC)

It's been updated, per your request. Also, I created this 1.5MB file in an attempt to put the GeSHi source files where everyone could see and correct them. I'd intended to place it into a subpage of this one, but MediaWiki OOMs while it parses the wikicode. And I'm not fond of the idea of increasing the interpreter's memory consumption limit again. --Short Circuit 10:02, 22 January 2009 (UTC)
Is there a way to edit that file? Is that the one that the site uses or is it just compiled from the GeSHi files? --Mwn3d 19:14, 22 January 2009 (UTC)
No way to edit it where it currently sits... It's too large for MediaWiki in one large clump. It could be broken into per-language pages, though. --Short Circuit 23:42, 22 January 2009 (UTC)
I think if you do it right, languages you add to the syntax highlighting will show up in the "Parser extension tags" here. I'm not quite sure though. --Mwn3d 13:48, 22 January 2009 (UTC)
I can't seem to get the Modula-3 highlighting to work? I tried <modula3> and even <source lang="modula3">. --Mbishop
<modula3> isn't listed in the parser extension tags on Special:Version. I guess something has to be inserted somewhere in the MediaWiki code, so it knows that <modula3> stuff has to be handed to GeSHi. --Ce 18:45, 22 January 2009 (UTC)
Perhaps. I know the language file works (I tested it on my own Apache), but I didn't test it with MediaWiki, so I'm not sure what needs to be done there. --Mbishop 19:52, 22 January 2009 (UTC)
The MediaWiki extension I'm using may have a specific list of supported languages. I'll check to see what exactly is going on... --Short Circuit 23:42, 22 January 2009 (UTC)
Indeed, that was the problem. Changed. <modula3> should now work. But now the namespace seems to have conflicts. The contribution copyright warning appears to be parsed as a language. This is the kind of thing I was worried about with the namespace pollution. Working on it... --Short Circuit 00:04, 23 January 2009 (UTC)
Fixed. Apparently, there's a language whose GeSHi tag would be <div>. --Short Circuit 00:22, 23 January 2009 (UTC)
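A sketch of one way to guard against this kind of collision: before registering a tag per language, skip any language id that is also an HTML tag name. The reserved set below is a small assumed subset, and `safe_language_tags` is a hypothetical helper, not part of the extension.

```python
# Assumed subset of HTML tag names that must not be taken over by the highlighter.
HTML_TAGS = {"div", "span", "pre", "code", "p", "b", "i"}

def safe_language_tags(language_ids, reserved=HTML_TAGS):
    """Split language ids into those safe to register as tags and those that collide."""
    accepted, rejected = [], []
    for lang in language_ids:
        if lang.lower() in reserved:
            rejected.append(lang)
        else:
            accepted.append(lang)
    return accepted, rejected

print(safe_language_tags(["c", "div", "modula3"]))
```

Rejected languages would still be reachable through the attribute form, <code lang="div">, which is the namespace benefit described at the top of this page.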