Recently in News Category

Unexpected downtime

Tuesday, we had a few minutes of unexpected downtime. From the sounds of things, Linode rolled out an update of their node manager, and at least a few nodes* rebooted. Rosetta Code's node also rebooted. It doesn't seem that any data was lost.

* Gauged by anecdotes seen on a live Twitter search I'm no longer watching.
Happened
  • We switched servers from Slicehost to Linode.  Site is running a lot faster, now.
  • ImplSearchBot was shut down.  At present, a rewrite of a rewrite of it is serving up static JSON files built from MediaWiki's category data drawn from the internal database representation.
  • Johannes Rössel created a client-side script in PowerShell to perform the type of work that ImplSearchBot performed, based on the JSON data.
Happenings
  • Opticron is building a MediaWiki extension to do on-the-fly generation of the reports that ImplSearchBot produced.  To this end, a fresh export of many of Rosetta Code's pages was produced.
  • The "Tasks not Implemented in " pages were moved to the new Reports namespace.
Happenings to Be
  • It's well past time to update and upgrade GeSHi again, and to pull in support for the various languages that have cropped up on Rosetta Code.  Michael Mol is part of the GeSHi project, so if there are any changes, features or other concerns that need to be addressed, raise them again on the relevant Syntax Highlighting page.
  • With the update of the blog software, XFeeds has been having issues.  It may be updated, replaced or removed; the specific outcome remains to be seen.
  • The server may get some additional reconfiguration to add support for Squid caching, but doing so for MediaWiki is non-trivial and will require some extensive attention.
  • Michael Mol has found an outside company that is willing to work with Rosetta Code towards the goal of publishing and selling books based on categorical specs provided by any user or visitor of the site. Naturally, the site's GFDL license will be respected; any book sold will have a free electronic copy available for download. Each book is expected to contain a list of contributors to the pages used, as well as the contents of their user pages.
  • Michael Mol has been logging #rosettacode on Freenode continually since 2007, but hasn't yet put those logs online in a consistent and updateable fashion. Assistance and/or advice would be helpful.

Between the growth of the site, the growth of ISB's mission, and other issues unrelated to Rosetta Code, I do not have time to maintain and operate the bot in addition to other site maintenance tasks. As a result, ISB has been disabled since Labor Day 2009, and very likely won't be resumed in its normal role. It needs to be replaced by another bot, mechanism or avenue maintained and operated by someone who has more time available to respond to bugs and feature requests. To this end, ISB has been repurposed to provide fast access to the raw category data of the wiki, avoiding some of the overhead that bots that depend on category data currently face.

I should add that the "unimplemented in X" pages are still needed, or at least the information they provide is, but I don't have the time to maintain the software that explicitly creates and maintains them; there is a backlog of other things I need to work on with respect to site infrastructure, including fixes to old problems and addition of new features. I am very, very open to helping anyone interested in writing a substitute bot get started.

Until I have time to set up the new Report namespace, take a look at these JSON files. There is one JSON file there for every category on the wiki. Each JSON file contains the contents of the relevant category, with the exception of this one, which contains the names of all the categories. There is a running service on the server that updates the JSON files within a few minutes of a page being added to the category. The update is tied to the server's five-second load average, and should almost never take longer than four minutes. The timestamps on the files, for the most part, reflect the last time the category was updated; the files are only written to if the category contents change between checks, or if the file was deleted by manual means.
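
As a rough illustration of tying an update to server load, the following Python sketch waits until the load average falls below a threshold before doing any work. The function name, the threshold, the polling interval and the 240-second cap (borrowed from the four-minute figure above) are all assumptions for illustration, and it reads the one-minute load average from os.getloadavg() rather than whatever measure the actual service uses.

```python
import os
import time


def wait_for_quiet_server(threshold, poll_seconds=5.0, max_wait_seconds=240.0,
                          get_load=lambda: os.getloadavg()[0], sleep=time.sleep):
    """Wait while the load average is at or above `threshold`.

    Gives up once `max_wait_seconds` has been reached or passed, so the
    caller is never held back indefinitely. Returns the seconds spent waiting.
    `get_load` and `sleep` are injectable so the loop can be exercised
    without touching the real system (hypothetical parameters).
    """
    waited = 0.0
    while get_load() >= threshold and waited < max_wait_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
    return waited
```

An updater could call wait_for_quiet_server(2.0) before each pass over the categories; the cap means a busy server delays an update rather than blocking it forever.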

Wander over to ImplSearchBot Fate and Replacement and discuss a replacement for the ImplSearchBot. Some time in the next couple of weeks, I'll be ready to grant Bot privileges to whatever account is to replace ImplSearchBot. Don't limit your ideas to MediaWiki bots, though; there are a variety of other ways the data could be used, from RSS feeds to in-browser widgets.
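
As a starting point for a replacement, here is a minimal Python sketch of the core of an "unimplemented in X" report: the set of tasks minus the set of pages in a language's category. It assumes each category file is a flat JSON array of page titles, which may not match the real file layout; the function names and sample titles are made up for illustration.

```python
import json


def load_members(json_text):
    """Parse one category file, assumed to be a flat JSON array of page titles."""
    return set(json.loads(json_text))


def unimplemented(tasks, implemented):
    """Return the task titles missing from a language's category, sorted."""
    return sorted(tasks - implemented)


# Inline sample data standing in for two downloaded category files.
tasks = load_members('["Hello world", "FizzBuzz", "Quine"]')
in_language = load_members('["FizzBuzz", "Language users page"]')

print(unimplemented(tasks, in_language))  # ['Hello world', 'Quine']
```

Pages in the language category that are not tasks drop out of the result on their own, since only titles present in the task set can appear in the difference.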

Change of hosting, other updates

Rosetta Code is now hosted at Linode, and has roughly twice the physical server capacity that it had at Slicehost. That extra space allowed me to properly tune MySQL, as well as improve usage of memcached and install php5-xcache. I'm still not using Squid, though I was able to pull 5-7Mb/s worth of pages when using HTTP KeepAlives.

ImplSearchBot has been down since Labor Day, and will likely remain down until it is replaced.  More on that later.

You might have noticed either that the blog was down for much of the week following Labor Day, or, alternatively, that the blog looks significantly different.  For a combination of performance and security reasons, I've migrated the Rosetta Code blog from WordPress to Movable Type.  Old posts have formatting issues.  I may go back and correct them as I have time, but of all the traffic data I have, nothing suggests that that would be worthwhile.

The Rosetta Code planet is not being updated at the moment, but that will be rectified this weekend.  Hopefully, I will also be able to finish the infrastructure to provide faster and simpler access to the data that ImplSearchBot depends on.  And I will likely write a couple more blog posts.

News, notes and plans

ImplSearchBot is running again on a maintained basis.  It wasn't buggy in itself, but its traffic pattern did not mesh well with the massive increase in traffic that came from StumbleUpon.  Load averages of 20-50 do not make a server with four logical processors happy.

The initial fix for ImplSearchBot was to change its behavior to pause periodically whenever the server's load average exceeded a threshold value.  When I saw that Rosetta Code visitors were continuing to receive HTTP 500 errors, I dug into matters a bit more deeply.

Part of the problem was using FastCGI to handle the traffic load, and its behavior and the related problems are documented in my previous post.  The solution available to me was to switch from FastCGI back to mod_php.  Conveniently, mod_php doesn't time out within any time frame that the average Web user will notice.  Inconveniently, mod_php will not work with Apache's mpm_worker (Would someone mind making mod_php and extensions thread-safe?), so the server is still stuck with a separate copy of all of PHP's static runtime data for each process, rather than sharing between threads.  So, memory-wise, we're no more efficient than with FastCGI and php-cgi.

Also inconveniently, it required additional configuration to limit the number of concurrent Apache processes running; the default settings for mpm_prefork on Ubuntu don't see any problem with having twenty-five or so concurrent processes running.  Within ten minutes of restarting Apache with the new settings, the server bogged down under enough concurrent processes that it became totally unresponsive to external stimuli; the last thing I could get out of the console before a hard reboot was an OOM message regarding MySQL.  To this end, I configured Apache in a way that should limit the number of concurrent processes that get spawned.

Then we ran into a problem with search engine crawler bots.
They were requesting pages so quickly that they were either causing too many Apache threads to be spawned, or, after I reconfigured Apache, filling the "waiting clients" queue, causing legitimate users to have to wait an undue amount of time.  I tested with my browser, and my browser timed out.

First to bat was our old nemesis, Yahoo! Slurp.  I long suspected Slurp of ignoring robots.txt, or at least the Crawl-delay directive, as the primary culprit of most reports I've had of HTTP 500 errors turned out to be a large number of rapid-fire requests from Slurp's user agent.  Well, I've finally discovered that Slurp doesn't exactly ignore Crawl-delay.  Rather, rosettacode.org was getting hit by five concurrent crawls by Slurp bots coming from different IP addresses.  Whatever value I set for Crawl-delay was effectively being divided by five on average, and nothing prevents those bots from all requesting different pages within the same half-second time span.

I increased Crawl-delay.  If Slurp is identified as the cause of problems again, I won't have much of a choice but to disallow it entirely.

Next to bat was 80legs.  Their distributed crawl system, clever as it is, was hitting the server with a request every second.  Their crawler bot does not support Crawl-delay, but they responded quickly to my request for throttling.

Finally, I had to slow down Twiceler, which, while not as heavy as Slurp or 80legs, was still heavy.  Twiceler supports Crawl-delay.  So do Google's indexer, Bing's and a fair number of others.  I added Crawl-delay to the global section of robots.txt.

Things seem to be running fairly snappy, now.  Not as good as they could, but it's running.

While all these were happening, I received three different offers of hosting.  One from someone in #rosettacode whom I don't know very well, one from someone in #perl whom I don't know at all, and one offer from Boise On-Call IT, which is a small company run by someone I know both socially and professionally.
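
To make the Crawl-delay arithmetic described for Slurp concrete, here is a small Python sketch. It parses a robots.txt with Crawl-delay in the global section using the standard library's urllib.robotparser, then divides the delay by the number of concurrent crawlers. The delay of 10 seconds is an illustrative value, not the site's real setting, and the even division is a simplification of "divided by five on average".

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: a Crawl-delay in the global (User-agent: *) section.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler with no section of its own falls back to the global entry.
delay = parser.crawl_delay("Twiceler")

# Five crawlers that each honor the delay independently still produce,
# on average, one request every delay / 5 seconds between them.
concurrent_crawlers = 5
effective_interval = delay / concurrent_crawlers

print(delay, effective_interval)
```

The same division explains why raising Crawl-delay helps only in proportion: the server sees the configured delay divided by however many crawlers share the user agent.
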
For now, I'm trying to work things out with Boise On-Call IT services (there's a potential for technical incompatibility; Rosetta Code has had some heavy configuration done to be able to run reasonably well in a small footprint, and more is needed, if only for the sake of aiding expandability).  If that doesn't work out, I'll revisit the other offers, as well as consider the possibility of moving to a larger dedicated host, but that's going to cost.

I've not yet found good commercial colocation facilities local to me (Grand Rapids, Michigan), and running things out of my home carries with it a large number of unpleasant problems with infrastructure quality and cost complications.

Finally, I still need to tune MySQL.  A large portion of it is sitting in swap, and that needs to be fixed.  As with Apache, MySQL's default settings are more suitable for a system with far more memory than the 256MB slice Rosetta Code is running on currently.  I've also discovered...  If I can tune MySQL to not require so much RAM, then it's conceivable that memcached might be able to have more than 3MB of its allocated 64MB in physical memory, rather than swap...

With the system running as well as it is at the moment (admittedly, it feels a tad slower than, say, a couple of weeks ago), I've changed ImplSearchBot's run schedule.  Rather than running every four hours as it did as recently as two weeks ago, or even once per day as I rescheduled during the StumbleUpon influx, it's running continually.  Every time it finishes its cycle, it starts over.  With the way it's currently coded, and with the current non-internal server load, every page that ImplSearchBot regularly touches is being updated within an hour and a half.
Future updates should make that a worst-case scenario, with the normal case being on the inside of fifteen minutes, or even within a minute if there's nothing else to do.

One of the most common and recommended things to do for MediaWiki is to use Squid as an accelerator cache.  Unfortunately, that's not really an option for Rosetta Code, as it requires patching and doing a custom build of Squid for full effectiveness, and it's possible that the site may need to move to a server where custom builds of such software aren't an option.  There's also the question of compatibility with other software packages which don't support the Vary HTTP header, etc.

Another problem I've witnessed in MediaWiki is Monobook's usage of PHP-driven CSS and JavaScript files.  Install Firebug, navigate to any page on Rosetta Code's wiki, and watch the transferred files as you do a full refresh.  Anything that pulls from index.php requires the server to process your request with PHP, which means hitting the database again, which blocks the Apache process, which either (pre-mpm-reconfigure) spawns another Apache process, potentially causing overuse of swap, or (post-mpm-reconfigure) holds up the client request queue longer.  I don't mind Common.css so much; that's rather important whenever things need to change.  The other requests for PHP-provided styling and client-side scripting currently return empty files, which means database utilization without providing any utility to most end-users.  (The returned files may be edited on a per-user basis in their preferences.)

It's been suggested (and I've even broached the subject myself in the past) that Rosetta Code move away from MediaWiki.
While it could potentially ease the pain of some of our current problems, I know of no other affordable CMS with as strong a core developer base, as strong an install base, as familiar an editing interface, as smooth an upgrade path history, as strong a modding community, or as long-viewed in probable funding and continued development.  In other words, while it can be a pain to bend to our needs, it's at least stable, codewise, and I'm usually more than happy to bend things to do what they weren't originally intended to do.

However, there is one piece of software that's currently public-facing that I would like to do without.  I want to get rid of WordPress.  Yes, it has a large developer and modding community.  Yes, it likely has a long future as a software package.  However, its upgrade path is painful, it's computationally expensive to run (it hasn't had a good built-in cache, and the previous plugin we were using was abandoned by its developer), and I'd like to find something better.  Most promising is Movable Type; I like the idea of serving up static files from disk unless things are actually being posted.

Finally, one of the barriers to getting and maintaining better hosting is lifting slightly; within a few months, Rosetta Code should start being able to sell books derived from its content.  The plan is to allow users to request a book with content chosen based on a rule set, have that book be generated programmatically, and have the PDF and printed version be available at the same time: the PDF as a free download, and the printed version via a Print-On-Demand service.  Money gained through sales via the POD service would pay for the manual component of producing the books (I never found a POD service I could comfortably fully automate things with), as well as pay for upgraded hosting and other improvements to the site. (It would be nice, for example, to have a paid server admin, or backend developer, or...)
Additionally, the ruleset that chose the book's content would be saved, and the book would be re-released in subsequent editions periodically.  I might even be able to automate submitting the PDF versions to archive.org.

Language template changed

Hey guys, it's Mwn3d. You may have noticed a change to the language template. This change was slowly developed in the language beta template, discussed, and implemented, but there may still be problems. Go to the category page of your favorite languages (and some of your least favorite languages) and make sure everything looks ok. Here's a little checklist:
  1. The lang tags can be checked against this list.
  2. Fill in any of the other features for any language.
  3. Move links to official sites in the category text to the "site" parameter in the template.
  4. If you see an error in the features for a language, fix it or discuss it in the language's talk page.
  5. If you see a problem with the way the tables or div boxes are set up, fix it or discuss it on the language template talk page.
  6. If you just want to spruce up the template with color or something cool like that, go ahead. It's a wiki so we can just undo it if we don't like it.
Also, for anyone working in any of the languages that are newer to RC, the language comparison table could use more rows. It's a good quick reference for people who want to see how a new language works.

Calling the POD People

Hello, it's me, Mike Mol.  I'm writing here today because I'd like to do something, and I don't know how to do it.  While that's generally the case for the folks who visit Rosetta Code, this particular question can't be solved by comparing two or more programming languages, or by putting up a Task and seeing how other people do it.

I would like to extend Rosetta Code to print.  As in bound hard-copy dead trees.  I'd like for Rosetta Code to sell one or more books that take a few languages, show those languages side-by-side for various tasks, and, of course, list all the contributors to those code samples, and include a URL where the print-ready PDF is available. (It is GFDL content, after all.)

To go from print-ready PDF to an actual hardcopy book, I need a printer.  To avoid dealing with sales, I need a publisher. (I have absolutely no interest in dealing with the headaches revolving around processing payments from PayPal, checks or any other payment vendor, canceled checks, refunds, returns, you name it.)  So far, POD services like Lulu would seem to be the best option.

Of course, I'd like to avoid as much editorial and layout work as possible, so I'll most likely automate the entire process, and therein lies the problem; any time a new book is ready to go out, I'll need to upload it and set everything up.  I would much, much, much prefer to be able to use a service where an API allows me to programmatically do all the work I would otherwise have to do by hand.  I have no complaints about writing the code on my end to interact with such a thing.

If I can get that far, then the skies open up with a realm of possibilities. I could provide a page where anyone could request that a book be published that compares the languages they'd like to see compared, or which includes all of the tasks that they're interested in.
They click Submit, the script spends a day or two preparing the book, and then the PDF and book get published simultaneously, and they can have their copy (or copies) within a week or two.

There are a number of cases where somebody might want one (or more) hardcopies.  This could be particularly valuable to teachers who want to contrast a set of languages, showcase a particular language, provide example code for a number of algorithms or other problems, etc.

Yes, there's obviously concern about ill-timed vandalism.  Each PDF would probably be held for a little while before being sent on to the publisher.  That would mostly be a case of watching the pages that were included for signs that they'd been vandalized. (Rosetta Code has an awesome community for watching for vandalism.)  I suppose somebody could get themselves listed as a contributor under an obscene name, but I can work around that, too.

Upgrades complete

Rosetta Code's MediaWiki and WordPress installs have been updated to the latest stable versions.  For the curious, the wiki database backup weighs in at 129MB.  Please email me if something's broken.
--Short Circuit

Comments re-enabled

Comments have been re-enabled.  Comments with 5 links or more will not be posted.  Comments with an incorrectly-filled CAPTCHA will probably not be posted.

Comments disabled

Due to a deluge of spam getting past reCAPTCHA over the last two days, comments have been temporarily disabled.

About this Archive

This page is an archive of recent entries in the News category.
