How can we fight the spam that is attacking RC these days? I don't much like the blacklisting of netblocks that has been suggested elsewhere, since that could block "common" people too. I saw a brand-new spam user after posting to Named Argument, and discovered that I can't do much more than say I've seen it. I've also noticed that the names of these spammers follow a pattern which could be identified (but that will likely change...) ... --ShinTakezou 11:17, 29 June 2009 (UTC)

On the Tcler's Wiki, we block problem netblocks from making updates (well, we actually show the spammer a preview page but never commit the edit to the database, which is a nicer solution, as they think they've spammed you successfully), but without seeing the logs of the addresses those spam users are being created from, it's hard to tell whether that will work here. It's a fairly stupid spammer, though, since external links are all nofollow-marked. Maybe simple techniques will work for now, plus visibly blocking that netblock from creating new users. —Donal Fellows 13:51, 29 June 2009 (UTC)
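For reference, a rough sketch of what that silent-discard trick might look like on a MediaWiki installation like this one (the Tcler's Wiki is not MediaWiki, so this is an adaptation rather than its actual code; the netblock list and function name are invented for illustration):

 // LocalSettings.php -- uses the core 'ArticleSave' hook, which can veto a save
 $blockedNets = array( '192.0.2.' );  // hypothetical problem netblock prefixes
 
 $wgHooks['ArticleSave'][] = 'dropEditsFromBlockedNets';
 function dropEditsFromBlockedNets( $article, $user, $text, $summary ) {
     global $blockedNets;
     $ip = wfGetIP();  // the requesting client's IP address
     foreach ( $blockedNets as $net ) {
         if ( strpos( $ip, $net ) === 0 ) {
             return false;  // stop the save; the edit never reaches the database
         }
     }
     return true;  // everyone else saves normally
 }

Making the spammer see an apparently successful save, as described above, would take extra work on top of this; returning false just stops the save.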
I didn't want to block the IPs because we had previously had a problem with an IP collision with a legitimate user. I'm not really sure what else I can do. We do have a CAPTCHA, but maybe it's not good enough. --Mwn3d 13:57, 29 June 2009 (UTC)
Since I think it's not robotic spam, I can't see that a CAPTCHA would help. —Donal Fellows 14:13, 29 June 2009 (UTC)
Yeah, and I wouldn't suggest turning off anonymous edits, because we've had a recent surge of legitimate anonymous editors (and some people would probably find that inconvenient). We may just have to keep up the old-fashioned delete-and-block strategy. --Mwn3d 14:25, 29 June 2009 (UTC)
Gah. Drop off the face of the planet for a weekend and come back to another spam influx. It could very well be robotic spam if they have a human being sign up the account; CAPTCHAs are only presented on anonymous edits, account creation, and login failures. Those settings have worked well for us for the better part of two years. Roboticizing after account creation was an eventuality, but it depended on someone deciding that RC was a big enough target to go the extra steps. (And extra steps are something that the spam economic model tends to avoid; they'd rather hit more weak targets than fewer higher-profile ones.) I'm not going to have time to tweak the server settings for a few days, at least. In the meantime, let's watch to see whether the problem is going to be bad enough to warrant significant attention. (Unless they've broken reCAPTCHA, it's roughly 1:1 manual labor, which is uneconomic for spammers.) If need be, it might be possible to do a halfway-block; rather than an outright ban on a user or IP, force all edits from them to go through reCAPTCHA. But that will likely require modding an extension, which I don't have time for right now. --Short Circuit 16:17, 29 June 2009 (UTC)
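For anyone who ends up tweaking this later, the trigger setup described above maps onto ConfirmEdit settings roughly like this (the setting names are the extension's own; the exact values on RC's server are an assumption):

 // LocalSettings.php -- ConfirmEdit with the reCAPTCHA back end
 require_once( "$IP/extensions/ConfirmEdit/ConfirmEdit.php" );
 require_once( "$IP/extensions/ConfirmEdit/ReCaptcha.php" );
 $wgCaptchaClass = 'ReCaptcha';  // plus the reCAPTCHA API key settings
 
 $wgCaptchaTriggers['edit']          = true;  // edits get a captcha...
 $wgCaptchaTriggers['create']        = true;  // ...including page creation
 $wgCaptchaTriggers['createaccount'] = true;  // ...and account sign-up
 $wgCaptchaTriggers['badlogin']      = true;  // ...and repeated failed logins
 // ...but registered users are exempt, so in practice only anonymous
 // edits ever see it:
 $wgGroupPermissions['user']['skipcaptcha'] = true;

The halfway-block would amount to taking skipcaptcha away again for particular users or IPs, and as far as I know ConfirmEdit has no per-user knob for that out of the box, hence the extension modding.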
I don't know if it's possible, but we can deny accounts containing "buy" in their names. --Guga360 16:35, 29 June 2009 (UTC)
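It is possible, for what it's worth: MediaWiki's TitleBlacklist extension can veto account names by regex (whether it is installed here, and whether "buy" is a safe pattern, are open questions):

 // LocalSettings.php
 require_once( "$IP/extensions/TitleBlacklist/TitleBlacklist.php" );
 // then, on the wiki page [[MediaWiki:Titleblacklist]], one pattern per line:
 //   .*buy.* <newaccountonly>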
If the accounts are manually created, this will not gain you much. As soon as the spammer gets the error message, he'll just change the account name to something that works. A better idea would be to special-case edits that add hyperlinks and demand a captcha for those even from logged-in users. That would stop bots adding links while not affecting normal users too much (few legitimate edits contain external links, so having to solve a captcha in those cases would not be much of a burden). You could also maintain a whitelist of URLs not protected by captchas (e.g. everything on wikipedia.org) to minimize the impact on legitimate edits. --Ce 09:10, 30 June 2009 (UTC)
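Both halves of that suggestion already exist in ConfirmEdit, if I'm remembering the whitelist knob correctly (real setting names; the whitelist pattern is only an example):

 // captcha any edit that adds new external links, even from logged-in users
 $wgCaptchaTriggers['addurl'] = true;
 $wgGroupPermissions['user']['skipcaptcha'] = false;  // don't exempt accounts
 // (with this, the plain 'edit' trigger would need to be off, or logged-in
 // users would be captchaed on every edit, links or not)
 // URLs matching this regex never trigger the captcha:
 $wgCaptchaWhitelist = '#^https?://([a-z0-9-]+\.)?wikipedia\.org(/|$)#i';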
For links to Wikipedia there's the special wp: interwiki prefix. —Donal Fellows 11:01, 30 June 2009 (UTC)
Don't give an error message to the spammy accounts. Just silently fail to commit any changes they make. (Better would be giving them their own view of the world, but that's more work.) —Donal Fellows 11:13, 30 June 2009 (UTC)
Then diagnosing and resolving false positives would be a PITA. --Short Circuit 14:55, 30 June 2009 (UTC)
If links trigger captchas, then the bots will just post raw URLs. I've seen that one before... --Short Circuit 14:55, 30 June 2009 (UTC)
A raw URL is a link in the wiki, so naturally it should trigger a captcha, too. --PauliKL 09:14, 2 July 2009 (UTC)

Timing

The creation of each of those accounts requires some form of manual attention. Keep an eye out for a pattern in when they appear. For someone to go to that much work to spam a site like this is rather odd. --Short Circuit 00:26, 1 July 2009 (UTC)

We're seeing the same sort of spam at the erights.org wiki, also a MediaWiki; if you want to do analysis looking there as well might be useful. (Feel free to help with the deleting, of course :-) ) --Kevin Reid 00:42, 1 July 2009 (UTC)

Looks like CAPTCHAs don't work

We're still getting spammed even with annoying levels of CAPTCHAs. Looks like this is some ass doing it manually or they've broken reCAPTCHA, though the fairly low rate of spamming indicates that this is probably manual. Time to ban some netblocks from doing updates to the database, given that nuking from orbit isn't an option. (When spam is a problem, there's no point trying half-measures first. They won't work. Spammers are the scum of the earth and have a financial incentive to boot.) —Donal Fellows 11:18, 1 July 2009 (UTC)
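For the record, netblock bans don't need new code: Special:Block accepts CIDR ranges (e.g. 192.0.2.0/24) as long as range blocks are enabled, which I believe they are by default:

 // LocalSettings.php -- allow sysops to block IP ranges
 $wgSysopRangeBans = true;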

Read this. It seems that it's likely a manual effort in an attempt to create landing pages. Banning netblocks isn't really going to help, as Tor makes for an easy workaround. At this point, I'm thinking of either utilizing an RBL (sketched below) or coming up with a Bayes-based edit filter built on the ConfirmEdit extension. (Ham gets marked via MediaWiki's patrol mechanism, while spam gets marked by page deletion.)
The other thought is that spammers are putting manual effort into creating landing pages for email campaigns and the like. We could conceivably #REDIRECT the spam pages to a common target page for the time being. --Short Circuit 18:01, 1 July 2009 (UTC)
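On the RBL idea above: core MediaWiki of this vintage already ships a DNS-blacklist check for anonymous editors, which is probably the cheapest starting point; the Bayes filter would be custom extension work. A sketch, assuming the stock SORBS-style settings (caveat: these spammers register accounts, so this check alone may need widening to bite):

 // LocalSettings.php -- check anonymous editors against a DNS blacklist
 $wgEnableSorbs = true;                  // enable the DNSBL lookup
 $wgSorbsUrl = 'http.dnsbl.sorbs.net.';  // which blacklist zone to query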
Blacklists and Bayesian filters are not very effective ways to filter spam, and they create false positives. Good spam filtering is based on what the spammers are actually selling: their contact information (e-mail address, web address, etc.). I would think there are only one or a few spammers that bother to manually create pages here on Rosetta Code, so it should be possible to add their contact information to the spam filter manually. --PauliKL 09:59, 2 July 2009 (UTC)
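MediaWiki's core $wgSpamRegex setting does exactly this kind of manual, content-based filtering: any edit whose text matches the pattern is refused. A sketch with placeholder patterns (the real ones would be the domains the spammers are pushing):

 // LocalSettings.php -- refuse any edit mentioning the spammers' own URLs
 $wgSpamRegex = '/spam-domain-one\.example|spam-domain-two\.example/i';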
You could also try simply turning off the creation of new accounts for a while (e.g., a couple of weeks) to encourage the spammers to go elsewhere. The number of genuine new users put off by this is probably going to be quite small, and the problem does at least seem to be confined to user pages. —Donal Fellows 13:23, 2 July 2009 (UTC)
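That's a single line in LocalSettings.php, and easy to flip back afterwards (the '*' group means everyone, including anonymous visitors; sysops typically retain the right through their own group):

 // nobody can create accounts until this is reverted
 $wgGroupPermissions['*']['createaccount'] = false;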