Blog comment spam taken to the next level?
Posted by Jonas Elfström Wed, 19 May 2010 21:01:00 GMT
Only days after starting this barely visited blog the comment spammers showed up. I activated Akismet and it solved my problem for a while. Over time the spamming got worse and although almost no spam actually showed up on the blog, my fear of false positives still forced me to remove the spam manually. Not my idea of fun.
I then activated a very simple hurdle: I required JavaScript to post comments. Well, not really. What Typo actually did was expect the HTTP header X-Requested-With: XMLHttpRequest. That worked flawlessly for about three years. Just six weeks ago I noticed the first spambots including the needed header to pass through. Almost all posted stupid drug ads that Akismet easily identified as spam. The situation was still under control.
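To make the header hurdle concrete, here is a minimal sketch in Python. It is not Typo's actual code (Typo is a Ruby application); the function name and the plain dictionary of headers are assumptions made for the illustration.

```python
def looks_like_ajax(headers):
    """Return True if the request carries the header that JavaScript
    libraries commonly add to Ajax calls.

    `headers` is assumed to be a plain dict of header name -> value.
    """
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-requested-with", "") == "XMLHttpRequest"


# A browser posting through JavaScript passes the check ...
browser_post = {"X-Requested-With": "XMLHttpRequest", "User-Agent": "Firefox"}
# ... a naive bot posting the raw form does not ...
naive_bot = {"User-Agent": "bot"}
# ... but nothing stops a bot from adding the header itself.
smarter_bot = {"User-Agent": "bot", "x-requested-with": "XMLHttpRequest"}
```

The last case is the weakness: the check only filters out bots that never bothered to add one extra line to their requests.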
Yesterday something completely new happened. I got a comment on an old post about my admiration of Ira Glass and This American Life. The comment seemed believable enough and it passed the spam filter. Still, the link at the end undeniably identified it as spam. Then there was another, and another, and up until now a total of six spam comments in the new and more advanced format. I'm still not sure if these are scripted but it seems an awful lot of work if they are actually manually typed.
And also these on a post about infinite ranges in C#:
What bothers me most are the ones that both refer to the content of the post and mention that it's not about programming, as most of my posts are.
Are these human or machine made? A clever combination? Also, if you happen to be a spambot and actually answer this then I guess I will have to congratulate you for passing the Turing test.
PS. They spammed from 110.0.0.0 - 110.255.255.255, so if you happen to have problems with the same spammers and aren't worried about blocking the Philippines then you know what to do. DS.
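For anyone who wants to act on that range, here is a small sketch using Python's standard ipaddress module. The function name is made up for the example, and how the check gets wired into a blog engine or firewall is left open.

```python
import ipaddress

# The range mentioned above, collapsed into CIDR notation.
BLOCKED_NETWORKS = list(
    ipaddress.summarize_address_range(
        ipaddress.ip_address("110.0.0.0"),
        ipaddress.ip_address("110.255.255.255"),
    )
)  # a single network, 110.0.0.0/8


def is_blocked(remote_addr):
    """Return True if the client address falls inside a blocked network."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in BLOCKED_NETWORKS)
```

Keep in mind that a /8 covers 16,777,216 addresses, so a block like this will most likely shut out far more than the spammers.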
Good article. One of the challenges we faced building DirtyPhoneBook, besides the obvious scaling and legal concerns, was how to deal with spam. We took a multi-tiered approach.
a) As of now, not auto-linking URLs to decrease the incentive to spam.
b) Rate-limiting commenting.
c) Using Akismet for spam control and having other services code ready to go if need be.
d) Having a reporting system for users to report spam.
e) Having a captcha system (which will soon also be a number verification system to remove that captcha).
f) Making some form data hidden fields change so that robots can’t easily be programmed to submit comments.
There are a few other things we did but those are some of the basics and despite a few serious attempts to spam us so far nothing has gone through.
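Two of the items above, rate-limiting (b) and changing hidden fields (f), can be sketched roughly as follows in Python. This is not DirtyPhoneBook's implementation; the secret key, the limits, and all function names are assumptions chosen for the illustration.

```python
import hashlib
import hmac
import time
from collections import defaultdict, deque

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MAX_COMMENTS = 3            # hypothetical: comments allowed per window
WINDOW_SECONDS = 600        # hypothetical: window length
MAX_TOKEN_AGE = 3600        # hypothetical: how long a form stays valid

_recent = defaultdict(deque)  # client id -> timestamps of accepted comments


def allow_comment(client_id, now=None):
    """Sliding-window rate limit per client (for example per IP address)."""
    now = time.time() if now is None else now
    stamps = _recent[client_id]
    # Forget comments that have fallen out of the window.
    while stamps and now - stamps[0] >= WINDOW_SECONDS:
        stamps.popleft()
    if len(stamps) >= MAX_COMMENTS:
        return False
    stamps.append(now)
    return True


def _mac(post_id, issued):
    message = f"{post_id}:{issued}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_form_token(post_id, now=None):
    """Value for a hidden form field; it changes every time the form is rendered."""
    issued = int(time.time() if now is None else now)
    return f"{issued}:{_mac(post_id, issued)}"


def check_form_token(post_id, token, now=None):
    """Accept only tokens this server issued for this post, and only while fresh."""
    now = time.time() if now is None else now
    try:
        issued_text, mac = token.split(":", 1)
        issued = int(issued_text)
    except ValueError:
        return False
    genuine = hmac.compare_digest(mac.encode(), _mac(post_id, issued).encode())
    return genuine and 0 <= now - issued <= MAX_TOKEN_AGE
```

A bot that replays a recorded form submission would then fail once the token expires, and one that hammers the form would run into the rate limit.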
@Charles thanks, and yes there are certainly a lot of things I could do to try to stop these.
In this case the new kind of spam made me curious to find out if it is generated automatically, from the content of my blog, or if some kind of virtual sweatshop is involved. Because if these comments are purely software based then they are certainly up there with ELIZA, and that’s a new breed of spam, at least for me.
Seeing as the spam comments are well-spelled and have no obvious (American?) internet-twitter-sms acronyms and misspellings, there’s no chance a normal internet user could have written them. They must be machine-made! ;)
Are there any thoughts going on regarding hash cash for blog comments? Now that spam bots are able to analyze content, parse and execute JavaScript and possibly pass CAPTCHAs, is it time to simply make it more time-consuming for them to do so? Or is this idea dead due to a fundamental flaw in the thinking?
For example, when I want to comment on this article, I could get a hidden field with a checksum of a nonce. Upon submission, I would be required to submit the nonce itself, expensively calculated in JavaScript, and the checksum. The server could quickly verify that I have done the calculation.
This never took off with mail, but I believe that is partly due to inertia and the needed support in mail clients. In the blog comment case, you control part of the client, so that every JavaScript-enabled browser has support for the scheme.
@clacke Interesting idea, but wouldn’t it be kind of cruel to users on slow mobile devices and such? There’s also a huge difference in speed between the JavaScript engines in the current browsers. IE is still very common and way behind in speed (and more).
If the problem gets worse I think I will add a very simple custom text based CAPTCHA. I have that on a page that’s been online and indexed by search engines for 11 years. There’s still not one single spambot that has figured it out. They sure do try to post though.
I’m trying to avoid requiring a login (OpenID, Twitter, FB, and so on) to post because I like the idea of it being very simple to comment. Also some commenters may like the anonymity.
Yes, the mobile device issue would probably be a problem. Although with mobile devices these days running on > 1GHz Snapdragons, maybe not as big a problem as you think. I’m not talking about minutes here. Only enough to make it less cost-effective to run large-scale spambots. But maybe someone with more insight will tell me that the spammers have access to huge botnets anyway, making the CPU time cheap enough that a scheme like this is meaningless unless it creates a problem even for legitimate users.
Is your CAPTCHA home-grown? I have been thinking about this before. With a few widespread CAPTCHA implementations, spambots would probably focus on trying to get around those. Any site running a custom implementation, with some personal twist, would need the spammers’ personal attention and thus escape the bulk of the spam.
Yes, and ridiculously simple. The question is “8th word on this page” and the word is always the same and is even present in the title of the page.
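A check like that takes only a few lines. The sketch below is in Python and is not the code running on that page; the page text, the function names, and the punctuation handling are assumptions for the example.

```python
import string

# Hypothetical page text; on the real page the answer never changes.
PAGE_TEXT = "Guestbook: sign here and leave a short friendly note for us"


def nth_word(page_text, n):
    """Return the n-th whitespace-separated word (1-based), without punctuation."""
    words = page_text.split()
    if not 1 <= n <= len(words):
        raise ValueError("the page has no such word")
    return words[n - 1].strip(string.punctuation)


def check_captcha(page_text, answer, n=8):
    """Compare the visitor's answer with the n-th word, ignoring case and spaces."""
    return answer.strip().lower() == nth_word(page_text, n).lower()
```

Its strength has nothing to do with difficulty. It works because nobody has bothered to write a bot for this one page.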
Exactly, but only until they perfect their AI. :)
Well, the mother of all wikis at c2.com still uses “Type the code word, 567, here”, which seems to be working reasonably well, combined with the efforts of moderators and ad-hoc temporary write-blocking of troublesome nets.
So, are you predicting spam overtaking military and porn as the main technology driver? ;-)
Spam, bacon, sausage and blog spam: a JavaScript approach - by Sam Saffron
A couple of minutes ago I finally hacked together a JavaScript based approach. It will be very interesting to see if I will get any spam in the next couple of days.
It didn’t work at all. Either they have a spambot that runs JavaScript or it’s posted by humans. Next try will be showing the numbers as images. If it’s a spambot I hope that will stop them for a while.
Now the numbers in the comment calc question are images. Let the countdown begin!
It took them three hours and my conclusion is that this actually is human-entered spam. Fascinating! How can that ever be a profitable business?
Possible new kind of spam http://ezliu.com/my-favorite-spammers-world-of-tanks/