tackling comment spam
Tonight I implemented the new global comment spam filters. A few weeks ago, I built some internal tools that let me quickly track and delete comment spam. Since those tools were only a reactive stopgap, I knew I would eventually have to build some sort of preventive filter to stop spam comments from even getting into journals.
And so this week I committed myself to building the new comment spam filter. And I think it's done.
Fortunately for me, comment spam is localized to a few specific keywords: viagra, cialis, vicodine, xanax, zoloft, casino, blackjack, backgammon, hgh, and a few more I can't remember off the top of my head.
Again, fortunately, these are only being posted by guests. Because I don't have the resources (time, expertise, or server power) to build a "true" Bayesian filter, I've built a rather basic implementation that simply checks comments against known "flag" words that I predefine. In future iterations of the spam filter, users will be able to flag comments as spam and delete them, which should send the comment to a script that determines which keywords have a high probability of being associated with spam.
If you're a guest, posting a comment, and you have flag words in your comment, guest name, or in the link, you will *then* be prompted with a captcha.
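For the curious, the check described above could be sketched roughly like this in Python. This is only an illustration, not the code running on the site: the names `FLAG_WORDS`, `contains_flag_word`, and `needs_captcha` are made up for the example, the word list is just the keywords mentioned in this post, and matching on whole words rather than substrings is my assumption.

```python
import re

# Hypothetical flag-word list, copied from the keywords named in this post.
FLAG_WORDS = {
    "viagra", "cialis", "vicodine", "xanax", "zoloft",
    "casino", "blackjack", "backgammon", "hgh",
}


def contains_flag_word(text):
    """Return True if any whole word in the text is a known flag word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in FLAG_WORDS for word in words)


def needs_captcha(is_guest, body, guest_name="", link=""):
    """Only guests are checked; the body, guest name, and link are all scanned."""
    if not is_guest:
        return False
    return any(contains_flag_word(field) for field in (body, guest_name, link))
```

Whole-word matching keeps a short keyword like "hgh" from firing inside unrelated words, at the cost of missing run-together variants such as "viagra4u". A substring check would make the opposite trade.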
A captcha (completely automated public Turing test to tell computers and humans apart) is one of those "please input the letters from the following image" type tests, like so:
I absolutely HATE captchas. HATE HATE HATE. I have failed some of those tests repeatedly (which I guess makes me a computer) which is incredibly frustrating. I even posted a joke captcha a while ago:
In any case, I decided to bite the bullet and implement the most rudimentary of all captcha images:
Yes, that is my captcha image. It is mind-numbingly easy to read. Yes, I realize it's very susceptible to screen readers, but if a spammer starts using a screen reader, I can simply spend a few hours obfuscating the text a bit then. No need to prematurely optimize the obfuscation, especially since I hate impossible-to-read captchas.
So basically, anonymous guest posters posting comments with bad spam keywords hit this prompt:
If they pass, the comment gets posted. Otherwise it's ignored. It's not the most advanced solution, but I'm pretty sure it's going to be enough to cut back on the majority of comment spam as it is. As I keep tuning the spam filter, I'm sure I can catch a higher percentage of spam comments.
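The pass-or-ignore decision can be sketched as a small Python function. Again, this is a rough illustration under assumptions: `process_comment`, its parameters, and the `save` callback are hypothetical names, and comparing the captcha answer case-insensitively is my own choice for the example.

```python
def process_comment(comment, flagged, captcha_answer, expected_answer, save):
    """Decide what happens to a guest comment.

    Returns "posted", "captcha_required", or "ignored".
    `save` is a hypothetical callback that stores the comment.
    """
    if not flagged:
        # No flag words found: post without any prompt.
        save(comment)
        return "posted"
    if captcha_answer is None:
        # Flagged, and the guest has not answered the captcha yet.
        return "captcha_required"
    if captcha_answer.strip().lower() == expected_answer.lower():
        save(comment)
        return "posted"
    # Wrong answer: the comment is silently dropped.
    return "ignored"
```

A quick usage example, with a plain list standing in for the comment store:

```python
posted = []
process_comment("cheap casino chips", True, "XK7Q", "xk7q", posted.append)
```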
So if you see any weird things happening with comments, please let me know.
roy
Now the comment spam gets blocked before it even gets posted.