tackling comment spam
Tonight I implemented the new global comment spam filters. A few weeks ago, I had built some internal tools that let me quickly track and delete comment spam. Since those tools were only a reactive stopgap, I knew I would eventually have to build some sort of preventive filter to stop spam comments from even getting into journals.
And so this week I committed myself to building the new comment spam filter. And I think it's done.
Fortunately for me, comment spam is localized to a few specific keywords: viagra, cialis, vicodine, xanax, zoloft, casino, blackjack, backgammon, hgh, and a few more I can't remember off the top of my head.
Again, fortunately, these are only being posted by guests. Because I don't have the resources (time, expertise, or server power) to build a "true" Bayesian filter, I've built a rather basic implementation that simply does a keyword check against known "flag" words that I predefine. In future iterations of the spam filter, users will be able to flag comments as spam and delete them, which should send the comment to a script that determines which keywords have a high probability of being associated with spam.
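To give a rough idea of what that future keyword script might look like, here is a minimal sketch in Python. It is not the code running on the site: the function names, the `min_count` and `threshold` parameters, and the simple "fraction of comments containing the word that were spam" score are all assumptions made for illustration, and this is far short of a real Bayesian filter.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase words in a comment (each word counted once per comment)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def candidate_flag_words(spam_comments, ham_comments, min_count=2, threshold=0.9):
    """Suggest new flag words from user-flagged spam (hypothetical helper).

    A word is suggested when it appears in at least `min_count` spam comments
    and at least `threshold` of the comments containing it were spam.
    """
    spam_counts = Counter(w for c in spam_comments for w in tokenize(c))
    ham_counts = Counter(w for c in ham_comments for w in tokenize(c))

    candidates = {}
    for word, spam_n in spam_counts.items():
        if spam_n < min_count:
            continue  # too rare to trust
        score = spam_n / (spam_n + ham_counts[word])
        if score >= threshold:
            candidates[word] = score
    return candidates
```

Anything this suggests would still go through a human before it lands on the flag list, since a word that shows up in a handful of spam comments can easily be innocent.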
If you're a guest, posting a comment, and you have flag words in your comment, guest name, or in the link, you will *then* be prompted with a captcha.
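For the curious, the check described above boils down to something like the following Python sketch. The flag list is just the keywords named in this post, and the function names and whole-word matching are assumptions for illustration rather than the site's actual code.

```python
import re

# Illustrative flag list: only the keywords mentioned in the post.
FLAG_WORDS = {
    "viagra", "cialis", "vicodine", "xanax", "zoloft",
    "casino", "blackjack", "backgammon", "hgh",
}


def contains_flag_word(text):
    """True if any whole word in the text is on the flag list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(word in FLAG_WORDS for word in words)


def needs_captcha(is_guest, comment, guest_name="", link=""):
    """Guests get a captcha only when the comment, name, or link contains a flag word."""
    if not is_guest:
        return False  # logged-in users are never prompted
    return any(contains_flag_word(field) for field in (comment, guest_name, link))
```

Matching on whole words keeps innocent text from tripping the filter by accident, at the cost of missing glued-together variants like "viagra123".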
A captcha (completely automated public Turing test to tell computers and humans apart) is one of those "please input the letters from the following image" type tests, like so:
I absolutely HATE captchas. HATE HATE HATE. I have failed some of those tests repeatedly (which I guess makes me a computer) which is incredibly frustrating. I even posted a joke captcha a while ago:
In any case, I decided to bite the bullet and implement the most rudimentary of all captcha images:
Yes, that is my captcha image. It is mind-numbingly easy to read. Yes, I realize it's very susceptible to screen readers, but if a spammer starts using a screen reader, I can simply spend a few hours obfuscating the text a bit then. No need to do premature optimization on the obfuscation, especially since I hate impossible-to-read captchas.
So basically, anonymous guest posters posting comments with bad spam keywords hit this prompt:
If they pass, then the comment gets posted. Otherwise it's ignored. It's not the most advanced solution, but I'm pretty sure it's going to be enough to cut back on a majority of comment spam as it is. As I keep tuning the spam filter, I'm sure I can catch a higher percentage of spam comments.
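Put together, the flow for a guest comment looks roughly like this Python sketch. The function names, the status strings, and the case-insensitive comparison are assumptions for illustration, not the actual implementation.

```python
def captcha_passed(expected, answer):
    """Compare the typed answer to the captcha text, ignoring case and stray spaces."""
    return answer is not None and answer.strip().lower() == expected.lower()


def handle_guest_comment(flagged, expected=None, answer=None):
    """Decide what happens to a guest comment.

    flagged:  whether the keyword check found a flag word
    expected: the text shown in the captcha image (hypothetical)
    answer:   what the guest typed, or None if not yet prompted
    """
    if not flagged:
        return "posted"   # clean comments go straight through
    if answer is None:
        return "prompt"   # show the captcha first
    return "posted" if captcha_passed(expected, answer) else "ignored"
```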
So if you see any weird things happening with comments, please let me know.
Now the comment spam gets blocked before it even gets posted.