Spammy sites are the kiss of death for any social network. One of my top goals for this site is to keep it as spam-free as possible.

I got back from vacation and found the site was overrun with spam sites. For the past few weeks, I had been spending about 10 minutes every night deleting spam sites by hand (I have tools). This manual process worked well, as it yielded no false positives.

While I was gone, there was a crescendo of spam activity, and Tabulas was overrun with spam accounts. I had to turn on some automated tools.

Fortunately, I had anticipated this day a long time ago. Tabulas actually decomposes your entries for metadata (links, images, anything non-text) and stores them. It became a rather trivial task to write a method that would "score" how spammy your site was based on its age, the links posted, and the length of your content.
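To give a rough idea of what such a scoring method might look like, here is a minimal Python sketch. It is not the actual Tabulas code: the `SiteStats` fields, the point values, the cutoffs, and the threshold are all assumptions made up for illustration. It only uses the three signals mentioned above (site age, links posted, and content length).

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    """Hypothetical per-site summary built from stored entry metadata."""
    age_days: int        # days since the site was created
    link_count: int      # links found across the site's entries
    content_chars: int   # total characters of text across the entries


def spam_score(site: SiteStats) -> int:
    """Return a score from 0 to 10; higher means more spam-like.

    All point values and cutoffs are illustrative guesses.
    """
    score = 0
    # Newly created sites are more suspect than long-lived ones.
    if site.age_days < 7:
        score += 3
    # Many links relative to the amount of text is a strong signal.
    links_per_1k_chars = site.link_count * 1000 / max(site.content_chars, 1)
    if links_per_1k_chars > 5:
        score += 5
    elif links_per_1k_chars > 1:
        score += 2
    # Very short content that still carries links looks like a link drop.
    if site.content_chars < 200 and site.link_count > 0:
        score += 2
    return score


def is_spammy(site: SiteStats, threshold: int = 7) -> bool:
    """Flag a site once its score reaches the (assumed) threshold."""
    return spam_score(site) >= threshold
```

With these made-up numbers, a day-old site with ten links in 150 characters of text gets flagged, while a year-old site with a couple of links in a lot of writing does not. A scheme like this says nothing about sites that post junk with no links at all.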

I turned on the automated tool, and it detected 1000 spam sites. Wowza. Deleted!

I felt pretty good about this tool, so I now have it running in the background. Today, it's found about 250 spammy sites. I checked my manual tool and found very few spam sites, so the automated tool is working quite nicely.

The next step will be dealing with all the spam sites that are creating content without links. I'm not sure what they're trying to accomplish by posting junk content without links - maybe they plan on adding links later, once their link juice is up?

Who knows.

Other changes I made:

  • All comments (from registered users) are now being run through Akismet
  • Site creation from India, China, and Pakistan is banned (users there can still log in and post content just fine)
Posted by roy on July 27, 2010 at 09:30 PM

Comment posted on July 28th, 2010 at 05:27 AM
glad to see real people again on the front page. Thanks Roy!

marvellouslyderanged (guest)

Comment posted on July 27th, 2010 at 09:42 PM
cool!! love it,. way to go Roy.. :)