I've been doing some large-scale API/syndication work (part of the rykorp/Opinmind partnership). After pushing all public data to Opinmind, I got back a count of roughly 400K entries pushed to them. I found this rather interesting, since the current database counts show close to 900,000 entries posted.

There are roughly 110,000 entries that are posted at a higher privacy level than "public": ~65K are posted at "friends-only", while ~45K are posted at "private."

This means roughly 1/8 of all posts are published specifically to be protected. If you also count the accounts which specifically ask search engines not to crawl them and those which list themselves as not "public" in the Tabulas directory, most of Tabulas is actually inside a gated community.
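A quick sketch of that arithmetic, using only the rounded counts quoted above (the variable names are made up for illustration, and the real numbers are approximate):

```python
# Rounded counts quoted in this post; treat them as approximations.
total_entries = 900_000   # entries in the database
pushed_public = 400_000   # entries actually pushed to Opinmind
friends_only = 65_000     # entries marked "friends-only"
private = 45_000          # entries marked "private"

# Entries protected at the individual-entry level.
protected = friends_only + private
protected_share = protected / total_entries

# Share of all entries that went out as public data.
pushed_share = pushed_public / total_entries

print(f"protected entries: {protected:,} ({protected_share:.1%} of all entries)")
print(f"pushed as public:  {pushed_public:,} ({pushed_share:.1%} of all entries)")
```

The protected share comes out a little over 12%, which is where the "roughly 1/8" figure comes from, and less than half of all entries were pushed as public.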

I can only imagine this number will increase dramatically once the new Tabulas privacy controls get released - how many people will actually opt to publish everything publicly when they will have greater granularity over who can read it?

This seems to be slightly related to the whitelist/blacklist methodology in combatting spam (speaking of which, I spent about an hour today cleaning up comment spam on Tabulas; before today, roughly 1/12th of all comments on Tabulas were from comment spammers!).

Whitelisting basically means email providers will only accept email from providers they know are genuine - everybody else gets bounced. Blacklisting works the other way: the email provider will accept all email, except from those they know are bad.
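The difference between the two approaches can be sketched in a few lines. This is only an illustration, not how any particular mail provider does it; the function names and the example domains are hypothetical:

```python
def whitelist_accepts(sender_domain, trusted_domains):
    """Default-deny: accept only senders on the trusted list."""
    return sender_domain in trusted_domains


def blacklist_accepts(sender_domain, blocked_domains):
    """Default-allow: accept everyone except senders on the blocked list."""
    return sender_domain not in blocked_domains


# Hypothetical example domains, not real mail providers.
trusted = {"friend.example", "work.example"}
blocked = {"spammer.example"}

for domain in ["friend.example", "stranger.example", "spammer.example"]:
    print(
        domain,
        "| whitelist:", whitelist_accepts(domain, trusted),
        "| blacklist:", blacklist_accepts(domain, blocked),
    )
```

The only sender the two approaches disagree on is the unknown one: the whitelist bounces it, the blacklist lets it through.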

Right now, people are trusting the public by default and "blacklisting" their more private entries... given how prevalent employer Googling is becoming, how long before people just go to the whitelist approach - all entries are only viewable by people they trust?

As much as I've wanted to push more open standards between sites (I've been doing a lot of API development for Tabulas lately), it seems to be largely pointless since most users don't care - look at the largest community sites (LJ, Xanga, MySpace, YouTube, etc.) ... they're all gated communities.

Posted by roy on February 20, 2006 at 01:06 AM in Web Development | 3 Comments

Comment posted on February 21st, 2006 at 08:40 PM
Wow, that's actually really interesting. So many private entries, I had no idea. I think I've posted 3 in all my time here. But yea, I agree with Allen. There are "friends" and then there are "friend-friends." Hierarchy situation.

Fortunately, I don't have very much to hide from the public. My posts are pretty silly in general.

Comment posted on February 20th, 2006 at 02:16 PM
i'm looking forward to the new privacy controls roy, there have definitely been times when I didn't post because some of my "friends" should not be privy to the information.

Comment posted on February 20th, 2006 at 01:59 AM
then it's time to open up those gates... yeah-- i don't know what i'm talking about.