The joys of maintenance. So much fun (not). Late last night, as I was pondering what my life would be like if I were married to Keira Knightley, I also started thinking about how to scale Tabulas to more servers.

I was thinking about my previous idea of having one superserver database ... but then I thought ... hrm, that won't work for the entry data. Entry data is currently being fulltext indexed, which allows it to be searchable. This isn't really effective, since there are only about 100 people who can actually search their entries. Eventually the entry table will become too big for one server to handle.
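(For anyone wondering what "fulltext indexed" buys you: it's just MySQL's built-in fulltext support, something along these lines. This is only a sketch; the table and column names are made up, not the actual Tabulas schema.)

    <?php
    // Sketch of a MySQL fulltext search. 'entries', 'subject', and 'body'
    // are hypothetical names, not the real schema. The index behind it
    // would be something like:
    //   ALTER TABLE entries ADD FULLTEXT ft_entry (subject, body);
    $conn = mysql_connect('localhost', 'tabulas', 'secret');
    mysql_select_db('tabulas', $conn);

    $userid = 42;                                // made-up user
    $terms  = mysql_escape_string($_GET['q']);
    $res = mysql_query(
        "SELECT entryid, subject
           FROM entries
          WHERE userid = $userid
            AND MATCH (subject, body) AGAINST ('$terms')", $conn);

    while ($row = mysql_fetch_assoc($res)) {
        echo $row['subject'] . "\n";
    }
    ?>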

So I began thinking about clustering the users, much like LiveJournal does. I have no idea how this would work, so I guess I'll write out ideas and implement them as I need to in the future.

I'm guessing the superdataserver can store all relevant metadata (user data, session data, profile data, etc.). However, each cluster would contain the entry data for its users. I can limit each cluster to 20,000 users (so each server would only be responsible for storing entry data for 20,000 users).
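Something like this is what I have in mind (purely a sketch; the hosts, table, and column names here are made up):

    <?php
    // Hypothetical cluster lookup. The superdataserver keeps a
    // userid -> clusterid mapping; each cluster holds entry data
    // for up to ~20,000 users.
    $clusters = array(
        1 => array('host' => 'db1.example.com', 'user' => 'tabulas', 'pass' => 'secret'),
        2 => array('host' => 'db2.example.com', 'user' => 'tabulas', 'pass' => 'secret'),
    );

    function entry_db_for_user($userid, $meta_conn, $clusters) {
        // Ask the superdataserver which cluster owns this user.
        $res = mysql_query("SELECT clusterid FROM users WHERE userid = " . (int)$userid, $meta_conn);
        $row = mysql_fetch_assoc($res);
        $c   = $clusters[$row['clusterid']];

        // Connect (or reuse an existing connection) to that cluster's entry database.
        $conn = mysql_connect($c['host'], $c['user'], $c['pass']);
        mysql_select_db('entries', $conn);
        return $conn;
    }
    ?>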

The problem here is that with features like the "friends" view, the server might have to open a ton of new MySQL connections just to grab the friends' entry data. Right now it's all on one server, so it's not a problem to run complex queries ... but if the server had to connect to multiple servers and then organize the results, that would be even more CPU intensive ...

Hrm. Not sure how that would work.
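The closest thing I have to a plan is to group the friends by cluster, so it's one connection and one query per cluster instead of one per friend, and then merge the results in PHP. A very rough sketch (the helpers and limits here are hypothetical):

    <?php
    // Rough sketch: build a friends page across clusters with one query per cluster.
    // $friend_clusters (userid => clusterid) and entry_db_for_cluster() are
    // hypothetical helpers, not existing Tabulas code.
    $by_cluster = array();
    foreach ($friend_clusters as $userid => $clusterid) {
        $by_cluster[$clusterid][] = (int)$userid;
    }

    $entries = array();
    foreach ($by_cluster as $clusterid => $userids) {
        $conn = entry_db_for_cluster($clusterid);       // one connection per cluster
        $res  = mysql_query(
            "SELECT userid, entryid, subject, posted
               FROM entries
              WHERE userid IN (" . implode(',', $userids) . ")
              ORDER BY posted DESC
              LIMIT 25", $conn);
        while ($row = mysql_fetch_assoc($res)) {
            $entries[] = $row;
        }
    }

    // Each cluster only sorted its own slice, so merge and re-sort in PHP.
    usort($entries, create_function('$a,$b', 'return strcmp($b["posted"], $a["posted"]);'));
    $entries = array_slice($entries, 0, 25);
    ?>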

Right now I'm spending time optimizing the Tabulas server (which is shared with 150 other sites). I'm turning off a bunch of services and vacuuming out a lot of nasty PostgreSQL tables that were killing performance every night from 11pm to 2am. Hopefully tonight the only thing running between those hours will be the standard updatedb.
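(The vacuuming itself is nothing fancy; a little throwaway script like the one below is about all it takes. The table names are made up for illustration.)

    <?php
    // Hypothetical one-off cleanup script. Table names are examples, not the real ones.
    $db = pg_connect('host=localhost dbname=sitedb user=postgres');
    foreach (array('sessions', 'hits', 'referrers') as $table) {
        // VACUUM ANALYZE reclaims dead rows and refreshes the planner's statistics.
        pg_query($db, "VACUUM ANALYZE $table");
    }
    pg_close($db);
    ?>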

We'll see.
Posted by roy on January 6, 2004 at 02:51 PM | 8 Comments


Comment posted on January 7th, 2004 at 05:00 PM
to solve these problems you need some statistics

Do a majority of the database page loads come from a few popular blogs, or mostly from the many smaller blogs?
If you're lucky and most of them come from the popular blogs, then just turn on the caching after a certain number of views per day, like you said.

Also, it would be interesting to know how many people think the friends and friends-of views are an important part of Tabulas. They seem like the big problem to me.

vlad (guest)

Comment posted on January 6th, 2004 at 10:30 PM
the way i see it, the hard drive is working regardless. if not to retrieve a cached page, then to retrieve the code and database info to build a dynamic page. i can't be certain that the former would be an improvement, but i'd at least try it, when you have the desire/time.
Comment posted on January 6th, 2004 at 11:42 PM
hrm. the more i think about it, it wouldn't be a bad idea, *if* database use got to be horrible, to switch the "front page" of each user's tabulas to their RSS feed.

so in the logic it could store how often that page is requested per time frame; if it exceeds a certain amount, it can use the RSS feed to generate that page.
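very roughly, something like this (the hit counter and the rendering helpers are made-up names, just to show the idea):

    <?php
    // Very rough sketch: count requests per user page per hour; past some
    // threshold, build the page from the already-generated RSS file instead
    // of hitting MySQL. tally_hit(), render_from_rss(), render_from_db()
    // are hypothetical helpers.
    $hits = tally_hit($username, date('YmdH'));   // increments and returns this hour's count

    if ($hits > 500) {
        render_from_rss("/feeds/$username.xml");  // popular this hour: use the static feed
    } else {
        render_from_db($username);                // otherwise, the normal dynamic path
    }
    ?>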
Comment posted on January 6th, 2004 at 11:32 PM
The hard drives don't spin as much for the database; it's mostly handled through memory.

There's some misconception that handling data from databases is "slow." This is only true if you don't have optimized queries. Go try running apachebench on drawing data out of the DB versus parsing a flat file in PHP ... the difference is almost negligible.

But if I were to "cache" the entries, I would have to cache each entry as a separate file. And imagine if, on each page, PHP had to load up, parse 8+ files, and then display them.

I do cache certain things on the site (the front page, for example), but caching on a per-entry basis would be counterproductive and would actually hurt the performance of the site ...

Caching does work on a small scale, but on a large site like this, it doesn't scale very well.

I mean, if this were true, why don't we all go back to Perl and flat text files ;)

vlad (guest)

Comment posted on January 6th, 2004 at 07:34 PM
why don't you try caching the output from the pages? on a large site, this would make a huge reduction in the number of database queries.

something like this.
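(just a rough sketch of the idea; the cache path and the 10-minute expiry are arbitrary)

    <?php
    // Rough sketch of page-output caching with PHP output buffering.
    // The cache directory and expiry time are arbitrary examples.
    $cache = '/tmp/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';

    if (file_exists($cache) && time() - filemtime($cache) < 600) {
        readfile($cache);        // serve the cached copy, zero database queries
        exit;
    }

    ob_start();
    // ... build the page normally here (all the usual queries) ...
    $html = ob_get_contents();
    ob_end_flush();

    $fp = fopen($cache, 'w');    // save the rendered output for the next visitor
    fwrite($fp, $html);
    fclose($fp);
    ?>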
Comment posted on January 6th, 2004 at 08:23 PM
It does reduce database queries, but at the cost of putting the data on the hard drive.

You're moving the load off of the CPU/memory onto the hard drive (now the hard drive has to spin up for every entry).

I did consider caching the data, but it's just not worth it, both in terms of efficiency (like the hard drive spinning up) and the inconvenience of having to store the data somewhere.

MacDaddyTatsu (guest)

Comment posted on January 6th, 2004 at 04:29 PM
Index us alphabetically by username or e-mail address?
