January 6, 2004
Ah, the joys of maintenance
The joys of maintenance. So much fun (not). Late last night, as I was pondering what my life would be like if I were married to Keira Knightley, I also started thinking about how to scale Tabulas to more servers.
I was thinking about my previous idea of having one superserver database ... but then I thought ... hrm, that won't work for the entry data. Entry data is currently fulltext indexed, which is what makes it searchable. This isn't really efficient, since only about 100 people can actually search their entries. Eventually the entry table will become too big for one server to handle.
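For context, the search right now is basically just a single MySQL MATCH ... AGAINST query against the fulltext index, something like this (table and column names are made up):

    <?php
    // Rough sketch of the current fulltext search, assuming a MySQL
    // "entries" table with a FULLTEXT index on (subject, body).
    // All table/column names here are hypothetical.
    $db = mysql_connect('localhost', 'tabulas', 'secret');
    mysql_select_db('tabulas', $db);

    $user_id = (int) $_GET['user_id'];
    $terms   = mysql_real_escape_string($_GET['q']);

    $result = mysql_query(
        "SELECT entry_id, subject FROM entries
          WHERE user_id = $user_id
            AND MATCH (subject, body) AGAINST ('$terms')
          ORDER BY posted DESC", $db);

    while ($row = mysql_fetch_assoc($result)) {
        echo $row['subject'] . "\n";
    }
    ?>

Simple enough on one box ... but that index grows right along with the entry table, which is the whole problem.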
So I began thinking about clustering the users, much like LiveJournal does. I have no idea how this would work, so I guess I'll write out ideas and implement them as I need to in the future.
I'm guessing the superdataserver could store all the relevant metadata (user data, session data, profile data, etc.). Each cluster, however, would store the entry data for its users. I could cap each cluster at 20,000 users, so each server would only be responsible for storing entry data for 20,000 users.
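A rough sketch of how that lookup might work (hostnames and table names are all made up):

    <?php
    // Sketch: the superdataserver keeps a user -> cluster mapping,
    // and entry queries connect to whichever cluster server holds
    // that user's entry data. Everything named here is hypothetical.
    $global = mysql_connect('global.db.tabulas.com', 'tabulas', 'secret');
    mysql_select_db('metadata', $global);

    function entry_db_for_user($user_id, $global) {
        $res = mysql_query("SELECT cluster_host FROM user_cluster
                             WHERE user_id = " . (int) $user_id, $global);
        $row = mysql_fetch_assoc($res);

        // Connect to the cluster that owns this user's entries.
        $link = mysql_connect($row['cluster_host'], 'tabulas', 'secret');
        mysql_select_db('entries', $link);
        return $link;
    }

New signups would just get assigned to whichever cluster is still under the 20,000-user cap.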
The problem here is that with features like the "friends" view, the server might have to open tons of new MySQL connections just to grab the friends' entry data. Right now it's all on one server, so it's not a problem to run complex queries ... but if the server had to connect to multiple servers and then organize the results, that would be even more CPU intensive ...
Hrm. Not sure how that would work.
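Maybe something like this, though: group the friends by cluster so each cluster server gets one connection and one query instead of one per friend, then merge the results in PHP. Purely a sketch, with made-up names:

    <?php
    // Hypothetical cross-cluster "friends" view: bucket friend ids by
    // the cluster holding their entries, run one query per cluster,
    // then merge and sort the results in PHP.
    function friends_page($friend_ids, $cluster_of) {
        $by_cluster = array();
        foreach ($friend_ids as $uid) {
            $by_cluster[$cluster_of[$uid]][] = (int) $uid;
        }

        $entries = array();
        foreach ($by_cluster as $host => $uids) {
            $link = mysql_connect($host, 'tabulas', 'secret');
            mysql_select_db('entries', $link);
            $res = mysql_query("SELECT user_id, subject, posted FROM entries
                                 WHERE user_id IN (" . implode(',', $uids) . ")
                                 ORDER BY posted DESC LIMIT 25", $link);
            while ($row = mysql_fetch_assoc($res)) {
                $entries[] = $row;
            }
            mysql_close($link);
        }

        // Merge: newest entries first across all clusters.
        usort($entries, create_function('$a,$b',
            'return strcmp($b["posted"], $a["posted"]);'));
        return array_slice($entries, 0, 25);
    }

That caps the connection count at the number of clusters rather than the number of friends, but the sorting/merging still lands on the web server's CPU.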
Right now I'm spending time optimizing the Tabulas server (which is shared with 150 other sites). I'm turning off a bunch of services and vacuuming out a lot of nasty PostgreSQL tables that were killing performance every night from 11pm to 2am. Hopefully tonight the only thing running between those hours will be the standard updatedb.
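The vacuuming itself is easy enough to script; something like this, run from cron well outside those hours, would do it (table names are made up):

    <?php
    // Hypothetical nightly maintenance script: vacuum the worst
    // offending PostgreSQL tables so autovacuum-style cleanup doesn't
    // pile up during peak hours.
    $pg = pg_connect('host=localhost dbname=shared user=postgres');
    foreach (array('sessions', 'hits', 'referers') as $table) {
        pg_query($pg, 'VACUUM ANALYZE ' . $table);
    }
    pg_close($pg);
    ?>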
We'll see.
Posted by roy on January 6, 2004 at 02:51 PM | 8 Comments
kdb003
Do a majority of the database page loads come from a few popular blogs or mostly from the many smaller blogs?
If you're lucky and most of them come from the popular blogs, then just turn on the caching after a certain number of views per day, like you said.
Also, it would be interesting to know how many people think the friends and friends-of views are an important part of Tabulas. They seem like the big problem to me.
vlad (guest)
roy
so in the logic it could store how often a page is requested per time frame; if that exceeds a certain amount, it could use the RSS feed to generate that page.
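A rough sketch of what that might look like (the threshold, paths, and render_page() are all made up):

    <?php
    // Sketch of the threshold idea: count requests per page per day;
    // once a page crosses the threshold, serve a cached static copy
    // instead of rendering it dynamically. render_page() is a
    // hypothetical stand-in for the normal dynamic render.
    $page       = basename($_GET['page']);
    $threshold  = 500;
    $count_file = '/tmp/hits-' . md5($page) . '-' . date('Ymd');
    $cache_file = '/tmp/cache-' . md5($page) . '.html';

    // Bump today's hit counter for this page.
    $count = (int) @file_get_contents($count_file);
    $fp = fopen($count_file, 'w');
    fwrite($fp, ++$count);
    fclose($fp);

    if ($count > $threshold && file_exists($cache_file)) {
        readfile($cache_file);            // hot page: serve cached copy
    } else {
        $html = render_page($page);       // normal dynamic render
        if ($count > $threshold) {        // just went hot: write cache
            $fp = fopen($cache_file, 'w');
            fwrite($fp, $html);
            fclose($fp);
        }
        echo $html;
    }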
tabulas
There's some misconception that handling data from databases is "slow." This is only true if you don't have optimized queries. Go run apachebench comparing drawing data out of the DB against parsing a flat file in PHP ... the difference is almost negligible.
But if I were to "cache" the entries, I would have to cache each entry as a separate file. And imagine if, on each page, PHP had to load up, parse 8+ files, and then display them.
I do cache certain things on the site (the front page, for example), but caching on a per-entry basis would be counterproductive and would actually hurt the performance of the site...
Caching does work on a small scale, but on a large site like this, it doesn't scale very well.
I mean, if this were true, why don't we all go back to Perl and flat text files ;)
vlad (guest)
something like this.
roy
You're moving the load off the CPU/memory and onto the hard drive (now the hard drive has to spin up for every entry).
I did consider caching the data, but it's just not worth it, both in terms of efficiency (like the hard drive spinning up) and the inconvenience of having to store the data somewhere.
MacDaddyTatsu (guest)