Amazon S3
I've been participating in discussions on digg and TechCrunch regarding the new Amazon service S3. I generally stay on top of new sites launching, and never has a service impressed me so much. The year is early, but I think I can safely say that S3 is going to be one of the top offerings of '06, and the best so far (sorry, Google, nothing you've done this year is worthy of that mention, unfortunately). Its significance should only become clearer as more developers wrap their heads around this offering and toolkits for S3 are released (I'm working on a PHP class that will fully interact with S3, which I'm going to release under a BSD license as soon as I get it done).
S3 is a new service which allows for scalable file storage. In layman's terms:
Imagine you have a computer with 40GB of hard drive space. Filling that capacity is very easy; it's already set up. But what happens when you want to store 45GB on your computer? Usually you would buy a CD burner or maybe a USB external hard drive (or if you're really talented, you'll buy a new hard drive). Servers don't have the luxury of using CD backups or USB external hard drives, because all data needs to be accessible whenever it's asked for. The value in Amazon is not only that you pay just for the storage you use ($0.15 per GB per month), but also that you never have to deal with the hassle of "What do I do when I run out of space?" They handle that automatically. On top of that, all your data is redundant, which means that if one of their computers fails, there's another one ready to serve up your data.
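To put rough numbers on that, here is a minimal sketch of the monthly storage bill at the quoted price. It assumes storage is the only charge and ignores anything else Amazon may bill for; the function names are made up for illustration, and amounts are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical helpers for illustration; not part of any Amazon toolkit.
S3_CENTS_PER_GB_MONTH = 15  # the $0.15 per GB per month storage price quoted above


def monthly_storage_cents(stored_gb: int) -> int:
    """Storage-only cost in cents; any other charges are deliberately left out."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * S3_CENTS_PER_GB_MONTH


def format_dollars(cents: int) -> str:
    """Render a whole number of cents as a dollar amount."""
    return f"${cents // 100}.{cents % 100:02d}"


for gb in (40, 45, 400):
    print(f"{gb:>4} GB costs {format_dollars(monthly_storage_cents(gb))} per month")
```

The point is that going from 40GB to 45GB just nudges the bill up a little; there is no new hardware to buy at the boundary.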
I went into the economics of my decision to shift to S3 in my previous post, so you can read that for the cost breakdown. When looking at a service like this, the cost breakdown isn't the only factor that comes into play; the monetary gains won't be hugely significant for other developers who run services (I've just been paying out of the ass for storage, that's all). The primary benefit is that you no longer need to maintain a server on your own, and this is a HUGE benefit. Nothing scares me more than finding out one of my servers died and that nothing was backed up. Furthermore, they also handle the HTTP requests well; if a file gets slashdotted or dugg, you don't have to worry that it'll bring your whole network to its knees. (Amazon also offers a BitTorrent option, but I'll discuss this later.)
Technical stuff
Now I'll be delving a bit more into the technical details of this service, so feel free to skip this if you're a casual reader.
S3, by itself, means nothing to consumers. This is the largest misunderstanding about Amazon S3: people saw $0.15/GB/month and immediately thought "backups!" Unfortunately, right now you can only reach S3 through its developer API, a virtual gate which only developers know how to use. This means that, for the immediate future, this service means nothing to consumers, but very much to developers (like me).
The pricing structure of Amazon's S3, amusingly, makes it hard to scale a business around the service. Graphically, costs on a site like Tabulas break down into three sections, A through C, when you compare running your own servers against using Amazon's S3:
In Section A, it is much cheaper to run your own server up to a certain hardware limit (this depends on the cost of hard drives). In keeping with my analogy before, this would be roughly the time spent to fill that initial hard drive. The curve is shifted because you start with a certain amount of space for a given price (fixed cost).
Once you hit a certain break point (which I've reached for Tabulas), your network requires some level of maintenance and hardware support (load balancing, RAID cards, redundant backups, networked servers), and this is why the costs increase so rapidly for "running your own." Essentially, it equates to the fixed costs of owning your own rack. This is exactly where Amazon's price structure excels; I would say that up to about 400GB or so, Amazon's price structure works best.
Section B is where Amazon's price structure beats that of running your own network.
Section C is where the fixed costs of setting up your own internal network start returning better gains per server added. Economically speaking, this is where you've reached economies of scale. This is generally the stage where large corporations can just buy generic components, throw them on a network, change a config file, and have the network operate smoothly.
The downside to Amazon's pricing is that the costs are linear; if you want to run anything large-scale, you need the cost per GB to fall as you grow.
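The three sections can be sketched as a toy cost model. Only the $0.15 per GB per month price, the 40GB starting disk from my analogy, and the roughly 400GB crossover come from this post; the fixed rack cost and the per-GB cost of running your own hardware are invented numbers, picked purely so the crossover lands near 400GB. Treat it as an illustration of the shape of the curves, not as real pricing.

```python
# Toy model: all "own" costs are assumptions, not measured figures.
S3_CENTS_PER_GB = 15      # $0.15 per GB per month, from the post
INCLUDED_GB = 40          # assumed: disk that already ships with your server
RACK_FIXED_CENTS = 4000   # assumed: monthly fixed cost once you outgrow that disk
OWN_CENTS_PER_GB = 5      # assumed: marginal monthly cost per GB on your own rack


def s3_cost(gb: int) -> int:
    """Linear cost: every GB costs the same, no matter how many you store."""
    return gb * S3_CENTS_PER_GB


def own_cost(gb: int) -> int:
    """Extra storage cost of running your own hardware, in cents per month."""
    if gb <= INCLUDED_GB:
        return 0  # space on a server you are already paying for
    return RACK_FIXED_CENTS + gb * OWN_CENTS_PER_GB


def section(gb: int) -> str:
    """Classify a storage size into Section A, B, or C."""
    if gb <= INCLUDED_GB:
        return "A"  # your existing disk still has room
    return "B" if s3_cost(gb) <= own_cost(gb) else "C"


for gb in (20, 200, 400, 1000):
    print(gb, section(gb), s3_cost(gb), own_cost(gb))
```

With these made-up numbers, S3 wins from just past 40GB up to 400GB, and the fixed rack cost is amortized well enough beyond that for your own network to come out ahead.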
Basically, this is why S3 won't be incredibly useful for backup services, and why services like box.net aren't threatened by it. However, those sites will be forced to adopt better UIs for uploading (uploading large files to the web is problematic); the value will be in the ease of backing up and restoring through the web interface. Also, sites like box.net will probably have to focus more on larger account sizes, as I can easily imagine a business model based on S3 that offers personal hosting of roughly 10GB - 20GB (that seems to be the sweet spot in terms of what people want vs. how Amazon's pricing works) and would effectively cannibalize the low end of the market (more on this later).
Why is this exciting?
I think this is the most significant web launch of 2006. Amazon has effectively created a new market that doesn't threaten any existing markets (this doesn't replace commodity hosting, doesn't threaten the whole online backup market, doesn't replace larger sites' own networks) while offering a great pricing model (pay as you go!). I actually spent last night shifting Tabulas' backend to S3 with much success, and I have to say that it is *very* well done. It's simple, powerful, and does exactly what it advertises.
As my graph shows, Amazon's value lies in that middle section. What does this mean? S3 allows developers to bootstrap without paying large up-front infrastructure costs. Jeff Jarvis has been talking about how the "world is getting smaller." I think reading TechCrunch shows you not only the vast number of small firms that are building applications, but also how quickly they're deploying. You usually read product histories with lifecycles of 3-6 months. And they're getting shorter. The technological barriers to market entry are getting lower ... now we also have the financial barriers to market entry decreasing.
I can personally say that S3 is going to make MY life easier in terms of managing and maintaining Tabulas - I can continue to focus on product improvements and customer support. I now have PayPal's IPN handling all my billing needs, and Amazon will soon be handling the critical backend work. The goal is to stay small and lean ... cut costs and then maybe I can achieve my goals for Tabulas. And don't be thinking I'm the only one who realizes this ... I can assure you right now there are hundreds of developers who are redoing the calculations for infrastructure costs as they try to build a startup. It's getting easier and cheaper to start up ... what an exciting world.
I had two really cool ideas for this service, but I'll save those for the next post. One of the ideas is an expansion of the personal backup solution that I think would work well and be priced well, and the other is a podcasting distribution model.