So you want to create a scalable solution, but confidence seems to be low overall. There are shadows of doubt: Can we do it? Will this really take off? Is it too ambitious? And so on and so forth.
But really, why is scalability hard? Hasn’t all this been done before? I mean seriously, there are huge web sites out there that already do this sort of thing, aren’t there? Whatever we want to do is not really a path-breaking scientific effort. So why all this hoopla around scalability, and why so many doubts?
Google 2.3?
The answer lies in a simple observation. How many Amazon software versions have you seen on sale? How many shrink-wrapped Facebook or Flickr versions have you come across? You don’t see Yahoo 2.0 or Google 2.3 out there, do you?
Yes, it’s true that everything we want to do has already been done many times before. There have been countless instances of huge scale, built by many teams and probably in much better or more complex ways than we can even start to imagine. So let’s examine why shrink-wrapped scalable applications are not as readily available as any other offshore-able piece of work.
Compare a public media storage application like YouTube against a massively available stock trading network. The challenges that each faces are very different. While a YouTube-like service needs to scale to terabytes upon terabytes of readily available disk storage, a stock trading network would mostly want to control the network latency for sending updates. The data held by a media service such as YouTube or Flickr is not so critical, but the stock system needs totally foolproof accountability for all the data that it processes. As you can see, the requirements are totally different, and the way you would do things is totally different too. A solution created in one domain might not be readily adaptable to another.
Even if one were to take the case of two efforts in the same domain, the approach to scale could be totally different. If Site A chose a Java + app server based technology, Site B might go for a PHP with MySQL based approach. The models used, the implementation technologies used and finally the approach used to scale could all differ, based on the design tradeoffs that are made.
Moreover, at the scales that we are talking about, even a single requirement can cause a ripple that needs to be adjusted and accounted for throughout the system.
For example, let’s take the case of a basic content management system. Let’s assume that Site A has an additional requirement to send change-notification emails as a tracking mechanism every time something changes. A simple functionality change such as this could have profound implications for the latency of the operations, the network bandwidth required, the configuration properties and so on.
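To make that ripple a little more concrete, here is a rough back-of-the-envelope sketch in Python. The function names and every number in it (write time, email send time, recipient count, email size, changes per day) are made-up assumptions for illustration, not measurements from any real system. It also assumes the emails are sent inline, one after another, as part of the save operation.

```python
# Back-of-the-envelope estimate only; all names and numbers are illustrative assumptions.

def save_latency_ms(db_write_ms, email_send_ms=0.0, recipients=0):
    """Latency of one save when change emails are sent inline, one after another."""
    return db_write_ms + email_send_ms * recipients


def daily_email_bandwidth_mb(changes_per_day, recipients, email_size_kb):
    """Extra outbound traffic per day caused by the change emails, in MB."""
    return changes_per_day * recipients * email_size_kb / 1024


if __name__ == "__main__":
    baseline = save_latency_ms(db_write_ms=20)
    with_email = save_latency_ms(db_write_ms=20, email_send_ms=150, recipients=3)
    print(f"save without emails:     {baseline:.0f} ms")
    print(f"save with inline emails: {with_email:.0f} ms")

    extra = daily_email_bandwidth_mb(
        changes_per_day=1_000_000, recipients=3, email_size_kb=20
    )
    print(f"extra outbound traffic:  {extra:,.0f} MB/day")
```

With these assumed figures, a save that took 20 ms now takes 470 ms, and that is before anyone has touched the mail server configuration.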
As you can imagine, no two war stories are going to be the same. Custom scripting, tweaks, configuration headaches, you name it: all of that will be different from implementation to implementation.
More importantly, building a system in place that scales is a totally different effort compared to building one that scales and can also be sold commercially. Usually the economies of scale dictate that not many folks would require such a big solution in any domain. Businesses that require such high levels of technology tend to require lots of customizations, which makes these huge solutions essentially very different from one another. Then again, technology is often viewed as a competitive advantage, and companies prefer to roll their own rather than buy a ready-made solution. If the Flickr software were commercially available (assuming that’s possible), how many buyers would you anticipate? How would one Flickr distinguish itself from another?
Therefore the components of such big efforts, like the database or the web server or the application server, are what is generally available instead. The glue that ties all these into one single big comprehensible (and usually messy) high-performance solution is extremely specific and totally incompatible from effort to effort.
Further, the aforementioned components of your architecture are usually tailored to be easy to use for the largest number of customers, because that’s probably where the most money is. No architect I know would even bat an eyelid at a solution that requires a central database of a normal size (2-4 GB) and takes a fairly decent amount of hits. Everyone knows that the component would work out of the box, probably aided by some small customizations that the installation can do for you.
The moment you deviate from the norm, things start becoming more complicated and less user-friendly. The assumption that goes with this is that if a customer’s solution requires such work, they are also prepared to bear the extra cost of employing a specialist administrator like a DBA, or can pay for a special installation by the solution experts. Often this is also necessitated by the sheer amount of configuration required. Any DBA will tell you that no database of a fairly large size can run properly without custom tweaks (de-normalize table A + put table B on a different server + add an index on table C and remove the one on table D). Creating a self-managing, maintainable big solution is therefore really a big challenge.
A single installable that puts your favorite editor on your laptop is therefore a far cry from these beasts of complexity, with their huge amounts of configuration and a hands-on learning curve of weeks just to become fairly conversant with the entire system.
That’s right. These big beasts are systems in their own right. They are never referred to as products for exactly that reason. These efforts are created as a solution to a range of problems that need to be tackled in an eco-system, and as a result they encompass a vast range of technologies and platforms. This in turn usually translates into tons of custom code and glue, which are rarely designed as a reusable, sellable, generic library, precisely because they are so custom-made in nature.
Any project that’s fairly large in nature attracts a different set of laws, penalties and processes compared to the smaller efforts that people are normally used to.
Sell it too?
Oh, and if you are trying to make a resellable scalable (aka big) solution, well, that’s much more of an effort than simply building one that works. You have to make all those custom configurations, scripts and tweaks customizable, easy to find and all that. This often means that the system has to take care of itself, be auto-configurable to an extent and have built-in self-diagnostics. That’s a tall order for anyone or anything in engineering.
This is why creating scalable applications is such a big deal and is still a new problem every time you deal with it. What you are attempting to build is simply too big a project.