As you probably know, the companies that “run the web” (e.g. Google, Yahoo, Amazon, and eBay) generally don’t use the “big iron” mainframe infrastructure many of our clients rely on. Instead, they use LOTS of cheap commodity boxes (e.g. Google had ~450,000 in 2006).
Yahoo has an interesting article about Hadoop, an open source effort to implement one of the pieces of infrastructure these companies typically use (and have built in-house). Essentially, Hadoop is an open source software platform that lets applications process huge amounts of data across large clusters of machines, and it is already being used, taught, and researched at some colleges. Technically, it is an implementation of MapReduce, one of Google’s internal frameworks.
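To make the MapReduce idea a bit more concrete, here is a minimal, single-process Python sketch of the classic word-count example. It is not Hadoop’s API (Hadoop’s native interfaces are in Java), and the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, chosen only for illustration. It just shows the shape of the model: a map step emits key/value pairs, the framework groups those pairs by key, and a reduce step combines each group.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Hypothetical names throughout; this illustrates the MapReduce model, not Hadoop's API.
def map_phase(document: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input document
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key, as the framework does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine all the counts emitted for one word
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Run map over every document, group by key, then reduce each group.
    # Everything here runs sequentially in one process.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

On a real cluster, the framework would split the input across many machines, run the map tasks in parallel, and ship the intermediate pairs to reducers over the network; the appeal is that the application only supplies the map and reduce functions.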
There has been a lot of discussion about computing becoming “part of the grid” or “a metered utility.” Companies like Amazon already offer proprietary versions of this idea with their Amazon S3 service and, to some extent, their other Amazon Web Services.
So it will be interesting to see whether Hadoop and future open source equivalents of technologies like the Google File System, Google Sawzall, and Google Bigtable become a modern, “distributed grid” based “open source mainframe,” and whether they become for massive distributed systems what LAMP is for small-scale web systems.
Finally, it will be interesting to see if, how, and when technologies like these reach our typical Fortune 500 clients, and whether they start to displace “big iron” mainframes.
Even if they never do, it’s still interesting stuff. Enjoy…