This document contains only my personal opinions and calls of judgement, and where any comment is made as to the quality of anybody's work, the comment is an opinion, in my judgement.
As usual I am skeptical in most cases about the idea of large centralized computer systems, in particular for storage. There is a fairly reasonable case for centralized computation, though, and the main difference lies in the perishable nature of computation and its scalability.
Centralization is based on the assumption that these:
are disadvantages worth suffering for the advantages of:
For centralized computation in the now common form of computation clusters the advantages are fairly valuable (except easier communication among users, which is rarely important), as a bigger cluster can handle bigger problems in a shorter time (if they scale), can have lower latency if in a small physical volume, and most importantly because it can load balance. The latter is very important because computation is a very perishable product: any unused CPU time is wasted.
Consider the obvious case of two clusters each of which is 50% busy over a long period of time: odds are that there are many times when one is 100% busy, and has a queue of waiting jobs, and the other isn't, and a single cluster twice as large would be able to process some of the queue of the first with the spare capacity of the second.
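To make the two-cluster case concrete, here is a minimal sketch of a toy simulation. Everything in it is an assumption chosen for illustration: the bursty arrival pattern, the capacity of 10 work units per step, and the function name `simulate`. It only shows that pooling the same total capacity never leaves a larger backlog than keeping the two clusters separate.

```python
import random


def simulate(steps=10_000, capacity=10, burst_prob=0.25, seed=1):
    """Compare total queued work for two separate clusters vs one pooled cluster.

    Toy model (all assumptions): at each step every cluster receives a burst
    of 2 * capacity work units with probability burst_prob, otherwise nothing.
    With burst_prob = 0.25 the mean demand is half the capacity, so each
    cluster is 50% busy on average.
    """
    rng = random.Random(seed)
    q1 = q2 = pooled = 0          # backlog carried over between steps
    backlog_separate = 0          # sum over time of work waiting, separate clusters
    backlog_pooled = 0            # same, for a single cluster twice as large
    for _ in range(steps):
        d1 = 2 * capacity if rng.random() < burst_prob else 0
        d2 = 2 * capacity if rng.random() < burst_prob else 0
        # Separate clusters: spare capacity in one cannot help the other.
        q1 = max(0, q1 + d1 - capacity)
        q2 = max(0, q2 + d2 - capacity)
        # Pooled cluster: both demand streams share twice the capacity.
        pooled = max(0, pooled + d1 + d2 - 2 * capacity)
        backlog_separate += q1 + q2
        backlog_pooled += pooled
    return backlog_separate, backlog_pooled


if __name__ == "__main__":
    separate, pooled = simulate()
    print("backlog, two separate clusters:", separate)
    print("backlog, one pooled cluster:   ", pooled)
```

The gap between the two numbers is the work that waited in one queue while the other cluster had idle capacity, which is exactly the CPU time that the separate arrangement wastes.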
Not only is CPU time very perishable, but it is also rather highly reproducible (just restart the job when there is a fault) and its use is very concentrated: when a computation-oriented job runs it uses up as many of the available resources as possible.
Large storage systems have a very different profile: most of the data is merely stored instead of used (a very small part of the data on a storage system is in use at any one time), storage space is not perishable, as any unused space remains available, the cost of it being unused is often low, and latency matters a lot less.
But for many usage patterns the advantages of centralization do not matter for a storage system because:
Of course all these arguments are somewhat different at different scales; for example currently 1TB drives are fairly popular, so let's consider a typical corporate storage pool system design, say for an organization with 1,000 users:
Overall I assume that usually a good size for an independent storage server is one that serves more than 10 but fewer than 50 users, or the number of users and computers that can be sensibly attached to a single LAN, that is a single switch, which usually has 24 or 48 ports.
Alternatively, a good size is one that fits in a full format tower case, or a 1U or 2U rackable case, that is 6-8 drives, and that can be installed in an ordinary office under a desk or on a shelf. Then there are delightful products like the 4U Thumper for the more massive designs, but these really require racking in a computer room.
My usual detail preferences of course apply: use RAID10, either within each storage server, or by using DRBD to mirror across 2 servers and then RAID0 the mirror pairs, or use Lustre with striping to achieve much the same thing.
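As a rough illustration of the sizing figures above (1,000 users, 10 to 50 users per server, 6-8 drives of 1TB per server, RAID10), the arithmetic can be sketched as below. The function name `size_pool` and the default of 25 users per server are assumptions picked for the example, and the calculation ignores filesystem overhead and spares; the only hard fact used is that RAID10 mirrors every drive, so usable space is half the raw space.

```python
import math


def size_pool(users=1000, users_per_server=25, drives_per_server=8, drive_tb=1.0):
    """Return (servers needed, usable TB per server, usable TB per user).

    Assumes RAID10 within each server, so usable capacity is half of raw
    capacity. Filesystem overhead and hot spares are ignored.
    """
    if drives_per_server % 2 != 0:
        raise ValueError("RAID10 needs an even number of drives")
    servers = math.ceil(users / users_per_server)
    raw_tb = drives_per_server * drive_tb
    usable_tb = raw_tb / 2                      # mirroring halves the space
    return servers, usable_tb, usable_tb / users_per_server


if __name__ == "__main__":
    # The two ends of the range discussed in the text.
    for users_per_server, drives in [(10, 6), (50, 8)]:
        servers, usable, per_user = size_pool(1000, users_per_server, drives)
        print(f"{users_per_server} users/server, {drives} drives: "
              f"{servers} servers, {usable:.1f}TB usable each, "
              f"{per_user * 1000:.0f}GB per user")
```

Under these assumptions the 1,000 user organization ends up with somewhere between 20 and 100 small servers, each small enough to sit under a desk or on a shelf.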