Flash may be one cutting edge of storage activity, but big data is driving developments at the other side of the storage pond, with IBM building a 120 petabyte, 200,000-disk array.
The monster array is being developed for an unnamed supercomputer-using customer "for detailed simulations of real-world phenomena", according to MIT's Technology Review, and takes current large-array technology trends a step or two further.
IBM Almaden storage systems research director Bruce Hillsberg says that 200,000 SAS disk drives are involved, rather than SATA ones, because performance is a concern. A back-of-an-envelope calculation suggests 600GB drives are being used, and Seagate 2.5-inch Savvios come to mind.
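For the curious, the envelope maths is trivial. A quick sketch (our own arithmetic using decimal units, not IBM's published figures):

TOTAL_CAPACITY_PB = 120
DRIVE_COUNT = 200_000

total_bytes = TOTAL_CAPACITY_PB * 10**15          # decimal petabytes, as vendors count them
per_drive_gb = total_bytes / DRIVE_COUNT / 10**9  # decimal gigabytes per drive

print(f"{per_drive_gb:.0f} GB per drive")         # -> 600 GB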
We're told that wider racks than usual are being used to accommodate the drives in a smaller amount of floorspace than standard racks would require. Also, these racks are water-cooled rather than fan-cooled, which would make sense if wide drawers crammed full of small form factor (SFF) drives were being used.
Some 2TB of capacity may be needed to hold the file metadata for the billions of files in the array. The GPFS parallel file system is being used, with a hint that flash memory storage is used to speed up its operations. This would indicate that the 120PB system would include, say, some Violin Memory arrays to hold the metadata, and would scan 10 billion files in about 43 minutes.
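Again, the back-of-the-envelope view, using the figures quoted above and our own assumptions about decimal units:

METADATA_BYTES = 2 * 10**12      # ~2TB of metadata capacity
FILE_COUNT = 10 * 10**9          # 10 billion files
SCAN_MINUTES = 43

bytes_per_file = METADATA_BYTES / FILE_COUNT          # ~200 bytes of metadata per file
files_per_second = FILE_COUNT / (SCAN_MINUTES * 60)   # ~3.9 million files scanned per second

print(f"{bytes_per_file:.0f} bytes of metadata per file, "
      f"{files_per_second / 1e6:.1f} million files scanned per second")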
RAID 6, which can protect against two drive failures, is not enough – not with 200,000 drives to look after – and so a multi-speed RAID scheme is being developed. Multiple copies of data would be written and striped so that a single drive failure could be tolerated easily. A failed drive would be rebuilt slowly in the background; the rebuild would not slow the accessing supercomputer down much, if at all. A dual-drive failure would get a faster rebuild. A three-drive failure would get a faster rebuild again, with, we assume, the compute side of the supercomputer slowing down somewhat due to a lower array I/O rate.
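IBM hasn't published the algorithm, so here is a purely illustrative sketch of the idea as described – the more failures in a protection group, the more of the array's I/O budget the rebuild is allowed to steal. The function name, bandwidth figures and ratios are all our own invention:

def rebuild_bandwidth_mb_s(failed_drives: int,
                           io_budget_mb_s: float = 1000.0) -> float:
    """Pick a rebuild rate based on how exposed the data currently is.

    failed_drives   -- concurrent failures in one protection group (assumed)
    io_budget_mb_s  -- hypothetical per-group I/O budget shared with the hosts
    """
    if failed_drives <= 0:
        return 0.0                      # nothing to rebuild
    elif failed_drives == 1:
        return 0.05 * io_budget_mb_s    # trickle rebuild; hosts barely notice
    elif failed_drives == 2:
        return 0.25 * io_budget_mb_s    # noticeably faster rebuild
    else:
        return 0.75 * io_budget_mb_s    # urgent: accept a lower host I/O rate

if __name__ == "__main__":
    for n in range(4):
        print(n, "failed ->", rebuild_bandwidth_mb_s(n), "MB/s for rebuild")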
Hillsberg doesn't say how many drives could simultaneously fail. The MIT article says the system will be one "that should not lose any data for a million years without making any compromises on performance". Really, give it a rest; this is marketing BS. Having it work and not lose data for 15 years will be good enough.
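To see why drive failures are an everyday event at this scale, rather than a once-in-a-million-years curiosity, here is some simple failure-rate arithmetic with assumed annual failure rates (AFRs) – IBM hasn't quoted any:

DRIVE_COUNT = 200_000

for annual_failure_rate in (0.005, 0.01, 0.02):   # 0.5%, 1%, 2% AFR assumptions
    failures_per_year = DRIVE_COUNT * annual_failure_rate
    failures_per_day = failures_per_year / 365
    print(f"AFR {annual_failure_rate:.1%}: "
          f"~{failures_per_year:.0f} failures/year, ~{failures_per_day:.1f}/day")

Even at the optimistic end, that is several dead drives a day, which is why the rebuild scheme matters rather more than million-year marketing claims.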
We're intrigued that the same class of disk drive is being used throughout – apparently all the data on the system will be classed as primary data, apart from the file metadata, which will need a flash speed-up. That means no tiering software is needed.
There will be lessons here for other big-data drive array suppliers, such as EMC's Isilon unit, DataDirect and Panasas. It will be interesting to see if they abandon standard racks in favour of wider units, SFF drives, water-cooling and better RAID algorithms too.
Bootnote
Storage-heavy supercomputer simulations are used in such tasks as weather forecasting, seismic analysis and complex molecular science – but there would seem to be no reason to keep any such customer's identity a secret. Another area in which supercomputer simulations are important is nuclear weapons: without live tests it becomes a difficult predictive task to tell whether a warhead will still work after a given period of time. As a result, the US nuclear weapons labs are leaders in the supercomputing field.