Sunday, April 20, 2014

Storage: Challenges of high-count disk enclosures

Stuffing 500-1,000 2.5" drives in a single enclosure may be technically possible, but how do you make those drives do anything useful?

Increasing drives per enclosure from 15-45 for 3.5" drives to 1,000 requires a deep rethink of target market, goals and design.

Not the least of these is dealing with drive failures. With an Annualised Failure Rate (AFR) of 0.4%-0.75% now quoted by drive vendors, dealing with 5-15 drive failures per unit, per year is a given. In practice, failure rates are at least twice the vendor-quoted AFR, not least because conditions inside systems can be harsh and other components and connectors fail, not just drives. Drives have a design life of 5 years with an expected duty-cycle: consumer-grade drives aren't expected to run 24/7 like the more expensive enterprise drives. Failure rates, when measured over time on large fleets in service, increase with drive age and rise considerably towards end of life.
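A quick back-of-envelope check on those numbers. This is a sketch only: the 1,000-drive count and the "at least twice the quoted AFR" multiplier are taken from the figures above, not measured data.

```python
# Expected annual drive failures for a high-count enclosure.
# AFR figures and the ~2x field multiplier come from the text above;
# treat them as illustrative assumptions.

drives = 1000

for afr in (0.004, 0.0075):            # vendor-quoted AFR: 0.4% and 0.75%
    quoted = drives * afr              # failures/year at the quoted rate
    observed = quoted * 2              # field rates run ~2x the quoted AFR
    print(f"AFR {afr:.2%}: ~{quoted:.0f} quoted, ~{observed:.0f} observed failures/year")
```

At the top of the quoted range, that's roughly one failed drive per month per enclosure even before the field multiplier.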

It isn't enough to say "we're trying to minimise per-unit costs"; all designs do that, but against different criteria.
What matters is the constraints you're working against or parameters being optimised.
Competing design & operational dimensions start with:
  • Cost (CapEx, OpEx and TCO),
  • Performance,
  • Durability, and
  • Scalability
'Performance' is a multi-faceted dimension with many meanings. It's broken into Internal and External at the highest level.
What type and speed of External interface is needed? SAS, Fibre Channel, Infiniband, or Ethernet?
If Ethernet, 1Gbps, 10Gbps or 40-100Gbps? The infrastructure cost of high-bandwidth external interfaces goes far beyond the price of the NICs: patch-cables, switches and routers in the data-centre, and more widely, are all affected.
Is the Internal fabric designed to be congestion-free, with zero-contention access to all drives, or does it meet another criterion?
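To see why the external interface choice matters, a rough sizing sketch. The per-drive streaming rate of ~100 MB/s is an assumed figure for a 2.5" HDD, not from the text:

```python
# Rough external-bandwidth sizing for a 1,000-drive enclosure.
# The per-drive sustained streaming rate (~100 MB/s) is an assumption.

drives = 1000
mb_per_s_per_drive = 100                              # assumed streaming rate
total_gbps = drives * mb_per_s_per_drive * 8 / 1000   # aggregate, in Gbps

for link_gbps in (10, 40, 100):
    links = -(-total_gbps // link_gbps)               # ceiling division
    print(f"{link_gbps} GbE: {links:.0f} links to carry {total_gbps:.0f} Gbps aggregate")
```

Even at 100GbE, matching full aggregate streaming throughput takes multiple links, which is why most designs deliberately oversubscribe the external interface.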

There are at least 4 competing parameters that can be optimised:
  • raw capacity (TB)
  • (random) IO/second, both read and write.
  • streaming throughput
  • Data Protection as RAID resilience
The hardware architecture can be High Availability (many redundant and hot-plug sub-systems), simple or complex, single layer or multi-layer and with or without customised components.

Close attention to design detail is needed to reduce per-IO latency to microseconds, necessary for high random IO/sec. 1,000 x 5400 RPM HDDs can support 180k IO/sec in aggregate, or around 6 microseconds per IO across the enclosure.
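The arithmetic behind that per-IO budget, assuming ~180 random IO/sec per 5400 RPM drive (the per-drive rate implied by the aggregate figure above):

```python
# Average per-IO time budget across the enclosure's controller/fabric path.
# 180 IO/sec per drive is the per-drive rate implied by the text's 180k figure.

drives = 1000
iops_per_drive = 180

total_iops = drives * iops_per_drive      # 180,000 IO/sec aggregate
budget_us = 1_000_000 / total_iops        # microseconds per IO, on average

print(f"{total_iops} IO/sec -> {budget_us:.1f} us average budget per IO")
```

Any fixed per-IO controller or fabric overhead much above that budget caps the enclosure below what the drives themselves can deliver.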

Marketing concerns, not engineering, mostly drive these considerations.
This forces a discipline on the engineering design team: to know exactly their target market, what customers need, what they value and which price/capability trade-offs they'll wear.

RAID resilience

This falls broadly into three areas:
  • Spare policy and overhead,
  • Parity block organisation overhead, and
  • Rebuild time and performance degradation during rebuilds.
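Rebuild time matters because it sets the window of reduced redundancy. A rough estimate, where the drive capacity and sustained rebuild rate are assumed figures, not from the text:

```python
# Rough RAID rebuild-time estimate: the window during which the array
# runs with reduced redundancy. Both figures below are assumptions.

drive_tb = 2                    # assumed drive capacity (TB)
rebuild_mb_per_s = 50           # assumed sustained rebuild rate under live load

seconds = drive_tb * 1_000_000 / rebuild_mb_per_s
hours = seconds / 3600
print(f"{drive_tb} TB at {rebuild_mb_per_s} MB/s -> {hours:.1f} hours to rebuild")
```

Rebuild rate is throttled by foreground IO, so as drives grow, the exposure window stretches from hours towards days.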
Different choices affect different aspects of performance:
  • RAID-5 reduces write-performance by a factor of 3 for both streaming and random IO,
  • RAID-6 burns CPU in the Galois Field calculations needed for the 'Q' parity blocks, and
  • RAID-1 and RAID-10 are simple, low-CPU solutions, but cost more in capacity and don't protect against all dual-drive failures.
The essential calculation is the likelihood of specific types of failure creating a Data Loss event: Mean Time to Data Loss. 
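The MTTDL calculation can be sketched for RAID-5 with the standard simplified model: data is lost when a second drive in the group fails during the rebuild window. The MTTF, group size and rebuild-time figures below are illustrative assumptions:

```python
# Simplified MTTDL (Mean Time To Data Loss) for one RAID-5 group:
#   MTTDL ~= MTTF^2 / (N * (N-1) * MTTR)
# the standard textbook approximation. All figures are illustrative assumptions.

mttf_hours = 1_000_000          # assumed per-drive MTTF (~0.9% AFR)
mttr_hours = 12                 # assumed rebuild time (MTTR)
n = 10                          # drives per RAID-5 group

mttdl_hours = mttf_hours ** 2 / (n * (n - 1) * mttr_hours)
print(f"RAID-5, {n}-drive group: MTTDL ~ {mttdl_hours / (24 * 365):,.0f} years")
```

Per group the number looks reassuring, but an enclosure with many groups divides it accordingly, and field AFRs running 2x the quoted figure cut it by 4x again.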
