Thursday, November 10, 2011

Questions about SSD / Flash Memory

  1. Seagate, in 2010, quote their SSD UER specs as:
    Nonrecoverable read errors, max: 1 LBA per 10^16 bits read
    where a Logical Block Address (LBA) is 512 bytes, usually called a 'sector'.

    But we know that Flash memory is organised into large blocks (64KB or more), the minimum erase unit, and read/written in page-sized chunks.
    Are Seagate saying that errors will be random localised cells, not "whole block at a time"?
    Of course, the memory controller does Error Correction to pick up the odd dropped bits.
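    A back-of-envelope check of what that spec implies; a minimal sketch in Python, assuming a 256GB drive purely as an example size (not a datasheet figure):

      # Rough arithmetic on the quoted spec: at most 1 unrecoverable
      # 512-byte sector (LBA) per 10^16 bits read.
      SECTOR_BYTES = 512
      UER_BITS = 10**16                          # bits read per unrecoverable sector

      bytes_per_error = UER_BITS / 8             # ~1.25e15 bytes, i.e. ~1.25 PB
      sectors_per_error = bytes_per_error / SECTOR_BYTES
      print(f"data read per unrecoverable sector: {bytes_per_error / 1e15:.2f} PB "
            f"(~{sectors_per_error:.1e} sectors)")

      # Assumed example drive size, not a figure from the datasheet:
      drive_bytes = 256 * 10**9                  # 256 GB SSD
      print(f"full-drive reads per expected error: {bytes_per_error / drive_bytes:,.0f}")

    So the spec only says how much data is read, on average, before one bad sector turns up; it says nothing about whether that sector fails alone or together with the rest of its flash block, which is the question above.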
  2. Current RAID schemes are antagonistic to Flash Memory:
    The essential problem with NAND Flash (EEPROM) Memory is that it suffers "wear": after a number of program/erase cycles, individual cells (with MLC a cell stores more than one bit, so cell =/= bit) can no longer be programmed. A secondary problem is "Data Retention": with the device powered down, Seagate quote a retention of 1 year.

    Because Flash Memory wears with writes, components from the same batch will likely have very similar wear characteristics, and multiple SSD's mirrored/RAIDed in one system will most likely come from the same batch. Evenly spread RAID writes (RAID-5 writes two physical blocks per logical block) will therefore cause such a set of SSD's to suffer correlated wear failures. This is not unlike the management of piston engines in multi-engined aircraft: avoid needing to replace more than one at a time. Faults, failures and repair/install errors often show up in the first trip after maintenance, so replacing all engines together maximises the risk of total engine failure.

    Not only is this "not ideal", it is exactly the worst case for current RAID (a toy calculation below, after the quoted update, illustrates how narrow the failure window becomes).
    A new Data Protection Scheme is required for SSD's.
    1. Update 23-Dec-2011. Jim Handy in "The SSD Guy" blog [Nov-17, 2011] discusses SSD's and RAID volumes:
      So far this all sounds good, but in a RAID configuration this can cause trouble.  Here’s why.

      RAID is based upon the notion that HDDs fail randomly.  When an HDD fails, a technician replaces the failed drive and issues a rebuild command.  It is enormously unlikely that another disk will fail during a rebuild.  If SSDs replace the HDDs in this system, and if the SSDs all come from the same vendor and from the same manufacturing lot, and if they are all exposed to similar workloads, then they can all be expected to fail at around the same time.

      This implies that a RAID that has suffered an SSD failure is very likely to see another failure during a rebuild – a scenario that causes the entire RAID to collapse.
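    A toy calculation of that correlated-wear window; a minimal sketch in Python where the endurance, batch spread and workload numbers are illustrative assumptions, not vendor figures:

      import random

      # Toy model: a RAID-5 set of SSD's from one manufacturing batch,
      # with writes spread evenly across the set by the RAID layer.
      random.seed(1)
      DRIVES = 4
      ENDURANCE_TB = 70.0          # assumed total write endurance per drive
      BATCH_SPREAD = 0.02          # same-batch drives: ~2% spread in endurance
      DAILY_WRITES_TB = 0.2        # assumed array write load per day

      # Each logical write costs a data block plus a parity block, and the
      # parity rotation spreads the load evenly over all drives.
      per_drive_daily = DAILY_WRITES_TB * 2 / DRIVES

      fail_days = sorted(
          ENDURANCE_TB * (1 + random.uniform(-BATCH_SPREAD, BATCH_SPREAD)) / per_drive_daily
          for _ in range(DRIVES)
      )
      print("days to wear-out per drive:", [round(d) for d in fail_days])
      print("gap from 1st to 2nd failure:", round(fail_days[1] - fail_days[0]), "days")

    With HDD's failing independently at random, the gap between a first and second failure is normally far longer than any rebuild; with same-batch SSD's under evenly spread writes it collapses to days, which is exactly the scenario in the quoted piece.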
  3. What are the drivers for the on-going reduction in prices of Flash Memory?
    Volume? Design? Fabrication method (line width, "high-K", ...)? Chip Density?

    The price of SSD's has been roughly halving every 12-18 months for nearly a decade, but why?
    Understanding the "why" is necessary to be forewarned of any change to the pattern.
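    As a rough sanity check of what that compounds to (simple arithmetic only, using the 12-18 month and ten-year figures above):

      # Compound effect of price halving on a 12-18 month cycle over ~10 years.
      for halving_months in (12, 18):
          halvings = 10 * 12 / halving_months
          print(f"halving every {halving_months} months for 10 years -> "
                f"~{2 ** halvings:,.0f}x cheaper per GB")

    That is somewhere between roughly 100x and 1000x per GB over the decade, so any break in the underlying driver would show up in the curve quickly.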
  4. How differently are DRAM and EEPROM fabricated?
    Why is there about a 5-fold price difference between them?
    Prices (Kingston, from the same store, http://msy.com.au, November 2011):
      DDR3 6GB 1333MHz    $41     ~$6.80/GB
      SSD 64GB            $103    ~$1.60/GB
      SSD 128GB           $187    ~$1.46/GB

    It would be nice to know whether the difference is structural or not, both for designing "balanced systems" and for integrating Flash Memory directly into the memory hierarchy rather than as a pretend block device.
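    Recomputing the per-GB figures and the DRAM-to-flash ratio from the prices above (straight division only):

      # Per-GB cost and DRAM:flash price ratio from the November 2011 prices quoted above.
      prices = {                      # name: (capacity in GB, price in AUD)
          "DDR3 6GB 1333MHz": (6, 41),
          "SSD 64GB": (64, 103),
          "SSD 128GB": (128, 187),
      }
      for name, (gb, dollars) in prices.items():
          print(f"{name}: ${dollars / gb:.2f}/GB")

      dram = 41 / 6
      flash = 187 / 128
      print(f"DRAM / flash price ratio: ~{dram / flash:.1f}x")

    The ratio comes out at roughly 4.7x, consistent with the "about 5-fold" difference noted above.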
  5. Main CPU's and O/S's can outperform any embedded hardware controller.
    Why do the PCIe SSD's not just present a "big blocks of memory" interface, but insist on running their own controllers?
  6. Hybrid SSD-HDD RAID.
    For Linux particularly, is it possible to create multiple partitions per HDD in a set, then use one HDD partition to mirror a complete SSD? The remaining partitions can be set up as RAID volumes in the normal way.
    The read/write characteristics of SSD and HDD are complementary: SSD is blindingly fast for random IO/sec, while HDD's currently sustain higher streaming read/write rates.
    Specifically, given 1*128GB SSD and 3*1TB HDD, create a 128GB partition on each HDD, then mirror (RAID-1) the SSD with one or more of those HDD partitions (RAID-1 isn't restricted to two copies, but can have as many replicas as desired to increase IO performance or resilience/Data Integrity). The remaining 128GB HDD partitions can be striped, mirrored or RAIDed amongst themselves or with other drives, and the rest of the HDD space can be partitioned and RAIDed to suit demand (a sketch of the mdadm commands follows below).

    Does it make sense, both performance- and reliability-wise, to mirror SSD and HDD?
    Does the combination yield the best, or worst, of both worlds?

    Is the cost/complexity and extra maintenance worth it?
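    A minimal sketch of how the mirror could be set up on Linux with mdadm, shelling out from Python; the device names and partition layout are assumptions for illustration, and the partitions are presumed to already exist:

      import subprocess

      # Sketch: mirror a whole SSD against a same-sized partition on one HDD,
      # marking the HDD side "write-mostly" so reads are served from the SSD.
      SSD = "/dev/sdb"            # assumed 128GB SSD, used whole
      HDD_PART = "/dev/sda1"      # assumed 128GB partition on a 1TB HDD

      def run(*cmd):
          print("+", " ".join(cmd))
          subprocess.run(cmd, check=True)

      # RAID-1 of the SSD and the HDD partition; --write-mostly biases reads to the SSD.
      run("mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
          SSD, "--write-mostly", HDD_PART)

      # The 128GB partitions on the other two HDD's (names assumed) can be
      # striped or mirrored amongst themselves in the normal way, e.g.:
      run("mdadm", "--create", "/dev/md1", "--level=1", "--raid-devices=2",
          "/dev/sdc1", "/dev/sdd1")

    mdadm's --write-mostly flag exists for exactly this asymmetric mirror: writes still go to both sides, but reads are preferred from the device not so marked, i.e. the SSD, which goes some way towards the "best of both worlds" question above.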
