Monday, October 31, 2011

RAID, Backups and Recovery

There's an ironclad Law of SysAdmin:
"Nobody ever asks for backups to be done, only ever 'restores'"
Discussing RAID and Reliable Persistent Storage cannot be done without reference to the larger context: Whole System Data Protection and Operations.

"Backups" need more thought and effort than a 1980s-style dump-disk-to-tape. Conversely, as various politicians & business criminals have found to their cost with "stored emails", permanent storage of everything is not ideal either, even if you have the systems, space and budget for media and operations.

"Backups" are more than just 'a second copy of your precious data' (more fully, 'of some indeterminate age'), which RAID alone partially provides.

Sunday, October 30, 2011

Revisiting RAID in 2011: Reappraising the 'Manufactured Block Device' Interface.

Reliable Persistent Storage is fundamental to I.T./Computing, especially in the post-PC age of "Invisible Computers". RAID, as large Disk Arrays backed up by tape libraries, has been a classic solution to this problem for around 25 years.

The techniques to construct "high-performance", "perfect" devices with "infinite" data-life from real-world devices with many failure modes, performance limitations and a limited service life vary with cost constraints, available technologies, processor organisation, demand/load and expectations of "perfect" and "infinite".

RAID Disk Arrays are facing at least 4 significant technology challenges or changes in use as I.T./Computing continues to evolve:
  • From SOHO to Enterprise level, removable USB drives are becoming the media of choice for data-exchange, off-site backups and archives as their price/Gb falls to several times less than that of tape and optical media.
  • M.A.I.D. (Massive Array of Idle Disks) is becoming more widely used for archive services.
  • Flash memory, at A$2-$4/Gb, packaged as SSDs or direct PCI devices, is replacing hard disks (HDDs) for low-latency random-IO applications. eBay, for example, has announced a new 'Pod' design based around flash memory, legitimising the approach for Enterprises and providing good "case studies" for vendors to use in marketing and sales.
  • As Peta- and Exabyte systems are designed and built, it's obvious that the current One-Big-Box model for Enterprise Disk Arrays is insufficient. IBM's General Parallel File System (GPFS) home page notes that large processor arrays are needed to create/consume the I/O load [e.g. 30,000 file creates/sec] the filesystem is designed to provide. The aggregate bandwidth provided is orders of magnitude greater than can be handled by any SAN (Storage Area Network, sometimes called a 'Storage Fabric').
Internet-scale datacentres, consuming 20-30MW, house ~100,000 servers and potentially 200,000-500,000 disks, notably not as Disk Arrays. Over a decade ago, Google solved its unique scale-up/scale-out problems with whole-system replication and GFS (Google File System): a 21st century variant of the 1980s research work at Berkeley by Patterson et al called "N.O.W." (Network of Workstations), a portion of which was the concept written up in 1988 as the seminal RAID paper: "Redundant Arrays of Inexpensive Disks".

Tuesday, October 25, 2011

My Ancient Computer Project

Preamble: If you're really looking for advice on floppy drives: 8", 5¼" and 3½", try these links:
http://www.retrotechnology.com/herbs_stuff/s_drives_howto.html
http://www.classiccmp.org/dunfield/
http://www.deviceside.com/

The good folk at "Device Side Data" sell a USB adaptor for 5¼ inch floppy drives and a range of relevant cables, power-supplies and enclosures.
For about US$100, you can have a working 5¼in USB setup, but it's bring-your-own-drive.
Sourcing a 5¼ inch floppy drive may be tricky: they went out of production 10-15 years ago.
Sourcing 5¼ inch floppy disks, new or 'slightly used', is probably a greater challenge.



Some weeks ago a friend (AF) asked if I could copy a 5¼ inch floppy disk for him, leading to this little adventure...

AF had a 3½ inch floppy that he might have copied everything onto and wanted to check.
Without specialist equipment, I knew I wouldn't be able to recover all potentially readable data, but I offered to do what I could with a standard drive.

In the end, AF decided to try something else, so I didn't get to do his copy.
But it did make me get my old 386-SX properly set up: networked so I could move files to/from it, and with a way to easily copy 5¼in and 3½in floppy disks.
Before this, I still booted it occasionally, but the real-time clock battery had died and it had no network card or CD-ROM drive.

This decision led to weeks of farnarkling and some interesting lessons.

While it's unclear if my 386-SX will survive another 2 decades, the software can live on through tools like QEMU, WINE/Crossover and even DOSBox. So there is some value in recovering the data both on the hard disk and my collection of floppies.
The impetus to recover data from unreadable media, 5¼in floppies, is obvious.
For the readable 3½in floppies, taking a copy now is a good investment of time if I ever want to access the data again: magnetic media does degrade over time. In another 20 years, the coating on those floppies may be flaking off in big lumps.

386-SX Initial Config:
  • purchased late 1991, ~$3250
  • 386-SX CPU, 20MHz (selectable to 8MHz). No FPU. 8MHz ISA bus, 8-slots.
  • Dual floppy drives, 5¼ in [boot] and 3½ in.
  • IDE disk. 80Mb, WD AC280.
  • single parallel port, dual serial ports, one used for mouse, other for modem.
  • 5Mb RAM [max at time]
  • Super VGA: 1024x768 16-col, 640x480 256-col. 512Kb [K-i-l-o not Mb]
  • 33cm display. [fixed scanrate, can be destroyed @ wrong Hz]
  • mini-tower. pre-ATX power-supply (no 'soft' power switch or 'halt')
  • No sound-card.
  • DOS 5.0 and Windows 3.0
Current state-of-play is:
  • 386-SX has DOS 6.20, Win 3.11, Networking, CD-ROM and Zip drive all working.
    Hard-drive cloned/backed up and (5) ZIP disks read and copied. [ZIP drive back in its box]
    Single 3½in floppy as A: drive. BIOS only allows booting from first drive.
  • 5¼ in drive working as 2nd drive in a Linux machine (2001, Celeron 667) with built-in networking.
  • Working through copying all 5¼in floppies I can find.
    Tally so far [40]: 1 unreadable, 1 with errors.
    Update 04-Nov-2011: 210 floppies read. 40-50 5¼in: 8 "no data". 150+ 3½in: 5 "no data".
  • Now have a USB 3½ in floppy drive, can read those at leisure on newer machines.
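For anyone wanting to script this kind of bulk copy and keep a tally of bad sectors, here is a minimal Python sketch of a sector-by-sector imaging loop. It is an illustration only, not the tool used for the tally above: the function name `image_floppy` and the 512-byte sector size are assumptions, and it relies on Unix-only calls (`os.pread`).

```python
import os


def image_floppy(src_path, dst_path, sector_size=512):
    """Copy src to dst one sector at a time.

    Sectors that raise a read error are recorded and written as zeros,
    so the image keeps its alignment. Returns (sector_count, bad_sectors).
    """
    bad = []
    fd = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)   # size of the device or file
        total = -(-size // sector_size)       # round up to whole sectors
        with open(dst_path, "wb") as out:
            for n in range(total):
                try:
                    chunk = os.pread(fd, sector_size, n * sector_size)
                except OSError:
                    bad.append(n)             # unreadable sector
                    chunk = b""
                # Zero-pad failed or short reads to a full sector.
                out.write(chunk.ljust(sector_size, b"\0"))
    finally:
        os.close(fd)
    return total, bad
```

On Linux the source would typically be a floppy device node such as /dev/fd0 or /dev/fd1 (an assumption; the name depends on the system), and an empty `bad` list corresponds to a clean read.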
The most important lesson for me came about two weeks in:
  • I was fixated on doing everything on my 1991 vintage 386-SX.
    At one point I was running through the options of replacing the motherboard and the various costs which weren't attractive given I was 'just playing'.
  • Since USB became ubiquitous, finding machines/motherboards with floppy drive controllers is increasingly difficult, which means even embedded boards with FDCs are rare and expensive.
  • Then I realised I already had everything I needed in the 2001 vintage Celeron system I had tucked away.
    It's loaded with Fedora Core 3 (support ended in 2004) with a Linux 2.4 kernel. Old, but usable.
  • It was perfect for what I wanted to do.
    It had a floppy drive controller and I could transplant the cable (with 5¼in 'slot' connectors) from the 386-SX to the Celeron 667.
I also learnt a little about floppy drive connectors:
  • 5¼in drives have a 'slot' (card-edge) connector on the drive and a header not unlike a 40-pin IDE connector (34-pin is used).
  • 'Classic' 3½in drives use a socket connector similar to IDE connectors (but 34-pin)
  • The $30 USB 3½in drive I bought uses an incompatible tiny connector (two versions, a plug and a socket with a conversion cable). Previously, I didn't know this variant existed. Noted to save other people from popping open put-together-permanently cases. I didn't care about the warranty, but the case needs to be firmly shut or drive operation is affected.

Before starting any work, I had to back up the original 386-SX disk. My memory was that I'd bought a 30Mb 'RLL' drive (a 20Mb ST-506 drive with a modified controller).
Turns out I really had an 85Mb IDE drive (now called ATA or PATA), a "WD Caviar® AC280". Not only was it larger, but it would allow me to connect an IDE CD-ROM drive in there as well. My unused hardware pile has any number of CD-ROM drives.

Most importantly, an IDE/ATA drive gave me the option of backing up the drive via an IDE/USB interface... Of which I have a number of versions.

I also had an old system backup of around 30 * 720Kb 3½in floppies. DOS 5.0 and Win 3.0.
Using QEMU on a Linux system, I was able to restore this backup to a virtual disk drive.
Shuffling all those disks was painful and slow.

Connecting an early IDE drive to a modern(ish) IDE/USB interface didn't work.
This was probably because additional commands were later introduced to identify the drive, or possibly because this old drive responded to "CHS" (cylinder-head-sector) addressing, not LBA (Logical Block Addressing). According to the AC280 spec. sheet, the drive electronics did support any reasonable CHS settings, not only the physical layout.
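The relationship between the two addressing schemes is simple arithmetic, which can be sketched in Python. This is the conventional CHS/LBA mapping, not anything taken from the AC280 spec. sheet; the function names and the geometries in the usage note are illustrative assumptions.

```python
def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Map a cylinder/head/sector triple to a linear block number.

    Cylinders and heads count from 0; sectors count from 1.
    """
    if c < 0 or not (0 <= h < heads) or not (1 <= s <= sectors_per_track):
        raise ValueError("CHS address outside the given geometry")
    return (c * heads + h) * sectors_per_track + (s - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping: linear block number back to (c, h, s)."""
    c, rem = divmod(lba, heads * sectors_per_track)
    h, s0 = divmod(rem, sectors_per_track)
    return c, h, s0 + 1
```

With a hypothetical 15-head, 63-sector geometry, `chs_to_lba(1, 0, 1, 15, 63)` is block 945. The same block under a hypothetical 10-head, 17-sector geometry is `lba_to_chs(945, 10, 17)`, i.e. `(5, 5, 11)`, which shows how a drive can present one set of blocks under more than one CHS layout.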

My next preferred method was to connect a second IDE drive as a 'slave' (D: in DOS), fdisk and format it and copy the original drive contents, then connect this drive to a modern system with IDE/USB interface and back it up.

The first 3.5" HDD I tried from my unused pile (1Gb Fujitsu) had errors.
Next drive tried was a 3.5" 4.3Gb. Worked reliably.
In the final config, I replaced that drive with a slower, quieter 2Gb 2.5in drive, cloning it via an IDE/USB interface.

The Phoenix BIOS in the 386-SX is very old. Not only doesn't it support LBA drives, it seemed to limit drives to 1023 cylinders - and 15 heads/63 sectors. Around 470Mb.
A very large fraction of my time was spent fiddling with disk CHS specifications and attempting to get fdisk to ignore the BIOS settings.
[No, I can't update the firmware, the BIOS is pre "Flash-the-BIOS"].
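The "around 470Mb" figure can be checked from the geometry limit with a little arithmetic. A minimal sketch in Python, assuming the usual 512-byte sector (the sector size is not stated above):

```python
SECTOR_BYTES = 512  # assumed standard sector size


def chs_capacity_bytes(cylinders, heads, sectors, sector_bytes=SECTOR_BYTES):
    """Largest capacity addressable with the given CHS limits."""
    return cylinders * heads * sectors * sector_bytes


limit = chs_capacity_bytes(1023, 15, 63)
print(limit)                 # 494968320 bytes
print(round(limit / 2**20))  # 472 (counting 1Mb as 2**20 bytes)
print(round(limit / 10**6))  # 495 (counting 1Mb as 10**6 bytes)
```

So the limit is about 472Mb in binary units, matching the figure fdisk reported.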

I could set up multiple (large) partitions on the drive with Linux and the IDE/USB interface, but then would run into troubles under DOS and the 386-SX.
I tried 'fdisk' from DOS 5.0, 6.2, 6.22, Win-95 and FreeDOS on the 386-SX, but all would only see the 470Mb allowed by the BIOS. Extra partitions would be displayed, but couldn't be changed.

I don't have enough unused 1.44Mb 3½in floppies to back up the entire 85Mb drive, so was very glad my 2nd method worked.

Getting a working ISA-bus (not PCI) network card was simple: I had two in my "unused bits pile".
The one I chose I'd bought new, a Netgear NE2000 clone, but I didn't realise that or find the box (with install floppy) for a while. Relied on Windows NE2000 driver and Internet downloads at first.
The other card didn't have a clear name/identifier, nor did I get a good match from the chip numbers.

This was another 'surprise': how to get any info on installed ISA cards.
I also spent 2 or more days fiddling with the IRQ/DMA settings on the NE2000 card. I'd forgotten the problems that PCI made go away. I ended up with IRQ 5 (COM2) and found an IO address range through trial-and-error. I don't know for sure if it's a clash or not.
I couldn't find a tool that would list for me all the cards + settings in the system.
Norton's "SI" provides everything but DMA ports.
MSD (Microsoft Diagnostics) didn't help either.

Getting the CD-ROM working was a good idea, if a little problematic.
Using standard Linux tools, I was able to create an ISO image of the original DOS/Windows system and also add some additional tools.
The problematic part was creating a disk image with uppercase filenames. Whilst DOS 6.2 (really MSCDEX) reads the root directory correctly, no files or directories can be read or listed.
Perhaps it is the 'Joliet' option I use that causes this... Had the same difficulty with the FreeDOS CD.
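Whether filename case or Joliet is the real culprit isn't established here, but one way to narrow it down is to check the source tree for names that fall outside the plain ISO 9660 Level 1 form (8.3, uppercase letters, digits and underscore) before building the image. A minimal Python sketch; the helper names `is_level1_name` and `to_level1_name` are made up for illustration, and the renaming is deliberately crude (it ignores collisions).

```python
import re

# ISO 9660 Level 1 style name: 1-8 characters, optionally a dot and 1-3 more,
# drawn from A-Z, 0-9 and underscore.
_LEVEL1 = re.compile(r"[A-Z0-9_]{1,8}(\.[A-Z0-9_]{1,3})?")


def is_level1_name(name):
    """True if name already fits the 8.3 uppercase pattern."""
    return _LEVEL1.fullmatch(name) is not None


def to_level1_name(name):
    """Crude mapping to an 8.3 uppercase name; collisions are not handled."""
    stem, dot, ext = name.rpartition(".")
    if not dot:                      # no dot at all: whole name is the stem
        stem, ext = name, ""

    def clean(part):
        return re.sub(r"[^A-Z0-9_]", "_", part.upper())

    stem = clean(stem)[:8] or "_"    # never leave the stem empty
    ext = clean(ext)[:3]
    return f"{stem}.{ext}" if ext else stem
```

For example, `to_level1_name("readme.txt")` gives `README.TXT`, and `to_level1_name("My Long File-name.html")` gives `MY_LONG_.HTM`.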

I managed to get non-DOS booting via 3½in floppies:
Until I tried it, I wasn't aware that FreeDOS borrows from the Linux world: it uses 'SYSLINUX' as the boot loader, which then loads FreeDOS's own kernel.sys.

It took a good deal of searching to find any Linux that would support:
  • vanilla 386
  • no FPU
  • no RAM disk for under 12Mb.
"Floppyfw" recognised the (single) NE2000-compatible network card, but didn't include a shell.
BG-TLB does include BusyBox, but only supports 'plip' networking over the parallel port (not tried).
So while I have seen a Linux shell prompt and been able to mount the DOS/FAT filesystem, I haven't been able to dual-boot the 386-SX or run it as a Linux-only system.

Whilst FreeDOS could read its install CD-ROM, DOS 6.2 was unable to read the contents of files/directories (the filename-case problem above).

Another Linux not-tried was tomsrtbt: "Tom's floppy which has a root filesystem and is also bootable." http://www.toms.net/rb/tomsrtbt.FAQ. It advertises itself as "The most GNU/Linux on 1 floppy disk."
Which might have worked, but it formats 3½in floppies at the non-standard 1.7Mb, not 1.44Mb, with the caveat "Doing this may destroy your floppy drive".

One of the 'surprises' I got was being unable to replace the original 3½in floppy drive with a newer drive. The interface and connectors were all the same, but the newer drive wouldn't work in the old system.
Was it me connecting it incorrectly, a faulty drive or something more?
Unable to tell and unwilling to devote a bunch of time testing it.

One of the worst 'surprises' I got was after installing the IOmega (parallel-port) ZIP drive software on the system after a fresh install of Windows 3.11. [Windows 3.11 had decent Networking support and Microsoft still has a good TCP/IP stack available for download.]
I had it all set up, tested and working and foolishly, in Windows, selected "Optimise settings" and the system hung.
Whereafter, the system couldn't see any Comms ports, serial or parallel. Which was very problematic because the 386-SX didn't come with a mouse port (DIN or PS/2). I used a serial port for the mouse.
Windows hung when it booted, leading me to try to revert the ZIP drive install and later to re-install Windows.
There were countless reboots and 6-8 hours later I gave up and went to bed.
I had realised/diagnosed that when the machine booted, the BIOS reported "0 serial ports" and "0 parallel ports". The BIOS setup screen only allowed me to select HDD and floppy drive settings.

First thing in the morning, I powered on the machine and it worked perfectly. Including the original copy of Windows that would hang.
All I can think of is that a power-cycle (off/on) cleared the fault, whereas a 'cold' or 'warm' reset (reset switch or ctrl-alt-del) didn't. In the many reboots, I hadn't thought to power-cycle the machine. [A note for myself and others experiencing weirdness on old hardware.]

I also have an old Dell Inspiron 7000, dating from 1999. I got it with a removable ZIP drive, figuring I could do backups and bulk-data transfers using it and the parallel-port ZIP drive. While tinkering around, I disassembled the other ZIP drive. It's an IDE/ATAPI drive, but the connectors are non-standard. I was hoping I'd be able to kludge it to work on another system - but not to be.

One of the BIOS limits, noted above, was it will only boot from the first floppy drive. When I'd configured the machine, I'd made the 5¼in drive "A:". Part of my reconfig was to move that drive and make the 3½in drive "A:", so it could boot from it. And most "floppy disk images" on the net are 1.44M for 3½in drives.
Then I executed a perfect "rookie mistake": I forgot to make a bootable 3½in system disk before moving the 5¼in drive. And all my system disks were, of course, 5¼in.

The 386-SX is in a "mini-tower" case with very limited space between the back of devices in the drive bays and the motherboard etc. This makes running cables and changing drives quite time consuming. Especially with older connectors that are loose and can be jiggled off. This was part of the reason to move to a 2½in HDD - much more space. I did need to find a spare 2½in-3½in mounting kit first.
I didn't believe the weight of the whole system, let alone just the removable cover. Meaty!

Early on I replaced the on-board/real-time clock battery so the system would remember the time over reboots. More modern systems use 3V lithium batteries (CR2032) for this. This old motherboard used a 4.5V alkaline battery (mounted off-board with Velcro). About a week in, I used 3 AAA alkaline batteries in a modified carrier (and a cannibalised connector) to craft a replacement. A less pretty way is to load 3 batteries in a cut-to-length tube with wire soldered directly to exposed battery ends. Soldering wires directly onto batteries requires some technique. You may need help if you try it. There could be an explosion risk with alkaline batteries becoming overheated (they are marked "do not dispose of in fire"). Research this properly or get help if you choose to do this.

The system, including DOS, quite happily accepts dates of 2011. No problems there...

One of the problems I haven't addressed yet is: "How do I clean the drive heads?"
Back in the day when I was a sometime 'operator' on mainframes, cleaning tape drive heads was part of the ritual. We used Isopropyl alcohol + cotton swabs - because it didn't leave a residue. The swabs would always come away stained with oxide coating from the tape.
For these old drives, I've two reasons to want to be able to clean the heads:
a) these are old drives and may well have an internal dust build-up, and
b) older disks are likely to shed more of their coating than when new.
Some media formulations from the late 1980's are known to suffer problems. I've heard first-hand accounts of the work needed to recover period audio-tape recordings due to this problem.
  • 5¼ inch floppy drives load the heads directly in-line with the feed-slot.
    It is possible to get a swab in there, though v. difficult to see what's happening.
  • 3½in floppy drives drop the disk to both lock the disk in-place and load the heads, hence the heads, being offset, aren't accessible for swabbing through the feed slot.
Another thing to investigate...