Validation: When Fail-Safes Aren't Safe


Consider the following list of fail-safes:

  1. Make the hard drive difficult to remove in order to prevent naive users from destroying their systems.
  2. When the system hibernates, set a flag in non-volatile memory to prevent the user from entering BIOS, where he or she might change some BIOS setting that invalidates the hibernation image, thus preventing the system from ever resuming.
  3. When booting, check the hard disk first, as this (1) speeds up the booting process and (2) prevents someone from injecting a virus into the system via a surreptitiously inserted USB stick, CD-ROM, or DVD-ROM.
  4. If the user tries to shut down in the midst of an installation or upgrade, hibernate instead. This allows the installation or upgrade to carry on from where it left off without the need for a set of complex and error-prone boot-time checks for installation/upgrade progress.

Each of these fail-safes seems eminently reasonable. What can possibly go wrong?

Middle daughter and I found out the hard way, and, worse yet, on the very first laptop that she purchased for herself with her own money.

She had done a number of upgrades, and so foresaw no problems in carrying out yet another. Unfortunately, the upgrade encountered an error, and aborted. The system offered hibernation as the only choice, so she let the system hibernate, figuring on retrying the upgrade later on.

Unfortunately, the upgrade had aborted while installing grub, leaving the system without a working boot loader. Every attempt to boot the system therefore failed.

But no problem, just insert one of the many liveCDs on hand and re-install from scratch. Except that the system attempted to boot from disk first. Because there was enough of grub in place to make the disk appear to be bootable, the system happily ignored the liveCD. And because the system had hibernated, entering BIOS to change the boot order was not an option, either.

Still not a problem. Just remove the disk, and the system will have no choice but to boot from the liveCD, and presumably also allow entry into BIOS. Except that the plastic cover one must remove to get at the disk was glued down. Cautious applications of brute force indicated that a broken piece of plastic was a more likely outcome of further efforts than was access to the hard drive.

My daughter's laptop was now a very expensive doorstop.

Fortunately, she knew someone who had dealt with this type of laptop before, and this person understood the precise application of brute force required to remove the plastic cover without damage. Once the hard drive was removed, it was possible to enter BIOS, change the boot order, and then recover using a liveCD.
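For those who prefer code to anecdotes, the way the four fail-safes conspired can be sketched as a toy model. This is only an illustration of the story, not a description of any real firmware: the `Laptop` class, its fields, and the rule that the hibernation flag blocks BIOS only while the disk is attached are all assumptions chosen to match what we observed.

```python
from dataclasses import dataclass


@dataclass
class Laptop:
    # All fields are illustrative; they model the story, not real firmware.
    disk_present: bool = True           # fail-safe 1: the disk is hard to remove
    hibernate_flag: bool = False        # fail-safe 2: set when hibernating
    boot_order: tuple = ("disk", "cd")  # fail-safe 3: the disk is checked first
    boot_loader_works: bool = True
    live_cd_inserted: bool = False


def shut_down_during_upgrade(m: Laptop) -> None:
    """Fail-safe 4: hibernate instead of shutting down mid-upgrade."""
    m.hibernate_flag = True


def can_enter_bios(m: Laptop) -> bool:
    # Assumption: the flag only blocks BIOS while the disk is attached.
    return not (m.hibernate_flag and m.disk_present)


def boot(m: Laptop) -> str:
    for device in m.boot_order:
        if device == "disk" and m.disk_present:
            # Enough of grub is in place for the disk to look bootable,
            # so the firmware commits to it and never tries the next device.
            return "disk" if m.boot_loader_works else "stuck"
        if device == "cd" and m.live_cd_inserted:
            return "livecd"
    return "no boot device"


laptop = Laptop()
laptop.boot_loader_works = False      # the upgrade aborted while installing grub
shut_down_during_upgrade(laptop)      # hibernation was the only choice offered
laptop.live_cd_inserted = True
print(boot(laptop), can_enter_bios(laptop))   # stuck False

laptop.disk_present = False           # the precise application of brute force
print(boot(laptop), can_enter_bios(laptop))   # livecd True
```

Each function is reasonable taken alone. The doorstop appears only when all four are combined, which is exactly the sort of interaction that validation needs to look for.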

A quick inventory of other laptops around the house located one additional system that was booting first from disk, and this laptop was quickly modified to check the CD before attempting to boot from the disk. Yes, it boots more slowly, but safety first! (And yes, I did inform my daughter of the dangers of untrusted USB drives and CD/DVD-ROMs.)

No, Virginia, fail-safes are not always safe!

Epilog

The error that occurred during the upgrade turned out to be non-reproducible, so I have no hope of a fix. But I cannot resist offering yet another fail-safe: have the installation and/or upgrade procedure temporarily reconfigure BIOS to make the system check the CD/DVD-ROM and USB first. Alternatively, provide a hot-key sequence that forces access to BIOS even in the presence of a hibernation image.
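Both suggestions can be sketched in a few lines, with the usual caveat that this is a hypothetical model rather than a real firmware interface: the `firmware` dictionary, `removable_media_first()`, and `bios_entry_allowed()` are names invented purely for illustration. The point of the sketch is that the original boot order is restored only if the upgrade completes, so a failed upgrade leaves the system willing to boot from a liveCD.

```python
from contextlib import contextmanager


@contextmanager
def removable_media_first(firmware):
    """Hypothetical helper: check USB and CD before disk during an upgrade."""
    saved = list(firmware["boot_order"])
    firmware["boot_order"] = ["usb", "cd", "disk"]
    # No try/finally here, and deliberately so: if the upgrade raises,
    # the line after the yield never runs and removable media stay first.
    yield firmware
    firmware["boot_order"] = saved


def bios_entry_allowed(hibernate_flag: bool, hotkey_held: bool) -> bool:
    """Hypothetical hot-key override of the hibernation lockout."""
    return hotkey_held or not hibernate_flag


firmware = {"boot_order": ["disk", "cd", "usb"]}
try:
    with removable_media_first(firmware):
        raise RuntimeError("upgrade aborted while installing grub")
except RuntimeError:
    pass
print(firmware["boot_order"])   # ['usb', 'cd', 'disk']
```

Of course, leaving removable media first after a failure reopens the door that fail-safe 3 was trying to close.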

Which will no doubt lead to more problems.

Which will no doubt lead to more fail-safes.

Which will no doubt...

Oh, never mind!!!