Science News illustrates vividly just how important it can be
to program a piece of software correctly and document its proper
usage to the customer.
At a recent government conference on computer assurance, it
was revealed that the February 25 failure of a Patriot Missile
battery in Dhahran to track and intercept an incoming Scud missile
was traced to a 0.36-second error in the timing of a software-driven
clock. That missile subsequently struck a warehouse being used as an
Army barracks and resulted in American deaths.
It turns out the software was working as designed. The
original specs for the system were based on the assumption that the
system would never be in continuous operation for more than 14
hours. Periodic maintenance was assumed to bring the system down
at least that often. Accordingly, the programmers coded the clock
with an algorithm that produced an error of 1 part in 1,000,000.
The resulting accumulated error was judged to be insignificant over
periods of 14 hours or less.
However, the crew operating the Dhahran missile battery wasn't
aware of this limitation. By the time the fatal Scud arrived,
their system had been running continuously for 100 hours with no
apparent problems. Although the problem had been identified a week
earlier and a fix cassette had been sent to the field, it didn't
arrive at the Dhahran battery until the day after the Scud attack.
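The figures quoted above can be checked with a little arithmetic. The
sketch below assumes the clock error grows linearly with uptime at the
stated rate of 1 part in 1,000,000; the function name and constant are
illustrative only and are not taken from the Patriot software.

```python
# Relative clock error quoted in the article: 1 part in 1,000,000.
# Assumption: the error accumulates linearly with continuous uptime.
RELATIVE_ERROR = 1e-6
SECONDS_PER_HOUR = 3600


def clock_drift_seconds(uptime_hours: float,
                        relative_error: float = RELATIVE_ERROR) -> float:
    """Return the accumulated clock error, in seconds, after uptime_hours."""
    return uptime_hours * SECONDS_PER_HOUR * relative_error


if __name__ == "__main__":
    # 14 hours is the design assumption; 100 hours is the reported uptime.
    for hours in (14, 100):
        drift = clock_drift_seconds(hours)
        print(f"{hours:>3} h uptime -> {drift:.2f} s drift")
```

Under that assumption, 14 hours of uptime gives a drift of about 0.05
seconds, while 100 hours gives 0.36 seconds, matching the reported
error and roughly seven times what the original specs allowed for.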
Sometimes even 6+ sigma is not enough...