Earlier this week, Delta passengers worldwide were stranded as a computer failure completely screwed up operations. The ensuing chaos provided a good look at how the robots are actually going to kill us, but also raised some good questions: how does one power outage ground an airline, and how fired is the sysadmin?
The Week spoke to Delta’s COO, Giles West, to try and understand what happened to take the entire network offline. It’s a sad story of backups that should’ve worked, knock-on effects, and one seriously expensive outage.
“Monday morning a critical power control module at our Technology Command Center malfunctioned, causing a surge to the transformer and a loss of power,” West told The Week. “When this happened, critical systems and network equipment didn’t switch over to backups. Other systems did. And now we’re seeing instability in these systems.”
In other words: a power surge caused by one malfunctioning piece of equipment tripped a power transformer, killing everything at Delta’s command center in Atlanta. Clearly, this shouldn’t have happened: backup power (or an entire backup command system) should have taken over for every critical system, not just some of them.
But even with this failure, why did a computer failure in Atlanta stop planes from flying in London? From news stories at the time, it sounds like the main problem was with the passenger information system. Without the computer, airline staff couldn’t check in passengers or issue boarding passes, a vital step in loading the plane. Some news outlets reported Delta staff filling out boarding passes by hand, but that’s a bad fix at best.
The problem became worse because even though the outage only lasted a few hours, the knock-on effects on Delta’s service were huge. Running an international airline involves careful balancing of crew and aircraft all around the world, and even a small delay can cascade into major delays for later flights that were never directly affected by the computer outage.
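To see why a short outage keeps hurting hours later, here is a rough sketch of how a delay ripples through one aircraft's day. Everything in it is made up for illustration: the routes, the schedule, the 45-minute minimum turnaround, and the `Leg` and `propagate_delay` names are assumptions, not Delta's actual schedule or systems. It also ignores crew limits and time zones, which would make the real picture worse.

```python
from dataclasses import dataclass


@dataclass
class Leg:
    name: str
    sched_dep: int  # scheduled departure, minutes after midnight
    sched_arr: int  # scheduled arrival, minutes after midnight


# Assumed minimum time on the ground between landing and the next takeoff.
MIN_TURNAROUND = 45


def propagate_delay(legs, initial_delay):
    """Return the departure delay (in minutes) of each leg flown by one aircraft.

    The first leg leaves `initial_delay` minutes late. Every later leg can
    only leave once the plane has landed and been turned around, so it
    inherits whatever delay the schedule's slack fails to absorb.
    """
    delays = {}
    prev_actual_arr = None
    for i, leg in enumerate(legs):
        if i == 0:
            delay = initial_delay
        else:
            earliest_dep = prev_actual_arr + MIN_TURNAROUND
            delay = max(0, earliest_dep - leg.sched_dep)
        delays[leg.name] = delay
        # Assume flight time is unchanged, so the arrival is late by the same amount.
        prev_actual_arr = leg.sched_arr + delay
    return delays


# Hypothetical rotation for a single aircraft.
rotation = [
    Leg("ATL-JFK", 8 * 60, 10 * 60 + 15),
    Leg("JFK-ATL", 11 * 60 + 15, 13 * 60 + 45),
    Leg("ATL-LHR", 15 * 60, 23 * 60 + 30),
]

print(propagate_delay(rotation, 180))
# {'ATL-JFK': 180, 'JFK-ATL': 165, 'ATL-LHR': 135}
```

In this toy schedule, a three-hour hold on the morning flight still leaves the afternoon London departure more than two hours late, because the slack between flights only absorbs 15 and then 30 minutes of the delay.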