Peculiar Books Reviewed: Henry S. F. Cooper Jr.'s "Thirteen: The Apollo Flight that Failed"

Originally published October 1, 2014

In the first Peculiar Books Reviewed we discussed David A. Mindell's delightful book "Digital Apollo" and, in particular, took from the book this lesson: a technical system which puts the human component in a position of supremacy over the machine is more capable of achieving the aim of the system than one which holds humans in a subservient position. That is, ignoring any moral or dystopian considerations, putting people in a role in which they are made to serve machines creates worse outcomes in terms of what has been built. Mindell puts this as the difference between "engineering" and "pilot" mentalities, the former being in favor of full automation–think Wernher von Braun's desire to have mere passengers aboard a clockwork spacecraft–and the latter in favor of manual control of the craft. The "pilot" mentality works fine in systems that are relatively simple, but as complexity increases human ability to cope with the banal demands of operations falls off: we can only do so much. The "engineer" mentality succeeds up until the system encounters a situation not expected by the engineers and the mechanism, being unable to cope, falls back to human operators who may or may not be paying attention at the time of the crisis or capable of adapting the mostly automated system to their immediate needs.

This idea, that the role of people in a complex system–spacecraft, software-only, industrial, etc.–can be considered in purely technical terms, is important enough that I'm going to spend this review and the next elaborating on it. There's a moral argument to be made as well, as hinted at in the review of Francis Spufford's "Backroom Boys", but the time is not yet ripe for that.

At a little after nine central standard time on the night of Monday, April 13, 1970, there was, high in the western sky, a tiny flare of light that in some respects resembled a star exploding far away in our galaxy.

Thus begins Henry S. F. Cooper, Jr.'s "Thirteen: The Apollo Flight That Failed", one of the best technical explanations of a catastrophic failure and its resolution ever written. This "tiny flare of light" was a rapidly expanding cloud of frozen oxygen venting from the now seriously damaged Service Module (SM). A tank failure ("Later, in describing what happened, NASA engineers avoided using the word 'explosion'; they preferred the more delicate and less dramatic term 'tank failure'…") of Oxygen Tank No. 2 damaged the shared line between the two primary oxygen tanks and the three fuel cells. Immediately after the failure, two of the three fuel cells began producing reduced amounts of electricity as a set of reactant valves which fed them were jostled shut, permanently, by the force of the failure. Another valve, meant to isolate Oxygen Tank No. 1 from No. 2, failed because of the same mechanical jarring, but was left in an open position. Over the next two hours, both tanks vented into space, pushing the craft off course and ruining the Service Module.

The subsequent flight, which Cooper so expertly lays out, was a "ground show", in the words of the astronauts themselves. Usual operation of the flight was a delicate balance between the on-board astronauts–in physical possession of the craft and able to manipulate it–and the flight controllers, receiving constant telemetry from the craft, thinking through consequences and making recommendations. Cooper describes this by saying "Astronauts are more like officers aboard a large ship… (and) there were about as many astronauts and flight controllers as there are officers aboard a big vessel (…) In fact, one of the controllers, the Flight Director, in some respects might have been regarded as the real skipper of the spacecraft…" Apollo craft could have operated independently of any ground crew, but only in planned-for situations. Post-failure, it became the flight controllers' task to find a plan to land the astronauts safely and the crew's job to carry it out.

Plan they did. With the Service Module ruined, the crew abandoned it and began to use the Lunar Module (LM) as a lifeboat, an eventuality never seriously considered.

Aside from some tests a year earlier (…) no one had ever experimented to see how long the LM could keep men alive–the first thing one needs to know about a lifeboat.

Almost entirely through luck, the LM was equipped well enough to make the trip back survivable. Cooper was likely unaware, but as Mindell pointed out, the LM and the Command Module (CM) carried duplicates of the same computer, meaning that the LM computer, not being a special-purpose lunar-landing device, could make the rocket burns needed to return the craft to Earth. The rigging of various internal systems (made famous in the Apollo 13 film: the CO2 scrubbers were incompatible between modules and had to be adapted), the careful rationing of electricity, and the craft's continuous drift off its landing flight path kept Mission Control busy creating and testing new flight checklists.

Cooper's real interest is the people involved in this story and their interplay through the crisis. Astronauts rushed in to man simulators, testing flight controllers' theories about rocket firings, while computer teams kept the telemetry-gathering systems, the flight-projection calculators, and the CMS/LMS Integrator (which "would insure that the instructions for the two modules dovetailed–that there were no conflicts between them") humming. Cooper is telling the story of a complex organization manning a damaged complex system, with human lives at risk. Implicit in all of this are the machines these people are using: tools being adapted to new situations and the spacecraft being repurposed in ways never intended.

In a basic sense, the Apollo spacecraft was a couple of habitable tin cans, some rockets, and two computers to control said rockets. The computer was 'programmed' by calling up subroutines and feeding in input parameters, all augmented by feedback from the pilot. Normal flight operations dictated which subroutines were called up and which parameters were input, with a feedback loop informed by real-time telemetry from the craft and the astronauts' expert opinions. The Apollo computer could neither demand nor decide; it was instructed. To deal with this 'limitation', NASA was forced to invest in the training of all flight staff and to ensure that the craft could be flexibly programmed by the astronauts. This, of course, meant that the craft and crew were not rigidly locked into a fixed plan but could use their human understanding to change course (literally, in this case) as reason dictated.
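To make the shape of that arrangement concrete, here is a minimal sketch, in Python, of the pattern the paragraph describes: a computer that computes and proposes, and an operator who instructs and confirms. Everything in it is hypothetical (the routine names, the parameters, the "proceed" check); none of it is the real Apollo Guidance Computer interface. It is meant only to show where the human sits in the loop.

    # A minimal, hypothetical sketch of the human-in-the-loop pattern described
    # above: the computer runs a subroutine on operator-supplied parameters and
    # proposes an action, but nothing happens until the human says "proceed".
    # None of these names or numbers come from the actual Apollo Guidance
    # Computer; they are stand-ins for illustration only.

    from dataclasses import dataclass
    from typing import Callable, Dict


    @dataclass
    class BurnPlan:
        delta_v_mps: float   # requested change in velocity, metres per second
        duration_s: float    # computed engine firing time, seconds


    def plan_midcourse_burn(delta_v_mps: float, accel_mps2: float) -> BurnPlan:
        """A stand-in guidance subroutine: turn a requested delta-v and an
        assumed constant acceleration into a burn duration."""
        return BurnPlan(delta_v_mps, delta_v_mps / accel_mps2)


    # The 'program library' the operator can call up by name.
    ROUTINES: Dict[str, Callable[..., BurnPlan]] = {
        "midcourse_burn": plan_midcourse_burn,
    }


    def call_up(routine: str, proceed: Callable[[BurnPlan], bool], **params) -> None:
        """Call up a subroutine, report its proposal, and wait for the operator's
        go-ahead. The computer never decides on its own."""
        plan = ROUTINES[routine](**params)
        print(f"PROPOSED {routine}: burn {plan.duration_s:.1f} s for {plan.delta_v_mps:.1f} m/s")
        if proceed(plan):
            print("PROCEED received: executing (simulated).")
        else:
            print("HELD: operator declined; awaiting new instructions.")


    if __name__ == "__main__":
        # The pilot's judgment stands in for the confirmation step here.
        call_up("midcourse_burn",
                proceed=lambda plan: plan.duration_s < 30.0,
                delta_v_mps=11.5, accel_mps2=0.5)

The point of the sketch is that the confirmation step is not an error handler bolted on afterward; it is the normal path. That is precisely the property that let Apollo 13's crew and controllers feed the machine instructions it was never designed to expect.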

In documenting the catastrophic failure of Apollo 13, Cooper has likewise documented the exquisite working of a complex organization in a position of mastery over a complex system. These human-oriented complex systems are arranged to take our instructions, to guide but not command. In a crisis, this proves invaluable: we humans may apply our intelligence to the problem at hand and use the machine as just another tool in the solution, keeping in mind, of course, the limitations of the machine but never once struggling to bend it to our informed wills. We may also choose to opt out of the tool's use. Only Jim Lovell, commander of the Apollo 13 mission, intended to make use of the LM's ability to land itself automatically. He never got the chance, of course, but there's something telling in the notion that every other astronaut who landed on the Moon–all of them comfortable with and pleased by the craft's theoretical abilities–chose to go down manually.

As a society, we're building more and more purely automatic complex systems. In the best case they take no input from humans and function only insofar as the systems' engineers were able to imagine failures. In the worst case, they demand input from humans but do so within the limited confines of the system engineers' imagination, implicitly invalidating any and all expert opinion from the human component. Such systems are brittle. Such systems are, indeed, not maintainable in the long term: the world changes, and knowledge of their operation is lost because none of the humans involved in the system were ever truly responsible for understanding its mechanism.

What Cooper has done is craft an engaging story about a very nearly fatal six-day trip around the Moon in a faulty craft. What he has also done is give a vision of the effective interplay between human and machine, one which enhances the overall capability of the people involved, extending their strengths and making up for their weaknesses. This is the valuable contribution of Cooper's book: a rough blueprint, seen through a particular accident, for complex systems that must be tolerant of faults in the fulfillment of the designer's aims. Machine-oriented systems are fine, maybe even less onerous to run in the average case, but in failure scenarios seriously bad things happen.

More on that, next review.