If Neil Armstrong were your engineer, you wouldn't need this!
Lesson XIV of the Lunar Landings
Before Neil Armstrong became the first man to step on the Moon, he was one of America’s premier test pilots and among its first astronauts. In the 1960s every space flight was an experiment: each one included something that had never been attempted before, and each achieved more than the flight before it.
Today, docking spacecraft is relatively common: every flight to the International Space Station ends with the spacecraft linking up with the station. Astronauts train against known parameters, they are assisted by on-board computers, and much of the flight can even be automated.
But in March of 1966, when Neil Armstrong and David Scott took off in Gemini 8 in order to link up with an unmanned vehicle called Agena, they were attempting to perform a task which had never been done before — not even by the Soviet cosmonauts, who had claimed all the previous “firsts in space”.
Gemini was an interim series of spacecraft, a step between the Minimum Viable Product (aka MVP) of Mercury (which proved that astronauts could function in space) and the planned Apollo (which would show that astronauts could reach the Moon). Gemini’s goal was to prove that astronauts could work productively in space, and one of the major tasks was to demonstrate docking with other spacecraft.
After checking out their spacecraft’s systems, the two astronauts located and rendezvoused with the Agena. They maneuvered closer and successfully linked the two vehicles together. During this time David Scott used the IBM computer onboard the Gemini to send remote control commands to the Agena.
Having succeeded in the first, and most important, part of their mission, the astronauts relaxed while they flew out of range of the ground communications stations. Back in the 1960s, limited technology and the lack of relay satellites meant that astronauts could not communicate with Mission Control in Houston continuously; they spent part of each orbit essentially alone.
It was during this communications blackout that they noticed that the two spacecraft had started spinning… and were spinning faster as the seconds went by!
The aberrant spin had started slowly but was quickly reaching a rate of rotation which would not only affect the fragile bodies of the astronauts but might also warp the strong-yet-delicate metal alloys and electronics of their vehicle.
In every previous space emergency, and in nearly every later one, the solution was reached through collaboration between the astronauts in space and the engineers on the ground. The engineers had access to more information about the spacecraft, far more powerful computers, and reams of documentation and planning documents at their fingertips, and they could enlist armies of experts into crisis rooms to solve the problem.
Gemini 8 was spinning, and spinning fast. If the astronauts had waited for communications with ground control to resume, the spacecraft might have broken apart before help arrived. Even if the spacecraft held together, the rapid spinning would disorient the astronauts, and problem-solving would get more difficult as time went by.
But NASA’s planning was nothing if not comprehensive, and both Armstrong and Scott had spent months training for every eventuality (or, as this case proved, just about every eventuality). They had their flight plan, the “happy path” of what they would do during a perfect flight, and endless contingency plans for what they would do if (or rather, when) problems arose. In modern IT parlance, contingency plans are often called runbooks. A runbook can be a document describing step-by-step actions to perform by hand, a series of automated actions triggered by an engineer, or a fully automated response that resolves the problem as soon as it is detected, with no manual intervention.
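To make the three kinds of runbook concrete, here is a minimal Python sketch. Everything in it (the `Automation` levels, the `Runbook` class, and the `handle` function) is a hypothetical illustration rather than the API of any real product. It only shows that the same contingency plan can be followed by hand, started by an engineer, or run on detection.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Automation(Enum):
    # Hypothetical labels for the three kinds of runbook described above.
    MANUAL = "manual"        # a document a human follows step by step
    TRIGGERED = "triggered"  # automated steps, started by an engineer
    AUTOMATIC = "automatic"  # runs as soon as the problem is detected


@dataclass
class Runbook:
    name: str
    automation: Automation
    steps: List[Callable[[], str]]  # each step returns a short log line


def handle(runbook: Runbook, approved_by_engineer: bool = False) -> List[str]:
    """Return a log of what happens when this runbook's problem is detected."""
    if runbook.automation is Automation.MANUAL:
        return [f"page engineer: follow '{runbook.name}' by hand"]
    if runbook.automation is Automation.TRIGGERED and not approved_by_engineer:
        return [f"waiting for an engineer to start '{runbook.name}'"]
    # Either fully automatic, or triggered and approved: run every step in order.
    return [step() for step in runbook.steps]


undock = Runbook(
    name="undock",
    automation=Automation.TRIGGERED,
    steps=[lambda: "target vehicle shut down", lambda: "latches released"],
)
print(handle(undock))
print(handle(undock, approved_by_engineer=True))
```

In this sketch, moving a plan from one level to the next changes only who starts it, not what it does, which is one reason to write the steps down carefully before automating them.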
While the Gemini spacecraft already had a proven track record of successes, the newer Agena was an unknown factor, and most of the docking contingency plans (runbooks) were associated with it. Indeed, the very last remark relayed to the crew from Mission Control before communications were cut off was a reminder to cut the Agena loose if it began to misbehave.
So Armstrong and Scott, seeing the Earth and stars flash by their windows at an ever-increasing speed, calmly and collectedly pressed the buttons on their IBM control computer to deactivate the Agena and disconnect from it. The fact that they spent precious time (and attention) to shut down the Agena properly meant that it could later be investigated and re-used for future missions.
But instead of solving their problem, undocking made it worse: the Gemini, free of the massive Agena, did not return to normal behaviour and actually began spinning much faster!
The “correct” runbook, the one they had practiced and prepared for, the one they had been told to use by Mission Control, was actually the wrong one and had placed them in even greater danger.
Despite this initial setback, Armstrong and Scott had more than one trick up their sleeves and called upon their months of training for the flight, years of astronaut training, and decades of training as test pilots — and of course their own personal instincts and capabilities.
Wrestling with the recalcitrant spacecraft, they concluded that there was no option but to completely deactivate the main maneuvering engines (which were probably the cause of the spin) and stabilize the spacecraft using the re-entry engines. The only problem was that the mission rules were explicit: once the re-entry engines were activated, there was no choice but to return to Earth as soon as possible.
So minutes later, when Gemini 8 returned to communications range with Mission Control, there was little for the astronauts to say besides:
There was a life-threatening incident that nearly destroyed the spacecraft; we’ve managed it and saved both ourselves and the Gemini 8 spacecraft; now where do you want us to land?
Later, after the systems in Gemini 8 were examined in detail, the cause of the problem was found. One of the engines controlling the stabilization of the spacecraft was configured to “fail open”, which meant that a control failure would cause the engine to assume that it should always be active and start spinning the spacecraft. It was easy to fix this and no other Gemini (or any other spacecraft) had the same problem.
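The difference between a “fail open” engine and one that fails safe can be sketched in a few lines of Python. This is a toy model, not a description of the actual Gemini hardware: the function names and the acceleration figure are invented purely for illustration.

```python
from typing import Optional


def thruster_is_firing(command: Optional[bool], fail_open: bool) -> bool:
    """Decide whether the thruster fires.

    `command` is True or False while the control signal is healthy,
    and None when the control has failed.
    """
    if command is None:
        # Control failure: a fail-open thruster assumes it should stay active,
        # a fail-safe thruster shuts itself off.
        return fail_open
    return command


def spin_rate_after(seconds: int, fail_open: bool,
                    accel_deg_per_s2: float = 3.0) -> float:
    """Spin rate (degrees/second) after `seconds` of continuous control failure.

    The acceleration value is an arbitrary illustrative number.
    """
    rate = 0.0
    for _ in range(seconds):
        if thruster_is_firing(None, fail_open):
            rate += accel_deg_per_s2
    return rate


print(spin_rate_after(10, fail_open=True))   # keeps accelerating the spin
print(spin_rate_after(10, fail_open=False))  # stays at rest
```

The same design question appears in operations work: when a controller, health check, or feature flag cannot be read, the system should default to the behaviour that does the least harm.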
Regarding the actions of the astronauts during the emergency: they acted on the best information they had before the emergency began. The Agena was not as dependable as the Gemini, and they had limited information about which engines were actively firing and which were not. Thankfully, due to their skills and abilities, they were able to wrest back control and save the spacecraft.
But it raises the question: can we always depend on having engineers with the skills and reflexes of Moonwalkers? Scott commanded Apollo 15 and was the seventh man to walk on the Moon, while Armstrong was… well, I’m going to assume my readers know.
Moreover, the astronauts spent months training for their missions and knew nearly every possible contingency, while modern operations environments change so rapidly that no single person can keep the entire architectural view in their mind, not to mention keep track of the correct action to take (or runbook to execute) for any eventuality.
This is where solutions such as Watson AIOps come into play: they act as the “expert astronaut” who constantly examines your environment and keeps up with its changes. Watson AIOps correlates the various anomalies in your system (whether logs, events, or metrics) with the possible solutions (contingency plans or runbooks) and recommends the next best action to take, at the right time, by the right person.
It is easy to jump to the wrong conclusion based on personal knowledge and try to disconnect from your Agena; a solution such as Watson AIOps will instead suggest dealing with your maneuvering jets first.
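The idea of correlating anomalies with runbooks can be illustrated with a deliberately simple Python sketch. This is not how Watson AIOps works internally; the runbook names, the symptom labels (including “fuel-dropping”, which is invented for the example), and the scoring rule are all assumptions made for illustration.

```python
# Hypothetical catalogue: each runbook lists the symptoms it is meant to address.
RUNBOOKS = {
    "undock-from-agena": {"spin", "docked", "agena-fault"},
    "shut-down-maneuvering-thrusters": {"spin", "thruster-stuck-on", "fuel-dropping"},
}


def recommend(observed, runbooks=RUNBOOKS):
    """Rank runbooks by the fraction of their expected symptoms that were observed."""
    scores = {
        name: len(symptoms & observed) / len(symptoms)
        for name, symptoms in runbooks.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# With only what the crew could see from the cockpit, undocking ranks first.
print(recommend({"spin", "docked"}))

# With more telemetry correlated, the thruster runbook ranks first.
print(recommend({"spin", "docked", "thruster-stuck-on", "fuel-dropping"}))
```

The ranking changes because more signals were correlated, not because anyone got smarter mid-crisis, which is the role the article assigns to an automated co-pilot.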
Now, if Neil Armstrong or David Scott were in control of your operations, you might not even need the help of Watson AIOps… but since they are not going to be working for you, most people will be better served by having the likes of Watson AIOps as their co-pilot.
You can also watch a recent webinar dedicated to Watson AIOps. And don’t forget to register for the upcoming IBM Academy of Technology conference, PREVAIL2020, which covers all things Performance, Availability, Site Reliability Engineering, and AIOps, and includes a keynote session dedicated to AIOps.
PREVAIL2020 will also feature a session dedicated to the 50th anniversary of the Apollo 13 flight, including lessons that have not been published here yet — please register and watch!