So, You Want to Run a Space Shot…

Matthew JL
22 min read · Sep 21, 2021


From 2017 to 2020 I was involved in the high-power rocketry (HPR) team at my alma mater. After my graduation (unfortunately, an event interrupted by COVID-19 related shutdowns) I sat down and wrote up some of my experiences in working towards a high-altitude flight to the edge of space. Project management as a whole is a wonderfully counterintuitive science, and the insular tendencies of the collegiate rocketry sphere mean that the same mistakes are bound to be made over and over again. Here, thus, is my original essay on the topic.

There are probably dozens of books and essays that cover the complexities of engineering a sufficiently complex project — one of my favorites is John D. Clark’s Ignition! — but I can count on one hand the number of works that focus exclusively on the management and logistics of such a challenge. And most of those are targeted at computer science or the tech sector. As far as I’m aware, no general guide to actually running a complex engineering project exists.

A saying I’m fond of is “There are two ways to learn something: Screwing up for yourself, or watching someone else screw up.” Most space shot teams tend to do the former. An unspoken rule for college space shot teams is that they don’t tend to share information freely with one another. There’s a really strong sense of competitive protectionism rather than sharing ideas, and the consequence is that most teams make the same mistakes over and over again. Fortunately, we’ve never had a situation where those mistakes led to lost limbs or lives, only lost time and money. (Though not for lack of trying). The same principle can be observed in both engineering and management, and can be just as deadly if neglected in either case. We all know how a poorly engineered rocket can fail, but a poorly managed rocket can be just as catastrophic!

The framework that you should become familiar with, and one that I will reference throughout this essay, is “general systems thinking.” The core idea behind general systems thinking is that a very complex system can be built out of much smaller parts, and that the root of all scientific endeavors is our innate ability to work by analogy and to break things that are beyond our individual comprehension into smaller chunks that can be easily understood. A great example of this, from physics, is approximating the three-body problem with patched conics. The latter is an imperfect representation of reality, but can be easily managed by someone with tools as simple as a calculator, a compass, and pencil and paper. The “actual” solution takes a tremendous amount of processing power and an intimate understanding of the whole three-body system. But, of course, even that involves ignoring an entire universe’s worth of stars, planets, and galaxies hurtling around and exerting tiny, nonzero forces on our system! Thus, we are forced to reduce a complex problem down to simpler parts, no matter what.

It might not seem like this is directly consequential to running a space shot project, but adopting this philosophy is actually the most critical factor in ensuring success. Many university teams have already hit upon some element of general systems thinking, taking the final rocket and splitting research and development efforts into different “teams” or “subteams,” but this is as far as they go. Merely having a breakdown of the final product into smaller subsystems isn’t enough, which I can attest to through firsthand experience. (Remember, one of the two ways of learning something is to watch someone else screw up).

One of the most common problems we had at the high-power rocketry team I was a part of (located at a major northeastern university) was in coordinating different subteams into doing certain tasks. There is one question that you should ideally never have to ask as a member of a complex project, and that is “What should I be doing?” There was one particularly infamous incident where my subteam co-lead (who was actually captain of the club) could not attend a subteam meeting, and gave me no information about what the subteam should have been doing at that meeting. My first words to the subteam members were “Well, since X isn’t here tonight, I have no idea what I’m supposed to be giving you.” We should have been working on an interstage (which couples two stages of a rocket together) for a test flight, but that information was never handed off to me, nor was it freely available. The design was left fluid and a finished product wasn’t actually produced until about two months later. This was an incredibly common occurrence with anything that subteams did not have nigh total control over.

This leads me to the most important word in general systems thinking, which is “interface.” Interfaces are how we couple together a bunch of simple black boxes into a complex system. You encounter interfaces all the time in your day-to-day life. Think of a power outlet. Power outlets don’t really care what you plug into them — a lamp, a toaster, a vacuum cleaner, a computer — all they “know” how to do is supply current at a certain voltage and frequency, and do it in a mechanically specific way. Likewise, whatever you plug into an outlet doesn’t care what’s on the other side of the wall, just as long as its input is satisfied. There are standards by which an interface has to operate — try plugging a British plug into a US socket, for example — but they are well defined and accessible to everyone. A company seeking to make a new lamp doesn’t have to concern itself with the electrical layout of every house it might be installed in, only with the standard its customers use. Not all interfaces rely on one output and one input, however. USB ports are a well-defined interface, and data and power can be exchanged in both directions.

And this is the big secret about general systems thinking: We don’t actually care about the contents of the “boxes” we divide our complex project into, only about their interfaces. As long as the interface requirements are met (mass, power supply, mechanical connections), we don’t have to be bothered by what the solutions actually look like. It becomes the responsibility of an integration team to ensure that those requirements are satisfied.
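To make the idea concrete, here is a minimal sketch (in Python, with entirely hypothetical names and numbers — this is not any real team’s interface spec) of how an interface can be written down as a contract. The integration check cares only about mass, power, and mechanical fit, never about what is inside the box:

```python
from dataclasses import dataclass

# Hypothetical interface spec for an avionics slot. Any module that
# satisfies these limits can fly in it, whether it's an off-the-shelf
# altimeter or a custom board -- the slot doesn't care what's inside.
@dataclass(frozen=True)
class AvionicsInterface:
    max_mass_g: float         # mass budget for the module
    supply_voltage_v: float   # what the vehicle's power bus provides
    mount_diameter_mm: float  # mechanical constraint: must fit the tube

def fits(spec: AvionicsInterface, mass_g: float,
         required_voltage_v: float, diameter_mm: float) -> bool:
    """Integration check: does a candidate module satisfy the interface?"""
    return (mass_g <= spec.max_mass_g
            and abs(required_voltage_v - spec.supply_voltage_v) < 0.5
            and diameter_mm <= spec.mount_diameter_mm)

slot = AvionicsInterface(max_mass_g=250, supply_voltage_v=7.4,
                         mount_diameter_mm=98)
print(fits(slot, mass_g=180, required_voltage_v=7.4, diameter_mm=96))  # True
```

Two very different boards that both pass the check are interchangeable from the integrator’s point of view, which is exactly what lets one black box be swapped for another without disturbing the rest of the system.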

Now thinking about this in terms of interfaces, how might we have solved the interstage issue that I faced? Ideally, we would have been given information on how it interfaced with the rest of the rocket — how big it needed to be, how much pressure it would have to support, how it would release, how it would attach to the upper stage, and so on — and we could have worked on it even if we hadn’t been told what to do specifically. That implies as well that a concrete design for the vehicle should have existed and been finalized. In short:

Your project should be defined, and then divided into interfaces that can be handed down to subteams before work begins.

What we should have done is, at the beginning of the term, sat down as a management team and laid out a general design for the space shot rocket. As a consequence of this, interfaces could be defined and given to the subteams as a guide. Then development could have begun.

This isn’t to say that a general design can’t evolve over time, of course. It’s very rare that an original design survives a confrontation with reality; as a rule, we can’t possibly hope to identify every single problem that needs to be solved right at the very beginning. A design should take feedback from different subteams and change to reflect those inputs. Of course, the closer you get to getting things mostly right in the very beginning, the shorter your development effort will be.

So, how do you get things right in the very beginning? You do your homework, figure out what people have already done and why, and adapt that as your first solution to the problem. Problem solving has to start from somewhere. There are many people who balk at the idea of the first move in a complex project being a re-tread of past efforts. In my experience, these people are the most likely to be actively screwed over by biting off more than they can chew, which translates into program-wide slippages or worse. Again, I speak from failure. My first real experience in project management ended with a disaster caused by insisting on going for a more complex, revolutionary system.

The project I was working on was a rover platform derived from a 3U CubeSat — basically a tiny standardized satellite about the size of a tissue box. I knew the frame/chassis could be made out of wood with great success, but I decided to go with laser cutting the frame out of ¼” acrylic. This quickly became the main source of delays and slippage in the whole project. The laser cutter occasionally flat-out broke down. Calibration was a nightmare. It took weeks to get a successful cut. The school year actually ended by the time we got something resembling a chassis. Even then, the four sides of the box flat-out shattered when trying to cut them from the acrylic sprue.

Where did I go wrong? Simple. Instead of starting with a solution I knew would be successful, I tried to innovate without a full understanding of the problem. Had I simply duplicated what was already out there by making the chassis out of wood or 3D printed plastic, I would have come much farther along and learned more about the problem — and possibly would have produced a final product. Instead, I learned that laser cutters can break in more ways than you’d expect and that burning acrylic smells terrible.

I also learned that you have to be willing to cut off “solutions” when they prove to be more trouble than they’re worth. This is something that many people overlook in designing solutions to problems. I could probably have salvaged some aspect of the project by abandoning it when I first realized that the solution I conjured up didn’t work. However, this is generally hard to do without the benefit of hindsight — being able to identify a sunk cost in the moment is extremely difficult. It’s generally better to start with a working solution as a result.

But the wise problem-solver should tread carefully, especially in a domain where a bad design can lead to casualties. A well-intentioned propulsion subteam might look to the internet for motors that people have built, and come across the unfortunately common PVC pipe rocket motor, fueled by a combination of potassium nitrate and sugar, and with a nozzle made of compressed cat litter. (There are also a number of equally dangerous “hybrid motors” that are little more than glorified blowtorches, but I digress). Someone with propulsion expertise probably recognizes all of the things wrong with these motors. Most critically, overpressurized PVC tends to shatter rather than peel apart, and said PVC is invisible to X-rays. That means that an exploding PVC motor is likely to turn into pure shrapnel and embed itself in someone’s flesh, after which it basically turns invisible and becomes difficult to extract. Not a good way to go out.

Of course, this isn’t an excuse to propose a crude but functional system as the final system. But there are many times in which a solution, no matter how crude it is, is rendered effective just because it works. One of the reasons why a general systems approach is successful is because, as I said earlier, it doesn’t care what solutions actually are — just that they interface in the correct way with the rest of the system. In rocketry terms, this would be allowing for a given design to fly either with “crude” off-the-shelf avionics or a “better” custom avionics board with bells and whistles. This has two benefits: One, it allows for the designers and builders to gain practical experience with a “crude” but reliable solution; and two, it means that the overall system will function regardless of whether or not the “better” solution is finished in time for flight. You wouldn’t have to wait for the “better” solution to be finished to get research and development feedback for the whole system in a flight test. To put it in concise terms:

Problem-solving should strive to be evolutionary, not revolutionary. A system that works now is worth a better one in two weeks.

The eagle-eyed reader with some experience in project management might recognize that this also sounds an awful lot like Agile product development — the idea of making something now, learning from that process, and then applying that to a more advanced version of the solution, over and over again. And you’re absolutely correct about that — this is, in fact, a prime example of Agile development.

Designing a perfect solution on the first try is virtually impossible. By the same token, predicting the amount of time a novel task is going to take is also virtually impossible. The only way to resolve both of these issues is, in fact, to throw in the towel and accept that solving any problem is going to take effort and time, and we have absolutely no way of knowing either of those values if we lack direct comparison.

This flies in the face of intuition. Many people respond to a complex project by breaking it down into discrete, tightly constrained parts and assigning timetables down to the week or hour, which inevitably causes frustration as reality sets in and dates start slipping to the right. In the worst-case scenario, time flies by and everything gets thrown together at the eleventh hour in a crunch. This happened to my team several times, and eventually I got so sick of these crunches that I started making up excuses as to why I couldn’t participate in a build. (Remember, in every job, you start out with four living grandparents).

You might be wondering how anything gets done, then, if there’s no real way to schedule things efficiently. One solution to the problem is, as I just mentioned, Agile development: Each subteam operates on a two-week prototyping cycle in which they seek to evolve their design some step ahead, and then send the subteam lead to a biweekly general meeting where they present their progress, give feedback on the general design of the whole rocket if needed, and highlight their plans for the next prototyping cycle. Emphasis is put on making things now rather than worrying about making a perfect system in the time that’s available.

There are many reasons why this works, but chief among them is the fact that Agile-type development guarantees steady progress while giving subteams a heavy investment in the overall outcome of their subsystem. The latter point, as it turns out, is extremely important for a volunteer organization.

I mentioned earlier in this essay that most subteams at the rocketry team were rather lost unless they had almost total control over whatever system they were working on. The two most successful subteams were propulsion and telemetry/avionics. Both had a relatively high membership count, and consistently produced excellent products. The leads associated with both of them were exceptionally bright people, but I knew there had to be another factor as to their success — and it turned out to be that high degree of autonomy.

Both subteams were unwittingly practicing a form of Agile development, steadily working away at their problem until a solution was found, and not really caring much about the overall qualities of the space shot vehicle. They had their well-defined interfaces (as both were relatively simple — a motor tube and the top of one of the body tubes), and didn’t have to align with scheduled launch dates, as they were able to substitute their work with more “crude” solutions. The avionics team fell back on off-the-shelf electronics, and the propulsion team fell back on off-the-shelf motors. This enabled them to pursue their own development without the normal pressure of delivering a product. And, because they were making steady progress with their subsystems, they were able to maintain a steady subteam full of people who were devoted to solving the problems at hand. Non-subteam-leads could tangibly interact with solutions and feel that they had an actual effect on the project as a whole. Ownership of something is a huge factor in motivation, and motivation is how the boring stuff gets done. None of the other subteams really accomplished that.

Something else that is intuitive, but wrong, is the idea that the only way to mature hardware to a point where it’s space shot-capable is through a campaign of progressively larger rockets. Unless you work near an extremely high-altitude flight field like Black Rock or Spaceport America, this is largely impractical. In one estimate for our space shot project, I found that transportation alone — just driving out to the launch field — would have cost $2,000! Multiply that across two or three test flights, and you’ll see where a development campaign starts to become costly. Rather, the majority of non-integrated hardware proofing can be accomplished with relatively low-altitude flights (L2-sized rockets) and good-quality bench tests. I specify “non-integrated” because integration always presents some kind of issue, no matter how good your interface definitions are. Sometimes a bunch of systems that work fine by themselves just won’t like being put together, and this is the job of the integration team to fix — but I digress.

One of the most critical engineering problems we were faced with under our space shot architecture (two stages, commercially available motors in each) was that of high-altitude ignition. I single this out as a problem because it is quite possibly the biggest challenge that rocketeers face. Very few people have done it successfully above around 50,000 feet, and it requires some very expensive chemicals to do right. Of course, getting to 50,000 feet is difficult in its own right — few people get rockets up to that altitude in the first place, let alone plan on firing one that high! Management thought that the best solution to this problem was to flight-test it on a flight up to 5,000 feet, and then another one up to 30,000 feet. I remember raising the following hypothetical during a meeting: “What happens if, when we go to test this at Black Rock in a flight up to 30k feet, the ignition system fails? Do we have to go all the way out there to fly it again, or do we just assume that we’ve fixed it for the space shot?” I never got a concise answer on that beyond a loose assurance that it wouldn’t happen.

This was a symptom of a larger issue with the development program: Every single test flight was intended to be “all-up.” That is, all of the subsystems expected to be present on the final space shot vehicle would be present on each test vehicle. This, of course, presents many practical issues aside from integration and the challenge of driving out to the high-altitude flying fields, chief among them being the fact that any subteam sliding to the right would, by nature, slide the test flight to the right even if the other subteams were on track. (This also invites the second-system effect: a subteam idled by the delay may be tempted to tweak their hardware just enough to make it “better,” winding up with something worse that causes delays of its own, and so on).

Most importantly, lessons learned from one test vehicle did not directly translate into the other two, or the space shot vehicle itself. This meant that each vehicle was fundamentally unique, which rendered the idea of generating applicable flight data moot. For example, you can build a 30,000-foot rocket out of fiberglass and “conventional” materials. Any higher and faster and you have to switch to heavy-duty carbon fiber and phenolic resins for heat protection. The actual technological gulf between 5,000 feet and 100,000 feet is dramatic, even though it doesn’t seem that way on the surface.

A successful development program might have taken a page from the history books. The Saturn I, the immediate predecessor to the Moon-bound Saturn V, was the largest rocket ever flown at its debut in 1961. But it didn’t achieve full operational capability until several launches later. What gives? Well, Wernher von Braun’s governing philosophy (as the leading authority in rocketry in the United States at the time) was to conduct flight tests in metaphorical and literal stages. The first four Saturn Is to fly carried inert, water-ballasted dummy upper stages with the same overall dimensions as the planned second stage. Only on the fifth flight were both stages fully operational.

By doing a series of flight tests like this, a tremendous amount of confidence was developed in the final system, as every part tested would operate in an actual flight environment. Insight gleaned from these flights was directly applicable to an operational system. Being able to generate flight data with or without an upper stage is a true testament to general systems thinking — it didn’t matter that a fully operational flight stage wasn’t present; the rocket treated it as if it was, because the dummy interfaced in the same way an operational stage would have.

In this vein: Since you should already have a general idea of what your final product is going to look like, test flights can be reduced down to what I termed “aerodynamic test vehicles” — rockets that share the same outer mold line as the final product, and mimic the flight masses, but don’t actually reach altitudes much above 5,000 feet or so. Understandably, this means that in a two-stage vehicle, the upper stage will have to be replaced with an inert motor or ballasted accordingly. Even then, an aerodynamic test vehicle is a much closer parallel to the final product than fully operational flight test vehicles.

Something else worth keeping in mind is the idea that every subsystem, while inherently a black box, is mathematically a variable. The idea behind isolating and bench testing subsystems is to constrain possible failure modes. While you won’t catch all of them — again, that’s nigh impossible, since a bench test is itself only a simulation of reality — you can at least limit the possible explanations for things when they do, inevitably, go wrong.

We dealt with an example of this in our first two-stage launch. It was a really beautiful rocket — nearly 11 feet long, 4” in diameter. A genuinely impressive sight at the flying field. The launch button was pushed, and it roared skyward. Then the second stage failed to fire, and it split into two stages as it passed apogee. Only one set of parachutes successfully deployed; the first stage rocketed into the ground and buried itself some two feet into the dirt.

While we got some data back from this flight, as the second stage was completely untouched (and the motor un-fired), the avionics bay in the first stage was totally destroyed. I don’t think I had ever seen a circuit board flat-out pulverized into dust before. Of course, this was the board that we programmed to fire the second stage motor, ignition being provided by about two and a half feet of igniter wire terminating in an e-match mounted in the second stage motor. This system had never been tested on the ground.

The “official” probable cause given was that, because we had displayed this same rocket at a career fair and had a few incidents where the first-stage AV bay was dropped, the barometric pressure sensors were busted and never detected altitude correctly — which is why the ignition signal was never given and why the parachutes failed to deploy after separation.

However, since the system had never been tested on the ground — we only assumed that it would work because it worked for the ejection charges — I thought of a second possible cause: The ignition circuit was so long and thin (i.e., high resistance) that it killed the batteries when it was triggered, leaving the AV board powerless after separation and thus unable to trigger the parachutes.

Both possibilities would have produced the same result (no ignition, no parachute deployment), and we had no way of telling the two apart because we lacked the critical bench test of the ignition circuit that would have allowed us to write it off as a probable cause — or at least given it less weight in the accident investigation. Or, alternatively, a bench test would have highlighted a failure mode and forced us into coming up with a more robust ignition circuit. The fact that the broken pressure sensor was accepted as an official cause came down to politics, though the ignition system was quietly re-designed to ensure a board brownout wouldn’t occur.
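To illustrate the second hypothesis, here is a quick back-of-the-envelope sketch in Python — every number below is made up for illustration and none comes from the actual hardware. A long, thin firing circuit adds series resistance, and the current pulse drawn through a small battery’s internal resistance can sag the board’s supply below its brownout threshold:

```python
# All values hypothetical: a small 9V battery firing an e-match through
# a long run of thin igniter wire.
battery_v = 9.0            # open-circuit battery voltage
battery_internal_r = 2.0   # ohms; small 9V cells sag badly under load
wire_r = 1.5               # ohms; round trip through the long igniter wire
ematch_r = 1.0             # ohms; typical e-match bridgewire resistance

# Ohm's law over the whole loop, then the voltage left at the AV board.
current = battery_v / (battery_internal_r + wire_r + ematch_r)
board_v = battery_v - current * battery_internal_r

print(f"firing current: {current:.2f} A")              # 2.00 A
print(f"board voltage during pulse: {board_v:.2f} V")  # 5.00 V
# If the altimeter browns out anywhere above 5 V, this pulse resets it:
# no ignition and no deployment, same outcome as the broken-sensor theory.
```

This is exactly the kind of number a ten-minute bench test would have produced before the flight, which is the point of the story.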

The fact is that, had we conducted an adequate testing program prior to flight, we likely wouldn’t have evaporated almost $2,000 worth of materials. A simple hardware checkout before flight would have caught issues with the altitude sensor, but that’s something that only became obvious in hindsight. Which brings me to my third point:

Sometimes, the most expensive system you make is one that you intended to be cheap.

“Expense” and “cheap” don’t always refer to money, of course — just the same, they can refer to time and effort. A move that was intended to save time, such as not conducting bench tests of the ignition system before flight, wound up being very expensive in the long run.

This is the balancing act that you have to strike from the design phase all the way through preparing for the flight. There are many occasions when throwing extra development effort at some aspect of the system presents a net benefit by simplifying the system as a whole. For example, imagine a two-stage rocket compared to a single-stage rocket. Each one has a critical path, where failure at any point along the path is a mission-ending failure. The two-stage rocket’s critical path might look something like this:

1. Ignite first stage motor.

2. Separate first and second stages.

3. Ignite second stage motor.

4. Deploy second stage recovery system.

(Obviously, we would intend to recover the first stage as well, but in terms of claiming the title of “reaching space,” the only altitude data that counts is that of the second stage). Thus, there are four nodes on the critical path. Now let’s look at the single-stage rocket’s critical path:

1. Ignite motor.

2. Deploy recovery system.

Suppose that each step beyond igniting the first motor has a 90% chance of success (first-motor ignition itself is extremely reliable and can be safely pinned at almost 100%). By this logic, the two-stage vehicle achieves a theoretical reliability of 73% while the single-stage vehicle gets to 90%. Purely by altering the number of nodes along the critical path, we managed to boost the reliability of the overall system by 17 percentage points, which is astonishing.
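The arithmetic here is just a product over the nodes of the critical path — every node must succeed, so the probabilities multiply. A minimal Python sketch:

```python
# Reliability of a serial critical path: every node must succeed, so the
# per-node success probabilities multiply together.
def path_reliability(node_probs):
    result = 1.0
    for p in node_probs:
        result *= p
    return result

# First-motor ignition pinned at ~100%; every later node at 90%.
two_stage = path_reliability([1.0, 0.9, 0.9, 0.9])  # separate, ignite, recover
single_stage = path_reliability([1.0, 0.9])         # just deploy recovery

print(f"two-stage:    {two_stage:.0%}")    # 73%
print(f"single-stage: {single_stage:.0%}") # 90%
```

The product also shows why removing even one node from the critical path pays off disproportionately as the path grows longer.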

Because of math that I won’t get into in this essay, most space shot vehicles coalesce around a two-stage vehicle because it requires no new motor development. By financial cost, it is the cheapest, but it invokes the need for a reliable high-altitude ignition system and thus a nonzero amount of research and development. It can be conquered, but costs time instead of money. This is not something that is immediately apparent at the beginning of the design phase.

On the other hand, getting to space with a single commercially available motor is rather difficult, and in the best of cases involves spending upward of $20,000 for just one flight. In-house motor development is certainly possible, but likewise costs time instead of money. USCRPL took the better part of 10 years to go from launching conventional rockets to a successful space shot, though in my opinion that timeline could be condensed provided most resources were thrown at the problem of motor development — something that requires a heavy dose of luck.

The final piece of advice that I can share with you is this:

No amount of documentation is too exhaustive. It will help you if you screw up, and help teach the next generation of team members.

Something that most newly minted collegiate team leads don’t think about is the fact that they have to, inevitably, graduate at some point, and every graduation year brings a loss of general knowledge. This is especially true as the plank owners of your rocket team go on to greener pastures.

As I said at the very beginning of this essay, there are two ways to learn something — through your mistakes, or through someone else’s mistakes. When you’re starting out with a clean slate, as many teams do, you’re going to make a lot of boneheaded mistakes and will develop a casual understanding of how not to do things. Case in point: “Don’t let the motors sit for too long after firing or else they become a giant pain to scrape all the char out of” or “Medium-density fiberboard does not make for good bulkheads or centering rings.” These aren’t written down anywhere (unless you searched really hard for them), and are only learned through screwing up.

Preserving this experience is ideally done as a steady stream of new team members come in and work on the project at hand. But things inevitably do get lost in translation, deviance in a system gets normalized, and the knowledge transfer isn’t as efficient as you’d hope. This is where documentation comes in: Write things down so future generations can appreciate why you made the decisions you made. Archiving designs can also serve as inspiration for future projects — we learn from history!

This has a more important effect in the immediate term, too, in the fact that documentation will help guide you through a failure. Being able to pinpoint a decision that was made months ago as the root cause of a failure is a very important thing. Careful documentation of our failed avionics bay, for example, would have revealed much sooner that it had been mishandled. The written word lasts longer than memory — recollection months or years later is fuzzy.

Before I close out this essay, I’d like to highlight a very important fact: even though I’ve described many different ways in which my rocket team failed, the people involved were universally very bright and talented. It was all, collectively, a learning process, and it was only in the last few months of my time there before graduating that I started coalescing a better image of how things should have been done. I was fully complicit in the decision-making that went on, and I am just as responsible for the bad choices as anyone else. If anything, this should serve as a caution to those trying to lead a group of very bright people — we are all equally capable of making the same mistakes no matter how smart we are. Even with our best intuition and judgment, screwups can happen. The most important thing you can do as a leader is split someone’s character from their actions. Good people can make counter-productive mistakes through no fault of their own, and that doesn’t transform them into bad people. That is an important lesson that leadership must internalize, but one beyond the scope of this essay.

Getting to space is an incredibly difficult challenge. You will inevitably go through a phase where you have to figure out how to run things, but my belief is that the advice laid out in this essay will put you in a significantly better position than starting from scratch, and is applicable to more things than just managing a rocket team.

Now… start planning.

Required Readings

The Mythical Man-Month: Essays on Software Engineering, by Fred Brooks.

While targeted at computer systems and software, this book is the one that served as the genesis for Agile development. A great discussion on how intuition can mislead us.

An Introduction to General Systems Thinking, by Gerald Weinberg.

One of the best books written about general systems thinking as a whole, this text serves as a very thorough introduction to how to think about complex systems. Very insightful.

Ignition! An Informal History of Rocket Propulsion, by John D. Clark.

Probably one of my favorite books ever written, Dr. John D. Clark’s oral history of the early days of rocket research is a romp through an often-neglected aspect of spaceflight history.

Stages to Saturn: A Technological History of the Apollo/Saturn Launch Vehicle, by Roger E. Bilstein.

An excellent technical history of the development program backing the audacious plan of landing a man on the Moon by the end of the 1960s. A must-read for any project manager.

Rogers Commission Report Appendix F — Personal observations on the reliability of the Shuttle, by Richard Feynman.

Insightful commentary on the root causes behind the Challenger disaster of January 1986. Describes how perception of a system influences engineering decisions, and how management and engineers must always be on the same page.


Matthew JL

Geologist, part-time rocketeer, recovering space fanatic.