What Can a Software Engineer Learn From a Rocket Engineer? Quite A Lot Actually.
Imagine you’re building thousands of rocket engines and spending billions of dollars on a quest to make humanity a multi-planetary species. If your manufacturing process isn’t as efficient as possible, you’ll run out of money before you get there. You probably need to optimise this process like crazy, right? So why don’t we see if we can optimise our software in the same way…
(Please ignore the fact that nobody needs to manufacture software; that’s why it’s called software, not hardware. Just bear with me.)
In a recent interview, Elon Musk talked about how manufacturing was underrated and was the most difficult problem SpaceX had to solve.
“10,000% more work goes into the production system than the thing itself.”
“The amount of effort that goes into the design rounds down to zero relative to the amount of effort that goes into the manufacturing system.”
He went on to describe five steps for optimising engineering for manufacturing, which emerged from the many successes and mistakes of past rocket programs and Tesla’s car production:
1. Make the requirements less dumb
Your requirements are definitely dumb, no matter who gave them to you. So challenge every requirement; doing so opens the door to out-of-the-box thinking (see Cybertruck).
“Everyone’s wrong. No matter who you are, everyone is wrong some of the time.”
“All designs are wrong, it’s just a matter of how wrong.”
He also states that specific people, not departments, should own requirements, so that they can be questioned much more quickly.
2. Try very hard to delete the part or process
You can’t afford to add a part ‘just in case’. Musk encourages deleting as many parts as possible, even if you end up adding some back later.
“If you’re not occasionally adding things back in, you’re not deleting enough.”
3. Simplify and optimise the design
This step comes third, not first, for good reason. The critical point of the whole process is not to simplify or optimise anything until you’re absolutely sure it needs to exist; otherwise you’re just wasting your time.
“The most common error of a smart engineer is to optimize something that should not exist.”
Elon stressed the importance of an engineer understanding the whole system to avoid optimising the wrong thing. He revealed that SpaceX engineers had put enormous effort into reducing the weight of engine components, but not the weight of unused fuel, even though a kilogram of unused fuel weighs the rocket down just as much as a kilogram of engine.
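To see why the two are equivalent, here is a minimal Python sketch using the Tsiolkovsky rocket equation, assuming a single stage. All masses and the specific impulse are made-up illustrative values, not SpaceX figures. Because unused (residual) propellant is never burned, it is carried all the way to burnout exactly like engine mass, so shaving a tonne off either one yields the same extra delta-v.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def delta_v(isp_s: float, initial_mass_kg: float, final_mass_kg: float) -> float:
    """Tsiolkovsky rocket equation: delta-v = Isp * g0 * ln(m0 / mf)."""
    return isp_s * G0 * math.log(initial_mass_kg / final_mass_kg)


def stage_delta_v(engine_kg, structure_kg, payload_kg,
                  usable_prop_kg, residual_prop_kg, isp_s=350.0):
    # Residual propellant stays on board until burnout, just like the engines.
    burnout_kg = engine_kg + structure_kg + payload_kg + residual_prop_kg
    liftoff_kg = burnout_kg + usable_prop_kg
    return delta_v(isp_s, liftoff_kg, burnout_kg)


# Hypothetical numbers, chosen only for illustration.
baseline = dict(engine_kg=20_000, structure_kg=80_000, payload_kg=100_000,
                usable_prop_kg=1_000_000, residual_prop_kg=15_000)

lighter_engines = {**baseline, "engine_kg": baseline["engine_kg"] - 1_000}
less_residual = {**baseline, "residual_prop_kg": baseline["residual_prop_kg"] - 1_000}

gain_engines = stage_delta_v(**lighter_engines) - stage_delta_v(**baseline)
gain_residual = stage_delta_v(**less_residual) - stage_delta_v(**baseline)

print(f"Extra delta-v from 1 t lighter engines:        {gain_engines:.2f} m/s")
print(f"Extra delta-v from 1 t less unused propellant: {gain_residual:.2f} m/s")
```

The two gains come out identical, because the equation only sees total mass at liftoff and burnout; it has no idea which kilograms are “engine” and which are “leftover fuel”. That is the whole-system view the engineers were missing.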
4. Accelerate cycle time
Here’s where you start to find issues, and have a chance to fix them before it’s too late. It’s like learning to run before you run a marathon.
“You’re moving too slowly, go faster! But don’t go faster until you’ve worked on the other three things first.”
5. Automate
Why is this separate from the fourth step? Because automation should wait until any and all issues have been found, before you remove humans from the process.
It would be hugely costly to try to automate something that was overly complicated or wasn’t needed.
This is also the point where you should aim to remove in-process testing for parts that can be produced reliably. Musk states that a tester can become a choke point, and will always produce some false negatives.
“If things are getting to end-of-line testing, and are passing, then you don’t need to do in-process testing.”
We can summarise this process as the following:
Design → Question → Delete → Simplify/Optimise → Accelerate → Automate
How does this relate to software?
So let’s say that instead of building thousands of rockets, you’re developing the next big app/website/API. It’s a huge task. If you do everything right, you’ll be bought by Google within 5 years; otherwise you’ll take 15 years, never become feature-complete, spend all your money, and get overtaken by somebody else. How does this situation compare to SpaceX’s?
In terms of processes and ways of thinking, there are a lot of parallels: it’s easy to get carried away solving the engineering problems and ignore the bigger picture. Engineers tend to want to add loads of functionality that isn’t essential. They like to optimise for speed and efficiency without being required to. They stick to a set of best practices and don’t usually question them.
Manufacturing ≈ Going Live
In the design stage your code only needs to work once. When you go live you’re running the same code billions of times, on millions of different computers, for years or decades. A lot could go wrong: will you need constant patches, or tiny changes for every new use case? Will you need to spend forever teaching people how to use it? Will you be able to keep up with demand and scale? Only by solving these issues can you hope to actually achieve anything with your software.
As with rockets, this is an underrated part of programming. An inexperienced software developer might assume their cobbled-together script can be used in production with no changes. This is like building a rocket engine in your garage from spare parts and deciding you’re going to build 100,000 more in the exact same way.
Part = Script, Service, or Program
Cutting out an entire service from your app (e.g. deleting an email server because you can just use an API) is undeniably a great way to reduce complexity and go live faster.
Cost = Cost?
For software, material cost is much lower, personnel cost is much higher, and there are additional computing costs. People often worry about their compute/network/storage bill, but the salaries of the people fixing bugs all day long are likely to be larger.
There’s also the opportunity cost of complexity. If a low-priority feature takes a day in total to develop, deploy, debug, document, etc. over the course of a month, that’s a day you could have spent starting a killer feature, presenting your product at a conference, or having a much needed office party!
Weight = Complexity
For rockets, this isn’t just the weight of a part; it’s also how much thrust you get for that weight. Adding complexity to a piece of software means more complex documentation, technical debt, bugs, deployment, etc., all of which increase costs but, more importantly, slow you down and kill your momentum.
Reducing complexity allows you to go further with the same resources.
Requirement = Requirement
Requirements often don’t get questioned, and they rarely have an individual owner. It’s far too easy to make assumptions about how software will be used and not seek out the opinion of an actual potential customer.
Feature = Feature
We’re all familiar with dubious feature requests. The Agile process helps to prevent low priority features from being worked on but doesn’t stop us from adding unnecessary complexity.
In-process testing = Unit tests
This one has really made me think. Elon’s right: we don’t need most of these unit tests. We’ve got hundreds that pass every time, and they sometimes produce false negatives that take time to investigate. We could probably replace 90% of them with a single integration test.
(I won’t suggest getting rid of them though, just running them less often)
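As a rough illustration of what “one integration test instead of many unit tests” can look like, here is a hypothetical three-step order pipeline in Python. The functions `parse_order`, `apply_discount` and `format_receipt`, and the 10% bulk discount, are invented for this sketch. A single end-to-end test drives all three steps, covering behaviour that would otherwise need separate unit tests for each function.

```python
# Hypothetical pipeline, invented purely for illustration.
from decimal import Decimal


def parse_order(raw: str) -> dict:
    """Parse 'sku,quantity,unit_price' into a dict."""
    sku, qty, price = raw.split(",")
    return {"sku": sku.strip(), "qty": int(qty.strip()), "unit_price": Decimal(price.strip())}


def apply_discount(order: dict, threshold: int = 10, rate: Decimal = Decimal("0.10")) -> dict:
    """Apply a discount of `rate` to orders of at least `threshold` items."""
    total = order["qty"] * order["unit_price"]
    if order["qty"] >= threshold:
        total -= total * rate
    return {**order, "total": total.quantize(Decimal("0.01"))}


def format_receipt(order: dict) -> str:
    return f"{order['sku']} x{order['qty']}: {order['total']}"


def test_order_pipeline_end_to_end():
    # One test exercises parsing, discounting and formatting together.
    assert format_receipt(apply_discount(parse_order("WIDGET, 12, 2.50"))) == "WIDGET x12: 27.00"
    assert format_receipt(apply_discount(parse_order("WIDGET, 3, 2.50"))) == "WIDGET x3: 7.50"
```

A test runner such as pytest will pick up `test_order_pipeline_end_to_end` automatically. The trade-off is diagnosis: when this test fails it tells you the pipeline is broken, not which step broke, which is one reason to keep the unit tests around and simply run them less often.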
Colony on Mars = __________
It helps to know what you’re working towards. A clear and ambitious goal has certainly helped keep SpaceX focused on what’s important. What’s the ultimate end goal of your project? What will make you say ‘Mission accomplished’?
Optimising the productionisation process is vital for any software company. It has a huge impact on delivery time, but it’s also the difference between being proud of your accomplishments and being embarrassed by them. We’ve all seen projects carried by one developer who knows how to fix the many recurring issues and has given up hope of ever working on something interesting again.
I guarantee no app ever went live without a single issue, so there’s plenty of room to improve.
What Should We Be Doing Differently?
With all the parallels described above, I think we can take some learnings from the five steps. Looking back at some of my own projects that were difficult to get live, I can now see that we skipped some steps or did them in the wrong order, just as Elon admitted Tesla had done in the past. Based on this, here’s what I think we should all be doing differently:
- Challenge all requirements, no matter how small. A common example would be “We’ve got streaming data, therefore we need a streaming data pipeline”, when in reality there’s no requirement for the data to be processed in real time. Micro-batch processing is much simpler and cheaper to build.
- Pause once you have a working concept to ask which processes aren’t needed, and just delete them. Don’t spend time on a feature nobody cares about just because you want everything ‘production ready’.
- Recognise the true cost of complexity. Any ‘part’ in your app that could be replaced with something much simpler should be deleted in favour of the simpler option.
- Don’t optimise for the wrong thing. Is speed your priority or is it lowering cost? Should you spend your time improving features or adding more? Find a real person in the business who can explain what the priority is and why.
- Don’t optimise too soon. Don’t invent a framework before trying something the simple way first. Don’t define a data model when the data isn’t finalised. Be flexible to change throughout.
- Accelerate your code before you automate. You’ll find which parts require further optimisation, and uncover problems that would be impossible to fix after going live.
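To make the streaming-versus-micro-batch example from the list above more concrete, here is a minimal Python sketch, assuming events arrive as an iterable of dicts and that processing them a few at a time is acceptable. `micro_batches` and `process_batch` are hypothetical names; a real system would more likely trigger each batch on a timer or a scheduled job rather than a fixed count.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(events)
    # Keep pulling fixed-size chunks until the source is exhausted.
    while batch := list(islice(it, batch_size)):
        yield batch


def process_batch(batch: List[dict]) -> int:
    """Hypothetical processing step: sum the 'value' field of each event."""
    return sum(event["value"] for event in batch)


# Simulated event source: seven events with values 0, 10, ..., 60.
events = ({"id": i, "value": i * 10} for i in range(7))
results = [process_batch(b) for b in micro_batches(events, batch_size=3)]
print(results)
```

There are no brokers, consumer groups or windowing semantics to manage here: just a loop over chunks. If real-time processing does turn out to be a genuine requirement later, that is the point to add it back.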
Most importantly, appreciate just how challenging it is to go from a great concept to a live app for a mass audience. Your approach to productionisation could determine your success or failure. Keep complexity low and momentum high, or you’ll never get to Mars.
The full text of the interview referenced can be found here: https://everydayastronaut.com/starbase-tour-and-interview-with-elon-musk/