How to start a large code project

You’ve learned a programming language, and you’ve tackled exercises up to several hundred or a couple thousand lines of code. Now you have an idea for a project, but how do you get started writing what may take many thousands of lines? What if your idea is vague, both in what exactly the end result will look like and in how it will be implemented?

It’s easy to feel paralyzed in this situation, to feel like any attempt at getting started will crash and burn before the finish line. What if your code structure (or lack thereof) leads to a tangled mess? What if problems arise that you simply lack the knowledge or skill to solve?

To overcome paralysis, follow this 3-step process:

1. Take inventory

What language(s) will you use? What libraries or frameworks? What platform APIs? Does the project require a database? What database? Does it involve networking? Graphics? Media encoding/decoding? Etc.

Having answered these questions, ask yourself how familiar you are with each thing involved in the project, and then explore the unfamiliar stuff in isolation before starting the full project. For example, if your program requires displaying graphics in a window but you’ve never done such a thing before, get comfortable with the necessary APIs by doing small exercises: write a program that displays an empty window, then a program that draws a rectangle in the window, then other shapes, then images, then text, etc. If you skip such exercises, your project will be weighed down by big unknowns, and the mental overhead of your project will make it harder to learn these unfamiliar elements.
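The same kind of isolated exercise works for any item on your inventory, not just graphics. As a rough sketch, suppose the unfamiliar piece is a database and you pick SQLite through Python’s standard `sqlite3` module (both choices are assumptions made for illustration, as are the `scores` table and the player names). A throwaway script like this one has nothing to do with your real project; it only exists to make the API feel familiar:

```python
import sqlite3


def high_scores_exercise():
    """Throwaway exercise: create a table, insert rows, read the top two back."""
    # An in-memory database disappears when the connection closes,
    # which is exactly what we want for a practice script.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("ann", 120), ("bob", 300), ("cy", 75)],  # made-up sample data
    )
    rows = conn.execute(
        "SELECT player, points FROM scores ORDER BY points DESC LIMIT 2"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(high_scores_exercise())
```

Once a handful of scripts like this feel boring, that piece is no longer a big unknown.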

While it’s good to tackle new challenges, you don’t want too many new challenges at once. In general, don’t start a big new project if you are unfamiliar with more than 10–20% of the major pieces involved. For big, complex topics, like, say, graphics programming with OpenGL, practice with smaller projects before tackling your more ambitious ideas.

2. Sketch your data

The great thing about data is that, by itself, it’s simply inert. Unlike with code, you don’t have to think about logic or time. This is why you should work out a first draft of the data required for your project before writing any code. For example, in a Mario game, you’ll need a data type representing Mario’s coordinates in the world, his number of coins, his number of remaining lives, his collision boundary, his current momentum along the x and y axes, etc. We don’t yet concern ourselves with how the Mario type will be used by code, just that the type can represent all possible Mario states.
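A first draft of that Mario type might look something like the sketch below. The field names, the helper types `Vec2` and `Rect`, and the default values (three lives, a 16×16 collision box) are all invented for illustration; the point is only that the type holds data and contains no game logic.

```python
from dataclasses import dataclass, field


@dataclass
class Vec2:
    """A pair of x/y values (hypothetical helper type)."""
    x: float = 0.0
    y: float = 0.0


@dataclass
class Rect:
    """Width and height of a collision boundary (hypothetical helper type)."""
    width: float
    height: float


@dataclass
class Mario:
    """Everything needed to describe Mario at one instant. No behavior yet."""
    position: Vec2 = field(default_factory=Vec2)   # coordinates in the world
    momentum: Vec2 = field(default_factory=Vec2)   # current momentum along x and y
    hitbox: Rect = field(default_factory=lambda: Rect(16.0, 16.0))  # assumed size
    coins: int = 0
    lives: int = 3                                 # assumed starting value


if __name__ == "__main__":
    print(Mario())
```

Expect this draft to be wrong in places. It is cheap to change, because no code depends on it yet.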

During development, your data will certainly require changes and additions, but with a basic data design in place, you can then work out what data transformations will be required, and this gives you a starting point for writing the code.*
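To make “data transformation” concrete, here is one possible sketch in which each transformation is a plain function that takes a Mario value and returns a new one. The trimmed-down `Mario` type, the `GRAVITY` constant, and the function names are assumptions made for this example, and `vx`/`vy` stand in for the momentum mentioned earlier.

```python
from dataclasses import dataclass, replace

GRAVITY = -10.0  # assumed constant, in made-up units per second squared


@dataclass(frozen=True)
class Mario:
    """A deliberately tiny draft of the Mario data (illustrative fields only)."""
    x: float
    y: float
    vx: float
    vy: float
    coins: int = 0


def apply_gravity(m: Mario, dt: float) -> Mario:
    """Return a copy of m whose vertical speed has been changed by gravity."""
    return replace(m, vy=m.vy + GRAVITY * dt)


def move(m: Mario, dt: float) -> Mario:
    """Return a copy of m advanced along its current speed for dt seconds."""
    return replace(m, x=m.x + m.vx * dt, y=m.y + m.vy * dt)


def collect_coin(m: Mario) -> Mario:
    """Return a copy of m with one more coin."""
    return replace(m, coins=m.coins + 1)


if __name__ == "__main__":
    start = Mario(x=0.0, y=0.0, vx=2.0, vy=0.0)
    print(collect_coin(move(apply_gravity(start, 0.5), 0.5)))
```

Listing the transformations you need (fall, move, collect, lose a life, and so on) gives you a to-do list of functions to write, each with a known input and output.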

As much as possible, I want to think about programming in terms of data and data transformation. That’s my comfort zone. Everything else is scary and weird.

3. Solve the wrong problems

The advice always given to new programmers is to solve large problems by breaking them down into small problems. This is certainly good advice, but it leaves out an important caveat: breaking a problem up often requires distortion; the simple pieces don’t necessarily fit perfectly back together, and even when they do, they don’t necessarily add up to solving the exact same original problem.

This is OK! It’s a natural part of the process to simplify your goals and then incrementally add the complications back in. Instead of accounting for all possible cases from the beginning, start off by accounting for the simpler, more common cases, and work in the edge cases one by one. In other words, solve the wrong problems incrementally until you solve the right problems.

What makes writing code hard is that we can’t figure out all of the details up front. (If we could, writing code would be as easy as just sitting down to type.) The only way around this is to ignore known problems: allow yourself to disregard whole features and concerns so that you can get code on the screen. Once you have code solving a distorted, simplified version of your problem, it’s generally much easier to find a solution for your real, complete problem.
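As a small illustration of solving the wrong problem first, consider Mario landing on things. The sketch below is hypothetical throughout (the function names, the tuple layout for platforms, and the flat floor at height 0 are all assumptions). The first version pretends the whole world is one flat floor; the second adds back a single complication, platforms, while keeping the floor case intact.

```python
def land_v1(y, vy, dt):
    """Wrong problem: the world is a single flat floor at height 0."""
    new_y = y + vy * dt
    if new_y <= 0.0:
        return 0.0, 0.0          # landed: snap to the floor and stop falling
    return new_y, vy


def land_v2(x, y, vy, dt, platforms):
    """Closer to the right problem: also land on platforms.

    platforms is a list of (left, right, top) tuples (an assumed layout).
    Walls, ceilings, and moving platforms are still ignored on purpose.
    """
    new_y = y + vy * dt
    if vy <= 0.0:
        # Platforms under Mario whose top lies between the old and new height.
        crossed = [
            top for (left, right, top) in platforms
            if left <= x <= right and new_y <= top <= y
        ]
        if crossed:
            return float(max(crossed)), 0.0   # land on the highest one crossed
    if new_y <= 0.0:
        return 0.0, 0.0
    return new_y, vy


if __name__ == "__main__":
    print(land_v1(1.0, -4.0, 0.5))
    print(land_v2(2.0, 5.0, -4.0, 0.5, [(0.0, 4.0, 4.0)]))
```

The first function was never the real goal, but having it on the screen makes the second one much easier to write and to check.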

Writing long-form code, in a sense, is like writing long-form prose: incremental, out of order, and requiring lots of revising of what’s already been written.


*This is a problem with Object-Oriented Programming: it’s often purported to be a data-first way of programming, but when breaking your problem into classes, you must consider the ‘responsibilities’ of each class, i.e. what the class does. So in fact, OOP doesn’t really let you design data by itself: it jumbles code design and data design together.
