Work effectively on a large codebase

Zhaojun Zhang
Aug 2, 2017

As a codebase grows, it gets harder to understand everything in it. I used to spend a tremendous amount of time reading open source projects and proprietary codebases. Oh sorry, I don’t mean “spent”, I mean “wasted”. So the point of this doc is to share some of my thoughts on how to explore a codebase.

Curiosity killed the cat. I am not saying that you shouldn’t be curious about a codebase. On the contrary, you definitely should. But I want to warn you that too much curiosity can easily destroy your productivity.

  1. Realize that you can only understand a small fraction of the codebase. If a codebase contains 1 million lines of code, it will take you 100 days to read everything, assuming that you can read 10 thousand lines of code per day (btw, this is a lot of code). It was a game changer for me to realize that I simply can’t understand everything. Lots of junior engineers are ambitious and want to read everything. The spirit is good; however, no, you simply can’t, period. It is important to prioritize which code you want to understand and which code to skip.
  2. Realize that it takes time (probably months) to understand even a small fraction of a codebase. If you simply read code that you rarely use, you are probably reading inefficiently. Patience is your best friend. If you want to explore something, try to find opportunities to work on it directly instead of hiding in a corner to read the code. Don’t be too greedy, and let your work guide you to the right path for understanding the codebase.
  3. Realize that code is not everything. Code is an imperfect product that leaves out a huge amount of information. People have been working for decades to improve it, and no matter how good the developers are, the truth is that people often write obscure code. The context (including motivation, implicit assumptions, design, even mistakes) is often not presented in the code directly, yet it is essential to demystifying the code. People often underestimate how much context they need before reading the code. If you are interested in solving puzzles, find a magazine and solve the sudoku puzzle in it. Don’t waste your time on the puzzles of reading code if you don’t have the right context to understand it.
  4. Have a clear goal when you read code, and stop reading once you reach your goal. Your goal shouldn’t be “I want to understand everything”. Your goal could be “I want to sharpen my skills by reading this piece of code” or “I don’t know how this particular feature is implemented, and I want to know whether the implementation is useful for my project”. You can explore code without a clear goal, but you shouldn’t spend unlimited time reading code without one.
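
The back-of-the-envelope estimate in point 1 can be checked with a few lines of Python. This is only a rough sketch: the function name `days_to_read` is made up for illustration, and it assumes a steady reading pace with no days off, which is optimistic.

```python
def days_to_read(total_lines: int, lines_per_day: int) -> int:
    """Whole days needed to read total_lines at a steady lines_per_day pace."""
    if lines_per_day <= 0:
        raise ValueError("lines_per_day must be positive")
    # Integer ceiling division: a partially filled last day still counts as a day.
    return (total_lines + lines_per_day - 1) // lines_per_day


# Figures from point 1: 1 million lines at 10 thousand lines per day.
print(days_to_read(1_000_000, 10_000))  # 100

# Hypothetical slower pace (not from the article): 2 thousand lines per day.
print(days_to_read(1_000_000, 2_000))  # 500
```

Even at the generous pace from point 1, that is more than three months of doing nothing but reading, which is why prioritizing matters.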

The points above helped me a lot in reshaping how I read code. My old habit of reading code was really bad: I was the type of person who would try to understand everything, and I always felt uncomfortable using code that I hadn’t read yet. I spent more time reading code than working on my tasks. My output was unsatisfactory, and I blamed myself for not knowing enough of the code. I was totally wrong! When I started to think about how many lines of code have a direct impact on my work, I realized that the value of reading lots of code was not as significant to me as I had originally expected. Being comfortable working with APIs without worrying about their underlying implementations dramatically improved my productivity: it helps me focus on the things that I want to build and reduces the time I spend reading code that is irrelevant to my work (even though some of that code is used in my work).

I am not discouraging you from reading code. Rather, I want you to have the right expectations of how reading code will help you. Bad code-reading habits are extremely dangerous, but if you can cultivate a good set of habits, they will take you a really long way. So which habits are good, in my mind?

  1. Be friends with other engineers. Context matters, and context is often missing from the code. Sit down with them during lunch and ask them some insightful questions. Shut up if you realize that you are asking dumb questions (questions that you can easily answer yourself by reading the code and docs) and go back to reading the documentation and code. The context of the code is as important as the code itself, and it is often kept in engineers’ heads, not directly reflected in the code. Have a good relationship with other engineers and make sure that they are happy to answer your questions. On the other hand, you should also spend time answering questions. It is part of your work to explain what’s going on. Don’t be annoyed by questions, and be respectful to the people who ask them, because they are the people who make sure you are valuable to the company.
  2. Understand the basics and the architecture first. You probably need to understand how container technology works before you start reading the Docker implementation. You probably need to understand how service-oriented architecture works before you start reading any concrete service implementation. If you can’t understand the code, stop and ask yourself whether you are missing some basic information. If you feel like banging your head against the wall when you are reading some code, you probably don’t have the right information yet to really understand it.
  3. Develop a taste for good code and bad code. Don’t blindly treat the code in a codebase as the gold standard. Expose yourself to high quality code and avoid reading bad code. Any codebase may contain a lot of bad code that you shouldn’t follow. For me, good code has proper names, is implemented in a straightforward way, and is fairly easy to read. I really hate code that works but is very hard to read. Yes, it is “smart”, but it is also a puzzle.
  4. Don’t just read code, find opportunities to work on it. Ask the owner of the code whether there are some tasks you can pick up. It is a good forcing function for you to fully understand the code. If there is no such task, there is still plenty of stuff you can do. Does the code you read have enough test coverage? If not, add some tests. Would some documentation help other people read the code? If so, add some documentation. Have you found a better way to implement the code? If so, refactor it.
  5. Read online materials. Luckily, many people have posted their tips and tricks for reading code online. In case you haven’t read them yet, here is a list of pages that I have found useful.

Tips for reading code

What are good ways to rapidly become familiar with a large codebase?

A good understanding of a codebase, including all its quirks and pitfalls, will definitely help you advance your impact, skills and career in the long run: your code will be more consistent with the codebase, you will debug issues more quickly, your code will contain fewer bugs, you will find more opportunities to build impactful projects by taking advantage of in-house technologies, and so on. It just requires a little more time and a little more patience.

Satisfaction brought the dead cat back. May the source code always be with you.
