Tips for newbie to read source code

Reading source code is always one big part for software engineers. Just like the writers to learn writing by reading a heck of classic books, like the painters to “read” a bunches of pictures, software engineers will learn a lot from other one’s work. Source code is like the art museum for artist. It’s the technique library for all they need to build and create. Although it’s easy to know the importance of reading source code, it’s really hard at beginning to get used to that kind of technique reading. At least I did pay lots of pain on the initial stage.

The direct questions that a newbie may ask when they meet such giant work are:

  • There’s too much source code, where should I start?
  • As there’s millions of lines of code, how is it possible to go through every part of them?
  • After read a groups of small modules, I just still don’t know why it should exist? Why are they interesting?
  • After a period of time, I still can’t see what the value is from reading source code.

For any person, when they first meet a task with so large number of materials, they’d get dizzy. It’s not only happening in reading source code, it may also happen in any other fields like reading books, writing books, building a website, building a bridge across a river, etc. It’s the typical character of large project.

For any large project, the first headache is solving the problem of large amount. Luckily, in the history of human beings, there’s one field to deal with this kind of problem named engineering. The kernel thinking of engineering is how to manage the complexity. And one of the powerful weapons is decomposing your task. Deal with large problem is tremendous horrible. You need to divide it into small manageable parts to make things easier and system. At least for the concern of psychology, just name clearly the detailed steps you need to overcome to get things done can make yourself feel better. So does read source code. You need to decompose the source code you’re trying to research.

But how to divide it? What rules should you obey to make the divided parts more efficient for you? That’s the second aspect of our discussion, where you should start.

In my opinion, the key point of knowing how to start or decompose your task is trying to understand your project, and get the project’s domain knowledge as much as possible. At the initial stage, you should plan to understand the project that you’re reading clearly. You need to know what the purpose of this project is, what kind of specific problem it dedicates to solve, and what performance standard you can use to judge the excellence of this project. Then, you need to know this project’s domain knowledge, what the problem this domain cares, what kind of thinking way this domain constructs, and how this domain runs.

Why do we need to know these? Because once you’re in the details sea of source code, you may totally lose yourself without any direction. You may not feel the cherish value, high quality of the code snippet you’re reading unless you fully understand what the original problem is, how hard it is, and how valuable if someone can solve this problem. Once you know the meaning of your reading code, the mission of your reading commitment, you’d be motivated hugely with great passion. Your whole reading journey will become funny.

The domain knowledge is not only important for code reading, it also matters about your career. If you don’t know the domain you’re working on, if you don’t accept the value or mission of your task, it’s really hard to make your career successful. You may get confused about the key point of this field. You don’t know how to improve this field in practical way. Your efforts paying may push your career on reverse direction, and get the disdain from community. You’d feel really hard to catch up the trend, and do lots of invalid work due to the lack of domain knowledge. It’s really frustration. So, no matter if you’re reading source code or choose your career, pick up the field, community that you do really trust and believe.

From this viewpoint, the project that you may probably mess up is the one you’re unfamiliar with and without any interest. Doing domain research is not an easy work, especially when you’re not familiar with the domain of programming itself. It’s really tough to do two complex unknown tasks at the same time. So my suggestion is reading the project source code that solves problem in your daily life, or problem you’re really caring and very interested in.

After talking about those motivation stuff, we need to handle the technique parts. As a newbie, it may also be hard to catch the value even after a long period of reading source code. For this part, it often relates the bad way to deal with your reading.

The first issue you need to handle is you must know how to distinguish what the difference is between good code and bad code. The standard determines good or bad is based on how easy you can modify or do something for your code. As you getting more practice, you’d accumulate more experience about failures and frustration. Then, it’s much easier for you to understand convenience. In one period, I’d just get frustration that why the programming is so hard and boring to deal with lots of details. I’m not the machine, how I can handle so huge amount of special cases, and keep adding something from scratch. I got the real problem from my practice, but I did propose a wrong direction question. The frustration I met is not due to the complexity of task. It’s due to the way to handle this task. And that introduce the concept of your code design. Good code is always designed well to make your management on complex task easy. Control the complexity is always the meat of engineering!

According to this perspective, you can know how you read the source code. When you start your reading, the first step is of course figuring out what it executes. But that’s only the prelude. The more important focus you should put on is the design of your reading snippet. What is it constituted? How those part can make the problem easy to be solved? Why this line should exist? Is it due to the concern of macro scope or micro?

Another technique issue you need to address is how low level you should understand about your code. I think you need to dig into your code as low as the machine level, or assembly level. The process of reading source code is very much like reading mathematical proof. The detail always matter. You need to decompose the complex solutions into the atoms. For math, the atom is definition or axiom; for computer science, the atom is the machine level language. Computer is nothing but movement made by machine. It’s just a lot of physical operations, no magic. But if your understanding of any programming operations can’t be connected to machine movement, you’d always be stumble by some black box or black magic. Image the programming’s operation as magic is the sign of your poor understanding, which means you can handle the computer as you want. Probably, you’re handled by computer, or fooled by your own poor understanding. Once you make black box for yourself, you’d never feel you’re qualified on this work.

Reading is always the superficial behavior of getting benefits, but that’s not. The real core part is the thinking that introduced by your reading. So, play more on your practice journey with reading, and think harder.