When first engaging with a project, spend an abnormal amount of time getting familiar with the data. I say abnormal because you'll often need to take your first estimate of how long this will take and multiply it by 3. The extra time up front will save you time in the long run.
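A first pass over a new dataset can be as simple as counting rows, missing values, and distinct values per column. The sketch below uses only the standard library; the `profile_rows` helper and the tiny inline CSV are hypothetical. With pandas you would more likely reach for `DataFrame.info()` and `DataFrame.describe()`.

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Summarize a list of dict records: row count, plus missing and distinct values per column."""
    columns = rows[0].keys() if rows else []
    summary = {"n_rows": len(rows), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        # Treat empty strings as missing, since csv.DictReader yields "" for blank cells.
        missing = sum(1 for v in values if v == "")
        counts = Counter(v for v in values if v != "")
        summary["columns"][col] = {
            "missing": missing,
            "distinct": len(counts),
            "most_common": counts.most_common(1),
        }
    return summary


# Hypothetical sample data with a few gaps in it.
raw = """id,city,temp
1,Boston,12.5
2,,14.0
3,Boston,
4,Denver,9.1
"""

rows = list(csv.DictReader(io.StringIO(raw)))
report = profile_rows(rows)
for col, stats in report["columns"].items():
    print(col, stats)
```

Even on four rows, this surfaces a blank city and a blank temperature, the kind of detail worth knowing before any modeling starts.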
The overall theme I took away from the 900+ pages of Code Complete is that quality software is produced through a rigorous design and development process. That rigor is often missing from data science, which tends towards convoluted code to get a solution once, rather than code that can be run millions of times without error. Many people who come into data science — myself included — lack the formal training in computer science and software engineering best practices. However, these programming practices are relatively simple to pick up and will pay off far down the road in terms of your ability to write production-level data science code.
…in the book that point out supposed performance enhancements that actually had the opposite effect! If you don't measure the effects of a change, you cannot know whether what you are doing is really worthwhile.
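One way to follow this advice in Python is to time both versions of a change with the standard library's `timeit` module rather than trusting intuition. The two functions below are hypothetical stand-ins for the "before" and "after" versions of some code. On CPython the "obvious" optimization is not guaranteed to win, which is exactly why it is worth measuring.

```python
import timeit


def build_with_concat(n):
    # Baseline: repeated string concatenation.
    s = ""
    for i in range(n):
        s += str(i)
    return s


def build_with_join(n):
    # Candidate "optimization": collect the pieces, then join once.
    return "".join(str(i) for i in range(n))


# Confirm the change preserves behavior before comparing speed.
assert build_with_concat(1_000) == build_with_join(1_000)

# Take the best of several repeats to reduce noise from other processes.
baseline = min(timeit.repeat(lambda: build_with_concat(1_000), number=200, repeat=5))
candidate = min(timeit.repeat(lambda: build_with_join(1_000), number=200, repeat=5))

print(f"baseline:  {baseline:.4f} s")
print(f"candidate: {candidate:.4f} s")
print("keep the change" if candidate < baseline else "revert the change")
```

The result depends on the interpreter version and machine, so the decision comes from the numbers rather than from an assumption about which version "should" be faster.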
…can help us optimize. For example, when I started tracking my time during my first few months on the job, I noticed I was spending more than 75% of my coding time writing and debugging tests. This was an unacceptably large share, so I spent time reading about how to write good unit tests, practiced writing them, and started thinking about the tests I would need before writing any code. As a result, I reduced the share of time spent on tests to less than 50% and was able to spend more time understanding the problem domain (another critical aspect of data science that is hard to teach).
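Once time is logged, turning it into a percentage like the one above is simple arithmetic. The sketch below assumes a hypothetical log of `(activity, minutes)` pairs with made-up numbers. Any time-tracking tool that can export similar records would work the same way.

```python
from collections import defaultdict

# Hypothetical log of (activity, minutes) entries from a week of coding.
time_log = [
    ("writing tests", 120),
    ("debugging tests", 90),
    ("feature code", 40),
    ("writing tests", 30),
    ("reading docs", 20),
]


def time_shares(log):
    """Return each activity's share of total logged minutes, as a percentage."""
    totals = defaultdict(int)
    for activity, minutes in log:
        totals[activity] += minutes
    grand_total = sum(totals.values())
    return {activity: 100 * minutes / grand_total for activity, minutes in totals.items()}


shares = time_shares(time_log)
test_share = shares.get("writing tests", 0) + shares.get("debugging tests", 0)
print(f"share of coding time on tests: {test_share:.0f}%")
```

Re-running the same calculation after changing how you work shows whether the change actually moved the number.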
…your code much simpler, thereby freeing your mental resources to concentrate on solving tough problems. When explaining technical concepts, the mark of a master is not complicated jargon but simple language that anyone can understand. Likewise, an experienced developer's code may perform a complex task, but it hides that complexity, allowing others to understand and build on it. It can be momentarily satisfying to write tricky code that only you understand, but eventually you'll realize that an effective programmer writes the simplest code. Reducing complexity increases code quality and limits the number of decisions you have to make, so you can focus on the difficult parts of a program.
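As a small illustration (the sensor-readings scenario and all function names are hypothetical), both functions below compute the same result, but the second hides its logic behind descriptive names so a reader can follow it without decoding a dense one-liner.

```python
# Tricky version: correct, but the reader has to decode it.
def f(d):
    return {k: sum(v) / len(v) for k, v in d.items() if v and all(x >= 0 for x in v)}


# Simpler version: same behavior, with the details hidden behind named helpers.
def is_valid_reading_list(readings):
    """A list of readings is usable if it is non-empty and has no negative values."""
    return len(readings) > 0 and all(r >= 0 for r in readings)


def mean(values):
    return sum(values) / len(values)


def average_valid_readings(readings_by_sensor):
    """Average each sensor's readings, skipping sensors with missing or negative data."""
    return {
        sensor: mean(readings)
        for sensor, readings in readings_by_sensor.items()
        if is_valid_reading_list(readings)
    }


data = {"a": [1.0, 2.0, 3.0], "b": [], "c": [4.0, -1.0]}
print(average_valid_readings(data))
```

The second version is longer, but each piece answers one question, and `average_valid_readings` can be used without reading its internals.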
Consistency is crucial for reducing code complexity. The argument for standards and conventions is that you don't have to make many small decisions about things only tangentially related to coding, such as formatting. Pick a standard and apply it across your entire project. Rather than deciding what capitalization to use for each variable name, apply the same rule to every variable in your project, and there is no decision to make. Which standard you choose often matters less than having one at all, so don't get too caught up arguing about whether to indent with 2 spaces or 4. Just pick one, set up your development environment to apply it automatically, and get to work.
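In practice you would hand this job to an existing formatter or linter (for example, black for formatting, or flake8 with a naming plugin) and run it on save or in a pre-commit hook. Purely to show the idea of a convention being checked by a tool rather than by you, here is a minimal hypothetical checker built on the standard library's `ast` module that flags function and variable names that are not snake_case. The regular expression is an assumed definition of "snake_case", not an official one.

```python
import ast
import re

# Assumed convention: optional leading underscores, then lowercase letters, digits, underscores.
SNAKE_CASE = re.compile(r"^_{0,2}[a-z][a-z0-9_]*$")


def non_snake_case_names(source):
    """Return names of functions and assigned variables that are not snake_case."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            name = node.name
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Only names being assigned to, not names being read.
            name = node.id
        else:
            continue
        if not SNAKE_CASE.match(name):
            offenders.append(name)
    return offenders


sample = """
def loadData(path):
    rowCount = 0
    max_value = 1
    return rowCount
"""

print(non_snake_case_names(sample))
```

Because the rule lives in a tool, nobody has to think about capitalization while writing code; the check simply runs.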