Here is another example of how important documentation is, or, in this case, of what happens when documentation and code become one.
I have used the R language to build stock portfolio models (e.g., given a large set of stocks or ETFs, select the subset with the best return relative to risk). I have also used R for a couple of machine learning projects.
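To make the selection idea concrete, here is a deliberately simplified Python sketch. It is not the model I used. It scores each asset by its mean return divided by the standard deviation of its returns (a Sharpe-style ratio that assumes a zero risk-free rate) and keeps the top `k`. The tickers, the return figures and the function names `return_to_risk` and `select_top` are hypothetical.

```python
from statistics import mean, stdev


def return_to_risk(returns):
    """Mean periodic return divided by its sample standard deviation.

    A Sharpe-style ratio, assuming a risk-free rate of zero.
    """
    return mean(returns) / stdev(returns)


def select_top(assets, k):
    """Return the k tickers with the highest return-to-risk ratio."""
    ranked = sorted(assets, key=lambda t: return_to_risk(assets[t]), reverse=True)
    return ranked[:k]


# Hypothetical monthly returns, for illustration only.
history = {
    "AAA": [0.02, 0.01, 0.03, 0.02],
    "BBB": [0.05, -0.04, 0.06, -0.03],
    "CCC": [0.01, 0.01, 0.02, 0.01],
}

print(select_top(history, 2))  # ['CCC', 'AAA']
```

Scoring each asset in isolation keeps the sketch short, but a real portfolio model would also account for the correlations between assets, since two individually attractive assets that move together add less diversification than their scores suggest.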
R is a language for mathematics and statistics, so it operates natively on vectors and matrices. This means that a single line of R code can do the same work as many lines of Java or C++ code. R also has a huge library of statistics and quantitative finance functions, which is the real reason to use R, since the language itself is not great.
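A standard textbook formula (not necessarily one from my models) shows what operating on vectors and matrices buys you. For a weight vector $w$, a vector of expected asset returns $\mu$ and a covariance matrix $\Sigma$ of asset returns, the portfolio's expected return and variance are:

```latex
\mu_p = w^\top \mu = \sum_{i=1}^{n} w_i \, \mu_i,
\qquad
\sigma_p^2 = w^\top \Sigma \, w = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i \, \Sigma_{ij} \, w_j
```

In R the variance is the single expression `t(w) %*% Sigma %*% w`, while a language without built-in matrix operations needs an explicit double loop to evaluate the right-hand sum.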
The very power of R also makes its code difficult to read and understand if you put it down for a week or two and then come back to it.
I had this problem with the machine learning models, since I was working on another project at the same time. When I returned to my R code after a couple of weeks, I’d have to spend hours figuring out what my own code did.
Then I started using Knitr. Knitr is an R package that allows you to mix LaTeX with R code (LaTeX can produce beautiful mathematical documents). Knitr allowed me to write about the ideas behind the R code and what it was doing, and to add a narrative explanation of the results (plots and tables). This was a huge help when it came to picking up the code again. A nice side effect was that I had a constantly evolving report that I could share with my colleagues.
In the case of the financial models, the end result was a paper that I published on SSRN. This paper was written with Knitr: the document is created by “compiling” it, which runs the embedded R code to produce the text and all of the tables and plots.
Python has something similar to Knitr that allows you to mix Python code and documentation. I plan to build my future portfolio models in Python, since R has some unpleasant problems with “dirty data” (data with missing values). Like R, Python has a very active quantitative finance community.