To My Alma Mater: Go Open Source
I am proud to report that, as of yesterday's news, UC San Diego is receiving a $75 million gift to invest in its data science program. Specifically, the gift will be used to build a new institute for data science, which is expected to touch a variety of disciplines. Since the institute is meant to be interdisciplinary, I hope the culture of data science encourages departments throughout the campus to go open source where possible.
Free is the Best Price
My time at UC San Diego is irreplaceable. The friendships I made have lasted, what I studied was interesting, and I felt like I truly flourished in college compared to high school. However, tuition, rent, books, and incidentals can add up.
Many of my economics and math classes required the use of proprietary software such as Stata and MATLAB. Although some employers use these tools, many also use open source software such as Python and R.
When a professor has their students use proprietary software, the students have a few options. They could buy the package, but that costs a lot of money. They could go to the library and use one of the campus computers, but that is not convenient for learning, and it does not let them keep using the software after they have left the university. They could download it illegally, but that is not optimal either.
Many professors were good about creating a learning experience at an optimal price. Instead of making students buy the latest textbook for their subject, some would compile a booklet of readings for a fraction of the cost or require only online readings.
This kind of behavior should be encouraged among professors, as should using open source software where possible. It helps students’ pocketbooks, and I sense that more organizations are trending towards open source solutions as well.
Source Code: See What’s Under the Hood
When doing statistical analysis, it is important not only to be able to write the code that executes a task, but also to understand what the task itself is doing. I am sure proprietary software has extensive documentation, but that is not the same as being able to see exactly what the code is doing.
For example, one of my instructors at General Assembly sent me an article over the weekend saying that one way to check for a multicollinearity problem is to look at the variance inflation factor. Since I have access to the source code, I can see that the function regresses the chosen predictor on the other predictors to calculate the statistic.
Proprietary software may document how this is done in theory, but I suspect it is hard to get to this level of granularity. For some students, knowing how it is done computationally can help them understand the theory behind it.
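As a rough illustration of that computation, here is a minimal Python sketch using only NumPy. It is not the source code of any particular library. The function name `vif`, the made-up data, and the choice to include an intercept in each auxiliary regression are all assumptions made for the example. The idea is to regress one predictor on the others, take the R-squared of that regression, and report 1 / (1 - R-squared).

```python
import numpy as np


def vif(X, j):
    """Variance inflation factor of column j of X (hypothetical helper).

    Regresses column j on the remaining columns plus an intercept
    (the intercept is an assumption of this sketch) and returns
    1 / (1 - R^2) from that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    y = X[:, j]                                   # the chosen predictor
    others = np.delete(X, j, axis=1)              # every other predictor
    A = np.column_stack([np.ones(len(y)), others])

    # Ordinary least squares fit of the chosen predictor on the others.
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef

    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r_squared = 1.0 - ss_res / ss_tot

    if r_squared >= 1.0:                          # perfectly collinear column
        return float("inf")
    return 1.0 / (1.0 - r_squared)


# Made-up data: x3 is almost a copy of x1, while x2 is unrelated to both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vifs = [vif(X, j) for j in range(X.shape[1])]
print(vifs)
```

In this toy data, the nearly duplicated predictors should come out with large values and the unrelated one with a value close to 1, which is the pattern the statistic is meant to flag.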
UC San Diego is renowned for its scientific research and education, and most majors take a highly quantitative approach. Giving students free access to the tools they need, and the ability to look under the hood of those tools, will push UC San Diego to the next level.