As far as job titles go, data scientist is one of the biggest buzzwords of the last few years. It's also one of the more nebulous ones. What actually is data science? Can you even study this? What do data scientists do?
Yes, you can now study data science at some universities (Edinburgh's Data Science program is one of the better ones), but most data scientists come from other fields. Mathematics. Computer Science. Statistics. Physics.
You know, the usual suspects — math-heavy courses that also expose you to a lot of programming and algorithms.
But I want to suggest that economics is — surprisingly perhaps — a great background for data science.
Yes yes yes. Please, hear me out. I know I am biased, but I really believe there aren’t many degrees that give you better training for working in data science than economics.
As a graduate of economics, I've committed possibly the greatest sin of the profession. I switched sides. To machine learning and data science.
I don't think I really switched sides, but the economics world, at least, would have you believe that econometricians and data scientists are sets without an intersection. Data mining is something of a bad word in econometrics, a field almost religiously seeking causal inference and interpretability of results.
But when you actually look into what data science usually is, the boundaries between more traditional econometrics/statistics and the hip and cool machine learning become less and less clear (this infographic is a great illustration of it: source).
Reading through common data science job descriptions, you may get the idea that economics is the worst training to have. Most economics programs don't teach programming or databases, nor do they come anywhere close to machine learning. WTF is Hadoop? And Hive and Pig? Is this a joke?
Specific skills aren't the most important thing, though. A solid background is: one that will let you learn the specific skills quickly. And a good economics education is indeed a solid background to have.
So here are 4 reasons why economists make great data scientists:
You already know machine learning!
Before you stop reading, thinking that I must've gone to a very weird economics school to have learned machine learning there, read this:
Machine learning is really just a very fancy term for statistical/predictive modelling that programmers invented to keep away the uninitiated from their elite club (hey, they do know some economics after all — scarcity drives prices up!).
In fact, the first two modules in the most popular machine learning course on Coursera are, wait for it, linear regression and logistic regression.
This may surprise the 99% of economists who took introductory econometrics, but you probably have deeper knowledge of linear regression than the average data scientist. Just as you may be freaked out by names like “neural networks” or “support vector machines”, you'd have to work very hard to find the term “heteroskedasticity” anywhere in machine learning syllabi.
And even the terms you may not know, they are often just examples of skilful copywriting. Neural networks are a great example. It's something that sounds incredibly complicated (are we modelling the brain or what?), but on a (basic) fundamental level, they just combine layers of logistic(-like) regressions to model more complex non-linear relationships that a single regression may not capture (for great primers on neural nets, see http://karpathy.github.io/neuralnets/ or http://iamtrask.github.io/2015/07/12/basic-python-network/).
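To make the "layers of logistic regressions" idea concrete, here is a minimal sketch in plain Python. The weights are set by hand rather than estimated, and the function names (logit_unit, tiny_network) are made up for this illustration. It reproduces the XOR pattern (output 1 when exactly one input is 1), which no single logistic regression can fit because the two classes cannot be separated by one straight line.

```python
import math


def logistic(z):
    """The same sigmoid link used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_unit(inputs, weights, intercept):
    """One 'logistic regression': a linear index passed through the sigmoid."""
    index = intercept + sum(w * x for w, x in zip(weights, inputs))
    return logistic(index)


def tiny_network(x1, x2):
    """Two hidden logistic units feeding a third one (hand-picked weights)."""
    h_or = logit_unit([x1, x2], [20.0, 20.0], -10.0)     # close to 1 if x1 OR x2
    h_nand = logit_unit([x1, x2], [-20.0, -20.0], 30.0)  # close to 1 unless both are 1
    # Output unit is close to 1 only when both hidden units are close to 1.
    return logit_unit([h_or, h_nand], [20.0, 20.0], -30.0)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(tiny_network(x1, x2), 3))
```

In a real network the weights would be estimated from data rather than typed in, but the building block is the same linear-index-plus-sigmoid you already know from logit models.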
Granted, neural networks can go deep, far deeper than what I've just described. Recurrent nets, convolutional nets and deep learning are all much more complex topics, and much more powerful algorithms. But for most machine learning applications, you should do just fine with far simpler models: basic neural nets, decision trees, regressions, SVMs… And with the statistical background from most econometrics courses, you are not going to have any trouble grasping these concepts quickly (I highly recommend that Coursera course).
You have higher standards
At least in my experience, econometrics was obsessed with finding causal relationships — and making it really clear how difficult this is without randomized controlled trials. And how sensitive most models are to their basic assumptions. A lecture wouldn't pass without someone mentioning yet another possible source of bias. Attenuation bias. Survivorship bias. Selection bias. Measurement error. Reverse causality. Truncation. Censoring.
For every problem there was another — more complicated — model that was to deal with it. A model that also introduced its own bag of assumptions and issues.
The world of econometrics was messy, uncertain and frustratingly limiting.
Warning: gross exaggeration ahead.
Compared to this, machine learning is beautifully straightforward. Instead of solving models explicitly — relying on strict assumptions to be able to do so — models are estimated iteratively with gradient descent (and its derivatives). Instead of figuring out what the theory is behind the relationship you are trying to study, and carefully selecting explanatory variables and the appropriate model, you try all you can think of and see if it sticks. Get used to cross-validation and testing. Instead of t-statistics, why not try some bootstrapping?
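As a rough illustration of "solving explicitly" versus "estimating iteratively", the sketch below fits the same one-regressor model both ways: once with the textbook OLS formulas and once with plain gradient descent on mean squared error. The data, the learning rate and the number of steps are all made up for this example, and the function names are hypothetical.

```python
# Toy data, made up for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]


def ols_closed_form(x, y):
    """Textbook solution: slope = sum of cross-deviations / sum of squared deviations."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def ols_gradient_descent(x, y, learning_rate=0.05, steps=5000):
    """Minimise mean squared error by repeatedly stepping against the gradient."""
    intercept, slope = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [intercept + slope * xi - yi for xi, yi in zip(x, y)]
        grad_intercept = 2.0 / n * sum(residuals)
        grad_slope = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


a_exact, b_exact = ols_closed_form(x, y)
a_gd, b_gd = ols_gradient_descent(x, y)
print("closed form:     ", a_exact, b_exact)
print("gradient descent:", a_gd, b_gd)
```

For linear regression the two routes land on the same coefficients, so nothing is gained here. The iterative approach earns its keep on models that have no closed-form solution at all.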
To econometricians, this may seem blasphemous. But that's only because you are expecting the same from ML that you expected from econometrics. Inference and causal interpretation. For the most part, ML strives for prediction and discovering patterns, not causality. For some models, you can't even say which variables are the most important in predicting the results.
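Judging a model purely by how well it predicts data it has not seen is what cross-validation does. Here is a minimal sketch of k-fold cross-validation in plain Python, assuming simulated data (y = 1 + 2x + noise) and two candidate models invented for this example: a mean-only baseline and a simple linear regression. The helper names (fit_mean, fit_line, cross_val_mse) are hypothetical, not taken from any library.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated data (an assumption for illustration): y = 1 + 2x + noise.
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]


def fit_mean(x_train, y_train):
    """Baseline model: always predict the training mean."""
    mean = sum(y_train) / len(y_train)
    return lambda x: mean


def fit_line(x_train, y_train):
    """Simple linear regression fitted by least squares."""
    x_bar = sum(x_train) / len(x_train)
    y_bar = sum(y_train) / len(y_train)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train))
    s_xx = sum((x - x_bar) ** 2 for x in x_train)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=5, seed=1):
    """Average out-of-sample squared error over k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)  # same folds for every model
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        x_train = [xs[i] for i in indices if i not in held]
        y_train = [ys[i] for i in indices if i not in held]
        predict = fit(x_train, y_train)  # fit only on the other folds
        errors = [(predict(xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


mse_mean = cross_val_mse(fit_mean, xs, ys)
mse_line = cross_val_mse(fit_line, xs, ys)
print("mean-only model CV MSE:", mse_mean)
print("linear model CV MSE:   ", mse_line)
```

Whichever model has the lower held-out error wins. No t-statistics or theory are involved, which is exactly why this workflow says something about prediction and nothing about causality.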
Yes, neural networks may not be used in explaining the causal effect of minimum wage on unemployment. But neither can you really expect (multinomial) logit to be used to recognize hand-writing. It's all about using the right tools in the right applications — and I think econometrics taught you a lot about that.
You actually know how to write coherent sentences
Data science isn't just fancy algorithms, though. Unless you are an academic researcher who only writes theoretical papers (in which case you probably wouldn't be reading this anyway), presentation and writing are big parts of data science. Just as they are in economics.
If you work as a data scientist anywhere in the “real world”, you'll have to present your results to non-technical audiences: managers, marketers and copywriters, customers and clients. And you'll have to be able to show why your results matter and how normal folk can use them and act on them.
As an economist, I'd wager you've written your fair share of papers, essays, reports, presentations and dissertations in your time at university. Don't underestimate this skill. In fact, it probably puts you well ahead of most computer scientists and mathematicians when it comes to presenting and explaining your work clearly, and putting together longer pieces of text that have structure and logic behind them.
Python isn't hard
Alas, you will probably also have to write code, not just words, if you want to work in data science. But it's not like economists don't have to write code, too. True, Stata isn't a “proper” programming language, but it's a great introduction to statistical computing. And if you go on to graduate studies, many economics programs have you learn other languages anyway: Python is very common, as are R and Matlab.
Fortunately, Python's become the programming “lingua franca” of data science. Not only has it got a great selection of libraries (NumPy, SciPy, scikit-learn, statsmodels, pandas, Matplotlib, seaborn…), but it's also a very legible and easy-to-learn language and you've probably come across it anyway.
And if you haven't, just learn Python. R may be powerful too, but the syntax is an abomination and it's kind of slow with bigger datasets. Matlab is commercial software, and while it is great (and fast) at mathematical computing and it has an open-source alternative (Octave), it's not that common. Julia is too obscure and still a bit too young.
So why does no one tell you this?
Apparently, economists should make great data scientists. So why does no one tell them at university that this is a very real career choice? For one, it's all relatively young. And course prospectuses are slow to change, favoring more traditional options in finance, academia, government…
But I also think there is a bit of prejudice in the economics world against data science. That it's beneath an economist to go into data science. That they are concerned with greater issues.
Which is a shame. Because economics gives its graduates a unique blend of technical/statistical and soft/human skills that is much harder to come by in the mathematics and CS departments. And perhaps data science positions would benefit from having careful econometricians do the job: people aware of all the possible shortcomings of data mining and of just trying everything that might work. Just as econometricians might learn from ML when it comes to testing and cross-validation and algorithmic approaches to estimation.
So give it a try. Follow the links in this article. See if it catches your fancy. And don't think that just because you don't know what Hessians are, you can't go into machine learning.
(This isn't meant to be a guide for economists on how to become data scientists. But it should give you plenty of things to think about — and expand your range of possible career options. I may write more specific “tutorial” articles later.)