Data Science: Surviving the Shortfalls of a Liberal Arts Education

Gretta Digbeu
Apr 21, 2019

$83,318.78. That’s the balance on my student loan account. If you had told me ten years ago that pursuing a Liberal Arts education at two of the most prestigious universities in the world would be a very poor investment of my time and scant resources, I would have scoffed and pegged you as an old-school, narrow-minded cynic. I blindly and wholeheartedly believed that my flair with words, elaborately constructed ideas about how to change the world, hyper-developed critical thinking skills, and a healthy level of intellectual curiosity (aka soft skills) were all I needed to make it in the workplace and have a well-paying and fulfilling career.

My friends and I like to joke that they “really had us drinking the Kool-Aid”. We were so drunk on the Kool-Aid of Liberal Arts that we actually believed we didn’t need a set of hard skills and technical aptitudes to build successful careers. “A Liberal Arts education isn’t supposed to teach you a job. It’s meant to teach you how to think. And knowing how to think, that’s the key to success.” Ah, the Kool-Aid. We couldn’t have been more wrong. Fast forward ten years and I hold degrees from both Georgetown University and the London School of Economics, but I feel cheated. Cheated out of an opportunity to acquire skills that carry true, remunerative, and quantifiable market value. Skills that are actually in high demand and translate into a salary commensurate with the debt I owe to the U.S. Department of Education.

So I decided to go back to school. This time around though, the fees are minimal, the learning takes place exclusively online, and the hours are up to me. For the past four months, while holding my job as a data analyst on a contract for USAID, I have been taking back-to-back Data Science MOOCs. Watching edX videos and completing probability, statistics, visualization, data wrangling, and machine learning exercises has become my second job. I dedicate my weekends and almost all my spare time to studying, and I love it. I finally feel like I am gaining skills that will translate directly into tangible career opportunities, and that makes me feel empowered and highly relevant. Empowered because as a woman of color I am delving into a rich, multidisciplinary field of mathematics, computer science, and statistics, in which my white male counterparts are to this day overrepresented. Highly relevant because in this era of big data and vertiginously rapid technological innovation, and roughly ten years after the “Data Scientist” job title came into use, employers of all types are vying to collect, parse, mine, model, analyze, and visualize data to monetize its insights and achieve various business goals.

I’m barely beginning my journey of becoming a full-fledged Data Scientist, but I feel energized, motivated to learn, and confident about the return on my investment in a way that I never have before. And it turns out that my liberal arts education may be an asset after all, because it’s one thing to master coding in Python, statistical inference, predictive modeling, pattern recognition, and function approximation, and another thing altogether to explain to a layperson the results of applying concepts like p-values, confidence intervals, random variability, conditional expectation, variance, cumulative distribution, regression, normality, Type I and Type II errors, sensitivity, specificity, precision, cumulative accuracy, backward, forward, and stepwise selection, and loss minimization. If there’s one thing I learned how to do expertly during my years of studying history, literature, theology, sociology, political economy, and economics as a social science, it’s how to make anything intelligible by putting it into its appropriate context. So here are a few insights I gleaned from learning statistics for data science over the past few months:

  • There are two systems of inference in statistics: frequentist inference and Bayesian inference. Very simply put, frequentist statistics draws conclusions from sample data by stressing frequencies and proportions (how often outcome X would occur across repeated samples), while Bayesian statistics combines a “prior” distribution (what we believed before seeing the data) with the observed sample data to build and draw conclusions from a “posterior” distribution (the prior updated by, or conditioned on, the data). Tools like hypothesis testing and confidence intervals are derived from frequentist statistics, while posterior distributions and credible intervals are products of Bayesian statistics. Conditional probability itself belongs to basic probability theory and is used by both.
  • Machine learning, at least the supervised kind, refers to the use of data to estimate the conditional probability (or conditional expectation) of a given outcome for any combination of a set of features (aka predictors aka covariates). Features are considered to be random variables. Machine learning algorithms take feature values and known outcomes, use them to train a model, then use that model to predict unknown outcomes. For this reason, linear regression can be considered a machine learning algorithm (though it is often too rigid to capture complex relationships).
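  • Linear regression does not require normally distributed data. What the classical inference tools (t-tests and confidence intervals for the coefficients) assume is that the errors are approximately normal, and even that matters less in large samples. The regression equation gives us an estimate of the conditional expectation of our outcome variable given the values of our predictor variable(s).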
  • The normal distribution is just one kind of theoretical probability distribution. It is a continuous distribution, suited to variables like height and weight, which can take on any value in a given range. Other distributions include the binomial, Student’s t, chi-squared, gamma, exponential, beta, and Poisson distributions. The normal, Student’s t, and chi-squared distributions, for instance, are the basis for popular tests of statistical significance used in hypothesis testing, from which we derive p-values that we compare against significance levels.
  • A probability distribution always implies a random variable, because probability distributions are only defined for random variables.
  • A random variable is just a variable whose possible values are the outcomes of a random phenomenon.
  • Almost all statistics generated in data science (expected values, correlation coefficients, standard errors, variances, least squares coefficients, confidence intervals, estimated probabilities and probability distributions, mean squared errors, sums of squares, etc.) are random variables. That’s because data scientists work almost exclusively with sample data, and a statistic computed from one sample will differ from the same statistic computed from another.
  • When dealing with continuous random variables, the probability of any single exact value is zero; probabilities are only meaningful for ranges of values.
  • What looks like a sophomore slump, in sports, academia, or anything else for that matter, is usually just regression to the mean: an exceptional first outing tends to be followed by one closer to average.
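
Three of the insights above can be checked with a short simulation. What follows is a minimal sketch, not something taken from my courses: the population of heights (mean 170 cm, standard deviation 10 cm), the function names, and the "skill plus luck" model of a season are all assumptions made up for illustration. It uses only Python's standard library.

```python
import random
import statistics
from statistics import NormalDist

# Assumed toy population: heights with mean 170 cm and standard deviation 10 cm.
POPULATION = NormalDist(mu=170.0, sigma=10.0)


def sample_means(n_samples, sample_size, rng):
    """Mean of each of n_samples independent samples drawn from POPULATION."""
    return [
        statistics.fmean(
            rng.gauss(POPULATION.mean, POPULATION.stdev) for _ in range(sample_size)
        )
        for _ in range(n_samples)
    ]


def interval_probability(dist, low, high):
    """P(low <= X <= high) for a continuous distribution, from its CDF."""
    return dist.cdf(high) - dist.cdf(low)


def sophomore_slump(n_players, rng):
    """Simulate two seasons where score = stable skill + independent luck.

    Returns the average season-one and season-two scores of the players
    who finished in the top 10% in season one.
    """
    skill = [rng.gauss(0.0, 1.0) for _ in range(n_players)]
    season1 = [s + rng.gauss(0.0, 1.0) for s in skill]
    season2 = [s + rng.gauss(0.0, 1.0) for s in skill]
    ranked = sorted(range(n_players), key=lambda i: season1[i], reverse=True)
    top = ranked[: n_players // 10]
    return (
        statistics.fmean(season1[i] for i in top),
        statistics.fmean(season2[i] for i in top),
    )


rng = random.Random(2019)  # fixed seed so the run is repeatable

# A sample statistic is itself a random variable: it changes from sample to sample.
means = sample_means(n_samples=200, sample_size=50, rng=rng)
print("spread of the sample mean across samples:", statistics.stdev(means))

# For a continuous variable, a single exact value has probability zero.
print("P(X == 170 exactly):", interval_probability(POPULATION, 170.0, 170.0))
print("P(160 <= X <= 180):", interval_probability(POPULATION, 160.0, 180.0))

# Regression to the mean: the best first-season performers fall back toward average.
first, second = sophomore_slump(n_players=10_000, rng=rng)
print("top rookies, season 1 vs season 2:", first, second)
```

In this toy model nobody's underlying skill changes between seasons, yet the top first-season performers still score lower on average the second time around, simply because part of their first-season result was luck.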

This is just the beginning, and I’m not stopping anytime soon. I am a lifelong learner, and one of the most important lessons I’ve learned in my short career is that if you want to be successful and professionally fulfilled, it’s not enough to be an expert in something you enjoy. You need to become an expert in something that is relevant and indispensable to the demands of the job market because we live in a capitalistic society that assigns a higher market value to certain classes of skills. We all know that soft skills and qualitative aptitudes are indispensable to professional advancement in any context, place or time. But let’s not fool ourselves. With a few exceptions, the fields of science, technology, engineering, and mathematics remain the most prized and lucrative professional domains, and over the past decade statistics, data analysis, and machine learning have converged to make “Data Scientist” one of the most sought-after roles in the U.S. I’ve decided to not cheat myself again. So I’ll keep pushing through the often glitchy and ridiculously difficult DataCamp exercises of the data science MOOCs because the prospect of finally getting a return on that 80K investment keeps me energized and eager to learn.

Finally, as I navigate the videos, comprehension checks and coding exercises of my courses, I realize that my journey carries a symbolic value that is much bigger than me. Sure, I’m determined to become a legitimate and fully competent Data Scientist with a salary corresponding to my market value, intellectual aptitudes, and my professional and academic achievements. More importantly though, I am eager to become a role model for younger women of color who’ve yet to apply for scholarships and universities, yet to decide on their secondary or post-secondary courses of study. I want to be part of a movement that brings more women of color to the sea of white males in well-paid, quantitatively focused, and technologically intensive professions. So that these young girls can see a reflection of themselves in a cutting-edge STEM role and resolutely know to stay far away from the delusions of a Liberal Arts education.
