Data Science’s Evolution, and Mine

Caroline Clark
Published in The Startup
5 min read · Sep 4, 2020

There’s perhaps nothing that sets the 21st century apart from others more than the concept of data. Every interaction we have with a connected device creates a data record and beams it back to some data store for tracking and analysis. Internet-connected devices are ubiquitous, and their number keeps growing. In 2018, there were approximately 8 connected devices per person in the United States. That number is expected to grow to 13.6 by 2023.¹

The vast amounts of data that are being collected by organizations and individuals have enabled ever more powerful — and transformational — machine learning algorithms. Machine learning and artificial intelligence (AI) shape our experience when we use a search engine, visit a social media website, or interact with a large company’s customer service. AI enables SpaceX to safely land its rockets back on Earth for reuse. It fuels a growing population of robots in manufacturing, generates novel chemical compositions for drug research, and brings the possibility of fully autonomous vehicles closer every day.

Yes, advances in compute power and better algorithms have also been critical to this progress. But without good data, hardware and mathematical equations can only do so much. “Garbage in, garbage out,” as the adage goes.

Data Science vs. Machine Learning vs. Artificial Intelligence

It’s probably useful at this point to discuss what we mean when we talk about data science, machine learning, and artificial intelligence (AI).

Historically, data science has involved the process of analyzing data to gain insights, typically business insights. As Andrew Ng explains in his Coursera course, AI for Everyone, the output of a data science analysis would typically be a PowerPoint presentation (though this isn’t necessarily the case anymore — more on that in a moment).² Such an output would typically serve key stakeholders in an organization or on a project.

Arthur Samuel, one of its pioneers, defined machine learning as “the field of study that gives computers the ability to learn without being explicitly programmed.” The output of a machine learning project is typically some type of software, for example an algorithm that automatically optimizes the listings you see on a job search site based on a variety of factors. Such an output could serve thousands, millions, or even billions of users.
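To make Samuel’s definition a bit more concrete, here is a minimal sketch of what “learning without being explicitly programmed” can look like. Nothing here comes from a real project: the toy data and the names `fit_line` and `predict` are made up for illustration. The point is simply that the rule (a slope and an intercept) is never typed in by hand; it is estimated from examples using ordinary least squares.

```python
def fit_line(xs, ys):
    """Estimate the slope and intercept of a line by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy examples; the underlying pattern is y = 2x + 1,
# but the code is never told that.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)


def predict(x):
    """Apply the learned rule to an input the program has not seen."""
    return slope * x + intercept


print(slope, intercept, predict(6))
```

Real machine learning systems use far richer models and much more data, but the shape is the same: examples go in, and a rule that generalizes to new inputs comes out.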

The Evolution of Data Science

Artificial intelligence is the field of study involving how to build intelligent machines, typically with at least human-level performance on a given task (narrow AI) or on a diverse set of tasks (artificial general intelligence — AGI). We don’t know when we will reach AGI, or how we might know when we reach it.³ But in recent years, researchers and practitioners have achieved human-level or better performance on a variety of tasks using a specific type of machine learning called deep learning. Deep learning leverages an artificial neural network architecture, so you might see deep learning, neural networks, and AI used interchangeably in some settings.

Evolution: Data Science’s and Mine

Advances in deep learning are being increasingly leveraged by data scientists to develop both useful insights and products. Take for example the analyst who uses a natural language processing algorithm to analyze customer sentiment regarding a new product, and presents the findings to an executive team. Or the data scientist who builds a recommendation engine and delivers this software to an engineering team for back-end integration.

The rapid evolution of these fields, easy access to powerful compute platforms, and ubiquity of high-quality technical MOOCs (Massive Open Online Courses) contribute to the blurring of lines between data scientists, machine learning engineers, and even deep learning engineers.

Google’s search algorithm is probably the most widely used and under-recognized machine learning technology of the past 20 years. I began my career at Google and spent six years working in a variety of roles including on search and analytics teams. A lot of this work came down to helping customers optimize their usage of Google’s algorithms. Even during these early days (2008–2014), we were actively using machine learning-powered tools to provide both insights for our customers and automated campaign solutions. But the truth was this was only the infancy of the AI revolution.

Deep learning took off in the public sphere after deep convolutional neural networks started smashing performance records.⁴ I took notice of the disruption in industry. While working as a consultant, I spoke with folks in the field and embarked on a self-study journey to transition into a machine learning career, absorbing Andrew Ng’s Deeplearning.ai Coursera specialization, among other courses, research papers, and texts. As I started to work with clients in the space through a consulting firm, the experience was extremely rewarding and interesting.

COVID-19 and General Assembly

Enter COVID-19.

Though I was grateful to be in a better position than many folks out there, COVID-19 still led to some non-negligible disruption. But instead of thinking about this thing happening TO me, I wanted to flip the script and do something with the flexibility that came with working from home. As a lifelong learner in the machine learning and analytics space, I had always felt like I was missing the data science portion of the puzzle. Back at Google, I loved using analytics to help clients understand what was going on and what they should do, but I had gotten pretty far away from that. On top of that, there is now a cornucopia of new tools for conducting analysis and relaying the information in useful ways. After a lot of conversations with colleagues and many late nights searching for the right way to upgrade my data science skills, I settled on General Assembly. Specifically, I enrolled in General Assembly’s 12-week Data Science Immersive.

My goals with this course are:

  1. Become a data wrangling master
  2. Build a solid foundation in statistics
  3. Enhance my machine learning knowledge

I’m excited to bring data science skills to my machine learning work in the future. Depending on the data set and goal, deep learning isn’t always feasible or necessary in a project, and this is where having a robust machine learning toolkit comes in handy. A solid statistics foundation can also be a boon when collecting data and evaluating its quality, or when examining the impact labeling errors have on machine learning algorithm performance.

I’ll be sharing some of my journey on this blog over the coming months. If you’re interested, give me a follow.

¹ https://www.cisco.com/c/en/us/solutions/executive-perspectives/annual-internet-report/air-highlights.html#
² https://www.coursera.org/learn/ai-for-everyone
³ For more on the challenges AGI presents, see Max Tegmark’s book, Life 3.0.
⁴ https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
