Becoming a Data Scientist without a Computer Science Major

Muhsina E.
Dec 13, 2018

Harvard Business Review described data scientist as the sexiest job of the 21st century.

Data science is all about manipulating large and unstructured data sources and creating insights from them.

Data science has become one of the most sought-after disciplines in our world of ever-increasing data, and data scientists seem to come from diverse educational backgrounds. Even so, people ask questions like:

“Can anyone from a non-computer science background be a data scientist?”

“How can I become a data scientist with a non-computer science background?”

These are among the most common questions from aspiring data scientists. In fact, they are the same questions I googled seven months ago.

So many people from diverse backgrounds are trying to land this hot job, and some of them think it is meant only for people with a computer science background.
My educational background is in Finance and Accounting. When I started thinking about a career in data science, I had lots of questions running through my head: What programming language do I have to learn? Which online resources will help me learn to program?

I thought it would be best to get answers to these questions from people who currently work as data scientists and come from a non-computer science background. But I could not connect with anyone like that to clear my doubts.

When I came across the Kaggle Data Science & Machine Learning Survey 2018, I decided to do an exploratory analysis of the data, focusing on the responses from people who already work as data scientists and have a non-computer science major. This is my first independent data analysis project. Through this analysis, I try to describe some paths that may help aspiring data scientists without a computer science background achieve their dream.

Types of Data Scientists

Data science is a complex and often confusing field. Data science combines several disciplines, including statistics, data analysis, machine learning, and computer science.
According to a featured article from Udacity, there are four types of data science jobs.

1. The Data Analyst

There are some companies where being a data scientist is synonymous with being a data analyst. Your job might consist of tasks like pulling data out of SQL databases, becoming an Excel or Tableau master, and producing basic data visualizations and reporting dashboards. You may on occasion analyze the results of an A/B test or take the lead on your company’s Google Analytics account.

2. The Data Engineer

Some companies get to the point where they have a lot of traffic (and an increasingly large amount of data), and they start looking for someone to set up a lot of the data infrastructure that the company will need moving forward. They’re also looking for someone to provide analysis. You’ll see job postings listed under both “Data Scientist” and “Data Engineer” for this type of position. Since you’d be (one of) the first data hires, heavy statistics and machine learning expertise is less important than strong software engineering skills.

3. The Machine Learning Engineer

There are a number of companies for whom their data (or their data analysis platform) is their product. In this case, the data analysis or machine learning going on can be pretty intense. This is probably the ideal situation for someone who has a formal mathematics, statistics, or physics background and is hoping to continue down a more academic path.

4. The Data Science Generalist

A lot of companies are looking for a generalist to join an established team of other data scientists. The company you’re interviewing for cares about data but probably isn’t a data company. It’s equally important that you can perform analysis, touch production code, visualize data, etc.

“Some of the most important ‘data generalist’ skills are familiarity with tools designed for ‘big data,’ and experience with messy, ‘real-life’ datasets.”

Note: Throughout this analysis, “data scientists” means survey respondents who gave their current job title or role as data scientist, data analyst, or data engineer.

Data Scientists with Non-Computer Science Major

It’s widely assumed that you need a formal education in computer science to pursue a career in data science. The definition and job description of a data scientist vary from company to company, but it’s clear that a data scientist should be able to manipulate large and unstructured data and create insights from it.
Studies have shown that data scientists come from diverse backgrounds.

Out of the 23,859 responses in the Kaggle Survey 2018, 25% of respondents currently work as data scientists.

Data Scientists Who Responded to the Kaggle Survey 2018

I think you now have the answer to the first question: can anyone from a non-computer science background be a data scientist?

Of course, yes!

67% of the data scientists are from non-computer science backgrounds. They come from diverse fields such as social science, mathematics and statistics, business disciplines, fine arts, and the humanities.

Now it’s time to look at how to become a data scientist without a computer science major.

1. Find out if it’s really for you

We know data scientists come from diverse educational backgrounds. Before looking into how to learn the skills you need, make sure the field is really for you: it requires continuous learning and practice with complex concepts.
Let’s get to know the selected data scientists in detail.

Education

We identified a set of working data scientists with a non-computer science major from the Kaggle survey data. There are 4,073 of them in total. What about their highest level of formal education?
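
The selection described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the field names `job_title` and `major`, the label strings, and the sample rows are all assumptions standing in for the survey's real columns and answers.

```python
# Hypothetical field names and labels; the real survey columns differ.
DATA_ROLES = {"Data Scientist", "Data Analyst", "Data Engineer"}
CS_MAJOR = "Computer science"


def select_non_cs_data_scientists(respondents):
    """Keep respondents in a data role whose major is known and is not CS."""
    selected = []
    for row in respondents:
        in_data_role = row.get("job_title") in DATA_ROLES
        major = row.get("major")
        if in_data_role and major and major != CS_MAJOR:
            selected.append(row)
    return selected


# Made-up sample rows, for illustration only.
sample = [
    {"job_title": "Data Scientist", "major": "Mathematics or statistics"},
    {"job_title": "Data Analyst", "major": "Computer science"},
    {"job_title": "Student", "major": "Business"},
    {"job_title": "Data Engineer", "major": "Physics"},
    {"job_title": "Data Analyst", "major": None},  # major not answered
]

non_cs = select_non_cs_data_scientists(sample)
print(len(non_cs), "of", len(sample), "sample respondents selected")
```

Respondents who skipped the major question are dropped here; that is a choice made for this sketch, and a real analysis might treat them differently.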

95% of the data scientists have at least a bachelor’s degree.

Activities Important as a Data Scientist

It’s important to know the daily activities you’ll have to do in your work once you are a data scientist.

The key activity of a data scientist is analyzing and understanding data to influence product or business decisions. Along with that, you may have to do the following:

  • Build prototypes to explore applying machine learning to new areas
  • Build and/or run a machine learning service that operationally improves my product or workflows
  • Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data
  • Do research that advances the state of the art of machine learning

Time Spent Coding

58% of the data scientists spend more than half of their time actively coding. If you want to be a data scientist, you will clearly have to spend time coding.
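
A share like the one above can be computed by counting answers per bucket. The sketch below assumes the survey offered bucketed answers with the labels shown; those labels and the sample answers are assumptions for illustration, not the actual data.

```python
from collections import Counter

# Assumed answer labels. Note the lowest of these buckets starts at exactly
# 50%, so "more than half" is an approximation.
MORE_THAN_HALF = {
    "50% to 74% of my time",
    "75% to 99% of my time",
    "100% of my time",
}


def share_coding_more_than_half(answers):
    """Return the percentage of non-empty answers that fall in the upper buckets."""
    counts = Counter(a for a in answers if a)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    heavy = sum(n for label, n in counts.items() if label in MORE_THAN_HALF)
    return 100.0 * heavy / total


# Made-up sample answers, for illustration only.
sample_answers = [
    "50% to 74% of my time",
    "1% to 25% of my time",
    "75% to 99% of my time",
    "25% to 49% of my time",
    "",  # respondent skipped the question
]

share = share_coding_more_than_half(sample_answers)
print(f"{share:.0f}% of sample respondents code at least half the time")
```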

Primary Tools and IDEs

More than half of the data scientists use local or hosted development environments like RStudio, JupyterLab, etc.

35% of the data scientists use Jupyter Notebook/IPython and RStudio.

Programming Languages

Python takes the leading position among data scientists. 70% of the data scientists use Python, SQL, or R for programming. 78% of the data scientists use Python or R regularly at work.

The majority of the data scientists recommend that aspiring data scientists start by learning Python.

Python is the most popular language among data scientists.
What programming language would they recommend an aspiring data scientist learn first?

Yeah, it’s Python!

Data scientists who use R most often also strongly recommend learning Python first.

If you still want a career in data science, you had better start by learning Python.
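
A comparison like this one is a cross-tabulation of two survey answers: the language a respondent uses most often and the language they recommend learning first. A minimal sketch follows, assuming each respondent is reduced to a (most used, recommended) pair; the helper names and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict


def recommendations_by_language(rows):
    """Map each most-used language to a Counter of recommended first languages."""
    table = defaultdict(Counter)
    for most_used, recommended in rows:
        table[most_used][recommended] += 1
    return table


def top_recommendation(table, language):
    """Return (recommended language, share in percent) for one group, or None."""
    counts = table.get(language)
    if not counts:
        return None
    name, n = counts.most_common(1)[0]
    return name, 100.0 * n / sum(counts.values())


# Made-up (most used, recommended) pairs, for illustration only.
sample_rows = [
    ("R", "Python"), ("R", "Python"), ("R", "R"),
    ("Python", "Python"), ("Python", "Python"), ("SQL", "Python"),
]

table = recommendations_by_language(sample_rows)
name, pct = top_recommendation(table, "R")
print(f"Most R users in the sample recommend {name} ({pct:.0f}%)")
```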

2. Acquire Additional Skills

Even if you are not from a computer science background, you may already know some programming languages. Take stock of which skills you have and which you lack.
There are many ways to gain knowledge and build a career in data science, including online courses, blogs, YouTube videos, and more.

Online Learning Platforms

  • Coursera

Coursera courses last approximately four to ten weeks, with one to two hours of video lectures a week. These courses provide quizzes, weekly exercises, peer-graded assignments, and sometimes a final project or exam. Courses are also provided on-demand, in which case users can take their time in completing the course with all of the material available at once.

  • DataCamp

DataCamp is a flexible, self-paced online learning platform offering tutorials and courses in data science. You can learn languages like Python and R, and in addition to tutorials you can do projects on DataCamp. This is one of my favorite platforms for learning data science.

  • Udemy

Udemy is a learning platform. Unlike academic MOOC programs driven by traditional collegiate coursework, Udemy provides a platform for experts of any kind to create courses which can be offered to the public, either at no charge or for a tuition fee. Udemy provides tools which enable users to create a course, promote it and earn money from student tuition charges.

  • edX

edX is a Massive Open Online Course (MOOC) provider. Courses may consist of video and text content, discussion forums, and a number of problem and assessment types. The majority of edX courses are entirely free to access and most also offer an optional verified certificate track for a fee that varies per course.

  • Udacity

Udacity is built around topic specializations called “Nanodegrees,” and each of these tracks is offered in collaboration with big companies and ML projects, like Amazon, Google, IBM Watson, etc.

Udacity is a good platform overall, and they do a great job helping students build a portfolio during each program.

  • Kaggle Learn

Kaggle Learn is a set of free, practical, hands-on courses that cover the minimum prerequisites needed to get started quickly in the field. Everything is done using Kaggle’s kernels, which means you can interact with the code as you learn.

80% of the data scientists have used at least one of these online platforms: Coursera, DataCamp, Udemy, Udacity, edX, and Kaggle Learn. More than half of them have spent most of their learning time on Coursera and DataCamp.

Other Online Sources

The field of data science is broad and constantly evolving. Whether you’re a student or new professional working in the field of data science, some resources are valuable for discovering the latest employment opportunities, finding tutorials for the processes and systems you’re using on a daily basis, learning hacks and tricks to boost your performance, and connecting with other professionals in your field.

You will find great stories and updates about data science on Medium, the Kaggle forums, the KDnuggets blog, and similar outlets.

Your knowledge of data science needs to stay current, and it’s always better to follow several of these sources rather than just one.

3. End-to-end Projects

Acquiring basic analytical and programming knowledge and a course certificate is not enough to get a job. Doing real-world projects helps to boost your knowledge and start your career in data science. When you showcase these projects in your portfolio, recruiters can easily evaluate your potential.
The first step is to find a dataset to work with. You can download lots of public datasets from various sites.

Data scientists deal with different types of data, such as numerical, categorical, time series, and text. Most public datasets you find will be one of these types. Your projects may include cleaning data, data wrangling, training a machine learning model, creating visualizations, etc., and all of these will definitely boost your data science skills.
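
To give a feel for the cleaning step, here is a small sketch using only the Python standard library. The records, the `city` and `price` field names, and the cleaning rules are all made up for illustration; a real project would adapt them to its own dataset.

```python
def clean_records(raw_rows):
    """Trim text, normalize category case, parse numbers, and drop incomplete rows."""
    cleaned = []
    for row in raw_rows:
        city = (row.get("city") or "").strip().title()
        price_text = (row.get("price") or "").replace(",", "").strip()
        try:
            price = float(price_text)
        except ValueError:
            continue  # skip rows whose price is missing or not numeric
        if not city:
            continue  # skip rows with no category value
        cleaned.append({"city": city, "price": price})
    return cleaned


# Made-up messy records, for illustration only.
raw = [
    {"city": "  new york ", "price": "1,200"},
    {"city": "BOSTON", "price": "n/a"},
    {"city": "", "price": "300"},
    {"city": "chicago", "price": " 450.5 "},
]

tidy = clean_records(raw)
average_price = sum(r["price"] for r in tidy) / len(tidy)
print(len(tidy), "clean rows, average price", average_price)
```

Dropping incomplete rows is the simplest option; depending on the project, filling in missing values may be a better fit.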

When data scientists are asked about projects, the majority say that independent projects are more important than academic achievements.

The key factor that will make you stand out as a data scientist is the projects you have done.

Conclusion

Data science is an interesting and, at the same time, challenging career. It doesn’t matter which educational background you come from. If you want to pursue a career in data science, acquire the skills that you don’t have and do real-world projects to polish them. The important part is to showcase your skills through projects.

Pick a topic you are curious about, dig in deeper, get your hands dirty with data.

Explore the dataset, analyze it, and write a story about it.

Get Hired!

Note: All these findings and visualizations are based on a project I did using the dataset from the Kaggle Machine Learning and Data Science Survey 2018.


Muhsina E.

Data Analyst | Data Science Enthusiast | Commerce Major | Loves Statistics, R, Python, food and movies