# **Back to school to become a Data Scientist. Is it necessary? [Part 1]**

While there are numerous ways to break into data science, pursuing a postgraduate degree appears to be the most formalized path. Before deciding whether it is worthwhile to invest time, money and energy, you may want to understand more about the fundamental concepts of data science and the commonly required skill sets. Then, we can explore possible channels to acquire this knowledge. Finally, we can judge whether a postgraduate degree is a good start for stepping into the domain.

**What is data science?**

To me, data science is “answering questions with data.” In the eBook *The Art of Data Science*, Professors Peng and Matsui describe data analysis as a set of five processes.

Admittedly, these processes sound entirely theoretical; therefore, for better understanding, I have summarized this pipeline in seven critical steps based on my learning and training.

**Step 1:** Ask the right questions

What are the objectives of this analysis for you and your organization? Data scientists often work with stakeholders (e.g., business partners) to identify their problems and formulate the right research questions. A stakeholder may come to you and say, “I want to boost sales of a certain product. What should I do?” Spending time interviewing your stakeholder and coming up with more concrete questions can make the problem easier to solve. Better questions could be, for example: Which channel has the highest click-through rate? Which age group is most likely to buy our product?
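
To show how a concrete question maps onto a computation, here is a minimal sketch that answers “which channel has the highest click-through rate?”. The `campaign_log` records, the channel names, and the helper `click_through_rates` are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Hypothetical campaign log: (channel, impressions, clicks). All values are made up.
campaign_log = [
    ("email", 1000, 50),
    ("social", 4000, 120),
    ("search", 2000, 90),
    ("email", 500, 40),
]


def click_through_rates(rows):
    """Return clicks divided by impressions for each channel."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for channel, shown, clicked in rows:
        impressions[channel] += shown
        clicks[channel] += clicked
    # Skip channels with zero impressions to avoid dividing by zero.
    return {ch: clicks[ch] / impressions[ch] for ch in impressions if impressions[ch] > 0}


rates = click_through_rates(campaign_log)
best_channel = max(rates, key=rates.get)
print(best_channel, round(rates[best_channel], 3))
```

A vague goal such as “boost sales” has no such direct computation, which is one reason the sharper question is easier to act on.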

**Step 2:** Collect relevant data

What kind of data is needed to answer these questions? Where can you collect this data? How much data will be statistically sufficient for this analysis? Are there any privacy concerns about the data you are about to collect? There are many more things to consider than just “get the data”. Some companies may have a data engineering team. However, as a data scientist, you still need to be aware of the Extract, Transform, Load (ETL) process. Good data leads to good analysis.
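
As one illustration of “statistically sufficient”, the sketch below uses the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e². It assumes a simple random sample from a large population. The function name `sample_size_for_proportion` and its defaults are my own choices, and other study designs need other calculations.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95, expected_proportion=0.5):
    """Approximate sample size needed to estimate a proportion.

    Assumes a simple random sample from a large population.
    expected_proportion=0.5 is the most conservative (largest n) choice.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    return math.ceil(n)


n_required = sample_size_for_proportion(margin_of_error=0.05)
print(n_required)
```

Tightening the margin of error from 5 to 3 percentage points roughly triples the required sample, so it helps to settle on the precision you actually need before collecting anything.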

**Step 3:** Exploratory Data Analysis (EDA)

It is critical to visualize your data before starting an in-depth analysis. Charts and tables help you understand the data set and spot potential abnormalities. Spend time investigating before making any assumptions.

Source: https://medium.com/analytics-vidhya/exploratory-data-analysis-eda-data-science-project-829f00c5716f
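
A first pass at EDA might look like the sketch below, which uses pandas to summarize a small data set, count missing values, and flag outliers with the common 1.5 × IQR rule. The data frame, its column names, and its values are made up for illustration, and the IQR rule is only one of several reasonable ways to screen for abnormal values.

```python
import pandas as pd

# Hypothetical data set; column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [23, 35, 41, 29, 52, 38, 27, 240],  # 240 is a deliberate data-entry error
    "spend": [120.0, 80.5, None, 60.0, 150.0, 95.0, 70.0, 88.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column.
summary = df.describe()

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Flag ages that fall outside 1.5 * IQR from the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]

print(summary)
print(missing_per_column)
print(outliers)
```

A flagged row is a prompt to investigate, not proof of an error: whether to correct, drop, or keep it depends on what you learn about how the data was collected.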

**Step 4:** Identify your analytical goals

Is your question a statistical inference problem or a prediction problem?

If it is a prediction problem, is it a regression or classification problem?

This step helps identify the appropriate tools for further analysis.

**Step 5:** Build your model

Once you have settled on your research question and analytical goals, the framework becomes clearer, and you can select appropriate models and test them out.
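
As a minimal sketch of “select a model and test it out”, the example below treats the task as a regression problem, fits a straight line with NumPy on a training split, and measures the error on held-out data. The data is synthetic (generated from y = 3x + 2 plus noise), and a real project would typically compare several candidate models, often with a library such as scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic data for illustration: y = 3x + 2 plus Gaussian noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=200)

# Hold out the last 25% of the (already random) samples for testing.
split = int(0.75 * len(x))
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]

# Fit a degree-1 polynomial; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

# Evaluate on data the model has not seen.
predictions = slope * x_test + intercept
rmse = float(np.sqrt(np.mean((y_test - predictions) ** 2)))

print(f"slope={slope:.2f}, intercept={intercept:.2f}, test RMSE={rmse:.2f}")
```

Evaluating on held-out data rather than the training data gives a more honest picture of how the model might behave on new observations.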

**Step 6:** Always visualize and interpret your results

At this point you may have some numbers or machine learning models. Some models, such as many deep learning approaches, can be black boxes, while others are easier to interpret. Either way, you should always dedicate some time to interpreting the results and checking that they make sense.
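
One simple sanity check, sketched below, is to compare a model against a naive baseline that always predicts the mean. The numbers are made up, and the helper `r_squared` is written out by hand here only to make the definition of the coefficient of determination (R²) explicit.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions for illustration.
actual = [10.0, 12.0, 15.0, 19.0, 24.0]
model_predictions = [11.0, 12.5, 14.0, 18.0, 25.0]

# Naive baseline: always predict the mean of the observed values.
baseline_predictions = [fmean(actual)] * len(actual)

model_r2 = r_squared(actual, model_predictions)
baseline_r2 = r_squared(actual, baseline_predictions)

print(f"model R^2={model_r2:.3f}, baseline R^2={baseline_r2:.3f}")
```

If a model barely beats such a baseline, that is worth understanding before presenting any conclusions.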

**Step 7:** Review the process. Does it answer your question?

You probably cannot wait to share your findings at this point. However, it is always good to revisit the questions. Can your results deliver all the answers you need? If not, what is missing?

Data analysis is not a linear process but an iterative one. We keep asking ourselves whether the data, analysis and results can sufficiently answer our question. If not, try to look at the data from another perspective and analyze it differently. Alternatively, check whether anything is missing, collect new data, and repeat the process. Being enthusiastic about finding answers in data is the key to success in data science.

**What skill sets are helpful for a data scientist?**

Now that you have a sense of what a basic data analysis process looks like, you may wonder which skills are required to work on a data analysis project, particularly one in the workplace. The most efficient way to find out which qualifications are in demand is to read the job descriptions in recruitment posts. It is common for data scientist postings to ask for both hard and soft skills.

**Hard skills**

*Executing the different stages of a data science analysis*

You should be able to execute all the steps mentioned above, including exploratory data analysis, machine learning model development and model evaluation.

*Statistical knowledge*

To choose the best model and interpret it correctly, you need to understand the fundamentals of descriptive statistics and probability theory, including critical concepts such as probability distributions, statistical significance, hypothesis testing, and regression.
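
To make hypothesis testing and statistical significance concrete, here is a minimal sketch of a two-sided, two-proportion z-test using only the Python standard library. The conversion counts are made up, the function name `two_proportion_z_test` is my own, and the test relies on the normal approximation, which assumes reasonably large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion under the null hypothesis that both groups
    share the same underlying rate. Relies on the normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up example: 120 of 1000 visitors convert on page A, 90 of 1000 on page B.
z_stat, p_value = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z={z_stat:.2f}, p={p_value:.3f}")
```

A small p-value says the observed difference would be unlikely if the two rates were truly equal. It does not say how large or how important the difference is, which is where interpretation comes in.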

*Programming experience*

Data visualization and model building often involve handling large data sets and complicated algorithms, and this is where programming skills come in. R and Python have libraries designed specifically to ease this kind of work. In Python, for example, common data visualization libraries include Altair, Plotly, and Seaborn, and common data analysis and modeling libraries include scikit-learn, PyTorch, NumPy, and pandas.

*Computational or statistics-related degree*

Data science job postings often ask for a bachelor’s or master’s degree in a computational or statistical field. Given the skill set requirements above, a related degree can help you build fundamental knowledge in the field.

**Soft Skills**

*Collaboration skills*

Most of the time, research is not a one-person job, and it makes sense to collaborate with stakeholders and peers to understand the problem.

*Communication skills*

Data science is not just paperwork. Written, oral and presentation skills are as essential as hard skills, because your stakeholders and colleagues will want to know what you found in the data.

In this part, we covered what data science is and which skills a data scientist needs. With that background, we can finally turn to our question: is a postgraduate degree necessary in the data science domain? Stay tuned for the next part.

*This post is contributed by **Macy Chan**. This article was originally published on **@DataCanOrg**.*

*Stay connected with **DataCan** & **Woman in Data Science Vancouver**!*