Achieving Success as a Data Scientist Part 1: So you want to be a Data Scientist?

Byte Brilliance
7 min read · Jan 6, 2024

Note: This article is part of a series designed to guide aspiring Data Scientists. The series is structured so that Part 1 is for complete beginners and later parts increase in complexity. Feel free to explore different parts of the series according to your experience level.

Introduction

Welcome to the Achieving Success as a Data Scientist series. As a Senior Data Scientist & Analyst, holding the position of Head of Business Intelligence, I have noticed a shortage of Data-related skills across the African continent. I felt compelled to do my part to uplift as many aspiring Data professionals as possible so that together we can begin to position Africa as a global leader.

Anyone who plays Chess knows that about 95% of the match time is spent by players positioning themselves well enough so that in the last 5% they are simply a few moves away from defeating their opponent. This concept can be applied to achieving success in any realm of human endeavour: if you spend the majority of your time building a strong foundation (i.e. positioning yourself well enough), success is almost guaranteed.

In my first-ever Medium post, Navigating the Data Cosmos: My Odyssey from Graduating to Leading Data Strategies, I recounted the twists and turns that defined my trajectory, leading me to become a Senior Data Scientist & Analyst. In that article, I also provided a roadmap to guide anyone (regardless of background and education) to becoming a strong candidate for a Data-driven career. In this article, I hope to refine that roadmap a little to help those who are literally beginning their journey from scratch. I have made all my posts and tutorials available for free — the only thing I need from you is your time and consistency. If you give me that, I will help you at every step of the way to ensure your journey is as smooth as possible.

Background

Before delving any further into the realm of Data Science, I want to introduce some terminology to get everyone on the same page. Understanding these terms will make it a lot easier to digest the concepts you will be introduced to as you begin to take on courses and projects:

1. Data:

  • Definition: Raw facts and figures that can be processed to generate information.
  • Importance: The foundation of all data-related work.

2. Database:

  • Definition: An organized collection of data stored electronically.
  • Importance: Centralized storage for efficient data retrieval and management.

3. SQL (Structured Query Language):

  • Definition: A domain-specific language used for managing and manipulating relational databases.
  • Importance: Fundamental for querying and interacting with databases.

4. Data Cleaning:

  • Definition: The process of identifying and correcting errors or inconsistencies in datasets.
  • Importance: Ensures data accuracy and reliability.

5. Machine Learning:

  • Definition: A subset of artificial intelligence that focuses on creating algorithms and models that can learn from and make predictions or decisions based on data.
  • Importance: Integral for predictive analytics and automation.

6. Data Visualisation:

  • Definition: The representation of data in graphical or visual formats to aid understanding.
  • Importance: Facilitates easier interpretation and communication of data insights.

7. Data Analytics:

  • Definition: The systematic computational analysis of data or statistics.
  • Importance: Provides valuable insights for decision-making and strategic planning.

8. Python:

  • Definition: Python is a versatile and high-level programming language known for its simplicity and readability. It has extensive libraries and frameworks that make it widely used in data analysis, machine learning, and other data-related tasks.
  • Importance: Python has become the de facto language for data professionals due to its rich ecosystem of libraries like Pandas, NumPy, Matplotlib, and Scikit-learn. Its ease of learning and readability make it an excellent choice for data manipulation, analysis, and even building machine learning models. Many data-centric roles require proficiency in Python, making it an essential tool in a data professional’s toolkit.
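
To make a few of these terms concrete, here is a minimal sketch that ties Data, Data Cleaning, Database, SQL, and Data Analytics together using only Python's standard library. The sales records, the `clean` helper, and the `sales` table are hypothetical and invented purely for illustration; real projects would typically use libraries such as Pandas for the cleaning step.

```python
import sqlite3
from statistics import mean

# Data: hypothetical raw records of (customer, city, amount).
raw_rows = [
    ("Amina", "Nairobi", 120.0),
    ("Thabo", "nairobi ", 80.0),   # inconsistent casing and a trailing space
    ("Kofi", "Accra", None),       # missing amount
    ("Zanele", "Accra", 200.0),
    ("Amina", "Nairobi", 120.0),   # exact duplicate of the first row
]


def clean(rows):
    """Data Cleaning: drop missing amounts, tidy city names, remove duplicates."""
    seen = set()
    cleaned = []
    for customer, city, amount in rows:
        if amount is None:
            continue  # skip rows with no amount recorded
        row = (customer, city.strip().title(), amount)
        if row in seen:
            continue  # skip rows we have already kept
        seen.add(row)
        cleaned.append(row)
    return cleaned


clean_rows = clean(raw_rows)

# Database: an in-memory SQLite database holding the cleaned data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", clean_rows)

# SQL: ask the database for the total sales per city.
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
conn.close()

# Data Analytics: a simple summary statistic over the cleaned data.
average_sale = mean(amount for _, _, amount in clean_rows)

print(totals)
print(round(average_sale, 2))
```

Do not worry if parts of this look unfamiliar for now. The point is only to show how the terms relate: raw data is cleaned, stored in a database, queried with SQL, and then analysed.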

While I am confident that these terms are sufficient for a complete beginner, I will note that there are many others you will encounter as you complete courses and start working on projects. For example, Supervised and Unsupervised Learning are sub-domains of Machine Learning. Further, Classification and Regression are different types of problems that fall under the Supervised Learning banner. To avoid overwhelming yourself, however, I suggest that you begin with the terminology I have listed here and compile a list of any new terminology you encounter as you work through the various courses I will list later on.
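
As a rough illustration of the difference between Regression and Classification, the sketch below uses plain Python on a tiny, made-up dataset of study hours. Regression predicts a number (an exam score), while Classification predicts a label (pass or fail). The data, the `fit_line`, `fit_centroids`, and `classify` helpers, and the choice of a straight-line fit and a nearest-average classifier are all assumptions made for illustration; in practice you would normally reach for a library such as Scikit-learn.

```python
from statistics import mean

# Hypothetical labelled data: hours studied, the exam score, and the outcome.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 54, 56, 58, 60]
outcomes = ["fail", "fail", "fail", "pass", "pass"]


def fit_line(xs, ys):
    """Regression: fit y = slope * x + intercept using least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def fit_centroids(xs, labels):
    """Classification: learn the average number of hours for each label."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(values) for label, values in groups.items()}


def classify(x, centroids):
    """Predict the label whose average is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression answers "how much?": the predicted score after 6 hours of study.
slope, intercept = fit_line(hours_studied, exam_scores)
predicted_score = slope * 6 + intercept

# Classification answers "which group?": the predicted outcome after 4.2 hours.
centroids = fit_centroids(hours_studied, outcomes)
predicted_outcome = classify(4.2, centroids)

print(predicted_score)
print(predicted_outcome)
```

Both are Supervised Learning because the model learns from examples where the correct answer (the score or the outcome) is already known.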

Pros of becoming a Data Scientist:

  1. High demand: Data scientists are in high demand across various industries, offering ample job opportunities.
  2. Competitive salary: Data scientists often enjoy competitive salaries due to the specialized nature of their skills.
  3. Diverse applications: Data science is applicable in diverse fields such as healthcare, finance, marketing, and more.
  4. Constant learning: The field is dynamic, requiring continuous learning, which can be exciting for those who enjoy staying updated.
  5. Impactful insights: Data scientists play a crucial role in deriving valuable insights from data, influencing decision-making.

Cons of becoming a Data Scientist:

  1. Steep learning curve: Learning the necessary skills in statistics, programming, and data manipulation can be challenging for beginners.
  2. Intensive time commitment: The job often requires long hours, especially when dealing with complex datasets or tight deadlines.
  3. Need for domain knowledge: Effective data analysis often requires understanding the specific domain, which may demand additional learning.
  4. Uncertain outcomes: Not all projects lead to impactful insights, and some may not yield the expected results.
  5. Rapid technological changes: Keeping up with evolving technologies in data science can be demanding and may require continuous adaptation.

One of the motivations for starting this blog series was to help mitigate some of the challenges you may face along the way. Below I have compiled a list of (free) courses for complete beginners to take on. While these may not encompass everything there is to know (because there is so much!), I have personally taken the time to investigate the syllabi (yes, that is a word) of each course and these are sufficient to give you a strong foundation!

Courses:

Remember, this article was written with complete beginners in mind (i.e. people who have little-to-no coding experience and have not completed a Computer Science degree). Although it may seem tempting to start with paid courses to obtain the certifications, I would instead start with free courses to find your footing. It may very well turn out that you do not enjoy the Data Science path and it’s better to figure that out before committing financially. Of course, once you’ve completed the free courses and have completed a few projects, you can always choose to complete a few paid courses to gain the certifications that will increase your chances of landing your first job.

Free Courses:

Data Science has been referred to as the Sexiest Job of the 21st Century and, as such, there is a plethora of courses out there. It can be a bit overwhelming to sift through all of these and pick out a handful that are suitable for your particular background and experience level. My aim is to simplify this process for you:

  1. Python for beginners: If you have not coded before, this Udemy course is an excellent place to start. You will learn the fundamentals of programming, specific to the Python language [2 hours].
    1.1. Object Oriented Programming (OOP) in Python: A Data Scientist who knows the concepts of OOP is an invaluable asset to any organisation [2.5 hours].
  2. The Complete SQL BootCamp for Beginners 2024: You may be an expert in Python, but without SQL you will not get very far in any Data career. Building a strong foundation in SQL is one of the most important things you can do early on [2 hours].
  3. Data Analytics in Python: With a foundation in Python, you are ready to start learning about Data Analysis [1.5 hours].
    3.1. Data Visualisation in Python: Data Visualisation is about more than generating pretty pictures. This course will help you use the power of Visualisations to draw meaningful insights from your data [1.5 hours].
  4. Machine Learning with Python: This course will introduce you to the realm of Machine Learning [1.5 hours].

Of course, there are hundreds (if not thousands) of other free courses online; however, I have taken the time to inspect each of these courses to ensure that you are getting great content to help you understand the basics of each of the abovementioned topics. Feel free to explore any other courses you come across and drop any good ones in the comments for our peers to benefit from as well!

If you would like suggestions on paid courses to gain certifications, please reach out to me via email, LinkedIn or Instagram and I will be glad to work with you to get the best return on investment.

Thank you for reading! Please follow and join me in exploring the boundless possibilities that a career in Data Science can offer. The journey is challenging, but the destination is worth every step.

Byte Brilliance

Data Science information, tutorials, and advice from an industry expert with multiple years of experience.