My Development Plan as a Junior Data Scientist

Desmanda mbarek
5 min read · Dec 29, 2019


We all struggle at the beginning of our careers. Professional life is a completely different world from academia.
Your work should bring value to customers, your code should run in production and, of course, you should write unit tests.
Like any beginner, I faced ups and downs during my first 9 months working as a junior data scientist.
In this article, I will share my personal development plan for improving as a data scientist during my first professional experience.

Here are my 8 simple rules, organized into three areas: data science, software engineering, and soft skills.
As a beginner in data science you should:

1- Take the time to explore your data:

I like building fancy models and trying out many algorithms to solve a problem, but sometimes I skip a very important step: getting to know the data. Many data scientists will recognize this pattern from the beginning of their career (what are the top 5 mistakes junior data scientists make?).
Data is the source of learning. If you don’t know what you have, you can’t deliver the results you expect. Exploring the data is an important step before diving into machine learning models.
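To make this concrete, here is a minimal sketch of a first look at a dataset, using only the Python standard library. The sample data, the column names, and the `summarize` helper are all hypothetical and only serve as an illustration. In practice you would likely reach for a library such as pandas, but the questions stay the same: what is missing, how many distinct values are there, and what range do the numbers cover?

```python
import csv
import io
import statistics

# Hypothetical sample data; in practice this would come from your own file.
RAW_CSV = """age,income,city
34,52000,Berlin
29,,Tunis
41,61000,Berlin
,48000,Paris
"""


def summarize(rows: list[dict[str, str]]) -> dict[str, dict[str, object]]:
    """Return missing-value counts and basic statistics for each column."""
    summary: dict[str, dict[str, object]] = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        info: dict[str, object] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = []  # non-numeric column: skip the numeric statistics
        if numbers:
            info["min"] = min(numbers)
            info["max"] = max(numbers)
            info["mean"] = statistics.mean(numbers)
        summary[column] = info
    return summary


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
report = summarize(rows)
for column, info in report.items():
    print(column, info)
```

Even on this toy table, the summary already shows that `age` and `income` each have a missing value, which is something you want to know before fitting any model.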

2- Choose an area: you can’t know everything:

I have always been curious about the latest scientific papers and AI news. But after a couple of months, I found it hard to recall details or to find the time to keep up with this flow of knowledge. I asked myself: do I need to know everything? The answer is no. Many senior data scientists told me the same:

Choose one thing and be an expert, you don’t need to know everything.

The next step is finding what you like.
It takes time to discover your strengths and your passion. The nature of your job can help point you in the right direction, but don’t rush.

3- Actively work on data science projects

As a data scientist, you may spend two or three weeks just putting a new algorithm into production. It is hard to estimate exactly what share of your time will go to pure research tasks. However, you can always keep practicing outside of work on small challenges. This keeps you engaged with ML approaches and your knowledge up to date. You can do this, for example, by joining a data science competition on platforms like Kaggle or Zindi, or simply by taking ML-related online courses.

4- Fill in the gaps:

We all know what Bayes’ theorem is. When it comes to more advanced statistical knowledge, though, I personally need to look things up in books to recall the details.
This is quite common for data scientists with a computer science background. At this point, it is worth refreshing your statistical knowledge with a few books.
Here is my list of statistics books that have helped me dust off my skills:

Practical Statistics for Data Scientists

Principles of Statistics

The Art of Statistics: Learning from Data
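As a small refresher on the theorem mentioned above, here is a sketch of Bayes’ theorem for a binary hypothesis. The function name, the parameter names, and the numbers (a 1% base rate, 95% sensitivity, and a 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """Return P(H | E) for a binary hypothesis H and observed evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of total probability: P(E) = P(E | H) P(H) + P(E | not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    # Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% base rate, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"{p:.3f}")  # prints 0.161
```

With these numbers, a positive result only raises the probability of the hypothesis to about 16%, because the base rate is so low. This is exactly the kind of detail that is easy to forget without regular practice.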

5- Learn how to write clean code:

We all rush to write code and test new models (that’s the fun part). Jupyter Notebook is a very convenient tool in data science. In production, unfortunately, we can’t ship the notebook. When you work in an agile team, your code will be reviewed by colleagues. Sometimes, you need to rewrite everything from scratch to meet code standards. Here is my simple checklist before requesting a code review:

  • Avoid condensed code: use line breaks and consistent formatting so that your code is easy to read
  • Keep each function as simple as possible; if it looks too complex, break it into smaller pieces and let the functions tell a story
  • Name your functions after what they do, but avoid long names; keep them simple but insightful (hard, I know :))
  • Use type hints in Python
  • Write unit tests for every function you write.
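The checklist above can be sketched in a few lines. This is only an illustration under my own assumptions: the function names and the min-max scaling task are hypothetical and not taken from a real project. Each function does one thing, carries type hints, and has a matching unit test written with the standard `unittest` module.

```python
import unittest
from typing import Optional


def drop_missing(values: list[Optional[float]]) -> list[float]:
    """Keep only the entries that are not None."""
    return [v for v in values if v is not None]


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if low == high:
        # A constant column has no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def clean_and_scale(values: list[Optional[float]]) -> list[float]:
    """Drop missing entries, then scale the rest to [0, 1]."""
    return min_max_scale(drop_missing(values))


class CleanAndScaleTest(unittest.TestCase):
    def test_drop_missing_removes_none(self) -> None:
        self.assertEqual(drop_missing([1.0, None, 3.0]), [1.0, 3.0])

    def test_min_max_scale_maps_to_unit_range(self) -> None:
        self.assertEqual(min_max_scale([2.0, 4.0, 6.0]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zero(self) -> None:
        self.assertEqual(min_max_scale([5.0, 5.0]), [0.0, 0.0])

    def test_clean_and_scale_combines_both_steps(self) -> None:
        self.assertEqual(clean_and_scale([10.0, None, 20.0]), [0.0, 1.0])
```

If you save this in a file, you can run the tests with `python -m unittest <your_file>.py`. Notice that `clean_and_scale` reads almost like a sentence, which is the "tell a story" idea from the checklist.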

I recommend reading the book Clean Code to understand code standards and apply them in your day-to-day work.

6- Learn a second programming language besides Python:

Python is the most used language in data science (check this Medium article). However, learning another language makes you more versatile and more hands-on.
A recommended language is Scala. Scala has strong support for functional programming and is commonly used with Spark for building data pipelines. It can help you speed up your work by letting you write faster data pre-processing pipelines.

But now imagine that you have all those skills but cannot explain to other people what you are doing. It is like working on an online competition but not submitting your code in the end. Communication is the way to show off (I mean it) your skills. It is very important if you want to ensure that your team understands your results and decisions.

Here is what I am doing now to improve my data science communication skills:

7- Do more presentations

Whenever you work on a new task, start with a presentation to share your ideas. Take this chance to explain your approach to different audiences (product managers, software engineers). As a follow-up, present your results once you have finished the task. Present, execute, share is an iterative process that should be part of your day-to-day work. Presenting helps both you and others learn the subject. If this takes too much of your team’s time, try spending 15 minutes a week explaining your task to yourself in a quiet place (your room, for example). This will help you master presenting and explain any subject easily to any audience.

8- Attend meetups: share your ideas and get to know new people

Depending on which city you live in, there are usually a bunch of meetups organized around a given purpose: learning machine learning, coding, a language, or even just sharing ideas. These are safe places to talk to other people and hear about their experiences. These out-of-office meetings will help you learn how to introduce yourself, explain your work, and communicate. I have talked to many senior data scientists who recommended going to at least one meetup per week. That’s a good practice for a junior data scientist to get in touch with experts in the domain.

To conclude, data science is not an easy career path. You need to dedicate some of your personal time to keep up.
It is about perseverance and the self-motivation to keep learning!
My advice to all junior data scientists: never give up!
Cheers!


Desmanda mbarek

Data scientist, ❤️ machine learning and deep learning.