The Golden Path to Becoming a Full-Stack Data Scientist

An actual report from 180 days of switching jobs into the ML field

Akira Takezawa
Coldstart.ml
15 min read · Mar 25, 2019


Welcome to Coldstart.ml, which provides a variety of Data Science information for enthusiastic learners!

Why do you have to read this?

from glassdoor.com on 25th of March, 2019

The shortage of data scientists is becoming a serious constraint in some sectors. — "Data Scientist: The Sexiest Job of the 21st Century", Harvard Business Review

Nowadays, Big Data is one of the most important resources for gaining a competitive advantage in business, especially for IT companies. All of the GAFA companies (Google, Apple, Facebook, Amazon) accelerate their business by applying Data Science techniques. Let me explain a little more.

For instance, Google and YouTube use recommendation engines that never let you go, because they suggest content that closely follows your taste. Amazon increases its Gross Merchandise Volume (GMV) by matching users and products very efficiently. Facebook shows advertisements with a higher conversion rate (CVR) by extracting critical insights from your demographic and behavioral data. (I used their Ads in my previous job and was surprised at how effectively they performed.)

So far I’ve only mentioned the Internet giants. However, Masayoshi Son, the Japanese CEO of SoftBank, whose Vision Fund is one of the biggest funds in the world, with investments hovering near $45 billion in 2018, left an insightful comment.

image from SoftBank’s unicorn hunter

The impact of the Internet is somewhat limited to particular domains or industries, like Advertisement and Retail (E-commerce), but AI is different. — Masayoshi Son

The “Next Big Waves” are coming in other industries, where data science methods are applied to unstructured data such as images, natural language, and sound. For example, in mobility you can see what Elon Musk is doing; in robotics, what’s happening in Amazon’s warehouses; and likewise in finance, healthcare, and so on.

If you are young and want to be the biggest winner in your career, the Data Science or Machine Learning field could be the most promising choice.

From my marketing perspective, this is a quite simple demand-supply analysis: well-paid job titles such as finance professional, classic Software Engineer, or Business Development with an MBA are already taken by our elders, and those skills and knowledge have become commoditized. In other words, these fields are very competitive.

Compared to them, the Data Science field is far messier and still in the mist. So why not invest your career, passion, and intelligence in this challenging field? Welcome to the fantastic world of ML technology, full of hope!

In this article, I will focus on how to get a Data Scientist or Machine Learning Engineer job of the kind real industries need, or in other words, how to get through the job-hunting process with less pain. Let’s get started!

— — —

After you read this, you’ll get:

  • The Shortest Path towards A Real Data Scientist
  • The Best Learning Resources everyone trusts in this area
  • The Realistic Possibility of your career with ML

— — —

Menu

  1. What is Data Science for Business?
  2. Data scientist vs Machine Learning Engineer
  3. Required Skill for A Full-Stack Data Scientist
  4. A Golden DS Learning path for a newbie
  5. The Secret Bibles towards A Full-Stack Data Scientist

— — —

1. What is Data Science for Business?

Data Science itself is just a Method, not a Goal, in our business. So its definition should be explained in a much simpler way. — an unknown data scientist

Data Science itself can never be the purpose; if it becomes one, you have already failed. Rather, I would say: “We have just become able to create additional or novel value in real business that we couldn’t 10 years ago.”

I highly recommend listening to this podcast, SDS 131: The One Purpose to Data Science and The Truth about Analytics from SuperDataScience.

To define Data Science, I want to start from its goal and purpose. The fundamental goals of Data Science in business are pretty clear. I think they can be described simply as follows:

  • To make more profit in your business by using data (as a result)
  • To understand and satisfy your customers through more efficient, better matching
  • To create a new business or startup by using machine learning

The background behind this Game Change

In addition, I think it’s very important to understand why the significance of Data Science has gradually been recognized in business in recent years. There are three main reasons:

  • An Explosion in the Amount of Data from the Internet and Smartphones
  • Improved Computational Power thanks to GPUs and TPUs
  • Deep Learning, which makes it possible to Process Unstructured Data

OK, now we can take the next step and understand how Data Science jobs are divided in real industries. Let’s figure out where to start your career, depending on what you have right now and what you need in order to reach an ideal position over a 5–10 year career plan.

— — —

2. Data Scientist vs Machine Learning Engineer

from www.stoodnt.com

In the summer of 2018, I decided to move from a general Data Analyst job to a Machine Learning career, because I was pretty sure this technological innovation would reproduce the kind of drastic change we saw with the Internet and the smartphone.

I assume you have already heard about the average salary of a data scientist (117,000 USD/year!) and understand the potential of ML innovation well enough. Here, I focus only on people who want to switch their area of expertise to ML (data science) over several years.

After I built my Data Science portfolio and started job hunting, I realized there are mainly two different job titles that work closely with machine learning technology:

One is Data Scientist. The other is Machine Learning Engineer.

As you can see in the image above, the difference between these two and the classic Data Analyst and Data Engineer (Big Data Engineer) jobs is relatively easy to describe, because the latter have existed for more than 10 years and their required skills are very clear.

However, I found that the required skills and knowledge of Data Scientists and Machine Learning Engineers overlap considerably.

The reason is simple. Machine Learning is the most impactful recent innovation in the Data Science field, able even to create new businesses and companies such as chatbot or drone startups, so the Machine Learning specialist has become an undoubtedly respectable job title in itself.

On the other hand, nowadays we cannot talk about Data Science without taking Machine Learning into account, so a Data Scientist must know ML theory as well, because most companies are currently interested in accelerating their business with ML technology.

In fact, Data Scientists need not only theoretical knowledge of Machine Learning but also, at least, the ability to implement several Machine Learning algorithms such as SVM or Random Forest for a classification task using scikit-learn, or to build a Neural Network using Keras.

Ironically or interestingly, unless you already understand math and statistics deeply, an efficient (or even necessary) way to understand ML theory and Deep Learning is to write code and implement it using TensorFlow, scikit-learn, or other high-level APIs.
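
As a minimal sketch of what that hands-on level can look like, the snippet below trains an SVM and a Random Forest classifier with scikit-learn and compares their test accuracy. The iris toy dataset, the function name compare_classifiers, and all hyperparameters are my own assumptions for illustration, not a recommended setup.

```python
# Minimal sketch: train an SVM and a Random Forest on the same toy dataset.
# The iris dataset and every hyperparameter here are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def compare_classifiers(random_state=0):
    """Fit both models on one train/test split and return their test accuracy."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=random_state, stratify=y
    )
    models = {
        "svm": SVC(kernel="rbf", C=1.0),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)                  # train on the training split
        scores[name] = model.score(X_test, y_test)   # accuracy on held-out data
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: test accuracy = {accuracy:.3f}")
```

Both models share the same fit/score interface, which is a large part of why scikit-learn is a comfortable place to start before moving on to Keras or TensorFlow.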

2–1. The different Area of Expertise between DS and ML engineer

Related skills in DS field from edureka!

I know that the highest-rated answer to this question on Quora is not enough to clarify your career in this field.

Eventually, I found a wall between these two jobs in real industries that is hard to climb over. It looks like this:

A professional Machine Learning Engineer can build an “end-to-end software product” that has a machine learning algorithm as one of its parts.

A professional Data Scientist can define “the problem which should be solved (or not)” by machine learning at scale, and “how” to solve it as well.

I hope you get the picture of what I want to say. Please don’t forget that the final goal and mission of an ML engineer is, I think, to deliver working software. To put it more clearly, unless you have backend software engineering experience, it seems hard to get the comprehensive ML engineering positions we often find in job descriptions (JDs).

I want to tell newbies from a non-engineering background, such as data analysts, that in reality some serious tech companies write neural networks from scratch; they don’t even rely on Keras or TensorFlow.

On the other hand, a Data Scientist needs outstanding business understanding, which is vaguer and more difficult to prove (life is hard). But this is very important, because in some cases a classical statistical method such as multiple regression analysis is enough and ML is not even required. Also, applying ML in software requires an enormous amount of time and human resources, so it’s necessary to weigh cost against performance before making a huge investment decision.
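
To make the "classical method first" point concrete, here is a minimal sketch of multiple regression fitted by ordinary least squares with NumPy. The sales, ad spend, and price data are synthetic and purely hypothetical, and the helper names fit_multiple_regression and r_squared are my own.

```python
# Minimal sketch: multiple regression by ordinary least squares with NumPy.
# All data below is synthetic and hypothetical, generated only for illustration.
import numpy as np


def fit_multiple_regression(X, y):
    """Return (intercept, coefficients) of the least-squares fit y ~ X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def r_squared(X, y, intercept, coefs):
    """Share of the variance in y explained by the fitted linear model."""
    y = np.asarray(y, dtype=float)
    predictions = intercept + np.asarray(X, dtype=float) @ coefs
    ss_res = np.sum((y - predictions) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical business data: sales driven by ad spend and price, plus noise.
    ad_spend = rng.uniform(0, 100, size=200)
    price = rng.uniform(5, 20, size=200)
    sales = 50 + 2.0 * ad_spend - 3.0 * price + rng.normal(0, 5, size=200)

    X = np.column_stack([ad_spend, price])
    intercept, coefs = fit_multiple_regression(X, sales)
    print("intercept:", round(float(intercept), 2))
    print("coefficients (ad_spend, price):", np.round(coefs, 2))
    print("R^2:", round(float(r_squared(X, sales, intercept, coefs)), 3))
```

If a baseline like this already explains most of the variance, that is a useful input to the cost-versus-performance decision before investing in anything heavier.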

2–2. So is it impossible to get an ML engineering job?

image from www.newtium.com

Well, for the enthusiastic ML fresher, I found a suitable position for us in ML projects: the Data Preprocessing and Feature Engineering role.

In a real machine learning application, we repeat the following main process again and again until we reach sufficient accuracy:

  1. Data Preprocessing and Feature Engineering
  2. Modeling ML/DL architecture and Training
  3. Model Validation and Hyperparameter Tuning

Finally, ML engineers integrate this architecture into existing software, or define the whole architecture at the same time when building a new product. This step requires more development experience and knowledge.
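
The three-step loop above can be sketched with scikit-learn's Pipeline and GridSearchCV. This is only a minimal sketch under my own assumptions: the breast cancer toy dataset, logistic regression as the model, and the small grid of C values are all chosen for illustration.

```python
# Minimal sketch of the preprocess -> model -> validate/tune loop.
# The dataset, the model choice, and the parameter grid are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_ml_loop(random_state=0):
    """Return (best_params, mean CV accuracy, held-out test accuracy)."""
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    pipeline = Pipeline([
        ("scale", StandardScaler()),                 # step 1: preprocessing
        ("clf", LogisticRegression(max_iter=1000)),  # step 2: model
    ])
    # Step 3: 5-fold cross-validation over a small grid of regularisation strengths.
    search = GridSearchCV(
        pipeline, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5
    )
    search.fit(X_train, y_train)
    return search.best_params_, search.best_score_, search.score(X_test, y_test)


if __name__ == "__main__":
    best_params, cv_accuracy, test_accuracy = run_ml_loop()
    print("best params:", best_params)
    print(f"cross-validated accuracy: {cv_accuracy:.3f}")
    print(f"held-out test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means it is refitted on each training fold only, so the validation folds do not leak into the preprocessing step.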

— — —

3. Required Skill for A Full-Stack Data Scientist

What is the difference between Data Scientist and Data Engineer?

Conclusion: In my definition, a Full-Stack Data Scientist is a perfect mix of Data Scientist and Machine Learning Engineer who can design and build an “End-to-End Machine Learning Project and Software”. — by me

I analyzed over 200 job descriptions for Machine Learning related positions in Japan, India, and Singapore, including companies like Google, Facebook, and IBM, and found the must-have skills for a Full-Stack Data Scientist career.

I will divide them into two categories: one is visible, more practical skills (more important in terms of getting a job!), and the other is theoretical and relatively difficult to prove.

You can use the following checklist before you start making a learning plan. It’s flexible, too, depending on the JDs you want to apply for.

3-1. Practical Skill (Easy to prove and visualize)

  • Basic Statistical Language: Python, R, Julia
  • Data Science Library: NumPy, Pandas, SciPy, Seaborn
  • ML/DL Library Experience: TensorFlow, Torch, scikit-learn
  • Unstructured Data Processing: Image, Text, Sounds
  • Relational Database: MySQL, PostgreSQL, SQLite
  • Big Data and Cloud Platforms: Hadoop, Spark, AWS, MongoDB
  • Container-type virtual environment: Docker
  • Version control system: Git, GitHub
  • Web Framework: Django, Flask, Ruby on Rails

Additional Tip: how to stand out from others

http://jasonkgoodman.com/

I strongly recommend understanding the significance of visualizing your skills. The serious people who have already landed Data Scientist jobs are always good not only at Data Visualization but also at Skill Visualization. Please take a look at Jason Goodman’s impressive DS portfolio as the best example.

3-2. Theoretical Understanding (Difficult to visualize)

  • ML Theory: Algorithm, Hyperparameters, Gradient Descent
  • DL Theory: CNN, LSTM, Back Propagation
  • Math: Algebra, Calculus, Probability
  • Statistics: Bayes, Variance and Standard Deviation
  • Computer Science: OS, Server, CPU, Data Structure, and Algorithm

I think you need some tricks to prove your understanding of these theories, because it’s not easy to show. Of course, the most powerful proof is a related academic degree; next are certifications.

Note: I highly recommend building your own publication and writing articles to show your theoretical understanding on this amazing Medium platform, as I do with Coldstart.ml. An even more efficient way is to publish your articles via Towards Data Science or Hacker Noon.

— — —

4. A Golden DS Learning path for a newbie

from Data Camp

You don’t have to master all of these tools at the first step of your career, but better to know than nothing. — by James from somewhere

Data scientist and ML engineer positions require a wide range of skills at a certain level of quality. So if the market is quite competitive and you want to get closer to the ideal candidate, you can take it step by step. Plan your career for the long term, not the short term! For example, you can start as a Database Engineer or Backend Engineer to strengthen your IT background as a first step.

For people who are totally new to this field, as I was 9 months ago, I will describe where to start your endless Data Science learning so that you perform well in real business applications. It takes 12 steps, and please remember that you can also acquire these skills through your job. Let’s take a look:

  1. Basic Statistical Language Handling: Python, R, Julia
  2. Data Science Library: NumPy, Pandas, SciPy, Seaborn
  3. Version control system: Git, GitHub
  4. Math: Algebra, Calculus, Probability
  5. Statistics: Bayes, Variance and Standard Deviation
  6. ML/DL Library Experience: TensorFlow, Torch, scikit-learn
  7. ML Theory: Algorithm, Hyperparameters, Gradient Descent
  8. Unstructured Data Processing: Image, Text, Sounds
  9. DL Theory: CNN, LSTM, Back Propagation
  10. Relational Database: MySQL, PostgreSQL, SQLite
  11. Big Data and Cloud Platforms: Hadoop, Spark, AWS, MongoDB
  12. Web Framework: Django, Flask, Ruby on Rails

+ Job Hunting Techniques: networking, resume, interview

As a note, I applied some principles to the above order to focus on getting a job:

  • Build your own Data Science portfolio; don’t only study on MOOCs
  • Spend 90% of your time creating visible outputs
  • Understand theory through coding, so code first

That’s why the coding part comes before theory. What I want to tell you is that Coursera or Udemy alone doesn’t help enough when you are applying for a real job. Smart people do something different. It’s good to remember!
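
As a small illustration of the "code first" principle, gradient descent (part of step 7 in the list above) becomes much easier to grasp once you write it out yourself. The sketch below fits a straight line by batch gradient descent in NumPy. The function name, learning rate, and step count are my own assumptions, and the data is synthetic.

```python
# Minimal sketch: batch gradient descent for linear regression with NumPy.
# The learning rate, step count, and data are illustrative assumptions.
import numpy as np


def gradient_descent_linear(X, y, lr=0.1, n_steps=2000):
    """Minimise the mean squared error of y ~ X @ w + b; return (w, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        error = X @ w + b - y                       # residuals of current fit
        grad_w = (2.0 / n_samples) * (X.T @ error)  # d(MSE)/dw
        grad_b = 2.0 * error.mean()                 # d(MSE)/db
        w -= lr * grad_w                            # step against the gradient
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    x = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
    y = 3.0 * x[:, 0] + 0.5    # synthetic, noise-free line
    w, b = gradient_descent_linear(x, y)
    print("slope:", round(float(w[0]), 4), "intercept:", round(float(b), 4))
```

Changing lr and watching the fit converge, stall, or diverge teaches more about hyperparameters than reading a definition, and the result is a visible output you can put in a portfolio.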

— — —

5. Secret Bibles towards A Full-Stack Data Scientist

Understand these 5 basic concepts to sound like a machine learning expert

I spent, or even wasted, a ton of time tracking down the best resources with clean code examples for ML/DL. — by Me

Finding learning resources that really suit you can be a crucial problem, because there is a ton of learning material on Google and it’s easy to drown in an ocean of information.

In this chapter, I’m not just going to introduce MOOC courses, because that has already been done perfectly. If you want to know about them, I recommend a leading voice in the discussion of ML learning resources: David Venturi, who built a completely hand-made Machine Learning master’s degree program using only free learning resources on the internet. Check out his post!

Apart from these famous courses, I want to summarize additional secret resources that a lot of learners worship and that I also highly appreciate. I will introduce the following unique ones:

  1. Tutorial web sites with the simplest code
  2. Online Class Sessions from Harvard and Stanford
  3. Online PDFs: which you might otherwise miss
  4. Top 10 Insightful Articles from Medium

Now please get your Evernote ready and copy them under the title “Bibles of Data Science”. They are masterpieces from DS enthusiasts. Don’t worry, they’re all free! (I know everyone loves a free lunch...) Let’s take a look.

4. Top 10 Insightful Articles from Medium

— — —

Conclusion

“Better Data > Fancier Algorithms“ — EliteDataScience.com

Since this new field keeps changing, nobody knows which skills will be commoditized and which will keep their value longer than others. In fact, a Data Science consulting company that I met during my job hunt said, “we are going to sell our company

so fast, i. But we need to constantly update our

— — —

Thank you for reading!

I’m still searching for my first job in a Machine Learning related field! If you enjoyed my article, please visit my portfolio website and contact me!

My name is Akira, 25y.o., male, from Tokyo, Japan, with endless curiosity.

— — —

10 Recommendations to encourage you for your ML careers!


Coldstart.ml

Data Scientist, Rakuten / a discipline of statistical causal inference and time-series modeling / using Python and Stan, R / MLOps is my current concern