Tips on how to start with Machine Learning

Start simple! Avoid falling into hype traps.

Renatagotler
4 min read · Jul 6, 2022
Photo by @thisisengineering on Unsplash

Many people have asked me how to start a career in Machine Learning. Machine Learning is a very broad field that combines programming, statistics, mathematics, and business skills. It is easy to get lost among so many topics to explore and learn.

Moreover, degree programs specifically in machine learning are new, so many data scientists and machine learning engineers come from other fields and are largely self-taught. Take me, for example: I graduated in Industrial Engineering, which gave me a strong foundation in statistics and business, but not in programming, which I had to learn on my own. I also work with colleagues from different backgrounds, such as computer science and electrical engineering. This diversity is great: it brings different solutions and ideas, and we all grow together by sharing knowledge!

But with so much to learn and so much hype around the field, I have seen many of these same people jump straight into neural networks, CNNs, GANs, transformers, and more. As a result, they fail, or at least struggle, because they lack strong foundations. Well… it is fine to explore complex algorithms, but that is definitely not where you should start.

Data science is not as sexy as many articles would have you believe. It is hard work, and most of the time you won’t use the complex algorithms everyone is talking about; you will turn to the simple, basic ones that work.

Here is the current scenario:

  • Most companies are still figuring out what to do with machine learning: what will add value and, once they have a model, how to deploy it to production. MLOps is still very new and is being explored.
  • Unless you work at a big tech company, you will most likely work with tabular data.
  • Data is messy, and you need to know how to clean it and make it useful and insightful; statistics is the foundation for this.
  • If you are skilled at feature engineering and know how to clean data, you can achieve great results with simple algorithms, which also makes deployment faster and easier and reduces latency.

Therefore, my recommendation is: stick to the basics! If you want this career, don’t try to skip the steps that build the foundations of machine learning knowledge.

Start by learning:

  • Statistics: how to transform the data, and why.
  • Coding best practices: how to write clean code (docstrings, linting, and the SOLID principles, so your code is modular and reusable).
  • Modeling best practices: how to avoid data leakage, how to design your model, which loss functions and evaluation metrics to use to validate it, and why.
  • Simple algorithms: their strengths and weaknesses, how to use them, and how to tune their hyperparameters. Tuning is key to great results!
  • Communication skills: don’t underestimate them, because you will need them to align the project with stakeholders. The way you communicate the model’s results is key to getting it approved for production.
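To make the data-leakage point concrete, here is a minimal sketch in plain Python (standard library only, so it runs anywhere). The helpers `split_rows` and `ZScoreScaler` are hypothetical, written just for this illustration; scikit-learn’s `StandardScaler` inside a `Pipeline` applies the same fit-then-transform idea. The key step is that the scaling statistics are learned from the training split only and then reused, unchanged, on the test split, so nothing about the test data leaks into training.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class ZScoreScaler:
    """Minimal z-score scaler for one numeric feature (hypothetical helper)."""

    def fit(self, values):
        # Learn the mean and standard deviation from the data passed in.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def transform(self, values):
        # Apply the statistics learned in fit(); never re-fit on new data.
        return [(v - self.mean_) / self.std_ for v in values]


# Toy feature: 100 numeric values (illustrative data, not a real dataset).
values = [float(v) for v in range(1, 101)]
train, test = split_rows(values)

# Correct: fit on the training split only...
scaler = ZScoreScaler().fit(train)
train_scaled = scaler.transform(train)
# ...then reuse the training statistics on the test split.
test_scaled = scaler.transform(test)

# Leaky (avoid): fitting on all values lets the test split influence
# the statistics the model is trained with.
leaky_scaler = ZScoreScaler().fit(values)

print(f"train-only mean: {scaler.mean_:.2f}, all-data mean: {leaky_scaler.mean_:.2f}")
```

The two printed means will usually differ slightly. The same discipline applies to every step that learns from data, such as imputation values, category encodings, or feature selection: fit it on the training split, then apply it to the rest.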

Only when the simple approach doesn’t work should you move on to more complex options. Remember: your goal as a data scientist is to optimize a process or product and add value to the business, not to use a particular algorithm.

Furthermore, don’t wait until you have all the knowledge you want before putting it into practice. As I said at the beginning, machine learning is a very broad and fast-evolving field; you will never have all the answers, so what matters is that you keep learning and applying. As you practice, you can also build your own portfolio on GitHub, share it, and ask for feedback; this will speed up your learning. There are many reasons to do it:

  • Documenting helps you retain what you learn: write down your logic, what you did, and why you did it that way. Ask yourself: if someone else reads it, will they understand?
  • Documentation lets you revisit your work later, see how much progress you have made, and update the project with new learnings!
  • You can also share your portfolio when looking for a job.

Finally, there are many noteworthy machine learning content creators who can help you on this journey, such as Cassie Kozyrkov, Andrew Ng, and many more. They often have courses or YouTube channels, so check them out! If you like hands-on, structured courses, I recommend DataCamp.

Old-fashioned books can also be an amazing learning resource. If you want more technical ones, O’Reilly books are mostly great and well explained, but please don’t go straight to Deep Learning; start with Practical Statistics for Data Scientists, Fluent Python, and other fundamentals. For non-technical reading, there are great books that can sharpen your critical thinking, such as How to Lie with Statistics.

Also, if you feel comfortable, engage with communities such as Kaggle, GitHub, the MLOps Community, and regional groups. Let’s grow and learn together!

Thanks for reading! I hope this helps you on your machine learning path. If you liked it, please follow me for more articles and tips about machine learning.

Renatagotler

Machine learning engineer, passionate about solving problems with data.