Before getting started with Machine Learning: Online tutorials
This year I’m teaching a module on Applied Machine Learning with over a hundred students. The students come from very different backgrounds, and some may not be well prepared to complete the module successfully. To get all students up to speed and level the playing field, I have compiled a list of online resources on three areas that are key to becoming a successful Machine Learning practitioner: programming (Python), basic Mathematics, and the use of the command line and virtual environments. In general, these online courses and tutorials suit anyone interested in data science and machine learning. Moreover, most of them are short (i.e. they can be completed in no more than a couple of weeks).
The main programming language for machine learning is Python. Its advantage over other languages is twofold: it is easy for beginners to use, and it offers many machine learning libraries (e.g. scikit-learn, Keras or PyTorch) that can make your job much easier.
- Python Tutorial to Learn Data Science from Scratch: https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-learn-data-science-python-scratch-2/
- Python Data Science for Beginners: https://www.kdnuggets.com/2019/02/python-data-science-beginners.html
- Machine Learning Project in Python Step-By-Step: https://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Machine learning is all about Mathematics. The good news is that you don’t need to be an expert in maths to be a competent machine learning practitioner. However, you should have at least a general knowledge of some areas (particularly linear algebra and probability/statistics) to be a successful one. This knowledge will also make your learning process much smoother.
- Linear algebra for machine learning: https://machinelearningmastery.com/linear-algebra-machine-learning/
- Linear algebra for data scientists: https://www.analyticsvidhya.com/blog/2017/05/comprehensive-guide-to-linear-algebra/
- Linear algebra (full general course from MIT): https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/
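To give a flavour of why linear algebra matters, here is a minimal sketch in plain Python (no external libraries) of the matrix-vector product behind a linear model's predictions. The data, weights and bias are made-up numbers, and the helper names `dot` and `matvec` are my own, not taken from any of the tutorials above.

```python
# Minimal linear-algebra sketch in plain Python.
# All numbers below are hypothetical and chosen only for illustration.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [dot(row, vector) for row in matrix]


# Each row is one sample with two features (hypothetical data).
X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
w = [0.5, -1.0]  # hypothetical model weights
b = 2.0          # hypothetical bias term

# A linear model computes X w + b for every sample.
predictions = [y + b for y in matvec(X, w)]
print(predictions)  # [0.5, -0.5, -1.5]
```

In practice you would use a library such as NumPy for this, but writing it out once by hand makes the notation in the tutorials easier to follow.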
Probability and statistics
- 10 Statistical Techniques Data Scientists Need to Master: https://www.kdnuggets.com/2017/11/10-statistical-techniques-data-scientists-need-master.html
- Probability and Statistics (full general course from Stanford): https://online.stanford.edu/courses/gse-yprobstat-probability-and-statistics
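As a small taste of the statistics involved, the sketch below uses Python's built-in `statistics` module to compute a mean and a sample standard deviation, and then standardises the values (a common preprocessing step, chosen here purely as an illustration). The measurements and the function name `standardize` are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit sample standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / stdev for v in values]


# Hypothetical measurements, e.g. heights in centimetres.
heights_cm = [160.0, 170.0, 180.0]

print(statistics.mean(heights_cm))   # 170.0
print(statistics.stdev(heights_cm))  # 10.0
print(standardize(heights_cm))       # [-1.0, 0.0, 1.0]
```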
Terminal and Virtual Environments
While it is not crucial to master your operating system, it is very useful to be able to use the command line (terminal) to quickly inspect data files and perform basic operations on them. The following tutorials are based on Linux, which is the most widely used operating system for machine learning. Similarly, working with virtual environments is not essential, but it enables you to keep track of the versions and dependencies required by different projects. Also, depending on your working conditions and administrator rights, you may be forced to use virtual environments in your projects.
- Linux command line for beginners: https://tutorials.ubuntu.com/tutorial/command-line-for-beginners#6
- How to Start Using the Linux Terminal: https://www.howtogeek.com/140679/beginner-geek-how-to-start-using-the-linux-terminal/
- Five Command Line Tools for Data Science (advanced): https://towardsdatascience.com/five-command-line-tools-for-data-science-29f04e5b9c16
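If you have not worked through these tutorials yet, the kind of quick inspection I have in mind (roughly what the Linux commands `head` and `wc -l` do) can be sketched in Python as follows. The file `sample.csv` and its contents are made up so that the example runs anywhere, and the function names are my own.

```python
import os
import tempfile
from itertools import islice


def head(path, n=5):
    """Return the first n lines of a text file, similar to `head -n`."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def count_lines(path):
    """Count the lines of a text file, similar to `wc -l`."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


# Build a tiny hypothetical CSV file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("id,label\n1,cat\n2,dog\n3,cat\n")

    first_lines = head(sample_path, n=2)
    n_lines = count_lines(sample_path)

print(first_lines)  # ['id,label', '1,cat']
print(n_lines)      # 4
```

On the command line the same checks are one-liners, which is exactly why the terminal is worth learning.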
Virtual environments (Python)
- Python Virtual Environments, a Primer: https://realpython.com/python-virtual-environments-a-primer/
- Pipenv & Virtual Environments: https://docs.python-guide.org/dev/virtualenvs/
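Once you start using virtual environments, it helps to be able to check which one your code is actually running in. The sketch below assumes an environment created with Python's standard `venv` module (other tools may behave differently), and the function names are hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def environment_summary():
    """Collect a few facts that are useful when debugging dependency issues."""
    version = sys.version_info
    return {
        "python_version": f"{version.major}.{version.minor}.{version.micro}",
        "prefix": sys.prefix,
        "in_virtual_environment": in_virtual_environment(),
    }


print(environment_summary())
```

Running this inside and outside an activated environment shows how the interpreter path and the flag change.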
This list is by no means exhaustive, and I would appreciate any suggestions for similar online tutorials in the comments!