Becoming a Machine Learning Engineer | Step 4: Practice, Practice, Practice
The fastest way to pick up essential machine learning skills is to practice on small, easy-to-understand datasets. This technique helps you build your process using interesting real-world data that is small enough to look at in Excel or WEKA. In this article, you will learn about a high-quality repository with plenty of datasets, along with some tips to help you focus your time on what matters to you!
Why practice with datasets?
Following online tutorials will keep you trapped in a dependent mindset that limits your growth, because you’re not learning HOW to solve problems. You’re learning how to apply a specific solution to a particular type of problem. It’s the equivalent of overfitting, which we all know leads to poor real-world performance. If you’re interested in becoming a machine learning engineer, you need to make sure you can generalize to real data. Challenge yourself every day and attack problems using a defined process. Practicing your skills on datasets is the best way to do this.
Where do I get datasets?
Luckily for everyone, there is a fantastic repository of machine learning problems that you can access for free.
The Center for Machine Learning and Intelligent Systems at the University of California, Irvine built the UCI Machine Learning Repository. For 30 years it has been the place to go for machine learning researchers and students who need datasets to practice on. You can download all of the available datasets from its website. Each dataset’s page also lists the details about it, including any publications that have used it, which is really useful when you want to learn how researchers attacked the problem. The datasets can be downloaded in a few different formats as well (CSV/TXT).
There are only two downsides to the UCI datasets.
- The most significant downside is that these datasets are already cleaned and pre-processed. Cleaning and pre-processing are essential parts of the machine learning process that you will face in your career, and not practicing them will hurt you down the road.
- The other downside is that they are small, so you won’t get much experience with large-scale projects. That shouldn’t matter, though, because you’re new at this! Start small!
Practicing in a targeted way
How do you go about practicing in a targeted way when there are so many datasets? An aspiring machine learning engineer would do best to figure out what their goals are and pick a dataset that gets them closer to those goals. I’ve developed some questions you can ask yourself to help narrow down the number of datasets.
- What kind of problem are you looking to solve? Regression, classification, or clustering?
- What size is the dataset? Tens of data points or millions?
- How many features does the dataset have?
- What type of features does it have?
- What domain is this dataset from?
Figure out what type of datasets you want to focus on to match up with your broader goals. Once you have this, you should be able to filter through the huge number of datasets that are available on the platform.
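If you are comfortable with a little Python, the questions above can be turned into a simple filter. The sketch below is only an illustration: the `DatasetInfo` fields, the `shortlist` helper, and every entry in `CATALOG` are made up for this example and do not come from the repository itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetInfo:
    """Answers to the narrowing questions for one dataset (hypothetical fields)."""
    name: str
    task: str          # "regression", "classification" or "clustering"
    rows: int          # number of data points
    features: int      # number of features
    feature_type: str  # e.g. "integer", "real", "categorical", "mixed"
    domain: str        # e.g. "health", "business"


# Made-up catalog entries, typed by hand purely for illustration.
CATALOG = [
    DatasetInfo("example-tumors", "classification", 700, 9, "integer", "health"),
    DatasetInfo("example-housing", "regression", 500, 13, "real", "real estate"),
    DatasetInfo("example-customers", "clustering", 2_000_000, 8, "mixed", "business"),
    DatasetInfo("example-genes", "classification", 800, 20_000, "real", "health"),
]


def shortlist(catalog: List[DatasetInfo],
              task: Optional[str] = None,
              max_rows: Optional[int] = None,
              max_features: Optional[int] = None,
              domain: Optional[str] = None) -> List[str]:
    """Return the names of datasets that match every criterion that was given."""
    matches = []
    for info in catalog:
        if task is not None and info.task != task:
            continue
        if max_rows is not None and info.rows > max_rows:
            continue
        if max_features is not None and info.features > max_features:
            continue
        if domain is not None and info.domain != domain:
            continue
        matches.append(info.name)
    return matches


# Small health classification problems with a manageable number of features.
picks = shortlist(CATALOG, task="classification", max_rows=1000,
                  max_features=50, domain="health")
print(picks)
```

Leaving a criterion as `None` simply skips that question, so you can start broad and tighten the filter as your goals become clearer.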
Don’t worry if you’re not sure exactly what you’re trying to learn. It’s much better to get started than to get stuck trying to find the perfect study plan. I’ve made a list of some datasets that you might find interesting. There are a few types of problems here, so give them all a shot.
Health Classification: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29
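If you prefer code to a spreadsheet, here is a minimal sketch of a first look at a file like this one, using pandas. It assumes the downloaded data file is a comma-separated table with no header row and with "?" marking missing values. The column names are my own short labels, and the rows in `SAMPLE` are made up so the example runs without a download; swap in the path to your own copy of the file.

```python
import io

import pandas as pd

# Hypothetical short column names; the raw file is assumed to have no header row.
COLUMNS = [
    "id", "clump_thickness", "cell_size", "cell_shape", "adhesion",
    "epithelial_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "label",
]

# Made-up rows in the assumed layout (these are NOT real records).
SAMPLE = """\
1001,5,1,1,1,2,1,3,1,1,2
1002,8,10,10,8,7,10,9,7,1,4
1003,6,8,8,1,3,?,3,7,1,2
1004,4,1,1,3,2,1,3,1,1,2
"""


def load_table(source):
    """Read a header-less CSV and treat '?' as a missing value."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Replace io.StringIO(SAMPLE) with the path to the file you downloaded.
df = load_table(io.StringIO(SAMPLE))

# First questions to ask: how big is it, what is missing, how balanced is it?
missing_per_column = df.isna().sum()
class_counts = df["label"].value_counts()

print(df.shape)
print(missing_per_column[missing_per_column > 0])
print(class_counts)
```

Even on a tidy dataset, checking for missing values and class balance first is a cheap way to practice a little of the cleaning you will need on messier data.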
I don’t think I have the skills for this or I feel like something is stopping me from getting started!
It’s OK to doubt yourself from time to time, but you can’t let it keep you from your goal of becoming a machine learning engineer. Time to adjust your mindset.
I don’t know how to program!
That’s fine, because my article “Becoming a Machine Learning Engineer | Step 3: pick a tool” goes over one tool that doesn’t need any programming skills and still lets you implement many machine learning algorithms.
Where would I even start when it comes to solving the problems?
A process that lets you approach any problem is super important, and I believe that learning that process is more valuable than learning how back-propagation works. Check out my article where I go into detail about picking a process.
I don’t think I could do this alone?
Studying machine learning by yourself is not the best way to learn. Joining a group of like-minded individuals will do wonders for your ability to learn. Check out this article to find out more.
If you’re serious about self-study, consider making a modest list of datasets you want to investigate further. Follow the targeted practice plan to build a valuable foundation for diving into more complex and exciting machine learning problems.