
Becoming a Machine Learning Engineer | Step 4: Practice, Practice, Practice

The best way to pick up essential machine learning skills fast is to practice on small, easy-to-understand datasets. This technique helps you build your process using interesting real-world data that is small enough to look at in Excel or WEKA. In this article, you will learn about a high-quality repository with plenty of datasets, along with some tips to help you focus your time on what matters to you!

Why practice with datasets?

Following online tutorials will keep you trapped in a dependent mindset that limits your growth, because you’re not learning HOW to solve any problem. You’re learning how to apply a specific solution to a particular type of problem. It’s the equivalent of overfitting, which we all know leads to poor real-world performance. If you’re interested in becoming a machine learning engineer, you need to make sure you can generalize to real data. Challenge yourself every day and attack problems using a defined process. Practicing your skills on datasets is the best way to do this.

Where do I get datasets?

Luckily for everyone, there is a fantastic repository of machine learning problems that you can access for free.

UCI Machine Learning Repository

The Center for Machine Learning and Intelligent Systems at the University of California, Irvine built the UCI Machine Learning Repository. For 30 years it has been the place to go for machine learning researchers and students who need datasets to practice on. You can download all of the available datasets from its webpage. Each dataset’s page also lists its details, including any publications that have used it, which is really useful when you want to learn how researchers attacked the problem. The datasets can be downloaded in a few different formats as well (CSV/TXT).
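Once a file is downloaded, a first look does not need anything fancy. Here is a minimal sketch, assuming a headerless, comma-separated file with numeric columns. The sample values and the column names below are made up for illustration; a real dataset documents its own columns on its page.

```python
import csv
import io

# Hypothetical sample in a headerless, comma-separated layout.
# The values are made up for illustration only.
SAMPLE = """\
1,14.2,1.7,2.4
1,13.2,1.8,2.1
2,12.4,0.9,1.7
3,12.9,1.4,2.3
"""

# Column names are an assumption; look them up on the dataset's page.
COLUMNS = ["label", "feature_a", "feature_b", "feature_c"]


def load_rows(text, columns):
    """Parse headerless CSV text into a list of dicts with float values."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        rows.append({name: float(value) for name, value in zip(columns, record)})
    return rows


def summarize(rows, column):
    """Return (min, mean, max) for one numeric column."""
    values = [row[column] for row in rows]
    return min(values), sum(values) / len(values), max(values)


rows = load_rows(SAMPLE, COLUMNS)
print(len(rows), "rows")
print(summarize(rows, "feature_a"))
```

For a real file, you would swap `io.StringIO(SAMPLE)` for `open(path, newline="")` and keep the rest the same.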


There are only two downsides to the UCI datasets.

  1. The most significant downside is that these datasets are already cleaned and pre-processed. Cleaning and pre-processing are essential parts of the machine learning process that you will face in your career. Not spending time practicing these skills will hurt you later down the road.
  2. The other downside is that they are small, so you won’t get much experience with large-scale projects, but that shouldn’t matter because you are new at this! Start small!
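You can still get a little cleaning practice on small data. Below is a rough sketch of one common step: treating a missing-value marker as missing and filling it with the column median. The `?` marker and the rows are assumptions made up for this example, and median imputation is just one simple choice, not the only way to handle gaps.

```python
from statistics import median

MISSING = "?"  # assumed missing-value marker; check each dataset's notes

# Made-up rows of string cells, as they might come out of a CSV reader.
raw_rows = [
    ["5", "1", "2"],
    ["3", "?", "2"],
    ["8", "10", "4"],
    ["1", "1", "?"],
]


def parse(rows):
    """Convert cells to floats, using None for the missing marker."""
    return [
        [None if cell.strip() == MISSING else float(cell) for cell in row]
        for row in rows
    ]


def impute_median(rows):
    """Replace None in each column with the median of that column's known values."""
    medians = []
    for col in range(len(rows[0])):
        known = [row[col] for row in rows if row[col] is not None]
        medians.append(median(known))
    return [
        [medians[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]


cleaned = impute_median(parse(raw_rows))
for row in cleaned:
    print(row)
```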

Practicing in a Targeted way

How do you go about practicing in a targeted way when there are so many datasets? An aspiring machine learning engineer would do best to figure out what their goals are and pick the dataset that best gets them there. I’ve developed some questions you can ask yourself to help narrow down the number of datasets.

  • What kind of problem are you looking to solve? Classification, regression, or clustering?
  • What size is the dataset? Tens of data points or millions?
  • How many features does the dataset have?
  • What type of features?
  • What domain is this dataset from?
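If you like keeping notes in code, the questions above can be turned into a small filter. This is only a sketch: the `DatasetInfo` fields, the `shortlist` helper, and the catalog entries are all hypothetical, and you would fill in real values from the dataset pages you browse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    name: str
    task: str          # "classification", "regression", or "clustering"
    n_rows: int
    n_features: int
    feature_type: str  # e.g. "numeric", "categorical", "text"
    domain: str


# Hypothetical catalog entries, not real repository metadata.
CATALOG = [
    DatasetInfo("example-a", "classification", 150, 4, "numeric", "biology"),
    DatasetInfo("example-b", "regression", 5000, 11, "numeric", "food"),
    DatasetInfo("example-c", "clustering", 100000, 2000, "text", "news"),
    DatasetInfo("example-d", "classification", 700, 9, "numeric", "health"),
]


def shortlist(catalog, task=None, max_rows=None, max_features=None,
              feature_type=None, domain=None):
    """Keep the entries that satisfy every criterion that was given."""
    picked = []
    for info in catalog:
        if task is not None and info.task != task:
            continue
        if max_rows is not None and info.n_rows > max_rows:
            continue
        if max_features is not None and info.n_features > max_features:
            continue
        if feature_type is not None and info.feature_type != feature_type:
            continue
        if domain is not None and info.domain != domain:
            continue
        picked.append(info)
    return picked


small_classification = shortlist(CATALOG, task="classification", max_rows=1000)
print([info.name for info in small_classification])
```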

Figure out what type of datasets you want to focus on to match up with your broader goals. Once you have this, you should be able to filter through the huge number of datasets that are available on the platform.

Example Problems

Don’t worry if you’re not sure exactly what you’re trying to learn. It’s much better not to get stuck trying to find the perfect study plan. I’ve made a list of some datasets that you might find interesting. There are a few types of problems here, so give them all a shot.

Regression: http://archive.ics.uci.edu/ml/datasets/Wine+Quality

Clustering: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words

Classification: http://archive.ics.uci.edu/ml/datasets/Wine

Health Classification: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29
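Whichever classification dataset you pick, a sensible first pass is to hold out some rows and compare a trivial baseline against a simple model, so you measure generalization rather than memorization. The sketch below uses made-up toy points and a 1-nearest-neighbor classifier as a stand-in model; both are assumptions for illustration, not a recommendation for any specific dataset above.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def majority_baseline(train):
    """Return the most common label in the training rows."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def predict_1nn(train, features):
    """Return the label of the closest training row (Euclidean distance)."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]


def accuracy(predictions, test):
    """Fraction of held-out rows whose label was predicted correctly."""
    hits = sum(pred == label for pred, (_, label) in zip(predictions, test))
    return hits / len(test)


# Made-up, well-separated toy data: (features, label) pairs.
data = [((float(i % 3), float(i % 2)), "a") for i in range(10)]
data += [((10.0 + i % 3, 10.0 + i % 2), "b") for i in range(10)]

train, test = train_test_split(data)
baseline_label = majority_baseline(train)
baseline_acc = accuracy([baseline_label] * len(test), test)
knn_acc = accuracy([predict_1nn(train, f) for f, _ in test], test)
print(f"baseline={baseline_acc:.2f}  1-NN={knn_acc:.2f}")
```

If your model cannot beat the majority-class baseline on held-out rows, that is a signal to revisit your process before trying anything more complex.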

But..

I don’t think I have the skills for this, or I feel like something is stopping me from getting started!
It’s OK to doubt yourself from time to time, but you can’t let it stop you from reaching your goal of becoming a machine learning engineer. Time to adjust your mindset.

I don’t know how to program!
That’s fine, because my article “Becoming a Machine Learning Engineer | Step 3: Pick a Tool” goes over one tool that doesn’t need any programming skills and that lets you implement many machine learning algorithms.

Where would I even start when it comes to solving the problems?
Having a process that you can apply to any problem is super important, and I believe that learning that process is better than learning how back-propagation works. Check out my article where I go into detail about picking a process.

I don’t think I could do this alone!
Learning machine learning by yourself is not the best way to learn. Joining a group of like-minded individuals will do wonders towards your ability to learn. Check out this article to find out more.

Take Away

If you’re serious about self-study, consider making a modest list of datasets you want to investigate further. Follow the targeted practice plan to build a valuable foundation for diving into more complex and exciting machine learning problems.


Thanks for reading :) If you enjoyed it, hit that clap button below and follow me! It would mean a lot to me and encourage me to write more stories like this

Let’s also connect on Twitter, LinkedIn, or email

