How I Learned About Machine Learning

(and Why It’s Actually Pretty Cool, Not Scary at All 🥰)

Published in

ILLUMINATION

3 min read · Sep 24, 2024

You know, I’ve always been a bit skeptical about all this talk of machine learning. It sounded like one of those techy things that was super complicated and only for robots. But then, one day, my curiosity got the best of me. I started thinking, “What’s the big deal with this machine learning thing? Can it really do all the amazing stuff people say?” So, I decided to dig in and figure it out for myself.

Spoiler alert: It’s actually pretty awesome (and not as scary as it sounds)!

Let me tell you about a key part of machine learning that I found super interesting: training data and testing data. Now, don’t get worried about the fancy words — stick with me, and I promise it’ll make sense.

So, what’s this training data and testing data?

Imagine you’re trying to learn a new skill, like baking cookies. First, you need a recipe, right? That’s kind of like training data. It’s all the information you gather to figure out how to make the perfect cookie. You try different ingredients, measure carefully, and practice. In machine learning, this is like feeding lots and lots of examples into a computer program (the model) to teach it how to do something — like recognizing cats in photos or guessing what movie you might want to watch next.

Now, once you’ve practiced baking enough cookies, you want to see if you’re actually any good at it! You bake a fresh batch, give it to friends who never tasted your practice cookies, and ask, “Do these taste good?” This part is like testing data. In machine learning, the testing data is a set of examples held back during training. It’s brand new information the computer hasn’t seen before, and it shows whether the model really learned anything or just memorized the old recipe.
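If you’re curious what that looks like in code, here is a minimal sketch in Python. The function name `train_test_split`, the 80/20 ratio, and the made-up cookie list are my own assumptions for illustration, not a fixed rule or a specific library’s API.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold back a fraction for testing."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    # First n_test items become the test set, the rest are for training.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical data: ten cookies we have notes on.
cookies = [f"cookie_{i}" for i in range(10)]
train, test = train_test_split(cookies)
print(len(train), "for practice,", len(test), "for the taste test")
# prints: 8 for practice, 2 for the taste test
```

The important part is that no cookie ends up in both piles, so the taste test is always on something the model has not practiced on.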

Why use two types of data?

At first, I thought, “Why not just use one set of data and be done with it?” But here’s the thing: if we train and test on the same data, a model that simply memorized every example would get a perfect score, and we’d have no way to tell it apart from one that actually learned. It’s like being able to bake one flavor but failing when you try something new. So, testing data helps us see if the model can handle cookies (or problems) it hasn’t met before.
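Here is a tiny Python sketch of why one set of data isn’t enough. Everything in it is invented for illustration: the bake times, the “good”/“bad” labels, and a deliberately silly “model” called `memorizer` that only remembers the exact examples it was shown.

```python
def accuracy(model, examples):
    """Fraction of examples where the model's answer matches the label."""
    correct = sum(1 for x, label in examples if model(x) == label)
    return correct / len(examples)

# Hypothetical data: (bake time in minutes, how the cookie turned out).
train = [(8, "bad"), (10, "good"), (12, "good"), (15, "bad")]
test = [(9, "bad"), (11, "good"), (13, "bad"), (11.5, "good")]

# A "model" that memorizes the training examples and guesses "bad" otherwise.
memory = dict(train)

def memorizer(minutes):
    return memory.get(minutes, "bad")

print("training accuracy:", accuracy(memorizer, train))  # prints 1.0
print("testing accuracy:", accuracy(memorizer, test))    # prints 0.5
```

On the training data the memorizer looks perfect. Only the testing data reveals that it never picked up the actual pattern (roughly 10 to 12 minutes makes a good cookie).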

The Benefits

Once I got over my initial confusion, I realized how smart this method is. Using training and testing data helps:

Catch overfitting (That’s when the model fits its practice problems so closely that it stumbles on anything new. It’s like memorizing answers for a test instead of actually learning the material. A big gap between the training score and the testing score is the warning sign.)
See real-world results (Testing data is what shows us if the model is actually useful in everyday situations. You don’t want a model that only works in a lab — it needs to work with real-world stuff!)

The Drawbacks

Okay, so what’s the downside? Well, sometimes it can be hard to find enough good data. If the training data isn’t diverse enough (say you only bake chocolate chip cookies and never try oatmeal raisin), the model might struggle when faced with different types of cookies in the testing stage. Plus, if the data isn’t labeled correctly (think of it as following a recipe with the wrong steps written down), the model might learn the wrong thing.
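A quick sanity check for the diversity problem can be sketched in a few lines of Python. The cookie labels below are made up, and counting labels is just one simple check I’m assuming here, not a complete test of data quality.

```python
from collections import Counter

# Hypothetical labels: what kinds of cookies appear in each data set.
train_labels = ["chocolate chip"] * 9 + ["sugar"]
test_labels = ["chocolate chip", "oatmeal raisin", "sugar"]

# How often does each kind show up in training?
train_counts = Counter(train_labels)
print(train_counts.most_common())
# prints: [('chocolate chip', 9), ('sugar', 1)]

# Which kinds appear in testing but never in training?
unseen = sorted(set(test_labels) - set(train_labels))
print("never seen in training:", unseen)
# prints: never seen in training: ['oatmeal raisin']
```

If one kind dominates the training counts, or the “never seen” list isn’t empty, that’s a hint the model may struggle on the cookies it didn’t get to practice.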

In the End…

Once I understood this, machine learning didn’t seem so complicated. It’s just like how we learn new things! First, we practice a lot (training data), and then we test ourselves to see if we really get it (testing data). Pretty cool, right? Now I’m not as skeptical as I was at the start. ❤️

So, what do you think? Have you ever tried learning something new using a similar process, like practicing with examples and then testing yourself later? I’d love to hear your experiences!