Forecasting Metrics: I’m New, I Tried, Let’s Talk

Navigating the maze of metrics in forecasting isn’t easy — here’s my take, and I’d love to hear yours

Rangga Kusuma Dinata
10 min read · Aug 27, 2023
The result of AutoRegression model prediction on one sinusoidal test data across metrics by the author.

Dazed & Confused by Metrics? Same Here! 🤪

Hey, Metric Mavericks and Forecasting Fanatics! Ever feel like you’re tumbling down the rabbit hole of regression and forecasting metrics? Yeah, me too. I mean, who hasn’t passionately calculated a dozen MAE, MSE, and RMSE values only to scratch their head and wonder,

“Why does my MAPE look like it’s aiming for the moon?” 🚀

You too, huh? High five! 🙌 We’ve all stared at those stellar MAE, MSE, and RMSE numbers and wondered why our MAPE was acting like a rebellious teenager.

So, I did what any self-respecting, confused individual would do — I whipped up an experimental notebook that’s part mystery novel, part science fair project. We’re going to crack this metric code together, folks!

Mission: Understand Metrics (Not Mission Impossible!) 🕶️

Alright, so what’s the mission, should we choose to accept it? (Spoiler: We’re accepting it.) Look, I’m not here to declare the “one metric to lord over all datasets.” That’s way too Tolkien for my taste. 🧙‍♂️

Nope, what I’m after is a bit more nuanced: understanding how these metrics act when you throw different conditions at them, kinda like “Metrics High School 101.” 🎓 Because trust me, when you’re tossed into the unpredictable wilderness of real-world data, you’re gonna want your metrics to be your BFFs. 🤝

Meet the Metrics Squad! 🕵️‍♂️

So, what metrics made it onto my VIP list? Well, we’re taking a metric road trip 🚗 through the wild world of regression and forecasting! Buckle up, because here comes the lineup:

Alright, got your popcorn? 🍿 Let’s dive in!

  • Mean Absolute Error (MAE):
    It measures the average magnitude of the errors between predicted and observed values.
  • Mean Squared Error (MSE):
    It measures the average of the squares of the errors between predicted and observed values. It gives more weight to large errors.
  • Root Mean Squared Error (RMSE):
    It’s the square root of MSE, so it is back in the units of the target. It matches the standard deviation of the errors only when their mean is zero.
  • Mean Absolute Percentage Error (MAPE):
    It measures the average of the absolute percentage errors between predicted and observed values.
  • Mean Absolute Scaled Error (MASE):
    It measures the accuracy of forecasts relative to a naive baseline method. If MASE is lower than 1, the forecast is better than the naive forecast.
  • R-squared (Coefficient of Determination):
    It indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
  • Symmetric Mean Absolute Percentage Error (sMAPE):
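    It’s a variation of MAPE that, in its common form, divides each error by the average of the actual and predicted magnitudes, so a single zero actual no longer breaks it (it is still undefined when both values are zero).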
  • Mean Bias Deviation (MBD):
    It calculates the average percentage bias in the predicted values.
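
For anyone who likes to see the squad in code: here is a minimal, dependency-free sketch of the eight metrics above. It is not lifted from my notebook. The function names are made up for illustration, the sMAPE is one common variant, the MBD sign convention (positive means over-forecasting) is an assumption, and the MASE baseline is assumed to be a one-step naive forecast on the training data.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors, in the target's units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: squaring makes large errors count more."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Fails if any actual is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE, in percent (assumed variant, bounded by 200)."""
    return 100 * sum(
        2 * abs(p - a) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    ) / len(actual)


def mbd(actual, predicted):
    """Mean Bias Deviation, in percent. Assumed sign: positive = over-forecast."""
    return 100 * sum((p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """R-squared: 1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mase(actual, predicted, train):
    """MAE divided by the MAE of a one-step naive forecast on the training data."""
    naive_mae = sum(
        abs(train[i] - train[i - 1]) for i in range(1, len(train))
    ) / (len(train) - 1)
    return mae(actual, predicted) / naive_mae


# Toy numbers, purely for illustration.
train = [4.0, 6.0, 5.0, 9.0]
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 44.0]

for metric in (mae, mse, rmse, mape, smape, mbd, r2):
    print(f"{metric.__name__}: {metric(actual, predicted):.4f}")
print(f"mase: {mase(actual, predicted, train):.4f}")
```

Note that `mape` and `mbd` divide by the actual values, so they blow up when an actual is zero or close to it, and `mase` needs training data that is not constant.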

Emoji Alert! Severity Scale Unveiled 🚨👈

Okay, number nerds, ever felt like you wanted your data to speak to you? Well, here’s where emojis swagger in, turning your screen into a mini Hollywood Squares of metrics. They’ll give you the deets, from “You’re a Genius!” to “Oops, Try Again!”

Tiny Disclaimer: Just remember, these categorizations are great but they’re not gospel. User-defined, y’all.

  • Standard Error Metrics (MAE, MSE, RMSE) Categorization 📊
    To clarify, the concept of Normalized Error Range stems from dividing the error by the range (max - min) of the training data.
  • Percentage Error (MAPE, sMAPE, MBD) Categorization 📉
  • R2 Score Categorization 📈
  • MASE Categorization 📋
  • Severity Emojis 🚨
  • Directional Emojis ➡️
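
To make the Normalized Error Range idea concrete, here is a small sketch. Only the "divide the error by max - min of the training data" part comes from the description above. The `categorize` helper, the cut-offs, and the labels are hypothetical placeholders, not the scale I actually used.

```python
def normalized_error(error, train):
    """Divide an error value (e.g. MAE or RMSE) by the range of the training data."""
    value_range = max(train) - min(train)
    if value_range == 0:
        raise ValueError("training data is constant, so its range is zero")
    return error / value_range


def categorize(value, scale, worst_label):
    """Return the label of the first (upper_bound, label) pair that value fits under."""
    for upper_bound, label in scale:
        if value <= upper_bound:
            return label
    return worst_label


# Placeholder cut-offs and labels (hypothetical, swap in your own emojis).
PLACEHOLDER_SCALE = [(0.05, "good"), (0.20, "so-so")]

train = [-1.0, 0.0, 1.0]  # range is 2.0
score = normalized_error(0.04, train)  # 0.04 / 2.0
print(score, categorize(score, PLACEHOLDER_SCALE, worst_label="bad"))
```

Because the scale is passed in, the same helper can serve the percentage metrics, R2, and MASE with different cut-offs.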

Lab Rat Diaries: The Method Mania 🧪

Look, we’re not just throwing darts at a board here. There’s some real method to my metric madness, promise! I’ve whipped up datasets that have a mathematical backbone — yeah, we’re talking sine and cosine. Why? Because these functions give us a level of predictability that’s easy to control. It’s like cooking with a recipe!

As for the models, I chose statsmodels.tsa.ar_model.AutoReg and OffsetModel. If AutoReg was a rock band, it would be The Beatles of time series forecasting. Iconic, foundational, and all that jazz. On the other hand, OffsetModel is our cover artist that mimics the greatest hits by shifting test data around.
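
Here is a rough sketch of that setup. The helper names, the step size, and the split are assumptions for illustration, and the `OffsetModel` class is only one possible reading of "shifting test data around" (adding a fixed offset to the test values), not the notebook's actual class. AutoReg is left out to keep the sketch dependency-free.

```python
import math


def make_wave(n_points, func=math.sin, step=0.1):
    """Sample func at evenly spaced points: func(0), func(step), func(2 * step), ..."""
    return [func(i * step) for i in range(n_points)]


def train_test_split(series, test_size):
    """Chronological split: the last test_size points (must be > 0) form the test set."""
    return series[:-test_size], series[-test_size:]


class OffsetModel:
    """Hypothetical stand-in: 'predicts' by adding a fixed offset to the test values."""

    def __init__(self, offset):
        self.offset = offset

    def predict(self, test):
        return [y + self.offset for y in test]


sine = make_wave(200, math.sin)
cosine = make_wave(200, math.cos)

train, test = train_test_split(sine, test_size=40)
predictions = OffsetModel(offset=0.05).predict(test)
print(len(train), len(test), len(predictions))
```

A model like this never learns anything, which is the point: it gives a controlled, known error to watch the metrics react to.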

Here’s the deal: this whole shebang is focused like a laser on forecasting problems. But keep this between you and me: all forecasting issues are basically regression problems dressed up for a fancy ball. The Cinderella story just doesn’t run in reverse: not every regression problem turns into a forecasting issue after midnight.

Sneak Peek! Highlight Reel of What I Found 🎬

Ever been lost in a maze? Yeah, me trying to make sense of metrics. So, I drew it all out on a tree graph. And guess what? It’s a forest in there!

The tree graph of the metrics exploration by the author.

The table below serves as a snapshot of the initial stage of my comprehensive exploration into performance metrics. If this tickled your curiosity bone, guess what? There’s more! Get the complete lowdown right here.

The Juice You’ve Been Waiting For 🍹

  1. The R2 Rollercoaster: If AutoReg and OffsetModel were in an R2 contest, they’d both get participation trophies. Just one OffsetModel run on a beefy dataset managed a wink and a nod (`👌`).
  2. Killin’ It in Error Land: In the world of MAE, MSE, and RMSE, both AutoReg and OffsetModel are basically the class valedictorians (`👌`).
  3. The MASE Facepalm: Think our models outperformed a naïve forecast? Think again! They all scored a big ol’ “yikes” (`🤬`) in the MASE department.
  4. MAPE & sMAPE: The Emotional Rollercoasters: Sometimes we’re amazing (`👌`), and sometimes we’re just terrible (`☠` and `💀`). And yes, it’s always a party on the sine wave.
  5. Bias or Biased Not: Our emojis are doing the heavy lifting here, showing us if our models are optimistic overestimators (`📈`) or Debbie Downers (`📉`).
  6. Size Doesn’t Always Matter: In some cases, bigger test sizes brought us nothing but heartache (`❌` and `💀`).
  7. Sine vs. Cosine: The Ultimate Smackdown: Our models had mood swings depending on whether they were jamming to the sine or cosine wave.
  8. If I Had to Pick a Prom Date: OffsetModel, a big dataset, and the sine wave would make a lovely trio. Just don’t ask about MASE (`🤬`).
  9. Fine Print & Red Flags: Just a tiny note that we’ve been playing in the sandbox with synthetic data. Proceed with caution, k?
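
One likely reason MAE can look stellar while MAPE aims for the moon: a sine wave keeps passing close to zero, and MAPE divides by the actual value. The toy check below is an illustration with made-up numbers, not a rerun of my experiment. It applies the same small, constant error to a wave that crosses zero and to the same wave lifted away from zero.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# A sine wave that passes close to zero (the small phase shift avoids an exact zero).
wave = [math.sin(0.1 * t + 0.005) for t in range(100)]
# The same wave lifted by 2, so it stays between 1 and 3.
lifted = [y + 2 for y in wave]

# Every prediction is off by the same small amount.
error = 0.05
wave_pred = [y + error for y in wave]
lifted_pred = [y + error for y in lifted]

print(f"wave:   MAE={mae(wave, wave_pred):.3f}  MAPE={mape(wave, wave_pred):.1f}%")
print(f"lifted: MAE={mae(lifted, lifted_pred):.3f}  MAPE={mape(lifted, lifted_pred):.1f}%")
```

Both series get the same MAE, but the MAPE on the zero-crossing wave is several times larger, driven by the handful of points where the actual value is tiny.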

Remember, folks, all these insights are fresh from the lab, so handle with care when applying to real-world projects!

Roast Me, But Gently 🦜

Hey, don’t be shy! I want your constructive zingers on:

  • Did I get these metrics right or should I go back to metric kindergarten?
  • Any blind spots here, or am I just blinder than a bat?
  • Got a magic metric I need to know about?
  • Spotted a typo? Join the club!

Diving into metrics is like being a pirate on a treasure hunt — X marks the spot, but you don’t know if it’s gold doubloons or a cursed relic until you dig it up and give it a good ol’ look-see. That’s why I’m craving your treasure map of feedback! I’ve dropped a few questions like breadcrumbs above, but let’s take that shovel and dig deeper:

  • The Interpretation of Metrics 101: If you think I’ve been staring at the numbers too long and missed the point, spill the tea!
  • Where’s My Bias?: Been staring at numbers so long, I might as well be in The Matrix. Help pull me out?
  • The Hidden Gem Metric: If there’s a secret sauce metric I missed, pour it on me!
  • Typos, My Old Frenemy: If you spot one, give me a shout-out.

🤷‍♂️ So, what’s the verdict? Yay or nay?

Your insights could be the cherries on top of my metric sundae.

The Mic Drop 🎤

And that’s the tea, folks! This has been an eye-opening quest for me, and if I’ve lit even a tiny bulb in your brain, mission accomplished! Whether it’s a clap or a slap, your feedback is my treasure.

If I’ve tickled your curiosity bone, let’s keep this party going. Chat or comment me up below, pen your own saga, or slide into my DMs here. Want more? The whole enchilada is right here. If you’re feeling adventurous, grab a shovel and start digging into this matter yourself!

Follow my progress and get the 411 on my ever-expanding portfolio of delights right over here.

So, here’s to more epic data journeys! All my life’s links, one click away! 🥂

Just like Taylor Swift is shaking off the old and bringing you 1989 (Taylor’s Version), I’ve got my very own “Director’s Cut” over at Dev.to 🎤🐍. Craving the same delicious metrics but with the reputation of a more formal, business-casual platter? Say no more! Check out my Dev.to Post for the metrics deep-dive, Swiftie style — minus the snake boots and the drama.

A Little Help Goes a Long Way 🤝

A thousand thank yous for your time and for immersing yourself in my tales. Each story is a labour of love, brewed with the hope that it brings a dash of enlightenment, a sprinkle of inspiration, and a whole heap of insight to your day.

Indonesian invite-only on Stripe global availability

As it stands, the Medium Partner Program is playing hard to get here in Indonesia, thanks to a few speed bumps with Stripe, their payment wingman. This means my pen doesn’t exactly pay the bills through Medium. But worry not, there’s another path to lend a helping hand.

beacons.ai/ranggakd

If my narratives have sparked a flame in your heart, I welcome you with open arms to saunter over to my Beacons profile. Here, you can throw some much-appreciated support my way. Your goodwill stokes the fire under my writings, keeping the narratives piping hot and coming your way.

Hey, a moment if you will? 🤓 If the realms of Python, data science, machine learning, AI, data, and documentation make your heart skip a beat, guess what?

I’m scouting for a remote gig!

Feel like you’ve struck gold in the talent mine? Slide into my DMs. While I’m using ChatGPT to fine-tune my English communication skills (I promise, I only come across as rude when I’m deep into code), I can definitely hustle up the language finesse when needed in a professional setting. Call me a Jack-of-All-Domain-Trades or just a fast learner with a penchant for perseverance; either way, let’s talk!

Moreover, if you’re an Indonesian comrade who’s tangoed with similar hurdles, let’s join hands and rally for change. By signing up to Stripe under the Indonesian banner via this link, we can stand together and show our need for Stripe’s full service here at home. The more noise we make, the better our odds of getting Stripe to open up its full suite of services in Indonesia.

Remember, every little bit counts, whether it’s sharing my narratives, dropping a coin or two on my Beacons page, rallying the troops on Stripe, or just penning a heartwarming comment — it all adds up and is deeply cherished.

Thank you for marching alongside me on this journey. As always, thank you for reading, and here’s to a whole library of stories yet to be shared!

Rangga Kusuma Dinata

aka Retriago Drago | someone between a good programmer as a scientist 👨‍🏫 and a good scientist as a programmer 👨‍💻