What Data Scientists can learn from The Avengers

Shweta Doshi
6 min read · May 7, 2019

It was slated to be the movie of the decade, and oh boy… did it live up to the hype! Really, Avengers: Endgame is an intensely satisfying piece of filmmaking by the Russo Brothers, and a powerful finale to the elaborately crafted Marvel cinematic universe.

I’m one of the many die-hard fans who rushed to the theatres the day Endgame was released, just to be among the first witnesses of the finale to a superhero saga that has captured people’s imaginations for over 10 years now.

Marvel-ously enough, I discovered that the movie also serves as an excellent allegory for tackling data science problems. Go figure!

Let me share with you the memorable lessons that The Avengers have taught us about problem solving in Data Science!

By the way, major Avengers: Endgame SPOILERS AHEAD! If you’re an ardent fan and haven’t watched Endgame yet, I would advise that you stop reading at this point.

Data Science Is A Team Sport

Not all Avengers bond well with each other all the time. Captain America and Iron Man had trust issues and weren’t always very ‘civil’ with each other. And Star-Lord went ‘green’ with insecurity whenever Thor was around.

But when the team becomes ONE, they become a force to be reckoned with! The Avengers work well together because they have a common goal: to defend humanity. Teamwork makes the dream work!

Similarly, whenever you’re working on a data science project, make sure that the team is aligned towards the goal, or the ‘endgame’ if you will.

Learning Data Science is a Lifelong Journey

Data science is ever-evolving. As data scientists, we can’t be in a hurry to ‘snap’ our fingers and solve the world’s problems.

Our right place is on the pulse of what’s happening in the data science community: the latest algorithms, frameworks and much more.

Had Ant-Man left the quantum realm alone after making his own escape, he wouldn’t have gone on to discover time travel or rescue his mentor’s long-lost partner, the original Wasp.

More importantly, without the new-found knowledge about how the quantum realm works, retrieving the infinity stones wouldn’t have been possible!

Lesson for data scientists? Whether it’s the latest version of PyTorch (a machine learning library for Python) reaching a stable release, or Amazon’s Alexa hitting the market, data scientists must stay updated to leverage tech advances to their advantage.

It’s All About Feature Engineering

All the Avengers fighting on the same side isn’t enough. They need to pool their superpowers, foresee opportunities that each Avenger’s superpower opens up, and use them in sync for earth’s greater good.

Features are like superpowers too. But just possessing a rich dataset with a lot of features isn’t enough!

We have to handle missing values, scale the features, and check how strongly they correlate with one another. These methods of feature cleaning and feature selection help us utilise the best possible combination of features and save the day!
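To make those three steps concrete, here is a minimal sketch in plain Python. The toy columns (`height_cm`, `height_in`, `age`), the helper names and the 0.95 correlation threshold are all assumptions made up for illustration; a real project would more likely lean on a library such as pandas or scikit-learn.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]

def pearson(x, y):
    """Pearson correlation between two equal-length columns."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if it is not near-duplicated by one already kept."""
    kept = {}
    for name, col in columns.items():
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept

# Hypothetical toy data: the two height columns carry the same information.
features = {
    "height_cm": [170.0, 165.0, 180.0, 160.0],
    "height_in": [66.9, 65.0, 70.9, 63.0],
    "age": [25.0, 40.0, None, 31.0],
}

# Clean every column (impute, then scale), then select non-redundant ones.
cleaned = {name: standardize(impute_mean(col)) for name, col in features.items()}
selected = drop_correlated(cleaned)
print(list(selected))
```

In this toy example the inches column adds nothing the centimetres column doesn’t already say, so it is the one benched, while `age` stays on the team.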

When in doubt — Ensemble!

The Avengers aren’t just about fighting doomsday battles with raw muscle power. There’s always a war of wits going on at the same time!

For example, blocking Thanos in space, with the remaining Avengers staying on Earth to tackle his army was a very wise strategy. Wouldn’t it be a total waste of their capabilities if all Avengers were to assemble at one place waiting for Thanos to attack?

Data scientists too must possess a similar problem-solving approach, because a single ML model rarely leads to the best solution. A better way of arriving at it is often to build an ensemble: train different ML models on different subsets of the data and combine their predictions.
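One simple way to assemble such a team is bagging: train several small models on random resamples of the data and let them vote. The sketch below is only an illustration under assumptions. The one-feature “stump” model, the toy data and every function name are invented for this example, and a library such as scikit-learn would normally do this work for you.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-feature threshold classifier: predict 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for candidate, _ in points:
        errors = sum((x > candidate) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold

def fit_bagged_stumps(points, n_models=15, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        fit_stump([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]

def predict(thresholds, x):
    """Combine the individual stumps by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, class label).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = fit_bagged_stumps(data)

print(predict(models, 0.5), predict(models, 9.5))  # → 0 1
```

Each stump sees a slightly different slice of the data, so each makes slightly different mistakes; the vote is what smooths them out.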

Prove you are worthy

Captain America was always worthy of Thor’s Hammer, Mjolnir. But he wielded it only in the final battle with Thanos. Why? Because Cap is a confident man who doesn’t overreach just to prove what he’s capable of.

Unfortunately, a lot of problem solvers reach for complex models, or worse, deep learning, whenever they encounter a data science problem. That wastes time and resources, a mistake that contradicts a data scientist’s very job description.

If you’re new to data science, think about it this way: Why use GPUs when prediction can be done using good ol’ CPUs? Why not use high-end hardware and software only when their deployment is justified?
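One way to practise that restraint is to line up candidate models from simplest to most complex and stop at the first one that is good enough. The snippet below is a rough sketch of the idea, not a prescribed workflow: the two toy models, the mean-absolute-error metric, the data and the tolerance values are all assumptions chosen for illustration.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Simplest possible model: always predict the training average."""
    avg = mean(ys)
    return lambda x: avg

def fit_line(xs, ys):
    """One step up: a straight line fitted by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def mae(model, xs, ys):
    """Mean absolute error of a model on the given data."""
    return mean(abs(model(x) - y) for x, y in zip(xs, ys))

def simplest_good_enough(candidates, train, valid, tolerance):
    """Return the first (simplest) candidate whose validation error is acceptable."""
    for name, fit in candidates:
        model = fit(*train)
        if mae(model, *valid) <= tolerance:
            return name
    return None  # nothing simple suffices; heavier tools may now be justified

# Hypothetical toy data, split into training and validation sets.
train = ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
valid = ([5, 6], [10.1, 11.9])
candidates = [("mean", fit_mean), ("line", fit_line)]  # ordered simplest first

print(simplest_good_enough(candidates, train, valid, tolerance=0.5))  # → line
```

Only when the function returns `None` has the problem proven itself worthy of the heavier hammer.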

Get a Mentor

It’s really cool to be bitten by a radioactive spider and transform into a friendly neighbourhood vigilante (sorry Aunt May!). But without a mentor like Tony Stark, Spidey would still be chasing petty thieves.

Luckily his potential was spotted by Tony and the world got a new superhero who could fight bigger battles and still care about “the small guy”.

Aspiring data scientists too need a mentor who can help them excel. Where the maze of online data science learning resources can confuse, a mentor proves invaluable in guiding learners towards apt methods, resources and tactics.

By the way, never hesitate to ask for an assist because the data science community is made up of really helpful folks! They’ll be as happy with your success as Iron Man was proud of Spider-Man’s homecoming!

Know When To Use Data Science, When Not To

Tony made peace with the reality that he would have to sacrifice himself, but only once he had found a real way to defeat Thanos. He never overestimated his tech prowess or kept the team guessing about what Iron Man could do and what he couldn’t.

An important Data Science principle is to know that data science alone is not capable of solving every problem under the sun.

Knowing when to stop wasting resources solving an unsolvable problem is the stuff real data science superheroes are made of. Sometimes plain automation is sufficient to deal with the ‘inevitable’!

Persevere till you get it right

The Avengers faced their biggest challenges just before the resolution of their journey. Many of the Avengers were turned to dust by Thanos. Ultimately, Black Widow had to sacrifice her life. So did our favourite mentor, Iron Man.

But The Avengers never gave up. They persevered through thick and thin, and rallied to face what seemed to be an unstoppable attack. They gutted it out. They won!

In the data science pipeline, ‘the middle mile’ is where you burn the most energy. It is also the phase where you have to try and overcome minor or major setbacks, and learn to deal with long-term fatigue.

Why? Because you can’t expect victory when you quit. You can’t expect success unless the business goal is achieved. “Whatever it takes,” is the mantra a data scientist lives by!

So, there you have it. The Avengers not only saved the world, but also gave us some serious goals as data scientists!

Stan would be so proud.

Shweta Doshi

The author is a Co-Founder of the EdTech startup, GreyAtom (www.greyatom.com).

Co-Author: Nikhil Nair, Associate Data Scientist, GreyAtom
