Viral or Nah: The Medium Article Success Predictor

Christopher Yoeurng · Published in Analytics Vidhya · 8 min read · May 14, 2021

If you’re reading this right now, chances are you’re one of the 4.6 billion people on Earth who have access to the internet.

And being a regular internet aficionado myself, I’d be willing to wager that you’ve also read an article, maybe even written one.

These short blocks of text on our screens convey to us endless amounts of knowledge, shape our opinions, and give us the inside scoop on what’s really happening in the world around us.

“The Algorithm”

But, if you’ve ever ventured to the second page of Google, I’m sure you’ve realized that not every article makes it to our phones or to our newsfeed.

And, if you’ve ever written an article yourself, you’ve almost certainly been frustrated and confused as to why only 3 people have read your post about how Hawaiian pizza is the best thing in the world, while over 18 million people have somehow collectively agreed to like the exact same picture of an egg.

Now, this is often, and rightfully so, attributed to the algorithm, the internet’s mysterious robotic overlord that controls everything that we see and think — at least on our screens.

From what news notifications we get on our phones to what research papers show up when we're looking for answers to our deepest darkest problems, the algorithm is an inescapable part of everyday modern life.

This is especially true right now, during the pandemic, when many people are working from home primarily through the Internet and many are completely cut off from their friends except through social media.

Now imagine if we, the every-person, could just turn the algorithm in our favor. What if we could somehow predict what the algorithm would promote and what it would demote?

Viral or Nah: The Medium Success Predictor

That’s where VoN: The Medium Success Predictor comes in. VoN (short for Viral or Nah) is an algorithm I built that has studied the contents of over 350 Medium articles, learning what does and doesn’t work. VoN compares that knowledge to your prospective article and tells you how likely you are to go viral.

How it works

Machine Learning

VoN works on a process called Supervised Machine Learning. What this basically means is that we give VoN data that has a right answer, called the labels, along with the information it should use to find that answer, called the features.

The algorithm is then tasked with looking at the features of each entry and trying to guess what its label is, before being corrected and trying again.

Think of it as akin to how people learn to walk.

Instead of giving toddlers a step-by-step guide on all the dynamics of walking, we just show them how we walk.

After observing adults walk for a while, the toddler usually gets up and attempts it themselves. If they fall, they learn that’s not how we’re supposed to walk; if they don’t fall, they learn that whatever they just did is walking.

So now it makes sense at a super high level, but how does it really work?

To answer that we need data.

Splitting Up The Data

Data is the backbone of machine learning; around 80% of the work of building any algorithm is cleaning up the data. So what data we used and how we used it are the most important parts of the whole project.

For VoN, I used a dataset from Kaggle (a sort of “Airbnb” for data scientists) that includes around 350 Medium articles split into 6 columns:

  • Author Name
  • Claps
  • Reading Time
  • Link
  • Title
  • Text

(You can find it here)

For my purposes, I decided to isolate just the “Claps” as the labels and the “Text” as the features. This meant that only the content of the article would be judged.

Next, in order to stop the algorithm from just memorizing the answers, we have to split up the data into training data, which we use to teach the algorithm, and testing data, which we use to test it.
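Here’s a minimal sketch of that setup in Python. The file name, the pandas/scikit-learn choice, and the 1,000-clap cutoff for “viral” are illustrative stand-ins rather than VoN’s exact code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Kaggle dataset (the file name here is an assumption).
# Note: the "Claps" column may need parsing if stored as strings like "1.2K".
df = pd.read_csv("medium_articles.csv")

# Features: the article text. Labels: 1 if "viral", 0 otherwise.
# The 1,000-clap cutoff is a hypothetical threshold, not VoN's actual one.
texts = df["Text"]
labels = (df["Claps"] >= 1000).astype(int)

# Hold out 20% of the articles so the model is graded on
# examples it never saw during training.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)
```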

PreProcessing

Computers, at their most basic level, are just super powerful calculators. Everything you see on your screen is the product of a bunch of math, and Natural Language Processing models like VoN are no different.

That is where preprocessing comes in.

The goal of preprocessing is to turn our language data into something that the computer can do its calculations on.

Tokenization

We do this first with a process called Tokenization, which is just fancy talk for breaking the text up into sequences of words.

Different models break their text up in different ways, but VoN specifically breaks each sentence up into word-sized tokens.

That means that a sentence like “Hello my name is Chris” would be broken up into the sequence of words, [“Hello”, “my”, “name”, “is”, “Chris”].
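As a minimal sketch, word-level tokenization can be as simple as splitting on whitespace (real tokenizers also handle punctuation, but the idea is the same):

```python
# Naive word-level tokenization: split the sentence on whitespace.
sentence = "Hello my name is Chris"
tokens = sentence.split()
print(tokens)  # ['Hello', 'my', 'name', 'is', 'Chris']
```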

Lower Casing

Next, we convert each word into its lower case equivalent. This is to stop words with capitalizations, which usually don’t affect the meaning, from being registered as different words from those without.

This would make our example line into, [“hello”, “my”, “name”, “is”, “chris”].
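In code, this step is a one-liner:

```python
# Lowercase every token so "Hello" and "hello" count as the same word.
tokens = ["Hello", "my", "name", "is", "Chris"]
lowered = [t.lower() for t in tokens]
print(lowered)  # ['hello', 'my', 'name', 'is', 'chris']
```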

Encoding

Once we make each sentence into a list of words, we have to Encode them. Encoding refers to the process of turning words into numbers to make them readable by the computer.

To do this, the algorithm assigns a number to each new word it sees and changes each occurrence of that word to its corresponding number.

So, for example, if we encode the line [“hello”, “my”, “name”, “is”, “chris”] as [1, 2, 3, 4, 5],

then we would have to encode the line [“my”, “name”, “is”, “chris”, “hello”] as [2, 3, 4, 5, 1].
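A tiny dictionary-based encoder reproduces exactly that behavior (in practice, libraries like Keras’s Tokenizer handle tokenizing, lowercasing, and encoding in one go, but here it is by hand):

```python
def encode(tokens, vocab):
    """Give each new word the next free ID, then map every token to its ID."""
    ids = []
    for word in tokens:
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # IDs start at 1; 0 is saved for padding
        ids.append(vocab[word])
    return ids

vocab = {}
print(encode(["hello", "my", "name", "is", "chris"], vocab))  # [1, 2, 3, 4, 5]
print(encode(["my", "name", "is", "chris", "hello"], vocab))  # [2, 3, 4, 5, 1]
```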

Padding

Finally, in order to put our sequences into the neural network (we’ll talk about that next), we need to make their size consistent.

To accomplish that, we add placeholder values into each list up to a certain size in a process called padding.

If we padded the end of our example sentence with the value 0 up to a length of 6 words, it would become [1, 2, 3, 4, 5, 0].
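Continuing the hand-rolled sketch (Keras provides pad_sequences for exactly this):

```python
def pad(ids, length, value=0):
    """Pad (or truncate) a sequence of word IDs to a fixed length."""
    return (ids + [value] * length)[:length]

print(pad([1, 2, 3, 4, 5], 6))  # [1, 2, 3, 4, 5, 0]
```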

Basic Neural Networks

Now that our data is in the correct format, we feed it into an Artificial Neural Network, the part of the algorithm that is actually responsible for learning to predict the labels from the data.

Neural networks are split into multiple layers of calculations.

The first layer, also known as the input layer, takes each input value and holds it in its own node.

This then gets fed forward: each node in the second layer, also known as the hidden layer, holds a weighted sum of all the nodes in the layer before it, where each weight is an adjustable parameter.

All of the nodes in the hidden layer are then combined, again through weighted connections, into the nodes of the output layer, which holds the output(s) of the neural network.

This output is then compared to the labeled data from earlier, and the weights of the hidden and output layers are adjusted accordingly.
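For concreteness, here’s a minimal network of this shape in Keras; the framework and the layer sizes are illustrative picks rather than VoN’s exact settings:

```python
from tensorflow import keras

# A minimal feed-forward network: input layer -> hidden layer -> output layer.
model = keras.Sequential([
    keras.layers.Input(shape=(6,)),               # 6 input nodes (our padded length)
    keras.layers.Dense(8, activation="relu"),     # hidden layer of 8 nodes
    keras.layers.Dense(1, activation="sigmoid"),  # one output node: P(viral)
])

# Training compares the output to the labels and nudges the weights accordingly.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```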

What VoN Uses

Now while this is great, a major flaw of feed-forward ANNs like this is that they don’t take the context of each word into account.

For that, we need RNNs, aka Recurrent Neural Networks.

While I won’t get into the specifics of how RNNs work, at a very high level they can take the context of each word into account based on the words that precede it. For VoN, I used a specific type of RNN called an LSTM, short for Long Short-Term Memory network.

I opted for an LSTM because traditional RNNs have the problem of remembering every single word in a sentence.

Now while that sounds like it wouldn’t be a problem, where it starts to really hurt is when your algorithm starts to run out of memory and “forget” important words in order to store non-important ones.

Think of it like this: you’re reading the box of your favorite cereal, RetrO’s.

You read all the words and you look away.

Now if I were to ask you to recite some of the words you saw on the box, you’d probably remember the main idea, “oat cereal”, “crunchy”, “good way to start each day”, even if you didn’t necessarily get all of the words right.

This is because you were able to pick out the most important words and remember those instead of the less important words, like “the” and “to”. Traditional RNNs don't have this ability and instead remember each and every word the same amount.

LSTMs don’t have this problem and, like us, can intentionally forget the unimportant parts of a sentence so that they have more memory for the important stuff.
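Putting all the pieces together, a minimal LSTM classifier in this spirit might look like the following in Keras. The vocabulary size, sequence length, and layer sizes here are placeholder guesses rather than VoN’s exact configuration:

```python
from tensorflow import keras

VOCAB_SIZE = 10000  # placeholder vocabulary cap
MAX_LEN = 500       # placeholder padded sequence length

model = keras.Sequential([
    keras.layers.Input(shape=(MAX_LEN,)),
    # Embedding turns each word ID into a dense vector the LSTM can work with.
    keras.layers.Embedding(VOCAB_SIZE, 64),
    # The LSTM reads the sequence word by word; its gates decide what to
    # keep and what to forget as it goes.
    keras.layers.LSTM(64),
    keras.layers.Dense(1, activation="sigmoid"),  # one output node: P(viral)
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```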

Does it work?

Currently, VoN has about 70–80% accuracy on the testing data.

In other words, it’s right roughly three times out of four, so it shouldn’t be hailed as the holy grail of success prediction just yet.

Despite this, I definitely invite you to play around with VoN and see if it approves of your writing style!

User testing is super important, so leave your prediction in the responses below and include whether or not it was right!

And yes… VoN does think this will go viral.


I’m an AI and BCI enthusiast at the human accelerator named The Knowledge Society