Generating text automatically for social networks with algorithms and Markov chains

Esteve Segura
5 min read · Oct 6, 2019


Nowadays, artificial intelligence is very advanced and has a large community behind it, but it is not always the best way to solve a specific task once you take into account the time available for a given project.

We are going to create comments for social networks on specific topics. In the example I am going to show, we will generate comments for Instagram food photos.

Conditions we want to impose on our system

  • We want to use JavaScript
  • Comments must be short
  • They should talk about food
  • We do not want to use artificial intelligence

Making your own phrase generator

To start we will create a function that allows us to create phrases, in a programmatic way. This function will allow us to create sentences, replacing certain words. Let’s see an animated example.

Animation

This may seem very simple, but it will help us later: having a dictionary that we build ourselves is very useful. Let's look at the code for this function.

generateCommentsFromHandMadeData.js
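The original listing is not reproduced here, so what follows is only a minimal sketch of such a function. It assumes the placeholder is the “$” character used in this article. The names pickRandom and generateComment are hypothetical and may not match the ones in the project.

```javascript
// Hypothetical sketch, not the project's actual file.

// Return a random integer between min and max, both inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Pick one random element from a non-empty array.
function pickRandom(list) {
  return list[randomInt(0, list.length - 1)];
}

// Replace every "$" placeholder in the template with a randomly chosen word.
// A callback is used so the chosen word is inserted literally.
function generateComment(template, words) {
  return template.replace(/\$/g, () => pickRandom(words));
}

const comment = generateComment('This picture looks $', ['yummy', 'tasty']);
console.log(comment); // e.g. "This picture looks tasty"
```

Each call picks a new word at random, so running it several times on the same template gives different comments.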

We will also create a utils.js file with some helper functions: randomInt(), saveToJson(), readFileLineByLine(), etc.

utils.js
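As a rough idea of what these helpers could look like, here is a sketch using only Node's built-in fs module. The function names come from the article, but their signatures and behavior are assumptions: for example, this readFileLineByLine reads the whole file synchronously and returns an array of lines, which may differ from the original.

```javascript
const fs = require('fs');

// Return a random integer between min and max, both inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Write any JSON-compatible value to a file, pretty-printed.
// (Assumed signature: file path first, data second.)
function saveToJson(filePath, data) {
  fs.writeFileSync(filePath, JSON.stringify(data, null, 2), 'utf8');
}

// Read a text file and return its non-empty lines, trimmed.
// (Assumed behavior: synchronous, returns an array of strings.)
function readFileLineByLine(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

For very large files a streaming reader would be a better fit, but a synchronous version keeps the sketch short.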

We will prepare two lists. The first one contains phrases that use an adjective, but with the adjective replaced by the placeholder “$”. So we have a list like this:

  • This picture looks $
  • Looks $, I love your pictures
  • I want to try that food, it looks $
  • It seems $

The second one is a list of adjectives that are synonyms of each other:

  • yummy
  • appetizing
  • tasty
  • delicious

… We then write a short program that combines all this information.
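A sketch of such a program, assuming it simply pairs every phrase with every adjective. It uses only the entries listed above, so the count it prints is smaller than the one reported for the full lists. The name generateAll is hypothetical.

```javascript
// Only the entries shown in the article; the full lists are longer.
const phrases = [
  'This picture looks $',
  'Looks $, I love your pictures',
  'I want to try that food, it looks $',
  'It seems $',
];
const adjectives = ['yummy', 'appetizing', 'tasty', 'delicious'];

// Build every combination of template and word.
function generateAll(templates, words) {
  const results = [];
  for (const template of templates) {
    for (const word of words) {
      results.push(template.replace(/\$/g, () => word));
    }
  }
  return results;
}

const sentences = generateAll(phrases, adjectives);
console.log(sentences.length); // 4 templates x 4 adjectives = 16
console.log(sentences[0]); // "This picture looks yummy"
```

The number of results is the product of the two list lengths, which is why adding entries to either list pays off quickly.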

When we run this program, we get this output:

generatingSentences.js output

As we can see, this is a poor result: we only have 44 results. Now we are going to improve our software by adding more entries to our lists, adding more lists, and mixing the lists together.

For now, we will add more lists.

This gives us 170 results, much better than before. Next, we are going to create randomly composed phrases by putting together all the phrases we have.

If we now combine the sentences with each other, the number of phrases grows dramatically. We will mix our sentences with the following code.

This code is capable of generating more than 5,500 unique sentences, which are saved in a file called sentences.txt.
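The exact mixing strategy is not shown here. One simple possibility, given purely as an assumption, is to join every ordered pair of distinct sentences and remove duplicates with a Set. The name mixSentences and the ". " separator are hypothetical.

```javascript
// Combine every ordered pair of distinct sentences into one longer comment.
// The originals are kept as well, and the Set removes any duplicates.
function mixSentences(sentences, separator = '. ') {
  const mixed = new Set(sentences);
  for (const first of sentences) {
    for (const second of sentences) {
      if (first !== second) {
        mixed.add(first + separator + second);
      }
    }
  }
  return [...mixed];
}

const base = ['It seems tasty', 'Looks yummy', 'This picture looks delicious'];
const mixed = mixSentences(base);
console.log(mixed.length); // 3 originals + 3 * 2 ordered pairs = 9
```

With this approach, n distinct sentences produce n * (n - 1) ordered pairs. The resulting array could then be written to sentences.txt, one sentence per line, with fs.writeFileSync.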

generating sentences

Now we are going to do some magic, but first we have to explain a term we have not covered yet: “Markov chains”.

Markov Chains

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.

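Explained in a more naive way, a Markov chain is the kind of algorithm behind simple predictive text, like the word suggestions you see while typing a WhatsApp message: the next word is proposed based on the previous one.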
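To make the definition concrete, here is a minimal word-level Markov chain written from scratch. It is a sketch for illustration only, not the titlegen package used later in this article, and all function names are hypothetical. The "state" is the current word, and the next word is drawn only from the words that followed it in the training sentences.

```javascript
// Build a table mapping each word to the list of words seen right after it.
// Repeated followers are kept, so frequent transitions are picked more often.
function buildChain(sentences) {
  const chain = new Map();
  const starts = [];
  for (const sentence of sentences) {
    const words = sentence.split(/\s+/).filter(Boolean);
    if (words.length === 0) continue;
    starts.push(words[0]);
    for (let i = 0; i < words.length - 1; i++) {
      if (!chain.has(words[i])) chain.set(words[i], []);
      chain.get(words[i]).push(words[i + 1]);
    }
  }
  return { chain, starts };
}

// Pick one random element from a non-empty array.
function pick(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Walk the chain: each next word depends only on the current word.
// Stops at maxWords or when the current word has no known follower.
function generate(model, maxWords = 12) {
  let word = pick(model.starts);
  const out = [word];
  while (out.length < maxWords && model.chain.has(word)) {
    word = pick(model.chain.get(word));
    out.push(word);
  }
  return out.join(' ');
}

const model = buildChain([
  'This picture looks yummy',
  'This food looks tasty',
  'It looks delicious',
]);
console.log(generate(model));
```

Because "looks" was followed by three different words in the training data, the output can be a sentence that never appeared in the input, such as "It looks yummy".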

We will use Markov chains to generate texts that go beyond what we would expect from our current data.

But we cannot do this with only the “data set” that we have generated so far. We have to add more text that does not belong to our “data set”. For this we are going to use Kaggle, a site where we can find data sets of many types.

We are going to download the “Amazon Fine Food Reviews” data set, which contains 568,454 reviews from 256,059 users on 74,258 products. The data set is available on Kaggle.

Data set “Amazon Fine Food Reviews” AmazonDataSet.txt

We will start by installing a package with npm that makes it easier for us to work with Markov chains:

npm install titlegen --save

We will create a simple function to be able to work with the titlegen package that we have installed.

markov.js

And we are ready to use it in our workflow.

Our workflow will be like this:

  • Read the Amazon data set
  • Read our data set
  • Filter the Amazon data set
  • Generate Markov chains with this new data

The Amazon data will be filtered to keep only the comments that contain certain adjectives. We will not use every result from the Amazon data set, only 3,500, so that the two data sets mix well.
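The filtering step could be sketched as follows, assuming the reviews have already been loaded as an array of strings. The function name filterReviews is hypothetical, and the sample reviews are made-up placeholders for illustration, not rows from the real data set.

```javascript
// Keep only reviews mentioning at least one adjective, up to a limit.
// Matching is a case-insensitive substring check.
function filterReviews(reviews, adjectives, limit = 3500) {
  const wanted = adjectives.map((adj) => adj.toLowerCase());
  const kept = [];
  for (const review of reviews) {
    if (kept.length >= limit) break;
    const text = review.toLowerCase();
    if (wanted.some((adj) => text.includes(adj))) {
      kept.push(review);
    }
  }
  return kept;
}

// Placeholder data, not taken from the Amazon data set.
const reviews = [
  'Tasty snack, would buy again',
  'Arrived late and the box was damaged',
  'Delicious coffee',
];
const ownSentences = ['This picture looks yummy'];

const filtered = filterReviews(reviews, ['tasty', 'delicious'], 3500);
const corpus = filtered.concat(ownSentences);
console.log(corpus.length); // 2 filtered reviews + 1 own sentence = 3
```

The combined corpus is what would then be fed to the Markov chain generator. Capping the number of reviews keeps the much larger Amazon data from drowning out our own sentences.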

markov workflow

If we run this script, we will get some automatically generated sentences. The text we obtain is not necessarily a mixture of the two data sets: sometimes the generated phrase is just a mixture of sentences from the Amazon data set.

Sentences by Markov (our data set + Amazon data set)

If we use only our own data set to generate phrases with Markov chains, we get more variety within our data set.

our dataset processed by MarkovChain

As you can see in the image above, the phrases we get are very diverse, much more so than our original data set. This has many more applications. I invite you all to try it with different phrases, poems, comments extracted from Reddit, etc.

If we test a data set of movie titles, the result can be very interesting.

movies markov

Some results are very curious: No Country for Vendetta, Raiders of Glory, Pirates of Oz, V - The Graduate.

Thank you very much for your time. I hope all of this is useful to you. The whole project is available on GitHub.

Esteve Segura

Hi there! I'm Esteve Segura, Software Engineer based in Barcelona. I work at Voicemod as Tech Lead. I love swimming and cycling.