Generating text automatically for social networks with algorithms and Markov chains
Nowadays, artificial intelligence is very advanced and has a large community behind it… but it is not always the best approach for specific tasks, considering the time a given project requires.
We are going to create comments for social networks on specific topics. In the example I am going to show, we will generate comments for Instagram food photos.
Conditions we want to impose on our system
- We want to use JavaScript
- Comments must be short
- They should talk about food
- We do not want to use artificial intelligence
Making your own phrase generator
To start, we will create a function that lets us build phrases programmatically by replacing certain words in a template. Let’s look at an animated example.
This may seem very simple, but it will help us later; having a dictionary we generate ourselves is very useful. Let’s look at the code for this function.
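A minimal sketch of such a function might look like this (the names pickRandom and fillTemplate are just illustrative, not taken from the project):

```js
// Picks a random element from an array.
function pickRandom(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Replaces the "$" placeholder in a template with a random word from a list.
function fillTemplate(template, words) {
  return template.replace('$', pickRandom(words));
}

module.exports = { pickRandom, fillTemplate };
```

For example, fillTemplate('This picture looks $', ['yummy', 'tasty']) might return “This picture looks yummy”.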
We will also create a utils.js file to hold some helper functions: randomInt(), saveToJson(), readFileLineByLine(), etc.
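The exact contents of utils.js are not shown here, but a minimal version consistent with those helper names could be:

```js
// utils.js — small helpers used throughout the project (sketch)
const fs = require('fs');

// Random integer between min (inclusive) and max (exclusive).
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min)) + min;
}

// Serializes any value to a JSON file.
function saveToJson(path, data) {
  fs.writeFileSync(path, JSON.stringify(data, null, 2));
}

// Reads a text file and returns its non-empty lines as an array.
function readFileLineByLine(path) {
  return fs
    .readFileSync(path, 'utf8')
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

module.exports = { randomInt, saveToJson, readFileLineByLine };
```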
We will prepare two lists. The first contains phrases that use an adjective, with the adjective replaced by “$”, so we have a list like this:
- This picture looks $
- Looks $, i love your pictures
- I want to try that food, it looks $
- It seems $
On the other hand, we will create a list of adjectives that are synonyms of each other:
- yummy
- appetizing
- tasty
- delicious
We put together a simple script with all this information, sketched below.
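A sketch of that script, using only the templates and adjectives quoted above (the real lists are longer, so the counts mentioned below will differ), could be:

```js
// generate.js — pair every template with every adjective (sketch)
const templates = [
  'This picture looks $',
  'Looks $, i love your pictures',
  'I want to try that food, it looks $',
  'It seems $',
];

const adjectives = ['yummy', 'appetizing', 'tasty', 'delicious'];

const comments = [];
for (const template of templates) {
  for (const adjective of adjectives) {
    comments.push(template.replace('$', adjective));
  }
}

comments.forEach((comment) => console.log(comment));
```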
And when we run this program, we get our first generated comments as output.
As we can see, this is a poor result: we only get 44 sentences. But this is where we start improving our software. Let’s add more entries to our lists, add more lists, and mix the lists together.
For now, we will add more lists.
This gives us 170 results, much better than before. Now we are going to create randomly composed phrases by combining all the phrases we have.
If we mix the sentences with each other, the number of phrases multiplies quickly. We will mix our sentences with each other using code like the following.
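The mixing step can be sketched like this (the comments array stands in for the full list generated in the previous step):

```js
// mix.js — combine the generated comments in pairs to multiply the variations (sketch)
const fs = require('fs');

// A few placeholder comments; in the real project this is the full generated list.
const comments = [
  'This picture looks yummy',
  'I want to try that food, it looks delicious',
  'It seems tasty',
];

const mixed = new Set();
for (const first of comments) {
  for (const second of comments) {
    if (first !== second) {
      // Join two short comments into one longer, unique sentence.
      mixed.add(`${first}, ${second}`);
    }
  }
}

fs.writeFileSync('sentences.txt', [...mixed].join('\n'));
console.log(`${mixed.size} unique sentences saved to sentences.txt`);
```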
This code can generate more than 5,500 unique sentences, which are saved to a .txt file called sentences.txt.
Now we are going to do some magic, but first we have to explain a term not yet mentioned in this article: Markov chains.
Markov Chains
Explained in a naive way, a Markov chain is the kind of algorithm WhatsApp uses for its predictive text: the next word is chosen based only on the current word, using probabilities learned from text it has already seen.
We will use Markov chains to generate texts beyond what we would expect from our current data.
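To make the idea concrete, here is a toy word-level Markov chain written from scratch, purely to illustrate the concept (it is not the code used later in this article):

```js
// markov-toy.js — record which words follow each word, then walk the chain at random
function buildChain(text) {
  const chain = {};
  const words = text.split(/\s+/);
  for (let i = 0; i < words.length - 1; i++) {
    const current = words[i];
    if (!chain[current]) chain[current] = [];
    chain[current].push(words[i + 1]); // remember every word seen after "current"
  }
  return chain;
}

function generate(chain, start, maxWords) {
  let word = start;
  const result = [word];
  for (let i = 0; i < maxWords; i++) {
    const followers = chain[word];
    if (!followers) break; // no known follower, stop here
    word = followers[Math.floor(Math.random() * followers.length)];
    result.push(word);
  }
  return result.join(' ');
}

const chain = buildChain('the food looks tasty and the food looks delicious');
console.log(generate(chain, 'the', 6)); // e.g. "the food looks delicious"
```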
But we cannot do this with the data set we have generated so far; we need to add text that does not belong to it. For that we are going to use Kaggle, a site where we can find data sets of many kinds.
We are going to download the “Amazon Fine Food Reviews” data set, containing 568,454 reviews from 256,059 users on 74,258 products. You can access it by clicking here.
We will start by installing an npm package that will make it easier for us to work with Markov chains:
npm install titlegen --save
We will create a simple function to work with the titlegen package we have just installed.
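A possible version of that function, assuming the create()/feed()/next() interface that titlegen documents, is:

```js
// markov.js — small wrapper around titlegen (sketch)
const titlegen = require('titlegen');

// Feeds an array of sentences into a titlegen generator and
// returns "count" new sentences built from its Markov chains.
function generateSentences(sentences, count) {
  const generator = titlegen.create();
  generator.feed(sentences);

  const results = [];
  for (let i = 0; i < count; i++) {
    results.push(generator.next());
  }
  return results;
}

module.exports = { generateSentences };
```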
And we are ready to use it in our workflow.
Our workflow will be like this:
- Read the amazon data set
- Read our data set
- Filter the Amazon data set
- Generate markov chains with this new data
The Amazon data will be filtered to keep only the reviews that contain certain adjectives, and we will not use everything: we will take only 3,500 reviews, so that the two data sets mix well. A sketch of this workflow is shown below.
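Assuming the helpers sketched earlier (utils.js and the titlegen wrapper) and the Amazon reviews exported as one review text per line, the workflow could look roughly like this:

```js
// workflow.js — read both data sets, filter Amazon, generate new sentences (sketch)
const { readFileLineByLine } = require('./utils');
const { generateSentences } = require('./markov');

const adjectives = ['yummy', 'appetizing', 'tasty', 'delicious'];

// 1. Read the Amazon reviews (file name and format are assumptions).
const amazonReviews = readFileLineByLine('amazon-fine-food-reviews.txt');

// 2. Read our own generated data set.
const ourSentences = readFileLineByLine('sentences.txt');

// 3. Keep only Amazon reviews that mention one of our adjectives,
//    capped at 3500 so both data sets stay roughly balanced.
const filtered = amazonReviews
  .filter((review) => adjectives.some((adj) => review.toLowerCase().includes(adj)))
  .slice(0, 3500);

// 4. Feed everything to the Markov generator and print some results.
const generated = generateSentences([...ourSentences, ...filtered], 20);
generated.forEach((sentence) => console.log(sentence));
```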
If we run this script, we will get some self-generated sentences. The text we obtain is not necessarily an exact mixture of the two data sets; sometimes a generated phrase may be built only from sentences in the Amazon data set.
If we feed our own data set into the Markov generator, we get more variety than our data set originally had.
As you can see, the phrases we get are very diverse, much more so than our original data set. This has many more applications; I invite you to try different phrases, poems, comments extracted from Reddit, and so on.
If we try a data set of movie titles, the results can be very interesting.
Some results are very curious … No Country for Vendetta, Raiders of Glory, Pirates of Oz, V - The Graduate.
Thank you very much for your time, and I hope all of this is useful to you… The whole project is available on GitHub.