Text Content Generation Based on Markov Chains: A Short Overview

AI will overtake journalism in the next few years

Text is part of our daily life. We read and write it while browsing Facebook, writing tweets on Twitter, sharing new websites on the social media platform Pinterest, or chatting on WhatsApp with loved ones. Text is part of our daily communication habits.

In our personal lives we create texts mainly to express our opinions to other people. Some people are professional writers who create texts for broader audiences: to inform, to give pleasure, to entertain. These professionals make a living from creating text for you and the world.

Professional journalists use different tools for research (Google, Bing, Yahoo, Twitter), writing (Microsoft Office, LibreOffice or Google Business Apps) and publishing (Twitter, Medium, Facebook Instant Articles). Some video content is distributed via live streaming on YouTube, Twitch or Facebook.

Beyond professional journalism, with its long hours of research, writing, proofreading and publishing on different platforms, there is (open source) software that can generate new text with little more than a few calls into a library. Today I will show you one of these tools: Markovify, written in Python 3 and available for free on GitHub. Markovify is a simple, extensible Markov chain generator. Its primary use is building Markov models of large corpora of text and generating random sentences from them.
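To give an idea of what such a generator does internally, here is a minimal word-level Markov chain written from scratch in plain Python. It is a sketch for illustration only, not Markovify's actual implementation; the function names build_chain and generate are hypothetical. The model maps each run of state_size consecutive words to the words that followed it in the corpus, and then produces new text by a random walk over that table.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each tuple of `state_size` consecutive words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain


def generate(chain, length=20, seed=None):
    """Random walk through the chain: pick a start state, then repeatedly sample a successor."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by another word
            break
        # Successors are stored with repetition, so frequent ones are picked more often
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)  # slide the window one word forward
    return " ".join(output)


corpus = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."
chain = build_chain(corpus, state_size=2)
print(chain[("sat", "on")])  # ['the', 'the']
print(generate(chain, length=12, seed=42))
```

Markovify builds on the same basic idea but adds sentence splitting, begin and end markers for each sentence, and a check that rejects generated sentences which overlap too much with the original text.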

Markovify is written entirely in the open source programming language Python and is as easy to use as this:

First step:

```bash
pip3 install markovify
```

Second step:

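```python
# Import the library into your Python program
import markovify

# Read the whole training data set (the text corpus) into one string;
# assumes the corpus file is UTF-8 encoded
with open("/home/basti/text_generation/text_corpus.txt", encoding="utf-8") as f:
    texte = f.read()

# Build the Markovify model
text_model = markovify.Text(texte)

# Print fifteen randomly generated sentences
for _ in range(15):
    # make_sentence() returns None if it fails to build a sufficiently original sentence
    print(text_model.make_sentence())
```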

Third step:

Play with the parameters of the library, for example the state size of the Markov model, to improve the quality of the generated text.
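The most important of these knobs is the state size, i.e. how many preceding words the chain conditions on; markovify.Text accepts it as state_size (default 2). The following stdlib-only sketch, with hypothetical helpers build_chain and branching_ratio, illustrates the trade-off on a toy corpus: the larger the state, the fewer states have more than one possible successor, so the output stays closer to the source text and becomes less varied.

```python
from collections import defaultdict


def build_chain(words, state_size):
    """Map each run of `state_size` words to the set of distinct words that follow it."""
    chain = defaultdict(set)
    for i in range(len(words) - state_size):
        chain[tuple(words[i:i + state_size])].add(words[i + state_size])
    return chain


def branching_ratio(chain):
    """Fraction of states with more than one possible successor (where the walk can diverge)."""
    branching = sum(1 for successors in chain.values() if len(successors) > 1)
    return branching / len(chain)


corpus = ("the cat sat on the mat . the dog sat on the rug . "
          "the cat saw the dog on the mat .").split()

ratios = {}
for state_size in (1, 2, 3):
    chain = build_chain(corpus, state_size)
    ratios[state_size] = branching_ratio(chain)
    print(f"state_size={state_size}: {len(chain)} states, "
          f"{ratios[state_size]:.0%} of them can branch")
# state_size=1: 9 states, 33% of them can branch
# state_size=2: 14 states, 29% of them can branch
# state_size=3: 18 states, 6% of them can branch
```

In Markovify itself you would set this via markovify.Text(texte, state_size=3); larger values generally need a larger corpus to still produce novel sentences. To cap sentence length, the model also offers make_short_sentence(280).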

The German website ArtikelSchreiber.com currently uses several NLP tools, such as Gensim, spaCy, TextBlob and Newspaper, to create German text articles on the fly, with long texts preferred over short articles. The project grew out of my Python studies and my interest in natural language processing and text generation, which goes back more than 15 years. Currently Artikel Schreiber can only create German texts, but it is implemented in a way that makes it easy to add support for new languages.
The German SaaS tool Artikel Schreiber (www.artikelschreiber.com/) is a free German-language equivalent of Articoolo, which bills you a dollar per created article.

Future plans and the timeline for the German SaaS content creation tool:
- March 2018: Release of an English version of Artikel Schreiber that creates English text articles free of charge
- May 2018: Creation of a dead-simple REST API that allows Artikel Schreiber to be integrated into WordPress plugins, stand-alone tools or your own projects
- End of 2018: Support for French, Italian & Spanish texts and articles