I trained a text prediction algorithm with the lyrics of 1,796 bluegrass songs; here are some phrases it produced.

Recently, for no reason other than my own amusement, I created a Twitter bot called horse_bluegrass that generates random text from a predictive text engine trained solely on the lyrics of 1,796 bluegrass, old-time, and classic country songs. The results are quite amusing: some sound like realistic lines that could be used in a song; others are a nonsensical mess. Interesting? Stupid? Nonsense? I’ll let you be the judge, but first I’d like to quickly introduce how the text gets generated.

The code (via mispy/twitter_ebooks) takes the text and parses it into individual words to build a model in which the algorithm knows the likelihood that one word will follow another or end a phrase. For instance, starting with the word “in”, it knows that a likely word to follow will be “the”, “a”, or any of 43 other words. The algorithm goes with “the” through a mix of statistical likelihood and randomness. It then chooses the next word after “the” using the same process, and so on until the algorithm decides the phrase should end. Once it has a complete phrase, it publishes the text to Twitter.

Note: I didn’t investigate this too much; however, I believe this is a Markov chain. I also didn’t want to get too technical here, but I did want to give a quick overview of how the text is generated.
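For the curious, here is a minimal Python sketch of the word-by-word process described above. It is not the twitter_ebooks code (which is written in Ruby and does more than this); the function names `build_model` and `generate` and the three toy lines are made up for illustration and are not taken from the training file. Each word maps to a list of every word seen right after it, so a word that follows more often is more likely to be picked, and a special end marker lets the model decide when a phrase should stop.

```python
import random
from collections import defaultdict

END = None  # sentinel meaning "the phrase ends here"


def build_model(lines):
    """Map each word to the list of words (or END) seen right after it.

    Repeats are kept in the list on purpose: a follower that appears
    three times is three times as likely to be drawn later.
    """
    model = defaultdict(list)
    starts = []  # first word of every line, used to begin a phrase
    for line in lines:
        words = line.split()
        if not words:
            continue
        starts.append(words[0])
        for current, following in zip(words, words[1:] + [END]):
            model[current].append(following)
    return model, starts


def generate(model, starts, rng=random, max_words=30):
    """Walk the model one word at a time until END is drawn."""
    word = rng.choice(starts)
    phrase = [word]
    while len(phrase) < max_words:
        following = rng.choice(model[word])
        if following is END:
            break
        phrase.append(following)
        word = following
    return " ".join(phrase)


# Toy input (hypothetical lines, not from the real lyrics file).
toy_lines = ["in the hills", "in a cabin", "in the valley"]
model, starts = build_model(toy_lines)
print(generate(model, starts))
```

In this toy model, “in” is followed by “the” twice and “a” once, so “the” is the more likely pick, which is the same idea as the “in” example above, just on a much smaller scale.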

To get the training text, I wrote a web scraper that took all the songs from http://www.bluegrasslyrics.com/ and wrote each song title and its lyrics into a single text file.
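The scraper itself isn’t shown here, but the “title plus lyrics into one text file” step might look roughly like the Python sketch below. Everything about the page layout is an assumption: it pretends each song page puts the title in an `<h1>` tag and the lyrics in a `<pre>` tag, which may not match the real site at all, and `SongParser`, `song_to_text`, and `build_corpus` are hypothetical names. It only parses HTML strings you hand it; fetching the pages is left out.

```python
from html.parser import HTMLParser


class SongParser(HTMLParser):
    """Collect text inside <h1> (title) and <pre> (lyrics).

    Both tag choices are assumptions about the page layout.
    """

    def __init__(self):
        super().__init__()
        self.title = []
        self.lyrics = []
        self._target = None  # list currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._target = self.title
        elif tag == "pre":
            self._target = self.lyrics

    def handle_endtag(self, tag):
        if tag in ("h1", "pre"):
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._target.append(data)


def song_to_text(html):
    """Return one song as 'title, newline, lyrics, newline'."""
    parser = SongParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title).strip()
    lyrics = "".join(parser.lyrics).strip()
    return f"{title}\n{lyrics}\n"


def build_corpus(pages):
    """Join every song into a single block of training text."""
    return "\n".join(song_to_text(page) for page in pages)
```

Writing the result of `build_corpus` to disk would then give a single text file like the one described above.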

Once I had the text file, a whopping 1.3 megabytes and 37,887 lines, I trained the bot, set it to tweet every so often, sent the process into the background on my server, then scurried up to Harrisburg to watch The Travelin’ McCourys play some of that great human-generated bluegrass music.

To my delight, its first generated text was the following introduction, which to me sounds like something you’d hear on an old live Bill Monroe recording:

*ahem*.. mic drop..

So far, the bot has produced phrases that touch quite well on the subject matter of the lyrics it was trained on: love, loss, death, heartache, joy, religion, suffering, and so on. It’s my hope that something from this will spark a song from a songwriter, or at least give someone insight into how random computer-generated content can still end up being profound.

Here are some of my favorites:

that sounds fun!
classic subject of old timey love
this could actually belong in a gospel song
hell yeah horse_bluegrass, you are the man!
not sure what this means, but damn it sounds cool
yeah… that always seems to happen in old songs
of course we can’t
so sad
more sadness
found this one really funny due to the change in frequency
more crying
quite hilarious mashup between two songs!
just let it be known..
more sadness
sadness with a weird twist ending
combine this with top one and it ends up kinda happy
I feel this phrase happens all the time in old songs
a nice twist on Long Black Veil
this is just a lovely phrase
um
uhhh.. no comment
some happiness!
loud music!
more loving gospelgrass
Tillie has to want to be happy
all over
has the makings of a good song

To continue the saga of horse_bluegrass lyrics, feel free to check out https://twitter.com/horse_bluegrass .

I’m going to leave it running and generating phrases (once an hour for now, though I’ll make it less frequent in a few days).

If you have any questions or amusing ideas about this, feel free to respond here or hit me up on Twitter at @jwenerd.

