I trained a text prediction algorithm with the lyrics of 1,796 bluegrass songs; here are some phrases it produced.
Recently, for no reason other than my own amusement, I created a Twitter bot called horse_bluegrass, which generates random text from a predictive text engine trained solely on the lyrics of 1,796 bluegrass, old-time, and classic country songs. The results are quite amusing: some sound like realistic lines that could be used in a song; others are a nonsensical mess. Interesting? Stupid? Nonsense? I’ll let you be the judge, but first I’d like to quickly introduce how the text gets generated.
The code (via mispy/twitter_ebooks) takes the text and parses it into individual words to build a model in which the algorithm knows the likelihood that one word will follow another or end a phrase. For instance, starting with the word “in”, it knows that a likely word to follow is “the”, “a”, or one of 43 other words. Say the algorithm goes with “the”, a choice driven partly by statistical likelihood and partly by randomness. It then chooses the next word after “the” using the same process, and so on until the algorithm decides the phrase should end. Once it has a complete phrase, it publishes the text to Twitter.
Note: I didn’t investigate this too much; however, I believe this is a Markov chain. I didn’t want to get too technical here, but I did want to give a quick overview of how the text is generated.
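For the curious, here is a minimal Python sketch of the word-by-word process described above. It is not the twitter_ebooks code (that library is written in Ruby, and I haven’t checked how it works internally); it is just a simple first-order Markov chain. The function names `train` and `generate`, the `END` marker, the `max_words` cap, and the three toy lines are all made up for illustration and are not taken from the real lyrics file.

```python
import random
from collections import Counter, defaultdict

END = None  # hypothetical sentinel meaning "the phrase ends here"


def train(lines):
    """Count, for every word, how often each other word (or END) follows it."""
    model = defaultdict(Counter)
    for line in lines:
        words = line.split()
        # Pair each word with its successor; the last word is paired with END.
        for current, following in zip(words, words[1:] + [END]):
            model[current][following] += 1
    return model


def generate(model, start, rng=random, max_words=20):
    """Walk the chain from `start`, choosing each next word in proportion to its count."""
    phrase = [start]
    while len(phrase) < max_words:
        followers = model.get(phrase[-1])
        if not followers:
            break  # word never seen during training
        choices = list(followers.keys())
        weights = list(followers.values())
        next_word = rng.choices(choices, weights=weights)[0]
        if next_word is END:
            break  # the chain decided the phrase is complete
        phrase.append(next_word)
    return " ".join(phrase)


# Toy stand-in for the lyrics file (invented lines, not from the real corpus).
toy_lines = ["in the hills", "in the valley", "in a cabin"]
model = train(toy_lines)
print(generate(model, "in"))
```

With these toy lines, “the” follows “in” twice and “a” follows it once, so “the” is picked about two-thirds of the time. The real bot does the same kind of thing with far more data, which is where the surprises come from.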
Once I had the text file, a whopping 1.3 megabytes and 37,887 lines, I trained the bot, set it to tweet every so often, sent the process into the background on my server, then scurried up to Harrisburg to watch The Travelin’ McCourys play some of that great human-generated bluegrass music.
To my delight, its first generated text was the following introduction, which to me sounds like something you’d hear on an old live Bill Monroe recording:
So far, the bot has produced phrases that touch quite well upon the subject matter of the lyrics it was trained on: love, loss, death, heartache, joy, religion, suffering, etc. It’s my hope that something from this will spark a song from a songwriter, or otherwise just give anyone insight into how random computer-generated content can still end up being profound.
Here are some of my favorites:
To continue the saga of horse_bluegrass lyrics, feel free to check out https://twitter.com/horse_bluegrass.
I’m going to leave it running and generating phrases (once an hour for now, though it will post less often in a few days).
If you have any questions or amusing ideas about this, feel free to respond here or hit me up on Twitter at @jwenerd.