Machine Desi Hip Hop: A Fun Transliteration Experiment with RNN


There have been a lot of exciting developments in the use of Recurrent Neural Networks (RNNs) lately. Andrej Karpathy’s post on The Unreasonable Effectiveness of Recurrent Neural Networks was followed by a lot of cool experiments, among them TEDRnn, Find Your Dream Job, RNN Bible and DrumpfRNN. All of them make use of LSTM, a special kind of RNN that can connect previous information to the present task even when the gap between the relevant information and the place of prediction is large. You can find more about LSTMs in these amazing blogs written by Christopher Olah & Nikhil Buduma.

I was aware that LSTMs are pretty popular for generating text that sounds grammatically accurate, albeit not very meaningful. This made me wonder: would they be any good for transliteration? I had to scratch the itch. Having been heavily influenced by Bollywood, a monkey in my head wanted to try it on Desi Hip-Hop party songs, probably because it thinks most of the lyrics are meaningless anyway. Will non-native speakers figure out right away that it is gibberish? Maybe not.

Transliteration is the conversion of text from one script to another. For instance, the English transliteration of the Hindi term ‘नमस्ते’ is ‘NAMASTE’, while its translation is ‘HELLO’.

I managed to scrape just over 100 song lyrics, transliterated into English, by various Bollywood hip-hop artists. The data consists of only about 157,000 characters (certainly not a huge dataset). In case you are not very familiar with this genre, most of the songs, although written in Hindi, are heavily influenced by Punjabi, with many English terms creeping in every now and then.

The model generates lyrics by predicting one character after another, and this is where the LSTM’s long-term context memory comes into play. The LSTM model was built using Keras with Theano as the backend. I rented a GPU-based AWS g2x instance (with a GRID K520) and ran about 120 iterations over the data, which took approximately 6 hours. I also tried running the algorithm on my local machine with 4 cores and compared the time taken for each epoch: it was approximately 10 times slower than the GPU.
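
To make the "one character after another" setup concrete, here is a minimal sketch of how a lyrics corpus could be cut into training examples for a character-level model. It is not the code used for this experiment: the function names, the window of 20 characters and the step of 3 are assumptions, and the tiny corpus is just a stand-in taken from the generated output further down.

```python
def build_training_windows(text, window=20, step=3):
    """Slice text into fixed-length inputs plus the character that follows each.

    window and step are illustrative assumptions, not values from the post.
    """
    chars = sorted(set(text))
    char_to_index = {c: i for i, c in enumerate(chars)}
    inputs, targets = [], []
    for start in range(0, len(text) - window, step):
        inputs.append(text[start:start + window])   # what the model sees
        targets.append(text[start + window])        # what it must predict
    return chars, char_to_index, inputs, targets


def one_hot(sequence, char_to_index):
    """Encode a string as a list of one-hot rows, one row per character."""
    size = len(char_to_index)
    rows = []
    for c in sequence:
        row = [0] * size
        row[char_to_index[c]] = 1
        rows.append(row)
    return rows


# Stand-in corpus; the real data set had about 157,000 characters.
corpus = "mere maan rati, mera naal te ni labni\nmere laal na waalh kao\n"
chars, char_to_index, inputs, targets = build_training_windows(corpus)
encoded = one_hot(inputs[0], char_to_index)
```

Each one-hot encoded window is one input sequence and the matching target is the label, so the network only ever learns "given these 20 characters, which character comes next".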

I set the seed text (required to start text generation) to something like “Ish your boy Ierr”, hoping that the model would pick up some rapper’s pattern and generate something interesting. The model started learning, and for each iteration it generated a set of lyrics, character by character. I ignored the first few iterations since the output was undecipherable. I waited and checked the outputs after 30 iterations, then 60 and then 100, only to find garbage text that resembled no language at all. I began to wonder whether LSTMs are really not that good for transliteration or whether I was doing something funny.
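
The character-by-character generation described above can be sketched as a simple loop. This is a hedged illustration rather than the actual script: `generate`, `sample_index` and `uniform_predictor` are hypothetical names, the predictor is a stand-in for a trained LSTM, and sampling from the predicted distribution (instead of always taking the most likely character) is an assumption.

```python
import random


def sample_index(probabilities, rng):
    """Draw one index according to the given probability distribution."""
    threshold = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if threshold < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point rounding


def generate(predict_next, chars, seed, length, window=20, rng=None):
    """Extend seed by `length` characters, one predicted character at a time."""
    rng = rng or random.Random()
    generated = seed
    for _ in range(length):
        context = generated[-window:]            # most recent characters only
        probabilities = predict_next(context)    # one probability per character
        generated += chars[sample_index(probabilities, rng)]
    return generated


def uniform_predictor(chars):
    """Hypothetical stand-in for a trained model: every character equally likely."""
    def predict_next(context):
        return [1.0 / len(chars)] * len(chars)
    return predict_next


seed = "Ish your boy Ierr"
chars = sorted(set(seed))
lyrics = generate(uniform_predictor(chars), chars, seed, length=40,
                  rng=random.Random(0))
```

With a real model, `predict_next` would encode the context, run it through the network and return the output probabilities; everything else in the loop stays the same.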

Sure enough, it was the latter. I realized that the seed given to the model contained only English words, whereas the already sparse training data consisted mainly of Hindi/Punjabi words with very few English terms. I decided to give it one more try with a randomly chosen seed (of 20 characters) from the training data. As expected, the first few iterations had results like:

Iteration 1: Seed: “r rani, teri jawani”
 r rani, teri jawani
 e r tuit
 uioner b6 hltuh
 a¸aned ieyeuaoutuvd t. dua.s m6 pa irvendnaij
 n epneaalwa. btt raamqe m kdhinaa

d ynea ahhd a teaa
 iapa ch
 ao nhaoeyant o i imo
 ih h lihl mioy
 ¢ i
 t ?t
 urkho?,s ny a a .h s u bra_y
 n ga c l ecmo
 aa np ghdoeuaana oimarhaq, aat
 egooanoyhnaaa rutan.hto srd
 ie e a
 unojo efmadkevenh a
 wpoadao mwruuh
 animy soht godedeiaasu2ouaaaakzm tubehmee oagk
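
Drawing a random 20-character seed from the training data, as described above, takes only a few lines. The sketch below is an assumption about how it could be done; `pick_seed` is a hypothetical helper and the corpus is a small stand-in for the scraped lyrics.

```python
import random


def pick_seed(text, length=20, rng=None):
    """Return a random contiguous slice of `length` characters from text."""
    rng = rng or random.Random()
    if len(text) < length:
        raise ValueError("corpus is shorter than the requested seed length")
    start = rng.randrange(len(text) - length + 1)
    return text[start:start + length]


# Stand-in corpus built from lines that appear in this post.
corpus = "ata been mujhko toh na saanu jaane lakh te hai\nte hain bab saate hai\n"
seed = pick_seed(corpus, length=20, rng=random.Random(42))
```

Because the seed is cut from the corpus itself, it is guaranteed to use only characters and letter patterns the model has seen during training.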

I checked the results after the 10th iteration; still nothing looked distinctly like Hindi. It was at the 19th iteration that I could see a ray of hope. The text did sound very much like a Hindi/Punjabi sentence and, as expected, it made absolutely no sense.

Iteration 19: Seed: “ata been mujhko toh”

ata been
mujhko toh na saanu jaane lakh te hai
te hain bab saate hai
aas konge gara pe gava paana hai

aaja je baban meri hoos
mera baal laghi hai
pehi paar te peeth pata loogl na chaladi
ka sunda mere saan da main te saani naar gaana main taa karang

mere maan rati, mera naal te ni labni
mere laal na waalh kao
tere yaar main taan pakh li lai laaye
ni chaalo te lo khar de sapte ho saaye
na lo tu main to

As the iterations progressed, the output sounded more and more like what was intended. But, as noticed earlier, for iterations whose seed consisted largely of English terms, the words in the generated lyrics were difficult to read. The outputs of all the iterations can be found here. Having seen the results, and considering that this was run on a scanty dataset, it seems reasonable to say that LSTMs could very well be used for transliteration problems.

The generated lyrics were then taken one step further. A few rhyming couplets (in aabb and abab form) were randomly chosen and manually ordered. I now had machine-generated Desi Hip-Hop song lyrics ready. To do justice to the lyrics, I collaborated with some talented people and came up with a Desi Hip-Hop music video. Don’t understand the lyrics? Don’t panic, nobody does…!
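
The couplets for the song were picked and ordered by hand, but a rough shortlist of rhyming lines could also be produced automatically. The sketch below is only an illustration of that idea, not the method used here: it treats two lines as rhyming when their last words share the same final two letters, which is a crude assumption, and `rhyme_key` and `group_by_rhyme` are hypothetical helpers. The four sample lines are taken from the finished lyrics.

```python
from collections import defaultdict


def rhyme_key(line, suffix_length=2):
    """Return the last few letters of the line's final word, or '' if none."""
    words = [w.strip(".,!?'\"") for w in line.lower().split()]
    words = [w for w in words if w]
    return words[-1][-suffix_length:] if words else ""


def group_by_rhyme(lines, suffix_length=2):
    """Group lines by rhyme key, keeping only keys shared by two or more lines."""
    groups = defaultdict(list)
    for line in lines:
        key = rhyme_key(line, suffix_length)
        if key:
            groups[key].append(line)
    return {key: found for key, found in groups.items() if len(found) >= 2}


lines = [
    "naal ni bhaam bara dhoon",
    "hai aan hooj shaub de khoon",
    "hi sanna tunne saar laar hai",
    "mundeya de saad bab yehar hai",
]
rhymes = group_by_rhyme(lines)
```

Here the first two lines end up grouped under "on" and the last two under "ai", which matches the aabb pattern of that verse. A human ear is still needed to decide which candidates actually sound good together.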

Machine Desi Hip-Hop Music Video

Machine Desi Hip Hop Lyrics:

main hoon meri naat hai
 mere saa mujhe di lagaar
 hai manna se magaa jaaya
 jam san meri gala maar

disco vich ghaa pe gaya
 disco vich ghaa pe gaya
 disco vich ghaa pe gaya
 disco vich ghaa pe gaya

bad te mere gee main sebaati
 hai apna dil tohaaroor beti
 par jaane weh laalu kich jar
 oh seakh i’ta saapa raho kar
 mera baa mera naal mera jaan nahi naar
 nachre challe nikh tere nari tere jaar
 sboni main tune mainu na baan hai
 meri gal meri nahi rani da jaan hai

naal ni bhaam bara dhoon
 hai aan hooj shaub de khoon
 hi sanna tunne saar laar hai
 mundeya de saad bab yehar hai

choda co paari hai, palle khila
 apre kara hoon kishi jide gila
 khwaban vich khona ni, kinni raata soya ni
 tere ghar ke hoya ni

kubi keri annihon phori na choon
 niscon te boon la di kon ku joon
 rako kar de vod mera naah, nahi,
 dal ee lyee kichune saan hai nahi

khir tere choori nachre phoom lega saye
 chaki samriyaale kehon lond, nakhaye
 mujhke kudiya makhle ni kare ishare
 nakhre dikhaave ji main ke main laare

disco vich ghaa pe gaya
 disco vich ghaa pe gaya
 disco vich ghaa pe gaya
 disco vich ghaa pe gaya

The model overfit the line ‘disco vich ghaa pe gaya’ from the training dataset (a line from an existing song, ‘Take your sandals off’), which is why it kept popping up repeatedly in the generated output.

This write-up was originally published here.