Quick timeline catchup
In 2018, Google open-sourced BERT (Bidirectional Encoder Representations from Transformers), which became quite a benchmark in NLP. Its large variant has 340M parameters. Later, in 2019, OpenAI released GPT-2, whose largest version has 1,500M (1.5B) parameters. Then, in 2020, OpenAI announced GPT-3, the third-generation language prediction model in the GPT-n series. It has 175B parameters. Yeah, you heard it right 🤯!! That means a hell of a lot of computation and training data. If you want to see some cool experiments with GPT-3, check out the link.
So how do WE leverage these colossal networks for experiments on our own custom data, given that pre-training them from scratch would be computationally expensive? The answer is transfer learning: start from the pre-trained weights and fine-tune them on a small, task-specific dataset.
Building an AI lyricist with GPT-2
I’m a big fan of retro Hindi songs, especially those of Mohammed Rafi, Kishor Kumar, and Mukesh Ji. Every time I listen, I wonder how the words in the songs (tokens 😁) intertwine, how the lines rhyme, and how the stanzas repeat so perfectly… that is the artistic skill of lyricists and singers, after all. And as an NLP practitioner, intent detection and contextual information extraction from textual data are my day-to-day work. So I thought: how about letting a well-versed model like GPT-2 learn this lyric-weaving task and write some lyrics for us? Sounds cool!? Okay, let’s take a shot at it.
1. Data preparation: For this, I crawled the internet. I took lyrics in Hinglish, mostly from Kishor Kumar, Mohammed Rafi, and Mukesh Kumar’s songs (both mixed and singles). The training set was small, approximately 2,500 samples, but that’s the point: I just wanted to see how well GPT-2 can generalize from this small amount of custom data. Below is the data frame after crawling.
After this, I did some preprocessing (obviously) to reduce noise, and then created two text files, train.txt and eval.txt, each storing one sample per line.
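The file creation step can be sketched roughly as follows. This is a minimal illustration, not the original preprocessing: the `samples` list is a hypothetical stand-in for the crawled lyrics, the helper names are made up for this sketch, and the 80/20 ratio is the split mentioned in the evaluation step.

```python
import random


def split_lyrics(samples, eval_fraction=0.2, seed=42):
    """Shuffle the samples and split them into (train, eval) lists."""
    # Collapse internal whitespace/newlines so each sample fits on one line.
    cleaned = [" ".join(s.split()) for s in samples]
    cleaned = [s for s in cleaned if s]  # drop samples that became empty
    rng = random.Random(seed)            # fixed seed for a repeatable split
    rng.shuffle(cleaned)
    n_eval = int(len(cleaned) * eval_fraction)
    return cleaned[n_eval:], cleaned[:n_eval]


def write_lines(path, lines):
    """Write one sample per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Hypothetical placeholder data (NOT the crawled lyrics).
samples = [f"lyric sample number {i}" for i in range(10)]
train, eval_ = split_lyrics(samples)
write_lines("train.txt", train)
write_lines("eval.txt", eval_)
```

Shuffling before splitting keeps songs from one singer from ending up only in the evaluation file.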
2. Modelling & Training: SimpleTransformers (built on top of Huggingface’s Transformers library) is the savior here. It abstracts away much of the boilerplate and speeds up working with those big NLP models, and that speed is needed when you want to focus more on research than on development (’cause it handles ’em all). I fine-tuned the GPT-2 model on my custom Hinglish data with just a few lines of code.
Below is the training info:
a. Environment & Infra: Google Colab Notebook
b. Epochs: 5 (initially I set 10 epochs, but the Colab notebook kernel kept crashing; if you’re a data scientist or ML engineer, you get it, right!? 😜)
c. Architecture: gpt2-medium
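A fine-tuning run with these settings might look something like the sketch below. Only the architecture (gpt2-medium), the epoch count, and the train.txt/eval.txt file names come from this post; the batch size, output directory, and other arguments are assumptions, and the original script may have differed.

```python
# Values from the training info above.
MODEL_TYPE = "gpt2"
MODEL_NAME = "gpt2-medium"

train_args = {
    "num_train_epochs": 5,
    "mlm": False,                  # GPT-2 is a causal LM, so no masked-LM objective
    "dataset_type": "simple",      # each line of the text file is one sample
    "train_batch_size": 4,         # assumption: small batch to fit Colab GPU memory
    "output_dir": "outputs/",      # assumption: default-style output folder
    "overwrite_output_dir": True,
    "reprocess_input_data": True,
}


def fine_tune(train_file="train.txt", eval_file="eval.txt", use_cuda=True):
    """Fine-tune GPT-2 on the lyrics and return the evaluation results."""
    # Imported lazily so the settings above can be inspected without the library.
    from simpletransformers.language_modeling import LanguageModelingModel

    model = LanguageModelingModel(
        MODEL_TYPE, MODEL_NAME, args=train_args, use_cuda=use_cuda
    )
    model.train_model(train_file, eval_file=eval_file)
    return model.eval_model(eval_file)
```

Calling `fine_tune()` in a GPU-backed notebook starts the run; lowering `train_batch_size` is one common way to reduce memory pressure if the kernel crashes.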
3. Evaluation: It took 1 hr for the model to fine-tune on that small dataset. I did an 80/20 split between training and testing. The library created an eval_results.txt file at the end of the evaluation process. And whoa!!! The numbers look pretty decent for such a small dataset:
eval_loss = 3.159255956494531
perplexity = tensor(11.5531)
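For reference, perplexity is commonly defined as the exponential of the mean cross-entropy loss (in nats). The sketch below shows only that textbook relationship; how the library derived the two figures above is not stated here, so this is not a description of that particular run.

```python
import math


def perplexity(mean_cross_entropy_loss):
    """Perplexity as exp(mean cross-entropy loss), with the loss in nats."""
    return math.exp(mean_cross_entropy_loss)


def loss_from_perplexity(ppl):
    """Inverse of the relationship above."""
    return math.log(ppl)


eval_loss = 3.159255956494531
print(f"perplexity for this loss: {perplexity(eval_loss):.2f}")
print(f"loss for perplexity 11.5531: {loss_from_perplexity(11.5531):.3f}")
```

Under this definition a loss of about 3.16 corresponds to a perplexity of roughly 23.5, and a perplexity of 11.5531 to a loss of about 2.45, so the two reported values may come from different evaluation passes.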
4. Inferencing & Serving: This step is not strictly needed, but it is good practice to present your machine learning results in a web app UI. For that, I used Streamlit (my go-to lib as an NLP practitioner). Let’s see how our lyricist writes Hindi songs 😋
Note: I’ve set the max_length attribute to 200 here.
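A bare-bones Streamlit front end for the fine-tuned model could look roughly like this. Only max_length=200 comes from the note above; the model directory, widget labels, and function names are assumptions for illustration, not the original app.

```python
# app.py: start with `streamlit run app.py`
try:
    import streamlit as st
except ImportError:  # allows the file to load where Streamlit is not installed
    st = None

GENERATION_ARGS = {"max_length": 200}  # value from the note above
MODEL_DIR = "outputs/"                 # assumption: where the fine-tuned model was saved


def load_model():
    """Load the fine-tuned GPT-2 model for text generation."""
    # Imported lazily because loading the library and weights is slow.
    from simpletransformers.language_generation import LanguageGenerationModel

    return LanguageGenerationModel(
        "gpt2", MODEL_DIR, args=GENERATION_ARGS, use_cuda=False
    )


def main():
    st.title("AI Lyricist")
    prompt = st.text_input("Start the song with a few words")
    # Generate only after the user has typed a prompt and clicked the button.
    if st.button("Write lyrics") and prompt:
        model = load_model()
        for song in model.generate(prompt):
            st.write(song)


if st is not None:
    main()
```

This version reloads the model on every click, which keeps the sketch short; a real app would likely cache the loaded model between interactions.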
As you can see, the AI wrote some songs for you; some are really interesting and some are very weird as well. But I think it did a pretty decent job on that small dataset. So now all I need is a musician to compose these AI-generated Hindi songs 😎
So this was my weekend experiment, and it really showed the power of Artificial Intelligence in learning cognitive skills. After these results, I’m thinking of exploring these models further. I would be very happy if AI wrote some Greek philosophy ;)
Thanks for reading. Any feedback or suggestions would be welcome. Enjoy the day!