Dank Learning: Making Memes with Deep Learning

Jayvardhan Rathi
Published in SyntechX
3 min read · Feb 7, 2020
A meme generated by a Deep Learning Model

Memes: who hasn’t heard of ’em? If you have been living under a rock for the past decade, lemme explain. A meme is an image with a caption that conveys a humorous situation, a pop culture reference, or any idea in general. A meme is “dank” when it makes a lot of people laugh, hence the name Dank Learning.

A new breakthrough has been made by a pair of Stanford researchers, Abel L. Peirson V and E. Meltem Tolunay. They have built a system that automatically generates memes like the one shown above. According to them, their model “produces original memes that cannot on the whole be differentiated from real ones.”

When Peirson ran the model with his own face as the input image.

The system uses a pre-trained Inception-v3 network to turn the picture into an image embedding, which is then passed to a long short-term memory (LSTM) model that produces a caption applicable to that particular picture.

So what is an Inception-v3 network?

It is a convolutional neural network (CNN), a class of deep neural networks most commonly used to analyze visual imagery. It is trained on more than a million images from the ImageNet database. The network is 48 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, and many animals. That pre-trained knowledge of image content is why the project uses it to produce the image embedding.

Long Short-Term Memory (LSTM): what does that mean?

It is an artificial recurrent neural network (RNN) architecture used in deep learning. Unlike standard feedforward neural networks, an LSTM has feedback connections, which means it can process not only single data points such as images but also entire sequences of data such as audio and video. That makes LSTMs well suited to tasks such as connected handwriting recognition and speech recognition, and here to generating a caption one word at a time.
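To make the idea of feedback connections concrete, here is a minimal sketch of a single scalar LSTM cell in plain Python. It is not the model from the paper: the weights are hypothetical values picked only for illustration, and real LSTMs work on vectors with learned weight matrices. The point is just that the hidden state `h` and cell state `c` from one step are fed back into the next.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    `w` maps a gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b  # h_prev is the feedback connection

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value for the cell state
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c

def run_lstm(sequence, w):
    """Feed a sequence through the cell, carrying state between steps."""
    h, c = 0.0, 0.0
    states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
        states.append(h)
    return states

# Hypothetical weights, chosen only for illustration (not learned).
weights = {
    "forget": (0.5, 0.5, 0.0),
    "input": (0.5, 0.5, 0.0),
    "output": (0.5, 0.5, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}

states = run_lstm([1.0, 0.0, 0.0], weights)
print(states)
```

Only the first input is non-zero, yet the later outputs are non-zero too, because the cell remembers what it saw earlier. A fresh cell given a zero input outputs exactly zero.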

The abstract of the research paper reads as follows:

“ We introduce a novel meme generation system, which given any image can produce a humorous and relevant caption. Furthermore, the system can be conditioned on not only an image but also a user-defined label relating to the meme template, giving a handle to the user on meme content. The system uses a pre-trained Inception-v3 network to return an image embedding which is passed to an attention-based deep-layer LSTM model producing the caption — inspired by the widely recognized Show and Tell Model. We implement a modified beam search to encourage diversity in the captions. We evaluate the quality of our model using perplexity and human assessment on both the quality of memes generated and whether they can be differentiated from real ones. Our model produces original memes that cannot on the whole be differentiated from real ones.”
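The abstract mentions two ideas that are easy to show on a toy example: beam search and perplexity. The sketch below is plain, textbook beam search, not the authors' modified version (the article does not describe their modification), and `TOY_MODEL` is a hypothetical next-word table invented for illustration. Perplexity is computed the usual way, as the exponential of the average negative log-probability per token.

```python
import math

# Hypothetical toy language model: probability of the next token
# given only the previous token. Invented for illustration.
TOY_MODEL = {
    "<s>": {"one": 0.6, "such": 0.4},
    "one": {"does": 0.9, "</s>": 0.1},
    "such": {"wow": 0.7, "</s>": 0.3},
    "does": {"not": 0.8, "</s>": 0.2},
    "not": {"simply": 0.9, "</s>": 0.1},
    "simply": {"</s>": 1.0},
    "wow": {"</s>": 1.0},
}

def beam_search(model, beam_width=2, max_len=6):
    """Standard beam search: keep the `beam_width` best partial captions."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens so far)
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            for tok, p in model[tokens[-1]].items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            # Captions that reached the end token stop growing.
            if cand[1][-1] == "</s>":
                finished.append(cand)
            else:
                beams.append(cand)
        if not beams:
            break
    return sorted(finished, key=lambda c: c[0], reverse=True)

def perplexity(token_probs):
    """exp of the average negative log-probability per token."""
    avg_neg_logp = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_logp)

results = beam_search(TOY_MODEL)
best_logp, best_tokens = results[0]
print(" ".join(best_tokens[1:-1]))
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower perplexity means the model is less "surprised" by the captions it is scored on. A model that assigns probability 0.25 to every token has a perplexity of about 4, as if it were choosing among four equally likely words each time.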

Testers were asked to differentiate generated memes from real ones and/or rank generated memes on their hilarity, because at the end of the day a meme's purpose is to be funny.

The results indicate that, in human evaluations, generated memes generally cannot be easily distinguished from human-made ones, if at all.

Another example of memes generated by this system
