Applied Deep Learning Using Uber’s Ludwig Library

Experimenting with an image dataset

Ayush Tiwari
The Research Nest
5 min read · Jun 11, 2021


Photo by Markus Spiske on Unsplash

Introduction

Ludwig is a toolbox developed at Uber that lets you train and test deep learning models without writing any code. It is built on top of TensorFlow.

I am going to use Google Colab to see how it works, preparing a model for image captioning. I chose image captioning because it gives an idea of both CNNs and LSTMs.

What is image captioning?

Image source: https://miro.medium.com/max/700/1*6BFOIdSHlk24Z3DFEakvnQ.png

Image captioning is a task that combines computer vision and natural language processing concepts: recognizing the context of an image and describing it in a natural language like English.

For more details see:

Starting My Experiment:

Installing:

!pip install ludwig

Dataset:

The Flickr8k dataset (contains 8k images with 5 captions per image)
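If you want to reproduce this, the two Flickr8k archives need to be unpacked first. Assuming they were already downloaded into Colab (the archive names below are my assumption, chosen to match the paths used in the code later), Python's zipfile module can unpack them:

import zipfile

# Hypothetical archive locations; adjust them to wherever you downloaded the zips
archives = [('/content/Flickr8k_Dataset.zip', '/content/Flickr8k_Dataset'),
            ('/content/Flickr8k_text.zip', '/content/Flickr8k_text')]
for archive, dest in archives:
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)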

Preparing dataset.csv:

A dataset file is one of the few things needed to train our model. We have to pass the dataset in a file format like CSV or JSON.

Hence, I prepared a .csv file in Python using pandas:

import os
import pandas as pd
import numpy as np

# Each line of Flickr8k.token.txt is "<image>.jpg#<caption_id>\t<caption>"
data = pd.read_csv('/content/Flickr8k_text/Flickr8k.token.txt', sep='\t', header=None)
x = data.iloc[:, :].values

img = []
caption = []
for i in range(len(x)):
    # Strip the "#<caption_id>" suffix to recover the image file name
    img.append(os.path.join('/content/Flickr8k_Dataset/Flicker8k_Dataset',
                            x[i, 0].split('.jpg', 1)[0] + '.jpg'))
    caption.append(x[i, 1])

# Keep the first 6000 (image_path, caption) pairs and write them to dataset.csv
pd.DataFrame(
    np.concatenate([np.asarray(img[:6000]).reshape(-1, 1),
                    np.asarray(caption[:6000]).reshape(-1, 1)], axis=1)
).to_csv('./dataset.csv', index=False, header=['image_path', 'caption'])

It looks like this:
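A quick way to preview the prepared file (a minimal sketch, not code from the notebook):

import pandas as pd

# Show the first few (image_path, caption) rows of dataset.csv
print(pd.read_csv('./dataset.csv').head())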

First Training

  • Firstly, I tried to load all captions (40k) at once; this showed a memory warning on the large dataset (see the second training below, which uses a 6k subset instead).
  • I used the ludwig experiment command because it also tests our model, splitting the dataset into train/validation/test accordingly.
  • As the dimensions of the images differed, they also have to be resized during preprocessing (see the height and width settings in the command below).

!ludwig experiment --dataset '/content/dataset.csv' --data_format csv --config "{input_features: [{name: image_path, type: image, encoder: stacked_cnn, preprocessing: {height: 375, width: 500}}], output_features: [{name: caption, type: text, level: word, decoder: generator, cell_type: lstm}], training: {epochs: 30}}"
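For reference, the same run can also be expressed through Ludwig's Python API. This is a minimal sketch assuming Ludwig 0.3+, where LudwigModel accepts a config dictionary; it is not the notebook's original code:

from ludwig.api import LudwigModel

# The same config as the CLI command above, written as a Python dict
config = {
    'input_features': [{'name': 'image_path', 'type': 'image',
                        'encoder': 'stacked_cnn',
                        # resize every image, since the originals differ in size
                        'preprocessing': {'height': 375, 'width': 500}}],
    'output_features': [{'name': 'caption', 'type': 'text', 'level': 'word',
                         'decoder': 'generator', 'cell_type': 'lstm'}],
    'training': {'epochs': 30},
}

model = LudwigModel(config)
results = model.train(dataset='/content/dataset.csv')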

Second Training

6k captions in total; epochs set to 30, but training stopped at 10 (due to early stopping).

Early stopping is like a trigger that uses a monitored performance metric to decide when to stop training. This is often the performance of the model on the holdout dataset, such as the loss.
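In Ludwig, this lives in the training section of the config; as far as I know, the relevant parameter is early_stop (the number of validation epochs without improvement before training halts, which defaults to a small value), so something like this should set the patience explicitly:

training: {epochs: 30, early_stop: 5}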

Predictions:

As we can see, it predicted 'a', 'man', 'in', and 'dog'.

So I checked the occurrence of these predicted words in our original CSV file and found that they were quite frequent (out of 6k captions in total):
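Here is a rough sketch of how such a count can be done with pandas (my reconstruction, not necessarily the notebook's exact code):

import pandas as pd

df = pd.read_csv('./dataset.csv')

# Count how many captions contain each predicted word
for word in ['man', 'dog']:
    count = df['caption'].str.contains(rf'\b{word}\b', case=False).sum()
    print(count, 'times', word)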

1652 times man

1658 times dog

Conclusion: Indeed, man and dog are best friends (xD)

  • After that, I shuffled the dataset, but it was of no use (a sketch of one way to shuffle is shown after this list).
  • At last, I prepared a dataset with 5 captions per image (all the captions in our dataset), but again the results were almost the same.
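Shuffling the prepared CSV can be done in one line with pandas; a minimal sketch (the random_state and the output file name are my choices, not the notebook's):

import pandas as pd

# Shuffle all rows and write the shuffled dataset back out
df = pd.read_csv('./dataset.csv')
df.sample(frac=1, random_state=42).to_csv('./dataset_shuffled.csv', index=False)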

Visualize:

In Colab, the plots cannot be displayed in a separate window, so we save them to disk instead, using the following options:

  • -od, --output_directory OUTPUT_DIRECTORY

(the directory where plots are saved; if not specified, plots are displayed in a window)

  • -ff png (to get the output in PNG format)

!ludwig visualize --visualization learning_curves -od './' --training_statistics /content/results/experiment_run/training_statistics.json -ff png
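To view the saved plots inside the notebook, something like this works (a sketch that simply renders whatever PNGs Ludwig wrote to the output directory):

import glob
from IPython.display import Image, display

# Display every PNG saved in the current directory
for path in sorted(glob.glob('./*.png')):
    print(path)
    display(Image(path))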

Well, that model just predicted 'a' and '.', so that much loss is justified.

Conclusion

  • I find it similar to transfer learning (transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task: https://machinelearningmastery.com/transfer-learning-for-deep-learning/).
  • Easy to use
  • The model cannot be modified at the layer level; it can only be modified by changing blocks such as the CNN encoder or the LSTM decoder.

In the end, it's an effort to simplify deep learning and make it more accessible. Who knows what will happen in the future; maybe we will have a model that understands natural language and runs Ludwig for us, or maybe it will be driven by our thoughts.

My Colab notebook

Feel free to check my notebook.

In case of any suggestions, ping me at:

For more information about Ludwig, visit:
