Applied Deep Learning Using Uber’s Ludwig Library

Experimenting with an image dataset

Ayush Tiwari
The Research Nest
5 min read · Jun 11, 2021


Photo by Markus Spiske on Unsplash

Introduction

Ludwig is a toolbox developed at Uber that lets you train and test deep learning models without writing any code. It is built on top of TensorFlow.

I am going to use Google Colab to see how it works, preparing a model for image captioning. I chose image captioning because it gives an idea of both CNNs and LSTMs.

What is image captioning?

Image source: https://miro.medium.com/max/700/1*6BFOIdSHlk24Z3DFEakvnQ.png

Image captioning is a task that combines computer vision and natural language processing concepts: recognizing the context of an image and describing it in a natural language like English.

For more details see:

Starting My Experiment:

Installing:

!pip install ludwig

Dataset:

The Flickr8k dataset (contains 8k images with 5 captions per image)
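If you want to reproduce this, the two Flickr8k archives need to be unpacked first. Assuming they were already downloaded into Colab (the archive names below are my assumption, chosen to match the paths used in the code later), Python's zipfile module can unpack them:

import zipfile

# Hypothetical archive locations; adjust them to wherever you downloaded the zips
archives = [('/content/Flickr8k_Dataset.zip', '/content/Flickr8k_Dataset'),
            ('/content/Flickr8k_text.zip', '/content/Flickr8k_text')]
for archive, dest in archives:
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)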

Preparing dataset.csv:

A dataset file is one of the few things needed to train our model. We have to pass the dataset in a file format like CSV or JSON.

Hence, I prepared a .csv file in Python using pandas:

import os
import pandas as pd
import numpy as np

# Each line of Flickr8k.token.txt is "<image>.jpg#<caption_id>\t<caption>"
data = pd.read_csv('/content/Flickr8k_text/Flickr8k.token.txt', sep='\t', header=None)
x = data.iloc[:, :].values

img = []
caption = []
for i in range(len(x)):
    # Strip the "#<caption_id>" suffix to recover the image file name
    img.append(os.path.join('/content/Flickr8k_Dataset/Flicker8k_Dataset',
                            x[i, 0].split('.jpg', 1)[0] + '.jpg'))
    caption.append(x[i, 1])

# Keep the first 6000 (image_path, caption) pairs and write them to dataset.csv
pd.DataFrame(
    np.concatenate([np.asarray(img[:6000]).reshape(-1, 1),
                    np.asarray(caption[:6000]).reshape(-1, 1)], axis=1)
).to_csv('./dataset.csv', index=False, header=['image_path', 'caption'])

It looks like this:
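A quick way to preview the prepared file (a minimal sketch, not code from the notebook):

import pandas as pd

# Show the first few (image_path, caption) rows of dataset.csv
print(pd.read_csv('./dataset.csv').head())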

First Training

  • Firstly, I tried to load all captions (40k) at once; this showed a memory warning on the large dataset (see the second training below, which uses a 6k subset instead).
  • I used the ludwig experiment command because it also tests our model, splitting the dataset into train/validation/test accordingly.
  • As the dimensions of the images differed, they also have to be resized during preprocessing (see the height and width settings in the command below).

!ludwig experiment --dataset '/content/dataset.csv' --data_format csv --config "{input_features: [{name: image_path, type: image, encoder: stacked_cnn, preprocessing: {height: 375, width: 500}}], output_features: [{name: caption, type: text, level: word, decoder: generator, cell_type: lstm}], training: {epochs: 30}}"
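For reference, the same run can also be expressed through Ludwig's Python API. This is a minimal sketch assuming Ludwig 0.3+, where LudwigModel accepts a config dictionary; it is not the notebook's original code:

from ludwig.api import LudwigModel

# The same config as the CLI command above, written as a Python dict
config = {
    'input_features': [{'name': 'image_path', 'type': 'image',
                        'encoder': 'stacked_cnn',
                        # resize every image, since the originals differ in size
                        'preprocessing': {'height': 375, 'width': 500}}],
    'output_features': [{'name': 'caption', 'type': 'text', 'level': 'word',
                         'decoder': 'generator', 'cell_type': 'lstm'}],
    'training': {'epochs': 30},
}

model = LudwigModel(config)
results = model.train(dataset='/content/dataset.csv')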

Second Training

6k captions in total; epochs set to 30, but training stopped at 10 (due to early stopping).

Early stopping is like a trigger that uses a monitored performance metric to decide when to stop training. This is often the performance of the model on the holdout dataset, such as the loss.
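In Ludwig, this lives in the training section of the config; as far as I know, the relevant parameter is early_stop (the number of validation epochs without improvement before training halts, which defaults to a small value), so something like this should set the patience explicitly:

training: {epochs: 30, early_stop: 5}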

Predictions:

As we can see, it predicted 'a', 'man', 'in', and 'dog'.

So I checked the occurrence of these predicted words in our original CSV file and found that they were quite frequent (out of 6k captions in total):
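Here is a rough sketch of how such a count can be done with pandas (my reconstruction, not necessarily the notebook's exact code):

import pandas as pd

df = pd.read_csv('./dataset.csv')

# Count how many captions contain each predicted word
for word in ['man', 'dog']:
    count = df['caption'].str.contains(rf'\b{word}\b', case=False).sum()
    print(count, 'times', word)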

1652 times man

1658 times dog

Conclusion: Indeed, man and dog are best friends (xD)

  • After that, I shuffled the dataset, but it was of no use (a sketch of one way to shuffle is shown after this list).
  • At last, I prepared a dataset with 5 captions per image (all the captions in our dataset), but again the results were almost the same.
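Shuffling the prepared CSV can be done in one line with pandas; a minimal sketch (the random_state and the output file name are my choices, not the notebook's):

import pandas as pd

# Shuffle all rows and write the shuffled dataset back out
df = pd.read_csv('./dataset.csv')
df.sample(frac=1, random_state=42).to_csv('./dataset_shuffled.csv', index=False)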

Visualize:

In Colab, the plots cannot be displayed in a separate window, so we save them to disk instead, using the following options:

  • -od, --output_directory OUTPUT_DIRECTORY

(the directory where plots are saved; if not specified, plots are displayed in a window)

  • -ff png (to get the output in PNG format)

!ludwig visualize --visualization learning_curves -od './' --training_statistics /content/results/experiment_run/training_statistics.json -ff png
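To view the saved plots inside the notebook, something like this works (a sketch that simply renders whatever PNGs Ludwig wrote to the output directory):

import glob
from IPython.display import Image, display

# Display every PNG saved in the current directory
for path in sorted(glob.glob('./*.png')):
    print(path)
    display(Image(path))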

Well, that model just predicted 'a' and '.', so that much loss is justified.

Conclusion

  • I find it similar to transfer learning (transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task: https://machinelearningmastery.com/transfer-learning-for-deep-learning/).
  • Easy to use
  • The model cannot be modified at the layer level; it can only be modified by changing blocks such as the CNN encoder or the LSTM decoder.

In the end, it's an effort to simplify deep learning and make it more accessible. Who knows what will happen in the future; maybe we will have a model that understands natural language and runs Ludwig for us, or maybe it will be driven by our thoughts.

My Colab notebook

Feel free to check my notebook.

In case of any suggestions, ping me at:

For more information about Ludwig, visit:
