Recognizing the Real-Time Creativity of a User Using Deep Learning

Devashi Choudhary · Analytics Vidhya · Jul 2, 2020

What is Real-Time Creativity?

Ever played Pictionary?

It is a game where one person draws a shape or picture of an object and another has to guess it. Just like Pictionary, Real-Time Creativity is a game where you draw a pattern in front of the web camera using your finger and let the computer guess what you have drawn. The goal of today's article, therefore, is to recognize the real-time creativity of the user using deep learning.

Table of Contents

  1. About Dataset
  2. Flow of Real-Time Creativity
  3. Experiments and Results
  4. Takeaway
  5. What’s Next?

About Dataset

The idea originated with the Magenta team at Google Research. The game "Quick, Draw!" was initially featured at Google I/O in 2016; later, the team trained a Convolutional Neural Network (CNN) based model to predict drawing patterns and made the game available online. The dataset used to train the model consists of 50 million drawings across 345 categories. A sample of these drawings is shown in the figure below.

Sample of Images

The team made the dataset publicly available to help researchers train their own CNN models. The full dataset is released in four formats:

Raw files (.ndjson)

Simplified drawings files (.ndjson)

Binary files (.bin)

Numpy bitmap files (.npy)

We sampled 10 random categories from the whole dataset to reduce computation time, but the approach can be extended at any time to an arbitrary number of categories. A minimal loading sketch is shown below.
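As a hedged sketch (not the author's exact loading code), here is how such a 10-category subset can be loaded, assuming the Numpy bitmap files (.npy, one per category, each row a flattened 28×28 grayscale drawing) have been downloaded locally; the category names are hypothetical placeholders:

```python
import numpy as np

# Hypothetical category picks; the article samples 10 at random.
CATEGORIES = ["apple", "banana", "car", "cat", "clock",
              "fish", "house", "star", "tree", "umbrella"]

def load_quickdraw(data_dir, categories=CATEGORIES, per_class=5000):
    """Load Quick, Draw! numpy bitmaps: each row is a 28x28 drawing."""
    X, y = [], []
    for label, name in enumerate(categories):
        drawings = np.load(f"{data_dir}/{name}.npy")[:per_class]
        X.append(drawings)
        y.append(np.full(len(drawings), label))
    X = np.concatenate(X).astype("float32") / 255.0  # scale to [0, 1]
    y = np.concatenate(y)
    return X.reshape(-1, 28, 28, 1), y
```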

Flow of Real-Time Creativity


Hand Detection and Finger Tracking:

First, we detect the hand region in the frame using running-average differences between frames (background subtraction). To remove noise such as stray pixels and small gaps, we refine the result by smoothing it with a blurring operator and thresholding it to obtain a binary mask. Next, we ask OpenCV to find all contours in the mask and keep the largest one, which we take to be the hand. With the hand contour detected, we locate the fingertips and count the fingers by computing the convex hull and the convexity defect regions of the contour. A sketch of this segmentation step follows.
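Here is a minimal sketch of the segmentation step, assuming grayscale webcam frames; the function names and threshold values are illustrative choices, not the exact code from the repository:

```python
import cv2

bg = None  # running-average background model

def calibrate(gray, accum_weight=0.5):
    """Update the running-average background over the first ~30 frames."""
    global bg
    if bg is None:
        bg = gray.astype("float")
    else:
        cv2.accumulateWeighted(gray, bg, accum_weight)

def segment_hand(gray, threshold=25):
    """Blur + threshold the frame difference, return the largest contour."""
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(bg))
    diff = cv2.GaussianBlur(diff, (7, 7), 0)           # smooth out noise
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)          # assume hand is largest
```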

Once the hand is detected, the user draws with a single finger. We track the drawing by detecting the fingertip in each frame and connecting its coordinates across frames. Finally, we extract the drawn image from the window and pass it to the recognizer, as sketched below.
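And a hedged sketch of the tracking step, reusing segment_hand from above; taking the topmost contour point as the fingertip is a common heuristic, assumed here for illustration:

```python
import cv2
import numpy as np

canvas = np.zeros((480, 640), dtype="uint8")  # blank drawing surface
prev_point = None

def update_canvas(contour):
    """Connect successive fingertip positions with line segments."""
    global prev_point
    x, y = contour[contour[:, :, 1].argmin()][0]  # topmost point ~ fingertip
    fingertip = (int(x), int(y))
    if prev_point is not None:
        cv2.line(canvas, prev_point, fingertip, 255, thickness=4)
    prev_point = fingertip

def extract_drawing():
    """Shrink the finished drawing to the 28x28 size the recognizer expects."""
    return cv2.resize(canvas, (28, 28)) / 255.0
```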

For recognizing the drawn image, we use a Convolutional Neural Network (CNN); let's understand its architecture and training.

CNN Architecture

The convolutional neural network, or convnet, is a major breakthrough in the field of deep learning. A CNN is a kind of neural network widely used for image recognition and classification; it is mainly used for identifying patterns in an image. We don't feed hand-crafted features into it; the network learns the features by itself. The main operations of a CNN are Convolution, Pooling (or Sub-Sampling), Non-Linearity, and Classification.

  1. Convolution: The primary purpose of convolution is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features over small squares of input data.
  2. Pooling: The main purpose of pooling is to reduce the size of the feature map while retaining the important information. Common variants include Max, Sum, and Average pooling (a worked shape example follows this list).
  3. Non-Linearity: An activation function such as ReLU (Rectified Linear Unit) is added to the network to help it learn complex patterns in the data, i.e. to introduce non-linearity.
  4. Fully Connected: This is a traditional Multi-Layer Perceptron that uses a softmax activation function in the output layer (other classifiers such as SVM can also be used). Its purpose is to use the extracted features to classify the input image into one of the classes in the training dataset.
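To make the shape arithmetic concrete, here is a tiny worked example with illustrative layer sizes: a 3×3 convolution without padding shrinks a 28×28 input to 26×26, and a 2×2 max pool halves that to 13×13.

```python
import numpy as np
from tensorflow.keras import layers

x = np.random.rand(1, 28, 28, 1).astype("float32")   # one 28x28 drawing
x = layers.Conv2D(32, (3, 3), activation="relu")(x)  # -> (1, 26, 26, 32)
x = layers.MaxPooling2D((2, 2))(x)                   # -> (1, 13, 13, 32)
print(x.shape)
```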

Training and Evaluation of CNN model

The training of the CNN model involves the following steps (a sketch of the full pipeline appears after this list):

  1. Model Creation: The CNN model (Line 1–12) includes two convolutional layers, each followed by a ReLU activation (to add non-linearity) and max pooling (to reduce the feature map). Dropout is added to prevent the network from over-fitting, and fully connected layers are added at the end. Finally, we compile the model with a loss function, an optimizer, and metrics (Line 15). The loss function measures the error or deviation in the learning process; Keras requires one during model compilation. The optimizer updates the input weights by comparing the predictions against the loss, and the metrics are used to evaluate the performance of the model.
  2. Model Training: Before training the model, we need to split the data into train and test sets. In our case, we used 60% of the data for training and 40% for testing (Line 18). The model is trained (Line 32) on NumPy arrays using the fit function, which runs the training loop over the data.
  3. Model Evaluation: This is the final step, in which we evaluate the model's performance by predicting the labels of the test data (Line 35).
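Putting these steps together, here is a minimal sketch of such a pipeline, assuming the data loader from earlier; the layer sizes, optimizer, and epoch count are illustrative assumptions, and the line numbers cited above refer to the author's script in the linked repository, not to this sketch:

```python
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

X, y = load_quickdraw("data")                        # from the earlier sketch
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)            # 60/40 split

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),                            # combat over-fitting
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(len(CATEGORIES), activation="softmax"),
])

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])

model.fit(X_train, y_train, epochs=10, batch_size=128,
          validation_data=(X_test, y_test))          # training loop

test_loss, test_acc = model.evaluate(X_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
```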

Experiments and Results

  1. Analysis of the CNN model (figures: model accuracy for the train-test data, and the training loss curve).
  2. The drawn image is recognized by the CNN model, as shown in the figure.
  3. Real-time creativity of the user (demo).

Takeaway

  1. Using a finger to draw, instead of a pin-point pen, increases both user convenience and the system's potential audience.
  2. Background subtraction using running averages over the frames distinguishes fingers from other background objects more reliably.
  3. Tracking the drawing by detecting the hand in each frame keeps the drawing accurate and avoids losing the fingertip between frames, which happened very frequently with the feature-tracking algorithm.

What’s Next?

In this article, the real-time creativity of the user is recognized accurately by combining a finger (instead of a pen or other object) for drawing, running averages for background subtraction, per-frame finger detection for tracking, and a CNN for recognizing the drawn pattern.

We used 10 categories; the model can be trained on more, and the proposed approach can be extended to track a user's real-time drawing on any social messaging platform to encourage less typing and more expressive communication.

References

It’s always good to give references.

  1. Finger Detection and Tracking
  2. Dataset

The code is available at github.com/Devashi-Choudhary/Real-Time-Creativity. For any questions or doubts, feel free to contact me directly at github.com/Devashi-Choudhary.
