Stories by Soumik Rakshit on Medium

Getting Started With Kaggle Digit Recognizer Competition

Soumik Rakshit — Thu, 18 Apr 2019 01:53:40 GMT

What’s this competition all about???

MNIST (“Modified National Institute of Standards and Technology”) is the de facto “Hello World” dataset of computer vision. Since its release in 1999, this classic dataset of handwritten images has served as the basis for benchmarking classification algorithms. As new machine learning techniques emerge, MNIST remains a reliable resource for researchers and learners alike.

In the Digit Recognizer competition hosted by Kaggle, your goal is to correctly identify digits from a dataset of tens of thousands of handwritten images. Link to the competition: https://www.kaggle.com/c/digit-recognizer

Currently, I hold the 41st position on the public leaderboard for the competition, which is within the top 2% of contestants. It took me about a year to achieve a score of 1.00000 in this competition, and I thought this would be the best time to share my experience with it.

How to start???

The first approach that I could think of was the classic old LeNet-5.

Originally proposed by Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner in the 1990s as a convolutional neural network (CNN) architecture for handwritten and machine-printed character recognition, it was used on a large scale to automatically classify handwritten digits on bank cheques in the United States. Nowadays, CNNs are among the state-of-the-art deep learning algorithms for computer vision. These networks are built upon 3 main ideas: local receptive fields, shared weights and spatial sub-sampling. Local receptive fields with shared weights are the essence of the convolutional layer, and most architectures described below use convolutional layers in one form or another.

LeNet-5 Architecture

When I made my first submission, I had implemented LeNet-5 using only Numpy in Python. The code was quite messy and I did not want to reproduce it, so I recreated the solution using Tensorflow with the help of my friend Sohom Dey. The notebook for the solution can be found at

Koderunners/Convolutional-Neural-Networks

So let’s get started with the actual code.

On With the Code….

We did not use a wide range of libraries other than Numpy, Pandas, Matplotlib and Tensorflow, which are available by default in the Kaggle Kernels environment.

https://medium.com/media/12ab439fdd0895b73ee04f5228cbf701/href

Output:

['train.csv', 'sample_submission.csv', 'test.csv']

We will now implement a couple of utility functions that will be used both in the model training and visualization phase.

https://medium.com/media/459ad4177c91d950d1e641616bf75337/href

Output:

Exploring the data

https://medium.com/media/38f1a768bf713881d30d93851386d74a/href

Output:

Number of images in training dataset: 42000
Number of pixels in each image in training dataset: 784
Number of images in test dataset: 28000
Number of pixels in each image in test dataset: 784

Preprocessing the Data

The images in the dataset are given in 28 x 28 resolution. Since the LeNet-5 architecture requires 32 x 32 images, we will convert the given images to 32 x 32 by adding 2 pixels of zero padding on every side.
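The padding step can be sketched with plain Numpy. This is only a minimal illustration, not the code from the notebook: the function name pad_to_32 is hypothetical, and it assumes the images arrive as flattened rows of 784 pixels, as the shapes printed above suggest.

```python
import numpy as np

def pad_to_32(flat_images):
    """Zero-pad flattened 28 x 28 images to flattened 32 x 32 images."""
    # Recover the 2D layout of every image: (n, 784) -> (n, 28, 28).
    images = flat_images.reshape(-1, 28, 28)
    # Add 2 rows/columns of zeros on each side; the batch axis is not padded.
    padded = np.pad(images, ((0, 0), (2, 2), (2, 2)),
                    mode="constant", constant_values=0)
    # Flatten again: (n, 32, 32) -> (n, 1024).
    return padded.reshape(-1, 32 * 32)
```

Applied to the 42000 training rows, a function like this would produce an array of shape (42000, 1024), which matches the output shown below.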

https://medium.com/media/abf6fcc2729a70806a4724e462bf9f65/href

Output:

((42000, 1024), (28000, 1024))

We also need to convert the target labels into their respective One-Hot Encoded format.
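One-hot encoding simply turns each digit label into a vector of 10 zeros with a single 1 at the position of the digit. A minimal Numpy sketch, where the function name one_hot is an assumption for illustration and not the notebook's helper:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Convert integer labels of shape (n,) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes))
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(labels.shape[0]), labels] = 1
    return encoded
```

For example, the label 3 becomes a row whose fourth entry is 1 and all other entries are 0.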

https://medium.com/media/a5ed68ed84e3b7b277ec7e4601654050/href

Output:

Shape of Training Labels: (42000,)
Shape of y_train after encoding: (42000, 10)

Building LeNet-5

We will first declare the training parameters and hyperparameters for the Neural Network.

https://medium.com/media/01cdbf8a191f54ce23d6e40673e2b3b4/href

We will start building the model by creating the placeholders. Placeholders are Tensorflow objects that are not initialized with any value; instead, they receive their values through a feed dictionary when the Tensorflow graph is executed inside a Tensorflow Session. The placeholders that we declare will correspond to the images and the one-hot encoded training labels.

https://medium.com/media/c73c8c48721d661126e7569591247450/href

We will now declare and initialize the Weights and Biases corresponding to each layer of the network that will be optimized during the training of the model.

https://medium.com/media/066de2fdfa747d4db51f4d1fccff264e/href

Now we will be building the actual model. We will create two utility functions for 2D Convolution and 2D Maxpooling with Valid padding. Then we will create the layers of the neural network.
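To see what Valid padding means for the layer sizes, here is a small single-channel Numpy sketch of the two operations. It is not the Tensorflow code used in the notebook; the function names are hypothetical, the stride is assumed to be 1 for the convolution and equal to the window size for the pooling, and the "convolution" is the cross-correlation that deep learning libraries usually compute.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding (stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the patch under it.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d_valid(feature_map, size=2):
    """Non-overlapping max pooling; leftover border rows/columns are dropped."""
    out_h = feature_map.shape[0] // size
    out_w = feature_map.shape[1] // size
    trimmed = feature_map[:out_h * size, :out_w * size]
    # Group the pixels into (size x size) blocks and take the max of each block.
    return trimmed.reshape(out_h, size, out_w, size).max(axis=(1, 3))
```

With a 5 x 5 kernel and 2 x 2 pooling, a 32 x 32 input shrinks to 28 x 28 after the first convolution, 14 x 14 after pooling, 10 x 10 after the second convolution and 5 x 5 after the second pooling, which is why LeNet-5 expects 32 x 32 inputs.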

https://medium.com/media/e09bd71f570c9151b3d82829a168473a/href

Now we will build the Tensorflow Graph for all of our operations. We will start by declaring the logits as the output of our model, the loss operation (Softmax Cross Entropy with Logits) and the optimizer (Adam). Then we will create the training operation that uses the optimizer to minimize the loss. We also create an operation to calculate the model accuracy and a Global Variable Initializer that initializes the global variables, which in this case are the weights and biases.
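For intuition, the loss and accuracy operations can be written out in Numpy. This is only a sketch of the math under my own naming (softmax_cross_entropy and accuracy are hypothetical helpers), not the Tensorflow operations themselves.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Per-example cross entropy between softmax(logits) and one-hot labels."""
    # Subtracting the row maximum keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

def accuracy(logits, one_hot_labels):
    """Fraction of examples whose largest logit matches the true class."""
    return np.mean(logits.argmax(axis=1) == one_hot_labels.argmax(axis=1))
```

The training operation then minimizes the mean of this per-example loss over each batch.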

https://medium.com/media/8c926adafad74813d0f92a0d210836e1/href

Training LeNet-5

We will train the network inside a Tensorflow Session.

https://medium.com/media/eb6fc13a96172d30cf19f1b418983651/href

Output:

Epoch 500, Cost: 28595476.4296875, Accuracy: 73.4375 %
Epoch 1000, Cost: 5947898.984375, Accuracy: 86.71875 %
Epoch 1500, Cost: 7127918.71875, Accuracy: 88.28125 %
Epoch 2000, Cost: 3046355.96875, Accuracy: 92.96875 %
Epoch 2500, Cost: 3755678.59375, Accuracy: 93.75 %
Epoch 3000, Cost: 1928981.6875, Accuracy: 92.1875 %
Epoch 3500, Cost: 769532.8125, Accuracy: 96.875 %
Epoch 4000, Cost: 1833259.3125, Accuracy: 93.75 %
Epoch 4500, Cost: 1317497.5, Accuracy: 96.09375 %
Epoch 5000, Cost: 1188782.34375, Accuracy: 93.75 %
Epoch 5500, Cost: 267834.515625, Accuracy: 98.4375 %
Epoch 6000, Cost: 1112221.875, Accuracy: 96.09375 %
Epoch 6500, Cost: 467607.857421875, Accuracy: 94.53125 %
Epoch 7000, Cost: 400827.03125, Accuracy: 97.65625 %
Epoch 7500, Cost: 22324.25, Accuracy: 99.21875 %
Epoch 8000, Cost: 394928.5625, Accuracy: 98.4375 %
Epoch 8500, Cost: 71348.0625, Accuracy: 99.21875 %
Epoch 9000, Cost: 0.0, Accuracy: 100.0 %
Epoch 9500, Cost: 24381.53125, Accuracy: 99.21875 %
Epoch 10000, Cost: 10489.375, Accuracy: 98.4375 %
----------------------------------------------------------------------

Optimization Finished

Accuracy on Training Data: 98.4071433544159 %

Let’s visualize the training history, that is, the change in loss and accuracy during each epoch.

https://medium.com/media/abcf872837714154d42851c215222ec9/href

Output:

https://medium.com/media/37d440c9185aab19651c739d423ac9b1/href

Output:

Making Predictions:

In order to get the predictions, we first have to reinitialize the weight and bias variables using their optimized values. Then we run forward propagation through the neural network using the optimum weights and biases learnt during training.

https://medium.com/media/fb33fc4c43d97aed9f330b5db7312143/href

Output:

array([2, 0, 9, ..., 3, 9, 2])

Making Kaggle Submission

We need to first save our predictions for the test images in the format given in the sample submission file which we will upload as our submission.
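A minimal sketch of building the submission table with Pandas follows. It assumes the sample submission has two columns named ImageId (starting at 1) and Label; please check sample_submission.csv before relying on this. The function name make_submission is hypothetical.

```python
import numpy as np
import pandas as pd

def make_submission(predictions):
    """Build a submission table with one row per test image."""
    predictions = np.asarray(predictions)
    return pd.DataFrame({
        # Assumed column names; image ids are assumed to start at 1.
        "ImageId": np.arange(1, len(predictions) + 1),
        "Label": predictions,
    })

submission = make_submission([2, 0, 9])
# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = submission.to_csv(index=False)
```

To write the actual file you would pass a file name, for example submission.to_csv("submission.csv", index=False).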

https://medium.com/media/6eefddfbe8899748624295f4dfef4b9e/href

This solution, being one of my earliest, does not fetch a very good position on the leaderboard. Given the current state of the competition leaderboard, you might land somewhere around the top 80%. Still, I think LeNet-5 is a good way to get started with the Digit Recognizer competition in particular and Convolutional Neural Networks in general.

In my subsequent articles I will share better techniques that help you achieve much better positions on the leaderboard. So, stay tuned for more articles :)

Getting Started With Kaggle Digit Recognizer Competition was originally published in Koderunners on Medium, where people are continuing the conversation by highlighting and responding to this story.

Preprocessing Text Data for Machine Learning: Part 1

Soumik Rakshit — Mon, 15 Apr 2019 17:48:31 GMT

Introduction

In the previous article we discussed Semantic Slot Filling, a very common problem in the field of Natural Language Processing. We looked at several ways of tackling it, such as Rule Based Approaches and Machine Learning Approaches (including Deep Learning), and discussed the pros and cons of each method. Since Natural Language is a highly unstructured form of data, it needs a lot of preprocessing to remove dialect-based or idiomatic inconsistencies and reach a state of uniformity. It is then converted to a mathematical form that can be fed to a Machine Learning model. In this article we will discuss various methods to preprocess text data.

Machine Learning with Text Data

Let us consider a Natural Language problem that can be solved using Machine Learning: Sentiment Classification. Sentiment Classification based on user reviews is a very popular application, as many businesses all over the world rely on the insights gathered from user reviews to make major decisions. In many cases, such decision-making is automated using Recommendation Systems. This means that we have to create a system that takes raw text from user reviews (on a product or service) as input and outputs the class of sentiment, usually positive or negative. There can be more than two possible outputs, or even a range of numbers if we treat the problem as a Regression problem, but for the sake of simplicity we will consider it to be a Binary Classification problem. For example,

“The watch was very stylish and comfortable to wear and also keeps time accurately” — is considered to be a positive review.
“The band of the watch is loose and it kept loosing time 2 days after getting delivered. Best quality my ass -_-” — is considered a negative review.

Note that we won’t be considering sarcasm a separate class in this example. So, reviews like “The headphone is so good that I can listen to music from other galaxies” or “I robbed a bank to buy this headphone and now listening music inside jail” would be classified positive or negative depending on our model.

Text Preprocessing

First of all we have to understand what text really is. We can consider text to be a sequence of low-level features like characters and words, or of high-level representations such as phrases, named entities or even bigger chunks like sentences, paragraphs, etc. Considering text as a sequence of words might seem a reasonable choice. In fact, we can easily find the boundaries of words by splitting sentences on spaces or punctuation.

Defining word boundaries can be much more difficult in languages like German, where very long compound words are written without spaces or punctuation. For example, the 63-character word rindfleischetikettierungsüberwachungsaufgabenübertragungsgesetz means the law for the delegation of monitoring beef labeling. For the analysis of such words, it might be beneficial to break them up into separate words.

Also, in some languages such as Japanese, there are no spaces.

Tokenization

Tokenization is a process that splits an input sequence into several small chunks or tokens. You can think of a token as a useful unit for semantic processing. The most important thing to note is that a token need not always be a word; it can be a single character, a word, a sentence, a paragraph, etc.

Let us see the examples of some popular tokenization methods:

Whitespace Tokenizer: It defines whitespaces as the boundary between tokens. For example,

https://medium.com/media/50fac598e69feb55ae302c9bd7642f6e/href

Output:

['This', 'is', 'an', 'example.', 'I', 'am', 'writing', 'gibberish.', 'By', 'the', 'way,', 'who', 'gave', 'Zack', 'Snyder', 'the', 'idea', 'of', 'Martha?', 'If', "that's", 'Ben', 'Affleck', 'he', 'sure', 'is', 'crazy.']

The problem in this case is that the tokens “example” and “example.” are treated differently although they have the same meaning.
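The whitespace tokenizer described above can be sketched with nothing more than Python's built-in str.split. This is my own minimal version (the function name is hypothetical) and it is not necessarily the code used to produce the output shown above.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace; punctuation stays glued to the words."""
    return text.split()

text = ("This is an example. I am writing gibberish. By the way, who gave "
        "Zack Snyder the idea of Martha? If that's Ben Affleck he sure is crazy.")
tokens = whitespace_tokenize(text)
```

Because punctuation is never separated, "example." ends up as a single token, which is exactly the weakness mentioned above.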

Word Punct Tokenizer: It separates tokens by splitting the input sequence on the basis of punctuation and whitespace. We can easily implement it using Regular Expressions.

https://medium.com/media/826ee6e91df9225eb45ed11961bcd1ea/href

Output:

['This', 'is', 'an', 'example', '.', 'I', 'am', 'writing', 'gibberish', '.', 'By', 'the', 'way', ',', 'who', 'gave', 'Zack', 'Snyder', 'the', 'idea', 'of', 'Martha', '?', 'If', 'that', "'", 's', 'Ben', 'Affleck', 'he', 'sure', 'is', 'crazy', '.']

The problem in this case is that it splits the word “that’s” into three separate tokens (that, ' and s), which is undesirable.
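A regular expression version of this idea might look like the sketch below. The pattern is my assumption of a reasonable rule (runs of word characters, or runs of characters that are neither word characters nor whitespace); the function name is hypothetical.

```python
import re

# Either a run of word characters, or a run of punctuation
# (anything that is neither a word character nor whitespace).
WORD_OR_PUNCT = re.compile(r"\w+|[^\w\s]+")

def wordpunct_tokenize(text):
    """Split text into alternating word tokens and punctuation tokens."""
    return WORD_OR_PUNCT.findall(text)

text = ("This is an example. I am writing gibberish. By the way, who gave "
        "Zack Snyder the idea of Martha? If that's Ben Affleck he sure is crazy.")
tokens = wordpunct_tokenize(text)
```

The apostrophe is neither a word character nor whitespace, so it becomes a token of its own and the contraction falls apart into three pieces.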

Tree Bank Word Tokenizer: It uses a set of rules or heuristics based on the grammar of the English language to produce a tokenization that actually makes sense for further analysis. It can be implemented using NLTK (the Natural Language Toolkit) in Python.

https://medium.com/media/29c9a141e228c00d81ca8151c3a2f232/href

Output:

['This', 'is', 'an', 'example.', 'I', 'am', 'writing', 'gibberish.', 'By', 'the', 'way', ',', 'who', 'gave', 'Zack', 'Snyder', 'the', 'idea', 'of', 'Martha', '?', 'If', 'that', "'s", 'Ben', 'Affleck', 'he', 'sure', 'is', 'crazy', '.']

Of the three tokenizers shown here, the Tree Bank Tokenizer produces the most meaningful tokens for text in the English language.

Token Normalization

Token Normalization refers to converting every token into a standard canonical form which it might not have had before. Normalizing Tokens ensures consistency in the data that is further preprocessed or analyzed. There are two processes of Normalizing Tokens:

Stemming: It is the process of removing and replacing suffixes from the token to get the root form of the word which is called a stem. Stemming usually refers to heuristics that chop off suffixes.

Porter’s Stemmer is one of the oldest and most widely used stemmers for the English language. It has five heuristic phases of word reduction that are applied sequentially. The rules are simple suffix-rewriting rules, similar in spirit to the productions of a Regular Grammar or Context Free Grammar. Following are some examples of rules in the first phase of Porter’s Stemmer:

╔════════════╦═══════════════════╗
║   RULE     ║    EXAMPLE        ║
╠════════════╬═══════════════════╣
║ SSES -> SS ║ caresses -> caress║
║ IES -> I   ║ ponies -> poni    ║
║ SS -> SS   ║ caress -> caress  ║
║ S -> null  ║ cats -> cat       ║
╚════════════╩═══════════════════╝
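To make the table concrete, here is a small sketch that applies only these four rules, trying the longest suffix first. It is an illustration of the rule format and not a full Porter Stemmer; the names STEP_1A_RULES and porter_step_1a are my own.

```python
# (suffix, replacement) pairs from the table, longest suffix first so that
# "caresses" is handled by SSES -> SS and not by S -> null.
STEP_1A_RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]

def porter_step_1a(word):
    """Apply the first matching suffix rule; leave the word alone otherwise."""
    for suffix, replacement in STEP_1A_RULES:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word
```

The SS -> SS rule looks like it does nothing, but it stops the S -> null rule from turning caress into cares.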

https://medium.com/media/0b2f85406db218b572d6f2e7fa77ace3/href

Output:

['the', 'wolv', 'will', 'not', 'eat', 'the', 'fish', ',', 'they', 'will', 'have', 'onli', 'lamb', '.']

The problem with Porter’s Stemmer is that, due to its strict heuristics, it often produces irregular stems, for example wolves -> wolv or feet -> feet.

Lemmatization: It is a Token Normalization process that uses linguistic processes to find the base or dictionary form of the word for every token, which is also called a lemma. Lemmatization utilizes a vocabulary and morphological analysis for finding the lemma of every token.

The Wordnet Lemmatizer is a commonly used lemmatizer implemented in the NLTK library. It looks up lemmas in the Wordnet Database, a large lexical database of the English language created and maintained by Princeton University.

https://medium.com/media/c12082318cfc06e9961b8b58870fadcd/href

Output:

['The', 'wolf', 'will', 'not', 'eat', 'the', 'fish', ',', 'they', 'will', 'have', 'only', 'lamb', '.']

Note that Wordnet Lemmatizer tackles some irregularities raised by Porter Stemmer such as wolves -> wolf or feet -> foot.

Although the Wordnet Lemmatizer does a pretty good job, it’s not 100% accurate. It does a good job normalizing nouns, but might sometimes fail for verbs. Neither Stemming nor Lemmatization is perfect and we need to choose our normalization methodology depending on our task.

Normalizing Upper Case Letters:

Us, us -> us is safe if both are pronouns, but us and US are different: one is a pronoun and the other could be the name of a country.
We could define heuristics to solve this problem:
* lowercasing the beginning of the sentence
* lowercasing words in titles
* leaving mid-sentence words as they are
We could also use Machine Learning to retrieve the true case of tokens, which would be quite complex.
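The three heuristics above could be sketched roughly as follows. This is only one possible reading of them: the function name normalize_case is hypothetical, and the rule used to guess that a token list is a title (every alphabetic token starts with a capital letter) is my own assumption.

```python
def normalize_case(tokens):
    """Lowercase titles fully, otherwise lowercase only the first token."""
    if not tokens:
        return []
    words = [t for t in tokens if t.isalpha()]
    # Assumed title test: more than one word and every word is capitalized.
    looks_like_title = len(words) > 1 and all(w[0].isupper() for w in words)
    if looks_like_title:
        return [t.lower() for t in tokens]
    # Ordinary sentence: lowercase the first token, keep the rest untouched.
    return [tokens[0].lower()] + list(tokens[1:])
```

For example, normalize_case(["Us", "and", "the", "US"]) keeps the mid-sentence US intact while the leading Us becomes us. The heuristic is far from perfect: a sentence that starts with the country name US would still be lowercased.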

Stay tuned for subsequent articles :)

Preprocessing Text Data for Machine Learning: Part 1 was originally published in Koderunners on Medium.

Semantic Slot Filling: Part 1

Soumik Rakshit — Mon, 15 Apr 2019 14:25:44 GMT

One way of making sense of a piece of text is to tag the words or tokens which carry meaning to the sentences. In the field of Natural Language Processing, this problem is known as Semantic Slot Filling. There are three main approaches to solve this problem:

Rule Based Approaches
Machine Learning Approaches
Deep Learning Approaches

Let us consider the following query text:

Show me all the Buses from Kolkata to Bhubaneshwar on Friday.

Given this piece of text, we have to find some slots which may be Destination, City, Date etc. We will see how to fill these slots using the previously mentioned approaches.

1. Rule Based Approach

This approach consists of Semantic Slot Filling techniques that use Regular Grammars or Context Free Grammars.

A Grammar is defined as a set of production rules which are used to generate strings of a language, which in this case may be a Regular or a Context Free Language.

In this case we will use a Context Free Grammar (CFG) to do Semantic Slot Filling. Effectively, we can always choose to use Context Free Grammars since they are a superset of Regular Grammars.

Let us consider the following CFG:

S -> SHOW BUSES ORIGIN DESTINATION DATE|...
SHOW -> show me|i want|can i see|...
BUSES -> bus|a bus|buses
ORIGIN -> from CITY
DESTINATION -> to CITY
CITY -> kolkata|bhubaneshwar|Ahmedabad|...
DATE -> sunday|monday|...|saturday

Note that the non-terminal symbols in this case are the slots. Let us create the Parse tree of the above grammar.

Parse Tree

Now let us derive our query text from the parse tree

Derivation of Query Text from Parse Tree

Since we can use this Context Free Grammar to parse our query string, we would know exactly which non-terminal strings produced the terminal tokens in the query string and we can tag the tokens in our string accordingly. This would result in the following tagging:

SHOW: Show me
BUSES: all the Buses
ORIGIN: from Kolkata
DESTINATION: to Bhubaneshwar
DATE: on Friday.
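Since the grammar above has no recursion, the same rules can be sketched as one regular expression with a named group per slot. This is a hedged illustration and not a real CFG parser: the city list only contains the three cities named in the grammar, and the optional "all the" and "on" parts are my own additions so that the example query matches.

```python
import re

CITY = r"(?:kolkata|bhubaneshwar|ahmedabad)"
DATE = r"(?:sunday|monday|tuesday|wednesday|thursday|friday|saturday)"

# One named group per slot, in the order given by the rule for S.
QUERY = re.compile(
    r"(?P<SHOW>show me|i want|can i see)\s+"
    r"(?P<BUSES>(?:all the\s+)?(?:a bus|buses|bus))\s+"
    r"(?P<ORIGIN>from\s+" + CITY + r")\s+"
    r"(?P<DESTINATION>to\s+" + CITY + r")\s+"
    r"(?P<DATE>(?:on\s+)?" + DATE + r")"
)

def fill_slots(query):
    """Return a dict of slot name -> matched text, or None if no rule applies."""
    match = QUERY.search(query.lower())
    return match.groupdict() if match else None

slots = fill_slots("Show me all the Buses from Kolkata to Bhubaneshwar on Friday.")
```

Any query that does not follow the rules exactly, such as a city missing from the list, returns None. That is the low recall problem discussed below.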

Advantage: This approach has very high precision.

Disadvantage:

Someone (usually a linguist) has to write down the rules manually, which is very time consuming.
The recall of this process will not be good because it is not practical to write down every possible city, date and phrasing.

2. Machine Learning Approach

For a machine learning approach to solve this problem, we need some kind of data to learn from. This data usually comes in the form of a Training Corpus which is a large body of text with the necessary tags present in the form of Markup. For example,

Are any buses leaving for Kolkata today?

Once we have our training data, we have to do some feature engineering, and extract some useful features such as

Is the word in uppercase?
Is the word present in the name of cities?
What are the previous words?
What are the next words?
What are the previous slots?

and so on….
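The questions above translate quite directly into a feature function. Below is a minimal sketch; the function name, the feature names and the tiny KNOWN_CITIES list are all hypothetical and only serve the illustration.

```python
# Hypothetical gazetteer of city names; a real one would be much larger.
KNOWN_CITIES = {"kolkata", "bhubaneshwar", "ahmedabad"}

def token_features(tokens, index, previous_slots):
    """Describe the token at `index` using the questions listed above."""
    word = tokens[index]
    return {
        "is_upper": word.isupper(),
        "is_capitalized": word[:1].isupper(),
        "is_city": word.lower() in KNOWN_CITIES,
        "prev_word": tokens[index - 1].lower() if index > 0 else "<START>",
        "next_word": tokens[index + 1].lower() if index + 1 < len(tokens) else "<END>",
        "prev_slot": previous_slots[-1] if previous_slots else "<NONE>",
    }

tokens = ["Are", "any", "buses", "leaving", "for", "Kolkata", "today", "?"]
features = token_features(tokens, 5, ["O"] * 5)
```

Here "O" is used as a placeholder tag for tokens that do not belong to any slot, which is also an assumption.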

Feature engineering usually depends on the training corpus, the application at hand (which in this case is tagging words) and most importantly the creativity of the engineer.

Now we need to define our model. The model may be a probabilistic model that gives the probability of the tags associated with a given word. This probability will depend on the features representing some text and on some parameters.

model -> p(tags|words) -> function of features and parameters

The parameters of the model have to be trained. So we take our training data, fit the model to it and maximize the probability of what we see with respect to the parameters. Once we have the trained parameters, we can take the features corresponding to a word and infer that the tag with the maximum probability is associated with the word.

predicted_tag = argmax(p(tags|words))
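As a rough sketch of this prediction step, assume the model is a simple softmax over linear scores. The tag list, the weight matrix and the function names below are all hypothetical; the article does not commit to a specific model.

```python
import numpy as np

TAGS = ["O", "ORIGIN", "DESTINATION", "DATE"]  # hypothetical tag set

def tag_probabilities(features, weights):
    """p(tag | features) as a softmax over one linear score per tag."""
    scores = weights @ features
    scores = scores - scores.max()  # numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

def predict_tag(features, weights):
    """Pick the tag with the highest probability."""
    return TAGS[int(np.argmax(tag_probabilities(features, weights)))]
```

Here weights plays the role of the trained parameters (one row per tag) and features is the numeric feature vector of a single word.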

3. Deep Learning Approach

The Deep Learning approach is largely similar to the machine learning approach, except that features do not need to be manually engineered. Instead, we feed an encoded sequence of words into a Neural Network and the different hidden layers of the network act as feature extractors.

For Semantic Slot Filling and other sequence-related tasks, the most popular Deep Learning models are usually Recurrent Neural Networks (RNNs).

RNN Architecture

For practical purposes, Gated Recurrent Units or GRU blocks and Long Short Term Memory or LSTM blocks are used in the RNN architecture.

Stay tuned for subsequent articles :)

Semantic Slot Filling: Part 1 was originally published in Koderunners on Medium.

Intersection Over Union

Soumik Rakshit — Mon, 15 Apr 2019 14:04:26 GMT

In the previous article, Introduction to Object Detection, we saw how a single object can be detected in an image by predicting a bounding box for the object. Like all Machine Learning tasks, the prediction of bounding boxes requires an accuracy metric to tell us how accurate the predictions are. In this article we will discuss an accuracy metric that can be used for Object Detection.

I Owe You

Let us consider the following photo of a grumpy cat.

Now, in the ideal case, an object detection algorithm should identify him somewhat like this.

But, if an object detection algorithm outputs the following blue bounding box, how do we tell how far off the mark our prediction is???

In this case, we calculate IOU or Intersection Over Union for the following bounding boxes.

IOU(Box1, Box2) = Intersection_Size(Box1, Box2) / Union_Size(Box1, Box2)

Implementing IOU

The bounding box coordinates are in the form (x, y, width, height). We will first calculate the width and height of the Intersection Box; the size of the Intersection is the area of this box. We can get the Union size by adding the areas of the two boxes and subtracting the Intersection size, so that the overlap is not counted twice. All the code is part of the following Kaggle Notebook.
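A minimal sketch of this computation is given below. It assumes that (x, y) is the corner of the box with the smallest coordinates; the function name iou is my own and the notebook's version may differ in details.

```python
def iou(box1, box2):
    """Intersection Over Union of two boxes given as (x, y, width, height)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # Overlap along each axis: smallest far edge minus largest near edge.
    inter_w = min(x1 + w1, x2 + w2) - max(x1, x2)
    inter_h = min(y1 + h1, y2 + h2) - max(y1, y2)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    intersection = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - intersection
    return intersection / union
```

For two 2 x 2 boxes shifted by one pixel in both directions, the intersection is 1 and the union is 4 + 4 - 1 = 7, so the IOU is 1/7.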

Code:

https://medium.com/media/2ac01449f9afd8d833623aca92197413/href

We will modify the visualization code to also show the respective IOU.

https://medium.com/media/e826e0307a98bebecb2cbe63f6086fa1/href

Output:

Intersection Over Union was originally published in Koderunners on Medium.

Introduction to Object Detection

Soumik Rakshit — Mon, 15 Apr 2019 13:45:12 GMT

Image Classification

In Image Classification problems we assign an image to a specific class. The whole image represents one class. We don’t need to know exactly where the object is. Usually only one object is present.

Object Detection

Sometimes we need more information from an image than just a predicted class. Given an image, we want to learn the class of the object and where it is located in the image. We need to detect a class and a region of interest (usually a rectangle) where that object is. This is especially useful if a single object occupies a very small area of an image, or if multiple objects of the same or different classes are present in the image.

Semantic Segmentation

Semantic Segmentation allows us to gather even more information compared to Object Detection. While in Object Detection we usually identify a rectangular region of interest containing a classified object, in Semantic Segmentation we label each pixel in the image with a category label. Semantic Segmentation doesn’t differentiate instances, it only cares about individual pixels.

Rectangle Detector

We will be building a very simple object detection module for detecting single rectangles in images. We will be training a simple classifier using Keras that will be predicting the bounding boxes of the rectangles. The code used in this article can be found in this Kaggle Notebook.

Importing The Necessary Libraries

We will be using Numpy for linear algebra, Matplotlib for visualization, Scikit-Learn for splitting the data into training and test set, Keras for building the classifier and Tensorflow as the backend of Keras.

Code:

https://medium.com/media/a86f4399088eeb072911167069020f91/href

Output:

Using TensorFlow backend.

Generating the Dataset

We will be generating 50000 images of height and width 16, where each image will contain an object. Minimum size of an object will be 1 and maximum size will be 8.

We will generate 50000 numpy arrays of shape (16, 16), initialize them with 0 and thus we get the background. Then we will take a random patch from the image and set it to 1. This patch will become our object and the coordinates of the object serve as the bounding boxes.
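The generation procedure can be sketched as follows. The function name, the random seed and the (x, y, width, height) ordering with y indexing rows are my assumptions; the notebook may lay things out differently.

```python
import numpy as np

def generate_dataset(num_images=50000, img_size=16, min_size=1, max_size=8, seed=0):
    """Create blank images with one random rectangle each, plus its bounding box."""
    rng = np.random.default_rng(seed)
    images = np.zeros((num_images, img_size, img_size))  # background = 0
    bboxes = np.zeros((num_images, 1, 4))                # (x, y, width, height)
    for i in range(num_images):
        # Random object size; the upper bound of integers() is exclusive.
        w, h = rng.integers(min_size, max_size + 1, size=2)
        # Random position such that the object stays inside the image.
        x = rng.integers(0, img_size - w + 1)
        y = rng.integers(0, img_size - h + 1)
        images[i, y:y + h, x:x + w] = 1.0                # object = 1
        bboxes[i, 0] = [x, y, w, h]
    return images, bboxes
```

Called with the default arguments, this returns arrays of shape (50000, 16, 16) and (50000, 1, 4), matching the shapes printed below.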

https://medium.com/media/8ba78227b018210427cb1f0b12c7ac18/href

Output:

Images shape: (50000, 16, 16)
Bounding Boxes shape: (50000, 1, 4)

Visualizing Samples from Generated Images

We will display the image in binary format (0 = White and 1 = Black) and plot a rectangular patch on the image denoting the bounding box.

Code:

https://medium.com/media/a6d995959cbcc0711a91c62bb71a8ee5/href

Output:

Preprocessing

In the preprocessing step, we will first flatten the images and the bounding boxes, then normalize the images and then split the dataset into training and test set with 33% of the data being in the test set.
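Here is a Numpy-only sketch of these three steps. The article uses Scikit-Learn for the split, so the manual shuffle below is a stand-in, and the exact normalization (standardizing pixels, scaling box coordinates by the image size) is my assumption rather than the notebook's code.

```python
import numpy as np

def preprocess(images, bboxes, test_fraction=0.33, seed=0):
    """Flatten, normalize and split the data into training and test sets."""
    num_images, img_size = images.shape[0], images.shape[1]
    # Flatten: (n, 16, 16) -> (n, 256) and (n, 1, 4) -> (n, 4).
    x = images.reshape(num_images, -1)
    y = bboxes.reshape(num_images, -1)
    # Assumed normalization: zero mean / unit variance pixels, boxes in [0, 1].
    x = (x - x.mean()) / x.std()
    y = y / img_size
    # Shuffle the indices and hold out the requested fraction for testing.
    order = np.random.default_rng(seed).permutation(num_images)
    num_test = int(round(test_fraction * num_images))
    test_idx, train_idx = order[:num_test], order[num_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]
```

With 50000 images and a test fraction of 0.33, this holds out 16500 images for testing and keeps 33500 for training.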

Code:

https://medium.com/media/03fb835b0432096b8ea9131179d9066e/href

Output:

(50000, 256), (50000, 4)

Model Training

We will build a very simple neural network with fully connected layers.

The Input Layer consists of 256 nodes, since each image has 256 pixels.
The hidden layer consists of 256 nodes, with ReLU (Rectified Linear Unit) as the activation function.
The Output Layer consists of 4 nodes corresponding to the 4 coordinates of a bounding box.
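To make the layer sizes concrete, here is a Numpy sketch of the forward pass of such a network. It is not the Keras model from the notebook; the function names and the random initialization are assumptions for illustration.

```python
import numpy as np

def init_params(n_in=256, n_hidden=256, n_out=4, seed=0):
    """Random weights and zero biases for a network with one hidden layer."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.05, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.05, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Input -> fully connected + ReLU -> fully connected (4 box coordinates)."""
    hidden = np.maximum(0.0, x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

def count_params(params):
    """Total number of trainable values in the network."""
    return sum(p.size for p in params.values())
```

The first layer has 256 x 256 weights plus 256 biases (65,792 values) and the second has 256 x 4 weights plus 4 biases (1,028 values), for a total of 66,820 parameters.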

Code:

https://medium.com/media/922a0b766ddf1bcc75609e93987afc69/href

Output:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
activation_1 (Activation)    (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 1028      
=================================================================
Total params: 66,820
Trainable params: 66,820
Non-trainable params: 0
_________________________________________________________________

We will train the network with adadelta optimizer and the loss function will be mean_squared_error.

We will implement a callback to store the learning rate at each epoch and train the model for 30 epochs.

Code:

https://medium.com/media/ce3536adb44578bef3f46f477266419c/href

Output:

Train on 30150 samples, validate on 3350 samples
Epoch 1/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 63us/step - loss: 0.0386 - acc: 0.6658 - val_loss: 0.0257 - val_acc: 0.7651
Epoch 2/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 0.0058 - acc: 0.8290 - val_loss: 0.0042 - val_acc: 0.8490
Epoch 3/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 0.0026 - acc: 0.8766 - val_loss: 0.0031 - val_acc: 0.8215
Epoch 4/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 0.0019 - acc: 0.8901 - val_loss: 0.0019 - val_acc: 0.8991
Epoch 5/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 0.0015 - acc: 0.9048 - val_loss: 0.0016 - val_acc: 0.8669
Epoch 6/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 53us/step - loss: 0.0013 - acc: 0.9059 - val_loss: 0.0012 - val_acc: 0.8940
Epoch 7/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 0.0011 - acc: 0.9066 - val_loss: 0.0014 - val_acc: 0.8764
Epoch 8/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 9.7049e-04 - acc: 0.9162 - val_loss: 0.0015 - val_acc: 0.8919
Epoch 9/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 8.9787e-04 - acc: 0.9156 - val_loss: 0.0011 - val_acc: 0.9107
Epoch 10/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 8.0017e-04 - acc: 0.9162 - val_loss: 9.8927e-04 - val_acc: 0.9364
Epoch 11/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 7.3458e-04 - acc: 0.9198 - val_loss: 6.2795e-04 - val_acc: 0.9325
Epoch 12/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 6.7894e-04 - acc: 0.9202 - val_loss: 6.7405e-04 - val_acc: 0.9063
Epoch 13/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 6.3128e-04 - acc: 0.9209 - val_loss: 8.1824e-04 - val_acc: 0.9304
Epoch 14/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 5.7864e-04 - acc: 0.9208 - val_loss: 5.4598e-04 - val_acc: 0.9534
Epoch 15/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 5.3993e-04 - acc: 0.9206 - val_loss: 6.0319e-04 - val_acc: 0.9099
Epoch 16/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 5.1695e-04 - acc: 0.9234 - val_loss: 4.6435e-04 - val_acc: 0.9242
Epoch 17/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 4.7829e-04 - acc: 0.9239 - val_loss: 7.4663e-04 - val_acc: 0.9239
Epoch 18/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 4.5803e-04 - acc: 0.9237 - val_loss: 6.2423e-04 - val_acc: 0.9110
Epoch 19/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 4.3221e-04 - acc: 0.9238 - val_loss: 4.8555e-04 - val_acc: 0.9316
Epoch 20/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 53us/step - loss: 4.1499e-04 - acc: 0.9235 - val_loss: 4.5414e-04 - val_acc: 0.9469
Epoch 21/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.8647e-04 - acc: 0.9258 - val_loss: 5.5298e-04 - val_acc: 0.9116
Epoch 22/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.7609e-04 - acc: 0.9251 - val_loss: 4.3545e-04 - val_acc: 0.9269
Epoch 23/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.4303e-04 - acc: 0.9255 - val_loss: 5.7299e-04 - val_acc: 0.9218
Epoch 24/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.3132e-04 - acc: 0.9272 - val_loss: 8.2355e-04 - val_acc: 0.9173
Epoch 25/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.1925e-04 - acc: 0.9266 - val_loss: 5.7306e-04 - val_acc: 0.9230
Epoch 26/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 3.0340e-04 - acc: 0.9280 - val_loss: 4.8713e-04 - val_acc: 0.9364
Epoch 27/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 51us/step - loss: 2.9643e-04 - acc: 0.9249 - val_loss: 3.5696e-04 - val_acc: 0.8884
Epoch 28/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 2.9189e-04 - acc: 0.9269 - val_loss: 3.7836e-04 - val_acc: 0.9555
Epoch 29/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 2.7267e-04 - acc: 0.9261 - val_loss: 3.0140e-04 - val_acc: 0.9099
Epoch 30/30
Learning Rate: 1.0
30150/30150 [==============================] - 2s 52us/step - loss: 2.6856e-04 - acc: 0.9250 - val_loss: 3.7276e-04 - val_acc: 0.9469

Prediction and Visualization on Test Set

Now we will get the predicted bounding boxes on the test set.

Code:

https://medium.com/media/6856a2616d16e2b0dd0b08f50a910595/href

Output:

(16500, 4)

Let us visualize the predicted bounding boxes on test data.

Code:

https://medium.com/media/5f968c789d6808403e5d82c2d7bcc347/href

Output:

Thank you for reading and stay tuned for subsequent articles :)

Introduction to Object Detection was originally published in Koderunners on Medium.

Pyside Chapter 1: GUI Development in Python

Soumik Rakshit — Sun, 25 Nov 2018 21:20:43 GMT

What is PySide???

PySide is a Python binding of the cross-platform GUI development toolkit Qt, currently developed by The Qt Company under the Qt for Python project. The original PySide provides LGPL-licensed Python bindings for Qt 4, while PySide2 targets Qt 5. It also includes a complete toolchain for rapidly generating bindings for any Qt-based C++ class hierarchy. The PySide Qt bindings allow both free open source and proprietary software development and ultimately aim to support all Qt platforms. PySide is a useful framework for easily building good-looking graphical user interfaces for your Python applications. We will be using PySide2, the latest version of PySide at the time of writing.

Installing Pyside2

Create a new conda environment using conda create --name pyside. This creates a conda environment named pyside, in which we will install the necessary dependencies.
Activate the environment created in the last step using activate pyside. On Linux, use source activate pyside instead.
Install PySide2 in this environment using conda install -c conda-forge pyside2. This installs PySide2 along with all of its dependencies.
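
Once the steps above are done, it can be handy to confirm that the package is visible from the active environment before writing any GUI code. The snippet below is only a minimal sketch: the helper name pyside2_available is made up for illustration, and it uses nothing but the standard library, so it runs whether or not the installation succeeded.

```python
import importlib.util


def pyside2_available():
    """Return True if the PySide2 package can be found by the current interpreter.

    Hypothetical helper, not part of PySide2 itself.
    """
    # find_spec returns None when a top-level package cannot be located.
    return importlib.util.find_spec("PySide2") is not None


print("PySide2 importable:", pyside2_available())
```

If this prints False, the most likely cause is that the script is running outside the pyside environment.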

Introduction to Pyside2

We will create a simple GUI application that displays the text Hello World on the interface. We begin by importing the necessary libraries.

https://medium.com/media/f8bed6217814bf1c2571a6af8d5b3465/href

While developing a PySide application, we usually start by importing the PySide2.QtWidgets classes, since they provide the main functionality for implementing a Qt GUI in Python. Next we will start creating our application.

https://medium.com/media/3af535d7cdb80f65730f39fa8c358fa2/href

We start by creating an app, which is an instance of the PySide2.QtWidgets.QApplication class. It manages the GUI application's control flow and main settings.

Next we will create a widget. A widget is the smallest unit of a user interface; it receives mouse, keyboard and other events from the window system, and paints a rectangular representation of itself on the screen.

https://medium.com/media/320102607f374efea6e24dbce04ddab1/href

We create a widget wid using the PySide2.QtWidgets.QWidget class and display it on the screen using the show() method. The method app.exec_() enters the Qt main loop and starts executing the Qt code. When we close the application, the main loop returns and the program is terminated by sys.exit().
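
Putting the pieces described so far together, a minimal sketch of such a script could look like the following. It is not the exact code from the embedded snippets: the function name main and the window size are assumptions made for illustration, and PySide2 is imported inside the function so that the file can still be loaded on a machine where PySide2 is not installed.

```python
import sys


def main():
    """Show an empty window and return the exit code of the Qt main loop."""
    # Imported lazily (an illustrative choice) so that merely loading this
    # file does not require PySide2 or a display.
    from PySide2.QtWidgets import QApplication, QWidget

    app = QApplication(sys.argv)  # manages control flow and main settings
    wid = QWidget()               # a blank top-level widget
    wid.resize(250, 150)          # hypothetical size in pixels
    wid.show()                    # make the widget visible
    return app.exec_()            # enter the Qt main loop until the window closes


# To launch the window, call: sys.exit(main())
```

Passing the return value of app.exec_() to sys.exit() hands the exit status of the main loop back to the operating system.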

We can also change the title of the GUI application using the setWindowTitle() method.

https://medium.com/media/5af747d870b6f293c87fbed9e70fe3e0/href

When we execute the above script, we get the following result:

Now we will try to display some text on the user interface. This can be achieved using the PySide2.QtWidgets.QLabel class. While QWidget creates a blank widget, QLabel creates a widget that provides text or image display.

https://medium.com/media/edaef794783094a9f6732be3195ae301/href

We create a label initialized with our display string “Hello World!”. Then we resize the label widget, set a title for it and display it using show(). The result of running this script will be:

As you can see, the display string is not only small but also poorly aligned, which does not look good. We can fix this by initializing the label with custom HTML instead of a plain string.

https://medium.com/media/411db0ed100251b49329ca08b757c10d/href
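
As a rough illustration of the idea, the sketch below builds the HTML string separately and then passes it to QLabel. The helper name make_label_html, the chosen tags, colour and sizes are all assumptions rather than the markup used in the embedded snippet; QLabel renders a subset of HTML as rich text, which includes the font tag and the align attribute on paragraphs.

```python
import sys
from html import escape


def make_label_html(text, color="red", size=6):
    """Wrap text in simple rich-text markup: centred, coloured and enlarged.

    Hypothetical helper; the markup is an illustrative choice.
    """
    # escape() keeps characters such as < and & from being read as markup.
    return '<p align="center"><font color="{}" size="{}">{}</font></p>'.format(
        color, size, escape(text)
    )


def main():
    """Show a label whose text is given as HTML instead of a plain string."""
    # Imported lazily so the helper above works without PySide2 installed.
    from PySide2.QtWidgets import QApplication, QLabel

    app = QApplication(sys.argv)
    label = QLabel(make_label_html("Hello World!"))
    label.resize(300, 100)               # hypothetical size in pixels
    label.setWindowTitle("Hello World")  # title of the label's window
    label.show()
    return app.exec_()


# To launch the window, call: sys.exit(main())
```

Keeping the markup in its own function makes it easy to try other colours or sizes without touching the Qt code.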

The output this time will be:

For the next articles on PySide and more such articles, stay tuned on https://geekyrakshit.ml.

Pyside Chapter 1: GUI Development in Python was originally published in GeekyRakshit on Medium.