Stories by Roland Meertens on Medium

Dataset in a day

Roland Meertens — Tue, 28 Nov 2023 17:33:30 GMT

A clustering-based approach to create deep learning datasets in a day

Introduction

Understanding what’s happening in an image is an important task, but also a costly one. In the last few years, the field of computer vision has accelerated greatly thanks to advances in neural networks. At Bumble Inc., we see potential value in computer vision for a variety of use cases, such as improving the safety of our platform and providing our members with a better user experience.

The most common way to train these neural networks is by showing them many images with the corresponding labels. Unfortunately, this can be costly. Not only does one need to build and train the model, one also wants to run a hyperparameter search over multiple possible network configurations and, of course, find or build a dataset suitable for the task at hand.

Building the dataset is the most important task, and also a very time-consuming one. Gathering data, setting up labelling requirements, and the labelling itself all take a lot of time and money. This normally leads to trade-offs: either building only a small dataset, or trying to fit existing datasets to your specific use case.

One alternative is, of course, to not build a dataset at all and instead use zero-shot learning. I argued in the past that this is unreasonably effective, and that it allows you to test your use case before even training a model. With zero-shot learning, one predicts labels without explicitly training on the classes one is trying to learn. One way to achieve this is with the CLIP model, which is trained to have a strong association between text and images. By looking at the distance between the description of your class and the image, you can run inference without training anything. However, there are some use cases where we need the strongest possible model, which means fine-tuning it on our specific data.
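The zero-shot idea can be sketched in a few lines. This is only an illustration of the distance comparison, not the CLIP API: the three-dimensional embeddings are made-up toy values standing in for the output of a text and image encoder, and `zero_shot_label` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_label(image_embedding, text_embeddings):
    """Return the class description whose embedding is closest to the image."""
    return max(
        text_embeddings,
        key=lambda label: cosine_similarity(image_embedding, text_embeddings[label]),
    )


# Toy embeddings (assumed values, not real model output).
text_embeddings = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
}
image_embedding = [0.8, 0.3, 0.1]

predicted = zero_shot_label(image_embedding, text_embeddings)
print(predicted)
```

No training happens here: the class descriptions are only embedded and compared with the image.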

Using foundational models for data selection

Foundational models, such as GPT-3 for text and CLIP for images, provide a seemingly amazing understanding of the world around us. If your data represents something that can be found abundantly on the internet, you can immediately start using these models for your use case. However, these models on their own do not provide you with a dataset.

In our experience, foundational models are great at retrieving specific data. This is very useful when you want to find rare or very specific examples. For example, for a self-driving car you might be interested in retrieving examples of people in wheelchairs, or ambulances and other emergency vehicles. However, we noticed that foundational models do not achieve a high classification accuracy with zero-shot learning.

Clustering in latent space

The big trick for foundational models is understanding that similar concepts are close together in the so-called “latent space”. The output of a neural network might be a number of class predictions, but the layers before that output each hold a list of numbers. There is a logic to these numbers: similar concepts will be close together. That way the later layers are able to differentiate between the underlying concepts the network uses for predictions. Note that the underlying concepts are never explicitly given to the network. We don’t say that “a dog has four legs and fur”; it simply learns that some things have legs, some things have fur, and that something which has all of that might be a dog. That also means that we don’t really know what aspects are learned, and why certain things are close together in an embedding space.

The approach we developed at Bumble Inc. hinges on the fact that there is meaning in the latent space of foundational models. Ideally, if we label one image, we would like to immediately label all similar images to the one we labelled. In this case, similar would be all the images which are in the same area in the latent space of a foundational model.
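The core idea, labelling one image and letting every image in the same cluster inherit that label, can be sketched as follows. This is a minimal illustration and not our production code: it assumes the cluster assignment per image has already been computed, and `propagate_labels` is a hypothetical helper name.

```python
def propagate_labels(cluster_ids, labelled_examples):
    """Give every image the label of the labelled image in its cluster.

    cluster_ids: the cluster id of each image, by image index.
    labelled_examples: {image_index: label} for the few images a human labelled.
    Images in clusters without any labelled example get None.
    """
    cluster_labels = {
        cluster_ids[index]: label for index, label in labelled_examples.items()
    }
    return [cluster_labels.get(cluster_id) for cluster_id in cluster_ids]


# Six images in three clusters; a human labelled only image 0 and image 3.
cluster_ids = [0, 0, 1, 1, 1, 2]
labelled_examples = {0: "skiing", 3: "hiking"}

labels = propagate_labels(cluster_ids, labelled_examples)
print(labels)
```

Two manual labels turn into five labelled images here; the last image stays unlabelled because nobody looked at its cluster yet.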

Unfortunately, as explained above, foundational models such as CLIP don’t explicitly tell us which ‘aspect’ of an image is the reason for clustering two images together. For example, look at the two images below. There are multiple aspects in which these images could be ‘close’ together: both are taken in the mountains, both show me doing sports, both are selfies posing with others, and both are photos where people are wearing helmets. In practice, we see that if we cluster our data in the CLIP embedding space, we get clusters which are not always the clusters we wanted or expected. Grouping photos is an easy task for a human, but it is not as trivial when done iteratively at scale. Interestingly enough, we found clusters like ‘people leaning onto things’, ‘people who try to look like angels’, and ‘people posing with a flag’.

This is where CoCa comes in. CoCa is a network which can automatically create captions for images. For example, the above images are captioned as:

a man and a woman posing for a picture on a ski slope.
a man and a woman standing on top of a mountain.

We can see that the captions are far from perfect, but at this point we don’t really care. We can at least ‘explain’ to a certain extent what is in a photo, and can do so for each photo in a cluster. CoCa is built on top of the embeddings which CLIP generates. This is great for our use-case, as it means that clusters in the CLIP embedding space can be automatically described using CoCa.

However, reading this for each photo is a lot of work. We want to have a summary of what is happening in a cluster. This is where Bumble’s open-source Buzzwords library comes in. The library allows us to take all captions in a cluster and summarise them. We notice that this gives a reasonable description of clusters of various kinds. For example, we can assume that the cluster with the buzzwords ‘mountain skiing goggles’ is a cluster of photos taken on a snowy mountain, and the cluster ‘selfie standing mountains hiking sunglasses group’ is probably a collection of hiking photos.

With the above descriptions one can simply label the entire cluster at once. If one needs a classifier for ‘cats vs dogs’ one can immediately search for all clusters which contain the keywords ‘cat’ and ‘dog’.
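Searching clusters by keyword can be sketched like this. The example does not call the Buzzwords library itself: it assumes the buzzwords per cluster were already produced and stored as plain strings, the cat and dog clusters are invented for illustration, and `clusters_matching` is a hypothetical helper name.

```python
def clusters_matching(cluster_buzzwords, keywords):
    """Return the ids of all clusters whose buzzwords contain any keyword."""
    wanted = {keyword.lower() for keyword in keywords}
    return sorted(
        cluster_id
        for cluster_id, buzzwords in cluster_buzzwords.items()
        if wanted & set(buzzwords.lower().split())
    )


# Buzzwords per cluster (clusters 2 and 3 are made-up examples).
cluster_buzzwords = {
    0: "mountain skiing goggles",
    1: "selfie standing mountains hiking sunglasses group",
    2: "cat sofa sleeping",
    3: "dog park leash",
}

pet_clusters = clusters_matching(cluster_buzzwords, ["cat", "dog"])
print(pet_clusters)
```

Matching on whole words avoids accidental hits such as ‘cat’ inside ‘location’.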

Our experiment

One way we experimented with our dataset is by training a neural network on several classes to determine what is happening inside of a photo. We chose relatively broad classes to demonstrate that our dataset manages to capture a wide variety of contents in photos. The classes are relevant to what people have in their dating photos, and explain what kind of lifestyle they have. The classes we chose were:

“Animal”: for photos of pet lovers with their fur babies
“Children”: for photos people took with or around children
“Food/drink”: for photos taken in bars and restaurants by foodies
“Music”: for photos with musical instruments and at concerts
“Outdoor activities”: for the ones who like to be outside for anything from skiing and hiking to lying on the beach
“Sport”: for photos of people doing anything from riding a bike outdoors to playing soccer in a hall
“Staying in”: for any activity which is performed inside, such as playing boardgames
“Vehicles”: for those attached to their car, van, or bike

While training the network we predict all labels at the same time: we phrase it as a multi-label classification problem and use a binary cross-entropy loss. Note that not every photo has to have one of these labels. In fact, most photos in our dataset don’t have any label attached to them. The most common reason is that they are selfies without the subject of the photo doing anything we could act on.
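To make the loss concrete, here is a small sketch of binary cross entropy over independent per-class outputs. It is written in plain Python for readability and is not our training code; the logits are invented values, and the class order simply follows the list above.

```python
import math

CLASSES = [
    "animal", "children", "food/drink", "music",
    "outdoor activities", "sport", "staying in", "vehicles",
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean binary cross entropy, treating every class as its own yes/no question."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for logit, target in zip(logits, targets):
        p = sigmoid(logit)
        total -= target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps)
    return total / len(logits)


# A photo tagged with both "outdoor activities" and "sport".
targets = [0, 0, 0, 0, 1, 1, 0, 0]
good_logits = [-4, -4, -4, -4, 4, 4, -4, -4]  # confident and correct
bad_logits = [4, -4, -4, -4, -4, 4, -4, -4]   # sees an animal, misses the outdoors

loss_good = binary_cross_entropy(good_logits, targets)
loss_bad = binary_cross_entropy(bad_logits, targets)
print(loss_good, loss_bad)
```

Because each class is scored independently, a target of all zeros is perfectly valid, which matches the many photos that carry no label at all.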

The dataset we created contains a million photos and was built by inspecting the above-mentioned clusters. We labelled 821 clusters manually to apply the above-mentioned tags to each of the photos in the clusters. This gives us 163 clusters with any tag (some clusters get multiple tags, such as outdoor sports) and 651 clusters which clearly do not fall into any of these categories. Not every photo in the dataset is used during training, though: we only use photos which have at least one of the corresponding categories. This gives us 221,385 images, which we split into a train and test set.

The model we train is a ResNet50, which we train for 50 epochs (with early stopping enabled) and evaluate on a hold-out dataset. Above you can see the performance of the model trained on 1 million images. We can see that it learns all classes reasonably well. The hardest class to learn is our class “staying in”. This is also one of the more diverse classes: it contains a very wide range of activities, which makes it hard to generalise.

We also see continuous improvements through the addition of more data. Although the rule ‘more data = better’ was already a staple of machine learning it’s good to see that more data from auto-generated clusters also keeps improving the final model performance. Note that in this case we don’t need to label extra data to actually get more data. Because we are labelling whole clusters we can simply gather more unlabelled data, see if it belongs to any clusters we already labelled, and assign the same label to these images.
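Assigning fresh, unlabelled images to clusters that were already labelled can be sketched as a nearest-centroid lookup. This is an assumption about one reasonable way to do it, not a description of our pipeline: the two-dimensional centroids, the `max_distance` cut-off, and the helper name `label_new_images` are all invented for the example.

```python
import math


def label_new_images(embeddings, centroids, cluster_labels, max_distance):
    """Copy the tags of the nearest labelled cluster onto each new image.

    Images further than max_distance from every centroid stay unlabelled (None).
    """
    labels = []
    for embedding in embeddings:
        nearest = min(centroids, key=lambda cid: math.dist(embedding, centroids[cid]))
        if math.dist(embedding, centroids[nearest]) <= max_distance:
            labels.append(cluster_labels[nearest])
        else:
            labels.append(None)
    return labels


# Toy centroids and tags for two clusters that were labelled earlier.
centroids = {0: [0.0, 1.0], 1: [1.0, 0.0]}
cluster_labels = {0: ["outdoor activities", "sport"], 1: ["animal"]}

new_embeddings = [[0.1, 0.9], [0.9, 0.2], [5.0, 5.0]]
new_labels = label_new_images(new_embeddings, centroids, cluster_labels, max_distance=0.5)
print(new_labels)
```

The distance cut-off keeps images that resemble none of the known clusters out of the training set instead of forcing a label onto them.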

This model is useful for us for several reasons. We could use it either for matching purposes (e.g. suggest pet lovers to others with the same interest), or use it for feedback on profiles (e.g. “you say you like dogs, but we don’t see any pets in your photo”).

Privacy by design

The last feature of this approach is that one can create a dataset of photos without having to look at every single photo. Looking at only a few images from each cluster gives you an idea of what the photos in the cluster represent, and one could even choose not to look at a single image and only read the descriptions. Naturally, this is beneficial if one is working with privacy-sensitive photos: there is no need for anyone to look at every single photo if one can simply infer what is happening from the buzzword topics. The job of image moderation can be very emotionally taxing, and simply reading what is happening in an image rather than having to see it goes a long way.

Conclusion

We presented an efficient way to create a large dataset for any computer vision application using unannotated data. In our experience, this approach gives great results, even for tasks where the object to classify can be hard to spot. Although we acknowledge that there will be some noise in the data, we believe that there is a large benefit in creating a large dataset in a short amount of time. Additionally, we hope more companies will be inspired to take this approach and continue to improve their processes while protecting the privacy of their user base.

Dataset in a day was originally published in Bumble Tech on Medium.

Managing our budget with Excel and machine learning

Roland Meertens — Thu, 14 Dec 2017 14:03:48 GMT

A little over a year ago my girlfriend Lisette and I moved in together. A big part of living together was getting used to managing a budget, and knowing where our money went. Lisette made one of the coolest Excel spreadsheets I ever saw. The only thing we needed to do was… actually fill in what expense belongs to what category. This is where things went wrong… Every month we have about 100 shared expenses, and labeling them turned out to be a boring job neither of us wanted to do (and thus ignored for the last 10 months…). Last weekend I made an attempt at automating this task using the power of machine learning!

The first step to training a classifier is getting your training data! My bank gives you the option to download a spreadsheet with all (unlabeled) expenses. I imported this into a Google spreadsheet and added two columns: one with my own (optional) labels and one for the computer-generated labels.

Getting Excel data into Python

Although writing a classifier in Excel is probably possible, I used Python with the NLTK and SKLearn libraries. To do this I needed to get all the transactions and the labels I added into my Jupyter Notebook. Thanks to Greg Baugues this turned out to be surprisingly easy! His blog post was a great help, and made this process pretty smooth.

In [1]:

import gspread
from oauth2client.service_account import ServiceAccountCredentials

creds = ServiceAccountCredentials.from_json_keyfile_name('google_account.json', ['https://spreadsheets.google.com/feeds'])
client = gspread.authorize(creds)
temp = client.open("rolands budgetvariant")
sheet = temp.worksheet("ALLES")

For each transaction, I made a feature vector with a boolean for each of the most common words in the transaction. I made separate lists for words in the description, the number of the account money was transferred to, and whether we receive money or not.

In [2]:

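import nltk

def get_freq_dist_for_sheet(sheet, key, max_words=30):
    """Return the max_words most common lowercase words in column `key`."""
    records = sheet.get_all_records()
    words = list()
    for record in records:
        # str() guards against cells that gspread returns as numbers
        words.extend(w.lower() for w in str(record[key]).split())

    all_words = nltk.FreqDist(words)
    # most_common() sorts by frequency; list(all_words) would keep insertion order
    word_features = [word for word, _ in all_words.most_common(max_words)]
    return word_features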

interesting_features = ["Naam / Omschrijving", "Tegenrekening", "Af Bij", "Mededelingen"]
freq_dists = dict()
for feature in interesting_features:
    freq_dists[feature] = get_freq_dist_for_sheet(sheet, feature)

In [3]:

def record_features(record, key, doc_features):
    # str() guards against cells that gspread returns as numbers
    document_words = set(w.lower() for w in str(record[key]).split())
    features = {}
    for word in doc_features:
        features['contains({},{})'.format(key,word)] = (word in document_words)
    return features

def all_record_features(record):
    input_data = dict()
    for categorie_name in freq_dists:
        ## dict.update means that you merge dictionaries
        input_data.update(record_features(record, categorie_name, freq_dists[categorie_name]))
    return input_data

def get_traindata(sheet):
    records = sheet.get_all_records()
    traindata = list()
    for record in records:
        if record["Categorie"]:
            input_data = all_record_features(record)
            traindata.append((input_data, record["Categorie"]))
    return traindata

In [4]:

training_data = get_traindata(sheet)
all_labels = set([x[1] for x in training_data])
print("Training with " + str(len(training_data)) + " entries")
print(all_labels)

Training with 365 entries
{'reizen', 'inleg roland', 'overig', 'benzine', 'goede doelen', 'sport', 'uit eten', 'electriciteit', 'internet', 'huur', 'zorgverzekering roland', 'water', 'verzekering roland', 'sport lisette', 'wegenbelasting', 'tanken', 'cash', 'parkeren', 'openbaar vervoer', 'auto', 'abonnementen', 'verzekering auto', 'boodschappen', 'inleg lisette', 'waterschapsbelasting'}

After selecting all the transactions I labeled, and converting them to these feature vectors, I could select and train a classifier! I decided to go for a simple decision tree. I expected this to work reasonably well for my features (only recognizing where I do groceries and who I pay my rent to would remove 80% of the transactions I normally have to label!). Conveniently, the NLTK library I used to create the frequency distribution also contains a class that allows you to import any SKLearn classifier. This reduced training to one line of code:

In [5]:

from nltk.classify import SklearnClassifier
from sklearn import tree

classifier = SklearnClassifier(tree.DecisionTreeClassifier(), sparse=False).train(training_data)

Visualising the decision tree

The SKLearn tree classifier has a function to write the decision tree as a graphviz file. This function requires the classifier NLTK created, the feature names, and the class labels. Getting these required a bit of documentation reading as there are no clear functions to get these (and there is no way to know what the classifier did with your data). Eventually, the following code was able to get what I needed. Python can even write the whole tree itself if you install and import the graphviz library.

In [6]:

import graphviz

# Note: get_feature_names() was removed from recent scikit-learn releases;
# get_feature_names_out() is its replacement.
dot_data = tree.export_graphviz(classifier._clf, out_file=None,
                         feature_names=classifier._vectorizer.get_feature_names_out(),
                         class_names=classifier.labels(),
                         filled=True, rounded=True,
                         special_characters=True)
graph = graphviz.Source(dot_data)
graph.render("budget_decisiontree")

Out[6]:

'budget_decisiontree.pdf'

Below is a part of the decision tree the algorithm generated. It correctly discovered that I do my grocery shopping at the “Albert Heijn” (https://www.youtube.com/watch?v=GiZJa_Ctkr4), where I rent my apartment, where my internet money goes to, and much more!

Labeling data

And now the most important part of this project: classify each of my transactions! As described at the start of this article I added a column for the computer prediction. The Google Sheets API allows you to write a single cell at a time which for some reason takes around a second per edit. Although it’s annoying if you try to iterate quickly, it gives some cool visualizations while your algorithm is working!

In [7]:

records = sheet.get_all_records()
for index, record in enumerate(records):
    row = index + 2  # sheet rows start at 1, and the first row is a header
    try:
        input_data = all_record_features(record)
        guess = classifier.classify(input_data)
        if guess != record["Computer guessed"]:
            sheet.update_cell(row, 11, guess)  # column 11 holds the computer-generated label
    except Exception as e:
        print("Exception at row " + str(row) + ": " + str(e))

Conclusion

Although not everything is filled in correctly, about 80% of my transactions are now correctly labeled! It saved me a lot of time, was an interesting challenge, and makes the awesome Excel sheet way more usable now.

If you are interested in the code, it’s available on GitHub: https://github.com/rmeertens/python_budget_classifier. I always love to hear feedback from people so please reach out to me!

Autonomous vehicles will lead others through congested cities

Roland Meertens — Thu, 21 Sep 2017 08:56:33 GMT

This weekend we took second place in the Hack the Road Hackathon with our idea to let connected vehicles lead other vehicles through a “green wave”. As there will be a long period in which smart vehicles and “dumb” vehicles drive through the same streets, building this system would reduce a lot of traffic problems in the city for a low price!

The green wave

In traffic, a green wave is a phenomenon that occurs when a series of traffic lights are green when you approach them. Riding a green wave, you never have to stop for a red traffic light. As stopping and accelerating takes a lot of time and wastes energy, such a green wave is beneficial for you and the road users behind you. If you know your distance to the next intersection and the time it will take for the traffic light to turn green, you can calculate your ideal speed and keep driving at this speed. Sometimes, by driving a little slower, you actually arrive at your destination sooner, with more fuel to spare.
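The ideal speed is simply distance divided by time. A small sketch, where the function name, the speed limit parameter, and the example numbers are all assumptions made for illustration:

```python
def ideal_speed_kmh(distance_m, time_to_green_s, speed_limit_kmh):
    """Speed at which you reach the intersection just as the light turns green."""
    if time_to_green_s <= 0:
        return speed_limit_kmh  # the light is already green
    speed_kmh = distance_m / time_to_green_s * 3.6  # m/s to km/h
    return min(speed_kmh, speed_limit_kmh)  # never advise speeding


# 300 metres from a light that turns green in 30 seconds, on a 50 km/h road.
speed = ideal_speed_kmh(300, 30, 50)
print(round(speed, 1))
```

In this example, driving 36 km/h instead of 50 km/h means you roll through the intersection without braking.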

Existing solutions

A few intersections in the Netherlands have traffic signs that indicate what speed you need to drive to hit the next green light. Unfortunately, many people ignore these. Perhaps people don’t pay attention to them, or perhaps they don’t understand what the signs mean.

We have the same problem with traffic jams. In the Netherlands, we have so-called “matrix signs” above the road that indicate your maximum speed. If there is a traffic jam up ahead, the matrix signs try to slow people down to prevent the “shockwave” traffic jams you often encounter. Unfortunately, almost everyone ignores these signs. Many people don’t understand them, and it’s not clear that abiding by these signs will result in you waiting less.

Using vehicles AS infrastructure

The idea we came up with during the Hack the Road hackathon was to use autonomous/smart/connected vehicles to indicate where the green wave starts and ends. These vehicles are able to request the time-to-green and time-to-red of upcoming intersections from an API. They are also able to determine their distance to this intersection using GPS and an internal map. If they use this information to slow down or speed up to their ideal speed it would be great if other drivers could benefit from this knowledge! We proposed adding a simple light on the back of each vehicle to signal this to other users!
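The decision for the light on the back of the vehicle could look roughly like this. It is a sketch under stated assumptions, not our hackathon code: the time-to-green and time-to-red values are passed in as plain numbers instead of being fetched from an API, the light is assumed to be red right now, and `led_colour` is a hypothetical helper name.

```python
def led_colour(distance_m, speed_ms, time_to_green_s, time_to_red_s):
    """Colour to show to the drivers behind the connected vehicle.

    Assumes the light is red now, turns green after time_to_green_s,
    and turns red again after time_to_red_s.
    """
    if speed_ms <= 0:
        return "red"  # a stopped vehicle cannot lead anyone through
    arrival_s = distance_m / speed_ms
    if time_to_green_s <= arrival_s < time_to_red_s:
        return "green"  # staying behind this vehicle hits the green light
    return "red"


# 300 m before a light that is green between 25 and 55 seconds from now.
print(led_colour(300, 10, 25, 55))  # arrives after 30 s
print(led_colour(300, 15, 25, 55))  # arrives after 20 s, too early
```

A green strip tells followers that keeping the current pace is enough; a red strip means the vehicle will still have to adjust its speed or stop.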

This photoshopped image shows how we envision our prototype: a self-driving car is driving to a red light, but indicates that if you stay behind it you don’t have to brake for the red light.

The prototype: a wifi chip and an LED strip

During the hackathon, we spent a lot of time thinking about how to signal green wave information to other drivers. We wanted to convey a simple message in a way that would both be understood and followed. We went for an LED strip and programmed a wifi chip to accept messages from our computers (I used a build similar to the one I made for my bed).

To get time-to-green and time-to-red information we interfaced with Dynniq’s API. We looked at one intersection in the Dutch city of Helmond whose information was available to us during this weekend. As the data came in as a continuous stream, we had to write our own parsers in Python to find the relevant data. We also made a “mock-up” datastream we could use during a demo and which we use in the video below.

Putting a demonstration together

To demonstrate that building this product would be viable we had to show that our prototype was able to change color based on car location and live traffic information. We visualized the start and end of the green wave on a map, and let the color of our prototype change when we clicked on the map (indicating where our connected vehicle would drive).

Our tech demo is ready #hacktheroad #jointhegoldenwave pic.twitter.com/yx7fkyxLOK

— Roland Meertens (@rolandmeertens) September 16, 2017

Getting our product on the road

With our idea, prototype, and demonstration we won the second prize of this hackathon! This means our project will be incubated by Dynniq: the company that provided us with the data we needed to make it work. To improve our product we have to wire the prototype directly into a car, improve the design of the lights, and think of a better way than wi-fi to receive traffic information in the car.

Perhaps even more important than building the prototype is talking to stakeholders. As we only placed second we don’t get to fly to California to talk with companies that could help us. If you read this and think you could help us with a prototype, connect us to relevant people, can provide funding, or have any questions: please send us an email! We think it’s possible to get the first units in cars by 2018!

Acknowledgements

We would like to thank the organisers of the Hackathon for all the effort they put into this event! The event was organised by the province of Noord Holland: thanks for inviting us over to come up with ideas for the roads in the Netherlands! We would also like to thank the BeMyApp staff, who were really helpful during the event they set up: thank you very much Claire, Marc, and Su!

TRADR SIKS summerschool 2017

Roland Meertens — Mon, 18 Sep 2017 11:21:05 GMT

A few weeks ago I gave an introductory course to reinforcement learning with the OpenAI Gym environment. As content, I used the writeups I already put on my site several weeks ago. I asked Jasper van der Waa (TNO), who co-organized it, to write a short summary of the summer school.

For some years now it has been quite clear that robots will be part of our future. However, we still have a long way to go before we have robots like those in “I, Robot” or “Ex Machina”. The research in the TRADR project is one step towards this future. More specifically, it is one step towards a future where robots aid rescue workers right after major disasters like hurricanes and earthquakes by locating victims and identifying dangers. The TRADR SIKS Summerschool of 2017 focused on the vital issue of how we can let robots and humans work together as teammates.

TRADR; A step towards rescue robots

In the week of the summer school, several major experts addressed various issues. Ivana Kruijff-Korbayová from DFKI and Robin Murphy from Texas A&M University discussed how to design and build a robot that aids in disaster response missions with current technology. One of the main issues is not that the technology is not there yet, but how to integrate all the previous research on robotics into a group of robots that can be used by actual firefighters. See for example this playlist where the robots from the TRADR project are used after the earthquake in Amatrice, Italy.

Tools for robot design; sCE and Coactive Design

Matt Johnson from IHMC talked about the design methodology called Coactive Design that gives you the tools to design a working human-machine team. The talk aimed to provide the next generation of researchers in robotics with a tool that helps them to create robots that work in harmony with their human users. A similar talk by Mark Neerincx from the Delft University of Technology gave his audience a tool to do research in iterations without scope creep. This method, the situated Cognitive Engineering method (sCE), is in some ways similar to the Agile methodology in software engineering. Mark Neerincx also discussed the practicalities and difficulties of making sensible working agreements with robots and smart software agents (like the three Rules of Robotics by Asimov).

Recent developments in robotics

Both David Abbink and Pieter Jonker, from the Delft University of Technology, shared their rich experiences with building robots. David Abbink started a lab that focuses on haptic feedback and control in robotics and researches interesting methods to solve the driver-readiness problem in self-driving cars, whilst Pieter Jonker gave us a glimpse of the future robotic roller walkers for the elderly.

Teaching the next-gen roboticists

Finally, multiple workshops during the week taught many new state-of-the-art techniques, ranging from constructive thinking in robotics by Nanda van der Stap and others from TNO to how to implement your own ‘rules of robotics’ without conflicts and strange behavior. Among the workshops was also the one from Roland Meertens about Deep Reinforcement Learning. It taught the basic principles of high-end reinforcement learning to the generation that will most likely develop robots such as those from “Ex Machina” and “I, Robot”.

Detecting bats by recognising their sound with Tensorflow

Roland Meertens — Wed, 02 Aug 2017 07:58:39 GMT

Last week I discovered that there are bats behind my apartment. I immediately grabbed my “bat detector”: a device that converts the ultrasound signals bats use to echolocate from an inaudible frequency range to an audible one. The name “bat detector” thus is a lie: you can use it to detect bats, but it does not detect bats itself. In this tutorial I will show you how to build a real bat detector using Tensorflow.

Unfortunately Medium does not support audio files. Go to my blog for the version WITH sounds.

Problem statement

To approach this problem I hooked up the bat detector to my laptop and recorded several clips. In a separate Jupyter notebook I created a labeling program. This program creates “soundbites” of one second, which I classified as either containing the sound of a bat, or not containing the sound of a bat. I use the data and labels to create a classifier that can distinguish them.
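Cutting a recording into one-second soundbites comes down to slicing the list of samples. A minimal sketch, assuming the recording is already loaded as a sequence of samples; the helper name and the toy sampling rate of 4 samples per second are made up to keep the example readable, and the real labeling notebook may work differently.

```python
def split_into_soundbites(samples, sampling_rate):
    """Cut a recording into consecutive one-second chunks.

    A trailing chunk shorter than one second is dropped.
    """
    last_start = len(samples) - sampling_rate
    return [
        samples[start:start + sampling_rate]
        for start in range(0, last_start + 1, sampling_rate)
    ]


# Toy recording: 10 samples at 4 samples per second gives two full seconds.
recording = list(range(10))
soundbites = split_into_soundbites(recording, sampling_rate=4)
print(soundbites)
```

Each chunk can then be played back and labeled as “bat” or “no bat”.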

Libraries to recognize sound

There are some very useful libraries I imported to be able to build a sound recognition pipeline. Obvious libraries I imported are Tensorflow, Keras, and scikit-learn. A sound-specific library I like is librosa, which helps me load and analyze the data.

In [1]:

import random
import sys
import glob
import os
import time

import IPython
import matplotlib.pyplot as plt
from matplotlib.pyplot import specgram

import librosa
import librosa.display

from sklearn.preprocessing import normalize
import numpy as np
import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, MaxPooling2D, Dropout, Flatten

Using TensorFlow backend.

Loading sound data with Python

In the data labeling notebook we typed in labels, and saved soundbites to the folder we typed in. By loading from these folders I can load bat sounds and non-bat sounds. Depending on how many sound files there are, loading this data can take a long time. I uploaded all files in a zipped folder to the Google Cloud Platform.

Note that this notebook itself can also be downloaded from its Git repository. Apparently sounds in a Jupyter notebook are scaled and way louder than in wordpress/medium. You might have to turn your sound up a lot!

In [2]:

# Note: SR stands for sampling rate, the rate at which my audio files were recorded and saved. 
SR = 22050 # All audio files are saved like this

def load_sounds_in_folder(foldername):
    """ Loads all sounds in a folder"""
    sounds = []
    for filename in os.listdir(foldername):
        X, sr = librosa.load(os.path.join(foldername,filename))
        assert sr == SR
        sounds.append(X)
    return sounds

## Sounds in which you can hear a bat are in the folder called "1". Others are in a folder called "0". 
batsounds = load_sounds_in_folder('labeled/1')
noisesounds = load_sounds_in_folder('labeled/0')

print("With bat: %d without: %d total: %d " % (len(batsounds), len(noisesounds), len(batsounds)+len(noisesounds)))
print("Example of a sound with a bat:")
IPython.display.display(IPython.display.Audio(random.choice(batsounds), rate=SR,autoplay=True))
print("Example of a sound without a bat:")
IPython.display.display(IPython.display.Audio(random.choice(noisesounds), rate=SR,autoplay=True))

With bat: 96 without: 1133 total: 1229 
Example of a sound with a bat:

Example of a sound without a bat:

Visualizing sounds with Librosa

When listening to the bats with your headphones you can hear a clear noise when one flies by. The Librosa library can perform a Fourier transform to extract the frequencies the sound is composed of.
Before building any machine learning algorithm it is very important to carefully inspect the data you are dealing with. In this case I decided to:

listen to the sounds
plot the soundwave
plot the spectrogram (a visual representation of the amplitude of frequencies through time).

In [3]:

def get_short_time_fourier_transform(soundwave):
    return librosa.stft(soundwave, n_fft=256)

def short_time_fourier_transform_amplitude_to_db(stft):
    # librosa.stft returns complex values; take the magnitude before converting to dB
    return librosa.amplitude_to_db(np.abs(stft))

def soundwave_to_np_spectogram(soundwave):
    step1 = get_short_time_fourier_transform(soundwave)
    step2 = short_time_fourier_transform_amplitude_to_db(step1)
    step3 = step2/100
    return step3

def inspect_data(sound):
    plt.figure()
    plt.plot(sound)
    IPython.display.display(IPython.display.Audio(sound, rate=SR))
    a = get_short_time_fourier_transform(sound)
    Xdb = short_time_fourier_transform_amplitude_to_db(a)
    plt.figure()
    plt.imshow(Xdb)
    plt.show()
    print("Length per sample: %d, shape of spectogram: %s, max: %f min: %f" % (len(sound), str(Xdb.shape), Xdb.max(), Xdb.min()))

inspect_data(batsounds[0])
inspect_data(noisesounds[0])

Length per sample: 22050, shape of spectogram: (129, 345), max: -22.786959 min: -100.000000

Length per sample: 22050, shape of spectogram: (129, 345), max: -58.154167 min: -100.000000
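
The spectrogram shape printed above follows directly from the STFT settings. Here is a minimal sketch of the arithmetic, assuming librosa's defaults of a hop length of n_fft // 4 and centred framing (neither is set explicitly in the code above). The helper name `stft_shape` is made up for this illustration.

```python
def stft_shape(num_samples, n_fft=256, hop_length=None):
    """Expected (frequency bins, time frames) of a centred STFT."""
    if hop_length is None:
        hop_length = n_fft // 4  # assumption: the library default is used
    freq_bins = 1 + n_fft // 2  # non-negative frequencies only
    time_frames = 1 + num_samples // hop_length
    return freq_bins, time_frames

# One second of audio at a sampling rate of 22050
print(stft_shape(22050))  # (129, 345)
```

Each of the 129 rows is one frequency band and each of the 345 columns is one short slice of time.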

Data analysis

First of all, it’s important to note that the data we are dealing with is not exactly big data… With only around 100 positive samples, deep neural networks are very likely to overfit on this data. A problem we are dealing with is that it is easy to gather negative samples (just record a whole day without bats) and difficult to gather positive samples (bats are only here for about 15–20 minutes a day, and I need to manually label the data). The low number of positive samples is something we take into consideration when determining how we are going to classify the data.

Audio signal

As we can see above, the amplitude of the noise samples is low, while the bat samples have high amplitudes. However, this does not mean that everything with a sound in it is a bat. At this frequency you also pick up other noises, such as rubbing your fingers together or telephone signals.
I decided to put every negative signal onto one big “negative” pile, combining telephone signals, finger-induced noise, and other stuff.

Spectrogram

I was hoping to see the exact frequency bats produce back in our spectrogram. Unfortunately it looks like my sensor picks it up as noise over ALL frequencies. Looking at the spectrogram you can still see a clear difference between bat sound and noise. My first attempt was to use this spectrogram as input for a convolutional neural network. Unfortunately, with only a few positive samples, it was very difficult to train this network. I thus gave up on this approach.

In the end I decided to go with a “metadata approach”. I divide every second of sound into 22 parts. For each part I determine the max, min, mean, standard deviation, and max-min of the sample. I take this approach because the “bat signals” DO clearly show up as a lot of high-amplitude signals in the audio visualisation. By analyzing different parts of the audio signal, I can find out if multiple parts of the signal have certain features (such as a high standard deviation), and thus detect a bat call.

In [4]:

WINDOW_WIDTH = 10
AUDIO_WINDOW_WIDTH = 1000 # With sampling rate of 22050 we get 22 samples for our second of audio
def audio_to_metadata(audio):
    """ Takes windows of audio data, per window it takes the max value, min value, mean and stdev values"""
    features = []
    for start in range(0,len(audio)-AUDIO_WINDOW_WIDTH,AUDIO_WINDOW_WIDTH):
        subpart = audio[start:start+AUDIO_WINDOW_WIDTH]
        maxval = max(subpart)
        minval = min(subpart)
        mean = np.mean(subpart)
        stdev = np.std(subpart)
        features.extend([maxval,minval,mean,stdev,maxval-minval])
    return features

metadata = audio_to_metadata(batsounds[0])
print(metadata)
print(len(metadata))

[0.00088500977, -0.00076293945, 6.7962646e-05, 0.00010915515, 0.0016479492, 0.0002746582, 3.0517578e-05, 0.00017904663, 5.4772983e-05, 0.00024414062, 0.00057983398, -0.00057983398, -2.8137207e-05, 8.1624778e-05, 0.001159668, -9.1552734e-05, -0.0002746582, -0.00019345093, 3.922523e-05, 0.00018310547, 0.00048828125, -0.00076293945, -0.00036187744, 0.00015121402, 0.0012512207, -3.0517578e-05, -0.00057983398, -0.00027001952, 0.00015006117, 0.00054931641, 0.00045776367, -0.00036621094, 5.9234619e-05, 5.0381914e-05, 0.00082397461, 0.00015258789, 6.1035156e-05, 0.00011447143, 1.7610495e-05, 9.1552734e-05, 0.00015258789, 6.1035156e-05, 9.3963623e-05, 1.8880468e-05, 9.1552734e-05, 0.00082397461, -0.00048828125, 7.7423094e-05, 8.6975793e-05, 0.0013122559, 0.00021362305, 6.1035156e-05, 0.00014205933, 2.5201958e-05, 0.00015258789, 0.00054931641, -0.00061035156, 2.8991699e-05, 9.5112577e-05, 0.001159668, -3.0517578e-05, -0.00018310547, -0.00010638428, 2.9584806e-05, 0.00015258789, 3.0517578e-05, -9.1552734e-05, -2.7862548e-05, 2.323009e-05, 0.00012207031, 6.1035156e-05, -3.0517578e-05, 1.8341065e-05, 1.905331e-05, 9.1552734e-05, 0.00018310547, -0.00039672852, 4.9438477e-05, 4.7997077e-05, 0.00057983398, 0.00021362305, 9.1552734e-05, 0.00017184448, 2.1811828e-05, 0.00012207031, 0.00015258789, -6.1035156e-05, 5.0659179e-05, 4.6846228e-05, 0.00021362305, 0.0, -0.00015258789, -5.4656983e-05, 2.7488175e-05, 0.00015258789, -3.0517578e-05, -0.00012207031, -9.0820315e-05, 1.7085047e-05, 9.1552734e-05, 0.0, -0.00012207031, -7.2296141e-05, 1.917609e-05, 0.00012207031, 0.0, -9.1552734e-05, -4.4189452e-05, 1.8292634e-05, 9.1552734e-05]
110
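
The same 110 features can also be computed without a Python loop. This is a sketch of a vectorised variant, not the code used for the results in this post. The function name `audio_to_metadata_vectorized` and the synthetic test signal are assumptions made for the illustration.

```python
import numpy as np

AUDIO_WINDOW_WIDTH = 1000

def audio_to_metadata_vectorized(audio, window=AUDIO_WINDOW_WIDTH):
    """Per window: max, min, mean, stdev and max-min, flattened into one vector."""
    audio = np.asarray(audio, dtype=np.float32)
    # Use the same window starts as range(0, len(audio) - window, window)
    n_windows = len(range(0, len(audio) - window, window))
    windows = audio[:n_windows * window].reshape(n_windows, window)
    maxval = windows.max(axis=1)
    minval = windows.min(axis=1)
    features = np.stack(
        [maxval, minval, windows.mean(axis=1), windows.std(axis=1), maxval - minval],
        axis=1,
    )
    # Row-major flattening keeps the five values of each window next to each other
    return features.ravel()

# Synthetic one-second signal, only used to show the output size
demo_audio = np.random.default_rng(0).normal(size=22050).astype(np.float32)
print(audio_to_metadata_vectorized(demo_audio).shape)  # (110,)
```

With 22050 samples and windows of 1000 samples there are 22 complete windows, each contributing 5 values.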

Data management

As with every machine learning project it’s important to make an input-output pipeline. We defined functions to get “metadata” from our sound files: we can make audio spectrograms, or simply take multiple samples of meta-features in the audio data. The next step is to map our preprocessing function to our training and test data. I first apply a preprocessing step to each audio sample, and keep the bat and non-bat sounds in two different lists. Later I join the sounds and labels.

In this case we are dealing with few “positive” samples, and a lot of negative samples. In such a case it’s a really good idea to normalise all your data. My positive samples will probably differ from the normal distribution, and will be easy to detect. To do this I use the “normalize” function from scikit-learn’s sklearn.preprocessing module. During training I found out that my ideas of standardization and normalization are exactly the opposite of the scikit-learn definitions. In this case this probably won’t be a problem, as normalizing a bat sound probably still yields a different result than normalizing a noise sound.
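
To make the difference between the two terms concrete, here is a small NumPy sketch. It assumes scikit-learn's usual meaning of the words: “normalize” with axis=1 scales each sample (row) to unit L2 norm, while standardization gives each feature (column) zero mean and unit variance. The helper names and the toy data are made up for the illustration.

```python
import numpy as np

def l2_normalize_rows(x):
    """Scale every row (sample) to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return x / norms

def standardize_columns(x):
    """Give every column (feature) zero mean and unit variance."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    return (x - x.mean(axis=0)) / std

samples = np.array([[3.0, 4.0], [6.0, 8.0], [1.0, 0.0]])
print(l2_normalize_rows(samples))
print(standardize_columns(samples))
```

Note that the first two toy samples end up identical after row-wise scaling: the overall loudness of a sample is removed and only the relative pattern of its features remains.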

In [5]:

# Meta-feature based batsounds and their labels
preprocessed_batsounds = list()
preprocessed_noisesounds = list()

for sound in batsounds:
    expandedsound = audio_to_metadata(sound)
    preprocessed_batsounds.append(expandedsound)
for sound in noisesounds:
    expandedsound = audio_to_metadata(sound)
    preprocessed_noisesounds.append(expandedsound)

labels = [0]*len(preprocessed_noisesounds) + [1]*len(preprocessed_batsounds)
assert len(labels) == len(preprocessed_noisesounds) + len(preprocessed_batsounds)
allsounds = preprocessed_noisesounds + preprocessed_batsounds
allsounds_normalized = normalize(np.array(allsounds),axis=1)
one_hot_labels = keras.utils.to_categorical(labels)
print(allsounds_normalized.shape)
print("Total noise: %d total bat: %d total: %d" % (len(preprocessed_noisesounds), len(preprocessed_batsounds), len(allsounds)))

## Now zip the sounds and labels, shuffle them, and split into a train and validation dataset
# zip() returns a lazy iterator in Python 3, so turn it into a list before shuffling and slicing
zipped_data = list(zip(allsounds_normalized, one_hot_labels))
random.shuffle(zipped_data)
random_zipped_data = zipped_data
VALIDATION_PERCENT = 0.8 # fraction of the data used for training, the rest is used for validation
split_index = int(VALIDATION_PERCENT*len(random_zipped_data))
traindata = random_zipped_data[:split_index]
valdata = random_zipped_data[split_index:]
indata = [x[0] for x in traindata]
outdata = [x[1] for x in traindata]
valin = [x[0] for x in valdata]
valout = [x[1] for x in valdata]

(1229, 110)
Total noise: 1133 total bat: 96 total: 1229

Machine learning model

To detect the bats I decided to try a very simple neural network with three hidden layers. With too few trainable parameters the network can only make a distinction between no-sound and sound. With too many trainable parameters the network will easily overfit on the small dataset we have.

I decided to implement this network in Keras, as this library gives me the best functions to easily try different neural network architectures on this simple problem.

In [6]:

LEN_SOUND = len(preprocessed_batsounds[0])
NUM_CLASSES = 2 # Bat or no bat

model = Sequential()
model.add(Dense(128, activation='relu',input_shape=(LEN_SOUND,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(2))
model.compile(loss="mean_squared_error", optimizer='adam', metrics=['mae','accuracy'])
model.summary()
model.fit(np.array(indata), np.array(outdata), batch_size=64, epochs=10,verbose=2, shuffle=True) 
valresults = model.evaluate(np.array(valin), np.array(valout), verbose=0)
res_and_name = zip(valresults, model.metrics_names)
for result,name in res_and_name: 
    print("Validation " + name + ": " + str(result))

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               14208     
_________________________________________________________________
dense_2 (Dense)              (None, 32)                4128      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_4 (Dense)              (None, 2)                 66        
=================================================================
Total params: 19,458
Trainable params: 19,458
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
0s - loss: 0.2835 - mean_absolute_error: 0.4101 - acc: 0.9237
Epoch 2/10
0s - loss: 0.0743 - mean_absolute_error: 0.1625 - acc: 0.9237
Epoch 3/10
0s - loss: 0.0599 - mean_absolute_error: 0.1270 - acc: 0.9237
Epoch 4/10
0s - loss: 0.0554 - mean_absolute_error: 0.1116 - acc: 0.9237
Epoch 5/10
0s - loss: 0.0524 - mean_absolute_error: 0.1071 - acc: 0.9237
Epoch 6/10
0s - loss: 0.0484 - mean_absolute_error: 0.1024 - acc: 0.9237
Epoch 7/10
0s - loss: 0.0436 - mean_absolute_error: 0.1036 - acc: 0.9329
Epoch 8/10
0s - loss: 0.0375 - mean_absolute_error: 0.0983 - acc: 0.9481
Epoch 9/10
0s - loss: 0.0327 - mean_absolute_error: 0.0923 - acc: 0.9624
Epoch 10/10
0s - loss: 0.0290 - mean_absolute_error: 0.0869 - acc: 0.9644
Validation loss: 0.0440898474639
Validation mean_absolute_error: 0.101937913192
Validation acc: 0.930894308458
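
To put these accuracies in context, it helps to know what a trivial classifier would score. This is a short sketch using the class counts printed earlier (96 bat samples, 1133 noise samples). The train/validation split is random, so the baseline on the validation set itself may differ slightly.

```python
num_bat = 96
num_noise = 1133

# Accuracy of a classifier that always answers "no bat"
baseline_accuracy = num_noise / (num_bat + num_noise)
print("Always predicting 'no bat' gives an accuracy of %.4f" % baseline_accuracy)
```

This baseline is roughly 0.92, so with classes this imbalanced, accuracy alone says little. Checking how many of the bat samples are actually found (recall) would be a useful extra metric.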

Results and implementation of detection pipeline

With an accuracy of 93 percent on the validation set it looks like we are doing really well. The next step is checking if we can find any bats in a longer piece of audio we never processed before. I took a recording I made after the bats were pretty much gone, let’s see if we can find any:

In [7]:

soundarray, sr = librosa.load("batsounds/bats9.m4a")
maxseconds = int(len(soundarray)/sr)
for second in range(maxseconds-1):
    audiosample = np.array(soundarray[second*sr:(second+1)*sr])
    metadata = audio_to_metadata(audiosample)
    testinput = normalize(np.array([metadata]),axis=1)
    prediction = model.predict(testinput)

    if np.argmax(prediction) ==1:
        IPython.display.display(IPython.display.Audio(audiosample, rate=sr,autoplay=True))
        time.sleep(2)
        print("Detected a bat at " + str(second) + " out of " + str(maxseconds) + " seconds")
        print(prediction)

Detected a bat at 514 out of 669 seconds
[[ 0.45205975  0.50231218]]

Conclusion, and similar projects

In the end my sensor detected 1 bat at a time when there was probably no bat outside (but I can’t verify this) in about 11 minutes (669 seconds) of audio. I will conclude that my program works! Now we are able to integrate this program in a small pipeline to warn me whenever there is a bat outside, or we can make a recording every day and measure the bat activity day to day.

While working on this project the Nature Smart Cities project created the Bats London project. Per sensor you can see the bat activity. Also interesting is that their sensor is able to capture way more interesting sounds, such as this social call made by a bat. It is great to see others are also interested in this subject, and it’s great to compare approaches. The Bats London project built nice boxes with a computer in them that does all processing based on a spectrogram. They use convolutional neural networks based on 3-second sound files they record every 6 seconds. In the future they even want to start to make a distinction between different species of bats! They did a great job with a very interesting project!

Detecting bats by recognising their sound with Tensorflow was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

OpenAI Universe part 3: playing Space Invaders with deep reinforcement learning

Roland Meertens — Mon, 31 Jul 2017 09:49:22 GMT

OpenAI Gym part 3: playing Space Invaders with deep reinforcement learning

In part 1 we got to know the OpenAI Gym environment, and in part 2 we explored deep q-networks. We implemented a simple network that, if everything went well, was able to solve the CartPole environment. Atari games are more fun than the CartPole environment, but are also harder to solve. This session is dedicated to playing Atari with deep reinforcement learning.

A first warning before you are disappointed is that playing Atari games is more difficult than CartPole, and training times are way longer. This is the reason we toyed around with CartPole in the previous session.

In this session I will show how you can use OpenAI Gym to replicate the paper Playing Atari with Deep Reinforcement Learning. A video of a Breakout-playing robot can be found on YouTube, as well as a video of an Enduro-playing robot. Demis Hassabis, the CEO of DeepMind, can explain what happened in their experiments in a very entertaining way.

A big difference between the CartPole and Atari task is that the Atari environment gives you the raw pixels as observation. Instead of 4 variables you are now dealing with 210 × 160 × 3 = 100,800 variables as input. The network you built in part 2 is not going to play very well. This means you can either improve your network yourself, or you can replicate the DeepMind layout. This session is only dedicated to showing what the DeepMind network is able to do.

Flood Sung was able to put the network in Tensorflow and put the code on GitHub. I downloaded his network architecture, updated it to the latest Tensorflow version, changed some parameters and added it to the Git repository of this summerschool session.

This tutorial has dependencies on Tensorflow, OpenCV, OpenAI Gym, and some other things. Just as with part 1 and 2 the best thing to do is run this code using Docker. Run the following command to download my prepared docker image and navigate to http://localhost:8888 to view your Jupyter notebook.

docker run -p 8888:8888 rmeertens/tensorflowgym

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt

from ipywidgets import widgets
from IPython.display import display

from matplotlib import animation
from JSAnimation.IPython_display import display_animation
from time import gmtime, strftime
import random
import cv2
import sys
from BrainDQN_Nature import *
import numpy as np 

import gym


env = gym.make('SpaceInvaders-v0')
env.reset()
actions = env.action_space.n
brain = BrainDQN(actions)

[2017-07-11 13:46:00,813] Making new env: SpaceInvaders-v0

dimension: 3136
dimension: 3136
Successfully loaded: ./savedweights/network-dqn-7580000

Image preprocessing

As mentioned above we are dealing with 210 × 160 × 3 = 100,800 variables. The authors of Playing Atari with Deep Reinforcement Learning solve this by turning the image to grayscale, resizing it to 84 x 110, and removing the first 26 rows as they only contain the score. This gives you 84 × 84 = 7,056 variables per image.

Unfortunately, you need to have a sense of time for some Atari games. For example, what is happening in this image? Is the ball going up? Going down? Left or right? That’s why we concatenate the last four “images” of 84x84 to get an 84x84x4 image as input (which is 84 × 84 × 4 = 28,224 input variables for our neural network).
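
Stacking the last four frames can be sketched with a small helper. This is an illustration only, not necessarily how BrainDQN_Nature.py keeps its state. The class name `FrameStack` and the all-zero and all-one test frames are assumptions.

```python
from collections import deque

import numpy as np

FRAME_SHAPE = (84, 84)
STACK_SIZE = 4

class FrameStack:
    """Keeps the most recent frames and returns them as one (84, 84, 4) array."""

    def __init__(self, first_frame, stack_size=STACK_SIZE):
        # Start by repeating the first frame, as there is no history yet
        self.frames = deque([first_frame] * stack_size, maxlen=stack_size)

    def push(self, frame):
        # The deque drops the oldest frame automatically
        self.frames.append(frame)
        return np.stack(list(self.frames), axis=-1)

stack = FrameStack(np.zeros(FRAME_SHAPE, dtype=np.uint8))
state = stack.push(np.ones(FRAME_SHAPE, dtype=np.uint8))
print(state.shape, state.size)  # (84, 84, 4) 28224
```

The newest frame always sits in the last channel, so the network can compare it with the three frames before it.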

In [2]:

def preprocess(observation):
    observation = cv2.cvtColor(cv2.resize(observation, (84, 110)), cv2.COLOR_BGR2GRAY)
    observation = observation[26:110,:]
    ret, observation = cv2.threshold(observation,1,255,cv2.THRESH_BINARY)
    return np.reshape(observation,(84,84,1))


action0 = 0  # do nothing
observation0, reward0, terminal, info = env.step(action0)
print("Before processing: " + str(np.array(observation0).shape))
plt.imshow(np.array(observation0))
plt.show()
observation0 = preprocess(observation0)
print("After processing: " + str(np.array(observation0).shape))
plt.imshow(np.array(np.squeeze(observation0)))
plt.show()

brain.setInitState(observation0)
brain.currentState = np.squeeze(brain.currentState)

Before processing: (210, 160, 3)

After processing: (84, 84, 1)

Network layout

Open the file BrainDQN_Nature.py and take a look at the function createQNetwork. You will see that this network consists of:

3 convolutional layers
2 fully connected layers

The convolutional layers might be new to you. The best way to learn about them is by taking a look at the Udacity course “Deep Learning”, or if you quickly want to know what a conv layer is, watch this video.

Also note that this implementation uses a target network (discussed in part 2) that is regularly updated.
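
The “dimension: 3136” line that was printed when the brain was created can be reproduced with a bit of arithmetic. The sketch below assumes the layer settings from the DeepMind Nature paper (8x8 filters with stride 4, 4x4 with stride 2, 3x3 with stride 1, no padding, 64 filters in the last convolutional layer). Check createQNetwork to see whether the file really uses these values.

```python
def conv_output_size(size, kernel, stride):
    """Output width (or height) of a convolution without padding."""
    return (size - kernel) // stride + 1

size = 84  # the preprocessed frames are 84 x 84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:  # assumed layer settings
    size = conv_output_size(size, kernel, stride)

flattened = size * size * 64  # assumed: 64 filters in the last conv layer
print(size, flattened)  # 7 3136
```

The flattened output of the last convolutional layer is what the first fully connected layer receives as input.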

Learning

Most interesting things happen in the BrainDQN_Nature.py script. We ask the brain for an action, process the new observation, and give this back to the brain. This means we only have to use a few lines to start the learning of the network.

Note that learning can take a very long time. This script is set to run forever, so start it in the evening and see what the network learned in the morning!

In [1]:

while True:
    action = brain.getAction()
    actionmax = np.argmax(np.array(action))
    
    nextObservation,reward,terminal, info = env.step(actionmax)
    
    if terminal:
        nextObservation = env.reset()
    nextObservation = preprocess(nextObservation)
    brain.setPerception(nextObservation,action,reward,terminal)

Evaluation

After you let your network train for some hours, interrupt the python kernel and run the following script.
It is important to set the epsilon value of the brain to a low value (0.0 or 0.1), otherwise your brain might keep performing random actions…

In [8]:

def display_frames_as_gif(frames, filename_gif = None):
    """
    Displays a list of frames as a gif, with controls
    """
    plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames = len(frames), interval=50)
    if filename_gif: 
        anim.save(filename_gif, writer = 'imagemagick', fps=20)
    display(display_animation(anim, default_mode='loop'))

    
frameshistory = []
observation = env.reset()
backupepsilon = brain.epsilon

brain.epsilon = 0.2

for _ in range(150):
    action = brain.getAction()
    
    #print(action)
    actionmax = np.argmax(np.array(action))
    
    nextObservation,reward,terminal, info = env.step(actionmax)
    if terminal:
        nextObservation = env.reset()
    frameshistory.append(nextObservation)
    nextObservation = preprocess(nextObservation)
    brain.setPerception(nextObservation,action,reward,terminal)
brain.epsilon = backupepsilon
    
display_frames_as_gif(frameshistory, 'playing_space_invaders.gif')

Exercises

This session you were handed the network layout and training methods described in the paper Playing Atari with Deep Reinforcement Learning.

Team up

Humans are very good at learning to play these Atari games. Once I learned that aliens kill me with bullets and that I got points for killing each alien, I was quickly getting many points in the game. Unfortunately our programmed brain is terrible at learning how to play the game (it needs a day, and a lot of frames, before it is able to consistently avoid bullets and kill aliens). Part of the problem is that it is difficult to estimate a reward from observations of random behaviour. The agent would be able to make a better guess of the Q-value if it were fed a well-played game of Space Invaders. The exercises I thought would be interesting are:

Record yourself or a friend playing a game of Space Invaders and save the observations, actions, and rewards in a replay memory (save this memory for later use). Use this game as initial replay memory of the agent.
Record a fully trained agent playing a game of Space Invaders and save the observations, actions, and rewards in a replay memory (save this memory for later use). Use this game as initial replay memory of a new agent you are going to train. Think of ways to evaluate how well these agents are doing compared to agents initialised on “random” experiences and see which agents are better.
During the first episodes the replay memory of the agent is filled with many useless episodes of the agent only moving around, hitting nothing, getting no reward, etc. Perhaps we can increase the speed of learning by selecting episodes we deem “useful” for the agent. You can do this by designing a “usefulness heuristic” for episodes. For example: the last X frames before getting a reward are something that should be learned really well, as this apparently was a good move. You can also show few-second videos to users and ask them if these frames show “good” behaviour of the agent. If not: why not remove this nasty memory from his memory?
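
As a possible starting point for these exercises, a recorded game can be stored on disk and loaded again as a bounded replay memory. This is a sketch under assumptions: the function names, the file name, and the maximum length of 50000 are made up, and how to hand the loaded memory to the agent depends on BrainDQN_Nature.py, which is not shown here.

```python
import os
import pickle
import tempfile
from collections import deque

def save_replay_memory(memory, path):
    """Store a sequence of (observation, action, reward, terminal, next_observation) tuples."""
    with open(path, "wb") as f:
        pickle.dump(list(memory), f)

def load_replay_memory(path, max_len=50000):
    """Load stored experiences into a deque that drops the oldest entries when full."""
    with open(path, "rb") as f:
        return deque(pickle.load(f), maxlen=max_len)

# Tiny placeholder experience, only used to show the round trip
recorded_game = [("observation0", 1, 0.0, False, "observation1")]
demo_path = os.path.join(tempfile.gettempdir(), "human_replay_demo.pkl")
save_replay_memory(recorded_game, demo_path)
restored_memory = load_replay_memory(demo_path)
print(len(restored_memory))
```

A deque with a maximum length makes it easy to let newer experiences of the agent gradually push out the recorded ones.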

Transfer knowledge

Humans are very good at transferring knowledge from one domain to another. Unfortunately, our agent is not that good at this.

Try training an agent on one game, and see how long it takes for this agent to learn to play another game. If this topic interests you, take a look at how DeepMind improved their agent.

Acknowledgments

This blogpost is the first part of my TRADR summerschool workshop on using human input in reinforcement learning algorithms. More information can be found on their homepage

Introduction to OpenAI gym part 2: building a deep q-network

Roland Meertens — Mon, 17 Jul 2017 17:25:50 GMT

In part 1 we used a random search algorithm to “solve” the CartPole environment. This time we are going to take things to the next level and implement a deep q-network. The OpenAI Gym environment is one of the most fun ways to learn more about machine learning. Especially reinforcement learning and neural networks can be applied perfectly to the benchmark and Atari games collection that is included. Every environment has multiple featured solutions, and often you can find a writeup on how to achieve the same score. By looking at others’ approaches and ideas you can improve yourself quickly in a fun way.

In part 1 we introduced the Gym environment, and looked at a “random search” algorithm. Hopefully you were able to add something to this algorithm, and got some more experience with OpenAI Gym. In part two we are going to take a look at reinforcement learning algorithms, specifically the deep q-networks that are all the hype lately.

Background

Q-learning is a reinforcement learning technique that tries to predict the reward of a state-action pair. For the cartpole environment the state consists of four values, and there are two possible actions. For a certain state S we can predict the reward if we were to push left Q(S,left) or right Q(S,right).

In the Atari game environment you get a reward of 1 every time you score a point. This scoring can happen when you hit a block in Breakout, an alien in Space Invaders, or eat a pellet in Pacman. In the CartPole environment you get a reward every time the pole is standing on the cart (which is: every frame). The trick of q-learning is that it not only considers the direct reward, but also the expected future reward. After applying action a we enter state S_{t+1} and take the following into account:

The reward r we obtained by performing this action

The expected maximum reward Q(S_{t+1},a), in the CartPole environment this is max(Q(S_{t+1},left), Q(S_{t+1},right))

We combine this into a neat formula where we say that the predicted value should be: Q(S_t,a) = r + 𝜸 · max_{a'} Q(S_{t+1},a')

Where 𝜸 is the discount factor. Taking a small 𝜸 (for example 0.2) means that you don’t really care about long-term rewards, a large 𝜸 (0.95) means that you care a lot about the long-term rewards. In our case we do care a lot about long-term rewards, so we take a large 𝜸.
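
The effect of the discount factor can be made concrete with a few lines of Python. This is a minimal sketch: the target is the direct reward plus the discounted maximum Q-value of the next state, and a terminal state only gets its direct reward. The function names and the example numbers are made up for the illustration.

```python
def q_target(reward, next_q_values, gamma, terminal=False):
    """Value the network should predict for the action that was taken."""
    if terminal:
        return reward  # no future reward after the episode has ended
    return reward + gamma * max(next_q_values)

def discounted_sum(reward_per_step, gamma, steps):
    """Total discounted reward when every step gives the same reward."""
    return sum(reward_per_step * gamma ** k for k in range(steps))

print(q_target(1.0, [2.0, 3.0], gamma=0.95))
print(discounted_sum(1.0, gamma=0.2, steps=1000))   # close to 1.25
print(discounted_sum(1.0, gamma=0.95, steps=1000))  # close to 20
```

With a reward of 1 per frame, a discount factor of 0.2 values the whole future at about 1.25 frames, while 0.95 values it at about 20 frames.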

This notebook can be found in my prepared docker environment. If you did not install Docker yet, make sure you do this. To run this environment type this in your terminal:

docker run -p 8888:8888 rmeertens/tensorflowgym

Then navigate to localhost:8888 and navigate to the TRADR folder.

Let’s apply our knowledge of q-learning on the same environment we tried last time: the CartPole environment.

%matplotlib notebook
from time import gmtime, strftime
import threading
import time

import numpy as np
import matplotlib.pyplot as plt

from ipywidgets import widgets
from IPython.display import display
import tensorflow as tf
import gym
from gym import wrappers
import random

from matplotlib import animation
from JSAnimation.IPython_display import display_animation

env = gym.make('CartPole-v0')
observation = env.reset()

Value approximation

There are many ways in which you can estimate the Q-value for each (state,action) pair. Neural networks have been really popular the last couple of years, so we are going to estimate the Q-value using a neural network.

We will build our network in Tensorflow: an open-source library for machine learning. If you are not familiar with Tensorflow, the most important thing to know is that we will first build our network, then initialise it and use it. All python variables are “placeholders” in a session. You can find more information on the Tensorflow homepage.

I created a very simple network layout with four inputs (the four variables we observe) and two outputs (either push left or right). I added four fully connected layers:

From 4 to 16 variables
From 16 to 32 variables
From 32 to 8 variables
From 8 to 2 variables

Every layer is a dense layer with a ReLU nonlinearity, except for the last layer, as this one has to predict the expected Q-value.
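
To get a feeling for the shapes involved, the same layout can be written as a forward pass in plain NumPy. This is only a sketch with random weights, not a trained network. The names `forward`, `weights` and `biases` and the example state are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 16, 32, 8, 2]

# One weight matrix and one bias vector per fully connected layer
weights = [rng.normal(scale=0.35, size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(states):
    """Returns one Q-value per action for every state in the batch."""
    x = np.asarray(states, dtype=np.float64)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)  # dense layer followed by ReLU
    return x @ weights[-1] + biases[-1]  # last layer: no nonlinearity

q_values = forward([[0.0, 0.1, -0.02, 0.0]])
num_params = sum(w.size + b.size for w, b in zip(weights, biases))
print(q_values.shape, num_params)  # (1, 2) 906
```

The last layer has no ReLU because a Q-value can be any real number, including a negative one.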

# Network input
networkstate = tf.placeholder(tf.float32, [None, 4], name="input")
networkaction = tf.placeholder(tf.int32, [None], name="actioninput")
networkreward = tf.placeholder(tf.float32,[None], name="groundtruth_reward")
action_onehot = tf.one_hot(networkaction, 2, name="actiononehot")

# The variables in our network:
w1 = tf.Variable(tf.random_normal([4,16], stddev=0.35), name="W1")
w2 = tf.Variable(tf.random_normal([16,32], stddev=0.35), name="W2")
w3 = tf.Variable(tf.random_normal([32,8], stddev=0.35), name="W3")
w4 = tf.Variable(tf.random_normal([8,2], stddev=0.35), name="W4")
b1 = tf.Variable(tf.zeros([16]), name="B1")
b2 = tf.Variable(tf.zeros([32]), name="B2")
b3 = tf.Variable(tf.zeros([8]), name="B3")
b4 = tf.Variable(tf.zeros(2), name="B4")

# The network layout
layer1 = tf.nn.relu(tf.add(tf.matmul(networkstate,w1), b1), name="Result1")
layer2 = tf.nn.relu(tf.add(tf.matmul(layer1,w2), b2), name="Result2")
layer3 = tf.nn.relu(tf.add(tf.matmul(layer2,w3), b3), name="Result3")
predictedreward = tf.add(tf.matmul(layer3,w4), b4, name="predictedReward")

# Learning 
qreward = tf.reduce_sum(tf.multiply(predictedreward, action_onehot), reduction_indices = 1)
loss = tf.reduce_mean(tf.square(networkreward - qreward))
tf.summary.scalar('loss', loss)
optimizer = tf.train.RMSPropOptimizer(0.0001).minimize(loss)
merged_summary = tf.summary.merge_all()
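
The learning part only trains the output that belongs to the action that was actually taken. The NumPy sketch below shows the same one-hot masking idea with made-up numbers. It is an illustration of the technique, not a replacement for the Tensorflow graph above.

```python
import numpy as np

predicted = np.array([[1.5, 0.2],   # network output: Q(s, left), Q(s, right)
                      [0.3, 0.9],
                      [2.0, 2.5]])
actions = np.array([0, 1, 0])       # action taken in each sample
targets = np.array([1.0, 1.0, 2.5]) # the "ground truth" rewards

onehot = np.eye(2)[actions]
# Multiplying by the one-hot vector zeroes out the action that was not taken
q_taken = (predicted * onehot).sum(axis=1)
loss = np.mean((targets - q_taken) ** 2)
print(q_taken, loss)
```

The prediction for the other action does not contribute to the loss, so it receives no gradient for that sample.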

Session management and Tensorboard

Now we start the session. I added support for Tensorboard: a nice tool to visualise your learning. At the moment I only added one summary: the loss of the network.
If you did not install Docker yet, make sure you do this. To run tensorboard you have to run:

docker run -p 6006:6006 -v $(pwd):/mounted rmeertens/tensorboard

Then navigate to localhost:6006 to see your Tensorboard.

sess = tf.InteractiveSession()
summary_writer = tf.summary.FileWriter('trainsummary',sess.graph)
sess.run(tf.global_variables_initializer())

Learning Q(S,a)

An interesting paper you can use as a guideline for deep q-networks is “Playing Atari with Deep Reinforcement Learning” (https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf). This paper by DeepMind explains how they were able to teach a neural network to play Atari games.

One of the main contributions of this paper is their use of an “experience replay mechanism”. If you were to train your neural network on the images in the order you normally see them, the network quickly forgets what it saw before. To fix this we save what we saw in a memory with the following variables:

(S, action, reward, is terminal, S_{t+1})

Now every frame we sample a random minibatch of our memory and train our network on that. We also only keep the newer experiences to keep our memory fresh with good actions. The full algorithm in their paper looks like this:

In [ ]:

replay_memory = [] # (state, action, reward, terminalstate, state_t+1)
epsilon = 1.0
BATCH_SIZE = 32
GAMMA = 0.9
MAX_LEN_REPLAY_MEMORY = 30000
FRAMES_TO_PLAY = 300001
MIN_FRAMES_FOR_LEARNING = 1000
summary = None

for i_epoch in range(FRAMES_TO_PLAY):
    
    ### Select an action and perform this
    ### EXERCISE: this is where your network should play and try to come as far as possible!
    ### You have to implement epsilon-annealing yourself
    action = env.action_space.sample() 
    newobservation, reward, terminal, info = env.step(action)

    ### I prefer that my agent gets 0 reward if it dies
    if terminal: 
        reward = 0
        
    ### Add the observation to our replay memory
    replay_memory.append((observation, action, reward, terminal, newobservation))
    
    ### Reset the environment if the agent died
    if terminal: 
        newobservation = env.reset()
    observation = newobservation
    
    ### Learn once we have enough frames to start learning
    if len(replay_memory) > MIN_FRAMES_FOR_LEARNING: 
        experiences = random.sample(replay_memory, BATCH_SIZE)
        totrain = [] # (state, action, delayed_reward)
        
        ### Calculate the predicted reward
        nextstates = [var[4] for var in experiences]
        pred_reward = sess.run(predictedreward, feed_dict={networkstate:nextstates})
        
        ### Set the "ground truth": the value our network has to predict:
        for index in range(BATCH_SIZE):
            state, action, reward, terminalstate, newstate = experiences[index]
            predicted_reward = max(pred_reward[index])
            
            if terminalstate:
                delayedreward = reward
            else:
                delayedreward = reward + GAMMA*predicted_reward
            totrain.append((state, action, delayedreward))
            
        ### Feed the train batch to the algorithm 
        states = [var[0] for var in totrain]
        actions = [var[1] for var in totrain]
        rewards = [var[2] for var in totrain]
        _, l, summary = sess.run([optimizer, loss, merged_summary], feed_dict={networkstate:states, networkaction: actions, networkreward: rewards})


        ### If our memory is too big: remove the first element
        if len(replay_memory) > MAX_LEN_REPLAY_MEMORY:
            replay_memory = replay_memory[1:]

        ### Show the progress 
        if i_epoch%100==1:
            summary_writer.add_summary(summary, i_epoch)
        if i_epoch%1000==1:
            print("Epoch %d, loss: %f" % (i_epoch,l))
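
For the epsilon-annealing exercise in the loop above, one possible approach is to lower epsilon linearly and pick a random action with probability epsilon. This is a sketch, not the only way to do it. The start value, end value, and number of annealing frames are assumptions you will want to tune.

```python
import random

EPSILON_START = 1.0        # assumed: start with completely random actions
EPSILON_END = 0.1          # assumed: keep a bit of exploration at the end
ANNEALING_FRAMES = 100000  # assumed: number of frames used to lower epsilon

def annealed_epsilon(frame):
    """Linearly lowers epsilon from EPSILON_START to EPSILON_END."""
    fraction = min(frame / ANNEALING_FRAMES, 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)

def epsilon_greedy(q_values, epsilon):
    """Random action with probability epsilon, otherwise the best predicted action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

print(annealed_epsilon(0), annealed_epsilon(50000), annealed_epsilon(300000))
print(epsilon_greedy([0.1, 0.7], epsilon=0.0))  # 1
```

In the training loop the q_values would come from running predictedreward on the current observation, and the frame number would be i_epoch.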

Testing the algorithm

Now we have a trained network that gives us the expected Q(s,a) for a certain state. We can use this to balance the stick (and see how long it lasts) and see what the network predicts at each frame:

In [ ]:

def display_frames_as_gif(frames, filename_gif = None):
    """
    Displays a list of frames as a gif, with controls
    """
    plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames = len(frames), interval=50)
    if filename_gif: 
        anim.save(filename_gif, writer = 'imagemagick', fps=20)
    display(display_animation(anim, default_mode='loop'))

### Play till we are dead
observation = env.reset()
term = False
predicted_q = []
frames = []
while not term:
    rgb_observation = env.render(mode = 'rgb_array')
    frames.append(rgb_observation)
    pred_q = sess.run(predictedreward, feed_dict={networkstate:[observation]})
    predicted_q.append(pred_q)
    action = np.argmax(pred_q)
    observation, _, term, _ = env.step(action)
    
### Plot the replay!
display_frames_as_gif(frames,filename_gif='dqn_run.gif')

Result

During this run on my pc the robot learns to compensate… but then overcompensates and dies. We can plot the Q-value the robot expected for each action (left or right) at every frame of this run.

plt.plot([var[0] for var in predicted_q])
plt.legend(['left', 'right'])
plt.xlabel("frame")
plt.ylabel('predicted Q(s,a)')

Handling difficult situations — team up with your robot

You can see in the graph above that our q-function, apart from the final mistake it made, has a good idea of how well it is doing. At moments when the pole starts to tip over, the maximum expected reward drops. This is a good moment to team up with your robot and guide him when he is in trouble.

Collaborating is easy: if your robot does not know what to do, we can ask the user to provide input. The initial state the robot is in gives us a lot of information: Q(s,a) tells us how much reward the robot expects for the next frames of its run. If during execution of the robot's strategy the maximum expected Q drops a bit below this number, we can interpret this as the robot being in a dire situation. We then ask the user to say if the cart should move left or right.

Note that in the graph above the agent died, even though it expected a lot of reward. This method is not foolproof, but does help the agent to survive longer.
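
The decision rule described above fits in two small functions. This is a minimal sketch: the names make_threshold and needs_human_help are hypothetical, and the margin of 0.2 is simply the value used in the notebook cell that follows.

```python
def make_threshold(initial_q_values, margin=0.2):
    """Threshold: the best Q-value predicted for the initial state, minus a margin."""
    return max(initial_q_values) - margin


def needs_human_help(q_values, threshold):
    """True when the best Q-value for the current state is not above the threshold."""
    return max(q_values) <= threshold
```

A larger margin means the agent asks for help less often; a smaller one hands control to the human at the first sign of doubt.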

%matplotlib inline
plt.ion()
observation = env.reset()

### We predict the reward for the initial state, if we are slightly below this ideal reward, let the human take over. 
TRESHOLD = max(max(sess.run(predictedreward, feed_dict={networkstate:[observation]})))-0.2
TIME_DELAY = 0.5 # Seconds between frames 
terminated = False
while not terminated:
    ### Show the current status
    now = env.render(mode = 'rgb_array')
    plt.imshow(now)
    plt.show()

    ### See if our agent thinks it is safe to move on its own
    pred_reward = sess.run(predictedreward, feed_dict={networkstate:[observation]})
    maxexpected = max(max(pred_reward))
    if maxexpected > TRESHOLD: 
        action = np.argmax(pred_reward)
        print("Max expected: " + str(maxexpected))
        time.sleep(TIME_DELAY)
    else:
        ### Not safe: let the user select an action!
        action = -1
        while action not in (0, 1):
            try:
                action = int(raw_input("Max expected: " + str(maxexpected) + " left (0) or right(1): "))
                print("Performing: " + str(action))
            except ValueError:
                ### Not a number: ask again
                action = -1
    
    ### Perform the action
    observation, _, terminated, _ = env.step(action)

print("Unfortunately, the agent died...")

Exercises

Now that you and your neural network can balance a stick there are many things you can do to improve. As everyone's skills are different I wrote down some ideas you can try:

Machine learning starter:

Improve the neural network. You can toy around with layers (size, type), tune the hyperparameters, or many more.
Toy around with the value of gamma, and visualise for several values what kind of behaviour the agent exhibits. Is the agent more careful with a higher gamma value?

Tensorflow starter:

If you don’t have a lot of experience you can either try to improve the neural network, or you can experiment with the Tensorboard tool. Try to add plots of the average reward during training. If you implemented epsilon-greedy exploration this number should go up during training.

Reinforcement learning starter:

Because our agent only performs random actions, it dies pretty often during training. This means that it has a good idea what to do in its start configurations, but might have a problem when it has survived for a longer time. Epsilon-greedy exploration prevents this. With this method you roll a die: with probability epsilon you take a random action, otherwise you take the action the agent thinks is best. You can either set epsilon to a specific value (0.25? 0.1?) or gradually lower it, so the agent explores a lot early on and relies more on what it has learned later.
Team up with your agent! We already help our agent when he thinks he is in a difficult situation, we could also let it ask for help during training. By letting the agent ask for help with probability epsilon you explore the state space in a way that makes more sense than random exploration, and this will give you a better agent.
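
As a starting point for the epsilon-greedy exercise, here is a minimal sketch in plain Python. The function names and the decay schedule (linear, from 1.0 to 0.1 over 10000 steps) are assumptions for illustration, not values from the notebook.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the highest predicted Q-value
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step, start=1.0, end=0.1, decay_steps=10000):
    """Linearly lower epsilon from start to end over decay_steps steps."""
    fraction = min(float(step) / decay_steps, 1.0)
    return start + fraction * (end - start)
```

In the training loop you would pass the network's predicted Q-values for the current state as q_values. For the "team up" variant, the random branch is the place to ask the user for an action instead.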

Reinforcement learning intermediate:

Right now we only visualise the loss, which is no indication for how good the network is. According to the paper Playing Atari with Deep Reinforcement Learning the average expected Q should go up during learning (in combination with epsilon-greedy exploration).
Arthur Juliani suggests using a target network. During training your network is very “unstable”: it “swings” in all directions, so it can take a long time to converge. You can add a second neural network (with exactly the same layout as the first one) that calculates the predicted reward. During training, every X frames, you set the weights of your target network equal to the weights of your other network.
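
The bookkeeping for a target network is small. The sketch below uses dictionaries of NumPy arrays as a stand-in for the weights of the two networks; in the TensorFlow notebook you would copy the variables with assignment operations instead. The names sync_target and maybe_sync, and the value of SYNC_EVERY (the "X" from the exercise), are hypothetical.

```python
import numpy as np

SYNC_EVERY = 1000  # hypothetical value for "X"; tune it for your own run


def sync_target(online_weights, target_weights):
    """Overwrite the target network's weights with copies of the online weights."""
    for name, value in online_weights.items():
        target_weights[name] = np.copy(value)


def maybe_sync(step, online_weights, target_weights, sync_every=SYNC_EVERY):
    """Sync every sync_every steps. Returns True if a sync happened."""
    if step % sync_every == 0:
        sync_target(online_weights, target_weights)
        return True
    return False
```

Between two syncs the target network stays fixed, so the predicted reward used for the "ground truth" no longer moves with every training step.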

Conclusion

In part two we implemented a deep q-network in Tensorflow, and used it to control a cartpole. We saw that the network can “know” when it has problems, and then teamed up with our agent to help him out. Hopefully you enjoyed working with neural networks, the OpenAI gym, and working together with your agent.

Initially I wanted to dive into the Atari game environments and skip the CartPole environment for the deep q-networks. Unfortunately, training takes too long (24 hours) before the agent is capable of exercising really cool moves. As I still think it is a lot of fun to learn how to play Atari games I made a third part with some exercises you can take a look at.

Acknowledgments

This blogpost is the first part of my TRADR summerschool workshop on using human input in reinforcement learning algorithms. More information can be found on their homepage.

Getting started with OpenAI gym

Roland Meertens — Tue, 11 Jul 2017 20:52:45 GMT

The OpenAI gym environment is one of the most fun ways to learn more about machine learning. Reinforcement learning and neural networks in particular can be applied perfectly to the benchmark and Atari games collection that is included. Every environment has multiple featured solutions, and often you can find a writeup on how to achieve the same score. By looking at others' approaches and ideas you can improve yourself quickly in a fun way.

I noticed that getting started with Gym can be a bit difficult. Although there are many tutorials for algorithms online, the first step is understanding the programming environment in which you are working. To ease new people into this environment I decided to make a small tutorial with a docker container and a jupyter notebook.

What you need

Before you get started, install Docker. Docker is a tool that lets you run containers: lightweight, isolated environments that behave much like small virtual machines on your computer. I created an “image” that contains several things you want to have: tensorflow, the gym environment, numpy, opencv, and some other useful tools.

After you have installed Docker, run the following command to download and start my prepared docker image:

docker run -p 8888:8888 rmeertens/tensorflowgym

In your browser, navigate to: localhost:8888 and open the OpenAI Universe notebook in the TRADR folder.

Play a game yourself

Let’s start by playing the cartpole game ourselves. You control a cart that has a pole on it. The goal of the “game” is to keep the pole upright as long as possible. There are two actions you can perform in this game: give a force to the left, or give a force to the right. To play this game manually, execute the first part of the code.

By clicking left and right you apply a force, and you see the new state. Note that I programmed the game to automatically reset when you “lose” the game.

%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

from ipywidgets import widgets
from IPython.display import display

import gym

from matplotlib import animation
from JSAnimation.IPython_display import display_animation



def leftclicked(something):
    """ Apply a force to the left of the cart"""
    onclick(0)

def rightclicked(something):
    """ Apply a force to the right of the cart"""
    onclick(1)
    
def display_buttons():
    """ Display the buttons you can use to apply a force to the cart """
    left = widgets.Button(description="<")
    right = widgets.Button(description=">")
    display(left, right)
    
    left.on_click(leftclicked)
    right.on_click(rightclicked)

# Create the environment and display the initial state
env = gym.make('CartPole-v0')
observation = env.reset()
firstframe = env.render(mode = 'rgb_array')
fig,ax = plt.subplots()
im = ax.imshow(firstframe) 

# Show the buttons to control the cart
display_buttons()


# Function that defines what happens when you click one of the buttons
frames = []
def onclick(action):
    global frames
    observation, reward, done, info = env.step(action)
    frame = env.render(mode = 'rgb_array')
    im.set_data(frame)
    frames.append(frame)
    if done:
        env.reset()

Replay

Now that you have toyed around you probably want to see a replay. On every button click we saved the rendered frame of the game, so you can display the whole run in your browser:

def display_frames_as_gif(frames, filename_gif = None):
    """
    Displays a list of frames as a gif, with controls
    """
    plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
    patch = plt.imshow(frames[0])
    plt.axis('off')

    def animate(i):
        patch.set_data(frames[i])

    anim = animation.FuncAnimation(plt.gcf(), animate, frames = len(frames), interval=50)
    if filename_gif: 
        anim.save(filename_gif, writer = 'imagemagick', fps=20)
    display(display_animation(anim, default_mode='loop'))

display_frames_as_gif(frames, filename_gif="manualplay.gif")

Representation

The cartpole environment is described on the OpenAI website. The values in the observation parameter show position (x), velocity (x_dot), angle (theta), and angular velocity (theta_dot). According to that description, the game is “over” if the pole has an angle of more than 15 degrees, or the cart moves more than 2.4 units from the center (the gym source code for this environment uses a stricter 12-degree limit). The environment can then be reset by calling env.reset().
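
To make the representation concrete, the sketch below unpacks an observation and applies the two "game over" conditions by hand. The function is hypothetical (the environment already reports this through the done flag), it assumes theta is given in radians, and the limits are parameters that default to the values quoted above.

```python
import math


def is_episode_over(observation, max_angle_degrees=15.0, max_distance=2.4):
    """Check the two end conditions on a single CartPole observation."""
    x, x_dot, theta, theta_dot = observation
    too_far = abs(x) > max_distance
    # Assumption: theta is in radians, so convert the limit before comparing
    too_tilted = abs(theta) > math.radians(max_angle_degrees)
    return too_far or too_tilted
```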

Start learning

This blogpost would be incomplete without a simple “learning” mechanism. Kevin Frans made a great blogpost about simple algorithms you can apply to this problem: http://kvfrans.com/simple-algoritms-for-solving-cartpole/.

The simplest one to implement is his random search algorithm. The cart takes a weighted sum of the four observation values using a set of parameters, and applies the force to the left if the result is negative and to the right otherwise. Now the question is: what are the best parameters? Random search picks them at random, sees how long the cart lasts with those parameters, and remembers the best parameters it found.

def run_episode(env, parameters):  
    """Runs the env for a certain amount of steps with the given parameters. Returns the reward obtained"""
    observation = env.reset()
    totalreward = 0
    for _ in range(200):
        action = 0 if np.matmul(parameters,observation) < 0 else 1
        observation, reward, done, info = env.step(action)
        totalreward += reward
        if done:
            break
    return totalreward

# Random search: try random parameters between -1 and 1, see how long the game lasts with those parameters
bestparams = None  
bestreward = 0  
for _ in range(10000):
    parameters = np.random.rand(4) * 2 - 1
    reward = run_episode(env,parameters)
    if reward > bestreward:
        bestreward = reward
        bestparams = parameters
        # considered solved if the agent lasts 200 timesteps
        if reward == 200:
            break
            
def show_episode(env, parameters):  
    """ Records the frames of the environment obtained using the given parameters... Returns RGB frames"""
    observation = env.reset()
    firstframe = env.render(mode = 'rgb_array')
    frames = [firstframe]
    
    for _ in range(200):
        action = 0 if np.matmul(parameters,observation) < 0 else 1
        observation, reward, done, info = env.step(action)
        frame = env.render(mode = 'rgb_array')
        frames.append(frame)
        if done:
            break
    return frames

frames = show_episode(env, bestparams)
display_frames_as_gif(frames, filename_gif="bestresultrandom.gif")

Exercises to learn more about OpenAI gym

The next step is to play and learn yourself. Here are some suggestions:

Continue with the tutorial Kevin Frans made: http://kvfrans.com/simple-algoritms-for-solving-cartpole/
Upload and share your results. Compare how well the random algorithm, or the algorithm you implemented yourself, works compared to others. How to do this is explained at https://gym.openai.com/docs#recording-and-uploading-results under the heading “Recording and uploading results”.
Take a look at the other environments: https://gym.openai.com/envs . If you can solve the cartpole environment you can surely also solve the Pendulum problem (note that you do have to adjust your algorithm, as this one only has 3 variables in its observation).
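
For the Pendulum suggestion, one possible adjustment to the random search policy is sketched below. It rests on assumptions you should check against the environment's documentation: that the observation has 3 values, and that the action is a single continuous torque limited to the range -2 to 2 rather than a left/right choice. The function names are hypothetical.

```python
import numpy as np


def random_parameters(n_observations):
    """Random parameters between -1 and 1, one per observation value."""
    return np.random.rand(n_observations) * 2 - 1


def linear_torque(parameters, observation, max_torque=2.0):
    """Continuous action: the weighted sum of the observation, clipped to the torque range."""
    torque = np.dot(parameters, observation)
    return np.array([np.clip(torque, -max_torque, max_torque)])
```

The rest of the random search loop can stay the same: draw random_parameters(3), run an episode with linear_torque as the policy, and keep the parameters with the best total reward.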

Conclusion

Congratulations! You made your first autonomous pole-balancer in the OpenAI gym environment. Now that this works it is time to either improve your algorithm, or start playing around with different environments. This Jupyter notebook skips a lot of basic knowledge about what you are actually doing; there is a great writeup about that on the OpenAI site.

Next step

Unless you decided to make your own algorithm as an exercise, you will not have done a lot of machine learning in this tutorial (I don’t consider finding random parameters “learning”). Please take a look at:

Acknowledgments

This blogpost is the first part of my TRADR summerschool workshop on using human input in reinforcement learning algorithms. More information can be found on their homepage.

Getting started with OpenAI gym was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.