My M2M: August - Week 3

This month’s challenge is to build a project for my local science fair. The project must use a neural network or genetic algorithm and be at the standard necessary to win the competition.

James O'Keefe
Aug 23, 2017 · 3 min read

So I’ve been working hard this week, and I feel like I’ve made real progress towards this month’s goal. Due to my inexperience with Deep Learning, my brain’s been fried at various points during the week. It’s been a steep learning curve, grappling with vexing concepts like why, theoretically, an LSTM RNN should be superior to other network architectures for action recognition. I knew a little from Andrew Ng’s machine learning course, but theory is no substitute for practice.

Anyway, the week commenced with the long, mindless chore of combing through my CSV file, which lists every file and its class, and deleting all classes with fewer than 5 examples. Here, a class is the correct answer for an example (file), i.e. the English translation of the file’s sign language. The goal of my program is to look at a file and guess its class.
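In hindsight, this filtering step could also be scripted. Here’s a minimal sketch, assuming the CSV boils down to (filename, class) pairs (the actual column layout isn’t shown in this post):

```python
import csv
from collections import Counter

def load_rows(csv_path):
    # Assumed layout: each row is (filename, class).
    with open(csv_path, newline="") as fh:
        return [(row[0], row[1]) for row in csv.reader(fh)]

def filter_rare_classes(rows, min_examples=5):
    # Count how many files each class has, then keep only the rows
    # whose class meets the minimum.
    counts = Counter(label for _, label in rows)
    return [(f, label) for f, label in rows if counts[label] >= min_examples]
```

Counting first and filtering second means the whole pass is linear in the number of rows, rather than re-scanning the file for every class.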

My previous plan was to download every example, but downloading over 12,000 videos takes endless hours, plenty of internet bandwidth and many gigabytes of disk space. Besides, many of the classes had only 1 or 2 files for training. I figured it would be useless to train the model on only 1 or 2 files per class: there wouldn’t be enough left over for a test set, and accuracy would be low due to the lack of data.

As such, any classes with fewer than 5 files were deleted. This still left close to 5,000 files for training, at an average of 7 files per class. I ran my script to automatically download every file on the list, then sat down to read my book, as my computer is unusable while the script is running. Four hours later, I returned to find the script only 25% finished. Growing impatient, and worried about my ISP throttling me, I terminated it and decided to test my pipeline on a smaller, more nimble dataset to get faster feedback. For the final version I still plan to download those extra 3,700 files.
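A download script along these lines might look like the sketch below. The (url, filename) row layout and the videos directory name are my assumptions; planning the destinations separately also makes it easy to resume an interrupted run without re-downloading:

```python
import os
import urllib.request

def plan_downloads(rows, dest_dir="videos"):
    # Map (url, filename) pairs to destination paths, skipping any
    # file that already exists from an earlier, interrupted run.
    return [
        (url, os.path.join(dest_dir, name))
        for url, name in rows
        if not os.path.exists(os.path.join(dest_dir, name))
    ]

def download_all(rows, dest_dir="videos"):
    os.makedirs(dest_dir, exist_ok=True)
    for url, dest in plan_downloads(rows, dest_dir):
        urllib.request.urlretrieve(url, dest)  # blocks until each file finishes
```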

After this, I ran 2 bash scripts: one to create a folder for every class, and another to place each file into the correct folder based on its class. Then came the brief yet repetitive chore of manually moving ~25% of the files from each class into the test set. My reasoning for doing this drudgery was that there weren’t enough classes to justify the time spent writing a script to automate the process.
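For what it’s worth, the per-class split could be automated with a short script like this one; the train_data/test_data folder names match the ones used later, but the (filename, class) rows and the videos source directory are assumptions:

```python
import os
import random
import shutil

def split_into_folders(rows, root=".", test_frac=0.25, seed=0):
    # Group filenames by class, then hold out ~test_frac of each class
    # for the test set. Returns {filename: destination directory}.
    by_class = {}
    for filename, label in rows:
        by_class.setdefault(label, []).append(filename)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    assignments = {}
    for label, files in by_class.items():
        rng.shuffle(files)
        n_test = max(1, int(len(files) * test_frac))
        for i, f in enumerate(files):
            split = "test_data" if i < n_test else "train_data"
            assignments[f] = os.path.join(root, split, label)
    return assignments

def move_files(assignments, src_dir="videos"):
    for filename, dest_dir in assignments.items():
        os.makedirs(dest_dir, exist_ok=True)
        shutil.move(os.path.join(src_dir, filename), dest_dir)
```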

After this, I wrote a script to loop through every class in the test_data and train_data folders. For every .mov file, I used ffmpeg to extract the video into a sequence of .jpeg frames at a rate of 10 frames per second. Another script used ImageMagick to crop each image to 300 × 300 and then convert it from RGB to greyscale.
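Both preprocessing steps can be driven from one Python script. The exact ffmpeg and ImageMagick flags below are my reconstruction, not the original commands; note that mogrify overwrites frames in place, whereas convert expects a separate output filename:

```python
import glob
import os
import subprocess

def ffmpeg_cmd(video_path, out_dir, fps=10):
    # Dump the .mov to numbered .jpeg frames at `fps` frames per second.
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        os.path.join(out_dir, "frame_%04d.jpeg"),
    ]

def mogrify_cmd(frame_paths):
    # Centre-crop each frame to 300x300 and convert it to greyscale,
    # editing the files in place.
    return [
        "mogrify",
        "-gravity", "center", "-crop", "300x300+0+0",
        "-colorspace", "Gray",
        *frame_paths,
    ]

def preprocess(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(ffmpeg_cmd(video_path, out_dir), check=True)
    frames = glob.glob(os.path.join(out_dir, "*.jpeg"))
    subprocess.run(mogrify_cmd(frames), check=True)
```

The glob is expanded in Python rather than left as a literal `*.jpeg`, since `subprocess.run` without a shell doesn’t do wildcard expansion.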

Unfortunately, I made a disheartening rookie mistake, forcing me to repeat the extracting, cropping and de-colourising. While this would normally be no big deal, my 2011 MBP is exasperatingly slow. One incorrect word (I wrote convert instead of mogrify, if you’re interested) cost me a seemingly ginormous amount of time.

Yet finally, my precious data is ready for training. I wrote a quick Convolutional Neural Network (CNN) in Keras to use as a benchmark. As I explained in my last post, a CNN only recognises spatial features (the pixels within a single frame). Other, more complex network architectures, such as the aforementioned LSTM network, can also take advantage of temporal features (how the current frame relates to past frames).
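A minimal Keras baseline in that spirit might look like the following; the layer sizes are illustrative guesses, not the actual network:

```python
from tensorflow.keras import layers, models

def build_cnn(n_classes, frame_shape=(300, 300, 1)):
    # A small per-frame CNN: it only sees the spatial features of a
    # single greyscale frame, with no notion of how frames relate
    # over time.
    model = models.Sequential([
        layers.Input(shape=frame_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.GlobalAveragePooling2D(),  # keeps the parameter count sane
        layers.Dense(128, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because it classifies one frame at a time, a video-level prediction would have to come from something crude like averaging the per-frame softmax outputs, which is exactly where an LSTM on top of frame features should do better.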

Given these constraints, I don’t expect my CNN to perform well. I plan to run it overnight and update you with the results in the morning.

Thanks for reading!
