The number of frames to use is a hyperparameter I tuned through experimentation. Ultimately I’d prefer to use more frames, but there’s a memory tradeoff: feeding, say, 100 frames through the network means a much longer sequence, and everything the network computes for those frames has to fit in GPU memory during training. It would be interesting to see how accuracy is affected by using more or fewer frames.
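For context, this hyperparameter typically enters the code as the target length when downsampling each clip to a fixed-size sequence. Here’s a minimal sketch of that idea; the `sample_frames` helper and the default of 40 are illustrative, not taken from this project:

```python
import numpy as np

def sample_frames(frames, num_frames=40):
    """Downsample a clip to a fixed number of frames by even striding.

    num_frames is the hyperparameter discussed above; 40 is just an
    example value.
    """
    if len(frames) < num_frames:
        raise ValueError("clip has fewer frames than requested")
    # Pick evenly spaced indices so the sample spans the whole clip.
    indices = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return [frames[i] for i in indices]
```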
After you train the model, you should have model weights saved in the checkpoints folder. You can use the utility script validate_rnn.py with your newly trained model to see how it does on the test set.
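If you’d rather script the evaluation yourself, the core of what a script like validate_rnn.py does looks roughly like this, assuming a Keras model and a test set prepared the same way as the training data (the checkpoint filename and variable names are placeholders):

```python
from keras.models import load_model

# Placeholder filename; substitute the actual file from your checkpoints folder.
model = load_model('checkpoints/lstm-final.hdf5')

# X_test: (num_samples, num_frames, feature_dim); y_test: one-hot labels.
# Both are assumed to be built with the same pipeline used for training.
loss, accuracy = model.evaluate(X_test, y_test, batch_size=32)
print('Test loss: %.4f, test accuracy: %.4f' % (loss, accuracy))
```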
Soon I’ll have a demo script you can use to test out any video and see how it’s classified.
Thanks for the interesting post!
I’m afraid your validation accuracy is artificially high due to the way you’re splitting train/test. Since SKLearn’s
train_test_split() function randomly assigns samples to each bucket, it’s highly likely that you have samples from the same recording used…
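A group-aware split avoids this leakage by keeping every sample from a given recording on the same side of the split. Here’s a minimal sketch using scikit-learn’s GroupShuffleSplit; X, y, and groups are assumed NumPy arrays, with groups holding one recording ID per sample:

```python
from sklearn.model_selection import GroupShuffleSplit

# groups maps each sample to its source recording, so all clips cut from
# the same recording land entirely in train or entirely in test.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
```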
I think a network architecture of this type should ultimately perform best, or nearly so. However, getting it to train properly likely takes a lot more fine-tuning of parameters, and perhaps multiple GPUs to handle the memory required. Definitely worth a lot more exploration to validate.