One of the things about machine learning that excites me is how a neural network creates a compact representation of the world that could be portable and used in any mobile device not only on centralized server farms.
You train a model using gigabytes of data and the end result may only take 10 MB and perform a specific task with high performance. The process of learning is slow but inference is fast and pretty lightweight.
Why is this? A neural network has a roughly fixed size of parameters based on the number of neurons and connections between them. When you train it, you squeeze data through these connections of neurons and they get gradually configured to solve an optimization problem that reduces the error in a cost function that compares the predicted output and the true value. This process is called learning. Once trained, the neural network can decide and make predictions based on novel inputs. Even if you feed more data, the model will not increase in size unless you change the architecture. The other thing is that the original data is forever lost, the process of learning is lossy, the model just keep an essence of the instances it saw in the form of learned features.
This opens the opportunity to pack a small chunk of "intelligence" which is smart enough to perform a task, in a decentralized way, without any round trip to servers to take a decision.
The goal of the experiment is not about fine tuning the model and get the best accuracy, but about learning on the server and doing all the inference client side in the browser. For this purpose I used a dataset from my friends from Properati with real-state images. The idea is to tell apart images which are renders from the real ones.
I used the following tech to implement it:
- A Python script to scale all images to 200x200, standardizing the colorspace and format.
- A convolutional neural network for extracting the features that define if an image is a render or a real house, I didn't want to do any type of feature engineering or image processing apart from scaling.
- For this purpose I decided to use Keras on top of Tensorflow. Keras is a high-level deep learning library, well documented and written in ❤️ Python. It abstract many of the complexities if you would write code your directly in Tensorflow. Many thanks to François Chollet for his contribution :)
- Train the model on my own GPU at home (Nvidia GTX 1080).
- The site is built using Next.js framework.
The model was trained using around 60.000 real-state images around 50% for each class, a total of 600MB of data.
The resulting model takes 2MB, and perform inference pretty fast on my Macbook Pro 13.
You can test it on your own computer https://rendernet.abarmat.com
I would be interesting to try different types of tasks apart from a binary classification, like style transfer, GAN, real-time pattern recognition, etc.
What do I like about all this?
I think there are several advantages about neural networks apart from the obvious ability to identify patterns and predict outcomes.
- Privacy: If you don't need to send your data to a server, your data is protected as all decision making is done right on your device.
- Decentralization: All these models working as standalone agents mean that they will be resilient to server outages, network delays, and censorship.
- Online learning: Learning is a slow process, but with future improvements in dedicated processors we could have millions of devices learning different strategies which will add diversity to avoid having a single machine-learning model failing massively on certain decisions. Much like we humans learn from our different experiences.
Code is available on GitHub -> https://github.com/abarmat/rendernet
If you try it send me any feedback!