On the edge — deploying deep learning applications on mobile
Techniques for striking the efficiency-accuracy trade-off for deep neural networks on constrained devices
So many AI advancements make headlines: “AI is beating humans at Go!”; “Deep learning for weather forecasting”; “A talking Mona Lisa painting”… And yet I do not feel too excited. For all their appeal, these results are achieved with models that are sound proofs of concept but still too far from real-world applications. And the reason for that is simple: their size.
Bigger models trained on bigger datasets get better results. But they are not sustainable: they consume too much memory and power, and their inference times fall far short of the real-time performance many applications require.
Real-life problems require smaller models that can run on constrained devices. And with growing security and privacy concerns, there is an ever stronger case for models that fit on the device itself, eliminating any data transfer to servers.
Below, I go over techniques that make models feasible for constrained devices such as mobile phones. To make that possible, we reduce the model’s spatial complexity and inference time, and organize the data flow such that…