Biologically Inspired Software Architecture for Deep Learning
With the emergence of Deep Learning as the dominant paradigm for Artificial Intelligence based systems, one open question that seems to be neglected is “What guidelines do we have in architecting software that uses Deep Learning?” If all the innovative companies like Google are on an exponential adoption curve to incorporate Deep Learning in everything they do, then what is the software architecture that holds this all together?
The folks at Google wrote a paper (a long time ago, meaning 2014), “Machine Learning: The High-Interest Credit Card of Technical Debt”, that enumerates many of the difficulties that we need to consider when building software that includes machine learning or deep learning sub-components. Contrary to the popular perception that Deep Learning systems can be “self-driving”, there is a massive ongoing maintenance cost when machine learning is used. In the Google paper, the authors enumerate many risk factors, design patterns, and anti-patterns that need to be taken into consideration in an architecture. The risk factors include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies and changes in the external world.
The Google “Technical Debt” article should be required reading for anyone involved in operationalizing Deep Learning systems. For easy reference and to aid in discussion, let's detail the important risk factors and design patterns in that paper.
Software architecture involves patterns that keep code modular, so that components have minimal dependencies on each other. By contrast, in Deep Learning systems (and this applies equally to machine learning), code is created from training data. This is the key difference between classical software and Deep Learning systems.
1. Complex Models Erode Boundaries
There are few mechanisms for isolating data dependencies from one another.
Entanglement: Deep learning models and the data used to train them are naturally entangled.
Hidden Feedback Loops: Systems that learn from the world are in a feedback loop with their own actions and observations.
Undeclared Consumers: Predictions made by a machine may be used by other systems without that use ever being declared.
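One way to make consumers visible is to have the prediction service refuse callers that have not registered. The following is only a minimal sketch of that idea, not something prescribed by the Google paper; `PredictionService`, `declare_consumer` and `toy_model` are hypothetical names invented for illustration.

```python
class PredictionService:
    """Wraps a model and serves predictions only to declared consumers."""

    def __init__(self, model):
        self._model = model
        self._consumers = {}  # consumer name -> number of predictions served

    def declare_consumer(self, name):
        # Registering makes the dependency on this model explicit.
        self._consumers.setdefault(name, 0)

    def predict(self, consumer, features):
        if consumer not in self._consumers:
            raise PermissionError(f"undeclared consumer: {consumer!r}")
        self._consumers[consumer] += 1
        return self._model(features)

    def consumers(self):
        # Snapshot of who depends on this model and how heavily.
        return dict(self._consumers)


def toy_model(features):
    # Hypothetical stand-in for a trained model.
    return sum(features) > 1.0


service = PredictionService(toy_model)
service.declare_consumer("ranking")
result = service.predict("ranking", [0.7, 0.6])
```

With a registry like this, changing the model at least starts from a list of the systems that will be affected.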
2. Data Dependencies Dominate over Code Dependencies
Data dependencies carry greater weight than code dependencies. Unfortunately, tools for discovering data dependencies are far less common.
Unstable Data Dependencies: Data behavior inevitably changes over time. A mitigation strategy is to use versioned copies of data.
Underutilized Data Dependencies: Regularly evaluate the effect of removing features from a model whenever possible.
Static Analysis of Data Dependencies: Annotate data and features to allow automatic dependency checking.
Correction Cascades: These arise when a model is used in a domain different from its original one. Annotate the model so that its original use can be inspected.
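Versioned data and annotated dependencies can be combined in a very small amount of code. The sketch below assumes a dataset is a list of JSON-serializable rows and derives a version from a content hash; `dataset_version`, `check_dependencies` and the `model_meta` layout are hypothetical, chosen only to illustrate the idea.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short content hash that changes whenever the data changes."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def check_dependencies(model_meta, available):
    """List every declared input whose pinned version is missing or stale."""
    problems = []
    for name, pinned in model_meta["inputs"].items():
        current = available.get(name)
        if current is None:
            problems.append(f"{name}: missing")
        elif current != pinned:
            problems.append(f"{name}: pinned {pinned}, found {current}")
    return problems


clicks_v1 = [{"user": 1, "clicks": 3}, {"user": 2, "clicks": 0}]
clicks_v2 = clicks_v1 + [{"user": 3, "clicks": 7}]

# The model is annotated with the exact data version it was trained on.
model_meta = {"name": "ctr_model", "inputs": {"clicks": dataset_version(clicks_v1)}}

ok = check_dependencies(model_meta, {"clicks": dataset_version(clicks_v1)})
stale = check_dependencies(model_meta, {"clicks": dataset_version(clicks_v2)})
```

Here `ok` is empty, while `stale` reports that the `clicks` input no longer matches the version the model was pinned to.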
3. System Level Spaghetti
Glue Code: Only 5% of the code is machine learning. The other 95% is glue code and should be treated with conventional software architecture approaches.
Pipeline Jungles: Invest engineering resources so that pipelines (code for data collection and feature extraction) remain maintainable.
Dead Experimental Codepaths: A famous example of this was Knight Capital’s system losing $465 million in 45 minutes due to an obsolete experimental codepath.
Configuration Debt: Machine learning algorithms can be elegant, but a lot of real world messiness can be found in their configuration.
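One possible guard against dead experimental codepaths is to give every experiment flag an owner and an expiry date, then fail a routine check once a flag outlives its date. This is a sketch of that convention under my own assumptions, not a practice taken from the paper; the flag names and the `expired_flags` helper are hypothetical.

```python
from datetime import date

# Hypothetical experiment flags; each one must carry an owner and an expiry date.
EXPERIMENT_FLAGS = {
    "use_new_ranker": {"owner": "search-team", "expires": date(2024, 1, 31)},
    "legacy_pricing_path": {"owner": "unknown", "expires": date(2020, 6, 30)},
}


def expired_flags(flags, today):
    """Return the names of flags whose expiry date has passed, sorted by name."""
    return sorted(name for name, meta in flags.items() if meta["expires"] < today)


stale_flags = expired_flags(EXPERIMENT_FLAGS, today=date(2023, 6, 1))
```

Running such a check in continuous integration turns a forgotten codepath into a visible failure instead of a latent one.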
4. Changes in the External World
The world is rarely stable and therefore these systems need to be adaptive.
Fixed Thresholds in Dynamic Systems: This applies to classical prediction models where arbitrary thresholds are set by hand rather than learned from the data.
When Correlations No Longer Correlate: Models that rely on a correlation may break when that correlation no longer holds.
Monitoring and Testing: Live monitoring of system behavior is critical.
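Two of these points lend themselves to a small illustration: learning a threshold from held-out data instead of hard-coding it, and monitoring live scores for drift. The sketch below uses made-up validation scores and a deliberately crude drift test (difference of means); `learn_threshold`, `drifted` and the `tolerance` parameter are hypothetical names, and a production system would likely use a more robust statistic.

```python
from statistics import mean


def learn_threshold(scores, labels):
    """Pick the cut-off that maximizes accuracy on labeled validation data."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = mean(1.0 if (s >= t) == bool(y) else 0.0 for s, y in zip(scores, labels))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def drifted(reference, live, tolerance):
    """Flag drift when the live mean moves more than `tolerance` from the reference mean."""
    return abs(mean(live) - mean(reference)) > tolerance


# Illustrative validation data: model scores and their true labels.
val_scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(val_scores, val_labels)

# Illustrative live scores that have shifted upward since validation.
live_scores = [0.7, 0.8, 0.9, 0.95, 0.85, 0.9]
alarm = drifted(val_scores, live_scores, tolerance=0.2)
```

When the alarm fires, the threshold can be relearned from fresh labeled data rather than left at a value that reflected an older world.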
As you can see, the problems are vast and the solutions are quite limited. However, as we explore newer architectures (e.g. “Modular Deep Learning” and “Meta-Learning”) we can begin to seek out newer solutions. A good source of inspiration that I stumbled upon is an insightful blog post (Scientific American), “Building a Resilient Business Inspired by Biology”. The author describes 6 features found in biology and applies them to business processes. I will take the same approach and see how they may apply to Deep Learning systems.
- Redundancy. Duplication of components may be inefficient; however, it provides a mechanism for handling the unexpected. In addition, functional redundancy offers a way to repurpose components to reduce costs.
- Heterogeneity. Different predictive machines make it possible to react to a more diverse range of change as well as avoid correlated behavior that can lead to total system failure. Diversity is required for evolutionary learning and adaptation.
- Modularity. Decoupling acts like a firewall between components and helps mitigate against total collapse. Damage to an individual component can be tolerated while the integrity of the other components is preserved. In general, a distributed, loosely coupled system has higher survivability than a centralized, tightly coupled system.
- Adaptation. A system needs to be sufficiently flexible and agile to adjust to changes in the environment. Adaptive approaches that involve simulation, selection, and amplification of successful strategies are important. Self-learning is a requirement for achieving adaptability.
- Prudence. The environment is unpredictable, so the management of uncertainty should be built in. This calls for continuous simulations that stress test the system, as well as the development of alternative scenarios and contingency plans.
- Embeddedness. Systems do not exist in isolation and are embedded in a much larger ecosystem. Therefore these systems require behavior that works in a way that is of mutual benefit to the ecosystem as a whole.
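Redundancy, heterogeneity and modularity can be seen together in a tiny ensemble: several different predictors answer the same question, and a failure in one is contained rather than propagated. This is a toy sketch under my own assumptions, not a design from the Scientific American article; the predictors and `resilient_predict` are hypothetical.

```python
from collections import Counter


def rule_based(text):
    # Heterogeneity: this predictor looks at vocabulary.
    return "spam" if "free" in text.lower() else "ham"


def length_based(text):
    # A second, unrelated heuristic that fails in different ways.
    return "spam" if len(text) > 40 else "ham"


def broken_model(text):
    # Simulates a component that is currently down.
    raise RuntimeError("model server unreachable")


def resilient_predict(predictors, text, fallback="ham"):
    """Majority vote over whichever predictors succeed; failures are isolated."""
    votes = []
    for predict in predictors:
        try:
            votes.append(predict(text))
        except Exception:
            continue  # modularity: one failing component must not take down the rest
    if not votes:
        return fallback
    return Counter(votes).most_common(1)[0][0]


message = "FREE prize, click now to claim your reward today!!!"
label = resilient_predict([rule_based, length_based, broken_model], message)
```

The broken component costs one vote, not the whole prediction, which is the firewall behavior that modularity is meant to provide.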
These 6 features are excellent guidelines on how to build not only adaptable systems, but ones that are ultimately sustainable. It is worth noting the importance of “loose coupling” in biology.
A recent paper from the folks at Berkeley explores the requirements for building these new kinds of systems (see: “Real-Time Machine Learning: The Missing Pieces”). The project is Ray from Berkeley’s RISELab, although they don’t mention it in their paper. They make the argument that systems are increasingly deployed in environments of “tightly-integrated components of feedback loops involving dynamic, real-time decision making” and thus require “a new distributed execution framework”. The difference between the classical machine learning system and their new framework is depicted by this graphic:
The authors spell out 7 requirements that this new kind of architecture needs to support:
- Low Latency: millisecond end-to-end latency.
- High Throughput: millions of tasks per second.
- Dynamic Task Creation: the number of tasks required needs to be dynamically allocated.
- Heterogeneous Tasks: the resources and execution time required by tasks vary widely.
- Arbitrary Dataflow Dependencies
- Transparent Fault Tolerance
- Debuggability and Profiling
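To make “dynamic task creation” and “arbitrary dataflow dependencies” a little more concrete, here is a rough single-machine sketch using Python's standard `concurrent.futures`. It is not Ray and says nothing about latency, throughput or fault tolerance; `simulate` and `best_of` are hypothetical stand-ins for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(action):
    # Hypothetical stand-in for a rollout whose cost varies per task.
    return action * action


def best_of(results):
    return max(results)


with ThreadPoolExecutor(max_workers=4) as pool:
    # Round 1: a fixed fan-out of tasks.
    first = [pool.submit(simulate, a) for a in range(4)]
    scores = [f.result() for f in first]

    # Round 2: how many tasks get created depends on round 1's results.
    promising = [a for a, s in enumerate(scores) if s >= 4]
    second = [pool.submit(simulate, a + 10) for a in promising]

    # A downstream task consumes the outputs of the dynamically created ones.
    final = pool.submit(best_of, [f.result() for f in second]).result()
```

The shape of the task graph is not known up front: the second round exists only because of what the first round returned, which is the property the authors argue a static dataflow system handles poorly.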
What was just described is the state-of-the-art thinking in design. Clearly, we have a very long way to go in terms of architectures that are adaptive to the environment, although the prescription does address other aspects such as heterogeneity, redundancy and modularity.
Present day software architectures are not up to the task of accommodating systems that employ Deep Learning components. A new kind of architecture is clearly demanded. It is very early, but this is a very important area and it is essential that our Deep Learning systems have manageability built in. After all, every complex technology requires manageability to be economically sustainable.
It has come to my attention that DARPA has a new program “Toward Machines that Improve with Experience” that “seeks to develop the foundations for systems that might someday learn in much the way biological organisms do”:
Concepts from nature could include but are not limited to:
(1) Mechanisms for evolving networks;
(2) Memory stability in adaptive networks;
(3) Goal-driven behavior mechanisms;
(4) Learning rules and plasticity mechanisms;
(5) Modulation of local processing based on global context, neuromodulators, hormones;
(6) Minimizing resources use in processing and learning;
For more on this, read “The Deep Learning Playbook”