Learning to Coordinate in Deep Learning Multi-Agent Systems

Carlos E. Perez
Intuition Machine
8 min read · Feb 25, 2017


Credit: https://unsplash.com/search/orchestra?photo=slbOcNlWNHA

In the inevitable transition towards more modular, multi-objective, and multi-agent Deep Learning systems, we need to begin exploring the same loose coupling principles that underpin the coordination of distributed systems. A key criterion in building effective DL systems is better generalization. Although generalization can mean many things, we can at a minimum accept an intuitive interpretation: generalization implies systems of greater adaptability. In distributed architectures, loose coupling principles encourage greater adaptability and can therefore provide valuable ideas on how best to architect analogous DL multi-agent architectures.

Another justification is that intelligent systems are expected to contain a massive number of diverse agents, and therefore any mechanism that demands tight coupling is a mechanism that will not scale. Given a choice of mechanisms, a loosely coupled one should therefore be preferred. Thinking about this more deeply, all behavior should be based on the least amount of information, ideally only local information. This clearly favors low-information coupling. Any method that requires high-information coupling to decide on an action is, intuitively, not the correct one.

A warning to the reader: this is highly speculative stuff and should therefore either be ignored entirely or treated with a grain of salt. Reading this can only lead to greater confusion. With that out of the way, let’s review some loose coupling principles that I dug up from a past life:

These principles may not all be applicable to DL multi-agent systems. However, it can be educational to explore each one and potentially propose an equivalent viable approach. As a caveat, we are constructing here a hypothetical system based on a future idea of a network of collaborative and competitive agents tasked with solving a problem using imperfect knowledge (see: 5 Capability Levels of Deep Learning). These are merely some preliminary ideas that we may want to bake into a hypothetical multi-agent system.

Fixed Verb Interfaces

Ideally, we would like to have as few interfaces as possible to reduce incompatibility as well as to increase plug-and-play. Think of how effective the USB standard has been for power and communication convenience. One other thought is to build these kinds of systems employing FIPA communicative acts, or speech acts. The present approach for DL systems is to let them learn how to communicate on their own. Perhaps, however, adding constraints on the nature of communication, inspired by speech acts, may lead to better reusability. So, for example, a neural network trained against another system with speech acts as the protocol may conceivably be adaptable to another context where speech acts are also used as the protocol.
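To make the idea concrete, here is a minimal sketch of a fixed-verb interface in the spirit of FIPA communicative acts. The verb set, field names, and agent identifiers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# A small, closed set of verbs: agents never invent new verbs,
# only new content, which keeps components pluggable.
VERBS = {"inform", "request", "propose", "accept", "reject", "query"}

@dataclass
class SpeechAct:
    verb: str                       # must be one of VERBS
    sender: str                     # agent identifier (hypothetical)
    content: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        if self.verb not in VERBS:
            raise ValueError(f"unknown verb: {self.verb}")

# Example: a message constrained to the fixed verb set.
msg = SpeechAct(verb="request", sender="agent-7",
                content={"task": "classify", "payload": [0.2, 0.9]})
print(msg)
```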

Document Passing based Messaging

Communication is likely to be fire-and-forget document message passing. It’s simply an easier thing to learn. Presently, most research on learning to communicate tends to employ message passing rather than a coordinated request/response or a procedure-like invocation.
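A minimal sketch of what fire-and-forget document passing might look like, using an in-process queue as a stand-in for whatever transport a real system would use; all names here are illustrative.

```python
import json
import queue

mailbox = queue.Queue()

def send(document: dict) -> None:
    """Fire and forget: the sender does not wait for a reply."""
    mailbox.put(json.dumps(document))  # a self-contained document, no shared state

def drain() -> None:
    """A receiver processes whatever documents have arrived."""
    while not mailbox.empty():
        doc = json.loads(mailbox.get())
        print("received:", doc)

send({"verb": "inform", "belief": {"cat": 0.93}})
send({"verb": "propose", "plan": ["segment", "classify"]})
drain()
```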

Dynamic Typing

One would think that there’s no notion of types for representations in a DL system. However, as we have seen in Google’s NMT system, the addition of a label that indicates a representation’s language was one of the important tricks to achieve zero-shot cross-language translation. So there is evidence that tagging data with its type may be valuable even for DL systems.
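A small sketch of the tagging trick, assuming a token-sequence interface like Google’s multilingual NMT: the type tag (there, the target language) is simply prepended as an ordinary token. The tag names and data are made up.

```python
def tag(tokens, type_token):
    # The model sees the tag as just another token and can learn
    # to condition its behavior on it.
    return [type_token] + tokens

source = ["hello", "world"]
print(tag(source, "<2es>"))   # ask for Spanish output
print(tag(source, "<2ja>"))   # same input, different target type
```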

Asynchronous Synchronization

Monolithic DL systems actually have a rigid form of synchronization with respect to forward and backward propagation through their layers. In a more modular rendition, we would like to relax these restrictions such that synchronization between components is not required. Besides, biological brains don’t require a single clock the way computer systems do, so one should not impose the same synchronization requirement on DL systems.
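As a sketch of clock-free coordination, here is a toy pipeline in which each module wakes only when input arrives, rather than on a global forward/backward sweep. Module names, message contents, and counts are illustrative.

```python
import asyncio

async def module(name, inbox, outbox=None, n=3):
    # Each module wakes only when an input arrives; there is
    # no global clock coordinating the whole pipeline.
    for _ in range(n):
        x = await inbox.get()
        y = x + 1                        # stand-in for real computation
        print(f"{name} produced {y}")
        if outbox is not None:
            await outbox.put(y)

async def main():
    world_to_a = asyncio.Queue()
    a_to_b = asyncio.Queue()
    a = asyncio.create_task(module("A", world_to_a, a_to_b))
    b = asyncio.create_task(module("B", a_to_b))
    for i in range(3):
        await world_to_a.put(i)
    await asyncio.gather(a, b)           # each finishes at its own pace

asyncio.run(main())
```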

Queried References

In DL systems, there are Pointer Networks that maintain hard-coded references similar to memory pointers. A more modular system would require a second level of indirection such that the reference is a query and not necessarily a handle to some internal representation. This is analogous to associative or context-based retrieval. In other words, if there is a retrieval to be made, then the request for that information is through a query and not through an opaque identifier (as we see in conventional computers).
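A sketch of query-based retrieval: the reference is a content vector matched softly against stored keys, in the style of soft attention, rather than a hard pointer. The shapes and data are arbitrary.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

keys   = np.random.randn(5, 8)   # 5 memory slots, key dimension 8
values = np.random.randn(5, 4)   # payload stored in each slot
query  = np.random.randn(8)      # the "reference", expressed as content

weights = softmax(keys @ query)  # similarity decides what is retrieved
retrieved = weights @ values     # a blend, not a dereferenced pointer
print(weights.round(3), retrieved.round(3))
```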

Self Describing Ontology

Representations in DL systems are definitely not self-describing; they are in fact opaque. This of course makes it next to impossible to coordinate between multiple interacting agents if there is no mechanism for sharing knowledge. Furthermore, there does not exist research that tries to learn a “meta-level” representation of data. This is going to be the major technical hurdle for multi-agent based systems. A lesser form of this problem is the Fixed Verb approach. One development to note, however, is that in DeepMind’s PathNet approach, lower-level representations are sometimes shared between networks.

Pattern Based Schemas

A pattern-based schema is one that is more loosely defined than one that is grammar based. The idea here is that only a partial specification is necessary for interoperability. There is a related concept in DL: learning representations that are invariant to nuisance variables. One would like to train a system to ignore nuisance variation while focusing on features that are distinctive.
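A sketch of partial, pattern-based matching: only the fields named in the pattern must match, and any extra fields are treated as ignorable nuisance. The field names are hypothetical.

```python
def matches(pattern: dict, document: dict) -> bool:
    # Only the keys named in the pattern are checked; extra keys in
    # the document are simply ignored, unlike a strict grammar.
    return all(k in document and (v is None or document[k] == v)
               for k, v in pattern.items())

pattern = {"verb": "inform", "belief": None}   # None means any value is accepted
doc = {"verb": "inform", "belief": {"cat": 0.93}, "debug_id": 42}
print(matches(pattern, doc))  # True: the extra debug_id does not break interop
```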

Multicast Communication

Ideally, one shouldn’t have to care how subnetworks are wired together. There exists research where networks are wired in a hierarchical manner (see: Maluuba and DeepMind), with some networks acting as coordinators for much simpler networks. Multicast communication assumes that participants are able to reason on their own about which messages are worth listening to. That is a big burden to justify.
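A sketch of multicast with locally reasoned relevance: every message is broadcast to every agent, and each agent scores whether it is worth attending to. The hand-set interest vectors stand in for what would be learned filters.

```python
import numpy as np

class Agent:
    def __init__(self, name, interest):
        self.name = name
        self.interest = np.asarray(interest)  # stand-in for learned weights

    def attend(self, message_vec):
        # Each participant reasons locally about relevance.
        score = float(self.interest @ message_vec)
        if score > 0.5:
            print(f"{self.name} attends (score={score:.2f})")

agents = [Agent("vision", [1, 0]), Agent("planner", [0, 1])]
broadcast = np.array([0.9, 0.1])              # multicast to everyone
for a in agents:
    a.attend(broadcast)                       # only "vision" finds it relevant
```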

Brokered Interaction

Ideally, in an adaptive system, no two components are hard-coded for interaction, and the interaction can change dynamically depending on context. We’ve seen an architecture like this in “Conditional Logic in Deep Learning”, where the selection of subnetworks is itself controlled by a layer. Also, in CPPN-based systems, there are brokering components, themselves neural networks, whose sole purpose is to learn how to adapt one layer to another layer.
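A sketch of a brokering component: a small adapter whose sole job is to map one module’s output space into another module’s input space, so that neither module is hard-wired to the other. In a trained system the adapter would be learned; here it is a random stand-in, and all dimensions are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

module_a_out_dim, module_b_in_dim = 16, 8
adapter = rng.normal(size=(module_a_out_dim, module_b_in_dim)) * 0.1

def broker(h_a):
    # In a trained system this matrix would be learned so that module B
    # can consume module A's representation; here it is random.
    return h_a @ adapter

h_a = rng.normal(size=module_a_out_dim)
h_b_input = broker(h_a)
print(h_b_input.shape)  # (8,), now compatible with module B
```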

Lazy Evaluation

Alternatively, we can think of this as late binding: deferring commitment until action is necessary. The value of this capability relates more to plan execution. That is, there is an additional dimension to plan execution such that work is performed only when its dependencies are available. It is difficult to see how this applies to DL other than the fact that DL systems are naturally dataflow-based systems, and thus lazy evaluation is a foundational implementation feature. What is interesting, however, is the application of the late-binding principle to recognition. One of the glaring deficiencies of DL is its lack of adaptability, and perhaps some kind of learning on the fly, a lazy learning process, may be a solution to this problem.
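A sketch of late binding via lazy thunks: work is declared up front but performed only when, and if, a downstream consumer forces it. The function names are illustrative.

```python
def lazy(fn, *args):
    cache = {}
    def force():
        # Computation binds late: it runs on first use, then is cached.
        if "v" not in cache:
            print(f"computing {fn.__name__} now")
            cache["v"] = fn(*args)
        return cache["v"]
    return force

def expensive_features(xs):
    return [x * x for x in xs]

feat = lazy(expensive_features, [1, 2, 3])  # nothing computed yet
# ... planning can proceed here without paying for the work ...
print(feat())  # computed on first use
print(feat())  # served from cache thereafter
```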

Adaptability, Interoperability

These are just general principles that guide a loosely coupled system and should also be applicable to a loosely coupled DL system.

Reactive Behavior

There is research in learning to optimize or learning to plan, where a DL system is able to propose execution plans. RL systems by their nature learn reactive behavior, and long-range planning is certainly more difficult to implement. The hunch here is that DL systems, as a consequence of their poorer logical inference capabilities, are going to perform better as reactive agents than as planning agents.

Market Driven Coordination

Market-driven distributed coordination is likely going to be the mechanism for multi-agent coordination, given the complexity of learning how to perform centrally commanded execution. This distribution of responsibility seems more scalable and realistic.
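A toy sketch of market-driven allocation: agents bid on a task and the highest bidder wins it, so coordination emerges from local prices rather than a central planner. The random bids stand in for learned value estimates.

```python
import random

agents = ["agent-1", "agent-2", "agent-3"]

def run_auction(task):
    # In a real system each bid would come from a learned estimate of
    # the agent's competence or expected reward for this task.
    bids = {name: random.random() for name in agents}
    winner = max(bids, key=bids.get)
    print(f"{task!r} -> {winner} (bid {bids[winner]:.2f})")
    return winner

run_auction("translate document")
run_auction("segment image")
```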

Self Describing, Explicit Contracts

Contracts are necessary in a market-driven economy to ensure that participants are compliant in their behavior. This implies that contracts themselves are representations that provide guidance for execution as well as for compensation in cases of failure. DL systems that learn how to perform compensation based on failure are certainly going to be an interesting research topic.
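A sketch of a contract as an explicit, self-describing representation that agents could inspect, covering both execution and compensation terms. All field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    task: str
    provider: str
    consumer: str
    reward: float           # compensation on success
    penalty: float          # compensation owed on failure
    deadline_steps: int     # when the obligation expires

c = Contract(task="classify batch", provider="agent-2",
             consumer="agent-5", reward=1.0, penalty=0.3,
             deadline_steps=100)
print(c)  # the terms themselves are a shareable representation
```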

Optimistic Transaction

We have to assume failures will exist in a marketplace (or in any coordination), and therefore participants need the additional capability of performing compensating actions. It will indeed be interesting if we can create systems that learn this behavior.
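A sketch of optimistic execution with compensation: act first, assume success, and run a compensating action if the optimistic step later turns out to have failed, saga-style. The actions are stand-ins.

```python
def optimistic_step(action, compensate, succeeded):
    action()                      # proceed without waiting for confirmation
    if not succeeded():           # failure detected after the fact
        compensate()              # undo or repair

ledger = []
optimistic_step(
    action=lambda: ledger.append("reserve resource"),
    compensate=lambda: ledger.append("release resource"),
    succeeded=lambda: False,      # simulate a failure
)
print(ledger)  # ['reserve resource', 'release resource']
```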

Prototype Based Classification

The notion that there is a strict separation between instance and meta-level data is an artificial construct and may sometimes be too restrictive. Today, it is still unclear how representations can be built to represent a class of a concept. One can, however, imagine storing exemplar instances, or prototypes, and using these as a way to kickstart classification.
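A sketch of prototype-based classification: each class is represented by an exemplar vector, and a new point is labeled by its nearest prototype, in the spirit of nearest-centroid methods. The prototypes and points are made up.

```python
import numpy as np

prototypes = {
    "cat": np.array([1.0, 0.0]),
    "dog": np.array([0.0, 1.0]),
}

def classify(x):
    # Distance to stored exemplars stands in for class membership,
    # with no separate meta-level description of each class.
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

print(classify(np.array([0.9, 0.2])))  # 'cat'
print(classify(np.array([0.1, 0.8])))  # 'dog'
```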

In a more abstract classification, loose coupling techniques have three recurrent characteristics: late binding, mediation, and decomposition. Interestingly enough, there are research papers covering all three characteristics, the most intriguing of all being how late binding is applied in the context of connectionist architectures.

One thing that recurs often enough to require one’s attention is the notion of meta-level representation. Meta-level representations exist due to the need for coordination. It is a capability akin to a system having internal self-awareness. This self-awareness, of course, seems like a problematic requirement; one should expect to build intelligent systems without the need for intelligent subcomponents (see: Artificial Intuition). However, the meta-level reasoning components themselves may also be unintelligent and may be unaware that their operation is at a meta-level.

Another observation here is that it’s hard to divorce oneself from a symbolicist approach once we begin discussing meta-level concepts. It is therefore quite conceivable that an embodiment of the above system would be a hybrid symbolicist/connectionist architecture. Alternatively, just like the mind, it may be something that works off dual process theory: an intuition machine and a rational machine working in concert.

What I did here is examine ideas from distributed computing to see if the basic idea of loose coupling, or low-information coupling, can serve as inspiration for how to build multi-agent DL systems. One main conceptual stumbling block is the notion of meta-level information that exists in distributed systems. However, there are indeed still a lot of common ideas that clearly should be leveraged. This exploration is promising in two respects: first, in identifying fundamental principles, and second, in identifying constraints on how multi-agent systems communicate. Identifying constraints, or alternatively defining a language of discourse, provides a bounded space for exploration and therefore may make learning to coordinate feasible.

Related:

Multi-Agent Cooperation and the Emergence of (Natural) Language: https://openreview.net/pdf?id=Hk8N3Sclg

Strategy for Disruptive Artificial Intelligence: https://gumroad.com/products/WRbUs
