Learning vs training in machines and organizations: Production of knowledge vs production of capability — Exploration of metaphors

Dominik Lukes
Metaphor Hacker
Feb 6, 2018 · 24 min read


Machine learning is to organizational learning as…

A recent lecture by Gary Pisano drew a parallel between machine learning and organizational learning in the sense that they are both processes that produce knowledge. My immediate reaction was that they are not alike because machine learning is really a misnomer. Or rather, machine learning has some aspects of learning but is dissimilar from prototypical learning in enough ways to make the analogy dangerous. But my second thought was that organizational learning and prototypical learning are actually dissimilar in the same ways that machine learning and prototypical learning are. We could then learn more about both if we look at the various mappings and mismappings (a sort of metaphor hack).

What follows is a sketch of some of the ways we could think this through.

Prototype knowledge frame

I would propose that the prototype (frame) of learning is built around the acquisition of communicable and demonstrable knowledge. We learn history, physics, maths, medicine, and the way we show we’ve learned these subjects is by demonstrating knowledge of them. This, of course, is not all there is to the frame of learning. We also learn to swim, learn to speak French, learn to solve problems. These are examples of the acquisition of what we could call capabilities. This is well reflected in educational theory and practice, such as the division of learning objectives into knowledge, skills and attitudes (KSA). But if we look at language and behavior, our default framing tends towards learning = knowledge. A quick survey of curricular objectives will show most of them heavily focused on knowledge. Equally, most complaints about poor education are illustrated by a lack of knowledge of things.

Defining knowledge and capability

For the purposes of this exercise, I would define knowledge as something that can be communicated in a significantly shorter time than it can be acquired. It may take me days to memorise all the state capitals but I can recite them in minutes. Or it can take me years to research a topic for a book, months to write it, and days for someone to read it. Capability then could be defined as something that requires constant time to acquire given the same context and the same method of acquisition. And we have a word for the acquisition of capability, namely training.

Of course, both knowledge and capability can be defined in many other ways and a similar distinction could be made using different terms. Some terms that come to mind as alternatives to capability are phronesis, practical knowledge, or tacit knowledge, skill, etc. They are not really the same beast but they graze in the same semantic fields.

A great way to illustrate this further is the distinction used in linguistics between knowledge about and knowledge of language (this could be enriched by the famous competence/performance distinction but I’ll set that aside for now by focusing only on competence). I know quite a lot about the grammar of Swahili, yet I do not ‘know’ Swahili, nor do I have any capability to speak it. However, I ‘know’ Czech and (having written a grammar of Czech) I also know a lot about Czech. I can communicate my knowledge about Czech to someone in the amount of time it takes to speak or write it. However, my knowledge of Czech cannot be communicated. I have to take someone on a journey similar to mine (in time and effort) — or, in other words, train them.

Machine learning is machine training

Now, machine learning is really machine training. Current algorithms are trained on data sets with certain parameters and will assign certain probabilities (weights) to nodes of a decision structure (directly, with actual decision trees, or indirectly, with neural networks, where the weights form multidimensional matrices). But no knowledge that can be communicated is produced. Unlike with humans, I can take the weights produced by the learning algorithm and transplant them to another computer, but I cannot communicate what those weights mean. And I cannot easily add to the knowledge in the way I can add more facts to my knowledge about something. I can add more training data to improve the weights for an intended purpose, or I can use the algorithm to train on a new data set and produce entirely different weights. But in both cases, I am not producing or communicating knowledge. I am replicating a training method. That’s why machine learning plays a relatively small role in something like climate modelling. You can train it on patterns and produce predictions in a fraction of the time it takes to run a complex climate simulation (which could be literally months of computing). But this learning is a complete black box. You cannot extract communicable knowledge from a neural network. And because the feedback loop of testing on climate processes is measured in decades, it is impossible to rely on models produced solely by machine learning (I’m simplifying, of course).
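To make the point concrete, here is a minimal sketch (plain Python with numpy, synthetic data invented for illustration) of what ‘training’ produces: a handful of weights that can be copied to another machine and will keep working there, but that encode no communicable rules.

```python
import numpy as np

# A toy "trained model": logistic regression fitted by gradient descent on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                            # 200 examples, 3 features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)   # the pattern hidden in the data

w = np.zeros(3)                                          # the weights the training will produce
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))                       # current predictions
    w -= 0.1 * X.T @ (p - y) / len(y)                    # nudge weights towards fewer errors

print(w)   # just three floating point numbers; no if-then rules to read off

# The weights can be transplanted to another machine and will keep predicting...
w_transplanted = w.copy()
new_case = np.array([0.2, -0.4, 1.0])
print(1 / (1 + np.exp(-(new_case @ w_transplanted))))    # ...but what they mean is not communicated
```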

Knowledge in organizational learning

How about organizational learning? Well, it’s possible to produce a lot of knowledge about organizations. Make lists, taxonomies, charts and graphs. And parts of this knowledge can be and routinely are communicated. When I join a new organization, I am told about all (or many) of the structures and rules. But as the anthropologist James C Scott points out, these rules are parasitic on the actual complexity of actions and relationships that are created in any society or organization to make it work. That’s why following the rules exactly is tantamount to a strike. The rules and structures are idealizations that express a certain vision (or model) of what is happening but I cannot be successful with just knowing about them. I have to spend time in the organization to learn them in the same way I would have to spend some time in Kenya or Tanzania to really learn Swahili.

It is well known that organizational learning cannot be simply communicated across organizations. That’s because it does not produce knowledge. It produces capabilities. That’s why it’s not easy to just ‘bring back computer manufacturing to the US’. Manufacturing a chip is a known process. With enough money, anyone can buy a fab. But even with the levels of automation in the chip manufacturing process, the levels of organizational learning (which includes the learning of all the relationships in the society of organizations and the workforce available in a region) simply cannot be created out of whole cloth in the same way that a manufacturing line can be. You need to train the organizations to acquire the same capabilities.

Now, that does not mean that there is no communicable knowledge being produced as part of organizational learning. But it is produced by people in the organization, not the organization itself. And it is the knowledge of more efficient structures and processes to make sure organizations can achieve the kinds of capabilities that will lead them to success. In the same way, we can get better and better training methodologies for languages, musical instruments, or sports (described in great detail in Ericsson’s Peak). They can reduce the amount of time or effort needed to achieve certain capabilities. But in competitive environments like sports, the total amount of effort and time is constant because everyone is using the same methods, so the average performance improves while the effort stays the same (i.e. it takes much less time and effort today to reach the level of a top athlete or virtuoso pianist of 50 years ago, but just as much time and effort to reach today’s peak performance). And with some more complex and harder to measure capabilities like social skills, speaking a language, or becoming a good manager, the time and effort reduction is minimal even with better training methods.

Additive and non-additive learning

The same applies to organizational and machine learning. It is not that lessons from organizational learning cannot be learned and transported. But each organization has to go through the process of training itself to acquire the capabilities. And there’s a lower limit on how quickly that can happen. It will be months to years for single organizations and years to decades for interconnected regional and supraregional industries. That is why huge mergers hoping to capitalize on synergies so often fail. The merger drivers assume that the organizational learnings are additive but they mostly are not. If you get two people who each speak a year’s worth of French, they can certainly build on each other’s competence (assuming it is complementary) but they will not be as proficient as somebody who speaks two years’ worth of French. They might even be less proficient together than they would be individually because of the overhead it takes them to negotiate who knows what and who should speak when.

This is also the case for machine learning algorithms (or neural networks). You cannot add together their joint abilities in the same way you could add up two parts of a knowledge repository (like an encyclopedia). Two speech recognition systems that can each recognize speech with 50% accuracy cannot be added together to get 100% accuracy, even assuming their coverage was complementary. You would still need another expert system to decide which was the accurate transcription in every case — and if you had that, you might as well use it to do the transcription. But such systems do not exist: unlike the problems at the heart of the P vs NP question, where a correct answer is much easier to check than to find, deciding which transcription is correct is essentially as hard as producing it.

However, that does not mean that additive complementarity is impossible. On the contrary, where an easy decision algorithm is available, complementarity is incredibly powerful. Hypothetically, if we had one speech recognition system for male and one for female voices, it would be much easier to come up with a third system for deciding which of the two to use in every particular case. The same could apply to the two limited French speakers. If their domains of competence were clearly delineated, their skills could be additive. Say, if one could order in a restaurant and the other could ask for directions to it.
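Here is a minimal sketch of what such an ‘easy decision algorithm’ could look like in code. The specialist recognizers, the pitch estimate and the 165 Hz threshold are all hypothetical placeholders invented for illustration, not any real system’s API.

```python
# Additive complementarity via a cheap router: two specialist recognizers plus a
# decision function that only has to answer "which specialist?", a far easier
# question than "what was said?".

def recognizer_low_pitch(audio):
    return "transcript from the low-pitch specialist"   # stand-in for a real model

def recognizer_high_pitch(audio):
    return "transcript from the high-pitch specialist"  # stand-in for a real model

def estimate_pitch_hz(audio):
    return sum(audio) / max(len(audio), 1)               # placeholder pitch estimate

def transcribe(audio):
    # The router is much simpler than either recognizer; that asymmetry is what
    # makes the two specialists' skills additive.
    if estimate_pitch_hz(audio) < 165.0:                  # illustrative threshold
        return recognizer_low_pitch(audio)
    return recognizer_high_pitch(audio)

print(transcribe([120.0, 130.0, 140.0]))
```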

The easiest and most straightforward decision algorithm is sequentiality. If I know two stages of a process come one after the other, I can train a system on them separately and then add them together. This is what much of deep learning is about. The same goes for organizations. One path to success (suggested by research) for a secondary school is to acquire a primary school. That way it can improve its results simply by improving its inputs. Apple has done the same with chip design. (Although there is a limit to how much this can be done — for instance in manufacturing, given that the quality of a supply chain often arises from its diversity, a car manufacturer acquiring the manufacturers of all its parts may make temporary gains in quality at the expense of long-term improvement.)
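As an illustration of sequential complementarity, here is a sketch (assuming scikit-learn and synthetic data) in which one stage is trained entirely on its own task and a second stage is trained only on the first stage’s outputs; because the order is fixed, the two trained units simply chain together.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                  # synthetic "raw" inputs
y = (X[:, 0] + X[:, 1] > 0).astype(int)         # synthetic outcome to predict

# Stage one is trained separately on its own task (compressing the input)...
stage_one = PCA(n_components=5).fit(X)
# ...and stage two is trained only on stage one's outputs.
stage_two = LogisticRegression().fit(stage_one.transform(X), y)

# Because the stages always come one after the other, the trained units chain cleanly.
new_X = rng.normal(size=(5, 20))
print(stage_two.predict(stage_one.transform(new_X)))
```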

But this is a different kind of complementarity than that of adding together two encyclopedias or dictionaries with complementary entries. We don’t get one improved trained unit but two separate complementary units. In fact, in this sense, we could say that machine learning only works because of non-intersecting additive complementarity with human learning. Humans have to decide where and when machine learning algorithms can be deployed. They provide intentionality and complex embodied judgement. They also provide something machines do not have, which is metacognition.

Giving a person a calculator will magnify their abilities, and giving a chess master a chess computer will also give them a definite edge. But we cannot upload the processing capabilities of the calculator into a person’s brain and make them into one system. All tool use is a case of additive complementarity, but only on the surface. An organization may use another to complement its capabilities, but a merger of the two often does not produce the same outcome. The complexity of merging the myriad synapse-like connections and system activations proves an impossible task even for the most intelligent of designers. The two merged organizations simply have to learn how to be one.

Can machine learning simulate organizational learning?

Which brings us to another question raised by Gary Pisano: can machine learning simulate organizational learning? This is easily imagined in the case of highly mechanized environments where machine learning is necessary to process the amounts of data generated. Or in highly complex environments where humans have well-known cognitive limitations — for instance, organizing the best rotas and paths for workers in a warehouse. But even here we run up against the limits of complexity. NP-hard problems remain NP-hard even for learning machines.

It is possible that some complex system problems are simply not machine learnable, but rather that the systems themselves compute the solution, which emerges from their interactions. Here we may find the Hayekian problem of knowledge in society relevant. He was, of course, arguing with people who believed it was possible to centrally allocate resources within the economy with sophisticated algorithms using new data collection methods (back then computed by human ‘computers’). His point was that the market, using the price signal, can achieve better efficiency through self-organization. Efficient central planning, Hayek argued, was impossible in principle because we cannot collect enough data and, even if we could, we could not compute with it. It seems that Hayek was correct about the computational intractability of perfect market efficiency, but is it possible that a sufficiently powerful machine learning system could outperform the computational efficiency of the market itself?

Or could this at least be achieved at the scale of very large organizations? Is it even needed? Massive conglomerates have been successfully managed through more direct methods for decades with little recourse to neural-net-based AI, which has only produced anything resembling useful results in the last 10 years.

It is certainly not possible at the current state of machine learning and with the available data. The advances in the field of learning algorithms necessary to make that leap illustrate an important difference between current machine learning and organizational learning. Organizations can organize and process their own inputs — whereas machine learning requires significant human effort to create the kinds of data structures and interpretations on which the machine learning algorithms can be applied. Organizations are analog computers operating directly on the data whereas machine learning always needs to convert the input into discrete units that can be expressed as vectors.
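A tiny illustration of that last point, assuming a recent version of scikit-learn: before any learning can happen, the raw material has to be converted into vectors of discrete units. The example reports are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer

reports = [
    "shipment delayed at the warehouse",
    "warehouse staff rota updated",
    "shipment arrived ahead of schedule",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reports)         # one vector of word counts per report
print(vectorizer.get_feature_names_out())     # the discrete units the algorithm will actually see
print(X.toarray())
```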

Types of machine learning as metaphor

There are three broad types of machine learning: 1. supervised, 2. unsupervised and 3. reinforcement. (These are often combined in actual deployed systems.)

1) Supervised learning simply means that the system (such as a neural net) is trained on a highly processed, extremely large set of labeled data. For instance, pictures with labels of what’s in them or pairs of sentences and their translations. The network can then essentially compute the likelihoods of new data matching one of the labels. Or rather, these likelihoods arise out of the complex matrix of activations across millions of parameters — something that is completely computationally infeasible otherwise. What is important to know is that there is no list of ‘if-then’ statements anywhere within the system of the type ‘if object is round, increase likelihood object is a ball by x%’. (This is what the first wave of AI systems used and it seems to me to be what most people still imagine that machine learning produces.) We get extremely good aggregate results but no way of accounting for errors such as mistaking a toothbrush for a baseball bat. That is why these systems are susceptible to unpredictable adversarial attacks — for instance, understanding a seemingly random noise as a spoken command. Nevertheless, supervised learning has produced almost all of the useful and most directly visible outcomes of recent machine learning advances, such as image recognition, machine translation or speech transcription.

2) Unsupervised learning is simply identifying regularities and irregularities (patterns) in an unlabeled set of data. For instance, asking such an algorithm to classify words in a corpus of tens of millions of English words would give us a classification similar (but not identical) to what we know as nouns and verbs. A useful application of this is in network security, where a machine learning system can learn typical patterns and then identify irregularities. This could also be used to identify potential credit card fraud.
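A hedged sketch of that network-security use case, assuming scikit-learn: the model learns what typical traffic looks like from unlabeled data and then flags irregularities. IsolationForest is one standard unsupervised choice; the feature columns are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Unlabeled "typical" traffic: [bytes per second, connections per minute] (invented features).
normal_traffic = rng.normal(loc=[500.0, 60.0], scale=[50.0, 5.0], size=(1000, 2))
model = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

new_events = np.array([
    [510.0, 62.0],       # fits the learned regularities
    [5000.0, 300.0],     # an irregularity worth flagging (a fraud-like spike)
])
print(model.predict(new_events))   # 1 = typical, -1 = anomaly
```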

3) The final type is reinforcement learning. This was used to teach AlphaGo to defeat the world champion. The system played many millions of games against human opponents (through past records — not in real life) and then against itself. It used its failures and successes to adapt the weightings of the moves and counter-moves that led to success. It is also great for training robots to navigate complex and changing environments.
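For a feel of the mechanism, here is a toy sketch (plain numpy, an invented one-dimensional ‘corridor’ environment, not AlphaGo’s actual method): the agent gets no labels, only trial, error and reward, and success gradually reweights the value of its moves.

```python
import numpy as np

n_states, goal = 6, 5                        # a one-dimensional corridor; the goal is at the far end
Q = np.zeros((n_states, 2))                  # learned value of (state, action); 0 = left, 1 = right
rng = np.random.default_rng(3)

for episode in range(500):
    s = 0
    while s != goal:
        a = int(rng.integers(2))                                         # explore by trial and error
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        reward = 1.0 if s_next == goal else 0.0                          # success signal, no labels
        Q[s, a] += 0.5 * (reward + 0.9 * Q[s_next].max() - Q[s, a])      # reweight the move from its outcome
        s = s_next

print(Q.argmax(axis=1))   # the learned policy: go right (1) in every state before the goal
```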

A special case of reinforcement learning is evolutionary computation (although it could also be seen as separate). It can, for instance, combine multiple reinforcement learning agents working in parallel, testing out different solutions and assigning different fitness weights to them. This can be enormously powerful when A/B testing web design.

Aside: What’s missing in recent advances in machine learning?

Before we proceed to think about the analogies between machine learning and organizational learning, we should be very clear about how this machine learning revolution came about. It has three sources: 1) new computational methods for employing applied mathematics (such as differential calculus) in the design of networks of weighted nodes — in other words, advances in software engineering and applied mathematics; 2) advances in computational power (such as graphics cards) and memory capacity that make these new methods feasible (such as increasing the number of layers of neural nets from 1 to 3 or even more); 3) availability of large amounts of structured or semi-structured machine-readable data.

But nowhere on this list of accomplishments are advances in our understanding of how the mind, language or culture work. Quite the opposite. Fred Jelinek (a speech recognition pioneer) supposedly once quipped that the more linguists he fired from his lab, the better his outputs got. I’m not surprised. All of our understanding of how the mind or language works is either computationally infeasible (in the sense of incomputable collections of if-then statements across seemingly infinitely many dimensions) or too unstructured to be amenable to machine learning approaches (for instance, traditional semantics, semiotics or anthropology — even if we could structure this data, as in FrameNet or even WordNet, we could still not collect enough of it).

Caveat: The seductiveness of successful metaphor

It is very seductive to conclude that just because the new machine learning (neural networks known as deep learning) is essentially inspired by how we think the brain is architected, it is more likely to produce human-like intelligence (AGI). But that is just as wrong as saying that just because airplanes were initially inspired by how birds fly, we will soon have the capacity to fly like birds. Airplane flight looks like bird flight (wings) and relies on the same fundamental physics. But it is achieved completely differently from bird flight, and not only does it not give us the ability to fly like birds, it also contains no pathway to that ability. I suspect that neural networks are the same.

What kind of machine learning is organizational learning most like?

Given all that, it is not difficult to see that organizational learning is not equally similar to all three types of machine learning. It is mostly like reinforcement learning (although business schools teach it in a supervised learning mode). And we can also envision that reinforcement learning algorithms, with the assistance of unsupervised pattern recognition, would be where the most fruitful approaches to AI-managed organizations lie. Evolutionary algorithms could even simulate the market and achieve efficiencies without the inconvenience of actual market failure. However, it is equally possible that we will soon run up against the limits of the sort of data we can provide these algorithms as inputs. It is far too easy to underestimate the importance of information architecture and data analysis.

So can we think about an organization’s learning as a type of reinforcement learning? We certainly can, but with important caveats. The useful parallel is the complexity of the learning that arises out of massive amounts of connections with constantly changing weights. Which means that the system has learned to respond in useful ways to variations in inputs and produce usable outputs. But we should remember that the learning is a bit of a black box and we cannot simply extract the knowledge as a series of if-then rules, which is what business case studies so often try to do.

However, the actual topology of the network is infinitely more complex, with too many nodes and connections to identify. But even more importantly, the nodes in an organizational network are very complex agents rather than just passive probability repositories. This means that organizations are much more able to adapt to change. The famous AlphaGo system would have been stumped if it were asked to play Go on a 20x20 board instead of 19x19. A human player would adapt easily.

Aside: Learning from single examples, the human way

The difference between a human individual and a machine learning algorithm is instructive. Machine learning needs inhuman amounts of data. Humans can learn from single examples by techniques such as analogy or metacognition. For instance, it may take me hundreds of tries to learn that a particular animal is a dog but only one sighting of a platypus. I can also build on previous experience in creative ways. It took me months to learn to play anything resembling a tune on the guitar and years to play it well. Yet it took me less than an hour to play a song on a ukulele and only a few months to play it well. I also have all sorts of knowledge about my knowledge that helps. So I know that if I want to know what letter comes after e, I just have to say a, b, c, d, e and ‘f’ will pop into my head in a way it does not when I just say ‘e’. I also know a lot of things about the world in a completely non-propositional way that helps me immediately recognize that a baby holding a baseball bat is more unusual than a baby holding a toothbrush. I did not have to learn this as a fact or even as one of the many configurations of the world. I just know it. And without all of the above, human learning and intelligence are inconceivable. And none of it is in any way present in machine learning.

But how about organizations? Are they more like individual humans or more like machine learning systems? They are more like machine learning systems in that they need to learn as a whole to adapt to new conditions. When a new regulation or manufacturing method is introduced, it takes more time for an organization to adapt to it than it might take any individual human within that organization. Which is why it takes time when a new manufacturing process is introduced before new levels of productivity can be reached.

However, because an organization does not consist of dumb neurons that can only adapt their individual weights but of humans who are capable of complex adjustments, there are many types of changes that it can adapt to easily. For instance, I may send out an email to everyone working in an office that the back door can no longer be used. This may completely throw a machine learning system trained on a situation where the back door could be used. However, the individual humans will make the adjustment very easily and the whole organization will reroute its activity around it without having to relearn everything about how it works.

Negotiating the metaphor

So the trick for a planner, leader or organizational designer is to figure out which learning metaphor is more apt for any given example of organizational change. Is this the kind of change an organization will have to learn in the same way a machine learning system is trained on new data, or is it more like the kind of change a human can adapt to almost without breaking stride? Or is it the kind of change that even an individual human will find difficult? Such as learning to play the trumpet having mastered the ukulele.

Oversimplifications as generative metaphors: A how-to guide

Having spent so much time showing the different ways in which machine learning, organizational and human learning are not alike, it might seem logical to conclude that we should ditch all such comparisons. But that would be extremely unwise. Generative metaphors work best when there is more dissimilarity between two domains of comparison than similarity. But as long as we know enough to avoid the seduction by analogical success and the isomorphism fallacy, we should feel free to explore. One way to do this is to see what comes out if we look at one domain in terms of another.

So it may actually be extremely useful to recast organizations as machine learning systems based around reinforcement. One of the problems our models suffer from is what statisticians call overfitting. This happens in extremely complex environments when we try to add additional variables to a model to make it reflect reality more closely. But because the reality is very complex and noisy, we can never get all the parameters. So we are just as likely to pick ones that fit the model to random noise as ones that make it more useful. Therefore, if we simplify the organization as just a network of interconnected nodes with changing weights (neurons and synapses), we may be able to think about organizational learning in ways that make it more tractable. The more we try to complicate the model by giving the nodes structure, the less tractable it will be and the more noise it will generate.
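A small numerical illustration of overfitting (plain numpy, synthetic data invented for the purpose): as we add parameters, the fit to the data we already have keeps improving, while the fit to data we have not seen typically gets worse, because the extra parameters chase noise.

```python
import numpy as np

rng = np.random.default_rng(4)

def reality(x):
    return 1.5 * x                                         # the simple process underneath

x_train = rng.uniform(-1, 1, 30)
x_new = rng.uniform(-1, 1, 30)
y_train = reality(x_train) + rng.normal(scale=0.3, size=30)   # noisy observations we model from
y_new = reality(x_new) + rng.normal(scale=0.3, size=30)       # noisy observations we have not seen

for degree in (1, 3, 15):                                  # more parameters = a "richer" model
    coeffs = np.polyfit(x_train, y_train, degree)
    fit_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(degree, round(fit_err, 3), round(new_err, 3))    # fit error keeps falling; error on new data typically rises
```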

So what can this generative simplification get us? Well, it may help us think about things like iterations, input data and data structures. This way we can design better environments in which organizational learning succeeds or identify key elements in designs that already proved successful.

But we may also be minded to take into account that organizational learning is a black box that is conditioned by the training data. That may make us skeptical about rich narratives of change based around the behavior of certain agents. They may be significant but the model may be better off without them.

We must also be mindful not to forget that those thick descriptions capture realities from the perspectives of those involved and that we cannot describe the behavior of the individual in terms of the system, in the same way we cannot describe the behaviour of nodes in a machine learning network as learning. Thus statements such as ‘the organization learned because its members learned’ may become slightly suspect within this model. But again, this does not mean that the individual’s learning is irrelevant, just that perhaps it was not part of the organization itself learning as a black-box machine learning system. So we may start asking: what role does an individual’s knowledge play in the organization? What does it mean when an organization ‘learns’ from its mistakes? And so on.

The key thing is to remember that at every analogical step we should reaffirm how closely tied our statements are to the parameters of our model and that the partial success of the model does not make it more real than richer models that may not be quite as useful due to their computational intractability.

Conclusion

So what is the purpose of all this? The various dimensions of machine and organizational learning are already well known. But perhaps contrasting the various models of learning can provide useful models for thinking about any one of them. As in any analogical chain, it is as important to identify the discontinuities as it is to identify the continuities. Otherwise, we are in danger of overfitting — or the isomorphism fallacy. Machine, organizational and human learning have many parallels but they are not the same. They have many superficial similarities and deep dissimilarities.

Perhaps the key lesson is the distinction between knowledge and capability. Organizations and machines do not produce anything resembling knowledge. And perhaps acknowledging that can be useful. But organizations are different from trained neural networks because their components contain knowledge which can be harnessed for training them on new circumstances. All of that could be done without the machine learning analogy but this was my journey. Perhaps it can be of some use.

Final Caveat: Prehistory of machine learning: Teaching computers rules and words then and now

One of Steven Pinker’s most powerful and most deeply misleading book titles is ‘Words and Rules’. It is a book showing the power of the computational model of language that can take two finite sets of inputs (words and rules) and generate an infinite set of patterns (the sentences of a language). This could be seen as the last gasp of the old way. When I started studying linguistics, exactly 10 years before the book was published, it was the mainstream that still gleamed with the shine of the new. But when the book came out, nobody was doing anything interesting with the words and rules model. The computational model of the mind and language took classical structured descriptions of how we speak and think and converted them into computationally tractable structures (logical trees — not decision trees — and rewriting rules operating on them). This was the system underlying all the expert systems developed in the 70s, 80s and early 90s, and it led to AI being seen as a failed paradigm. That’s not to say that it was useless; it produced much of the data infrastructure that was later used to train machine learning algorithms. And it also produced some basic working examples of machine translation and speech recognition. But it never produced anything even remotely resembling intelligence. It struggled to reliably translate even something as seemingly simple and routine as weather forecasts between two related languages.

There is an interesting paradox when it comes to computational power. Rule-based expert systems are much more computationally tractable when dealing with small sets of structured data. You can even do it on a piece of paper. However, they hardly scale at all when confronted with the combinatorial complexity of the real world. The number of rules required to handle even the simplest words (such as ‘dog’) in all possible contexts is so monumentally staggering that even if Moore’s law were to continue, it would still likely not catch up with the need.

Stochastic approaches and neural nets, on the other hand, can produce nothing of any real use with little data or little processing power. But they scale wonderfully. Once you can run millions of iterations on hundreds of millions of tokens in training sets, they only get more useful. They can only learn regularities that exist in data that can be presented to them (so there is a limit to how far they can go), but if we tried to express those same regularities through rules, we would never get anywhere. We couldn’t come up with that many rules and, if we did, we could not compute the outcomes.

But the rule-based systems are more consonant with our intuitions about how the mind works. Many of their foundations go back to the first philosophers, be they in Greece, India or China. That’s what made the early AI systems like ELIZA so seductive. The rule-based AIs were obviously like us; we just needed to spend more time adding rules and we’d be there in no time. But this was wrong, because when we write down these rules we are cheating. The amount of rich, intricate knowledge required to make sense of even trivial sentences like ‘The cat is on the mat’ is mind-boggling. What are the rules for choosing ‘the’ instead of ‘a’ for one or the other word? Why do we say a bird is ‘in’ the tree but ‘on’ a perch? In Czech we say ‘on’ the tree, so why did English pick this? And how do we encode distinctions such as those some languages make between ‘in’ for tight fit and for loose fit? How about local knowledge such as knowing that the mat is valuable, so the cat being on it is a bad thing, or that the cat could also be in the larder, so its being on the mat is a good thing? Sentences in language analysis and logical propositions are not idealised examples of language or reason. They are snapshots of unnatural poses we manipulate them into when trying to take a look at language through itself.

When I was first studying these approaches, stochastic processing was the timid new kid on the block, but 10 years later it was the source of all the major breakthroughs in expert system development. But it hit a wall in the mid-00s, and new improvements were slow in coming. In the 1990s, connectionism and neural nets, which had been the big hope for the future since the 80s, were eclipsed by simpler approaches such as hidden Markov models — so much so that MIT was even considering dropping them from the syllabus of its AI course as late as 2010. But then breakthroughs in the engineering of multilayer neural nets (deep learning) were able to incorporate all the stochastic approaches and achieve a step change in the quality of the outputs and the practical utility of machine learning.

Rule-based systems still play a role today but they only provide refinements on the edges, not the core power. Machine learning systems do not rely on rules nor do they produce rules.

This history is crucial for understanding that there is a fundamental discontinuity between early AI research and machine learning. And an even bigger one between research into human speech and reason and machine learning. The best experts in machine learning are great applied mathematicians and computer engineers who only have a passing knowledge of the true complexity of human communication. This is not a bad thing for the task at hand. We’re running current models at the very edge of computational feasibility. But we need to be careful when relying on the judgements of AI experts as to how generalizable these advances are and how much we should believe that they reveal anything about how the mind and language actually work.

Acknowledgments in lieu of references

The above is based on my reading on the subject of metaphor, machine learning and organizations over many years. I did not take the time to insert references in the text but they would create a rich tapestry of interconnections.

The key readings on metaphor I had in mind were Donald A. Schön’s work on generative metaphors and Gareth Morgan’s Images of Organization. Dedre Gentner’s work on analogies is also very relevant. But my general thinking on metaphor has been shaped most closely by George Lakoff’s work on frames and idealised cognitive models.

My thinking on formal models and machine learning is based on accessing some machine learning courses and reading more informal work by people like Rodney Brooks and Andrew Gelman. I wrote my first paper on unification categorial grammars and have kept an eye on developments in NLP over the last 20 years even if it has never been the main area of my study or research.

My knowledge of organizational learning theory is the thinnest of them all. All of my reading here has been incidental. I draw more on my thinking about personal experiences and general reading about organizations influenced by ethnography and formal modelling in equal measure.
