# Sense and Scalability

In an era of AI adoption in industry, stark contrasts begin to show in how we think about leveraging computing, data, and inference. This article considers graph technologies in the context of business: enhancing human thinking and enabling data exploration, especially among teams of domain experts augmented by AI applications. Specifically, let’s develop and deconstruct the notion of graph thinking.

# Contexts

Suppose you have an errand to run, such as shopping for groceries: “Remember to buy eggs and more rice on the way home from work today.” The needs are clear, and your approach is well understood. People use phrases such as “It’s not rocket science” to describe the level of competency required here. In terms of math, you understand that:

2 + 3 = 5

Because you memorized arithmetic tables years ago. Or perhaps still count on your fingers? In any case, let’s call this a “Simple” context. This is where Captain Obvious dwells.

Now suppose you face a more challenging problem — for example, refinancing the mortgage on your home, for the first time. While the answers probably exist, they won’t be particularly clear to an average person. So you involve an expert in loan refinancing — and by the time you’ve closed on the loan and filed personal taxes that year, you probably will have involved a few more experts. This is not exactly rocket science, although the matter is non-trivial and you don’t want to get it wrong. Someone with requisite expertise can analyze the situation and solve for what’s needed. In terms of math, if:

x + y = 5
x − 2y > 1

Then what are the possible values of “x” and “y” based on these constraints? Let’s call this a “Complicated” context.
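As a worked sketch of the kind of analysis an expert would do here: substituting y = 5 − x from the first equation into the inequality gives x − 2(5 − x) > 1, which simplifies to 3x > 11. So x > 11/3, and therefore y < 4/3. The short Python check below probes that boundary with exact fractions; the function name `satisfies` is only an illustrative choice.

```python
from fractions import Fraction

def satisfies(x):
    """Return True if x meets both constraints, with y fixed by x + y = 5."""
    y = 5 - x              # from the equality x + y = 5
    return x - 2 * y > 1   # the inequality x - 2y > 1

# Derived boundary: x - 2(5 - x) > 1  =>  3x > 11  =>  x > 11/3
boundary = Fraction(11, 3)

print(satisfies(boundary))                     # exactly on the boundary
print(satisfies(boundary + Fraction(1, 100)))  # just above the boundary
print(satisfies(3))                            # below the boundary
```

The inequality is strict, so the boundary value itself is excluded: there is a whole range of valid answers rather than a single one, which is part of what makes the context “Complicated”.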

What happens when a situation becomes even more complex? Suppose that no single answer exists? For example, consider when a student is graduating from a university and selecting among potential first jobs. There’s no “formula” to determine the best job. A parent might try to hire an expert to counsel their offspring about career choices, although the phrase “Let me know how that works out for you” seems appropriate here. Even so, in most cases there are likely to be some indications, some emergent patterns which can help guide an informed decision. We’ll spare you the math, although complexity theory is one area that addresses these kinds of provably hard problems. Let’s call this a “Complex” context.

Suppose that a situation devolves into chaos. The time is 04:30 in the morning, a raging wildfire is heading toward your newly refinanced home, and the local fire department has been knocking on your front door, ordering your family including a visiting recent graduate to evacuate. Now. Crisis management is the appropriate response: you proceed swiftly to a suggested area of relative safety and stability. Let’s call this a “Chaos” context.

Or, sadly, what if there’s no refuge possible? Let’s call that a “Disorder” context. Other contexts can devolve into this. All bets are off. Run!

The paragraphs above describe the five contexts of the Cynefin framework by Dave Snowden, et al. — see [Snowden2007]. Each context defines a set of approaches required by leadership. In the first of these, the “Simple” context of known knowns presents a stable situation with clearly defined cause and effect. Not much expertise is required to perform these kinds of tasks. Leaders need to establish facts, categorize needs, then instruct people to respond based on rules, i.e., best practices.

The “Complicated” context of known unknowns requires expert analysis to determine cause and effect. Leaders must assess the available facts, leverage experts to analyze the situation, then apply the appropriate norms to take action based on decisions about trade-offs.

The “Complex” context of the unknown unknowns was made infamous by US military leadership in response to the 9/11 attack on New York City and Washington DC. While some aspects of cause and effect may be deduced in retrospect, these are impervious to reductionist methods. While no “right” answers will be immediately available, instructive patterns can emerge to help make informed decisions. Leaders must probe the situation, sense for these emergent patterns, then respond accordingly — where experimentation and some tolerance of risk and failure are probably the best resorts.

In the “Chaos” context, cause and effect are unknowable. Leaders must sense where stability exists, then respond to convert the chaotic into the complex, to attempt to reestablish order.

Snowden developed the Cynefin framework to describe different kinds of business contexts plus the sense-making and leadership approaches which are most appropriate for them. The subtext is that while “Simple” or “Complicated” contexts had dominated business thinking and priorities, by the 21st century most business had shifted into the “Complex” context. Vastly different leadership approaches became necessary.

This is our world today, where the competitive challenges of AI are being all but ignored by the executives at more than half of enterprise firms. Frankly, those executives tend to be preoccupied by the multi-faceted challenges of other risks to business: climate change, ongoing pandemics, political polarization, regulatory compliance, escalating cyber threats, and so on. Our world is complex, and while there are no simple answers to these kinds of problems, we can leverage emergent patterns.

# Antipatterns

The Cynefin framework also describes antipatterns. For example, entrained thinking among leaders may contribute to complacency and cause oversimplification leading to disasters.

In 1982, executives at AT&T orchestrated a consent decree with the US federal government, effectively breaking up the century-old, state-sponsored monopoly of the Bell System. This powerhouse of American technology innovation had invented the telephone network, transistors, communications satellites, the Unix operating system, the C and C++ programming languages, and so on. AT&T executives agreed to spin out their “Baby Bell” subsidiaries in exchange for being permitted by the US government to compete freely as a computer vendor. They sought to compete against IBM in what was assumed to be a highly lucrative sector.

However, this situation was a classic case of oversimplification. AT&T executives did not recognize that IBM was in a tailspin: a decade later IBM would post an \$8 billion loss — the largest ever in US corporate history. Instead AT&T leadership applied 20th century thinking about “Complicated” business contexts. They assessed the available facts and leveraged analysis conducted by a small army of legal, financial, political, and business experts. Mountains of data analysis and reporting got consumed. Their entrained thinking drove AT&T to pursue “low-hanging fruit” by emulating IBM as the category leader. Their lack of situational awareness about a “Complex” business context meant that these experts and executives did not perceive the importance of two emergent patterns which proved fatal for AT&T:

• personal computers: six months earlier, IBM had launched their PC business, which paved a way for Microsoft and Apple to become the dominant firms in that area
• internetworking: eleven months later, Stanford launched the Internet, which ecommerce start-ups such as Google and Amazon would use to eclipse the technology leadership of Bell Labs

AT&T purchased a 25% stake in Sun Microsystems, though its struggling AT&T Computer Systems venture never quite gained traction. The world shifted instead to desktops, laptops, and smartphones. Some technology experts from Bell Labs moved to Google. In 2005, one of the “Baby Bells” — Southwestern Bell, notorious among its sibling RBOCs for being consistently “strong and wrong” — executed an aggressive takeover of its parent and renamed itself AT&T. The hyperscalers such as Apple, Microsoft, Amazon, Google, Facebook, etc., took over leadership in the technology sector.

# Learning

Dave Snowden leveraged pedagogy and learning theory while developing the Cynefin framework. Let’s consider what we know about the developmental aspects of human cognition, especially about how people organize knowledge as they progress from “Novice” to “Expert” in a particular subject — see [Ambrose2010], pp. 46–65.

We can represent this in terms of nodes (i.e., pieces of acquired knowledge) and edges (i.e., relations between nodes) as shown in the diagram:

“Novices” tend to struggle with a subject initially, constructing sparse and relatively superficial mental structures — visualized as an unconnected set of simple facts. This level of cognition uses context-free features and rules, with no responsibility for consequences, and cannot handle uncertainty in the data.

“Advanced beginners” tend to connect knowledge into chains of association — visualized as sparse links between facts. Use of more sophisticated rules becomes situational, although that typically requires sequential access for large units of information, such as recall by association. At this stage, a learner begins to ask questions.

“Competent practitioners” tend to use hierarchical procedures for making decisions — visualized as decision trees. At this stage, a learner can formulate plans.

“Experts” tend to organize knowledge in highly connected ways, which helps them leverage understanding (not facts) — visualized as graphs. At this stage, learners can contend with nuances, uncertainty, exceptions to the rules, and for sense-making they leverage emergent patterns.
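The four stages above can be sketched as small graphs over the same set of facts. This is only an illustration under stated assumptions: the five placeholder facts, the specific edge lists, and the two measures used (edge density and diameter in hops) are choices made here and do not come from [Ambrose2010].

```python
from collections import deque

facts = ["A", "B", "C", "D", "E"]  # placeholder pieces of knowledge

# Hypothetical undirected edge lists, one per learner stage.
stages = {
    "novice": [],                                                             # unconnected facts
    "advanced_beginner": [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")],    # chain
    "competent": [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")],            # tree
    "expert": [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
               ("C", "E"), ("D", "E"), ("A", "E")],                           # highly connected
}

def density(nodes, edges):
    """Fraction of all possible undirected edges that are present."""
    possible = len(nodes) * (len(nodes) - 1) // 2
    return len(edges) / possible

def diameter(nodes, edges):
    """Longest shortest path in hops, or None when the graph is disconnected."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    longest = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:  # breadth-first search from this start node
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        if len(dist) < len(nodes):
            return None  # some fact cannot be reached from this one
        longest = max(longest, max(dist.values()))
    return longest

for name, edges in stages.items():
    print(name, density(facts, edges), diameter(facts, edges))
```

In this toy example the chain and the tree have the same number of links, but the tree reaches any fact in fewer hops, and the highly connected “Expert” structure shortens those paths further.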

One can see how the knowledge organization of a “Novice” learner corresponds closely to a “Simple” context for a leader. The approach of “Competent practitioner” corresponds to what’s indicated in a “Complicated” context. Experts, however, tend to approach learning as if they were confronting a “Complex” context and attempting to shift it into a “Complicated” context. They apply sense-making among the unknown unknowns by recognizing emergent patterns within graph-like conceptual structures to analyze which decisions are indicated. Over the past five decades the field of AI has been trying to convey that these are the situations where a combination of AI plus domain expertise excels.

A detailed exploration of leveraging patterns is given in [Alexander1977], based on the famous Oregon Experiment. Christopher Alexander developed a pattern language to defuse a highly contentious situation (e.g., “Chaos”) by incorporating feedback from diverse stakeholders. Student riots at the University of Oregon occurred in the wake of multiple deaths. Poor city planning resulted in heavily loaded logging trucks driving through campus at speed, which killed students who were walking or riding on bicycles. Alexander’s work incorporated “domain expertise” from students, industry representatives, professors, and even the janitorial staff, to rebuild the university campus. Subsequent architectural design at the university corrected the city planning.

Alexander’s work has been foundational for the field of architecture. It also informed Ward Cunningham (from Oregon) who pioneered the use of software design patterns, including in his invention of “wiki” software which formed the basis for Wikipedia. There is a distinct theme here in overlap between “patterns as language”, leveraging domain expertise, and representing expert knowledge in a web (read: graph) organization of data.

# Data Management

The report titled Fifty Years of Data Management and Beyond (see [Nathan2019]) reviews the history of data management. This provides a decade-by-decade analysis of challenges faced by the computing industry and the kinds of data management technologies which emerged as a result.

Early work in the 1960s focused on a simple data processing model, where input ran through some code for processing, which generated output. In 1970, Edgar Codd proposed relational database theory (see [Codd1970]) in response to the inefficiencies of COBOL programmers working with hierarchical databases on mainframe computers that had dominated the 1960s. A loose interpretation of Codd’s work became popularized as SQL. Then in the late 1980s, enterprise data warehouses and business intelligence practices built upon relational databases and extended their capabilities for analytics.

During the 1990s these relational technologies dominated data management, emphasizing two aspects:

• storage and retrieval of facts or aggregated summaries of facts
• organization based on facts, dimensions, and indexes

If one examines these closely, there’s an uncanny correspondence between the OLTP use of relational databases and the “Simple” context in Cynefin as well as the “Novice” learner approach to knowledge organization. Similarly, the OLAP use of data warehouses and BI analytics bears an uncanny resemblance to the “Complicated” context and how “Advanced beginners” organize knowledge.

Anecdotally, IBM data management executives would joke privately that relational database technology had been aimed at “B students”. In other words, enterprise firms which needed to staff large departments couldn’t afford to hire solely among “A students”, but relational data afforded a relatively safe, simplified computing space for a wider range of talent to be effective.

Engineering trade-offs that enable efficient use of relational databases come at the expense of having data stored in normalized ways, i.e., using normalized relations. In other words, data gets atomized into sets of facts, which become nearly unrecognizable unless accessed through complex queries. Moreover, data which does not fit into the predetermined static relations (e.g., relational “schema”) simply does not get stored into the database (see [Olson2003]). As a direct result, generations of IT have come to assume that sense-making with data depends on querying known facts. Unfortunately, this approach to data infrastructure does not work effectively to support how either people or machines learn, especially in complex business contexts.
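To make the point about atomized facts concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The table names, column names, and sample rows are hypothetical, chosen only for illustration. A simple statement such as “Ada works at Lab X” gets split across three normalized tables, and only becomes readable again through a query with two joins.

```python
import sqlite3

# In-memory database holding a tiny, hypothetical normalized schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lab (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE affiliation (author_id INTEGER, lab_id INTEGER);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO lab VALUES (10, 'Lab X'), (20, 'Lab Y');
    INSERT INTO affiliation VALUES (1, 10), (2, 20);
""")

# The affiliation table alone holds only opaque pairs of ids.
raw = conn.execute(
    "SELECT author_id, lab_id FROM affiliation ORDER BY author_id"
).fetchall()

# Reassembling the original statements requires joining all three tables.
rows = conn.execute("""
    SELECT author.name, lab.name
    FROM affiliation
    JOIN author ON author.id = affiliation.author_id
    JOIN lab ON lab.id = affiliation.lab_id
    ORDER BY author.name
""").fetchall()

print(raw)
print(rows)
```

Anything that does not fit one of these three tables, such as an informal collaboration between two labs, has no place to be stored unless the schema is changed first.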

# Hyperscalers

The report titled Hardware > Software > Process: Data Science in a Post-Moore’s Law World (see [Nathan2021]) discusses two divergent lines of priorities for computing that emerged during the early 2000s. On the one hand, many still invoked a relatively simplistic 1960s-era trope: input processed through code produced output. With the advent of the Agile Manifesto in 2001, this segment in the industry prioritized fast iteration on a code base, while the use of data was literally omitted from the discussion. Over the subsequent twenty years, various interpretations and adaptations of Agile methodology became dominant in software engineering throughout industry.

On the other hand, in the late 1990s a small cohort of firms engaged in ecommerce recognized that relational database vendors (specifically, Oracle) were extracting most of their revenue as their businesses grew. In late 1997 (see [Nathan2019]), a team at Amazon famously “split the website”, moving their ecommerce applications to a Linux cluster and migrating away from use of increasingly large Oracle servers. Similar efforts were underway at eBay, Inktomi (which became Yahoo! Search), and the not-yet-named Google project at Stanford. This approach accomplished several key things:

• a basis for what became cloud computing
• projects which became Big Data frameworks for NoSQL work, such as MapReduce
• a “virtuous cycle” (see [Ng2017]) of machine data (i.e., log files) aggregated from ecommerce webapps, then used to build machine learning models, which enhanced the webapps

Hyperscalers have extracted trillions through ecommerce. These businesses have also applied a three-step barrier-to-entry process:

1. preemptive hires of top AI talent
2. large cloud operations
3. business lines specifically developed to produce labeled data

The data gets used to train machine learning models on cloud computing resources, led by expert AI teams.

The nascent hyperscalers also subscribed to an alternate viewpoint. In lieu of prioritizing fast iteration on a code base, they prioritized learning with data as a competitive differentiator. Over the subsequent twenty years, widespread adoption of data science practices along with the explosion of deep learning use cases accelerated this approach, and the hyperscalers have since grown into trillion dollar companies.

Note that these two divergent viewpoints about computing do not reconcile. At this point more than half of enterprise firms globally are relatively fixed in the former camp, with IT practices focused on relational data warehouses, data lakes, and siloed data practices that tend to preclude AI adoption. Meanwhile, firms with first-mover advantage (see [Lorica2019]) have been accelerating their investments in AI based on demonstrated ROI since the mid-2010s. A gap is emerging.

Note that the machine learning emphasis (algorithmic modeling) of the hyperscalers and the businesses that emulated them during the 2000s and 2010s corresponds to a “Competent practitioner” learner stage and its approach to knowledge organization. However, this does not embody the “Expert” stage. By no small coincidence, there’s a popular saying among data science practitioners that domain expertise is more valuable than machine learning models.

In the “Expert” stage, learners perceive emergent patterns and use graph-like cognitive structures to organize knowledge. In the “Complex” context of Cynefin, leaders must sense emergent patterns to make informed decisions, often based on experimentation. Graph technologies provide means for representing domain expertise and knowledge, including ways to represent and leverage uncertainty — those unknown unknowns. Patterns get used to query graphs, and overall there’s striking correspondence between “Complex” contexts, “Expert” cognition, and graph technologies for leveraging data. Business today must perform in complex contexts.

Almost all of the hyperscalers have given presentations and published papers describing the business importance of their large-scale graph practices. Amazon’s is probably the largest among these, with its Product Graph, although Microsoft, Google, Apple, Facebook, Alibaba, etc., have all disclosed details about their graph practices at scale. Several of these firms had been involved with open source projects for graph technologies as well. However, notably, since 2016 the hyperscalers appear to have backed away from public disclosure about the operational side of their graphs.

Relational database approaches emphasize the storage and retrieval of facts. Software engineering has emphasized coding over learning. However, the hyperscalers made abrupt moves away from both points. Large scale graph practices have become “secret sauce” among the firms with first-mover advantage in AI.

# Graph Thinking, Redux

To reiterate, issues as complex as climate change and sustainability cannot even begin to be approached while armies of corporate IT people continue to fetishize the storage and retrieval of facts. Meanwhile their bias about using data eschews the importance of leveraging uncertainty and complex patterns of relations within that same data. Expert response to complex context does not reconcile with a 1990s approach to data. The first-mover advantage in AI of the hyperscalers does not reconcile with a 1990s approach to data.

This is where graph thinking becomes crucial. Denise Gosnell and Matthias Broecheler explore the notion of graph thinking in [Gosnell2020], specifically in their first chapter — which includes a condensed history of data management practices examined in four stages between the 1960s and current day: https://www.oreilly.com/library/view/the-practitioners-guide/9781492044062/ch01.html

While I don’t agree 100% with how some important points about history got condensed, Gosnell and Broecheler make excellent arguments overall. The authors draw an effective parallel between CODASYL as a standard and some of the motivations for contemporary graph practices. It bears mentioning that Edgar Codd was reacting to CODASYL and COBOL in general when he developed relational database theory.

In the section titled “What Is Graph Thinking?” Gosnell and Broecheler touch on inherent connections between graph representation, means for addressing complex problems, and querying by pattern. The section titled “Seeing the Bigger Picture” provides a flowchart for mapping where and how graph thinking leads toward work with graph databases versus graph algorithms. This is quite useful, and could be readily extended. It points at a key issue of the separation between graph database and graph computation, which most of the vendors seem to avoid.

This article is a longish-winded attempt to deconstruct some of the backstory for graph thinking. Introducing the Cynefin framework establishes which contexts are good for relational data versus graph technologies. Introducing the learning theory refocuses the discussion of knowledge organization away from “storage and retrieval” and more toward how expert learners behave. Through graph technologies, we can help people and organizations to organize knowledge in expert ways. This also contextualizes the use of uncertainty within the data: nuances about category breakers and exceptions to rules are implicit in expert cognition.

One red flag to note is that the “experts” tend to be aging out of the workforce. The void left by mass retirements among the Baby Boomer generation begs a question: who replaces them? Especially in multidisciplinary subjects where business operates within inherently complex contexts (such as manufacturing, where 10 years of experience is still relatively “new”), how can younger practitioners accelerate their expertise when they already face substantial demands for continual learning?

Acquisition of expertise becomes relatively more expensive, both for the individuals involved and for organizations overall. This is where the use of emergent patterns in graphs becomes crucial for transferring expertise:

Studies indicate that when students are provided with an organization structure in which to fit new knowledge, they learn more effectively and efficiently than when they are left to deduce this conceptual structure for themselves (see [Ambrose2010], p. 53).

If we can ditch the data management fetish about storage and retrieval of facts and instead move toward representing expert cognition with graphs, one large potential outcome is that enterprise organizations will have better means for bringing new people up to speed. People learn faster and better when they can leverage cognitive structures from others who are already more expert in the subject. Graphs allow for this kind of approach.

Similarly, another point that Gosnell and Broecheler didn’t include explicitly in their description of graph thinking: in the era of AI adoption, managers must think in terms of leading teams of people and machines. In terms of the effective management of machines, graph technologies stress current levels of computing capabilities: complex NP-complete problems are expensive to compute at scale. We need better cluster scheduling than the current generation of Kubernetes provides; see [Nathan2021] for more discussion.

However, we can also look at how teams of people and machines learn — which, in effect, is now the nature of how organizations learn. That’s where graph thinking plays a vital role. To set up this argument, consider the following diagram where there are two-way feedback loops between communities of domain experts (staff), machine learning models (AI), and the customers and market in general:

This diagram depicts the sense of “Hybrid AI”, to leverage learning — in the pedagogical sense of knowledge organization — for teams of people and machines together.

To summarize graph thinking as a kind of rubric, first ask:

• What are you going to do with the relationships in your data?
• Where do you distinguish between graph data management and graph computation?

Then:

• Acknowledge the complexity of the context; muzzle any laggard reductionist methods
• Organize knowledge in terms of entities (nodes) and relations between them (edges)
• Use domain expertise to construct schema — as overlay, not to preclude data
• Where possible, represent uncertainty, nuances, exceptions to rules
• Allow for experimentation and define acceptable risks
• Probe for situational awareness, within a complex/graph context
• Identify emergent patterns in the data, especially among the relations
• Leverage inference to make informed decisions and augment the graph

The last point can be a tall order. To wit, relational databases normalize relations, but tend to suffer from serious performance issues when attempting to run self-joins. Translated: with a relational database and SQL queries one tends to consider any given data entity in the context of a very limited neighborhood. However, with graph technologies, entities may be considered in the context of neighbors that are several “hops” away. Often the trick to leveraging graph traversals is to identify patterns among neighbors, then transform the data to highlight those patterns.
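As a minimal sketch of what considering neighbors several “hops” away can look like, the function below runs a breadth-first traversal over a plain adjacency dictionary. The node names and the `neighborhood` helper are hypothetical and stand in for whatever graph library or database is actually in use.

```python
from collections import deque

def neighborhood(adj, start, max_hops):
    """Return {node: hops} for every node within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist

# Hypothetical toy graph: authors, a paper, a topic, and a lab.
adj = {
    "ada": ["paper_1"],
    "paper_1": ["ada", "grace", "topic_graphs"],
    "grace": ["paper_1", "lab_y"],
    "topic_graphs": ["paper_1"],
    "lab_y": ["grace"],
}

print(neighborhood(adj, "ada", 1))
print(neighborhood(adj, "ada", 3))
```

Widening the view from one hop to three changes only the `max_hops` argument, whereas the relational equivalent typically needs one more self-join per additional hop.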

For example, if a group of authors tend to write papers about a set of topics and the individual authors are associated with a set of research labs, one might infer probabilistically that these particular research labs are interested in these particular topics. Especially when that kind of pattern shows up thousands of times within a dataset. Annotate the data in the graph to capture that inference — which is often not expensive in graphs, but could be orders of magnitude more expensive in a relational database. That’s one form of inference. As you learn more about graph data science, there are many more forms to learn and use.
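The authors, topics, and labs example might be sketched roughly as follows. All of the data, the `interested_in` relation name, and the `min_share` threshold are assumptions made for illustration; a real graph data science workflow would likely use a more careful probabilistic model than this simple share of papers.

```python
from collections import Counter, defaultdict

# Hypothetical observations: one (author, topic) pair per paper.
paper_topics = [
    ("ada", "graphs"), ("ada", "graphs"), ("ada", "nlp"),
    ("grace", "graphs"), ("alan", "compilers"),
]
# Hypothetical affiliations: author -> research lab.
author_lab = {"ada": "lab_x", "grace": "lab_x", "alan": "lab_y"}

# Follow the two-hop pattern lab <- author -> topic and count occurrences.
counts = defaultdict(Counter)
for author, topic in paper_topics:
    counts[author_lab[author]][topic] += 1

def infer_interests(counts, min_share=0.5):
    """Return inferred (lab, relation, topic, share) edges above a threshold."""
    inferred = []
    for lab, topic_counts in counts.items():
        total = sum(topic_counts.values())
        for topic, n in topic_counts.items():
            share = n / total  # rough estimate of how strongly the lab leans to this topic
            if share >= min_share:
                inferred.append((lab, "interested_in", topic, share))
    return inferred

# These edges can be added back into the graph as annotations.
new_edges = infer_interests(counts)
print(new_edges)
```

The inferred edges carry their share as a weight, so the uncertainty of the inference stays visible in the graph instead of being flattened into a plain fact.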

Overall, graph thinking enables organizations outside of the hyperscaler club to leverage the “secret sauce” of graph technologies for sense-making in complex contexts. This approach connects the respective strengths of machine learning and organizational learning together. We won’t be able to contend with the provably hard 21st century problems in business without this.

[Alexander1977]:
A Pattern Language: Towns, Buildings, Construction
Christopher Alexander, et al.
Oxford (1977-08-25)
https://derwen.ai/s/fdjv6zdjfcz3

[Ambrose2010]:
How Learning Works
Susan Ambrose, et al.
Jossey-Bass (2010-05-17)
https://derwen.ai/s/s8kgr56cycsp

[Codd1970]:
“A Relational Model of Data for Large Shared Data Banks”
Edgar Codd
CACM (1970) 13 (6): 377–387
https://derwen.ai/s/5t729nt54r47

[Gosnell2020]:
The Practitioner’s Guide to Graph Data
Denise Gosnell, Matthias Broecheler
O’Reilly Media (2020-05-12)
https://derwen.ai/s/mvhvxn4p9nnx

[Lorica2019]:
Ben Lorica, Paco Nathan
O’Reilly Media (2019-02-20)
https://derwen.ai/s/4fs5d7r8bv3m

[Nathan2019]:
Fifty Years of Data Management and Beyond
Paco Nathan
O’Reilly Media (2019-04-29)
https://derwen.ai/s/bw2vq3wmxtdx

[Nathan2021]:
Hardware > Software > Process: Data Science in a Post-Moore’s Law World
Paco Nathan, Dean Wampler
Manning (2021-05-25)
https://derwen.ai/s/7wfbpdtb5t42

[Ng2017]:
“Artificial Intelligence is the New Electricity”
Andrew Ng
Stanford GSB (2017-02-02)
https://youtu.be/21EiKfQYZXc

[Olson2003]:
Data Quality: The Accuracy Dimension
Jack Olson
Morgan Kaufmann (2003-01-09)
https://derwen.ai/s/hg2b3b5d7q8q

[Snowden2007]:
“A Leader’s Framework for Decision Making”
David Snowden, Mary Boone
Harvard Business Review (2007) 85 (11): 68–76
https://derwen.ai/s/vwmnxjw2k54r

Kudos to Jürgen Müller, and other co-authors on previous works listed above, who’ve helped develop many of these ideas.

evil mad sci, derwen.ai/paco ; lives on an apple orchard in the coastal redwoods https://mastodon.green/@pacoid