Conference summaries, Oct 2017 part 1.

Confident Personalities vs. Confidence Intervals, in the era of AI

Summary: observations from several talks at the recent Strata Data Conference, JupyterCon, and TheAIConf, plus a history lesson or two. Indications are emerging of a parallel to the “collision” between data science and product management in the early 2010s, this time as incompatibilities with AI adoption. This article looks back at earlier events in computing, then considers those indications, the underlying issues, and potential directions.

We had a busy season of data-related conferences recently. Among those I attended, presenting and helping host tracks: AI NY in June, JupyterCon in August, AI SF and Strata NY in September. Some consistent themes emerged, crossing between the conference programs:

  • repeatable science, which is both difficult and necessary
  • active learning, in the sense of human-in-the-loop design patterns
  • over-emphasis on objective functions in lieu of fitness functions, which creates a recipe for disaster, both in automation and among people
  • where does topology fit into AI?
  • the “unreasonable effectiveness” of neuroevolution
  • advanced hardware as a big accelerant
  • software engineering no longer business-as-usual
  • talent crunch as the primary barrier for enterprise

Many of those points become simply mind-boggling, as one digs into details. Some of the most startling to me come from the hardware side, and probably the best summary is in my video interview with Wes McKinney about Apache Arrow. IMO, that represents a pervasive, fundamental change in software architecture, this time taking proactive steps to match stride with rapidly evolving hardware. Something which decades of emphasis on the JVM struggled to sweep under the rug.

In the midst of many keynotes, conference sessions, tech industry dinners, and hallway track discussions, one thing nagged at me. Uncomfortable memories recalled an earlier period of rapid “disruption”. In particular, another time with similar conditions, when hardware and systems-level changes were already well in place and particular areas of software practice shifted dramatically as a result, yet social systems lagged. Some organizations weathered that period much better than others. There were winners and losers, and across the broader scope of companies, IMO, that was largely because of the people involved.

My talks focused on active learning in NLP and content discovery, particularly how to use a mix of graph algorithms, deep learning, and ontology in the context of open source for more advanced natural language understanding. Along with that, the corollary of applying the human-in-the-loop design pattern as a management strategy for teams that combine people and AI. So I kept revisiting the question of the people involved in AI systems, and almost every time that nagging feeling returned. One way to articulate it, at the risk of stirring up trouble, is to ask point blank: is Product Management going the way of the dinosaurs?

First, a brief history review. Admittedly from my experience and perspectives.

2010 was an interesting time for the tech industry. Recovering from the 2008 crash, “experts” were actively debating whether data science was appropriate as a new label or even valid as a discipline, whether cloud computing had much merit beyond marketing from Amazon, whether machine learning proved useful beyond mere pattern matching, and frankly whether the phrase artificial intelligence should even be used in a sentence. Compared with the issues of today, those debates seem almost absurd.

Early that year I’d joined a new ML team at Jive Software. Their execs were excited about opportunities to apply ML, NLP, graph algorithms, etc., on customer content within enterprise intranets, where the company’s core business focused. Thanks to a lead from Ro Monge, we’d pitched this idea to Jive execs in Portland, unfortunately just as they received an infamous letter from Sequoia Capital warning about 2008 austerity. Following some recovery and a pre-IPO move to Palo Alto, a couple of years later they were ready to execute on ML. Our team included a good friend, James Todd, and nearby were other notable JVM heavy-hitters, including Craig McClanahan.

A recently promoted manager got assigned to our team. He dismissed ML generally as hype, calling it simply Pearson correlation by another name. Huh, really? To stress the point, he restricted our team from participating in the new Strata Data Conference by O’Reilly Media. Even if we took PTO, that was considered a “firing offense”. Some people really didn’t like the dialogue about data science circa 2010.

At the same time, Tim O’Reilly invited several of us who’d been working in big data, data science, cloud computing, ML, etc., to a breakfast in SF, to help brainstorm about where things were heading. I’d been an early customer for AWS in 2006, and led one of the first large use cases of Apache Hadoop on EC2, for a large-scale ML app in advertising. We’d hired a young consultant named Tom White in 2008 to write a crucial patch for Hadoop, resolving a JIRA issue about cluster I/O performance on EC2. Our team had become a case study for “Project 157”, later renamed Elastic MapReduce. Add to that mix: early AWS dog & pony shows speaking alongside Andy Jassy, being a reference customer for early enterprise cloud adopters (e.g., SAP), and having data teams serve as “guinea pigs” for new AWS services, e.g., cluster computing and spot pricing. Also, our data insights team at ShareThis met regularly to compare notes with our neighbors DJ Patil, Monica Rogati, Pete Skomoroch, et al., at LinkedIn. Overall, that provided enough input to make reasonably clear judgements about where the industry was heading vis-a-vis data science.

Inklings pitched at an exec staff meeting, circa 2008: discussions green-lit displacement of $3MM of Netezza hardware with $400/day of AWS charges, for a mission-critical ML app — this “landscape diagram” trended for a few years as a top search result on Google for “cloud architecture”

That was a time of flux. Albeit mostly flux for people: the underlying technologies involved in a major industry disruption were already proven and very much in place. Lightweight virtualization, loss functions plus regularization terms, data visualization: check, check, check. Even so, the social terms of the deal lagged considerably. People who were heavily vested in business intelligence tended to dismiss any discussion of “data science” or “machine learning”. People who worked in IT tended to prefer managing ~150 servers/admin, scoffing at notions that Amazon or Google could beat that rate by at least an order of magnitude. People who’d earned reputations (and certificates, and salary levels) as Oracle DBAs openly sneered at use of the phrase “big data”.

Years later, Google proved what they could do with AI, in multiple ways and counting. Notably by reducing their language translation code from 500K to 500 lines of source code with much better accuracy. Also replacing a myriad of ML algorithms used in self-driving vehicles with deep learning. The list goes on. Plus, overwhelming success with SRE, as if cloud computing needed further proof points. Those trajectories seemed obvious — at least to some — several years in advance.

I mention all this because 2017 feels so much like 2010 all over again. Once again, it’s the people who are most likely the source of the core issues.

In 2010, an inexperienced manager called from Portland to say we’d get sacked for meeting with Tim O’Reilly to talk about data science and cloud computing. My immediate response was “Hey, you’ve got yourself a deal!” Packed up a few books, politely handed the Jive Software badge to a perplexed receptionist, then located the nearest Palo Alto sidewalk.

Big data/cloud/data science talk at AWS Start-Up Tour, circa 2009: following a guest lecture for Dave Patterson, Ion Stoica, et al., at UC Berkeley, invited to critique a paper by RAD Lab

If you’ve read Lean Startup, my next step was to join IMVU, the company where Eric Ries, et al., had originated those practices, heading efforts on big data, data science, etc. That role led to other related gigs: writing Enterprise Data Workflows, working with the Apache Spark team, and eventually joining O’Reilly Media. Participating along the way in more than a few Strata and related conferences.

Something we’d noticed in that 2009–2012 arc, when data science was hotly debated and yet becoming a thing: problems in adoption did not necessarily come from where one might expect, e.g., Business Intelligence, Operations, etc. Conflicts were often with Product Management. As we saw at Jive, at IMVU, at any number of other firms where I’ve consulted, advised, interviewed, held training, managed, performed due diligence, etc., since.

A larger narrative playing out in industry circa 2010 was that Product Management and Data Science collided. The former tended to have confident personalities, plus ample vision about how customers would certainly embrace planned features, in what might be politely called “aggressive” natures. The latter tended to have data visualizations and confidence intervals which revealed, in many cases, markedly different stories about customer behaviors, top line revenues, LTV, ROI, etc. Your pick.

Then an odd thing happened. Looking back at some of the early people who were notable in data science — e.g., DJ Patil, Monica Rogati, et al., from the early LinkedIn data team — several moved into exec roles in Product. It wasn’t long afterwards — by my reckoning, 2014-ish — when Product Management as a discipline began to embrace the concept of “data-driven” across the board.

Out the door of Jive, circa 2010: next stop was a standing-room-only Data Science preso at Fenwick and West, with the slides making the front page on SlideShare

I hosted one of the business tracks at Strata Data Conference in NYC in late September, where leaders from McKinsey, Deloitte, Accenture, Cloudera, Telefonica, EMC, etc., took the stage to share insights about running enterprise business in the age of emerging AI. We can now say definitively that notions of data science, big data, cloud computing, and machine learning have become mainstream for the tech industry. To summarize from McKinsey, Deloitte, Accenture, etc., enterprise organizations now must be on track with those exact disciplines, if they have any hope of competing in this new era where artificial intelligence applications in industry draw the dividing lines on business differentiation, profitability, and ROI. No shortcuts.

Except that there’s a problem being called the talent crunch, where simply not enough people have spent years working in those combined fields. Say it again: Not. Enough. People. Spent. Years. To quote Werner Vogels from Amazon:

There is no compression algorithm for experience.

So instead, many “incumbents” are either going to shell out $$$ to the cloud providers (who, incidentally, own the barriers to entry for the AI space at the moment) or watch their respective market share get assimilated. Your pick.

Let’s take some forward-looking risks. Those are generally more entertaining than “I-told-you-so” moments, anywho. Let’s present a straw man, namely that the problem isn’t inertia or complacency in organizations. The problem is fundamentally Product Management.

Because it’s become a crutch.

In the tech sector, some believe that upper management has become obsessed with themes of financialization. Not in a good way. However, steering core products and services needed by customers — that used to be the purview of top execs.

Let’s posit that somewhere along the way, core management functions became “outsourced” to Product Management, while the latter arose as a practice in the tech industry. On the one hand, product teams in tech now behave almost as microcosms for their respective companies. In many cases, product managers take the place of engineering managers. In some cases, there are small armies of software engineers being managed, servicing high rates of tech debt, possibly financed on cost-plus schemes or other nonsense — much of which leads toward “blue collar programmer” scenarios as likely outcomes for AI adoption, though that’s another story. On the other hand, consider that within tech firms which sell highly successful products — taking Apple under Jobs, or Tesla as examples — often the CEO leads, unquestionably, on product.

Nearly a decade ago Product Management and Data Science collided, in a clash of tectonic proportions. Not quite the magnitude of the Cascadia subduction zone, but big. After years of case studies, talent diffusion, big consulting firm reports, conference keynotes, etc., that mostly resolved. However, based on what’s emerging in the conferences now, it appears that another collision may be brewing between Product Management and AI. Only this time, the fundamentals of the problem aren’t going to be resolved by reading case studies and Gartner/Forrester reports.

A fairly linear notion has taken root in software practices, which some allege now eats the world:

  • product manager at a tech company writes a plan
  • the plan specifies features
  • engineers, led by a PM, implement those features — while working in opulent IDEs, according to carefully timeboxed schedules
  • unit test coverage results from that development
  • tests eventually pass through integration frameworks
  • new code gets deployed
  • customers get access to new products and services
  • revenues (probably) increase
  • build, measure, learn

All thoroughly agile, of course.

Except that, increasingly, that process no longer works. Its usefulness as a guide to successful software business has screeched to a halt. I’ll argue that each step listed above no longer works in the context of AI. When tech debt begins to get scrubbed, aggressively, by disruptive AI apps, the notion of having a small army of software engineers led by masterful product managers will seem about as quaint as horse-drawn buggies. IBM, Oracle, etc., haven’t quite mastered that technique yet, though they will acquire enough of the current round of AI startups with that express purpose. While some companies haven’t quite gotten the memo yet, many more will soon.

Among my ilk, the linear manner of thinking outlined above gets called aristotelian. Pronounced with disdain, which unfortunately was learned over time the hard way. We tend to talk — perhaps a bit much — about things such as neuroevolution, second-order cybernetics, ontology, autopoiesis, inference, perhaps even operational closure and other terms which scare away the angry hordes. After deep learning runs its course, after it becomes widely acknowledged that AI requires much more than simply loads of neural networks, some of those two-bit terms will likely become more commonplace.

Now we face a problem, given the talent crunch which McKinsey, etc., noted. Product Management is inherently linear, while the application of AI in industry is quite the opposite.

Granted, perhaps I lack credentials to say such a thing. Please take me with a large grain (boulder) of salt.

However, do take a look at these talks by Peter Norvig from Google:

Also related, the excellent keynote talk at JupyterCon NY 2017 by Lorena Barba from GW:

…wherein one may recognize some of the two-bit terms from above, though Lorena is considerably more eloquent than this writer.

Point being, those who’ve ventured down the path of AI in industry have encountered marked circularities. Software business is no longer quite what you’d been expecting. Concerns grow that Product Management as a practice, and software engineering as the related field in general, will neither serve nor survive the influx of AI apps into mainstream tech, not without abrupt changes. While we experienced similar growing pains in the early 2010s with the introduction of data science practices, this time the disconnects appear much more structural.

Instead, perhaps one thing that’s needed is for CEOs to take back their central role in product definition, trade-offs, and leadership. Besides, CEOs rarely fit aristotelian molds. No outsourcing that crucial quality.

It’s probably a better idea than waiting ~4 years for today’s AI experts to start moving into Product Manager roles, if ever. Or something.

While working on that, dear tech execs, please take a look at WTF? What’s the Future and Why It’s Up to Us. Time for linearized obsessions passed long ago.

Meanwhile, I’m grateful to be involved in a portfolio which embraces AI. Next time I’m evaluating a company, some of these questions will likely get raised.