Air travel and telecoms: different disruptions, same management problem

I had a fascinating conversation recently with Jasenka Rapajic, a polymath engineer and management consultant who specialises in aviation. She is the author of the book “Beyond Airline Disruptions”, which captures a fraction of her insight from several decades of operational management experience in airlines.
 
The airline industry is based on a “stocks” model of seat reservations, with the goal being to maximise load factors and revenue yield. Whilst the physical planes and passengers might fly about, its legacy management systems are not truly flow-based. For instance, people are processed in large batches, and are frequently buffered, as in the long waits for customer service and rebooking after a cancellation.
 
The airline industry is a networked one, which means that it is subject to natural variability at all scales, in both space and time. This could be as small as a late crew arrival at a single plane delaying departure, all the way up to complete network shutdown due to natural causes (like a volcano) or unnatural ones (like an IT systems collapse). These major systemic outages may be infrequent, yet have an outsized impact on both brand and bottom line.
 
This innate service variability is not welcomed by the passengers, who have an awkward propensity to grumble, since their hope of perfect punctuality cannot be met in a real world with storms and strikes. There is little visibility of the individual human impact of variation in service quality, since passengers are treated by airlines as self-loading cargo. The ability of the airline to engineer “personally managed disappointment” is therefore relatively limited.
 
Airlines see “disruptions” (delays and cancellations) as a pure negative, resulting in revenue loss, internal cost, and forgone customer goodwill. As such, they work to eliminate as many disruptions as possible. This comes at a price, as you must have slack in the system (such as spare parts, idle aircraft, standby crews and extra landing slots). The internal career rewards are attached to revenue-growing activities, not risk-reducing ones, so there’s little promotion value from fine-tuning this slack.
 
The danger is that the CFO sees this “slack” as a target for slash-and-burn cost-cutting. The result is that the airline locally optimises for each function, such as maintenance, crew rostering, aircraft scheduling, seat inventory management, etc. Yet the collective set of interactions, which is what the passenger experiences, is highly sub-optimal. Cost accounting takes over the customer experience, which results in the kind of mess that EasyJet has faced at Gatwick.
 
For instance, when you reduce the inventory of spare parts, you may suddenly find you have to cancel a flight when there is no spare readily available. The global coupling of the local decisions of each operational unit is not visible. As a result, the “hazard arming” to the end user experience of the management choices is not controlled. This is especially true when system constraints, such as slot capacity at Heathrow, remove the slack needed for recovery from failure.
 
Consequently, airline operational managers spend all their time fire-fighting the emergent failures of the system. What is normal variation versus abnormal variation is not clear: the service quality is not properly defined, and certainly not from the customer’s perspective. How long should it take for everyone on a cancelled flight to be rebooked? In the recent case of my brother on British Airways in Edinburgh, it was a 3+ hour wait standing in a queue.
 
Airline managers are so busy working in the management system, they have little spare resource to work on it. This may sound horribly familiar to anyone in the broadband networking business, which is an endless fight against complexity and collapse.
 
Given the huge number of “dials” and “lights” in the network operations centre, which ones are actually important? Why are the customers still angry when the service quality indicators are green? How to change any one process without breaking something else unintentionally? The parallels are obvious.
 
I see many similarities between aviation and telecoms.
 
The first is that these are “complex resource trading systems” carrying lots of historical baggage. Both reallocate transport resources between a dynamic demand and supply, and have to do so in both space and time. They have accreted IT systems and processes over long periods, and these act as an anchor to how quickly they can react to change. For instance, the Passenger Name Record (PNR) in aviation and Call Detail Record (CDR) in telecoms are inviolable systems architecture anchor points, and constrain service innovation.
 
The resource planning and reservation process for both telcos and airlines makes them prone to service under-delivery and over-delivery, with often misaligned expectations between the user and service provider. Neither industry has a good definition of what the service being delivered actually is, or what the acceptable failure and disappointment should be under different levels of ordinary and extraordinary variability.
 
The visibility of the true user experience is limited in both industries. For airlines, it is a bodily experience, and they don’t know what you are feeling. In my case, I had awful vibration in a wing seat on my British Airways flight from Montreal to London a few weeks ago. It was a 787 Nightmareliner, and not an experience I would wish to repeat. (I’d have paid for a downgrade to a vibration-free seat!) For telcos, the user experience is a computation that’s happening outside of the network, and typically invisible to the network operator.
 
The airlines have taken to talking about the customer experience, and deprecating the term “passenger”. Likewise, telcos talk about the customer experience, and avoid thinking in terms of “users”. Yet both are involved in enabling very human activities, and the service may be delivered to someone who is not the person paying. As such, they are both subject to agency problems, whereby the incentive is to over-promise to the enterprise customer, and under-deliver to the passenger or user.
 
Both industries struggle to deal with failure as a normal part of operations. The very name “disruption” implies something is wrong, when actually “arrival time variability” is a perfectly acceptable and necessary part of travel. For the airline, the financial management system fails to attach retention rewards to designing good “disruption experiences”, so there is under-investment in that area of the experience. Telcos, for their part, try to deliver every packet as if it were interactive real-time video, refusing to construct “economy” service classes for bulk and time-insensitive traffic.
 
Neither industry really understands how the sum of the operational parts results in the experience whole. I discussed this at length in my popular article on “Brand suicide case study: British Airways”. The passenger experience is not a series of disjointed activities, but is the cumulative effect of their interaction. In telecoms, the basic ability to accurately predict the performance of a composed set of subsystems is missing from mainstream engineering practice. We just don’t know how to repeatably “compose” an experience, with core service delivery knowledge often being tacit, held by a select few personnel with long service.
 
When Heathrow was being built in the 1940s, a major government concern was the loss of prime agricultural land! Then in the 1950s the terminal buildings were put close together, without room for parking, since passengers were presumed to have a chauffeur. Each industry has waged a long fight to squeeze more supply out of fixed resources in the face of exploding demand for popular low-cost transport.
 
As such, each industry is struggling with the same fundamental issue. The management system and its implied paradigm were created for a different age, with different needs. You can only get so far with incremental change before you need to fundamentally refactor the model. Dealing with constraints like runway and slot capacity means rethinking the model, just as telecoms has to stop throwing supply at every problem and instead use what it has differently.

Yet neither the airline industry nor telecoms has yet faced its “lean” quality revolution, and each is stuck with an industrial-era “batch” paradigm. In each case, the enabler to radical change is to define the “pull” outcome that the customer values, and separate this from the “push” resource input. Both are “MS-DOS” sophistication industries looking for a higher-level “Windows or Unix” type of business “operating system” upgrade.
 
This transformation means moving from a very concrete resource model (e.g. plane seats, time slots) to one that abstracts its end-user value (e.g. destination arrival, application performance). You then sell the abstract outcome, not the underlying concrete resource.

Good abstraction is what lets us cut through complexity when done across the whole service lifecycle: design, sales, delivery, and support. It allows us to make rational management interventions, safe in the knowledge that we won’t experience unintended side-effects. It is how we go from emergent to engineered experiences.
 
If airlines sold virtual “timed arrival options” instead of physical seat reservations, they would have more flexibility to deliver passengers across their network. This is especially true of global airlines and alliances, where there are more degrees of freedom to trade people and resources around.

A “lean” model with “virtual services” would also support new kinds of product. Maybe backpackers could travel on a revamped standby ticket where you couldn’t even be certain of arriving at a specific airport on a specific day, but rather in a country or region over a range of dates. A professional orchestra on tour, where everyone has to arrive in time at the same place, could be accommodated with a different performance contract to an amateur choir, where a few missing voices might not matter.
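
To make the orchestra versus choir distinction concrete, here is a minimal sketch of what such a “performance contract” might look like if written down as data. Everything in it is an assumption for illustration: the names `ArrivalContract` and `contract_met`, the idea of measuring arrival in hours after the journey starts, and the particular thresholds are hypothetical and do not come from any airline system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ArrivalContract:
    """Hypothetical group contract: an outcome, not a set of seats."""
    deadline_hours: float        # latest acceptable arrival, in hours after journey start
    min_fraction_on_time: float  # share of the group that must arrive by the deadline


def contract_met(contract: ArrivalContract, arrival_hours: Sequence[float]) -> bool:
    """Return True if enough of the group arrived by the deadline."""
    if not arrival_hours:
        raise ValueError("need at least one traveller")
    on_time = sum(1 for hours in arrival_hours if hours <= contract.deadline_hours)
    return on_time / len(arrival_hours) >= contract.min_fraction_on_time


# Illustrative thresholds only: the orchestra needs everyone, the choir can lose a few voices.
orchestra = ArrivalContract(deadline_hours=24, min_fraction_on_time=1.0)
choir = ArrivalContract(deadline_hours=24, min_fraction_on_time=0.75)

# One traveller in four is rerouted and arrives after 30 hours.
arrivals = [5.0, 6.0, 7.0, 30.0]
print(contract_met(orchestra, arrivals))  # the orchestra's contract is broken
print(contract_met(choir, arrivals))      # the choir's contract still holds
```

The same disrupted journey breaks one contract and satisfies the other, which is the point: the airline would be free to route people however it liked, so long as the contracted outcome was delivered.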
 
In the case of telecoms, we now have the requisite abstraction tools: RINA as the generic “information teleporting” container API, and ∆Q as the quantitative language in which to express the “performance contracts”. Conceptually, the abstraction of supply and demand is a solved problem. It just needs a few years to reach market acceptance.

“Travel disruptions” are innate to the operation of packet networks, and variable quality is the welcome price we pay for the low cost of statistical multiplexing of costly transmission resources. This virtualised network world is a playground of unexplored possibilities. We can engage in forms of yield management that airline chiefs cannot begin to dream of.
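
The trade-off behind statistical multiplexing can be shown with a small worked calculation. This is a sketch under a deliberately crude assumption that is mine, not the article’s: each of `n_users` is independently active with probability `p_active`, and the shared link can serve `capacity` of them at once. The function name `overflow_probability` is hypothetical.

```python
from math import comb


def overflow_probability(n_users: int, p_active: float, capacity: int) -> float:
    """Probability that more than `capacity` users are active at once.

    Assumes independent users, i.e. a simple binomial model of demand.
    """
    if not 0.0 <= p_active <= 1.0:
        raise ValueError("p_active must be between 0 and 1")
    return sum(
        comb(n_users, k) * p_active**k * (1 - p_active) ** (n_users - k)
        for k in range(capacity + 1, n_users + 1)
    )


# Provisioning for every user at once removes the "disruption" entirely but costs the most.
print(overflow_probability(n_users=100, p_active=0.1, capacity=100))

# Provisioning for far fewer is much cheaper, at the price of occasional contention.
print(overflow_probability(n_users=100, p_active=0.1, capacity=20))
print(overflow_probability(n_users=100, p_active=0.1, capacity=10))
```

Shrinking `capacity` is the networking equivalent of removing slack from an airline: the cost falls, and the probability of somebody being delayed rises. The management question is which point on that curve to choose, and for whom.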
 
Low-cost carriers transformed the airline industry with radically simplified services that break the cost vs quality trade-off. The opportunity is there for the “lean telco” to do the same for networked communications. We can have both better and cheaper. This is achieved by scheduling resources appropriately, and redefining our services as user-centric outcomes rather than network-centric inputs.
 
The barrier to lean telecoms transformation isn’t technical, just as you don’t need new aircraft to reinvent airlines. Our core constraint on progress is human: little imagination, limited ambition, and (just like airlines) legacy management methods.

About Martin Geddes

I am a computer scientist, telecoms expert, and consultant. I collaborate with leading practitioners in the communications industry to create game-changing new technologies and businesses.

martingedd.es