Of Butterflies and Floods

Can Big Data Save Us From Climate Catastrophes?

On Monday, May 2, 2011, after a contentious legal battle, the United States Army Corps of Engineers began blowing up levees at the confluence of the Ohio and Mississippi rivers near the border between Missouri and Illinois. In the process the Corps saved the city of Cairo, Illinois from being destroyed by one of the worst floods of the century, even as it inundated nearly 130,000 acres of Missouri farmland, obliterating the harvest for the year.

In all, the storms and flooding on the Ohio and Mississippi rivers in the late spring of 2011 destroyed hundreds of homes, ruined crops and cost billions, making it one of the costliest flood events in US history. Later that same year, across the globe in Thailand, months of unrelenting rain saturated the landscape and eventually flooded scores of factories and parts manufacturers, upending global supply chains and reconfiguring the competitive landscape of automobile and electronics manufacturers.

For many climate scientists, these two floods were part of a worrying trend. By many accounts, the economic costs of natural disasters are growing, even as the average death toll falls. Globally, floods take a particularly devastating toll: insured annual flood losses have risen from between $1–2 billion per year in the 1970s to $15 billion in 2011 according to global insurance giant Swiss Re. But in spite of these growing costs, approaches to managing these disasters have not changed much since the middle of the 20th century.

“Current widely adopted flood control and water management schemes have not been improved for decades,” says Mengqian Lu, a Postdoctoral Research Scientist in Applied Physics and Math at Columbia University. “There are many reasons associated with this.” One of the main ones, she says, is that in spite of the growing confidence of climate researchers in making season-ahead climate predictions, “we haven’t earned trust from decision makers to have the confidence to change from seasonal planning to a robust, multi-timescale scheme that is more adaptive to changes and volatility.”

In order to achieve that goal, scientists needed to address the dearth of reliable forecasting between longer-term climate projections, which miss crucial information on volatility, and weather “nowcasting,” which doesn’t give decision-makers enough time to adjust to rapidly changing conditions. “There is also the political reality that longer term issues rarely intrude into decision-making unless they can be translated into more immediate concerns; and most likely people would just think the whole matter of climate change is a storm in a teacup, no action will be taken at all.”

According to Lu, this may all be about to change.

Lu is part of a multidisciplinary team of hydrologists, climate experts and environmental engineers headed by the Columbia Water Center, part of the Earth Institute at Columbia University. She believes she and her colleagues may have cracked a key global code for intermediate-term flood forecasting — a code that could unlock revolutionary new approaches to addressing flood and other disaster risks.

“The big data is the rice, the algorithm is the recipe, and I am the lucky chef who made the dish happen,” says Lu.

What distinguishes this approach, explains Lu’s mentor Upmanu Lall, is that rather than come up with a climate model and try to validate it through data, Lu has mined raw data to find patterns that weren’t obvious before. Lall, the director of the Columbia Water Center, is a world-renowned hydroclimatologist who has been working on the mechanics of extreme flood forecasting for almost two decades.

“The way science works,” explains Lall, “is you kind of go ziggy-zag on things, where either someone has a theory and you try to find whether or not that works with data, or you find something from data and you try to understand it. What’s been going on now for quite a while [in climate forecasting] is that we’ve taken whatever we think in terms of theory and we’ve run the models and we’ve seen whether it’s working or not, and generally we’re finding it’s not working. And for most things related to precipitation dynamics, we’re finding that there are quite a few things we don’t understand that are there in the models.”

As a result, Lall says, “we don’t know whether the reason the models don’t predict is because there’s no predictability, or if the questions are wrong. So the complementary side of it is to take a computer science approach and say, okay, in certain cases I have a lot of data, so from that data can I identify what some structures are.”

For the first part of her investigation, Lall explains, Lu identified a pattern between the ocean temperatures and atmospheric pressure, without looking at rainfall. “But that’s boring,” Lall says. So Lu took the pattern and tried to see if it was useful. “And,” Lall says, “she finds that that pattern actually does lead to prediction of rainfall extremes, not just regular rainfall, but above some level.” This is important because “extreme rainfall leads to floods; more particularly what we’ve found before was that repeat episodes of rainfall that are close together, like every seven days, those really lead to major floods.”

It is, Lu explains, “a piece of the puzzle that, if it can be added to existing management and operations schemes, would result in trillions of dollars of losses avoided for governments and businesses,” by providing forecasts of extreme flood risk in particular places up to 30 days in advance.

In order to understand how this could be so, it’s useful to step back and appreciate the dramatic changes that have taken place in climate science over the past two decades. While headlines have understandably focused on the growing scientific awareness and alarm over the threat of human-caused climate change, less has been said about the equally dramatic revolution in scientific understanding of natural climate cycles: in particular, the way in which climate conditions in one part of the world can have dramatic impacts across the globe, and the way in which climate varies and shifts, both year-to-year and over decades.

People have been trying to predict the weather for millennia, but it’s only in the past half century that forecasters could begin to make reliable forecasts about what was going to happen in the next few days.

A major turning point happened in the early 1960s when Edward Lorenz, a mathematician and meteorologist from MIT, began developing chaos theory, which he famously summed up in what has been called “the butterfly effect”: the idea that a very small change in one part of a complex, non-linear system could spark enormous and dramatic changes in a distant part of the system. (Metaphorically: could the flapping of a butterfly’s wings in Brazil cause a tornado in Texas?)

While the broad concept of a butterfly effect has since entered mainstream discourse, it’s worth remembering that when Lorenz developed it, the understanding that cause and effect could be non-linear was anything but intuitive for most people; even scientists who did understand it would have to wait decades before a combination of accumulating data and exponentially growing computer power would make accurate representations of these systems possible.

Gradually, though, climate science deepened, and weather forecasting improved. In the 1960s, when Lorenz started, the best weather forecasters could hope for was to predict conditions a day or so away; today forecasts go out as far as 10 days.

And then came El Niño.

El Niño is, of course, a phenomenon that has been known in some form for centuries. The term refers to the Christ child, and was coined by South American fishermen who would notice warmer-than-normal Pacific Ocean temperatures every few years around Christmas time. Meanwhile, in the early part of the 20th century, British mathematician Sir Gilbert Walker began to analyze correlations between variations in tropical Pacific air pressure and the strength of subsequent monsoons in India, a phenomenon he called the Southern Oscillation. In 1969, Jacob Bjerknes proposed that El Niño and the Southern Oscillation were in fact two sides of the same semi-periodic weather phenomenon, which came to be called the “El Niño Southern Oscillation” (ENSO). (The opposite of El Niño, now dubbed “La Niña,” represents the other pole of that same oscillation: a cluster of global weather phenomena that arise from colder-than-average Pacific sea surface temperatures.)

ENSO is basically a 2 to 7 year, semi-periodic pattern in which Pacific sea-surface temperatures are either warmer or colder than normal. These temperatures in turn affect the weather in far-flung places.

Since the idea of ENSO was first developed, the phenomenon has been correlated with annual temperature variations, the chance of floods or droughts, changes in crop yields, changes in river flows, higher or lower commodity prices, disease outbreaks, greater or lesser energy consumption and a variety of macro-economic impacts.

And ENSO is only the beginning: since its discovery, scientists have identified a host of other semi-periodic oscillations, including the Pacific Decadal Oscillation, the Interdecadal Pacific Oscillation, the Atlantic Multidecadal Oscillation, Arctic Oscillations and others. By looking at the historical climate record, scientists were able to correlate these diverse changing climate patterns with diverse effects — understanding, for the first time, how an El Niño butterfly could be mapped to that drought in California, or a La Niña to a bad hurricane season on the Atlantic coast. Climate scientists call these links between sea temperatures or other broad climate phenomena and weather events in far off places “teleconnections”.

In the 1990s, climate scientists grew increasingly confident of their ability to predict whether the next year would be an El Niño or La Niña year, based on current conditions. The biggest turning point happened in 1997, when the National Oceanic and Atmospheric Administration issued an ENSO advisory, warning that sea-surface temperatures were the highest they had been since 1983. These temperatures pointed to the arrival of a very strong El Niño. In the following year, NOAA’s forecasts were validated, suggesting that we had entered a new era of climate forecasting.

Around the same time, a number of researchers, many out of Columbia University, were beginning to look at how to correlate changes in oscillations such as El Niño/La Niña with the likelihood of extreme events, like floods, in specific places.

Until the 1990s, most hydrologists looked at extreme floods as statistically “stationary,” or in essence random, events: that is, events that had an equal probability of occurring in any given year. But as scientists began to unravel both the patterns of climate variations over time and the long-distance “teleconnections” between those events and distant weather, it seemed more and more likely that, at least in some places, the statistical likelihood of a major flood in a given year would be correlated with larger observable patterns.
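The difference between the stationary view and the conditional one can be made concrete with a toy calculation. The sketch below is only an illustration: the years, index values, peak flows and the flood threshold are all invented, and the names `RECORDS`, `FLOOD_THRESHOLD` and `exceedance_probability` are hypothetical. It compares the chance of a major flood in any year with the chance in years when a climate index is negative or positive.

```python
# Hypothetical annual records: (year, climate index, peak river flow).
# Every number here is invented for illustration only.
RECORDS = [
    (1990, -1.2, 95), (1991, 0.8, 40), (1992, -0.5, 70), (1993, 1.1, 35),
    (1994, -0.9, 88), (1995, 0.3, 52), (1996, -1.5, 110), (1997, 1.9, 30),
    (1998, -0.7, 60), (1999, -1.0, 92), (2000, 0.6, 45), (2001, 0.2, 66),
]

FLOOD_THRESHOLD = 65  # hypothetical flow level that counts as a major flood


def exceedance_probability(rows, threshold):
    """Return the fraction of years whose peak flow exceeded the threshold."""
    if not rows:
        raise ValueError("no records to evaluate")
    return sum(1 for _, _, peak in rows if peak > threshold) / len(rows)


# Stationary view: every year is treated as having the same flood odds.
p_any_year = exceedance_probability(RECORDS, FLOOD_THRESHOLD)

# Conditional view: split the record by the sign of the climate index.
negative_years = [row for row in RECORDS if row[1] < 0]
positive_years = [row for row in RECORDS if row[1] >= 0]
p_negative = exceedance_probability(negative_years, FLOOD_THRESHOLD)
p_positive = exceedance_probability(positive_years, FLOOD_THRESHOLD)

print(f"any year:       {p_any_year:.2f}")
print(f"negative index: {p_negative:.2f}")
print(f"positive index: {p_positive:.2f}")
```

In this made-up record the overall flood frequency hides a sharp split between the two kinds of years, which is the sort of structure the stationary assumption cannot see.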

An early example of the direction this research was taking came in 2000, when Lall and a colleague, Shaleen Jain, published a paper that looked at the historical pattern of flooding on the Blacksmith Fork River in Utah. The river was chosen because while many years of historical flood data were available for it, it remained relatively unaffected by diversions and other human influences. What Lall and Jain discovered was that major floods on the Blacksmith Fork River were negatively correlated with both El Niño and the Pacific Decadal Oscillation, meaning that one could expect larger floods on the river in La Niña years, or when the PDO was in a negative position.

Then they put the two indices together to measure the non-linear flood relationship, and found something even more interesting: while periods with simultaneously negative El Niño and PDO were still correlated with the largest floods, the opposite combination, a positive El Niño and a positive PDO — as well as the combination of a strongly negative PDO with a positive El Niño — also yielded moderately large floods along the river.
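One simple way to look for this kind of joint, non-linear relationship is to group years by the sign of both indices and compare the typical flood size in each group. The sketch below assumes that approach; it is not the method used in the paper, it ignores how strongly positive or negative each index is, and the index values and peak flows are invented to mimic only the qualitative pattern described above. The names `YEARS`, `phase` and `mean_peak_by_phase` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (ENSO index, PDO index, annual peak flow) triples.
# The values are invented; they are not data from the Blacksmith Fork River.
YEARS = [
    (-1.0, -0.8, 120), (-0.6, -1.2, 105),  # both indices negative
    (0.9, 0.7, 80), (1.2, 1.1, 76),        # both indices positive
    (0.8, -1.5, 82), (0.5, -0.3, 55),      # positive ENSO, negative PDO
    (-0.7, 0.9, 40), (-0.4, 0.5, 48),      # negative ENSO, positive PDO
]


def phase(value):
    """Label an index value by its sign."""
    return "neg" if value < 0 else "pos"


def mean_peak_by_phase(years):
    """Average the peak flow for each (ENSO phase, PDO phase) combination."""
    groups = defaultdict(list)
    for enso, pdo, peak in years:
        groups[(phase(enso), phase(pdo))].append(peak)
    return {combo: mean(peaks) for combo, peaks in groups.items()}


MEAN_PEAKS = mean_peak_by_phase(YEARS)
for combo in sorted(MEAN_PEAKS):
    print(combo, MEAN_PEAKS[combo])
```

A table like this makes it easy to see when two different combinations of conditions both produce large floods, something a single correlation coefficient would blur.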

In other words, what scientists were discovering was that the global climate could create big floods using a variety of different formulas. Now, for the first time, researchers like Lall and his colleagues were on the verge of reverse engineering those formulas — not just in general terms for large regions (predicting an unusually wet or dry season) but in specifics, for specific river basins, on specific time-frames. Suddenly, it looked like it might be possible to predict when and where one of the most common and devastating natural disasters was likely to take place, a full season ahead of time — at least for some rivers in some places — based on measured sea surface temperatures the year before.

Needless to say, the implications of such an ability were profound. The obvious ones involved disaster planning; if an agency like the Red Cross, say, could get advance warning that a particular region in Africa was going to suffer a deluge months ahead of time, they could pre-mobilize their relief efforts in those areas, and thus reach victims with needed supplies days faster, potentially saving thousands of lives. Just this scenario was tested in 2008, when, for the first time, the Red Cross issued an emergency appeal based on a seasonal flood forecast from Columbia University’s International Research Institute for Climate and Society. The alert allowed the agency to respond to disasters in Togo, Senegal, Ghana and Gambia within 48 hours after flooding started, and seemed to mark a major change in how emergency responders could prepare for climate disasters.

But there were other implications. Imagine, for example, the plight of a reservoir manager who must determine how much water to release early in a season. Release too much and there might not be enough water to get farmers through the summer; release too little and the manager might find himself overwhelmed by later rains and flooding, forced to make the kind of heartrending choice the Army Corps of Engineers faced in 2011. But what if that reservoir manager were given clear, reliable information on the likelihood of flooding later in the season? Would it be possible to avoid such dramatic measures?

But while these early efforts remain promising, the full potential of seasonal forecasting has yet to materialize. Anyone who was hoping that the Red Cross’ 2008 forecasting-based disaster preparations would turn into a yearly event has so far been disappointed — since then, forecasts for floods have not been reliable enough to warrant such dramatic action.

Mengqian Lu remembers the giant tree that grew in front of her childhood home in Qingdao, on the northeast coast of China. Every year during typhoon season, her parents would aggressively prune the tree in hopes that it wouldn’t break and fall on the house; even so, she says, it was terrifying to watch it twist and shake in the wind. It never came down, but there were other impacts; in her hilly neighborhood, roads were still unpaved, which meant that many would inevitably be washed out after a storm, as rain and wind flushed sand from the hilltops into far-flung corners of town.

“Houses near the sea,” she says, “were always flooded by the storm surge.”

And it wasn’t just where she lived; every year, she says, they would hear news about all manner of natural disasters, especially floods, on the Yangtze and Yellow rivers. “Americans are used to hearing news every second,” she says, but “when I was little, in China, only the big news would hit you,” and that news was often about a natural disaster of some sort.

Because her parents were businesspeople, she says, she had direct understanding of something else: that in a global economy, natural disasters don’t just impact the people who live on the banks of the rivers or in coastal communities. Factories could get flooded, and families ruined.

After finishing her undergraduate studies in Hong Kong, Lu came to New York to study at Columbia University, with Lall of the Columbia Water Center as her mentor.

Lu is quick to emphasize the unusual, multidisciplinary rigor of the program. Under Lall’s guidance, she realized that to truly understand climate and natural disasters and have any hope of helping policy-makers manage the associated risks, one has to understand the whole system, from the physics of water to the climate nexus of ocean-atmospheric hydrology, and, to pull it all together, have strong analytical skills and heavy doses of statistical training. (Lu’s training in statistics, she says, is equal to that of doctoral students in the statistics department.)

Last year, Lu’s work was cited by the American Geophysical Union as some of the “most exciting” new research, research that could open the door for improved flood prediction. She believes she is one of only a few young scientists to fully embrace rapid improvements in computing power for cutting-edge data mining to study multi-timescale hydroclimate systems.

“I believe data should be the basis for the exploration of pattern, structure and function,” she says, “rather than just serving its traditional role as a vehicle for parameter estimation and model calibration. We are dealing with the most sophisticated systems of ocean, atmosphere, terrestrial systems; in order to understand even some part of this complex system one needs to be able to do data mining to dig the pattern and extract leading information,” to discover, in other words, what’s important and filter out the noise. (“Not that we could ever understand the whole,” she adds quickly. “I actually appreciate the chaotic beauty of the system.”)

Where Lu’s recent research is revolutionary is that now, for the first time, she and her colleagues have discovered a way to predict extreme precipitation 30 days in advance — bridging the gap between seasonal climate predictions and day-to-day weather forecasts.

Up until now, she explains, “we have weather forecasts typically for up to 10 days ahead, what we’ve seen on TV and weather.com, but we tend to believe only up to three days, because our experience tells us that forecasts after that are no longer reliable.” By contrast, seasonal climate forecasting has, for the most part, retreated to what it has done well for a while: accurately predicting average chronic conditions for the upcoming season, but with little to say about the acute episodes that cause so much damage. “Given seemingly enhanced variability, it’s the frequent ups and downs of a time series [that] really kicks our nerves and those of decision-makers. It’s what lives at the edges that determines losses and damages. We have to know better the extremes,” says Lu.

When it comes to major floods, in addition to periodic climate oscillations, there are “atmospheric rivers,” a term that was coined in a 1998 paper by Zhu and Newell, two researchers from MIT. Zhu and Newell discovered that most of the water vapor that moves around the planet does so in 4 or 5 relatively narrow bands of the atmosphere. These bands can be thousands of miles long, but are typically only 250 miles wide; hence, “atmospheric rivers.” In the United States, the most famous atmospheric river is the one sometimes pointed straight for the heart of California, dubbed “The Pineapple Express,” because it begins in the waters off Hawaii and, every so often, drenches the West Coast. (In December of 2014, the Express brought much-needed moisture to the parched state, along with power outages, flooded highways and mudslides that sent homes plunging into the Pacific.)

In 2010, Lu and Lall began looking seriously into how patterns of atmospheric rivers could impact major floods. Along with two other researchers, Aurelien Schwartz and HyunHan Kwon, they focused on the rare and devastating flood that washed over parts of France and Germany in 1995, a flood that was associated with the heaviest precipitation the region had experienced in 150 years. Their paper, published in 2013, mirrored similar work being done at the same time on the Ohio River by another team of Columbia researchers. What both papers confirmed is that major floods, especially in the mid-latitudes, are associated with particular anomalies of atmospheric rivers bringing moisture from the tropics.

In her most recent work, Lu took this research one step further, making the connection between changes in sea surface temperature (SST) and anomalies in atmospheric pressure in other parts of the globe, days or weeks later.

It turns out that “persistent outgoing SST signals” primarily come from the tropics, a fact consistent with the understanding that most moisture flows from the tropics, and are therefore likely the dominant drivers of climate variability at the 30-day timescale, throughout the year.

Because storms are usually associated with low pressure systems, a consistent connection or “SST signal” means, in theory, that one could predict extreme precipitation weeks before it happens by looking at anomalies in sea surface temperatures, if one knows where to look. Now, according to Lu, we do. Using big data, modern computer power and cutting-edge data mining skills, Lu and her team have identified the butterfly, know where it lives, and can hear it flapping its wings.

It’s only a step, but potentially a huge one.

As the conclusion of the study puts it, correlating sea surface temperatures is the beginning of “an exploratory process of model building and making the case for potentially unprecedented predictability of precipitation extremes over the ensuing 30 day period.” The strongest connections so far are between tropical Pacific Ocean areas, closely related to ENSO, and subtropical areas whose weather and climate systems are heavily governed by atmospheric circulation patterns driven by tropical ocean energy.
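The basic idea of hunting for a delayed link between one series and another can be sketched in a few lines. What follows is not the study's correlation network algorithm, whose details are not given here; it swaps in a plain lagged Pearson correlation on synthetic data. The "SST" and "rainfall" series are random numbers generated so that the second echoes the first 30 days later, and the names `pearson`, `lagged_correlation`, `TRUE_LAG` and the 0.8 and 0.2 weights are assumptions made for illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_correlation(lead, follow, lag):
    """Correlate lead[t] with follow[t + lag] for a non-negative lag in days."""
    if lag < 0 or lag >= len(lead):
        raise ValueError("lag must be between 0 and len(lead) - 1")
    return pearson(lead[: len(lead) - lag], follow[lag:])


# Synthetic data: the rainfall series echoes the SST series TRUE_LAG days later.
rng = random.Random(0)
N_DAYS = 400
TRUE_LAG = 30  # assumed delay, chosen to match the 30-day horizon in the text
sst = [rng.gauss(0, 1) for _ in range(N_DAYS)]
rain = [
    (0.8 * sst[t - TRUE_LAG] if t >= TRUE_LAG else 0.0) + 0.2 * rng.gauss(0, 1)
    for t in range(N_DAYS)
]

# Scan candidate lags and keep the one with the strongest correlation.
correlations = {lag: lagged_correlation(sst, rain, lag) for lag in range(61)}
best_lag = max(correlations, key=correlations.get)
print(f"strongest link at a lag of {best_lag} days")
```

Real precipitation extremes are rare, threshold-based events rather than smooth series, so a linear correlation like this is at best a first screening step before anything resembling a forecast.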

Upmanu Lall remains cautious, however. Even though Lu has identified a correlation, he says, the actual mechanics of how it works are not yet completely clear, though it makes sense intuitively. “The thing is that if one could demonstrate similar predictability in other places, then it fits with what we are trying to do,” he says.

If 30-day predictability can be put together with longer-term, seasonal climate forecasts, in a global model, decision makers could have an entirely new way of approaching climate-related disasters.

“Extreme events and flood risks should not be considered purely a regional thing,” says Lu, adding that her work highlights the merits of “a global network that captures spatiotemporal structure” of interconnected climate phenomena. “We are working to create a global flood initiative focused on improving catastrophic risk management across geographies, business units, sectors and portfolios. The goal is to have a complete package of information for decision making.”

This all comes, of course, not a moment too soon, as the impacts of human-caused climate change will increasingly insert themselves into the picture. As variability is further aggravated by climate change, says Lu, multi-timescale and frequent updating of operations and management schemes will be crucial to mitigate damages.

“We are celebrating the 36th year of having satellite climate data (first available in 1978),” says Lu. “The high resolution of the satellite dataset, combined with the adapted global correlation network algorithm in my paper,” make it possible “to investigate the entire hemispheric ocean-atmosphere system with daily data and provide the spatiotemporal network that enables our 30 day forecasts.”

Those forecasts could be a game changer, and not just for people who live in vulnerable areas. “Weather and natural disasters are not just for people who live near the rivers,” says Lu. “They affect everyone.”