The Science Behind the Signal: Tracking Unknown Oil Tanks Around the World
Today, we announced the availability of our China Oil product, which is the culmination of months of meticulous engineering, data science, and quality assurance. Here’s how we did it.
Starting at home
The price of oil is driven by many political, technological, and economic factors, but much of its volatility stems from a lack of transparency in the market. We might know how much oil is stored in Rotterdam, and we have reliable numbers for the United States. But other sources of oil can flood or drain the market without warning, because their storage capacity is unreliably reported. We wanted to build a tool that could independently measure how much oil is stored around the world and so help protect the market from volatility.
We had already been counting millions of cars in US retail parking lots when we began building the US Oil product. We applied the same technology, identifying objects in satellite images, to oil tanks. We trained our algorithms to detect crude oil tanks with floating roofs across the US, from Cushing, Oklahoma, to Houston, Texas. By measuring the size of each tank, we could calculate the total capacity for oil storage, but that did not yet give us an accurate measure of current supply. Estimating the volume of oil in each tank took a bit of imagination and trigonometry.
Floating roofs sit on top of crude oil tanks for a variety of reasons, for example to minimize breathing and evaporative losses. As the tanks are filled and emptied, the roofs rise and fall, and that movement shows up in the crescent-shaped shadows that the tank walls cast onto the roofs. The size and shape of each shadow are a sensitive measure of the volume of oil held in the tank. We analyzed these shadows across approximately 6,000 tanks in the US, creating a holistic, near real-time view of the country’s current supply of crude oil.
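To make the trigonometry concrete, here is a minimal sketch of one way shadow lengths could be turned into a volume estimate. It is an illustration under simplifying assumptions, not a description of our production pipeline: it assumes a straight-down view, flat ground, and that both shadows are measured along the sun direction in the same image, so the sun's elevation angle cancels out. The function names and the example tank dimensions are hypothetical.

```python
import math

# 1 cubic metre is about 6.2898 US oil barrels.
BARRELS_PER_M3 = 6.2898


def fill_fraction(inner_shadow: float, outer_shadow: float) -> float:
    """Estimate how full a floating-roof tank is from two shadow lengths.

    inner_shadow: length of the shadow the wall casts onto the roof.
    outer_shadow: length of the shadow the tank casts onto the ground.
    Both must use the same unit (for example, pixels).

    A shadow of an edge at height h has length h / tan(sun_elevation), so
    inner / outer = (depth of roof below rim) / (wall height).
    """
    if outer_shadow <= 0:
        raise ValueError("outer_shadow must be positive")
    if not 0 <= inner_shadow <= outer_shadow:
        raise ValueError("inner_shadow must lie between 0 and outer_shadow")
    return 1.0 - inner_shadow / outer_shadow


def tank_volume_barrels(diameter_m: float, height_m: float, fill: float) -> float:
    """Volume of oil in a cylindrical tank, in barrels, for a given fill fraction."""
    volume_m3 = math.pi * (diameter_m / 2) ** 2 * height_m * fill
    return volume_m3 * BARRELS_PER_M3


if __name__ == "__main__":
    # Hypothetical measurements: inner shadow 10 px, outer shadow 40 px.
    fill = fill_fraction(10, 40)
    print(f"fill fraction: {fill:.2f}")
    print(f"barrels: {tank_volume_barrels(60, 15, fill):,.0f}")
```

A real measurement would also have to account for the viewing angle of the satellite, the curved shape of the shadow, and the fact that a floating roof never rests at the very bottom of the tank.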
Data on US oil inventories are relatively well known, which made verifying the accuracy of our new product straightforward. We began testing the US Oil product in 2015 and backtested it on satellite images from as far back as 2010. Over the last 300 weeks, our estimates have closely matched (within +/- 8%) the amount of oil reported to be stored in the US. What’s more, we have been able to share that data ahead of industry reports. In August, our US Oil product reported a decrease in US crude oil inventory two trading days ahead of the International Energy Agency’s report.
With the success of the US Oil product, we set our sights abroad. Seeing a near real-time count of oil supply in the world’s largest economy can help regulate market drops and spikes, but we aim to provide a macroscopic view of the world, not a single country. As we continue to expand the coverage of the Oil product, we have focused our lens on the second-largest economy: China.
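A backtest of this kind boils down to comparing each weekly estimate with the reported figure and checking how often the difference stays inside a tolerance band. The sketch below shows that arithmetic. The function names are hypothetical and the numbers are made up for illustration; they are not our actual estimates or any agency's reported values.

```python
def percent_error(estimate: float, reported: float) -> float:
    """Signed error of an estimate relative to the reported value, in percent."""
    return 100.0 * (estimate - reported) / reported


def backtest(estimates, reported, tolerance_pct=8.0):
    """Return the per-week percent errors and the share of weeks within tolerance."""
    if len(estimates) != len(reported):
        raise ValueError("estimates and reported must have the same length")
    if not estimates:
        raise ValueError("need at least one week of data")
    errors = [percent_error(e, r) for e, r in zip(estimates, reported)]
    within = sum(1 for err in errors if abs(err) <= tolerance_pct)
    return errors, within / len(errors)


if __name__ == "__main__":
    # Illustrative values only (millions of barrels), one entry per week.
    weekly_estimates = [480.0, 495.0, 560.0]
    weekly_reported = [500.0, 500.0, 500.0]
    errors, share = backtest(weekly_estimates, weekly_reported)
    for week, err in enumerate(errors, start=1):
        print(f"week {week}: {err:+.1f}%")
    print(f"share of weeks within tolerance: {share:.0%}")
```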
Bringing transparency to a closed economy
Our algorithms don’t care whether they see pictures of the United States or China; we simply feed them new images to analyze. From approximately 10,000 images, we tracked commercial and strategic petroleum reserves across the country’s 3.7 million square miles.
We found more than 1,500 crude oil storage tanks that were not cataloged in the industry-standard database of tank farms. At first, we didn’t believe there could be so many “unknown” tanks, but once we checked the detections manually, we confirmed that the algorithm had indeed found real tanks.
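One simple way to flag "unknown" tanks is to compare the coordinates of each detection with the coordinates in a catalog and keep the detections that have no cataloged tank nearby. The sketch below illustrates that idea with a great-circle distance check. It is an assumption about how such a comparison could be done, not a description of our actual matching logic; the 50-meter threshold, the function names, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def uncataloged(detections, catalog, max_distance_m=50.0):
    """Return detections with no cataloged tank within max_distance_m.

    detections, catalog: lists of (lat, lon) tuples in degrees.
    Brute force for clarity; a spatial index would be needed at scale.
    """
    return [
        d for d in detections
        if all(haversine_m(d[0], d[1], c[0], c[1]) > max_distance_m for c in catalog)
    ]


if __name__ == "__main__":
    # Made-up coordinates for illustration.
    catalog = [(30.0, 120.0)]
    detections = [(30.0001, 120.0), (30.01, 120.0)]
    print(uncataloged(detections, catalog))
```

Detections that survive this filter are only candidates; they still need the manual review described above before being counted as newly found tanks.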
After that, we performed quality assurance for the China Oil product. Our quality control processes involve a mix of algorithms and manual verification. Algorithms are good at producing statistics and finding anomalies, while human verification and image labeling are useful for calibrating machine vision algorithms.
Our most recent estimate counts approximately 600 million barrels of crude oil supply in China in May 2016. Though there may be additional storage underground, these findings have already dramatically changed our understanding of the amount of oil that can be stored in China, which gives investors the opportunity to price oil based on a more realistic estimate of supply.