Comet Time Series (CometTS): a New Tool for Analyzing a Time-Series of Satellite Imagery

By Jake Shermeyer and Dylan George

Hurricane Maria devastated Puerto Rico. Ebola infected and destabilized several West African countries. Syrian refugees continue to pour into Jordan, straining its political and economic resources. A common feature in all of these crises is the need to deploy staff (e.g., clinicians, epidemiologists, first responders) and stuff (e.g., medical countermeasures, supportive care materials, drinking water, food) to those in need at the right time and place. Accomplishing that co-location of staff and stuff for humanitarian effect requires, in part, knowing where people are. Moreover, how people move, especially in low-resource settings, can be very different from what the average person experiences in more developed economies. For example, as much as 30% of the estimated population of Niger may migrate seasonally to follow economic opportunities during the dry season[1]. As such, estimating population sizes and dynamics (e.g., movement and long-term relocation trends) at regional and seasonal scales, particularly in low-resource settings, remains a surprisingly difficult yet critically important challenge for a variety of applications in national security, civil society, and commercial organizations.

A shortage of data and analytical tools constrains research

A lack of data and associated analytical tools constrains progress in enumerating regional and seasonal populations in low-resource areas. For example, there is insufficient ‘ground truth’ data on population sizes and dynamics at regional and seasonal scales that researchers can use to evaluate candidate data sources and methods for estimating population size and density. In the absence of ‘ground truth’ data, we assessed a range of data sources (e.g., social media, call data records, satellite imagery, epidemiological data) and determined that satellite imagery could still be helpful for targeted use cases given (1) the availability and relatively low cost of satellite imagery, (2) encouraging early results from academic researchers[2],[3],[4], (3) interest from stakeholders, and (4) the feasibility of the work necessary for tool development.

Currently, there is a dearth of open source tools to perform foundational analysis using satellite imagery, particularly time series analysis of such imagery. Analyzing large satellite imagery datasets has traditionally been handled by proprietary GIS tools or by manual, bespoke methods that require significant geospatial experience to use appropriately. Companies dedicated to open source solutions, such as Development Seed[5] (“Dev Seed”), have begun to develop and release open source tools for exploiting satellite imagery (e.g., the Landsat Utility[6]). However, much remains unaddressed, including the analysis of time series data from satellite imagery.

CometTS — a new, open source analytical tool

We have developed a tool that we call Comet Time Series (CometTS) that facilitates analysis and visualization of a time series of satellite imagery in order to enable population estimation research, change detection, or natural disaster monitoring using a range of data types. We have chosen to open-source our software and results to help both researchers and humanitarian organizations speed up the analysis of time series satellite imagery datasets. CometTS can analyze a diverse range of satellite data such as high-resolution imagery, Landsat, Synthetic Aperture Radar data, or nighttime imagery from the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard the Suomi National Polar-orbiting Partnership (NPP) satellite[7]. Ultimately, we hope more researchers will be able to explore and test hypotheses about specific image features, and will have a greater ability to estimate dynamic populations and their changes.

CometTS provides a partially automated approach for analyzing a time series of satellite imagery in any user-defined area of interest. The tool calculates relevant statistical quantities (e.g., measures of central tendency and variation) and visualizes their changes over time (Figure 1 and Figure 2). We believe this work is novel, as we are not aware of any other open-source tool that evaluates a time series of satellite or potentially airborne imagery[8] within user-drawn polygons. Furthermore, this tool makes time series satellite imagery more accessible to the data science community and removes the need for a GIS tool when working with these data. The tool requires only a web browser, Python®, and its dependent packages to function[9]. Other time series tools like TSTools[10] are powerful but require a GIS interface and can only analyze individual pixels rather than larger areas of interest. Additionally, existing proprietary, GIS-heavy solutions and services have yet to gain widespread adoption, particularly in humanitarian and epidemiological response. This gap presents an opportunity to assist unsupported or resource-constrained stakeholders and organizations. Ultimately, we hope our work will demonstrate how improved tools can enable a greater quantity of analyses with multiple types of time series imagery.

Figure 1: A visualization of the workflow walkthrough
Figure 2: An example of the visualization that the tool produces (Step 4) which depicts changes in brightness over time in Agadez, Niger as captured by the Suomi NPP VIIRS satellite.

Key inputs for the tool

The tool ingests two key components: (1) a time series of overhead imagery and (2) a user-drawn polygon designating the area of interest. Beyond what is possible with a single satellite image, a time series enables investigation of patterns in spectral responses and of how those responses change over time.
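To make the second input concrete, here is one way a polygon can be turned into a per-pixel area-of-interest mask, using the even-odd (ray casting) rule on pixel centers. This is a minimal NumPy sketch, not CometTS's own code: the function name, the GDAL-style geotransform tuple, and the assumption of a north-up, unrotated raster are all illustrative. In practice, libraries such as rasterstats or rasterio (both listed in [9]) handle this step.

```python
import numpy as np


def polygon_pixel_mask(vertices, geotransform, shape):
    """Return a boolean array marking pixels whose centers fall inside a polygon.

    vertices: list of (x, y) map coordinates describing a simple polygon ring.
    geotransform: GDAL-style (x_origin, pixel_width, 0, y_origin, 0, -pixel_height);
        assumed north-up with no rotation terms.
    shape: (rows, cols) of the raster.
    """
    x0, dx, _, y0, _, dy = geotransform
    rows, cols = shape
    # Map coordinates of every pixel center
    xs = x0 + (np.arange(cols) + 0.5) * dx
    ys = y0 + (np.arange(rows) + 0.5) * dy
    px, py = np.meshgrid(xs, ys)  # both have shape (rows, cols)

    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    # Even-odd rule: toggle for every polygon edge crossed by a ray cast in the +x direction
    for i in range(n):
        xa, ya = vertices[i]
        xb, yb = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through each pixel center?
        straddles = (ya > py) != (yb > py)
        # x coordinate where the edge meets that line (horizontal edges give inf/nan,
        # but they never straddle, so those values are ignored)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
        inside ^= straddles & (px < x_cross)
    return inside
```

For a 10 × 10 grid of 1-unit pixels with its origin at (0, 10), the square with corners (2, 2) and (6, 6) covers the 16 pixels in rows 4 to 7 and columns 2 to 5.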

Tool Outputs

The tool produces tabular and visual summaries of the spectral responses of every pixel contained within the polygon that delimits the area of interest, and repeats this operation for every image in the time series. Users can choose the statistic(s) most relevant to their analytic needs; the output is customizable to produce any type or range of statistics. For our assessment and demonstration, we chose standard statistics to illustrate trends and uncertainty, including the median, lower quartile (25th percentile), upper quartile (75th percentile), a linear regression, and a Gaussian signal filter[11]. In a dynamic population-monitoring scenario, these statistics can help show how correlates of population (such as brightness) change over time and how those correlates are distributed within an area of interest.
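The statistics named above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the tool's implementation: the array layout (images stacked along the first axis, NaN for masked pixels) and the function names are hypothetical, and the Gaussian filter is a hand-rolled kernel (SciPy's `scipy.ndimage.gaussian_filter1d` provides a comparable filter with its own edge handling).

```python
import numpy as np


def aoi_quartiles(stack, aoi_mask):
    """Lower quartile, median, and upper quartile of the AOI pixels in each image.

    stack: float array of shape (n_images, rows, cols); NaN marks masked pixels.
    aoi_mask: boolean array of shape (rows, cols), True inside the polygon.
    Returns an array of shape (n_images, 3): [25th percentile, median, 75th percentile].
    """
    pixels = stack[:, aoi_mask]  # shape (n_images, n_aoi_pixels)
    # nanpercentile skips NaN pixels, e.g. those removed by a cloud mask
    return np.nanpercentile(pixels, [25, 50, 75], axis=1).T


def linear_trend(t, values):
    """Ordinary least-squares slope and intercept of values against time t."""
    slope, intercept = np.polyfit(t, values, 1)
    return slope, intercept


def gaussian_smooth(values, sigma):
    """Smooth a 1-D series with a normalized Gaussian kernel (a weighted moving average)."""
    radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Repeat the end values so the smoothed series keeps the input's length
    padded = np.pad(np.asarray(values, dtype=float), radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

Each row returned by `aoi_quartiles` corresponds to one image. Passing the median column to `linear_trend` and `gaussian_smooth`, with image dates converted to numbers (for example, months since the first image), yields a trend line and a smoothed seasonal curve.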

Users also have the option to supply “mask” images, which are sometimes distributed with satellite imagery or can be created by the user. These masks can be used to remove areas containing cloud cover, cloud shadow, snow, or other anomalies that can interfere with analysis (Figure 3).

Figure 3: An example of a cloud mask: (left) an unmasked Landsat satellite image, and (right) same image with a cloud-mask applied.
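Applying a mask amounts to flagging pixels as invalid before any statistics are computed. A minimal sketch, assuming the mask is a raster aligned pixel-for-pixel with the image and that a user-supplied rule identifies bad pixels. The function name and the mask encodings mentioned below are hypothetical; real products encode clouds, shadows, and snow differently, so check the documentation of the imagery in use.

```python
import numpy as np


def apply_mask(image, mask, is_invalid):
    """Return a float copy of image with pixels flagged by the mask set to NaN.

    image, mask: arrays of identical shape, aligned pixel-for-pixel.
    is_invalid: function mapping the mask array to a boolean array (True = discard).
    """
    if image.shape != mask.shape:
        raise ValueError("image and mask must have the same shape")
    out = image.astype(float)  # astype copies, so the original image is untouched
    out[is_invalid(mask)] = np.nan
    return out
```

For a hypothetical coverage-count mask, `apply_mask(image, coverage, lambda m: m == 0)` discards pixels with no usable observation; for a hypothetical classification mask in which cloud and cloud-shadow pixels are coded 1 and 2, the rule could be `lambda m: np.isin(m, [1, 2])`. Setting pixels to NaN, rather than to zero, keeps them out of NaN-aware statistics instead of dragging the results toward zero.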

Initial Results: Comparison to GIS based Workflows

Leading researchers[3],[13] have described to us the challenges they currently face in working with nighttime imagery; notably, workflows are typically tedious, hands-on exercises that rely heavily on licensed, proprietary GIS software to draw conclusions and estimate changes in population. Using a proprietary tool, or even an open source tool like QGIS, for such an analysis can be time intensive.

In general, an end user would have to:

1. Select, draw, and save an area of interest
2. Mask areas with clouds or other unwanted anomalies. Each image would be read into the GIS program and then written back out to convert unwanted pixels to “null”.
3. Extract relevant statistics using a zonal statistics tool for every raster image in the area of interest. This will return one table of relevant stats (median, mean, standard deviation, etc.) per image per area of interest.
4. Manually extract date information for each image and label each corresponding zonal statistics table with this information to build time series plots.
5. Merge all tables into one coherent table or Excel spreadsheet.
6. Use an external program like Microsoft® Excel™ to generate plots or regression estimates.

Step 1 is common to all analyses of this type. Step 2 would read and write hundreds of gigabytes of data. Step 3 is automatable but can be time consuming. Steps 4 through 6 are not automatable in commonly used GIS software and would have to be done by hand for each image in a time series (50+ times for most applications).

By comparison, such an analysis can be completed in minutes using CometTS. An end user only has to:

1. Select, draw, and save an area of interest
2. Document the organizational structure of the associated satellite imagery (i.e., where and how the imagery files and cloud masks are stored) in a tabular fashion using the CometTS CSV Creator script.
3. Feed this tabular output to the tool along with areas of interest. The user then simply clicks the “Execute…” button, and CometTS completes the rest of the workflow automatically.
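To illustrate the kind of table produced in step 2, here is a minimal sketch that assumes a hypothetical layout: every image filename contains an eight-digit YYYYMMDD date, and each mask sits beside its image with a `_mask.tif` suffix. The function name, column names, and naming convention are illustrative assumptions, not the actual CometTS CSV Creator.

```python
import csv
import re
from datetime import datetime
from pathlib import Path

# Hypothetical convention: the first run of eight digits in a filename is a YYYYMMDD date.
DATE_PATTERN = re.compile(r"\d{8}")


def build_image_table(image_dir, out_csv, image_suffix=".tif", mask_suffix="_mask.tif"):
    """Write a CSV listing each image, its mask (if any), and its date, sorted by date."""
    rows = []
    for image in sorted(Path(image_dir).glob("*" + image_suffix)):
        if image.name.endswith(mask_suffix):
            continue  # mask files are paired with their images below, not listed alone
        match = DATE_PATTERN.search(image.name)
        if match is None:
            continue  # no date in the name, so the file cannot be placed in the series
        try:
            date = datetime.strptime(match.group(0), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a valid calendar date
        mask = image.with_name(image.name[: -len(image_suffix)] + mask_suffix)
        rows.append({
            "image": str(image),
            "mask": str(mask) if mask.exists() else "",
            "date": date.isoformat(),
        })
    rows.sort(key=lambda row: row["date"])
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["image", "mask", "date"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Recording dates once in a table like this is what removes the manual date-labeling step from the GIS workflow: every later step can read the acquisition date alongside each image path.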

The CSV Creator script from step 2 runs in under 10 seconds for our test set of 64 images, while step 3 takes approximately 1 minute to run for each area of interest. After the user clicks “Execute…”, CometTS removes anomalous pixels only within each area of interest (typically thousands of pixels rather than millions), avoiding the computational overhead of reading and writing entire images. The tool then automatically extracts date information and calculates all relevant statistics, writing both to a table for each area of interest. Plots and regression estimates are produced at the same time.
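The savings come from touching only the part of each image that covers an area of interest. The sketch below converts a polygon's bounding box into a pixel window, again assuming a GDAL-style, north-up geotransform; the function name is hypothetical. With rasterio, `rasterio.windows.from_bounds` combined with `dataset.read(window=...)` performs a comparable windowed read.

```python
import math


def aoi_window(bounds, geotransform, shape):
    """Convert an AOI bounding box in map coordinates to a pixel window.

    bounds: (min_x, min_y, max_x, max_y) of the polygon.
    geotransform: GDAL-style (x_origin, pixel_width, 0, y_origin, 0, -pixel_height), north-up.
    shape: (rows, cols) of the full image.
    Returns (row_start, row_stop, col_start, col_stop), clipped to the image extent.
    """
    min_x, min_y, max_x, max_y = bounds
    x0, dx, _, y0, _, dy = geotransform  # dy is negative for north-up rasters
    col_start = math.floor((min_x - x0) / dx)
    col_stop = math.ceil((max_x - x0) / dx)
    # The top edge (max_y) maps to the first row because rows increase southward
    row_start = math.floor((max_y - y0) / dy)
    row_stop = math.ceil((min_y - y0) / dy)
    rows, cols = shape
    return (max(row_start, 0), min(row_stop, rows), max(col_start, 0), min(col_stop, cols))
```

Slicing `image[r0:r1, c0:c1]` with the returned window yields only the pixels around the area of interest, to which masks and statistics can then be applied without reading or rewriting the rest of the scene.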

Initial Analysis and Results

The primary time series dataset we used to test CometTS consists of monthly composites of nighttime surface brightness from the Visible Infrared Imaging Radiometer Suite (VIIRS) aboard the Suomi National Polar-orbiting Partnership (NPP) satellite[14]. NPP VIIRS offers a daily revisit rate, a spatial resolution of ~400 meters, and a time series dating back to April 2012. We selected this dataset as the initial test case because many researchers hypothesize that the brightness of human-generated nighttime light correlates with the presence and size of a human population[15]. Because socioeconomic, atmospheric (e.g., cloud cover), and technological conditions vary from place to place, this correlation must be established separately for each location. Changes in brightness revealed by time series analysis could indicate changes either in the size of the population or in its ability to emit light for other reasons (e.g., natural disasters, war, power outages, infrastructure development).

We chose several locations across Africa, the Middle East, and Europe to assess the effectiveness of the tool and the data. For example, in Niamey, Niger, the data we analyzed show patterns consistent with both seasonal migration and a general upward trend in population from 2012 to 2017 (Figure 4). In the Suruç Refugee Camp, Turkey (Figure 5), the data are consistent with the December 2014 opening of a 30,000-person refugee camp just north of the Syrian border. In Aleppo, Syria (Figure 6), the plot shows another effect of the Syrian Civil War: diminished lights, which likely correlate with the loss of power and/or emigration from the city.

Figure 4: Changes in brightness over time in Niamey, Niger as captured by the Suomi NPP VIIRS satellite.
Figure 5: Changes in brightness over time in the Suruç Refugee Camp, Turkey as captured by the Suomi NPP VIIRS satellite.
Figure 6: Changes in brightness over time in Aleppo, Syria as captured by the Suomi NPP VIIRS satellite.

We have received positive feedback from academic researchers indicating that CometTS would improve their workflows, save time, and be useful for future analysis and training. Additionally, as mentioned previously, the tool extends to any time series of satellite imagery and can analyze multiple types of imagery concurrently. This will enable analyses of other data sources, such as MODIS, Landsat, or higher-resolution imagery from DigitalGlobe, to evaluate the relationships between these data, population movement, and other dynamics. The tool is open sourced on the CosmiQ Works GitHub repository and can be downloaded here: https://github.com/CosmiQ/CometTS.

Full install instructions, a tutorial walkthrough, and test outputs are also included in the repository.

Conclusions

Multiple opportunities exist to employ CometTS for impactful work. Useful primary applications include: (1) population dynamics; (2) land-use change; and (3) investigation of seasonal or climatic conditions such as drought. The outputs of these primary applications have been shown to be useful inputs for better understanding changes in other areas such as climate, poverty, food security, biodiversity, political conflict, and civil instability. To date, CometTS has ingested two datasets: Suomi NPP VIIRS and multispectral Landsat imagery. We will explore land-use change monitoring with CometTS and Landsat further in a subsequent post.

References

[1] ICT Development Project in the Region of Dakoro, Niger, 2006, see website link (translated from French to English).

[2] Buckee et al 2017. Seasonal Population Movements and the Surveillance and Control of Infectious Diseases. Trends in Parasitology. January 2017, Vol. 33, No. 1.

[3] Bharti et al 2011. Science. Explaining Seasonal Fluctuations of Measles in Niger Using Nighttime Lights Imagery. 334, 1424. DOI: 10.1126/science.1210554

[4] Tatem AJ. Mapping population and pathogen movements. Int Health 2014; 6:5–11.

[5] https://developmentseed.org

[6] https://github.com/developmentseed/landsat-util/tree/develop, Version 0.13.1, released 2017-01-10

[7] https://www.nasa.gov/mission_pages/NPP/main/index.html

[8] The tool can ingest any time series of geolocated airborne or spaceborne imagery. However, the focus of our work to date has been exclusive to spaceborne imagery.

[9] Packages include: numpy, pandas, gdal, geopandas, rasterio, shapely, Fiona, affine, rasterstats, matplotlib, seaborn, jupyter, ipywidgets, IPython, tqdm and scipy

[10] Christopher E. Holden. (2015). TSTools: Linking time and space visualization for remotely sensed timeseries. Zenodo. 10.5281/zenodo.34182

[11] The median response for an area of interest depicts the centermost spectral response that is occurring in that area, and is more resilient to outliers than the mean response. The quartile ranges allow an end user to visualize the spread and range of variation of data. The linear regression allows for a visualization of general trend over time. Finally, the Gaussian signal filter is similar to a moving average, and enables the visualization of seasonality or the identification of patterns of changes in the data over time.

[12] Bharti et al 2011. Science. Explaining Seasonal Fluctuations of Measles in Niger Using Nighttime Lights Imagery. 334, 1424. DOI: 10.1126/science.1210554

[13] Bharti et al 2015. International Health. Remotely measuring populations during a crisis by overlaying two data sources. 2, 90–8. DOI:10.1093/inthealth/ihv003

[14] https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html

[15] Sutton P, Roberts D, Elvidge CD, Baugh KE. Census from heaven: an estimate of the global population using nighttime satellite imagery. Int J Remote Sens 2001;22:3061–76.