Processing Spatial Raster Data in Apache Sedona

Mo Sarwat
5 min read · Aug 3, 2023

Raster data is a type of geospatial data that represents the Earth’s surface and atmosphere as observed from a distance, typically by satellite or aerial sensors. It is a grid-based data structure in which each cell, or pixel, represents a specific location on the Earth’s surface. These pixels can store various types of information, such as elevation, temperature, land cover, precipitation, or any other continuous or categorical attribute associated with that location.
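To make the grid-to-location idea concrete, here is a minimal sketch of how a cell index can be mapped to a world coordinate. It assumes a north-up raster with no rotation; the `GeoTransform` class, the `pixel_center` function, and the sample numbers are hypothetical names and values chosen for illustration, not part of Sedona.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """North-up affine transform with no rotation (an assumption of this sketch)."""
    upper_left_x: float  # world x of the grid's top-left corner
    upper_left_y: float  # world y of the grid's top-left corner
    pixel_width: float   # cell size along x, in CRS units
    pixel_height: float  # cell size along y, in CRS units (positive)


def pixel_center(gt: GeoTransform, col: int, row: int) -> tuple[float, float]:
    """Return the world coordinates of the center of cell (col, row)."""
    x = gt.upper_left_x + (col + 0.5) * gt.pixel_width
    # Rows count downward from the top edge, so y decreases as row grows.
    y = gt.upper_left_y - (row + 0.5) * gt.pixel_height
    return (x, y)


# Hypothetical 10 m grid whose top-left corner sits at (590520, 5790630).
gt = GeoTransform(590520.0, 5790630.0, 10.0, 10.0)
print(pixel_center(gt, col=0, row=0))  # (590525.0, 5790625.0)
```

Real rasters may also carry rotation or skew terms, which this sketch leaves out on purpose.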

What is raster data used for?

Some of the key applications of raster data processing include:

  1. Agriculture: Monitoring crop health, estimating crop yield, detecting disease or pest infestations, and assessing soil moisture and nutrient levels.
  2. Environmental Monitoring: Tracking deforestation, urban sprawl, land use changes, and assessing the impact of natural disasters like wildfires, floods, and hurricanes.
  3. Climate Studies: Monitoring climate patterns, studying changes in glaciers and ice caps, and measuring greenhouse gas emissions.
  4. Natural Resource Management: Monitoring forests, water bodies, and wildlife habitats for conservation and management purposes.
  5. Disaster Management: Assessing the extent of damage after disasters, aiding in search and rescue operations, and assisting in emergency response planning.
  6. Urban Planning: Analyzing urban growth, land use patterns, and infrastructure development in cities.
  7. Geology and Geomorphology: Identifying geological features, mapping geological formations, and studying landforms and terrain.
  8. Oceanography: Monitoring sea surface temperature, ocean currents, and marine ecosystems.
  9. Archaeology: Identifying ancient ruins, buried structures, and archaeological sites from aerial and satellite images.
  10. Transportation and Infrastructure: Monitoring transportation networks, assessing road conditions, and planning new infrastructure development.
  11. Water Resource Management: Monitoring water quality, detecting pollution, and assessing water availability in reservoirs and rivers.
  12. Health and Epidemiology: Tracking disease outbreaks, monitoring environmental factors affecting public health, and analyzing the spread of infectious diseases.
  13. Forestry: Managing and monitoring forest resources, detecting illegal logging activities, and assessing tree health and biomass.
  14. Land Use and Land Cover Mapping: Classifying and monitoring different land cover types and land use changes over time.

These are just a few examples of how raster data from remote sensing is used. As technology advances and data becomes more accessible, remote sensing continues to play an increasingly important role in understanding and managing the Earth’s natural and human-made environments.

Raster Support in Sedona

More than 7,000 satellites orbit the Earth, collecting enormous volumes of raster data. The scale and complexity of such data make it difficult to process. Apache Sedona has been implementing scalable spatial vector data functions for several years, and that support is now largely mature. In recent releases, the Sedona community has invested more in scalable raster data support, including a raster reader, a raster writer, and raster manipulation functions. To enable comprehensive raster data support, Sedona has made the following efforts:

Raster data type

Sedona introduces a native raster data type. Similar to the geometry type, the raster type indicates that the column in a schema contains raster data. Each record stored in this type contains two parts: metadata and raster band information. With this in place, we can create a table that has both a geometry column and a raster column.
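As a rough mental model of the "metadata plus band information" layout, the sketch below uses plain Python dataclasses. This is not Sedona's internal representation; `RasterMetadata`, `RasterRecord`, and all field names and values are hypothetical and exist only to illustrate the two-part structure.

```python
from dataclasses import dataclass, field


@dataclass
class RasterMetadata:
    width: int           # number of columns in the grid
    height: int          # number of rows in the grid
    srid: int            # spatial reference identifier, e.g. 32631
    upper_left_x: float  # world x of the top-left corner
    upper_left_y: float  # world y of the top-left corner
    pixel_size: float    # cell size in CRS units


@dataclass
class RasterRecord:
    metadata: RasterMetadata
    # One flat list of cell values per band.
    bands: list[list[float]] = field(default_factory=list)

    def num_bands(self) -> int:
        return len(self.bands)

    def is_consistent(self) -> bool:
        """Every band must hold exactly width * height values."""
        expected = self.metadata.width * self.metadata.height
        return all(len(band) == expected for band in self.bands)


# Hypothetical 2 x 2 raster with three bands.
meta = RasterMetadata(2, 2, 32631, 590520.0, 5790630.0, 10.0)
record = RasterRecord(meta, [[0.0, 1.0, 2.0, 3.0],
                             [4.0, 5.0, 6.0, 7.0],
                             [8.0, 9.0, 10.0, 11.0]])
print(record.num_bands(), record.is_consistent())  # 3 True
```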

Raster reader

The first step in processing raster data is to load it. Given the vast amount of raster data available, loading must also happen in parallel. SEDONA-251 introduces a scalable raster reader and native raster data constructors. For example, we can read raster data files in Sedona as follows:

sedona.read.format("binaryFile")
.load("raster/*.tiff").createOrReplaceTempView("binaryTable")

To construct a raster column, apply a raster constructor to the binary content. Sedona provides constructors such as RS_FromGeoTiff and RS_FromArcInfoAsciiGrid, which convert various raster binary formats to a unified raster format.

CREATE OR REPLACE TEMP VIEW rasterTable AS
SELECT RS_FromGeoTiff(binaryTable.content) AS raster
FROM binaryTable

We can print the schema of this table. As we can see, the type of this column is now raster.

sedona.table("rasterTable").printSchema()
root
|-- raster: raster (nullable = true)

Raster writer

After processing raster data, it is necessary to store the data in an external storage. With SEDONA-269, this can be achieved in Sedona Spark as follows:

First, convert the raster data to a specific binary format, such as GeoTiff. Note that the raster can be imported from one image format, but exported to another image format.

SELECT RS_AsGeoTiff(raster) AS image
FROM rasterTable

Next, we can save the result to an external location.

sedona.table("rasterTable").write.format("raster").mode(SaveMode.Overwrite).save("my_raster_file")

Raster functions

SEDONA-251, SEDONA-269, and SEDONA-292 together introduce several functions to transform and analyze raster data. Let’s examine a few examples.

First, let’s assume that we have a GeoTiff image loaded into rasterTable.

Extract bounding box, SRID and number of bands

We can extract the bounding box, SRID, and number of bands of this raster.

SELECT raster, RS_Envelope(raster) AS geom, RS_SRID(raster) AS srid, RS_NumBands(raster) AS numBands
FROM rasterTable
+--------------------+--------------------+-----+--------+
| raster| geom| srid|numBands|
+--------------------+--------------------+-----+--------+
|GridCoverage2D["g...|POLYGON ((590520 ...|32631| 3|
+--------------------+--------------------+-----+--------+

The resulting table now has a schema that contains both raster and geometry types:

root
|-- raster: raster (nullable = true)
|-- geom: geometry (nullable = true)
|-- srid: integer (nullable = true)
|-- numBands: integer (nullable = true)

This allows for more complex geometric operations on the table. For instance, we can save the table in GeoParquet format and benefit from filter pushdown. When executing a Sedona spatial range query on this GeoParquet table, Sedona will retrieve only the data that intersects the spatial range query window.

sedona.table("rasterTable").write.format("geoparquet").save("rasterTable.parquet")
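The idea behind filter pushdown can be illustrated with a small, conceptual sketch: if each chunk of stored data carries a bounding box, a range query can skip every chunk whose box does not intersect the query window. This is not Sedona's implementation; `BBox`, `intersects`, `prune`, and the chunk names and coordinates below are hypothetical.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(a: BBox, b: BBox) -> bool:
    """Two boxes intersect when they overlap on both the x and y axes."""
    return (a.min_x <= b.max_x and b.min_x <= a.max_x
            and a.min_y <= b.max_y and b.min_y <= a.max_y)


def prune(chunks: dict[str, BBox], window: BBox) -> list[str]:
    """Keep only the chunks whose bounding box touches the query window."""
    return [name for name, box in chunks.items() if intersects(box, window)]


# Hypothetical chunks, each summarized by the envelope of its rasters.
chunks = {
    "part-0": BBox(0.0, 0.0, 10.0, 10.0),
    "part-1": BBox(20.0, 20.0, 30.0, 30.0),
    "part-2": BBox(5.0, 5.0, 25.0, 25.0),
}
window = BBox(8.0, 8.0, 12.0, 12.0)
print(prune(chunks, window))  # ['part-0', 'part-2']
```

Only the surviving chunks would need to be read and tested against the exact geometries, which is where the savings come from.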

Extract individual band

We can extract any band in a raster.

SELECT RS_BandAsArray(raster, 1) as band1, RS_BandAsArray(raster, 2) as band2,
RS_BandAsArray(raster, 3) as band3
FROM rasterTable
+--------------------+--------------------+--------------------+
| band1| band2| band3|
+--------------------+--------------------+--------------------+
|[0.0, 799.0, 788....|[0.0, 555.0, 546....|[0.0, 330.0, 322....|
+--------------------+--------------------+--------------------+
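Each band comes back as one flat array of values. The sketch below shows how such an array could be addressed by column and row, assuming the values are stored in row-major order (top row first); that ordering is an assumption of this example, and `band_value` and `band_rows` are hypothetical helpers, not Sedona functions.

```python
def band_value(band: list[float], width: int, col: int, row: int) -> float:
    """Look up the cell at (col, row) in a flat, row-major band array."""
    return band[row * width + col]


def band_rows(band: list[float], width: int) -> list[list[float]]:
    """Split a flat band array into rows of `width` cells each."""
    return [band[i:i + width] for i in range(0, len(band), width)]


# Hypothetical 3 x 2 band (3 columns, 2 rows).
band = [0.0, 799.0, 788.0, 555.0, 546.0, 330.0]
print(band_value(band, width=3, col=1, row=1))  # 546.0
print(band_rows(band, width=3))  # [[0.0, 799.0, 788.0], [555.0, 546.0, 330.0]]
```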

Modify a band value

We can modify a band's values and save the result back to the original raster.

SELECT RS_AddBandFromArray(raster, RS_GreaterThan(RS_BandAsArray(raster, 1), 0), 1) AS raster
FROM rasterTable

The resulting GeoTiff has its first band replaced by the thresholded values.
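The thresholding step can be pictured in plain Python as follows. This is a sketch of the idea rather than Sedona's code: it assumes the comparison yields 1 where a value exceeds the target and 0 elsewhere, and that band indexes are 1-based. `greater_than_mask` and `replace_band` are hypothetical names.

```python
def greater_than_mask(band: list[float], target: float) -> list[float]:
    """Return 1.0 for every value strictly greater than target, else 0.0."""
    return [1.0 if value > target else 0.0 for value in band]


def replace_band(bands: list[list[float]], new_band: list[float],
                 band_index: int) -> list[list[float]]:
    """Return a copy of `bands` with the 1-based `band_index` replaced."""
    if len(new_band) != len(bands[band_index - 1]):
        raise ValueError("new band must have the same number of cells")
    result = [list(band) for band in bands]
    result[band_index - 1] = list(new_band)
    return result


# Hypothetical three-band raster with three cells per band.
bands = [[0.0, 799.0, 788.0], [0.0, 555.0, 546.0], [0.0, 330.0, 322.0]]
mask = greater_than_mask(bands[0], 0.0)
updated = replace_band(bands, mask, band_index=1)
print(updated[0])  # [0.0, 1.0, 1.0]
```

The other bands are left untouched, so only the first band changes in the output raster.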

Conclusion

Raster data processing is essential for data and AI teams working with massive-scale satellite imagery and Earth observation datasets. Apache Sedona now provides extensive support for scalable raster data reading, writing, and processing. This will democratize the use of such data in applications. To use Apache Sedona today, visit the GitHub repo.

Mo Sarwat

Mo is the founder of Wherobots.ai, CS Prof at Arizona State University, & the creator of Apache Sedona (a scalable system for processing big geospatial data)