Get Sentinel Data within seconds in Python

Krishna G. Lodha
Rotten Grapes
Published in
6 min readFeb 6, 2024

With the development of STAC specification, accessing Satellite datasets have become easy and standard approach. This specifications allows users to access metadata and raw data with ease, along with an ability to filter data bases on attributes such as type of satellites, cloud cover, area of interest, dates ,etc.

In this short blog we explore 2 open source packages

1. pystac-client:

pystac-client is a Python library designed to interact with SpatioTemporal Asset Catalogs (STAC), enabling users to discover and access geospatial data easily. STAC is an emerging standard that provides a common language for describing satellite imagery and related assets, making it simpler to search for and access data across different platforms and providers.

2. odc-stac:

odc-stac is an extension of the Open Data Cube (ODC) project, specifically tailored for working with STAC catalogs. ODC is a powerful tool for managing and analyzing large volumes of Earth observation data efficiently.

Setup Environment

Start by creating your own virtualenv and activating it

python3 -m venv env
source env/bin/activate

once you are in virtualenv, install basic packages we need

pip install pystac-client odc-stac

If needed we’ll install more packages later.

Finally, we’ll create a python file and import these packages

from pystac_client import Client
from odc.stac import load
import odc.geo

Setup Variables

We’ll start by defining required variables

# use publically available stac link such as
client = Client.open("https://earth-search.aws.element84.com/v1")

# ID of the collection
collection = "sentinel-2-l2a"

# Geometry of AOI
geometry = {
"coordinates": [
[
[74.66218437999487, 19.46556170905807],
[74.6629598736763, 19.466339343697722],
[74.6640371158719, 19.4667885366414],
[74.66395296156406, 19.46614872872264],
[74.66376889497042, 19.466150941501425],
[74.66369077563286, 19.46577508478787],
[74.6635865047574, 19.465278788212864],
[74.66282073408365, 19.46540270444271],
[74.66218437999487, 19.46556170905807],
]
],
"type": "Polygon",
}

We also need to specify date for which we want to query data. There are various ways we can use dates such as

  1. Specific date ( e.g. “2024–01–21”)
  2. Complete month ( e.g. “2024–01”)
  3. Date range ( e.g. “2024–01–10/2024–01–20” )

We’ll have a look at each of these.

Searching for Items

First thing we need to do is to check if items are available for given combination of collection, date, geometry, query, etc.

# Specific Date
date_YYMMDD = "2024-01-21"
# run pystac client search to see available dataset
search = client.search(
collections=[collection], intersects=geometry, datetime=date_YYMMDD
)
# spit out data as GeoJSON dictionary
print(search.item_collection_as_dict())

### Console - {'type': 'FeatureCollection', 'features': []}

above code block uses search method derived from client. As the result dictates, there are no items available for given combination, now we can try searching for result for entire month.

# Complete month
date_YYMM = "2023-01"
# run pystac client search to see available dataset
search = client.search(
collections=[collection], intersects=geometry, datetime=date_YYMM
)
# spit out data as GeoJSON dictionary
print(search.item_collection_as_dict())
# loop through each item
for item in search.items_as_dicts():
print(item)p

upon executing above code we get 6 features

We can use these items to get the actual data out (we’ll see how to do it shortly).

We can also request for data within date range as follows

# Date range
date_range = "2023-01-10/2023-01-20"
# run pystac client search to see available dataset
search = client.search(
collections=[collection], intersects=geometry, datetime=date_range
)
# spit out data as GeoJSON dictionary
print(search.item_collection_as_dict())
# loop through each item
for item in search.items_as_dicts():
print(item)

this will give us 2 feature items.

We can also introduce query params to filter out result based on metadata. These params will depend upon the collection we are using. e.g. in case of Sentine 2 L2A, we can use params such as

  • eo:cloud_cover
  • s2:dark_features_percentage
  • s2:cloud_shadow_percentage
  • s2:vegetation_percentage
  • s2:water_percentage
  • s2:not_vegetated_percentage
  • s2:snow_ice_percentage, etc.

Example of such query is as below

# additional filters as per metadata 
filters = {
"eo:cloud_cover":{"lt":0.2},
"s2:vegetation_percentage": {"gt": 25}
}
# run pystac client search to see available dataset
search = client.search(collections=[collection], intersects=geometry , query=filters ,datetime=date_YYMM) #bbox=tas_bbox
#spit out data as GeoJSON dictionary
print(search.item_collection_as_dict())
# loop through each item
for item in search.items_as_dicts():
print(item)

We got one item based on filter

Getting data from Item

This is where odc-stac will be helpful, it converts data into xarray Dataset

#load the data in xarray format
data = load(search.items() ,geopolygon=geometry,groupby="solar_day", chunks={})
print(data)

When you print data, you will see mapping of this xarray Dataset along with vital information such as resolution, crs, timestamp as well as all available bands ( as Data Variables )

If you open Data Variables, it will show something like this

Creating Indices from Data

Once we get hold of data, we can create the indices we want. Here is an example of we can create NDVI index.

# create the index without considering scale or offset
ndvi = (data.nir - data.red) / (data.nir + data.red)

which then can be exported as tiff easily

# export data as tiff
odc.geo.xr.write_cog(ndvi,fname='ndvi.tiff', overwrite=True)

This tiff is a representation of data for entire bounding box, and has CRS similar to the parent tile.

Optional Steps

You can use GDAL or rasterio to reproject the tiff to EPSG:4326 ( In which we have our original geometry) and then clip it to create final geometry.

Code for this will look like this

# Add Imports
from osgeo import gdal
import rasterio as rio
from rasterio import mask as msk

# Reproject file to EPSG:4326
input_file = f"ndvi.tiff"
# Path to save the reprojected GeoTIFF file
output_file = f"ndvi_reprojected.tiff"
# Define the target CRS (EPSG:4326)
dst_crs = 'EPSG:4326'
input_raster = gdal.Open(input_file)
warp = gdal.Warp(output_file, input_raster, dstSRS=dst_crs)


# Open reprojected raster
raster = rio.open(output_file)
# Use masking from rasterio to clip according to geometry
with raster as src:
out_image, out_transform = msk.mask(src, [geometry], crop=True)
out_meta = src.meta.copy()
out_meta.update(
{
"driver": "GTiff",
"height": out_image.shape[1],
"width": out_image.shape[2],
"transform": out_transform,
"nodata": 0,
}
)
# set new file path
farmpath = f"final_ndvi.tiff"
# Export newly created raster
with rio.open(farmpath, "w", **out_meta) as dest:
dest.write(out_image)
dest.close()

If you’ve found this blog intriguing and believe it could be beneficial for your specific use case, we warmly encourage you to reach out to us today. Schedule a complimentary call with our team to explore how we can assist you.

Our expertise has supported individuals worldwide in developing stac-based applications to access the latest vegetation indices using an open-source stack. Let’s collaborate to bring your ideas to fruition.

Edit 1

Sometimes while calculating indices you might get values which are inappropriate. This happens because Data are scaled in storage, so that it fits into an unsigned 16 bit integer. Also, they’re offset, so that negative values can be stored, because sometimes there’s value in that.

To eliminate this, you can modify band values. by multipying with scale and remove offset.

def getstuff(band,prop):
return search.item_collection_as_dict()['features'][0]['assets'][band]['raster:bands'][0][prop]

data["red"] = data["red"] * getstuff('red', 'scale') + getstuff('red', 'offset')

This will give you modified red band value, similarly you can get modified band values and then use those in index calculation.

e.g. NDVI calculation will look like this

data['ndvi'] = ( 
(data["nir"] * getstuff('nir', 'scale') + getstuff('nir', 'offset'))
-
(data["red"] * getstuff('red', 'scale') + getstuff('red', 'offset'))
)/(
(data["nir"] * getstuff('nir', 'scale') + getstuff('nir', 'offset'))
+
(data["red"] * getstuff('red', 'scale') + getstuff('red', 'offset'))
)

--

--