Working with Spatial Data using FastAPI and GeoAlchemy

Bekzod Mirahmedov
12 min read · Jan 16, 2024
Aerial View of Geographic Information System (GIS) Mapping — Image Source: Land Surveys

Welcome to the tutorial on leveraging spatial data in your FastAPI applications using GeoAlchemy 2 and PostGIS. Throughout this guide, we will explore how to store geographical data efficiently with SQLAlchemy and GeoAlchemy 2, and showcase how to retrieve nearby U.S. cities (within a certain distance) using either latitude/longitude coordinates or specific city, county, and state details.

This post offers a user-friendly, step-by-step guide that encourages hands-on learning, allowing you to actively code alongside the tutorial.

Tutorial Highlights

  1. Project Setup: Dependencies, Containerization (Docker and Docker Compose)
  2. Storing Geographical Data: async PostgreSQL (PostGIS), SQLAlchemy, GeoAlchemy 2 and Alembic for migrations
  3. Loading Geographical Data (U.S. Cities Data)
  4. Find Nearby Cities (located within a certain distance) by:
    - City Details (city, county, and state information)
    - Coordinates (latitude and longitude)
  5. Conclusion
  6. Extra: Serialization of Geographical Data with Pydantic

GitHub repo with source code: https://github.com/notarious2/geolocations

Tools/Libraries used:

  • FastAPI, uvicorn
  • Pydantic, pydantic-extra-types
  • SQLAlchemy 2 and GeoAlchemy 2
  • Async PostgreSQL using asyncpg driver
  • Alembic for migrations
  • Docker for containerization
  • Poetry for dependency management

Project Setup

Create a dedicated folder, such as “geolocations,” and navigate to it using your preferred code editor. You can achieve this by entering the following commands in your terminal:

mkdir geolocations
cd geolocations
code . # if using VS Code

Install Poetry and initialize the project:

# installation  
curl -sSL https://install.python-poetry.org | python3 -
# initialization
poetry init

Follow the prompts (don't define dependencies interactively), and then add the packages:

poetry add fastapi uvicorn alembic sqlalchemy geoalchemy2 asyncpg pydantic-extra-types

Create a main.py file:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Hello World"}

Containerization

Create a Dockerfile:

FROM python:3.11-slim

RUN apt-get update && apt-get -y upgrade

ENV \
    PYTHONFAULTHANDLER=1 \
    PYTHONUNBUFFERED=1 \
    PYTHONHASHSEED=random \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_DEFAULT_TIMEOUT=100

ENV \
    POETRY_HOME="/opt/poetry" \
    POETRY_NO_INTERACTION=1 \
    POETRY_VERSION=1.5.1

EXPOSE 8000

WORKDIR /opt/geolocations

COPY poetry.lock pyproject.toml ./
RUN pip install "poetry==$POETRY_VERSION"
RUN poetry export --output requirements.txt
RUN pip install --no-deps -r requirements.txt

COPY . .

Create docker-compose.yml to build the FastAPI application from the Dockerfile above, and to set up the database using a PostgreSQL image that ships with the PostGIS extension.

version: "3.8"

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: geolocations-backend
    ports:
      - "8000:8000"
    volumes:
      - .:/opt/geolocations
    depends_on:
      - db
    command: uvicorn main:app --host=0.0.0.0 --port=8000 --reload

  db:
    image: postgis/postgis:15-3.4-alpine
    container_name: geolocations-postgres
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - 5432:5432
    volumes:
      - postgres_data:/var/lib/postgresql/data/

volumes:
  postgres_data:

Once you have created the Dockerfile and docker-compose.yml, run docker-compose build and then docker-compose up in the terminal.

To ensure that everything is set up correctly, navigate to http://127.0.0.1:8000/ in your web browser. You should see the following response: {"message":"Hello World"}.

Database configuration

Create database.py to implement SQLAlchemy's declarative table configuration using the declarative base class. The get_async_session function will serve as a dependency injection mechanism for obtaining a database session in endpoints.

from typing import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker


class Base(DeclarativeBase):
    pass


DATABASE_URL = (
    "postgresql+asyncpg://"
    "postgres:postgres@geolocations-postgres:5432/postgres"
)

engine = create_async_engine(DATABASE_URL)
async_session_maker = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)


async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_maker() as session:
        yield session
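If you want to confirm the session wiring before building real endpoints, a minimal sketch is a health-check route in main.py that borrows get_async_session (the /health path and handler name are illustrative, not part of the repository):

# Hypothetical sanity check in main.py: runs SELECT 1 through the session
from fastapi import Depends
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session


@app.get("/health")
async def health_check(db_session: AsyncSession = Depends(get_async_session)):
    await db_session.execute(text("SELECT 1"))  # raises if the DB is unreachable
    return {"status": "ok"}

Note that SQLAlchemy 2.0 also provides an async_sessionmaker helper; the sessionmaker(..., class_=AsyncSession) form used above works just as well.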

Create Model

Create a models.py file and populate it with the following content:

from geoalchemy2 import Geometry, WKBElement
from sqlalchemy import Integer, String
from sqlalchemy.orm import Mapped, mapped_column

from database import Base


class City(Base):
    __tablename__ = "city"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    state_code: Mapped[str] = mapped_column(String(2))
    state_name: Mapped[str] = mapped_column(String(50))
    city: Mapped[str] = mapped_column(String(50))
    county: Mapped[str] = mapped_column(String(50))
    geo_location: Mapped[WKBElement] = mapped_column(
        Geometry(geometry_type="POINT", srid=4326, spatial_index=True)
    )

We will store each city's location, represented by latitude and longitude, in the form of a "Point".

A point is a discrete location on the surface of the planet, represented by an x-y coordinate pair. Each point on the map is created by latitude and longitude coordinates, and is stored as an individual record in a file.

Source: https://wiki.gis.com/wiki/index.php/Point_Feature_Class
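To make the model concrete, here is an illustrative sketch of constructing a City row by hand (the coordinate values are examples; GeoAlchemy 2 accepts a WKT-style string for a Geometry column and converts it to WKB on insert):

# Illustrative only -- the tutorial loads rows from a CSV instead
baltimore = City(
    state_code="MD",
    state_name="Maryland",
    city="Baltimore",
    county="Baltimore",
    geo_location="POINT(39.290385 -76.612189)",  # (lat lon) order, as used throughout this tutorial
)

Strictly speaking, the WKT convention is POINT(x y), i.e. longitude first; this tutorial writes latitude first and does so consistently in every query, so lookups against this data remain self-consistent.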

Database Migrations with Alembic

Let's first initialize Alembic using the async template; run in the terminal:
docker exec -it geolocations-backend alembic init -t async alembic
This will create an alembic folder at the root level.

# The project's structure so far:

alembic/
    versions/
    env.py
    README
    script.py.mako
alembic.ini
database.py
docker-compose.yml
Dockerfile
main.py
models.py
poetry.lock
pyproject.toml

We need to make certain adjustments to properly synchronize Alembic with our database.

First, open the alembic.ini file and assign the database URL to the sqlalchemy.url variable as follows:

sqlalchemy.url = postgresql+asyncpg://postgres:postgres@geolocations-postgres:5432/postgres

Next, open the env.py file and make the following changes:

# add these imports
from database import Base
from models import City

# Change target_metadata to:
target_metadata = Base.metadata

Before creating the Alembic migration script, we must address a known issue: autogenerate will attempt to drop the additional tables and indexes created by the postgis/postgis image. To avoid this, define an include_name callable (inside env.py) as follows:

def include_name(name, type_, parent_names):
    # Skip schema-level names so autogenerate ignores objects
    # created by the PostGIS image
    if type_ == "schema":
        return False
    else:
        return True

Pass this function to context.configure inside the do_run_migrations function:

def do_run_migrations(connection: Connection) -> None:
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        include_name=include_name,  # added
    )

    with context.begin_transaction():
        context.run_migrations()

Run migrations

To generate the migration script, execute the following command in the terminal:

docker exec -it geolocations-backend alembic revision --autogenerate -m "create models"

You should see output confirming that the migration script was generated in alembic/versions (xxx_create_models.py).

Given that we are autogenerating migrations and setting spatial_index=True, we should make further adjustments to the generated migration script (located in the alembic/versions folder), as mentioned in the official GeoAlchemy 2 documentation:

  • the migration script misses the relevant imports from geoalchemy2.
  • the migration script will create the indexes of the spatial columns after the table is created, but these indexes are already automatically created during table creation, which leads to an error.

Make the following adjustments to the migration script (see the sketch after this list):

  1. Add the missing import statement: import geoalchemy2.
  2. Remove the create_index and drop_index statements from the upgrade() and downgrade() functions, respectively.
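After these edits, the migration should look roughly like the sketch below (your autogenerated column details and revision identifiers will differ):

# Sketch of the adjusted migration script -- details will vary
import geoalchemy2  # the import the autogenerated script misses
import sqlalchemy as sa
from alembic import op


def upgrade() -> None:
    op.create_table(
        "city",
        sa.Column("id", sa.Integer(), nullable=False),
        sa.Column("state_code", sa.String(length=2), nullable=False),
        sa.Column("state_name", sa.String(length=50), nullable=False),
        sa.Column("city", sa.String(length=50), nullable=False),
        sa.Column("county", sa.String(length=50), nullable=False),
        sa.Column(
            "geo_location",
            geoalchemy2.types.Geometry(
                geometry_type="POINT", srid=4326, spatial_index=True
            ),
            nullable=False,
        ),
        sa.PrimaryKeyConstraint("id"),
    )
    # No op.create_index(...) here: the spatial index is already created
    # together with the table.


def downgrade() -> None:
    # Likewise, no op.drop_index(...) before dropping the table.
    op.drop_table("city")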

After implementing these modifications, execute the following command in the terminal to apply the migrations:

docker exec -it geolocations-backend alembic upgrade head
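Optionally, you can sanity-check that the table was created; here is a quick sketch using the engine from database.py (run it inside the backend container so the geolocations-postgres hostname resolves):

# Hypothetical verification script: lists the columns of the new "city" table
import asyncio

from sqlalchemy import text

from database import engine


async def check() -> None:
    async with engine.connect() as conn:
        result = await conn.execute(
            text(
                "SELECT column_name FROM information_schema.columns "
                "WHERE table_name = 'city'"
            )
        )
        print([row[0] for row in result])


asyncio.run(check())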

Load Geographical Data

We will make use of the U.S. cities data provided in the U.S. Cities Database GitHub repository, which contains 29,880 U.S. cities with city, county, state, latitude, and longitude data. To streamline the process, we will download the provided CSV file and populate the database from its contents.
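For orientation, the CSV looks roughly like this (the header row is real; the ID value in the sample row is illustrative, with the remaining fields taken from the dataset's entry for Abell, MD):

ID,STATE_CODE,STATE_NAME,CITY,COUNTY,LATITUDE,LONGITUDE
1,MD,Maryland,Abell,Saint Marys,38.249554,-76.744104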

A GET /load-cities endpoint will be created to populate the database. A dedicated helper function will check whether the City table is empty before creating and adding new data, thereby preventing duplicates.

Create a services.py file with the following content:

from sqlalchemy import exists, select
from sqlalchemy.ext.asyncio import AsyncSession

from models import City


async def is_city_table_empty(db_session: AsyncSession) -> bool:
    # Wrap the select in EXISTS: returns True as soon as any row is found
    query = select(City.id.isnot(None))
    query = select(exists(query))
    result = await db_session.execute(query)
    table_has_rows = result.scalars().one()

    return not table_has_rows
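For comparison, an equivalent check could simply count rows; this alternative sketch is not what the repository uses, and the EXISTS version above is generally preferable because the database can stop at the first row it finds:

# Alternative sketch: count-based emptiness check
from sqlalchemy import func, select
from sqlalchemy.ext.asyncio import AsyncSession

from models import City


async def is_city_table_empty_by_count(db_session: AsyncSession) -> bool:
    result = await db_session.execute(select(func.count()).select_from(City))
    return result.scalar_one() == 0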

We add the endpoint in main.py, and it needs to be called only once.

# updated imports (includes previous imports)
import csv

from fastapi import Depends, FastAPI
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from services import is_city_table_empty


@app.get("/load-cities")
async def load_cities(db_session: AsyncSession = Depends(get_async_session)):
    if await is_city_table_empty(db_session):
        cities = []
        with open("us_cities.csv", "r") as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=",")

            # Skip the first row (header)
            next(csv_reader)

            for row in csv_reader:
                city = City(
                    state_code=row[1],
                    state_name=row[2],
                    city=row[3],
                    county=row[4],
                    geo_location=f"POINT({row[5]} {row[6]})",
                )
                cities.append(city)

        db_session.add_all(cities)
        await db_session.commit()
        return {"message": "Data loaded successfully"}

    return {"message": "Data is already loaded"}

Code explanation:

  • The first row of the CSV file is skipped, as it contains the header: ID, STATE_CODE, STATE_NAME, CITY, COUNTY, LATITUDE, LONGITUDE.
  • Since the latitude and longitude are stored as a geometry Point, they must be passed as a WKT-style string: f"POINT({latitude} {longitude})".
  • SQLAlchemy's add_all method is used to insert all rows in the cities list within a single transaction, which guarantees that if any single insert fails, the entire batch is rolled back.

Once we try this endpoint with Swagger UI at http://127.0.0.1:8000/docs, we should observe the response {"message": "Data loaded successfully"}.

Get Nearby cities by City Details

In this section, we will create an endpoint that retrieves cities located within a specified distance (radius) of a given city, based on the provided city, county, and state information. The combination of the three uniquely identifies a city.

Pydantic Schema

Create a schemas.py file at the root level:

from pydantic import BaseModel, PositiveInt


class NearbyCitiesSchema(BaseModel):
    city: str
    county: str
    state_code: str
    km_within: PositiveInt

Add the following code to main.py:

# updated imports (includes previous imports)
import csv

from fastapi import Depends, FastAPI, HTTPException, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import NearbyCitiesSchema
from services import is_city_table_empty


@app.post("/nearby-cities-by-details")
async def get_nearby_cities_by_details(
    nearby_cities_schema: NearbyCitiesSchema,
    db_session: AsyncSession = Depends(get_async_session),
):
    city, county, state_code, km_within = (
        nearby_cities_schema.city,
        nearby_cities_schema.county,
        nearby_cities_schema.state_code,
        nearby_cities_schema.km_within,
    )

    # Check if the target city exists and retrieve its geography
    target_city_query = select(City).where(
        and_(City.city == city, City.state_code == state_code, City.county == county)
    )
    result = await db_session.execute(target_city_query)
    target_city = result.scalar_one_or_none()

    # If the target city is not found, return an error message
    if not target_city:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="City with provided details was not found",
        )

    # Extract the geography of the target city
    target_geography = ST_GeogFromWKB(target_city.geo_location)

    # Query nearby cities within the specified distance from the target city
    nearby_cities_query = select(City.city).where(
        ST_DWithin(City.geo_location, target_geography, 1000 * km_within)
    )
    result = await db_session.execute(nearby_cities_query)
    nearby_cities = result.scalars().all()

    return nearby_cities

Code explanation:

  • A database query is made to retrieve the specified target city based on the provided city name, county, and state code.
  • Upon successfully retrieving the target city, the geographic information is extracted from the geo_location field using GeoAlchemy2's ST_GeogFromWKB function.
  • Once the geography object of the target city is obtained, a subsequent query is executed to find cities within a specified distance range from the target city.
  • The condition for proximity is defined using GeoAlchemy 2's ST_DWithin function, which checks whether cities are within a certain distance of the target city's geographic location. Since ST_DWithin works in meters and the provided distance is in kilometers, the third argument is multiplied by 1000 (multiply by roughly 1609 instead to express the distance in miles).
  • Finally, the result is retrieved from the database session, and the names of nearby cities are extracted and returned as a list.
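If you also want the results sorted nearest-first, GeoAlchemy 2 exposes ST_Distance (which returns meters for geography arguments) to drive an ORDER BY; here is a sketch reusing the names from the endpoint above:

# Sketch: same filter, but ordered by distance from the target city
from geoalchemy2.functions import ST_Distance, ST_DWithin
from sqlalchemy import select

nearby_cities_query = (
    select(City.city)
    .where(ST_DWithin(City.geo_location, target_geography, 1000 * km_within))
    .order_by(ST_Distance(City.geo_location, target_geography))
)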

Example

To obtain the list of cities, including Baltimore, within a 3 km distance from Baltimore, MD, you can use the following API request:

Request:

curl -X 'POST' \
  'http://127.0.0.1:8000/nearby-cities-by-details' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "city": "Baltimore",
    "county": "Baltimore",
    "state_code": "MD",
    "km_within": 3
  }'

Response:

[
  "Arriba",
  "Baltimore",
  "Elkridge",
  "Halethorpe",
  "Harmans"
]

Demo

Swagger UI: Nearby Cities by Details

Get Nearby cities by Latitude and Longitude

Instead of relying on the geographic information of a "target city" to locate nearby cities, we can implement an endpoint that takes latitude and longitude data and identifies cities within the specified proximity of those coordinates.

For instance, let’s consider finding nearby cities to a Walmart store located in Westminster, Maryland, U.S. We can obtain the store’s coordinates using Google Maps.

Walmart Store in the Google Maps — Source: Google Maps

Pydantic Schema

We can utilize the pydantic_extra_types.coordinate module to validate latitude and longitude data. Let's create a Pydantic schema in schemas.py with updated imports:

# updated imports
from pydantic import BaseModel, PositiveInt
from pydantic_extra_types.coordinate import Latitude, Longitude


class NearbyCitiesByCoordsSchema(BaseModel):
    lat: Latitude
    long: Longitude
    km_within: PositiveInt
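The Latitude type constrains values to the range [-90, 90] and Longitude to [-180, 180], so out-of-range coordinates are rejected before the endpoint body runs; a quick illustrative check:

# Illustrative: invalid coordinates fail Pydantic validation
from pydantic import ValidationError

try:
    NearbyCitiesByCoordsSchema(lat=100, long=0, km_within=5)
except ValidationError as exc:
    print(exc)  # reports that lat must be less than or equal to 90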

Add the following to main.py:

# updated imports
import csv

from fastapi import Depends, FastAPI, HTTPException, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromText, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import NearbyCitiesByCoordsSchema, NearbyCitiesSchema
from services import is_city_table_empty


@app.post("/nearby-cities-by-coordinates")
async def get_nearby_cities_by_coords(
    coords_schema: NearbyCitiesByCoordsSchema,
    db_session: AsyncSession = Depends(get_async_session),
):
    lat, long, km_within = (
        coords_schema.lat,
        coords_schema.long,
        coords_schema.km_within,
    )

    target_geography = ST_GeogFromText(f"POINT({lat} {long})", srid=4326)

    nearby_cities_query = select(City.city).where(
        ST_DWithin(City.geo_location, target_geography, 1000 * km_within)
    )
    result = await db_session.execute(nearby_cities_query)
    nearby_cities = result.scalars().all()

    return nearby_cities

Code explanation:

  1. Extract latitude, longitude, and distance range from the input schema.
  2. Create a geographic point (target_geography) from the provided coordinates using GeoAlchemy 2's ST_GeogFromText function.
  3. Execute a database query that finds cities within the specified distance of the target geography using the ST_DWithin function.
  4. Retrieve and return the names of nearby cities as a list.

Example

Continuing with the Walmart store illustration, we can provide its coordinates (latitude: 39.58569, longitude: -76.98407) and retrieve nearby cities within a 10 km radius.

Request:

curl -X 'POST' \
  'http://127.0.0.1:8000/nearby-cities-by-coordinates' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "lat": 39.58569,
    "long": -76.98407,
    "km_within": 10
  }'

Response:

[
  "Cope",
  "Flagler",
  "Cooksville",
  "Dayton",
  "Finksburg",
  "Glenelg",
  "Glenwood",
  "Sykesville",
  "West Friendship",
  "Westminster",
  "Abbottstown",
  "East Berlin",
  "Hanover",
  "Mc Sherrystown"
]


Conclusion

In conclusion, this tutorial has guided you through the process of incorporating spatial data into your FastAPI applications using GeoAlchemy 2 and PostGIS. The tutorial explores efficient storage of geographical data with async PostgreSQL, SQLAlchemy, GeoAlchemy 2, and Alembic for migrations.

Two APIs have been implemented to showcase the ability to query nearby cities through two distinct approaches. The first allows users to retrieve nearby cities based on specific city details already present in the database, including city name, county, and state information. The second lets users obtain a list of nearby cities from latitude and longitude data.

The final code is available here: https://github.com/notarious2/geolocations

Extra: Serializing Geography Data

We have learned how to store geographical data and perform necessary queries/manipulations based on it. Now, let’s consider a scenario where we have geographical data, but the task at hand is to retrieve or send it to the frontend. This is not as straightforward as it may seem and will require some effort.

For example, say we want to retrieve all cities within a specific state. The response should contain city, county, state, and geographical information in the Well-Known Text (WKT) format, represented as "POINT (latitude longitude)".

The geography point field we created earlier, geo_location, is stored in the Well-Known Binary (WKB) format as a geoalchemy2.elements.WKBElement.

Pydantic does not support such conversion by default. However, we can utilize the Shapely library to transform the geo_location field format from WKB to WKT. This transformation can be achieved using Pydantic's field validator with the "before" mode, which executes prior to Pydantic's internal parsing and validation processes.

First, install the Shapely integration: poetry add "geoalchemy2[shapely]". You must rebuild the containers to apply the change (stop docker-compose, then run docker-compose build and docker-compose up).

Schema

# updated imports
from geoalchemy2.shape import to_shape
from pydantic import BaseModel, PositiveInt, field_validator
from pydantic_extra_types.coordinate import Latitude, Longitude


class CitySchema(BaseModel):
    city: str
    county: str
    state_code: str
    state_name: str
    geo_location: str

    @field_validator("geo_location", mode="before")
    def turn_geo_location_into_wkt(cls, value):
        return to_shape(value).wkt

In our schema, we convert the geo_location from WKB to WKT, making the field a string type.
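Under the hood, to_shape parses the WKBElement into a Shapely geometry, whose .wkt property renders the text form; a quick sketch (some_city stands for any City row you have loaded):

# Illustrative: manual WKB -> WKT conversion for a single row
from geoalchemy2.shape import to_shape

point = to_shape(some_city.geo_location)  # a shapely Point
print(point.wkt)  # e.g. "POINT (38.249554 -76.744104)"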

Retrieve cities in a state, including the geography field in WKT format

In main.py:

# updated imports
import csv

from fastapi import Depends, FastAPI, HTTPException, Path, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromText, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import CitySchema, NearbyCitiesByCoordsSchema, NearbyCitiesSchema
from services import is_city_table_empty


@app.get("/cities/{state_code}", response_model=list[CitySchema])
async def get_cities_in_state(
    state_code: str = Path(..., min_length=2, max_length=2),
    db_session: AsyncSession = Depends(get_async_session),
):
    query = select(City).where(City.state_code == state_code.upper())
    result = await db_session.execute(query)
    cities = result.scalars().all()

    if not cities:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"No cities found for provided state: {state_code}",
        )

    return cities

Code explanation:

  • state_code is a path parameter, which is constrained to have an exact length of 2 characters.
  • The SQLAlchemy query is constructed to filter cities where the state code matches the provided parameter.
  • If no cities are found, a Not Found response is returned with a corresponding detail message.
  • If cities are found, they are returned and serialized according to CitySchema (response_model=list[CitySchema]).

Example

The following example retrieves the list of cities in the state of Maryland.

Request:

curl -X 'GET' \
  'http://127.0.0.1:8000/cities/md' \
  -H 'accept: application/json'

Response:

[
  {
    "city": "Abell",
    "county": "Saint Marys",
    "state_code": "MD",
    "state_name": "Maryland",
    "geo_location": "POINT (38.249554 -76.744104)"
  },
  {
    "city": "Aberdeen",
    "county": "Harford",
    "state_code": "MD",
    "state_name": "Maryland",
    "geo_location": "POINT (39.510886 -76.18054)"
  },
  ...
]
