Working with Spatial Data using FastAPI and GeoAlchemy
Welcome to the tutorial on leveraging spatial data in your FastAPI applications using GeoAlchemy 2 and PostGIS. Throughout this guide, we will explore how to store geographical data efficiently with SQLAlchemy and GeoAlchemy 2, and showcase how to retrieve nearby U.S. cities (within a certain distance) using either latitude/longitude coordinates or specific city, county, and state details.
This post offers a user-friendly, step-by-step guide that encourages hands-on learning, allowing you to actively code alongside the tutorial.
Tutorial Highlights
- Project Setup: Dependencies, Containerization (Docker and Docker Compose)
- Storing Geographical Data: async PostgreSQL (PostGIS), SQLAlchemy, GeoAlchemy 2 and Alembic for migrations
- Loading Geographical Data (U.S. Cities Data)
- Find Nearby Cities (located within a certain distance) by:
  - City Details (city, county, and state information)
  - Coordinates (latitude and longitude)
- Conclusion
- Extra: Serialization of Geographical Data with Pydantic
GitHub repo with source code: https://github.com/notarious2/geolocations
Tools/Libraries used:
- FastAPI, uvicorn
- Pydantic, pydantic-extra-types
- SQLAlchemy 2 and GeoAlchemy 2
- Async PostgreSQL using asyncpg driver
- Alembic for migrations
- Docker for containerization
- Poetry for dependency management
Project Setup
Create a dedicated folder, such as “geolocations,” and navigate to it using your preferred code editor. You can achieve this by entering the following commands in your terminal:
mkdir geolocations
cd geolocations
code . # if using VS Code
Install Poetry and initialize the project:
# installation
curl -sSL https://install.python-poetry.org | python3 -
# initialization
poetry init
Follow the steps (don’t define dependencies interactively), and then add packages using:
poetry add fastapi uvicorn alembic sqlalchemy geoalchemy2 asyncpg pydantic-extra-types
Create a main.py
file:
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello World"}
Containerization
Create Dockerfile
FROM python:3.11-slim
RUN apt-get update && apt-get -y upgrade
ENV \
PYTHONFAULTHANDLER=1 \
PYTHONUNBUFFERED=1 \
PYTHONHASHSEED=random \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1 \
PIP_DEFAULT_TIMEOUT=100
ENV \
POETRY_HOME="/opt/poetry" \
POETRY_NO_INTERACTION=1 \
POETRY_VERSION=1.5.1
EXPOSE 8000
WORKDIR /opt/geolocations
COPY poetry.lock pyproject.toml ./
RUN pip install "poetry==$POETRY_VERSION"
RUN poetry export --output requirements.txt
RUN pip install --no-deps -r requirements.txt
COPY . .
Create docker-compose.yml
to build the FastAPI application from the specified Dockerfile
and to set up the database using the postgis/postgis image, which provides PostgreSQL with the PostGIS extension.
version: "3.8"

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: geolocations-backend
    ports:
      - "8000:8000"
    volumes:
      - .:/opt/geolocations
    depends_on:
      - db
    command: uvicorn main:app --host=0.0.0.0 --port=8000 --reload

  db:
    image: postgis/postgis:15-3.4-alpine
    container_name: geolocations-postgres
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - 5432:5432
    volumes:
      - postgres_data:/var/lib/postgresql/data/

volumes:
  postgres_data:
Once you have created the Dockerfile and docker-compose.yml, run docker-compose build and then docker-compose up in the terminal.
To ensure that everything is set up correctly, navigate to http://127.0.0.1:8000/ in your web browser. You should see the following response: {"message":"Hello World"}.
Database configuration
Create database.py
to implement SQLAlchemy's declarative table configuration using the declarative base class. The get_async_session
function will serve as a dependency injection mechanism for obtaining a database session in endpoints.
from typing import AsyncGenerator

from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import DeclarativeBase, sessionmaker

class Base(DeclarativeBase):
    pass

DATABASE_URL = (
    "postgresql+asyncpg://"
    "postgres:postgres@geolocations-postgres:5432/postgres"
)

engine = create_async_engine(DATABASE_URL)
async_session_maker = sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def get_async_session() -> AsyncGenerator[AsyncSession, None]:
    async with async_session_maker() as session:
        yield session
Create Model
Create a models.py
file and populate it with the following content:
from geoalchemy2 import Geometry, WKBElement
from sqlalchemy import Integer, String
from sqlalchemy.orm import Mapped, mapped_column

from database import Base

class City(Base):
    __tablename__ = "city"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    state_code: Mapped[str] = mapped_column(String(2))
    state_name: Mapped[str] = mapped_column(String(50))
    city: Mapped[str] = mapped_column(String(50))
    county: Mapped[str] = mapped_column(String(50))
    geo_location: Mapped[WKBElement] = mapped_column(
        Geometry(geometry_type="POINT", srid=4326, spatial_index=True)
    )
We will store city geography represented by latitude and longitude in the form of a “Point”.
A point is a discrete location on the surface of the planet, represented by an x-y coordinate pair. Each point on the map is created by latitude and longitude coordinates, and is stored as an individual record in a file.
Source: https://wiki.gis.com/wiki/index.php/Point_Feature_Class
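One caveat worth flagging before we load data: standard WKT writes coordinates as POINT(x y), i.e. longitude first, while this tutorial stores latitude first throughout (as the sample responses later confirm). Either works as long as every query uses the same order. A minimal sketch of the convention used here (the Baltimore coordinates are illustrative):

```python
# Sketch of the point format this tutorial stores: latitude first.
# Note: standard WKT/PostGIS order is POINT(longitude latitude); whichever
# convention you pick, it must be used consistently in every query.
lat, lon = 39.2904, -76.6122  # Baltimore, MD (approximate city center)

point_wkt = f"POINT({lat} {lon})"  # tutorial's convention: latitude first
print(point_wkt)  # POINT(39.2904 -76.6122)
```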
Database Migrations with Alembic
Let’s first initialize Alembic using the async template. Run in the terminal:
docker exec -it geolocations-backend alembic init -t async alembic
This will create an alembic
folder at the root level.
# The project's structure so far:
alembic/
    versions/
    env.py
    README
    script.py.mako
alembic.ini
database.py
docker-compose.yml
Dockerfile
main.py
models.py
poetry.lock
pyproject.toml
We need to make certain adjustments to properly synchronize Alembic with our database.
First, open the alembic.ini
file and assign the database URL to the sqlalchemy.url
variable as follows:
sqlalchemy.url = postgresql+asyncpg://postgres:postgres@geolocations-postgres:5432/postgres
Next, open env.py
file and make the following changes:
# add these imports
from database import Base
from models import City
# Change target_metadata to:
target_metadata = Base.metadata
Before creating the Alembic migration script, we must address a known issue: autogenerate attempts to drop the additional tables and indexes created by the postgis/postgis image. To avoid this, specify the include_name
argument (inside env.py
), using the following function:
def include_name(name, type_, parent_names):
    if type_ == "schema":
        return False
    return True
Pass this function to the do_run_migrations
function:
def do_run_migrations(connection: Connection) -> None:
    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        include_name=include_name,  # added
    )

    with context.begin_transaction():
        context.run_migrations()
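If autogenerate still tries to drop PostGIS bookkeeping objects, such as the spatial_ref_sys table the image creates, the same filter can be extended. A sketch under that assumption; the exact object names to skip depend on your PostGIS image:

```python
# Extended Alembic include_name filter: skip non-default schemas and the
# spatial_ref_sys table that PostGIS manages, so autogenerate does not try
# to drop them. Extend the checks if your image creates further tables.
def include_name(name, type_, parent_names):
    if type_ == "schema":
        # only reflect the default schema
        return False
    if type_ == "table" and name == "spatial_ref_sys":
        # PostGIS-managed table; not part of our models
        return False
    return True

print(include_name("tiger", "schema", {}))           # False
print(include_name("spatial_ref_sys", "table", {}))  # False
print(include_name("city", "table", {}))             # True
```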
Run migrations
To generate migration script execute the following command in the terminal:
docker exec -it geolocations-backend alembic revision --autogenerate -m "create models"
The migration script will be generated in the alembic/versions folder (xxx_create_models.py).
Given that we are autogenerating migrations and setting spatial_index=True
, we should make further adjustments to the generated migration script (located in the alembic/versions folder), as mentioned in the official GeoAlchemy 2 documentation:
- the migration script misses the relevant imports from geoalchemy2.
- the migration script will create the indexes of the spatial columns after the table is created, but these indexes are already automatically created during table creation, which will lead to an error.
Make the following adjustments to the migration script:
- Add the missing import statement: import geoalchemy2.
- Remove the create_index and drop_index statements from the upgrade() and downgrade() functions, respectively.
After implementing these modifications, execute the following command in the terminal to apply the migrations:
docker exec -it geolocations-backend alembic upgrade head
Load Geographical Data
We will make use of the data provided in the U.S. Cities Database GitHub repository, which contains 29,880 U.S. cities with city, county, state, latitude, and longitude data. To streamline the process, we will download the provided CSV file and populate the database from its contents.
A GET /load-cities
endpoint will be created to populate the database. A dedicated helper function will check whether the City table is empty before adding new data, thereby preventing duplicates.
Create a services.py
file with the following content:
from sqlalchemy import select, exists
from sqlalchemy.ext.asyncio import AsyncSession
from models import City
async def is_city_table_empty(db_session: AsyncSession) -> bool:
    # EXISTS (SELECT city.id FROM city) -> True once the table has any row
    query = select(exists(select(City.id)))
    result = await db_session.execute(query)
    table_has_rows = result.scalar_one()
    return not table_has_rows
We add the endpoint in main.py
; it needs to be called only once.
# updated imports (includes previous imports)
import csv

from fastapi import Depends, FastAPI
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from services import is_city_table_empty

@app.get("/load-cities")
async def load_cities(db_session: AsyncSession = Depends(get_async_session)):
    if await is_city_table_empty(db_session):
        cities = []
        with open("us_cities.csv", "r") as csv_file:
            csv_reader = csv.reader(csv_file, delimiter=",")
            # Skip the first row (header)
            next(csv_reader)
            for row in csv_reader:
                city = City(
                    state_code=row[1],
                    state_name=row[2],
                    city=row[3],
                    county=row[4],
                    geo_location=f"POINT({row[5]} {row[6]})",
                )
                cities.append(city)

        db_session.add_all(cities)
        await db_session.commit()
        return {"message": "Data loaded successfully"}

    return {"message": "Data is already loaded"}
Code explanation:
- The first row of the CSV file is skipped, as it contains the header: ID, STATE_CODE, STATE_NAME, CITY, COUNTY, LATITUDE, LONGITUDE.
- Since latitude and longitude are stored as a geometry point, they must be passed as a WKT string: f"POINT({latitude} {longitude})".
- To ensure the integrity of the data insertion process, SQLAlchemy's add_all method is used to insert all rows in the cities list within a single transaction: if any insert fails, the entire batch is rolled back.
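The row-to-point mapping can be sanity-checked in isolation, without a database. A small stdlib sketch (the sample row mirrors the CSV header above; build_point is a hypothetical helper, not part of the app):

```python
import csv
import io

# Columns, per the CSV header: ID, STATE_CODE, STATE_NAME, CITY, COUNTY, LATITUDE, LONGITUDE
SAMPLE_CSV = """ID,STATE_CODE,STATE_NAME,CITY,COUNTY,LATITUDE,LONGITUDE
1,MD,Maryland,Baltimore,Baltimore,39.2904,-76.6122
"""

def build_point(row):
    # hypothetical helper mirroring the endpoint's f"POINT({row[5]} {row[6]})"
    return f"POINT({row[5]} {row[6]})"

reader = csv.reader(io.StringIO(SAMPLE_CSV))
next(reader)  # skip the header row, exactly as the endpoint does
points = [build_point(row) for row in reader]
print(points)  # ['POINT(39.2904 -76.6122)']
```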
Once we try this endpoint with Swagger UI at http://127.0.0.1:8000/docs
, we should see the response: {"message": "Data loaded successfully"}.
Get Nearby Cities by City Details
In this section, we will create an endpoint that allows the retrieval of cities located within a specified distance (radius) based on the provided city, county, and state information. This combination of the three ensures the uniqueness of the city.
Pydantic Schema
Create schemas.py
file at the root level:
from pydantic import BaseModel, PositiveInt

class NearbyCitiesSchema(BaseModel):
    city: str
    county: str
    state_code: str
    km_within: PositiveInt
Add the following code to main.py
:
# updated imports (includes previous imports)
import csv

from fastapi import Depends, FastAPI, HTTPException, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import NearbyCitiesSchema
from services import is_city_table_empty

@app.post("/nearby-cities-by-details")
async def get_nearby_cities_by_details(
    nearby_cities_schema: NearbyCitiesSchema,
    db_session: AsyncSession = Depends(get_async_session),
):
    city, county, state_code, km_within = (
        nearby_cities_schema.city,
        nearby_cities_schema.county,
        nearby_cities_schema.state_code,
        nearby_cities_schema.km_within,
    )

    # Check if the target city exists and retrieve its geography
    target_city_query = select(City).where(
        and_(City.city == city, City.state_code == state_code, City.county == county)
    )
    result = await db_session.execute(target_city_query)
    target_city = result.scalar_one_or_none()

    # If the target city is not found, return an error message
    if not target_city:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail="City with provided details was not found",
        )

    # Extract the geography of the target city
    target_geography = ST_GeogFromWKB(target_city.geo_location)

    # Query nearby cities within the specified distance from the target city
    nearby_cities_query = select(City.city).where(
        ST_DWithin(City.geo_location, target_geography, 1000 * km_within)
    )
    result = await db_session.execute(nearby_cities_query)
    nearby_cities = result.scalars().all()
    return nearby_cities
Code explanation:
- A database query retrieves the specified target city based on the provided city name, county, and state code.
- Upon successfully retrieving the target city, its geographic information is extracted from the geo_location field using GeoAlchemy 2's ST_GeogFromWKB function.
- Once the geography object of the target city is obtained, a subsequent query finds cities within the specified distance of it.
- The proximity condition is defined using GeoAlchemy 2's ST_DWithin function, which checks whether cities lie within a certain distance of the target city's geographic location. Since ST_DWithin uses meters and the provided distance is in kilometers, the third argument is multiplied by 1000. (Multiply by 1609 to get the distance in miles.)
- Finally, the result is retrieved from the database session, and the names of the nearby cities are returned as a list.
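For intuition about what ST_DWithin computes here: with geography arguments, PostGIS measures distance in meters along the Earth's surface. A rough pure-Python equivalent using the haversine great-circle formula (an approximation only; PostGIS uses a spheroid model by default, so results differ slightly near the threshold):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in meters between two lat/lon points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_km(lat1, lon1, lat2, lon2, km):
    # same idea as ST_DWithin(geog1, geog2, 1000 * km)
    return haversine_m(lat1, lon1, lat2, lon2) <= 1000 * km

# Baltimore, MD -> Washington, DC is roughly 56 km apart
print(within_km(39.2904, -76.6122, 38.9072, -77.0369, 60))  # True
print(within_km(39.2904, -76.6122, 38.9072, -77.0369, 30))  # False
```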
Example
To obtain the list of cities, including Baltimore, within a 3 km distance from Baltimore, MD, you can use the following API request:
Request
curl -X 'POST' \
  'http://127.0.0.1:8000/nearby-cities-by-details' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "city": "Baltimore",
  "county": "Baltimore",
  "state_code": "MD",
  "km_within": 3
}'
Response
[
  "Arriba",
  "Baltimore",
  "Elkridge",
  "Halethorpe",
  "Harmans"
]
Get Nearby Cities by Latitude and Longitude
Instead of relying on the geographic information of a “target city” to locate nearby cities, we can implement an endpoint that takes latitude
and longitude
data to identify cities located in the specified proximity to those coordinates.
For instance, let’s consider finding nearby cities to a Walmart store located in Westminster, Maryland, U.S. We can obtain the store’s coordinates using Google Maps.
Pydantic Schema
We can utilize the pydantic_extra_types.coordinate
module to validate latitude and longitude data. Let’s create a Pydantic schema in schemas.py
with updated imports:
# updated imports
from pydantic import BaseModel, PositiveInt
from pydantic_extra_types.coordinate import Latitude, Longitude

class NearbyCitiesByCoordsSchema(BaseModel):
    lat: Latitude
    long: Longitude
    km_within: PositiveInt
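The Latitude and Longitude types reject out-of-range values at parse time, which is roughly equivalent to this plain-Python check (a sketch of the ranges they enforce, not pydantic's implementation):

```python
def validate_coords(lat: float, long: float) -> bool:
    # Latitude must lie in [-90, 90] and longitude in [-180, 180],
    # matching the ranges enforced by pydantic-extra-types' Latitude/Longitude.
    return -90 <= lat <= 90 and -180 <= long <= 180

print(validate_coords(39.58569, -76.98407))   # True  (valid request body)
print(validate_coords(139.58569, -76.98407))  # False (latitude out of range)
```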
Add the following to main.py
:
# updated imports
import csv

from fastapi import Depends, FastAPI, HTTPException, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromText, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import NearbyCitiesByCoordsSchema, NearbyCitiesSchema
from services import is_city_table_empty

@app.post("/nearby-cities-by-coordinates")
async def get_nearby_cities_by_coords(
    coords_schema: NearbyCitiesByCoordsSchema,
    db_session: AsyncSession = Depends(get_async_session),
):
    lat, long, km_within = (
        coords_schema.lat,
        coords_schema.long,
        coords_schema.km_within,
    )

    target_geography = ST_GeogFromText(f"POINT({lat} {long})", srid=4326)

    nearby_cities_query = select(City.city).where(
        ST_DWithin(City.geo_location, target_geography, 1000 * km_within)
    )
    result = await db_session.execute(nearby_cities_query)
    nearby_cities = result.scalars().all()
    return nearby_cities
Code explanation:
- Extract latitude, longitude, and the distance range from the input schema.
- Create a geographic point (target_geography) from the provided coordinates using GeoAlchemy 2's ST_GeogFromText function.
- Execute a database query to find cities within the specified distance of the target geography using the ST_DWithin function.
- Retrieve and return the names of nearby cities as a list.
Example
Continuing with the Walmart store illustration, we can provide its coordinates (latitude: 39.58586, longitude: -76.98407) and retrieve nearby cities within a 10 km radius.
Request
curl -X 'POST' \
  'http://127.0.0.1:8000/nearby-cities-by-coordinates' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "lat": 39.58569,
  "long": -76.98407,
  "km_within": 10
}'
Response
[
  "Cope",
  "Flagler",
  "Cooksville",
  "Dayton",
  "Finksburg",
  "Glenelg",
  "Glenwood",
  "Sykesville",
  "West Friendship",
  "Westminster",
  "Abbottstown",
  "East Berlin",
  "Hanover",
  "Mc Sherrystown"
]
Conclusion
In conclusion, this tutorial has guided you through the process of incorporating spatial data into your FastAPI applications using GeoAlchemy 2 and PostGIS. The tutorial explores efficient storage of geographical data with async PostgreSQL, SQLAlchemy, GeoAlchemy 2, and Alembic for migrations.
Two APIs have been implemented to showcase the capability to query nearby cities through two distinct approaches. The first allows users to retrieve nearby cities based on specific city details (present in the database): city name, county, and state information. The second enables users to obtain a list of nearby cities from latitude and longitude data.
The final code is available here: https://github.com/notarious2/geolocations
Extra: Serializing Geography Data
We have learned how to store geographical data and perform necessary queries/manipulations based on it. Now, let’s consider a scenario where we have geographical data, but the task at hand is to retrieve or send it to the frontend. This is not as straightforward as it may seem and will require some effort.
For example, suppose we want to retrieve all cities within a specific state. The response should contain city, county, state, and geographical information in Well-Known Text (WKT) format, represented as "POINT (latitude longitude)".
The geography point field we created earlier, geo_location
, is stored in Well-Known Binary (WKB) format as a geoalchemy2.elements.WKBElement
.
Pydantic does not support such conversion by default. However, we can utilize the Shapely
library to transform the geo_location
field format from WKB to WKT. This transformation can be achieved using Pydantic's field validator with the "before" mode, which executes prior to Pydantic's internal parsing and validation processes.
First, install the Shapely integration: poetry add "geoalchemy2[shapely]"
. You must rebuild the containers to apply the change (stop docker-compose, then run docker-compose build and docker-compose up
).
Schema
# updated imports
from geoalchemy2.shape import to_shape
from pydantic import BaseModel, PositiveInt, field_validator
from pydantic_extra_types.coordinate import Latitude, Longitude

class CitySchema(BaseModel):
    city: str
    county: str
    state_code: str
    state_name: str
    geo_location: str

    @field_validator("geo_location", mode="before")
    def turn_geo_location_into_wkt(cls, value):
        return to_shape(value).wkt
In our schema, we convert the geo_location
from WKB to WKT, making the field a string type.
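to_shape does the WKB parsing for us, but it is worth seeing how little is inside a point's WKB: one byte-order flag, a uint32 geometry type, and two doubles. A stdlib-only sketch for a plain 2D point (no SRID header; real EWKB from PostGIS also embeds the SRID, which to_shape handles for us):

```python
import struct

def decode_wkb_point(wkb: bytes):
    """Decode a little-endian 2D WKB POINT into an (x, y) tuple.

    Layout: 1 byte byte-order flag, uint32 geometry type (1 = Point),
    then two float64 coordinates. EWKB with an SRID is not handled here.
    """
    byte_order, geom_type, x, y = struct.unpack("<BIdd", wkb)
    assert byte_order == 1 and geom_type == 1, "expected little-endian Point"
    return x, y

# Build the WKB for POINT(38.249554 -76.744104) and round-trip it
wkb = struct.pack("<BIdd", 1, 1, 38.249554, -76.744104)
print(decode_wkb_point(wkb))  # (38.249554, -76.744104)
```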
Retrieve cities in a state, including the geography field in WKT format, in main.py
:
# updated imports
import csv

from fastapi import Depends, FastAPI, HTTPException, Path, status
from geoalchemy2.functions import ST_DWithin, ST_GeogFromText, ST_GeogFromWKB
from sqlalchemy import and_, select
from sqlalchemy.ext.asyncio import AsyncSession

from database import get_async_session
from models import City
from schemas import CitySchema, NearbyCitiesByCoordsSchema, NearbyCitiesSchema
from services import is_city_table_empty

@app.get("/cities/{state_code}", response_model=list[CitySchema])
async def get_cities_in_state(
    state_code: str = Path(..., min_length=2, max_length=2),
    db_session: AsyncSession = Depends(get_async_session),
):
    query = select(City).where(City.state_code == state_code.upper())
    result = await db_session.execute(query)
    cities = result.scalars().all()

    if not cities:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"No cities found for provided state: {state_code}",
        )

    return cities
Code explanation:
- state_code is a path parameter, constrained to an exact length of 2 characters.
- The SQLAlchemy query filters cities whose state code matches the provided parameter.
- If no cities are found, a 404 Not Found response is returned with a corresponding detail message.
- If cities are found, they are returned serialized according to CitySchema (response_model=list[CitySchema]).
Example
The following is an example to get the list of cities in the state of Maryland.
Request
curl -X 'GET' \
'http://127.0.0.1:8000/cities/md' \
-H 'accept: application/json'
Response
[
  {
    "city": "Abell",
    "county": "Saint Marys",
    "state_code": "MD",
    "state_name": "Maryland",
    "geo_location": "POINT (38.249554 -76.744104)"
  },
  {
    "city": "Aberdeen",
    "county": "Harford",
    "state_code": "MD",
    "state_name": "Maryland",
    "geo_location": "POINT (39.510886 -76.18054)"
  },
  ...
]