A “Tricky” Ways To Ingest Esri Versioned Geodatabase using Geopandas and SQLAlchemy

Achmad Fauzi Bagus Firmansyah
5 min readSep 2, 2019

--

Until now, many spatial databases have been introduced. From open sources, like PostGIS, until the enterprise one like ArcSDE by Esri. FYI, Esri has Geodatabase Extension that enables the user to store the spatial data in database ways, manage the versioning, create a replica, and connect it to the ArcGIS Portal. And this geodatabase doesn’t work alone. First, you must install supported databases, like PostgreSQL or Oracle or MSSQL. Esri use extension, named ‘sde’ to create a geodatabase from them.

Versioned ArcGIS Enterprise Geodatabase

Well, I won’t cover the whole thing about ArcSDE but I will focus on the “Versioning Geodatabase”. Versioning is a feature on ArcSDE that enables the user to create some version of data in the database. Let’s see you have one data and you want to create some version of the same data, so you have data version A as parent version and version B as a child version. But both are the same object. Next, you can share the version to other people based on your defined privileges. So people on the other side can read and edit the data the version B, and the other just read version A. Next, when one person edit the data, cause you gave him/her the privilege, you can keep the original/parent data or merge the data.

Another feature in versioned geodatabase is you can control the accessibility of version data. You can choose ‘public’ so everyone can edit and read, or ‘private’ so only you can edit and read, or ‘protected’ so everyone can read but only you can edit.

The three scenarios of versioning. (Source: https://www.esri.com/news/arcuser/0110/versioning101.html)

The Ingestion Problem

The editing and reading action can be done toward ArcGIS with specific licensed. So when you have the data you can ingest data to geodatabase with ArcGIS. But, as you know, there are many types of spatial data. Most people use shapefile format, but in my experiences, the frequency of using shapefile format is less than using geojson file. Well, this is the limitation of ArcGIS (I used 10.5 version) and I can’t ingest my geojson data directly. I must convert the geojson to shapefile and ingest the data. Imagine there are more than 500 geojson files, if I convert it one by one, it’s exhausting.

So after doing some experiment, I know that ArcGIS Geodatabase is merely a database. And in the database, you can do ingest data toward SQL. Like the CSV and other spreadsheet files, you can create a simple program to read and transfer data to the database. So I just need to find how to read geojson and transfer it to the database. And VOILA…. I know Python can do the trick, with Geopandas package I can read geojson files and using SQLAlchemy I can transfer it to the database. Nice!

The Process

Here is the workflow.

Workflow of my operation

First, prepare the package we used and then read the .geojson data. Let’s say it’s tes.geojson.

import geopandas as gpd
from sqlalchemy import create_engine
test=gpd.read_file('test.geojson')

Next, create the string of database connection. Cause my ArcGIS Geodatabase run on PostgreSQL, I create the PostgreSQL string connection. Then, create the engine and connect it.

db_string="postgres://user:password@host_address:port/db"
db = create_engine(db_string)
con=db.connect()

Notes: Change string value on user, password, host_address, port, and db with the appropriate value.

Here the important part. Do the step properly!

  1. Set the current version to your desired version. If you don’t specify the desired version, it will use the ‘DEFAULT’ version. And mostly, you must be a super admin /geodatabase administrator to edit the data.
  2. Start your edit session. This is important for versioning geodatabase. You must start the edit session before insert data. (Eg: version ‘mapping_2019’)
  3. Do inserting the data on the materialized view of your data. Unlike the other database, the ArcSDE does store data on materialized view for versioning. If your data is ‘alpha’, it will store data in ‘alpha_evw’. The ‘_evw’ subfix is key to point to the materialized view. (Eg: version ‘test_evw’)
  4. Insert the data. To be able uploading the geometry data, you must use sde.st_geometry() function. This function needs 2 parameters, that are 1) well known text (WKT) version of geometry and 2) string of crs projection id. Don’t be panic, GeoPandas will do the magic for change the geometry on geom column to WKT format (using wkt function on geometry). Here I used WGS 84 which crs id is 4326. Note: in ArcSDE, geometry column named as ‘shape’
  5. Next, ‘commit’ the data to actually insert the data or ‘rollback’ to undo the insert.
  6. Last, close your edit session. Please remember, always closes your edit session because when you don’t, the user can’t do the edit on ArcGIS.

This is the code:

con.execute('select sde.sde_set_current_version(\'mapping_2019\');')
con.execute('select sde.sde_edit_version(\'mapping_2019\',1)')
for index, row in test.iterrows():
try:
sql_=('insert into sde.test_evw(shape,test_name ) values('+
'sde.st_geometry(\''+ row.geometry.wkt + '\' ,4326),'+
'\''+str(row.name)+'\')'
con.execute(sql_)
con.execute('COMMIT')
except:
con.execute('ROLLBACK')
con.execute('select sde.sde_edit_version(\'mapping_2019\',2)')
con.close()
db.dispose()

Checking the Data

Next, you do inserting the data. But to make sure you have done that correctly, just read the data. How? You can do it on ArcGIS or you can do it with Python ways. Here is the trick:

  1. Import the package and prepare the data. Unlike before, we import two additional packages, that are 1) ‘pandas’ for importing table into dataframe, and 2) ‘shapely’ for converting wkt into a geometry that known in geopandas.
  2. Creating a string of database connection.
  3. Set the desired version
  4. And do select using sde.st_astext(). The result will be a Pandas Dataframe with shape as geometry column.
  5. Convert the dataframe into GeoDataFrame and utilize the shapely wkt into geometry. Last but important, specify CRS. Then, you can print or plot the data.

Here is the code:

import geopandas as gpd
from sqlalchemy import create_engine
import pandas as pd
from shapely import wkt
db_string="postgres://user:password@host_address:port/db"
db = create_engine(db_string)
con=db.connect()
con.execute('select sde.sde_set_current_version(\'mapping_2019\');') y_=pd.read_sql('select text_name,sde.st_astext(shape) as geometry from test_evw',con=con)
con.close()
db.dispose()
y_['geometry'] = y_['geometry'].apply(wkt.loads)
gdf=gpd.GeoDataFrame(y_,geometry='geometry')
gdf.crs = {'init' :'epsg:4326'}

So, that are my ‘tricky’ ways to ingest data into Esri Versioned Geodatabase with Geopandas and SQLAlchemy. Here, I used PostgreSQL as base of geodatabase. If you use other database platform, just adjust the code.

Additional Notes

  • To ingest data into Esri Geodatabase, you must create the schema first and enable the versioning of your data.
  • Like the other operations in SQL, please ensure you have the privileges to inserting the data.
  • It’s important to know the type of your version data, private or public or protected.
  • Create the connection efficiently, cause there is a restriction about time out operation.
  • If there is a deadlock (cause failure of operation), contact the admin to remove the lock.
  • Always use ‘try’ and ‘except’ in your code, to minimalize the error found.

--

--

Achmad Fauzi Bagus Firmansyah

An data scientist that love machine learning, statistics, and spatial analysis.