Esri and Microsoft join UC San Diego in teaching practical geospatial data science and deep learning

Dmitry Kudinov
Published in GeoAI
Jun 3, 2019

In April and May 2019, the University of California San Diego, Esri, and Microsoft joined forces to teach the DSC 170: Spatial Data Science class on the La Jolla campus. The class was led by Dr. Ilya Zaslavsky, Director of the Spatial Information Systems Laboratory at the San Diego Supercomputer Center.

Practical Deep Learning in GIS

It was a first-of-its-kind opportunity for students to learn the concepts, methodologies, real-world applications, and use cases of deep learning in GIS. Most importantly, they gained hands-on experience applying deep learning tools to raw geospatial data to derive insights, extract knowledge, and produce valuable information products.

As part of the class, students were given access to powerful Azure Cloud virtual machines equipped with NVIDIA Quadro GP100 GPUs. On these machines they learned and completed the full training cycle of a Single Shot MultiBox Detector (SSD) model that detects palm trees and houses in aerial imagery.

In this exercise, students created their own training set using the ArcGIS Pro 2.3.2 desktop application. They then exported it in the Pascal VOC format, which most machine learning frameworks support. Next, with the help of ArcGIS API for Python 1.6.1, they trained their own SSD convolutional neural network models to detect and classify objects in the input imagery.
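Pascal VOC stores one XML annotation per image chip. Each labeled object gets a `<name>` and a `<bndbox>` holding pixel `xmin`, `ymin`, `xmax`, `ymax`. As a rough illustration, assuming that standard layout, the sketch below reads the boxes back out with Python's standard library. The file name, chip size, and the `palm`/`house` labels in the sample are hypothetical, not taken from the class data.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical Pascal VOC annotation for one image chip (illustrative values only).
SAMPLE_VOC = """<annotation>
  <filename>000000001.tif</filename>
  <size><width>448</width><height>448</height><depth>3</depth></size>
  <object>
    <name>palm</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>42</xmax><ymax>55</ymax></bndbox>
  </object>
  <object>
    <name>house</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>190</ymax></bndbox>
  </object>
</annotation>"""

# (label, xmin, ymin, xmax, ymax) in pixel coordinates of the chip
Box = Tuple[str, float, float, float, float]


def parse_voc(xml_text: str) -> List[Box]:
    """Return one (label, xmin, ymin, xmax, ymax) tuple per <object> element."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        coords = [float(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((label, *coords))
    return boxes


if __name__ == "__main__":
    for box in parse_voc(SAMPLE_VOC):
        print(box)
```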

Labeling data in ArcGIS Pro is simple and fast.

Students learned about the SSD network architecture and iterated over multiple configurations of the SSD constructor in search of the best detection and classification accuracy. They also found an optimal learning rate and monitored training for signs of overfitting.

Training and Validation losses from one of the student submissions.
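The post does not say exactly how students judged overfitting. One common heuristic, sketched here under that assumption, flags the point where validation loss stops improving for a few epochs while training loss keeps falling. The function name, the `patience` parameter, and the loss values in the usage lines are illustrative. They are not output from the course notebooks or from ArcGIS API for Python.

```python
from typing import Optional, Sequence


def first_overfit_epoch(train_loss: Sequence[float],
                        valid_loss: Sequence[float],
                        patience: int = 2) -> Optional[int]:
    """Return the best-validation epoch once validation loss has failed to
    improve for `patience` consecutive epochs while training loss is still
    below its value at that best epoch; return None if that never happens."""
    best = float("inf")
    best_epoch = 0
    stale = 0  # consecutive epochs without a new validation minimum
    for epoch, (tr, va) in enumerate(zip(train_loss, valid_loss)):
        if va < best:
            best, best_epoch, stale = va, epoch, 0
        else:
            stale += 1
            # Overfitting signature: validation stalls or worsens, training still improves.
            if stale >= patience and tr < train_loss[best_epoch]:
                return best_epoch
    return None


if __name__ == "__main__":
    train = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3]
    valid = [1.1, 0.9, 0.8, 0.85, 0.9, 0.95]
    print(first_overfit_epoch(train, valid))  # 2: validation bottoms out at epoch 2
```

The returned epoch is a natural checkpoint to keep. Training further mostly fits noise in the training chips.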

After the SSD model was successfully trained, students were asked to apply it to a much larger geographic area. For this they used the built-in "Detect Objects Using Deep Learning" geoprocessing tool, which performs efficient tiling and batch inference on extremely large rasters.
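This post does not describe the internals of the Detect Objects Using Deep Learning tool. The sketch below only illustrates the general idea of tiled, batched inference on a large raster. It assumes square tiles with a fixed pixel overlap, so that small objects cut by one tile edge can appear whole in a neighboring tile. `tile_windows`, `batches`, and their parameters are hypothetical names, not the tool's parameters.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
Window = Tuple[int, int, int, int]  # (row_offset, col_offset, height, width) in pixels


def tile_windows(height: int, width: int, tile: int, overlap: int) -> Iterator[Window]:
    """Yield overlapping tile windows that together cover a height x width raster.

    Neighboring tiles share `overlap` pixels; tiles on the right and bottom
    edges are clipped to the raster extent.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be in [0, tile)")
    stride = tile - overlap
    # Stop once the previous tile already reaches the raster edge.
    for row in range(0, max(height - overlap, 1), stride):
        for col in range(0, max(width - overlap, 1), stride):
            yield (row, col, min(tile, height - row), min(tile, width - col))


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of `size` (the last batch may be shorter)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    windows = list(tile_windows(1000, 1000, tile=448, overlap=64))
    print(len(windows))  # 9 windows (3 x 3)
    print([len(b) for b in batches(windows, 4)])  # [4, 4, 1]
```

Each batch of windows would be read from the raster, passed through the detector together, and its boxes shifted back by the window offsets.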

Students experimented with non-maximum suppression (NMS) post-processing of the raw detections. They were also asked to reason about further steps to improve detection accuracy.
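Non-maximum suppression keeps the highest-scoring box in a cluster of overlapping detections and discards the rest. Below is a minimal greedy version. It assumes boxes are `(xmin, ymin, xmax, ymax)` pixel tuples and uses an intersection-over-union (IoU) threshold of 0.5, an illustrative default rather than a value from the course. In practice it would be run separately for each class, so that a palm never suppresses a house.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest score and keep a box only
    if its IoU with every already-kept box is at most `iou_threshold`.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    detections = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    confidences = [0.9, 0.8, 0.7]
    print(nms(detections, confidences))  # [0, 2]: box 1 overlaps box 0 with IoU ~0.68
```

Lowering the threshold suppresses more aggressively. That helps with duplicate palm detections but can merge genuinely adjacent crowns.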

One of the student submissions: using a trained Single Shot MultiBox Detector model inside ArcGIS Pro to automatically find thousands of palms and hundreds of buildings in aerial imagery.

The resulting feature layers with palm tree and house detections were published to ArcGIS Online as hosted feature services and submitted that way for grading. The students’ Jupyter Notebooks with the Single Shot Detector training and validation code were submitted for evaluation via Gradescope.

On average, every student spent about 5–6 hours of GPU time while experimenting, training, and running inference with various Single Shot Detector models.

As a result, we received strong, positive feedback from students and faculty, a few solid internship applications, and a request to repeat and extend these practical exercises in upcoming class offerings.

The importance of “Geospatial” in Machine Learning

In one of the lectures, students worked through a remarkable example: training a scikit-learn Random Forest Regressor to predict child asthma rates from partial census-tract data for Connecticut. The trained Regressor was then used to predict asthma rates for the census tracts that had no rate values.

The results with the scikit-learn Random Forest Regressor were not ideal but still fairly good. R² on the test set was 0.704, with the following explanatory variable importances (these variables were added to the original census-tract data using the ArcGIS Online GeoEnrichment service):
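For reference, R² compares the model's squared prediction errors with those of a constant predictor that always outputs the mean of the observed values: R² = 1 - SS_res / SS_tot. Below is a plain-Python sketch of that definition, which is the same one scikit-learn's `r2_score` uses. The sample values in the usage lines are made up.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared errors
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # errors of predicting the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    print(r2_score([1, 2, 3], [1, 2, 3]))  # 1.0: perfect predictions
    print(r2_score([1, 2, 3], [2, 2, 2]))  # 0.0: no better than predicting the mean
```

Read this way, a test-set R² of 0.704 means the residual sum of squares was about 30% (1 - 0.704 = 0.296) of the total sum of squares.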

1. Smoked cigarettes in last 12 months: Percent = 34%

2. ACS HHs:Inc at/Above Poverty Level: Percent = 33%

3. 2018 Unemployment Rate = 20%

4. 2018 Median Household Income = 5%

5. 2018 Education: High School Diploma, Percent = 4%

6. 2018 Education: Bachelor’s Degree, Percent = 4%

Then, the set of six explanatory variables above was expanded to include distance-to-roads, road-density, and pollution-proximity rasters. These establish geospatial relationships between each census tract and both the transportation graph and air pollution sources.
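The post does not say how the proximity and density rasters were produced; in ArcGIS this would typically be done with dedicated raster tools. As a toy illustration only, the sketch below assumes a small 0/1 grid where 1 marks a road cell. It computes a brute-force Euclidean distance-to-road raster and a simple neighborhood road density. `distance_raster` and `road_density` are hypothetical helpers. This density is the fraction of road cells in a window, a simplification of length-per-area line density.

```python
import math
from typing import List

Grid = List[List[int]]  # 1 = road cell, 0 = no road


def distance_raster(roads: Grid, cell_size: float = 1.0) -> List[List[float]]:
    """Euclidean distance from every cell to the nearest road cell, in map units.

    Brute force, O(cells x road cells): only suitable for small illustrative grids.
    """
    road_cells = [(r, c) for r, row in enumerate(roads) for c, v in enumerate(row) if v]
    if not road_cells:
        raise ValueError("grid contains no road cells")
    return [
        [cell_size * min(math.hypot(r - rr, c - rc) for rr, rc in road_cells)
         for c in range(len(roads[0]))]
        for r in range(len(roads))
    ]


def road_density(roads: Grid, radius: int = 1) -> List[List[float]]:
    """Fraction of road cells in a (2*radius+1)^2 window, clipped at grid edges."""
    h, w = len(roads), len(roads[0])
    out: List[List[float]] = []
    for r in range(h):
        row: List[float] = []
        for c in range(w):
            window = [roads[i][j]
                      for i in range(max(0, r - radius), min(h, r + radius + 1))
                      for j in range(max(0, c - radius), min(w, c + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


if __name__ == "__main__":
    grid = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
    print(distance_raster(grid, cell_size=30.0))
    print(road_density(grid, radius=1))
```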

Since the scikit-learn Regressor does not work with raster data out of the box, students were shown the results of using the Forest-based Classification and Regression tool from the Spatial Statistics toolbox to train a new ArcGIS Random Forest Regressor. The results were truly impressive. With the additional cost rasters, the ArcGIS Random Forest Regressor achieved an R² of 0.876 on the test set. That is an absolute gain of 0.172 (more than 17 points on the R² scale, or about 24% in relative terms) over the original scikit-learn Regressor, which did not use the geospatial components of the data.

Road density raster used in training the ArcGIS Random Forest Regressor to establish a geospatial relationship between census tracts and the transportation graph.
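Because scikit-learn expects one row of features per census tract, one common workaround is to summarize each raster within each tract, for example as a zonal mean, and append the result as a new column. This is an assumption here, not what the class did. A minimal sketch, assuming the tracts have already been rasterized to a grid of integer zone ids aligned cell-for-cell with the value raster (negative ids mark cells outside any tract):

```python
from collections import defaultdict
from typing import Dict, List


def zonal_mean(values: List[List[float]], zones: List[List[int]]) -> Dict[int, float]:
    """Average the `values` raster within each zone id of a same-shaped `zones` grid.

    Cells whose zone id is negative are treated as NoData and skipped.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone < 0:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


if __name__ == "__main__":
    density = [[1.0, 2.0],
               [3.0, 4.0]]
    tract_ids = [[0, 0],
                 [1, -1]]
    print(zonal_mean(density, tract_ids))  # {0: 1.5, 1: 3.0}
```

The resulting per-tract values can then be joined to the census-tract table by zone id, alongside the GeoEnrichment variables.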

Here are the explanatory variable importances according to the ArcGIS Forest-based Classification and Regression model:

1. 2018 Median Household Income = 27%

2. ACS HHs: Inc Below Poverty Level: Percent = 22%

3. Smoked cigarettes in last 12 months: Percent = 16%

4. 2018 Unemployment Rate = 8%

5. 2018 Education: Bachelor’s Degree: Percent = 7%

6. ROADDENSITY (raster) = 6%

7. AIRQUALITYEBK (raster) = 4%

8. DISTANCETOAIRTOXICRELEASES (raster) = 4%

9. DISTANCETOPRIMARYSECONDARYROADS (raster) = 3%

10. 2018 Education: High School Diploma: Percent = 3%

The value of geospatial: by adding proximity rasters to the Random Forest Regressor, students saw the test-set R² rise by more than 17 points over the original results achieved with non-spatial variables.

Acknowledgments

Special thanks to John Meza and his team, Microsoft Azure team, for setting up the virtual machines that UC San Diego students used to train the neural networks.

Class Instructors: Ilya Zaslavsky, Dmitry Kudinov

Teaching Assistants: Ashin George, Hammadabdullah Ayyubi


Dmitry Kudinov is a Senior Principal Data Scientist at Esri Inc., researching AI applications in remote sensing and transportation.