Using Snowflake Cortex AI to explain geospatial AI and ML models to users.

Introduction

CARTO is the leading Location Intelligence platform, and our mission is to make geospatial analytics accessible to all audiences, not just Geographic Information System (GIS) experts: an estimated 80% of data generated has a location component, yet only 10% of it is actually used. The emergence of generative AI is accelerating access to complex problem solving, and we're excited about the impact it will have on geospatial analytics. A key way CARTO users are leveraging GenAI today is to explain complex models to people who need to understand why and where something is happening. In this blog post, we show how we easily integrated Snowflake Cortex AI (Snowflake's generative AI service) to generate plain-English wildfire risk assessments for specific regions.

Core problem

At CARTO, we help users create geospatial map dashboards, models, and workflows. A key need we've noticed is that when a data scientist creates an ML-based score or index for a particular prediction and maps it, business users often face a significant barrier to understanding it.

To use a real-world example: imagine receiving a letter informing you that your insurance premium is increasing due to a fire risk index. Knowing there's a risk is not enough: homeowners want to understand the cause, and insurance companies are receiving more requests for transparency about how they rate different locations.

GenAI can lower the barrier to understanding and increase transparency for users. Specifically, it can interpret complex machine learning results produced by a data scientist and efficiently explain them to a user in plain English. In the screenshot below, you can see why a particular area (hexagon) in the Bay Area has a high risk.

Today, we’ll walk through how we apply GenAI to explain a composite index for wildfires in California with the following steps:

  1. Rank areas in California from 1 (No Risk) to 5 (Very High) by creating a composite index with a geospatially weighted random forest model using CARTO's Native App.
  2. Use Cortex AI’s COMPLETE Function to explain the weights at each location in plain language.
  3. Visualize results so users can easily access and interact with the data using Streamlit.

Step 1: Creating a Composite Index

Wildfire is a complex, dynamic process influenced by various natural and human factors such as wind, vegetation, and urbanization. These factors affect distinct regions differently, making it difficult to develop accurate and localized risk models. We will take advantage of CARTO's Analytics Toolbox (a Native App available on the Snowflake Marketplace) to create a model and compute spatial composite indicators for each area in California. A composite indicator is an aggregation of variables that measures complex, multidimensional concepts that are difficult to define and cannot be measured directly. To derive a spatial score, two main functionalities are available:

  • Aggregation of individual variables, scaled and weighted accordingly, into a spatial composite score (CREATE_SPATIAL_COMPOSITE_UNSUPERVISED)
  • Computation of a spatial composite score as the residuals of a regression model which is used to detect areas of under- and over-prediction (CREATE_SPATIAL_COMPOSITE_SUPERVISED)

We'll use a supervised scoring method that leverages a regression model to relate an outcome of interest (wildfires) to a set of variables (temperature, wind, land cover, etc.). Based on the model residuals, it detects areas of under- and over-prediction. Below is the CREATE_SPATIAL_COMPOSITE_SUPERVISED call we used to create the score. Users also have the option to build this step as a no-code workflow.

-- Compute a supervised index according to the expected number of wildfires (random forest residuals)
CALL CARTO.CARTO.CREATE_SPATIAL_COMPOSITE_SUPERVISED(
  $$
  SELECT
    vars.h3,
    vars.temp_avg,
    vars.wind_avg,
    vars.land_type,
    acc.wildfire_count
  FROM WILDFIRE_VARIABLES_H3 vars
  LEFT JOIN WILDFIRE_K1_H3 acc
    ON vars.h3 = acc.h3
  WHERE wildfire_count > 0
  $$,
  'h3',
  'WILDFIRE_INDEX_SUPERVISED_H3',
  $$
  {
    "model_options": {
      "input_label": "wildfire_count",
      "scaler": {
        "class": "snowflake.ml.modeling.preprocessing.MinMaxScaler"
      },
      "regressor": {
        "class": "snowflake.ml.modeling.ensemble.RandomForestRegressor",
        "options": {
          "criterion": "absolute_error"
        }
      }
    },
    "r2_thr": 0.5
  }
  $$
);
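For teams driving this step from Python rather than a SQL worksheet, the same stored procedure can be invoked through Snowpark's `Session.call`. The helper below is a minimal sketch: the wrapper function name is ours, and it assumes an already-configured Snowpark `Session` (connection setup is omitted).

```python
# Sketch: call the CARTO Analytics Toolbox procedure from Python.
# `session` is any object exposing Snowpark's Session.call(name, *args)
# interface; creating and configuring the real Session is omitted here.

PROC_NAME = "CARTO.CARTO.CREATE_SPATIAL_COMPOSITE_SUPERVISED"

def create_supervised_index(session, query, index_column, output_table, options_json):
    """Run the supervised composite-index procedure with positional
    arguments, mirroring the SQL CALL above, and return its result."""
    return session.call(PROC_NAME, query, index_column, output_table, options_json)
```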

The result is a score from 1 to 5 that we codify as 1 (No Risk), 2 (Very Low), 3 (Low), 4 (High), 5 (Very High). To dive deeper into how best to create wildfire risk models, you can read this study we published recently.
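The codification itself is simple; as an illustration, a small (hypothetical) Python mapping from the numeric score to the label shown on the map could look like this:

```python
# Illustrative helper: map the 1-5 composite score to its risk label.
# The label set mirrors the ranking described above; the function name
# is ours, not part of the CARTO Analytics Toolbox.
RISK_LABELS = {1: "No Risk", 2: "Very Low", 3: "Low", 4: "High", 5: "Very High"}

def risk_label(score: int) -> str:
    """Return the plain-language label for a composite risk score."""
    if score not in RISK_LABELS:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return RISK_LABELS[score]
```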

Step 2: Explaining the index using Snowflake Cortex AI

Integrating Generative AI using Snowflake’s Cortex AI service makes it possible to summarize map data in natural language for our users. We generated explanations of why there is a “very high” wildfire risk in the above map of California using the Snowflake Cortex AI COMPLETE function, which allows us to choose a model and define a prompt in SQL.

To get this explanation, we passed the calculated wildfire risk along with its contributing factors for every H3 cell across California. We created a new column in our Snowflake dataset to store all the responses from the Cortex COMPLETE function.

The Snowflake Cortex AI prompt below is used to create and write Cortex output to a new column:

SELECT *, SNOWFLAKE.CORTEX.COMPLETE(
  'mixtral-8x7b',
  CONCAT('Wildfire risk in a certain place in August is ', WRI_KMEANS_5_CAT_JOINED,
    '. Please explain it in a couple of sentences using the following information about this place: ',
    'monthly average temperature is ', TAVG_AUG, ' Celsius, ',
    'monthly maximum temperature is ', TMAX_AUG, ' Celsius, ',
    'August precipitation is ', PREC_AUG, ' mm, ',
    'average wind speed is ', WIND_AUG, ' m/s, ',
    'average vapour pressure is ', VAPR_AUG, ' Pa. ',
    'Make your explanation as short as possible.'
  )
) AS wildfire_risk_explanation  -- column alias is illustrative
FROM WILDFIRE_INDEX_SUPERVISED_H3;
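Before running the query over millions of H3 cells, it can help to iterate on the wording locally. The sketch below assembles an equivalent prompt in plain Python; the function and its argument names are illustrative, not part of the Cortex API:

```python
# Hypothetical helper: build the same per-cell prompt in Python for
# quick local iteration before sending it through Cortex COMPLETE.
def build_prompt(risk_label, tavg, tmax, prec, wind, vapr):
    """Assemble the wildfire-risk prompt for one H3 cell."""
    return (
        f"Wildfire risk in a certain place in August is {risk_label}. "
        "Please explain it in a couple of sentences using the following "
        "information about this place: "
        f"monthly average temperature is {tavg} Celsius, "
        f"monthly maximum temperature is {tmax} Celsius, "
        f"August precipitation is {prec} mm, "
        f"average wind speed is {wind} m/s, "
        f"average vapour pressure is {vapr} Pa. "
        "Make your explanation as short as possible."
    )
```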

Example output from the Cortex AI COMPLETE Function using the prompt above:

The wildfire risk in this place in August is high due to several factors. The average temperature is already warm at 24.07 Celsius, but the monthly maximum of 32.27 Celsius can create even drier conditions. Additionally, the precipitation is very low at only 4.43 mm for the month, leaving the landscape parched. The average wind speed of 2.9 m/s can help fan the flames of any fires that do start. Lastly, the average vapor pressure of 1.12 Pa indicates low atmospheric moisture, which can further exacerbate wildfire conditions

As with step 1, this can also be done from CARTO's interface, which has Cortex AI nicely integrated, as shown below:

Step 3: Visualize results

We accomplished two key steps: creating a composite index to measure complex wildfire risk and using Snowflake Cortex AI to explain this index in natural language.

To make all this data easy to use, we needed a visual and interactive front end. We've created a new interactive app on Streamlit to visualize wildfire risk in California. Using CARTO and Cortex AI, we've ranked areas by risk level and explained them in natural language. Our app provides an easy-to-use interface for users to access and interact with the data. Check out the Streamlit app here (special thanks to the ActionEngine team for getting this Streamlit integration done)!

Here is the example code needed to create a CARTO map in Streamlit:

import pydeck as pdk
import streamlit as st
from carto_auth import CartoAuth
from pydeck_carto import get_layer_credentials
from pydeck_carto.layer import MapType
from pydeck_carto.styles import color_bins

# Authenticate with CARTO
carto_auth = CartoAuth.from_oauth()

# Render a CARTO layer in pydeck
listings_layer = pdk.Layer(
    "CartoLayer",
    data="SELECT * FROM carto-demo-data.demo_tables.losangeles_airbnb_data",
    type_=MapType.QUERY,
    connection=pdk.types.String("carto_dw"),
    credentials=get_layer_credentials(carto_auth),
    get_fill_color=color_bins("price_num", [30, 100, 150, 300], "Sunset"),
    point_radius_min_pixels=2.5,
    opacity=0.4,
    pickable=True,
    stroked=False,
)

map_style = pdk.map_styles.CARTO_ROAD
view_state_la = pdk.ViewState(latitude=34, longitude=-118.4, zoom=9, pitch=20, bearing=30)
tooltip = {"html": "Price: <b>{price_num}</b>", "style": {"color": "white"}}
la_listings = pdk.Deck(listings_layer, map_style=map_style, initial_view_state=view_state_la, tooltip=tooltip)

# Add the map to Streamlit
st.pydeck_chart(la_listings)

Conclusion

With the right data and a few steps, we've used Snowflake Cortex AI to explain in plain English why a risk assessment for a particular area is high or low. This real-world example shows the advantage of having ML, LLMs, and apps right next to where the data lives. You've seen how easy it is to call functions like Cortex AI's COMPLETE and how our Workflows UI has incorporated it as well.

And this is just the beginning. If you’d like to see where we’re going next, check out CARTO’s AI Agents, where we will empower users by allowing them to talk to the map.
