Leveraging Geospatial for Life Sciences

In collaboration with CARTO (Jaime Sanchez, Noah Smolen and Helene Mckenzie)

Location data is becoming increasingly important in today’s connected world, as it can reveal patterns and relationships that may be missed in traditional data analysis. Snowflake allows users to leverage specialized geospatial data in order to make decisions at scale. By seamlessly integrating with leading tools in the spatial analytics space — such as our partner CARTO — Snowflake is enabling Life Sciences customers to analyze billions of geospatial objects in a scalable manner, as well as combine geospatial intelligence with other analytical techniques.

This blog highlights the relevance of geospatial data within life sciences through a few use cases.

The context

Introduction to Geospatial analytics and data

Geospatial data encompasses a wide range of data and information that describes phenomena occurring at specific locations on Earth. It includes anything from the precise location of a single point, such as a landmark, to expansive datasets that map out patterns across entire cities, countries — or even the globe.

Modern analytics systems allow users to interact with geospatial data, perform spatial analysis, and generate maps that communicate complex spatial information clearly and effectively. This is often referred to as “Location Intelligence”.

The visualization of geospatial data thus transforms raw numbers and coordinates into meaningful, accessible information that supports decision-making across a wide range of domains, including urban planning, environmental management, disaster response, and last-mile delivery in retail.

Defining key terms in dealing with Geospatial data

In order to understand geospatial applicability in lifesciences, we need to understand certain fundamental definitions of this type of data and how systems store and interact with it. Some key concepts to understand include:

  • Vector data represents discrete geographic features, including points, lines, and polygons. Each feature has a specific location in space, as well as various attributes. Examples in life sciences could include a series of points representing customer locations, or a line representing the optimized delivery route for a pharmaceutical product.
  • Raster data represents the Earth’s surface as a continuous grid of cells or pixels, where each cell has a value representing a specific attribute, such as elevation or land cover.
  • Geodatabases allow users to store and manage various types of geospatial data within a single database environment, allowing for efficient data retrieval, analysis, and management.
  • H3 is a hierarchical hexagonal grid system that divides the Earth’s surface into hexagonal cells at multiple resolutions. Each cell has a unique identifier called an H3 index, which encodes both the location and the resolution of the hexagon. The hexagonal shape suits spatial analysis because it minimizes edge effects and keeps the distance between neighboring cell centers consistent. By encoding location in a short H3 index rather than a complex geometry, H3 provides efficient spatial indexing, which simplifies geospatial queries such as finding nearby points, aggregating data within a region, and performing spatial joins. You can learn more about H3 for optimizing spatial analytics in this blog.
  • Geospatial data visualization tools typically allow users to build a map composed of many layers, ranging from raw physical topographies to infrastructures like roadways, as well as less “tangible” layers like demographics and points of interest. Typically, the user can toggle each of these layers on or off in order to analyze them individually or combined, giving them different insights. For example, a couple of datasets available to toggle or overlay can include:

Demographics: A layer that provides insights into the population distribution, density, age groups, income levels, and other socio-economic factors. It’s often represented through shaded areas or thematic maps.

Points of Interest (POIs): This includes locations of significant places like schools, hospitals, parks, restaurants, and tourist attractions. These points are usually marked with symbols or icons.
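As a rough illustration of why a short cell identifier simplifies aggregation, the sketch below buckets points into square latitude/longitude cells at two cell sizes. This is a deliberately simplified stand-in for H3: real H3 cells are hexagonal and hierarchical, and the function name `cell_id`, the cell sizes, and the sample coordinates are all assumptions made for this example.

```python
import math
from collections import Counter


def cell_id(lat, lon, cell_deg):
    """Return a (row, col) identifier for the square cell containing the point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


# Hypothetical customer locations as (latitude, longitude) pairs.
points = [
    (42.3601, -71.0589),
    (42.3612, -71.0570),
    (42.3736, -71.1097),
    (40.7128, -74.0060),
]

# Coarse cells (1 degree) versus finer cells (0.01 degree),
# loosely analogous to low and high grid resolutions.
coarse = Counter(cell_id(lat, lon, 1.0) for lat, lon in points)
fine = Counter(cell_id(lat, lon, 0.01) for lat, lon in points)

for cell, count in sorted(fine.items()):
    print(cell, count)
```

Once every point carries a cell identifier, counting points per region becomes an ordinary group-by on that identifier rather than a geometric containment test, which is the property that makes grid indexes convenient for aggregation and joins.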

Figure 3: No-code spatial analysis with CARTO Workflows

The Industry Context

Leveraging Geospatial analytics in Lifesciences

Life sciences organizations deal with location information in many ways. The most common example is the supply chain, where, as in retail, we look for warehouse and route optimization solutions.

For example, a pharmaceutical company can use geospatial analytics to calculate the ideal path for vehicles to efficiently supply and distribute drugs. It can also build alternate routes for foreseeable external events, based on real-time feeds on constraints such as traffic, construction work, and road closures.

However, one of the more overlooked aspects of geospatial analytics, and one that could have a transformative impact, is a concept called geographic resolution. It can be used to accelerate clinical trials or to increase sales impact by directing field sales teams to the right healthcare professionals (HCPs).

Geographic resolution refers to the spatial detail of your data. High resolution data refers to data where spatial phenomena are depicted in a high amount of detail, such as a grid where each cell represents 5x5 meters. Low resolution data is the reverse; for example a grid cell might be 100x100 miles.

By transforming and analyzing data at various levels of detail, users can unlock insights for various use cases. For example, we can use geographic resolution to swiftly convert different types of geographic entities into precise points on a map.

With geographic resolution, we can translate data from a precise address point or latitude/longitude pair to street-level data, to a zip-code, to a county — and back again!
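To make the idea of moving between resolutions concrete, here is a minimal sketch that counts the same patient records at zip-code level and at county level. The records, the field layout, and the `roll_up` helper are hypothetical and exist only for illustration; in practice the zip-to-county mapping would come from a reference dataset rather than being stored on each row.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, zip_code, county).
records = [
    ("p1", "02114", "Suffolk"),
    ("p2", "02114", "Suffolk"),
    ("p3", "02139", "Middlesex"),
    ("p4", "02138", "Middlesex"),
]


def roll_up(rows, level):
    """Count patients at the requested geographic level ('zip' or 'county')."""
    index = {"zip": 1, "county": 2}[level]
    counts = defaultdict(int)
    for row in rows:
        counts[row[index]] += 1
    return dict(counts)


print(roll_up(records, "zip"))
print(roll_up(records, "county"))
```

Note that rolling up is lossy: going "back again" from county to zip code only works because each record keeps its finer-grained key alongside the coarser one.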

Geographic resolution also helps facilitate layered visualizations.

Layered visualizations allow multiple datasets to be overlaid in a dashboard to provide comprehensive insights.

For example, a heatmap showing the intensity of data points, such as patient density or disease prevalence, could be overlaid on a layer in which different shades or colors represent values across geographic areas, such as average income or access to healthcare services.

This way, an end user can zoom in/out to understand country level, state level, county level or street level distributions.

Use cases across value chain

Within life sciences, these capabilities come to life in three scenarios:

Site Mastering/Resolution for Clinical Trials: In site mastering, a single site or hospital care center may be known by many different names, and geographic resolution can help consolidate them into a single view. For example, MGB is a site that could also be called Mass General Brigham but is still distinct from Mass General West, which makes it challenging to build a consolidated view of site-related metrics for gauging trial performance. Geographic resolution can also help consolidate sites based on other attributes, such as zip code or address, to pinpoint the one where a patient receives care.
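One simple way to picture site mastering is to group site records on a normalized address key rather than on the site name. The sketch below assumes hypothetical records and made-up street addresses, and the helpers `address_key` and `master_sites` are invented for this example; a production approach would more likely match on geocoded coordinates than on cleaned-up strings.

```python
import re
from collections import defaultdict


def address_key(street, zip_code):
    """Build a crude match key: lowercase, drop punctuation, collapse spaces, append ZIP."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", street.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return f"{cleaned}|{zip_code}"


# Hypothetical site records: (name, street, zip_code). Addresses are illustrative only.
site_records = [
    ("MGB", "100 Main St.", "02114"),
    ("Mass General Brigham", "100 main st", "02114"),
    ("Mass General West", "52 Second Ave", "02451"),
]


def master_sites(rows):
    """Group site names that share the same normalized address key."""
    groups = defaultdict(list)
    for name, street, zip_code in rows:
        groups[address_key(street, zip_code)].append(name)
    return dict(groups)


mastered = master_sites(site_records)
for key, names in mastered.items():
    print(key, "->", names)
```

Here the two differently named records at the same address collapse into one mastered site, while the record at a different address stays separate.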

Patient Recruitment & Site Optimization for Clinical Trials: Driving engagement to the right sites to help with patient recruitment and retention is key for life sciences companies and clinical research organizations (CROs). Speed is crucial for timely decision-making in clinical trials, where understanding patient distribution and access to care can significantly accelerate recruitment timelines and help anticipate the potential for dropouts. Detailed geographic resolution helps target recruitment efforts by identifying clusters of eligible patients, and it can identify optimal locations for trial sites based on patient proximity and access to facilities. Recruitment efforts can then be directed to the intended target or site.
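As a sketch of what "optimal location based on patient proximity" can mean, the example below picks, from a set of candidate sites, the one with the smallest mean straight-line distance to a group of eligible patients. The coordinates, the site names, and the choice of mean great-circle distance as the criterion are assumptions for illustration; a real analysis would likely use travel time and additional constraints.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical eligible patients and candidate trial sites as (lat, lon).
patients = [(42.36, -71.06), (42.37, -71.11), (42.34, -71.10)]
candidate_sites = {
    "site_a": (42.36, -71.07),
    "site_b": (42.65, -71.30),
}


def mean_distance(site, people):
    """Average distance in kilometers from one site to all patients."""
    return sum(haversine_km(site, p) for p in people) / len(people)


best_site = min(candidate_sites,
                key=lambda name: mean_distance(candidate_sites[name], patients))
print(best_site)
```

The same scoring function can be swapped for a different criterion, such as the share of patients within a fixed radius, without changing the overall shape of the selection step.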

HCP Targeting for Commercial operations: HCP targeting involves precision in identifying and engaging healthcare professionals. By clustering patient demographics around specific clinics or health centers, companies can pinpoint areas with the highest concentration of potential patients for particular therapies. This data-driven approach allows sales teams to identify the closest and most relevant sites within a county or community. Targeted discussions can then be conducted with HCPs at these locations, focusing on therapies that meet the specific needs of the local population.

The Platforms

Snowflake and Geospatial analytics

Traditional Geographic Information Systems (GIS) may have limitations when it comes to handling location data at scale, such as limited processing power, lack of spatial analysis capabilities, and difficulty integrating with other systems. These systems are often not designed to handle large volumes of data that deal with spatial coordinates. They may also lack the necessary tools and functionality to perform spatial analysis on location data, such as mapping or geocoding, which can limit the insights that can be gained from the data.

Specialized systems and technologies, such as Snowflake geospatial and CARTO, are needed to handle this data at scale and unlock its full potential for analysis and decision-making.

Figure 1 below summarizes the native, out-of-the-box capabilities and functions available within Snowflake.

Figure 1: An overview of Snowflake geospatial capabilities

CARTO and their partnership with Snowflake

CARTO is a fully cloud-native geospatial analytics platform, meaning all computation is pushed down to Snowflake, allowing for unparalleled scalability and security. It provides users with a highly specialized toolkit for making decisions with their spatial data. Users can benefit from the following:

  • Analytics Toolbox: a suite of advanced analytical tools, including tools for creating tilesets, H3 indexes, hotspots, and isolines.
  • Workflows: a no-code solution for building complex spatial processes.
  • Builder: a data visualization and dashboard tool which allows you to generate maps that communicate complex information seamlessly with fast response times.
  • Data Observatory: a hub of over 12,000 spatial datasets which can be accessed via the CARTO platform or through the Snowflake Data Marketplace.

Figure 2: List of CARTO tools available in Snowflake Marketplace

CARTO’s mission is to make advanced geospatial analysis accessible to modern data stack users, such as through their Analytics Toolbox and no-code interface CARTO Workflows (pictured below in Figure 3).

The solution

Building a clinical trial patient site selection use case with Snowflake and CARTO

We can string these concepts together now to quickly build an application that aims to visualize and solve the site optimization problem for clinical trial operations. Here we leverage CARTO’s Native App to geocode, cluster and enrich clinical sites with key characteristics for a cohort site selection.

The solution is outlined in the diagram below and follows these steps:

Figure 4: Geospatial analytics with CARTO for identifying patients close to clinical trial sites
  1. Data Collection: Patient information is gathered from various trial sites, capturing key characteristics crucial for the study.
  2. Data integration from marketplace: By overlaying patient data with external datasets from the marketplace, including Social Determinants of Health (SDOH) and points of interest, we gain a richer context.
  3. Geocoding and Spatial Transformation: As mentioned earlier, geospatial tools excel at resolving multiple geographic entities efficiently. They can transform zip codes, latitude/longitude coordinates, and addresses into precise points on a map. By leveraging CARTO’s GEOCODE function to convert addresses into geographic points, we can transform input geographic data into an appropriate geographic resolution and harmonize it across sources for precise spatial analysis.
  4. Site Selection Optimization: By leveraging pre-built CARTO UDFs such as GEOCODE, CREATE_ISOLINES, and ST_BUFFER, you can determine the closest site for each patient and create a heatmap to identify areas where travel to care facilities takes longest. This insight helps optimize site selection and improve patient access based on different parameters (e.g., site accessibility).
  5. Patient Clustering and Visualization: Build interactive dashboards in CARTO to visualize patient data across different geographic levels, refining site selection and recruitment strategies. CARTO enables the creation of fast, simple dashboards within minutes. These dashboards can cluster patients around a point of interest, such as a care center, based on their characteristics. This capability is instrumental in driving recruitment for clinical trials.
  6. Dashboard Creation and Analysis: The CARTO Native App on Snowflake allows users to zoom in to a block level or out to a state level. These interactive dashboards help visualize patient data across different geographic levels, making the insights easier to consume.
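The logic behind steps 4 and 5 can be sketched in plain Python, independent of any platform: assign each patient to the nearest site, then bucket patients into distance bands that a heatmap could color. This is not CARTO's implementation and does not call its functions. Straight-line distance stands in for the travel-time isolines mentioned above, and the coordinates, site names, and band thresholds are all hypothetical.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle (haversine) distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical geocoded care centers and patients as (lat, lon).
sites = {"center_1": (42.36, -71.07), "center_2": (42.65, -71.30)}
patients = {"p1": (42.36, -71.06), "p2": (42.60, -71.35), "p3": (41.90, -70.50)}


def nearest_site(point):
    """Return (site_name, distance_km) for the site closest to the point."""
    name = min(sites, key=lambda s: distance_km(point, sites[s]))
    return name, distance_km(point, sites[name])


def band(km):
    """Bucket a distance into a coarse band; thresholds are illustrative."""
    if km <= 5:
        return "0-5 km"
    if km <= 25:
        return "5-25 km"
    return "25+ km"


assignments = {pid: nearest_site(pt) for pid, pt in patients.items()}
band_counts = Counter(band(km) for _, km in assignments.values())

for pid, (site, km) in assignments.items():
    print(pid, site, round(km, 1), band(km))
```

Patients that fall into the farthest band are the ones a heatmap would highlight as having the longest access to care, which is the signal used to reconsider where sites are placed.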

The outcome is an enhanced experience for end users, who can visualize patient data across different geographic levels in a form that is easier to consume, as shown in the animation below:

Figure 5: Heatmap of patients distributed across locations based on distance to a hospital center

You can view the live interactive version of this demo in the link here

The Conclusion: What do Snowflake and CARTO unlock?

Through its fully cloud-native connection to Snowflake, CARTO gives users unparalleled scalability and integration for their spatial analysis. This enables Life Sciences organizations to process vast amounts of geospatial data seamlessly, uncovering patterns and insights that are crucial for optimizing operations and improving outcomes. It helps optimize site selection, improve patient recruitment, and ensure efficient resource allocation, ultimately leading to more successful and impactful trials.

Key benefits of this joint approach include:

  • Streamlined Deployment and Maintenance: With one-click installation through the Snowflake Marketplace, CARTO can be deployed as a Native App in Snowflake! This can save you valuable time and resources in deployment and ongoing maintenance. Users can be confident they are always using the latest version and patches of CARTO, complete with all the newest features.
  • Enhanced Performance: By residing within the Snowflake environment, CARTO’s performance is optimized, as data processing occurs in the same place as the data source.
  • Heightened Security: With data never leaving Snowflake’s secure environment, organizations can work with larger datasets and complex analytics operations in a safe and controlled manner.

Learn more about leveraging CARTO inside Snowflake here. For more details on building such solutions, please reach out to your Snowflake representative.

Additional reference: Read up on CARTO and life sciences
