Innovation: Building an API in Azure for the Snowflake Data Cloud

How Hashmap, an NTT DATA Company, Delivered an API to Securely Access Data from Snowflake for a Multinational Gas and Oil Company

Jackson Esco
Hashmap, an NTT DATA Company
May 2, 2022

Key Accomplishments

  • Fast and methodical delivery practices
  • Cost reduction
  • Quality-driven design practices
  • Self-service enablement

Data Stack

  • Data Acquisition tool: Zema
  • Data Sources: 3rd party sources (FTP, REST APIs)
  • Cloud Data Platform: Snowflake Data Cloud
  • Data Consumption: Python-based Azure Function
  • CI/CD: Azure DevOps

The Situation

“Initially, we were brought in to help with the Snowflake data cataloging process, but after taking a look at their needs and priorities, it was decided we work on creating an API to access this catalog.”

–Yu Fai Wong, a Cloud & Data Engineer with Hashmap

The Hashmap team is fortunate to have a great relationship with this multifaceted, multinational oil and gas corporation, and we have completed many projects with them over the past few years. For this project, the Hashmap team was initially tasked with creating a data catalog of the subscription data purchased by the client. The client wanted to track who uses the data and how much they use it, if at all. As the client’s needs and priorities were reviewed over time, the work gradually morphed into an API development project to allow for more controlled consumption of this data.

With our agile delivery principles, the team quickly adapted and re-focused on what would bring the client the most value for their business.

The Challenges

In this existing setup, the client did not have the cloud infrastructure in place to host data access processes, especially for external groups.

The data acquisition tool in place did not fully meet the business requirements: portions of the data were not being captured or defined, and lineage data was missing.

Access to the data was not standardized through a virtualized/open layer. This caused the external groups to create multiple user accounts on their platform to access it, leading to a burden on the administration and additional costs.

Lastly, the security paradigm was kept at an individual level for each external group in a non-federated fashion. This, of course, leads to a higher level of security administration which does not scale well with the growing number of products and their consumers.

What the Requirements Looked Like

“They needed an API that was fast and automated. They wanted to be able to track the data usage of their Snowflake account based on who was calling the API. This API needs to be developed to feed data from Snowflake to other internal applications that need it.”

— Yu Fai Wong

With the evolving scope, the real need was for the data maintained in their Cloud Data Platform to be managed and shared in a uniform and secure way.

This broader need spanned multiple dimensions:

Central Repository

Desire to create a cloud-based repository for corporate data subscriptions, contracts, invoice details, etc.

Data Stewardship

Need for a single data management team to maintain the repository.

Secure Data Access

Make the data more broadly available to other parts of the organization through the customer’s Snowflake Data Cloud, to assist with negotiations and reduce data spending. Think data mesh.

Monitoring for ROI

Monitor and track the usage of the data solution by internal stakeholders to develop an ROI.

Maintain speed via automation

The top focus areas of success, from the client’s perspective, were speed and automation.

Why Hashmap?

The customer chose to partner with Hashmap, an NTT DATA Company, on this solution to accelerate delivery. The project was a collaborative effort between the two parties. It provided common ground for exchanging best practices and helped the client develop an understanding of approaches, patterns, and solution options for these types of data projects.

The Solution

A 4-step process to deliver an API:

  1. Gather the requirements for an externally facing API
  2. Design a secure API service infrastructure using Azure Functions
  3. Build and integrate the API adhering to the client’s security policies
  4. Deploy the API for use

Technologies

The entire infrastructure is hosted and managed by Azure.

The API was implemented using a few key cloud services:

  • Azure Functions
  • Azure API Management
  • Azure Front Door
  • Azure Key Vault
  • Azure Active Directory

Azure Functions

A very cost-effective cloud service that lets you pay-as-you-go, is fast to develop and host, and scales both vertically and horizontally. There are limitations to this service such as memory capacity (approximately 1GB), but the limitations were not negative factors for this implementation.

We implemented the Azure Functions in Python for simplicity and ease of maintenance. Both the client and the Hashmap/NTT DATA teams were well-versed in Python, so the code would be easier to maintain in-house once handed over.

Azure API Management

A scalable API management platform whose advanced security setup with VNets requires premium SKUs. Once set up, it lets users easily control API settings and security.

Azure Front Door

A content delivery network (CDN) service that is easy to set up and provides load balancing and built-in security for internet-facing use at a global scale.

OAuth 2.0

An industry-standard protocol that uses a centrally managed identity keeper for authentication and authorization and does not require storing the user’s credentials in the application.

Roles are used to manage permissions across products and services.

This standard protocol was already in use at the client site, but for a different set of tools such as Power BI, where it authenticated users at the time of access using a token exchange mechanism between Power BI, the IdP, and Snowflake.

The Hashmap/NTT DATA team extended this same OAuth security net to provide secure, gated access to data in the cloud data platform. The diagram below provides an overview of participating entities with a chain of authority to ensure compatibility.

Here, the client access works its way through the following steps:

  1. User obtains an OAuth token from Azure Active Directory (the IdP)
  2. User sends a request to a URL provided by Azure Front Door
  3. Azure Front Door passes the token to Azure API Management
  4. Post security validation, the token is passed to the Azure Function
  5. Azure Function makes the request to Snowflake
  6. Azure Function returns data to the client

Here, the “client” could be a data engineer, an individual user, or other application(s) that require the data the API makes available.
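From the client's side, steps 1 and 2 above come down to attaching the OAuth token to an ordinary HTTPS request. The following is a minimal sketch of that idea, not the delivered client code: the Front Door hostname, the endpoint path, and the token placeholder are all hypothetical, and acquiring the token from Azure Active Directory is left out.

```python
import urllib.request


def build_api_request(base_url: str, endpoint: str, access_token: str) -> urllib.request.Request:
    """Build a GET request that carries an OAuth 2.0 bearer token.

    The request is only constructed here, not sent. In practice it would be
    passed to urllib.request.urlopen() or an equivalent HTTP client.
    """
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={
            # The token is assumed to have been issued by the IdP beforehand.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


# Hypothetical hostname, endpoint, and token placeholder, for illustration only.
request = build_api_request(
    "https://example-api.azurefd.net", "/subscriptions", "<token-from-azure-ad>"
)
```

Because the credentials live in the token rather than in the request body or the application, the same pattern works for an individual user and for another application calling the API.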

“We created this API to secure access without needing to write SQL from any of the clients; they would effectively just call endpoints and receive JSON data back. Their user credentials were handled by Azure AD and Snowflake. No username or passwords are stored anywhere in the application at all. Which is good for things like single-sign-on and being able to use their existing credentials to gain access.”

— Yu Fai Wong
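The "call an endpoint, get JSON back" pattern described in the quote can be sketched as a small handler that maps endpoint names to fixed queries, so callers never supply SQL themselves. This is an illustrative sketch under stated assumptions: the endpoint names, the view names, and the `run_query` callable are hypothetical stand-ins, and the Azure Functions HTTP trigger and the Snowflake connection used in the real service are omitted.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical mapping from endpoint name to a fixed query.
# The view names are invented for illustration.
ENDPOINT_QUERIES: Dict[str, str] = {
    "subscriptions": "SELECT * FROM CATALOG.SUBSCRIPTIONS_V",
    "invoices": "SELECT * FROM CATALOG.INVOICES_V",
}


def handle_request(
    endpoint: str,
    run_query: Callable[[str], List[dict]],
) -> Tuple[int, str]:
    """Return an (HTTP status, JSON body) pair for the requested endpoint.

    run_query is a stand-in for whatever executes SQL against Snowflake
    on behalf of the authenticated caller.
    """
    sql = ENDPOINT_QUERIES.get(endpoint)
    if sql is None:
        return 404, json.dumps({"error": f"unknown endpoint: {endpoint}"})
    rows = run_query(sql)
    return 200, json.dumps({"data": rows})
```

Keeping the SQL on the server side is what lets consumers use the data without knowing SQL or the database schema.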

Automation

The team also set up a CI/CD pipeline in Azure DevOps to automatically deploy the Azure Function code whenever changes are made.

The team added a few other processes, such as “automated view creation.” This process allows the application to scan raw tables and create the appropriate views where they do not yet exist. The views restrict which roles can see the data, which in turn determines how RBAC is applied.
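One way to picture automated view creation is a function that compares the raw tables against the views that already exist and emits SQL only for the gaps. This is a minimal sketch, not the team's implementation: the `_V` naming convention, the role name, and the one-view-per-table layout are assumptions, and identifiers are taken as trusted input rather than user-supplied values.

```python
from typing import Iterable, List


def missing_view_statements(
    raw_tables: Iterable[str],
    existing_views: Iterable[str],
    reader_role: str,
    suffix: str = "_V",  # hypothetical naming convention
) -> List[str]:
    """Return SQL statements that create and grant access to missing views."""
    existing = {view.upper() for view in existing_views}
    statements: List[str] = []
    for table in sorted(raw_tables):
        table_name = table.upper()
        view_name = f"{table_name}{suffix}"
        if view_name in existing:
            continue  # a view already covers this table
        statements.append(
            f"CREATE VIEW IF NOT EXISTS {view_name} AS SELECT * FROM {table_name}"
        )
        # Only the given role may read the view, which is where RBAC applies.
        statements.append(f"GRANT SELECT ON VIEW {view_name} TO ROLE {reader_role}")
    return statements
```

For example, with raw tables `prices` and `contracts` and an existing `PRICES_V` view, only the statements for `CONTRACTS_V` are produced.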

Data Quality

Because of a few inconsistencies with the 3rd party data collection service, the team also implemented a data freshness and notification process so that the appropriate parties would be informed when data did not arrive in Snowflake as expected.
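A freshness check of this kind can be reduced to comparing each table's last load time with an allowed maximum age. The sketch below assumes the last load timestamps have already been read from Snowflake into a dictionary; the table names and the 24-hour threshold are hypothetical, and the notification step itself is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def find_stale_tables(
    last_loaded: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of tables whose latest load is older than max_age."""
    current = now or datetime.now(timezone.utc)
    return sorted(
        table
        for table, loaded_at in last_loaded.items()
        if current - loaded_at > max_age
    )


# Illustrative data: timestamps are timezone-aware UTC values.
reference = datetime(2022, 5, 2, 12, 0, tzinfo=timezone.utc)
stale = find_stale_tables(
    {
        "PRICES": reference - timedelta(hours=2),
        "CONTRACTS": reference - timedelta(hours=30),
    },
    max_age=timedelta(hours=24),
    now=reference,
)
```

The resulting list is what a notification process would send to the appropriate parties.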

Team Integration

The team also helped integrate some of the data science groups directly into the Snowflake structures, since their work required direct access rather than the new API.

“We also made sure all access to the API was “read-only,” so there could never be any real danger to the Snowflake database. We created this API with low maintenance costs in Azure Functions, allowing a lot of space for future scaling, both horizontally and vertically.”

— Yu Fai Wong
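Read-only access is best enforced in the database itself, by granting the API's role nothing beyond SELECT. An application-side guard can add a second layer. The function below is a deliberately conservative sketch and an assumption on our part, not a description of the delivered code: it accepts a single SELECT statement and rejects everything else, including some harmless queries such as those starting with a CTE or containing a semicolon inside a string literal.

```python
import re

# Matches statements whose first keyword is SELECT.
_SELECT_ONLY = re.compile(r"^\s*select\b", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        return False  # more than one statement, or a semicolon we cannot vet
    return bool(_SELECT_ONLY.match(statement))
```

A guard like this complements role-based grants and does not replace them.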

The Outcome

Within 3 months, our team delivered an operational API-based interface that passed a rigorous security audit on the first attempt, demonstrating the team’s “security-first” approach.

This project provided much-needed value to the client, not only through innovation in their data processes but also by reducing administration load and cost of use, thereby lowering TCO.

Yu Fai and the team provided an easy way for our client’s external consumers to access data housed in Snowflake without having to know SQL or the database schema, or to access Snowflake directly. The client is now able to serve 3rd party data to external and internal groups over a REST API, all via single sign-on.

Let’s Do Data and Cloud Together!

At Hashmap, an NTT DATA Company, we work with our clients to build better, together. We are partnering with companies across a diverse range of industries to solve the toughest data challenges — we can help you shorten time to value!

We offer a range of enablement workshops and assessment services, data modernization and migration services, and consulting service packages for designing and building new data products as part of our service offerings. We would be glad to work through your specific requirements. Connect with us here.
