Building a Multi-Cloud Asset Inventory Data Lake with CloudQuery and Snowflake

As public cloud adoption grows, managing cloud computing at scale becomes increasingly complex because of the sheer number of cloud assets and their associated metadata. Use cases such as understanding security posture, managing cloud resources, optimizing cloud cost, and managing asset inventory all require scalable, performant data warehouses and data lakes.

To give our customers a scalable and performant data lake platform, CloudQuery developed a destination plugin for Snowflake. It lets customers build Multi-Cloud Asset Inventory Data Lakes by using CloudQuery to sync data from their cloud platforms to Snowflake.

Customer Needs

When we worked with customers building their Multi-Cloud Asset Inventory Data Lakes, we saw the following themes:

  • Number of resources and assets in the cloud.

We’ve seen customers with millions of cloud resources syncing from thousands of cloud accounts. Managing the configuration of these resources means syncing metadata about each resource and its relations to other assets and resources. Because the configuration of cloud resources changes frequently, some teams sync data on these millions of resources daily and, depending on the use case, occasionally multiple times a day.

  • Growing usage of multi-cloud.

Our customers are adopting multi-cloud and using several cloud providers. They want a single destination to host the data from all of these providers and sources, rather than a separate data platform for each source.

  • Need for a centralized data lake for infrastructure data.

Why Snowflake

Snowflake offers a data platform that can consolidate multiple data sources in one place. A data lake in Snowflake can then become the “single source of truth” on which many different use cases are built and layered.

Given the volume of data and the performance needed to support customers with large-scale cloud infrastructure, Snowflake was a great addition to CloudQuery’s growing list of supported destinations.

CloudQuery and Snowflake

CloudQuery is a high-performance open-source data integration platform that can sync data from multiple sources to different target destinations. Sources include cloud providers such as AWS, Azure, and Google Cloud (GCP). We recently added support for Snowflake as a target destination.

CloudQuery supports syncing data from more than 30 official sources, with extensive support for the cloud providers listed above, and includes the ability to build custom plugins for custom integrations.

A full list of supported source plugins can be found here: https://www.cloudquery.io/docs/plugins/sources

Setup and Examples

To sync data from CloudQuery to Snowflake, the following guides can help you get started:

Syncing AWS to Snowflake: https://www.cloudquery.io/integrations/aws/snowflake

Syncing Azure to Snowflake: https://www.cloudquery.io/integrations/azure/snowflake

Syncing GCP to Snowflake: https://www.cloudquery.io/integrations/gcp/snowflake

Snowflake Destination: https://www.cloudquery.io/docs/plugins/destinations/snowflake/overview

Once the data is loaded into Snowflake, it can be queried with standard SQL.
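
As an illustration, the sketch below counts EC2 instances per account and region. It is a minimal example under stated assumptions, not a query published by CloudQuery: the table name aws_ec2_instances and the columns account_id, region, and instance_id are assumed to match what the AWS source plugin syncs, so check them against your own Snowflake schema. To stay self-contained, the example runs the SQL against an in-memory SQLite database as a stand-in, and the helper names (instances_by_region, demo_rows) are hypothetical.

```python
import sqlite3

# Assumed table/column names; verify against the tables CloudQuery
# actually created in your Snowflake database.
INSTANCES_BY_REGION_SQL = """
SELECT account_id, region, COUNT(*) AS instance_count
FROM aws_ec2_instances
GROUP BY account_id, region
ORDER BY instance_count DESC, account_id, region
"""


def instances_by_region(cursor):
    """Run the query on any DB-API 2.0 cursor and return all rows."""
    cursor.execute(INSTANCES_BY_REGION_SQL)
    return cursor.fetchall()


def demo_rows():
    """Build a tiny stand-in table in SQLite and run the query against it."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE aws_ec2_instances "
        "(account_id TEXT, region TEXT, instance_id TEXT)"
    )
    # Made-up sample data, for illustration only.
    cur.executemany(
        "INSERT INTO aws_ec2_instances VALUES (?, ?, ?)",
        [
            ("111111111111", "us-east-1", "i-1"),
            ("111111111111", "us-east-1", "i-2"),
            ("111111111111", "eu-west-1", "i-3"),
            ("222222222222", "us-east-1", "i-4"),
        ],
    )
    rows = instances_by_region(cur)
    conn.close()
    return rows


if __name__ == "__main__":
    for account_id, region, count in demo_rows():
        print(account_id, region, count)
```

Against Snowflake itself, the same SQL can be pasted into a worksheet, or instances_by_region can be passed a cursor from the Snowflake Python connector, which follows the same DB-API cursor interface.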

If you have questions for the CloudQuery team, visit cloudquery.io and reach out to CloudQuery on GitHub and Discord! We’re excited to hear your feedback about using CloudQuery with Snowflake.
