Centralized logging: Shipping Snowflake logs and audit data to Logz.io

Deepak Shivaji Patil · Published in Globant · Jun 14, 2022

In this article, we will learn how to enable centralized logging for Snowflake. Our centralized logger for this use case will be Logz.io.

So, let’s get started.

Things covered in this article:

  1. A bit of background on tools like Snowflake, Logz.io and the need for centralized logging
  2. Problem with enabling centralized logging in Snowflake
  3. The proposed solution
  4. Prerequisites
  5. Actual implementation steps
  6. Conclusion
  7. Some bonus tips on automating the explained process
  8. References to some useful links

Background:

Why is centralized logging important?

Centralized logging provides two important benefits. First, it places all of your log records in a single location, greatly simplifying log analysis, correlation, and debugging. Second, it gives you a secure storage area for your log data.

What is Snowflake?

Snowflake is a fully managed SaaS (software as a service) that provides a single platform for data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data. Snowflake provides out-of-the-box features like separation of storage and compute, on-the-fly scalable compute, data sharing, data cloning, and third-party tools support to handle the demanding needs of growing enterprises.

What is Logz.io?

Logz.io is a cloud-native observability platform that provides a cloud-based log analysis service built on the open-source ELK Stack (Elasticsearch, Logstash, Kibana). Its features include alerting, user control, parsing services, support, integrations, and an audit trail.

Problem:

Snowflake has no direct built-in integration with Logz.io for exporting logs and audit-related data.

So, we first have to export the Snowflake logs and audit data to a common location (such as an AWS S3 bucket), from which Logz.io can pull the data and present it for visualization and analysis on the Kibana dashboard.

Solution:

The proposed solution is as follows.

We will unload the data from various Snowflake tables into an external stage, and the external stage will copy the data into the AWS S3 bucket.

We will also configure Logz.io to listen to the AWS S3 bucket and to pull the logs.

Once the logs and audit data are available in the AWS S3 bucket, Logz.io will start pulling it and can display it on the Kibana dashboard.

Prerequisite:

  1. Create an AWS S3 bucket that we will use to store the logs
  2. If you want to run the scripts below from the command line, install SnowSQL (Snowflake's command-line client); alternatively, you can run the commands/scripts in the Snowflake UI
  3. You will need the ACCOUNTADMIN role to execute all of the commands below (see the snippet after this list)
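For reference, a minimal session setup before running the scripts (the warehouse placeholder follows the same convention used in the steps below):

use role accountadmin;                     -- role required for the integrations, stages and unload commands in this article
use warehouse <PUT_HERE_WAREHOUSE_NAME>;   -- any running warehouse for the unload queries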

Implementation Steps:

Let’s go step by step now to unload logs and audit data to the AWS S3 bucket.

Please note that we will retrieve data from the tables selectively and store it in files in JSON format.

By default, data unloaded to an external stage is written as CSV.

JSON is more readable and more widely supported, so we will unload the Snowflake data as JSON.

1. Create an IAM role and IAM policy

Refer to the link below and create the IAM role and IAM policy that Snowflake will use to unload data into the S3 bucket.

https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration.html

Only follow Step 1 and Step 2 from the above link.

2. Create S3 storage integration

create or replace storage integration s3_integration
  type = external_stage
  storage_provider = s3
  enabled = true
  storage_aws_role_arn = '<PUT_HERE_AWS_ROLE_ARN>'
  storage_allowed_locations = ('<PUT_HERE_AWS_S3_BUCKET_PATH>');
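After the integration is created, Snowflake generates an IAM user ARN and an external ID that you must add to the trust policy of the IAM role from step 1 (this is covered later in the same Snowflake guide). You can retrieve both with:

-- Look for STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID in the output
-- and paste them into the trust relationship of the IAM role created in step 1.
desc integration s3_integration;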

3. Create a JSON file format

create or replace file format my_json_format
  type = json
  compression = 'gzip'
  null_if = ('NULL', 'null');

4. Create S3 stage

use database <PUT_HERE_DB_NAME>;
use schema <PUT_HERE_DB_SCHEMA_NAME_TO_USE>;

create or replace stage my_s3_stage
  storage_integration = s3_integration
  url = '<PUT_HERE_AWS_S3_BUCKET_PATH>'
  file_format = my_json_format;
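Optionally, you can sanity-check that the stage picked up the storage integration and the JSON file format:

-- Shows the stage properties, including STORAGE_INTEGRATION, URL and TYPE = JSON
desc stage my_s3_stage;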

5. Execute unload command to push data from tables to stage and in turn to AWS S3

use database <PUT_HERE_DB_NAME>;
use warehouse <PUT_HERE_WAREHOUSE_NAME>;

copy into @my_s3_stage/login_history from (
  select object_construct(
           'application',                 'snowflake',
           'environment',                 '<PUT_HERE_ENV_NAME>',
           'log_type',                    'login_history',
           'EVENT_TIMESTAMP',             event_timestamp,
           'EVENT_TYPE',                  event_type,
           'USER_NAME',                   user_name,
           'CLIENT_IP',                   client_ip,
           'REPORTED_CLIENT_TYPE',        reported_client_type,
           'FIRST_AUTHENTICATION_FACTOR', first_authentication_factor,
           'IS_SUCCESS',                  is_success,
           'ERROR_CODE',                  error_code,
           'ERROR_MESSAGE',               error_message)
  from snowflake.account_usage.login_history
) file_format = (type = json);

copy into @my_s3_stage/access_history from (
  select object_construct(
           'application',             'snowflake',
           'environment',             '<PUT_HERE_ENV_NAME>',
           'log_type',                'access_history',
           'QUERY_START_TIME',        query_start_time,
           'USER_NAME',               user_name,
           'DIRECT_OBJECTS_ACCESSED', direct_objects_accessed,
           'BASE_OBJECTS_ACCESSED',   base_objects_accessed,
           'OBJECTS_MODIFIED',        objects_modified)
  from snowflake.account_usage.access_history
) file_format = (type = json);

If you look at the scripts above, you can see that we are using the COPY INTO command to unload/extract data from views such as login_history and access_history.

Also, as you can see in the scripts, we use OBJECT_CONSTRUCT to selectively pick columns and store each value under a key of our choosing while building the JSON records. To get better understanding and visibility into the logs, you can inject custom data while unloading; for example, in the script above I add application, environment, and log_type keys to the JSON, which lets us categorize logs by environment and by application.
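If you want to preview the JSON shape before unloading, you can run the inner SELECT on its own with a LIMIT. A minimal sketch for login_history (the environment value 'dev' here is just an illustrative tag):

-- Preview a few JSON records before running the full unload
select object_construct(
         'application',     'snowflake',
         'environment',     'dev',              -- illustrative tag value
         'log_type',        'login_history',
         'EVENT_TIMESTAMP', event_timestamp,
         'USER_NAME',       user_name,
         'IS_SUCCESS',      is_success) as log_record
from snowflake.account_usage.login_history
limit 5;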

You can write similar scripts for the other views in which Snowflake stores log and audit data, such as query_history and sessions; a sketch for query_history is shown below.

Please note that all of these tables/views live in the snowflake.account_usage schema.
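For example, here is a sketch of the same pattern applied to query_history (the columns below exist in the ACCOUNT_USAGE.QUERY_HISTORY view; pick whichever ones you need):

copy into @my_s3_stage/query_history from (
  select object_construct(
           'application',      'snowflake',
           'environment',      '<PUT_HERE_ENV_NAME>',
           'log_type',         'query_history',
           'START_TIME',       start_time,
           'USER_NAME',        user_name,
           'WAREHOUSE_NAME',   warehouse_name,
           'QUERY_TEXT',       query_text,
           'EXECUTION_STATUS', execution_status,
           'ERROR_CODE',       error_code,
           'ERROR_MESSAGE',    error_message)
  from snowflake.account_usage.query_history
) file_format = (type = json);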

6. Validate files in AWS S3 bucket

Once the above commands have run, you should see log files created in your S3 bucket.
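You can verify this either from the AWS console or directly from Snowflake:

-- Lists the files that were unloaded to the external stage (i.e. to the S3 bucket)
list @my_s3_stage;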

7. Configure Logz.io to pull data from the AWS S3 bucket

  • Log in to the Logz.io portal and, from the left-hand side menu, navigate to Logs→Send your logs→Select AWS→Select S3 Bucket
  • A new page will appear
  • Click + Add a bucket and select the option to Authenticate with a role
  • Copy and paste the Account ID and External ID in your text editor
  • Fill in the form to create a new connector
  • Enter the S3 bucket name and, if needed, the Prefix where your logs are stored
  • Click Get the role policy. You can review the role policy to confirm the permissions that will be needed. Paste the policy in your text editor
  • Keep this information available, so you can use it in AWS

7.1 Create the IAM Role in AWS:

  • Go to your IAM roles page in your AWS admin console
  • Click Create role. The Create role wizard will appear
  • Click Another AWS account
  • Paste the Account ID you copied from Logz.io
  • Select Require external ID, and then paste the External ID from the step above
  • Click Next: Permissions to continue

7.2 Create the policy:

  • In the Create role screen, click Create policy. The Create policy page loads in a new tab
  • In the JSON tab, replace the default JSON with the policy you copied from Logz.io
  • Click Review policy to continue
  • Give the policy a Name and optional Description, and then click Create policy
  • Remember the policy’s name — you’ll need this in the next step
  • Close the tab to return to the Create role page

7.3 Attach the policy to the role:

  • Click (refresh), and then type your new policy’s name in the search box
  • Find your policy in the filtered list and select its checkbox
  • Click Next: Tags, and then click Next: Review to continue to the Review screen

7.4 Finalize the role:

  • Give the role a Name and optional Description. We recommend beginning the name with “logzio-“ so that it’s clear you are using this role with Logz.io
  • Click Create role when you’re done

7.5 Copy the ARN to Logz.io:

  • In the IAM roles screen, type your new role’s name in the search box
  • Find your role in the filtered list and click it to go to its summary page
  • Copy the role ARN (top of the page). In Logz.io, paste the ARN in the Role ARN field, and then click Save

Finally, you should have all the details needed to complete the S3 bucket connector form in Logz.io.

8. Validate logs:

That’s all. Now it’s time to validate if logs are showing up in the Logz.io dashboard.

In Logz.io, from the left-hand side menu, navigate to Logs→Kibana

Search for the word 'snowflake'. You should see the Snowflake logs appear in the results.

Summary:

In this article, we showed how we can enable and implement a centralized logging strategy for Snowflake.

This gives you a bird's-eye view of your Snowflake SaaS infrastructure and helps you monitor it, which in turn is useful for log analysis, audit trailing, and keeping an eye on your daily Snowflake credit usage.

Automation Approach?

If you want to automate log shipping on an ongoing basis, you can make use of tasks in Snowflake. With tasks, you can schedule the unloading/export of logs at regular intervals, as sketched below.
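As a rough sketch (reusing the warehouse placeholder and stage from the steps above, with an assumed daily CRON schedule), a task could look like this:

-- Hypothetical task that re-runs the login_history unload every day at 01:00 UTC.
-- Note: this still unloads the full view on every run (see the caveat below).
create or replace task unload_login_history_task
  warehouse = <PUT_HERE_WAREHOUSE_NAME>
  schedule  = 'USING CRON 0 1 * * * UTC'
as
  copy into @my_s3_stage/login_history from (
    select object_construct(
             'application',     'snowflake',
             'log_type',        'login_history',
             'EVENT_TIMESTAMP', event_timestamp,
             'USER_NAME',       user_name,
             'IS_SUCCESS',      is_success)
    from snowflake.account_usage.login_history
  ) file_format = (type = json);

-- Tasks are created suspended; resume the task to activate the schedule.
alter task unload_login_history_task resume;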

Seems easy and simple, right? In reality, it's not: Snowflake does not support differential (incremental) data copy into an external stage, i.e. it will unload ALL the data every time.

So, how can we still achieve differential data copy?

Please watch out for the next article.

Until then, Happy Coding.

References:

  • Configuring a Snowflake storage integration to access Amazon S3: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration.html

Deepak Shivaji Patil
Globant

Working as a DevOps Tech Lead, with more than 11 years of experience in architecting, building, and deploying enterprise-grade applications on public cloud, mainly AWS.