Developing Your First Fivetran Function Connector with AWS SAM & Snowflake

Simplify the “why” and “how” when it comes to data ingestion with this guide on using Fivetran function connectors.

Mohini Kalamkar
Hashmap, an NTT DATA Company
8 min read · May 19, 2022


Recently, I was working on a project for a client, and one of their requirements was the ability to ingest data from an API. I evaluated possible solutions, and I found that using Fivetran connectors for serverless/cloud functions allows you to ingest any data simply by sending the proper JSON response. Since the client was already using Fivetran in their tech stack, I determined that the Fivetran function connector was the best choice to ingest data from the API.

This is a win-win for both me and the client — it satisfies the project’s requirements and allows the client to achieve their desired outcome. With that in mind, I’d like to guide you through the process of understanding Fivetran and their function connectors, using serverless functions, and configuring your own Fivetran function connector using the Transport API (London Tube) and the AWS Serverless Application Model (SAM).

Why Fivetran?

Fivetran offers data integration from source to destination. It has over 150 ready-to-use connectors that automatically adapt to schema and API changes, and this adaptability is what makes Fivetran such a powerful ELT tool. Fivetran responsibly and regularly maintains each connector and evolves the schema to reflect operational and product changes in the source systems.

So why Fivetran? Because it’s easy to set up and automate data pipelines — which gives you the ability to work smarter, not harder.

What is a Fivetran Function Connector?

A function connector allows you to code a custom data connector as an extension of Fivetran.

For example, if you have a custom data source or a private API that Fivetran doesn’t support, you can develop a serverless ELT data pipeline using our function connectors.

Fivetran

Building a custom data pipeline from scratch is not easy. It can be time-consuming and hard to maintain. With Fivetran’s function connector, you only have to write a serverless/cloud function, such as an Azure Function, AWS Lambda, or Google Cloud Function, to extract data from the source.

Once you have configured the serverless function as a connector in Fivetran, Fivetran will then be able to load and transform the data to the destination.
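Concretely, the contract between Fivetran and your function is a JSON payload: the function returns the rows to insert, the schema (primary keys), a state object that Fivetran hands back on the next sync, and a hasMore flag. A minimal sketch of that payload in Python (the table and column names mirror the Transport Tube example used later in this post; the state value is illustrative):

```python
# Minimal sketch of the JSON payload a Fivetran function connector returns.
# Table and column names mirror the Transport Tube example; values are illustrative.
response = {
    # Cursor-like object Fivetran passes back to the function on the next sync
    "state": {"bakerloo": "bakerloo"},
    # Declares the primary key for each destination table
    "schema": {"tflLineStatus": {"primary_key": ["linename"]}},
    # Rows to upsert, grouped by destination table
    "insert": {
        "tflLineStatus": [
            {"linename": "bakerloo", "linestatus": "Good Service"},
        ]
    },
    # False tells Fivetran this sync is complete
    "hasMore": False,
}
```

Fivetran uses the primary key in "schema" to deduplicate, and keeps syncing (passing the returned "state" back in) as long as "hasMore" is true.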

How Fivetran executes a cloud function

Note: If you still have questions about Fivetran function connectors, you can learn more here.

What are serverless functions?

Serverless functions are single-purpose, programmatic functions that are hosted on managed infrastructure.

With serverless functions, the developer only writes the function without having to provision or manage infrastructure. For more on serverless functions, I recommend checking out this resource on Splunk.

This blog post provides steps to set up a Fivetran function connector for the Transport API (London Tube) with the AWS Serverless Application Model (SAM).

Why use the AWS Serverless Application Model?

The AWS Serverless Application Model (SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. With just a few lines per resource, you can define the application you want and model it using YAML.

Amazon Web Services

While SAM is optional when developing your AWS function for Fivetran, I highly recommend it. Why? Well, with SAM, it’s really convenient to debug and test Lambda functions locally, then package and deploy them to AWS.

Let’s get started!

1. Get to know the tech stack: Fivetran, AWS Lambda, AWS SAM, Snowflake, and the Transport API (London Tube).

2. Prerequisites for installing SAM on your desktop/laptop:

  • Python
  • Docker
  • Access to an AWS account that is set up to deploy your function.
  • You can configure access to an AWS account by running the aws configure command.

3. Install SAM:

To install the SAM CLI on Mac, I used the following commands:

brew tap aws/tap
brew install aws-sam-cli

4. Create the function connector for the Transport API:

  1. Create the SAM app using the sam init command. This creates the hello_world directory.

sam init --runtime python3.7 --name my-sls-app

2. Create the transport_tube directory.

3. Create main.py and requirements.txt within the above directory as shown here.
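As a reference point, a main.py for this connector might look roughly like the following. This is a sketch, not the exact code linked above: the TfL endpoint URL and the build_response helper are my assumptions, chosen to produce the same response shape shown later in this post.

```python
# main.py — sketch of a Lambda handler for the Transport (London Tube) API.
# The endpoint URL and the build_response helper are assumptions, not the
# exact code referenced in this post.
import json
import urllib.request
from datetime import datetime, timezone

TFL_URL = "https://api.tfl.gov.uk/Line/Mode/tube/Status"  # assumed endpoint

def build_response(lines, now):
    """Shape TfL line-status records into the Fivetran response payload."""
    rows = [
        {
            "linename": line["id"],
            "linestatus": line["lineStatuses"][0]["statusSeverityDescription"],
            "timestamp": now,
        }
        for line in lines
    ]
    return {
        "state": {"last_sync": now},
        "schema": {"tflLineStatus": {"primary_key": ["linename"]}},
        "insert": {"tflLineStatus": rows},
        "hasMore": False,
    }

def lambda_handler(event, context):
    # Fetch current tube line statuses and return them in Fivetran's format.
    with urllib.request.urlopen(TFL_URL) as resp:
        lines = json.loads(resp.read())
    now = datetime.now(timezone.utc).isoformat()
    return build_response(lines, now)
```

A requirements.txt is only needed if you pull in third-party packages; this sketch sticks to the standard library.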

4. Update template.yaml as shown below:

TransportTubeFunction:
  Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
  Properties:
    CodeUri: transport_tube/
    Handler: main.lambda_handler
    Runtime: python3.7
    Events:
      HelloWorld:
        Type: Api # More info about API Event Source: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#api
        Properties:
          Path: /transporttube
          Method: get

5. Test the application locally with the following command:

sam local invoke TransportTubeFunction

The above command will return a JSON result, like the following:

{
  "state": {"bakerloo": "bakerloo"},
  "schema": {"tflLineStatus": {"primary_key": ["linename"]}},
  "insert": {
    "tflLineStatus": [
      {"linename": "bakerloo", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "central", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "circle", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "district", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "hammersmith-city", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.583Z"},
      {"linename": "jubilee", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "metropolitan", "linestatus": "Special Service", "timestamp": "2022-04-12T18:33:41.617Z"},
      {"linename": "northern", "linestatus": "Part Closure", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "piccadilly", "linestatus": "Severe Delays", "timestamp": "2022-04-12T18:33:41.583Z"},
      {"linename": "victoria", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"},
      {"linename": "waterloo-city", "linestatus": "Good Service", "timestamp": "2022-04-12T18:33:41.6Z"}
    ]
  },
  "hasMore": false
}

6. Package the SAM application.

The command below creates a zip file of your code and dependencies and uploads the file to Amazon S3.

sam package --template-file template.yaml --output-template-file deploy.yaml --s3-bucket zmohinisnowflake --region us-east-1

7. Deploy code to the cloud.

Note: To deploy, your IAM user must have permission to deploy the CloudFormation stack.

This command deploys your SAM application using AWS CloudFormation:

sam deploy --template-file deploy.yaml --stack-name ztestsamlambda

or, interactively:

sam deploy --guided

The above command creates a CloudFormation stack named ztestsamlambda. After the stack deploys successfully, you can test your Lambda function.

Once the Lambda function is tested successfully, the next step is to configure the Fivetran function connector.

5. Configuring the Fivetran function connector:

First, sign in to Fivetran, click on ‘Add Connector’, and then, select ‘AWS Lambda’.

Fivetran provides a setup guide on the page when you set up a new connector. Follow the steps in the guide for creating the required IAM policies and role.

6. Set up the Fivetran function connector as shown below:

Note: The Transport API doesn’t require a secret key. You can provide any sample JSON, for example: {"key1": "value1"}
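When Fivetran invokes the Lambda, the event it sends includes the secrets JSON and the state from the previous sync, so even a placeholder secret arrives inside the handler. A rough sketch of reading them (the key name "key1" is just the placeholder above, and the body of the handler is elided):

```python
# Sketch: reading Fivetran-supplied secrets and state inside the handler.
# "key1" is only the placeholder secret from the connector form; real key
# names depend on what you enter there.
def lambda_handler(event, context):
    secrets = event.get("secrets", {})  # secrets JSON from the connector form
    state = event.get("state", {})      # state returned by the previous sync
    placeholder = secrets.get("key1")   # unused: the Transport API needs no key
    # ... fetch data here, then return the Fivetran payload as shown earlier ...
    return {"state": state, "insert": {}, "hasMore": False}
```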

Fivetran Function Connector Configuration

7. Save and test the connector.

This will load your data into the Snowflake Data Cloud.

In the screenshot below, you can see that the Transport Tube API data has successfully been loaded into Snowflake’s aws_lambda schema (which was mentioned in a previous step).

API data ingested in Snowflake

If you want to try using the Transport Tube API with Azure or GCP, you can create a file named main.py (as used above) and test it. Then, configure the Fivetran function connector for Azure Functions and Google Cloud Functions accordingly.

Closing Thoughts

Using a Fivetran function connector provides an easy and reliable approach for a user to write their own serverless/cloud functions if prebuilt connectors are not available for a source.

A summary for creating a Fivetran function connector:

  1. API key/secret key/API token — for authentication (if required)
  2. Cloud function — to get the data from the API
  3. Primary key of the source table — this will be passed in the JSON response back to Fivetran
  4. Column to be used in ‘state’ (required for an incremental load) — which will also be passed in the JSON response back to Fivetran
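For point 4, the idea is that each sync returns a cursor in ‘state’, and Fivetran hands that same state back on the next invocation so the function can fetch only newer records. A minimal sketch of that loop, where the column name updated_at, the table name, and the fetch_rows helper are all hypothetical:

```python
def sync(event, fetch_rows):
    """Incremental-load sketch. fetch_rows(cursor) is a hypothetical helper
    returning (rows, new_cursor) for rows changed after the cursor."""
    cursor = event.get("state", {}).get("updated_at")  # None on the first sync
    rows, new_cursor = fetch_rows(cursor)
    return {
        "state": {"updated_at": new_cursor},  # handed back on the next sync
        "schema": {"my_table": {"primary_key": ["id"]}},
        "insert": {"my_table": rows},
        "hasMore": False,
    }
```

On the first sync the cursor is None, so the function does a full load; afterwards, each run picks up where the returned state left off.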

Coupling a Fivetran function connector with AWS SAM provides quite a few benefits. AWS SAM is an extension of AWS CloudFormation, so you get the reliable deployment capabilities of AWS CloudFormation (definitely a win in my book!), and it provides a Lambda-like execution environment locally which helps in local debugging and testing.

I hope this blog post helped you get started with the process of developing your first Fivetran function connector. If you give it a try, let me know your questions or success stories in the comments — I’d love to hear from you!

Mohini Kalamkar is a Cloud Architect and Lead Cloud Engineer (Data) with Hashmap, an NTT DATA Company. Hashmap provides world-class Data, Cloud, IoT, and AI/ML solutions and consulting expertise across industries with a group of innovative technologists and domain experts accelerating high-value business outcomes for our customers. Have a question? Ask it in the comments or reach out to the Hashmap team for more info.

Let’s Do Data and Cloud Together!

At Hashmap, an NTT DATA Company, we work with our clients to build better, together. We are partnering with companies across a diverse range of industries to solve the toughest data challenges — we can help you shorten time to value!

We offer a range of enablement workshops and assessment services, data modernization and migration services, and consulting service packages for designing and building new data products as part of our service offerings. We would be glad to work through your specific requirements. Connect with us here.
