Published in Couchdrop

Azure Data Factory — Loading files into Google Cloud Storage and Amazon S3

With Microsoft Azure Data Factory you can pull data from Amazon S3 and Google Cloud Storage into your data pipeline (ETL workflow). However, Data Factory does not let you load (put/upload) files back into these platforms at the end of the extract, transform, load cycle.

To work around this and load files back into these platforms, you can use Data Factory's SFTP connector together with Couchdrop. Couchdrop is a cloud SFTP/FTP conduit that acts as a fabric on top of cloud storage; it offers webhooks, an API and web portal uploads. It supports Google Cloud Storage, Amazon S3, SharePoint, Dropbox and other storage platforms. Using Couchdrop with Azure Data Factory, you can pull data from any supported cloud storage platform, transform it, and then load it to the same or a different platform, all through SFTP.

ETL Use Case Examples:

  • As a vendor, your clients send you files via SFTP (or another means such as a web portal), and you receive a webhook event on each upload that initiates your ETL process.
Have your clients send you data via SFTP to then be processed through automated ETL operations

  • As a client, you can expose your data to your vendor, who then processes each uploaded file when the webhook event fires.
Expose data to your vendor to be automatically pulled into ETL operations
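
Both use cases come down to turning an upload notification into a pipeline run. As a rough sketch, the Python below builds (but does not send) a request to the Azure Data Factory REST API's pipeline "create run" endpoint. The subscription, resource group, factory and pipeline names, the `sourcePath` pipeline parameter and the bearer token are all placeholders assumed for illustration; obtaining a real Azure AD token is out of scope here.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "2018-06-01"  # Data Factory REST API version


def build_create_run_request(subscription_id, resource_group, factory_name,
                             pipeline_name, bearer_token, parameters):
    """Build, without sending, a 'create run' request for a Data Factory pipeline."""

    def quote(value):
        # Escape every reserved character in a path segment.
        return urllib.parse.quote(value, safe="")

    url = (
        "https://management.azure.com"
        f"/subscriptions/{quote(subscription_id)}"
        f"/resourceGroups/{quote(resource_group)}"
        f"/providers/Microsoft.DataFactory/factories/{quote(factory_name)}"
        f"/pipelines/{quote(pipeline_name)}/createRun"
        f"?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        # The request body carries the pipeline parameters as a JSON object.
        data=json.dumps(parameters).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the run. Keeping construction separate from sending makes the request easy to inspect before any credentials are involved.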

The Steps:

  • Step 1. Configure storage in Couchdrop
  • Step 2. Configure user(s)
  • Step 3. Configure webhooks (optional)
  • Step 4. Configure Couchdrop’s SFTP in Data Factory

Configuring Couchdrop is straightforward and only takes a few steps. You can create users who are locked to specific buckets and limited to specific file operations (upload only, download only, read/write, etc.). You can also configure webhooks for upload/download events on specific folders, which lets you trigger different workflows based on the folder uploaded to and the user. Couchdrop also offers an API to assist with onboarding users programmatically.
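
To illustrate triggering different workflows by folder and user, here is a minimal Python sketch of a routing table. It reads the `filename` and `authenticated_user` fields that appear in the sample webhook payload later in this article. The folder prefixes and workflow names are hypothetical, chosen only for the example.

```python
from typing import Optional

# Hypothetical routing table: (folder prefix, required user or None for any user, workflow).
# Entries are checked in order, so list the most specific routes first.
ROUTES = [
    ("/demo/customers/bobsburgers/", "demo1", "bobsburgers_ingest"),
    ("/demo/customers/", None, "generic_customer_ingest"),
]


def pick_workflow(event: dict) -> Optional[str]:
    """Return the first workflow whose folder prefix and user match the event."""
    path = event.get("filename", "")
    user = event.get("authenticated_user")
    for prefix, required_user, workflow in ROUTES:
        if path.startswith(prefix) and required_user in (None, user):
            return workflow
    return None  # No route matched; the caller can ignore the event.
```

An upload by `demo1` under `/demo/customers/bobsburgers/` maps to `bobsburgers_ingest`, while any other user uploading under `/demo/customers/` falls through to `generic_customer_ingest`.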

Step 1. Configure storage in Couchdrop

Navigate to your storage portal and configure a new storage connector. Below we are configuring Google Cloud Storage.

Connecting Google Cloud Storage in Couchdrop as an SFTP endpoint

Step 2. Configure user(s)

For this example we have created a user (gcsuser) who can only upload data to the GCS bucket configured above. In practice this could be an external party uploading data. You could create a second user with read/write access who pulls the data down when the webhook fires for a ‘gcsuser’ upload and extracts it into your workflow.

Configuring user who can only upload to Google Cloud Storage
Configuring additional settings for Couchdrop SFTP user

Step 3. Configure webhooks (optional)

In Couchdrop’s SFTP virtual file system, open the folder you want to send webhooks for, select the event that should trigger the webhook, enter the destination URL and save.

Configuring webhook under folder in Couchdrop’s Virtual File System

Sample Couchdrop SFTP webhook output:

{
  "account": "demouser",
  "filename": "/demo/customers/bobsburgers/burgersaucereceipe.txt",
  "authenticated_user": "demo1",
  "storage_engine": "hosted",
  "storage_engine_id": "7e88f06d-3aa5-45d9-97c2-3c5fa28ca0b4",
  "event_type": "upload",
  "ip_address": "123.253.47.202",
  "success": true,
  "total_size": 40,
  "additional_info": "",
  "system": "sftp",
  "transaction_id": "836851c7-f745-4476-8a0a-b4df14c4cd0e",
  "region": "us1",
  "text": "File /demo/customers/bobsburgers/burgersaucereceipe.txt uploaded by demo1 via sftp from 123.253.47.202"
}
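
A receiver for this payload can be quite small. The sketch below uses only Python's standard library and assumes the webhook arrives as an HTTP POST with the JSON above as the request body; the article does not confirm this, so check it against your own webhook traffic. The `should_start_etl` helper, the port and the `print` placeholder are illustrative choices, not part of Couchdrop.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def should_start_etl(event: dict) -> bool:
    """Return True only for upload events that completed successfully."""
    return event.get("event_type") == "upload" and event.get("success") is True


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            event = json.loads(self.rfile.read(length))
        except ValueError:
            # Covers malformed JSON and undecodable bytes.
            self.send_response(400)
            self.end_headers()
            return
        if isinstance(event, dict) and should_start_etl(event):
            # Placeholder: hand the uploaded path to your own ETL trigger here.
            print("start ETL for", event.get("filename"))
        self.send_response(204)  # Acknowledge without a response body.
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Listen for webhook calls until interrupted (port is an assumption)."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Filtering on both `event_type` and `success` means failed transfers and download events do not start a pipeline run.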

Step 4. Configure Couchdrop’s SFTP in Data Factory

Because Couchdrop works as a standard SFTP server, you only need the hostname (sftp.couchdrop.io) and your Couchdrop SFTP user’s credentials.

Configuring Couchdrop’s SFTP in Microsoft Data Factory
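
If you prefer to define the connection as JSON rather than through the UI, the following Python sketch assembles an SFTP linked service definition in the shape Data Factory uses for its SFTP connector with basic authentication. The hostname comes from this article; port 22, the linked service name and the credentials are assumptions for illustration. In practice the password would normally come from a secret store rather than being written inline.

```python
import json
from typing import Optional


def sftp_linked_service(name: str, user_name: str, password: str,
                        host: str = "sftp.couchdrop.io", port: int = 22,
                        host_key_fingerprint: Optional[str] = None) -> dict:
    """Build a Data Factory SFTP linked service definition (basic auth)."""
    type_properties = {
        "host": host,
        "port": port,  # Assumed default SFTP port.
        "authenticationType": "Basic",
        "userName": user_name,
        "password": {"type": "SecureString", "value": password},
        # Host key validation is skipped only when no fingerprint is supplied.
        "skipHostKeyValidation": host_key_fingerprint is None,
    }
    if host_key_fingerprint is not None:
        type_properties["hostKeyFingerprint"] = host_key_fingerprint
    return {
        "name": name,
        "properties": {"type": "Sftp", "typeProperties": type_properties},
    }


# Example with placeholder credentials for the upload-only user from Step 2.
definition = sftp_linked_service("CouchdropSftp", "gcsuser", "<password>")
print(json.dumps(definition, indent=2))
```

Supplying `host_key_fingerprint` pins the server's host key, which is the safer choice outside of a quick test.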

To get up and running with Couchdrop’s cloud SFTP server and integrate it into your ETL process, navigate to Couchdrop’s website to sign up or learn more.

On a final note, Couchdrop is simply a conduit: it does not store data, nor does it ‘sync’ data to storage platforms. It processes transfers in memory, which is then overwritten, and passes them directly to your endpoint.

Couchdrop is the secure file gateway and cloud SFTP / MFT platform. It acts as your secure access method between systems, or as a way for clients to send files to your cloud storage backend, and it can be completely automated.

Jayden Bartram

COO of Couchdrop and Movebot
