Modern File Transfer Solution on Azure — SFTP

Luca Pisano
May 17, 2022

Are you thinking about a way to modernize your applications that rely on SFTP file transfers?
Traditional SFTP integrations rely on a server to which producers send files, while a client polls at regular intervals, waiting for a file to become available for processing. This approach introduces a time lag between a file being produced and being consumed.
What about enhancing this approach with event-driven file transfers?

With SFTPGo, open-source software available on GitHub, you can modernize the way systems exchange data with each other and call target applications in near real time.

The aim of this article is to present an architectural overview for file transfer modernization using Kubernetes and queues.

The presented solution is designed to be scalable, highly available and secure.
This article focuses on an Azure deployment, but the solution itself is cloud agnostic: with some tweaks, it can be deployed on AWS, GCP, OpenShift or on-prem using Kubernetes.
The following components are used for this scenario:

  • Azure Kubernetes Service (AKS) as the containerized compute environment
  • Azure Service Bus queues for high-volume message processing
  • Azure Firewall to protect the L4 Load Balancer
  • Azure AD Managed Identities
  • Azure API Management to centralize APIs
  • Azure Functions to perform serverless routing of messages in the queue

Let’s describe the flow:

Networking & High Availability

When the client sends a file, the connection traverses an Azure Firewall that inspects the source IP and the packets at Layer 4. Here you can apply IP-based filtering as well as deep packet inspection (if you choose the Premium tier).

The packets then reach the L4 Load Balancer, which Azure Kubernetes Service deploys from a YAML Service definition. The load balancer exposes only a private IP, so no one on the public Internet can bypass the firewall. By default, the Azure load balancer has a connection idle timeout of 4 minutes. For slow or long-running transfers, 4 minutes may not be enough. You can increase the load balancer timeout up to 30 minutes using the idle-timeout Service annotation documented for AKS.

The SFTP pods run in an active-active configuration, meaning all of them are available to receive traffic. With multiple pods, a client can potentially connect to different servers. To ensure a client stays connected to the same server for a given session, use Session Affinity.
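
The load balancer settings above (private IP, longer idle timeout, session affinity) can all live in one Kubernetes Service. As a minimal sketch, the snippet below builds such a manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. The service name, label selector and port numbers are placeholders I made up for illustration, and the 4 to 30 minute range simply mirrors the figures quoted in this article; the two annotation keys are the ones AKS documents for internal load balancers and TCP idle timeout, but check them against the current documentation before relying on them.

```python
import json


def build_sftp_service(name: str, idle_timeout_minutes: int = 30) -> dict:
    """Return a Service manifest for an internal Azure load balancer.

    The 4..30 minute bounds mirror the values quoted in the article.
    """
    if not 4 <= idle_timeout_minutes <= 30:
        raise ValueError("idle timeout must be between 4 and 30 minutes")
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Expose a private IP only, so traffic must pass the firewall.
                "service.beta.kubernetes.io/azure-load-balancer-internal": "true",
                # Raise the idle timeout for slow or long-running transfers.
                "service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout":
                    str(idle_timeout_minutes),
            },
        },
        "spec": {
            "type": "LoadBalancer",
            # Keep a given client IP pinned to the same pod.
            "sessionAffinity": "ClientIP",
            # Placeholder selector and ports: adjust to your deployment.
            "selector": {"app": name},
            "ports": [
                {"name": "sftp", "protocol": "TCP", "port": 22, "targetPort": 2022}
            ],
        },
    }


manifest = build_sftp_service("sftpgo")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running kubectl apply -f on it would be one way to try this out in a test cluster.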

To ensure High Availability for the overall solution, AKS node pools should be distributed across different Availability Zones in the chosen region.

Configuration data is stored in an Azure MySQL database and files are saved in Azure Blob Storage as blobs. This means you can apply data lifecycle rules to move old files to a cheaper storage class (access tier) and save money.
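
To make the lifecycle idea concrete, here is a rough sketch of a rule that moves older uploads to cooler tiers. It follows the JSON layout of Azure Storage lifecycle management policies as I understand it, so treat it as a starting point. The rule name, the "sftp-uploads/" prefix and the day thresholds are all hypothetical.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int,
                           archive_after_days: int) -> dict:
    """Build a lifecycle policy that tiers block blobs under a prefix."""
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("expected 0 <= cool_after_days < archive_after_days")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "tier-old-sftp-files",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Only touch block blobs under the given container/prefix.
                    "filters": {
                        "blobTypes": ["blockBlob"],
                        "prefixMatch": [prefix],
                    },
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy("sftp-uploads/", cool_after_days=30,
                                archive_after_days=180)
print(json.dumps(policy, indent=2))
```

The resulting JSON could then be applied to the storage account through the portal, the CLI or an infrastructure-as-code template.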

Authentication

SFTPGo can handle client authentication natively or using custom authenticator plugins. Custom authenticator plugins are especially useful to implement highly secure scenarios such as using Azure AD authentication for the clients. Whenever SFTPGo receives an authentication request from a client, it calls an API exposed by the custom authentication plugin and waits for an HTTP response code. In the plugin you can validate client credentials against Azure AD or any other identity provider.
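
The request/response idea behind such a plugin can be sketched with nothing but the Python standard library. This is not SFTPGo's actual plugin contract: the JSON field names ("username", "password"), the status codes chosen and the demo_check function are assumptions for illustration, and demo_check stands in for a real call to Azure AD or another identity provider.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable

# Signature of the credential check; a real one would call the identity provider.
CredentialCheck = Callable[[str, str], bool]


def auth_status(raw_body: bytes, check: CredentialCheck) -> int:
    """Map an authentication request body to an HTTP status code."""
    try:
        payload = json.loads(raw_body)
    except ValueError:
        return 400  # body is not valid JSON
    if not isinstance(payload, dict):
        return 400
    username = payload.get("username")  # hypothetical field name
    password = payload.get("password")  # hypothetical field name
    if not isinstance(username, str) or not isinstance(password, str):
        return 400
    return 200 if check(username, password) else 401


def make_handler(check: CredentialCheck) -> type:
    """Create an HTTP handler class that answers POSTs with the auth status."""

    class AuthHandler(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            status = auth_status(self.rfile.read(length), check)
            self.send_response(status)
            self.end_headers()

    return AuthHandler


def demo_check(username: str, password: str) -> bool:
    """Stand-in for an identity provider lookup (hard-coded for the demo)."""
    return (username, password) == ("alice", "s3cret")


print(auth_status(b'{"username": "alice", "password": "s3cret"}', demo_check))
print(auth_status(b'{"username": "alice", "password": "wrong"}', demo_check))
```

To serve it locally, one could pass make_handler(demo_check) to http.server.HTTPServer and call serve_forever(); keeping the decision in a plain function makes it easy to test without a network.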

Event Based Handling

Whenever an event occurs, such as a file being uploaded, SFTPGo pods send a message to a Service Bus topic with information about which file has been uploaded and by whom. You can configure SFTPGo to send events regarding uploads, downloads or any other file manipulation.
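
To give a feel for what such a message might carry, here is a small sketch that serializes a file event to JSON. The field names and values are invented for illustration and are not SFTPGo's real event schema; the point is only that the message identifies the action, the file and the user.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical shape of a file event message."""
    action: str       # e.g. "upload" or "download"
    username: str     # who performed the action
    path: str         # which file was affected
    size_bytes: int
    timestamp: str    # ISO 8601, UTC


def build_event_body(action: str, username: str, path: str, size_bytes: int,
                     now: Optional[datetime] = None) -> str:
    """Serialize a file event as a JSON message body."""
    moment = now or datetime.now(timezone.utc)
    event = FileEvent(action, username, path, size_bytes, moment.isoformat())
    return json.dumps(asdict(event), sort_keys=True)


body = build_event_body("upload", "partner-a", "/invoices/may.csv", 2048)
print(body)
```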

To react to the message, you can hook a consumer to the Service Bus queue or use routing software to dispatch messages based on file properties. In this solution I’ve used a custom MessageRouter pod that listens on the queue for new messages, evaluates the body of each message, and routes it to the relevant target queues or applications using HTTP, AMQP or MQTT.
In this way, target applications receive a near-real-time notification about the file being uploaded (or whatever event is configured) and can react accordingly.
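
The routing step can be illustrated with a few lines of rule matching. This is only a sketch of the idea, not the MessageRouter implementation: the message fields ("action", "path"), the route table and the target URLs are all hypothetical, and actual delivery over HTTP, AMQP or MQTT is left out.

```python
import json
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Sequence


class Route(NamedTuple):
    action: str        # glob matched against the event action
    path_pattern: str  # glob matched against the file path
    target: str        # where to forward the message


# Hypothetical routing table; targets are placeholders.
ROUTES = [
    Route("upload", "/invoices/*.csv", "amqp://billing-queue"),
    Route("upload", "/orders/*", "https://orders.example.internal/notify"),
    Route("*", "*", "amqp://audit-queue"),  # catch-all for auditing
]


def route_message(body: str, routes: Sequence[Route]) -> List[str]:
    """Return every target whose rule matches the message body."""
    event = json.loads(body)
    action = str(event.get("action", ""))
    path = str(event.get("path", ""))
    return [
        route.target
        for route in routes
        if fnmatchcase(action, route.action)
        and fnmatchcase(path, route.path_pattern)
    ]


message = json.dumps({"action": "upload", "path": "/invoices/may.csv"})
print(route_message(message, ROUTES))
# prints ['amqp://billing-queue', 'amqp://audit-queue']
```

Returning a list of targets rather than a single one lets one event fan out to several consumers, for example a business application plus an audit queue.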

References
SFTPGo on GitHub
MessageRouter on GitHub

UPDATE

Azure now natively supports the SFTP protocol for Blob Storage. However, it has some limitations that can be overcome using the SFTPGo solution.

Disclaimer: This article is published as an individual and is not related to my current job.

I'm a professional cloud architect focused on modern and cloud native development patterns