File transfer functionality with help from the paramiko and boto3 modules

Kiran Kumbhar
Jun 28 · 4 min read
Image from Unsplash. Credits @iammrcup

Hello everyone. In this article we will implement file transfer functionality (from an FTP server to Amazon S3) in Python using the paramiko and boto3 modules.

Prerequisites

  • Python (3.6.x)
  • AWS S3 bucket access
  • FTP server access

Python Libraries

  • paramiko
  • boto3

Note: You don’t need to be familiar with the above Python libraries to understand this article, but make sure you have access to an AWS S3 bucket and an FTP server, with credentials. We will go through the Python functions step by step, and I’ll leave a GitHub link at the bottom of the article.

Step 1: Initial Setup

Install all of the above packages using pip install:

pip install paramiko boto3

Also install awscli on your machine and configure your access key ID, secret access key, and region. Here is the link on how to do it.

Step 2: Open FTP Connection

Let’s have a look at the function which makes the FTP connection to the server.

We will make a new SSH session using paramiko’s SSHClient class and load the local system host keys for the session. Note that paramiko transfers files over SSH, so the connection is strictly speaking SFTP rather than plain FTP. For the transport we need to specify the server host name ftp_host and the port ftp_port. Once the connection is made, we authenticate against the server with transport.connect(). If authentication is successful, we initiate the FTP connection using paramiko’s SFTPClient. We get back the ftp_connection object, with which we can perform remote file operations on the FTP server.
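The connection steps described above could look roughly like the following. This is a minimal sketch, not the author's exact function: it goes straight to paramiko's Transport and SFTPClient, skips the SSHClient and host key handling, and lets exceptions propagate instead of handling them.

```python
def open_ftp_connection(ftp_host, ftp_port, ftp_username, ftp_password):
    """Open an SFTP connection and return a paramiko SFTPClient."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where paramiko is not installed.
    import paramiko

    # Low-level SSH transport to the server, given as a (host, port) tuple.
    transport = paramiko.Transport((ftp_host, ftp_port))
    # Authenticate with username and password. This raises on failure.
    # Assumption of this sketch: the server's host key is NOT verified here.
    transport.connect(username=ftp_username, password=ftp_password)
    # Open an SFTP session on top of the authenticated transport.
    return paramiko.SFTPClient.from_transport(transport)
```

The returned object offers methods such as stat() and file() for working with remote files.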

Step 3: Transfer file from FTP to S3

This will be a big function that will do the actual transfer for you. We will break down the code snippets to understand what is actually going on here.

First things first: connection to FTP and S3

initial ftp and s3 connection setup

The transfer_file_from_ftp_to_s3() function takes a bunch of arguments, most of which are self-explanatory. ftp_file_path is the path from the root directory of the FTP server to the file, including the file name. For example, folder1/folder2/file.txt. Similarly, s3_file_path is the path starting from the root of the S3 bucket, including the file name. The program reads the file from the FTP path and copies it to the S3 bucket at the given S3 path.

We will also read the file size from FTP. Depending on the size of the file we decide the approach: either transfer the complete file in one go, or transfer it in chunks of chunk_size bytes (also known as a multipart upload).
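As a rough sketch of this setup step, the helpers below create the S3 client, open the remote file, read its size, and decide between the two approaches. The helper names are hypothetical and not taken from the original script, and treating a file of exactly chunk_size bytes as a single upload is an assumption of this sketch.

```python
def open_s3_connection():
    """Create an S3 client using the credentials configured for awscli."""
    # Imported inside the function so this sketch loads without boto3 installed.
    import boto3
    return boto3.client("s3")


def open_remote_file(ftp_connection, ftp_file_path):
    """Open the remote file for reading and return (file_object, size_in_bytes)."""
    ftp_file = ftp_connection.file(ftp_file_path, "r")
    # stat() on the open file returns attributes that include st_size.
    ftp_file_size = ftp_file.stat().st_size
    return ftp_file, ftp_file_size


def fits_in_single_upload(ftp_file_size, chunk_size):
    """True when the whole file can be sent at once instead of in chunks."""
    return ftp_file_size <= chunk_size
```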

Avoid duplicate copy

This small try/except block checks whether a file already exists in the bucket at the provided S3 path, and compares its size with the size of the FTP file. If they match, we abort the transfer, close the FTP connection, and return from the function.
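One way to express this check is with the S3 client's head_object() call, which returns the object's metadata, including ContentLength, and raises if the key does not exist. The function name below is hypothetical, and this is only a sketch of the idea.

```python
def already_in_s3(s3_connection, bucket_name, s3_file_path, ftp_file_size):
    """Return True if an object of the same size already exists at the S3 path."""
    try:
        s3_file = s3_connection.head_object(Bucket=bucket_name, Key=s3_file_path)
    except Exception:
        # head_object raises when the key is missing. A broad catch keeps this
        # sketch short; botocore.exceptions.ClientError is the narrower choice.
        return False
    return s3_file["ContentLength"] == ftp_file_size
```

When this returns True, the caller can close the FTP file and return without transferring anything.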

Transfer the small files in one go

Transfer files at once

If the file is smaller than the chunk size we have provided, we read the complete file using the read() method, which returns the file data as bytes. We then upload this data to the S3 bucket, at the given path and file name, using the upload_fileobj() function. Since upload_fileobj() expects a file-like object, the bytes need to be wrapped in one first, for example with io.BytesIO.
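A minimal sketch of the single-request path follows. The function name is hypothetical. It assumes ftp_file is the open remote file and s3_connection is a boto3 S3 client.

```python
import io


def transfer_whole_file(ftp_file, s3_connection, bucket_name, s3_file_path):
    """Read the entire remote file into memory and upload it in one request."""
    ftp_file_data = ftp_file.read()  # the whole file, as bytes
    # upload_fileobj() needs an object with a read() method, so wrap the bytes.
    s3_connection.upload_fileobj(io.BytesIO(ftp_file_data), bucket_name, s3_file_path)
    ftp_file.close()
```

Because the whole file is held in memory, this path only makes sense for files that comfortably fit in RAM, which is why the chunk size acts as the cutoff.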

Transfer big files in chunks AKA Multipart Upload

Transfer file in chunks

We will transfer the file in chunks! This is where the real fun begins…

First we compute the number of chunks we need to transfer, based on the file size. Remember, AWS won’t allow any part to be smaller than 5 MB, except the last part. The last part can be smaller than 5 MB.
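The chunk count is the file size divided by the chunk size, rounded up. The sketch below uses integer ceiling division. The guard on the minimum part size is an addition of this sketch, not something the article describes, and the function and constant names are hypothetical.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def count_chunks(ftp_file_size, chunk_size):
    """Number of parts needed to cover ftp_file_size bytes."""
    if chunk_size < MIN_PART_SIZE:
        raise ValueError("chunk_size must be at least 5 MB for a multipart upload")
    # Ceiling division: a partial final chunk still counts as one part.
    return (ftp_file_size + chunk_size - 1) // chunk_size
```

For example, a 12 MB file with 5 MB chunks needs three parts: two full chunks and a final 2 MB one.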

We loop over all the chunks, reading each one from FTP and uploading it to S3. We use the multipart upload facility provided by the boto3 library. create_multipart_upload() initiates the process. Each chunk is transferred by the transfer_chunk_from_ftp_to_s3() function, which returns a Python dict with information about the uploaded part. We collect these dicts in a list called parts.

The Python dict part_info has the key ‘Parts’, whose value is the list parts. This part_info dict is used by complete_multipart_upload() to complete the transfer, together with the upload ID from the multipart dict returned when the multipart upload was initiated. After completing the multipart upload we close the FTP connection.
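Put together, the multipart flow might look like the sketch below. The function name and signature are hypothetical. The per-chunk work is passed in as the transfer_chunk argument, standing in for transfer_chunk_from_ftp_to_s3(). Aborting the upload on failure is an addition of this sketch that the article does not describe.

```python
def transfer_file_in_chunks(ftp_file, s3_connection, bucket_name, s3_file_path,
                            ftp_file_size, chunk_size, transfer_chunk):
    """Upload the remote file to S3 part by part and return the parts summary."""
    # Ceiling division: a partial final chunk still counts as one part.
    chunk_count = (ftp_file_size + chunk_size - 1) // chunk_size
    multipart_upload = s3_connection.create_multipart_upload(
        Bucket=bucket_name, Key=s3_file_path)
    parts = []
    try:
        # S3 part numbers start at 1.
        for part_number in range(1, chunk_count + 1):
            parts.append(transfer_chunk(ftp_file, s3_connection, multipart_upload,
                                        bucket_name, s3_file_path,
                                        part_number, chunk_size))
        part_info = {"Parts": parts}
        s3_connection.complete_multipart_upload(
            Bucket=bucket_name, Key=s3_file_path,
            UploadId=multipart_upload["UploadId"], MultipartUpload=part_info)
    except Exception:
        # Discard the parts uploaded so far, then re-raise the error.
        s3_connection.abort_multipart_upload(
            Bucket=bucket_name, Key=s3_file_path,
            UploadId=multipart_upload["UploadId"])
        raise
    finally:
        ftp_file.close()
    return part_info
```

Only one chunk is held in memory at a time, so the file size is no longer limited by available RAM.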

How to transfer the chunk?

This function reads chunk_size bytes of the FTP file by passing the chunk size to ftp_file.read(). The bytes are passed as the Body parameter to s3_connection.upload_part(). upload_part() also takes other parameters, such as the name of the bucket, the S3 file path, and the upload ID of the multipart upload. The PartNumber parameter is simply an integer identifying the part: 1, 2, 3, and so on.

Once the part is uploaded, we return a part-output dict with its ETag and PartNumber, which then becomes part of the value of the dict called part_info used to complete the multipart upload.
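A sketch of the chunk transfer described above is shown here. The parameter list is an assumption of this sketch and may differ from the original script. multipart_upload is the dict returned by create_multipart_upload().

```python
def transfer_chunk_from_ftp_to_s3(ftp_file, s3_connection, multipart_upload,
                                  bucket_name, s3_file_path, part_number,
                                  chunk_size):
    """Read one chunk from the remote file and upload it as one S3 part."""
    # Reads up to chunk_size bytes, or fewer if the end of the file is reached.
    chunk = ftp_file.read(chunk_size)
    part = s3_connection.upload_part(
        Bucket=bucket_name,
        Key=s3_file_path,
        PartNumber=part_number,
        UploadId=multipart_upload["UploadId"],
        Body=chunk,
    )
    # S3 needs each part's number and ETag to assemble the final object.
    return {"PartNumber": part_number, "ETag": part["ETag"]}
```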

We did it!

That’s it! You have successfully transferred a file from FTP to S3, and you should now see the message on the console.

Visit the GitHub link for the complete Python script. Thank you for reading this far. I hope you found this article helpful. Cheers!

Better Programming

Advice for programmers.

Written by Kiran Kumbhar

Full Stack Web developer in progress :) https://github.com/kirankumbhar/
