Speeding up data import from SFTP into PostgreSQL in Rails

Harrison Kamau
Oct 18, 2019 · 3 min read
Image for post
Image for post
photo from unsplash.com

Most recently, I was tasked with retrieving a CSV file from a remote Secure File Transfer Protocol(SFTP) server, extract certain fields from the file containing at least 5000 rows and save each of the rows as a record in PostgreSQL. Here is a summary of how I was able to pull that off :)

I broke down the task into the following chunks:

  • Connect to SFTP
  • Check for a file match(regex came in handy here)
  • Download the file into an in-memory storage
  • Parse the file data from in-memory into an array
  • Destroy the existing locations database (more on this later)
  • Iterate through the locations array saving each item as a record
  • Reconnect to SFTP and archive the processed file

SFTP, known as SSH File Transfer Protocol or Secure File Transfer Protocol is a secure way of transferring files between two remote systems. The difference between SFTP and FTP (File Transfer Protocol) is that SFTP is a protocol packaged with SSH to transfer data/files over a secure connection.

To connect to an SFTP server, Ruby has net-sftp and net-ssh gems. Add them to your Gemfile and run bundle install .

gem 'net-sftp'
gem 'net-ssh'

I had this helper method for starting a standalone SFTP session. It will start up an SSH session using the given arguments, and will then start a new SFTP session with the SSH session. This will block until the new SFTP is fully open and initialized before returning it. Check this documentation for details.

sftp-session helper method

With that in place, we can iterate over the directory we need to check for our file.

This regex expression is a redacted version of the original one for security reasons.

check for a file match

I decided against writing the data to disk yet and chose to pass the IO object as the destination:

:read_size is the maximum number of bytes to read at a time from the source. It defaults to 32,000 bytes.

The locations database was meant to be recreated every day. So here is how to do it.

Use the Rails’ destroy_all method to clear all the locations from the database.

Location.destroy_all

Much later, I realized that optimization for speed was needed to reduce the time it took to import these locations into the database. Luckily, I found the activerecord-import gem used for bulk insertion of data into a database using ActiveRecord. So I changed the above code to:

In addition, add the following activerecord-import configurations to the Location model:

require 'activerecord-import/base'require 'activerecord-import/active_record/adapters/postgresql_adapter'# I'm going to use postgresql hence the 'postgresql_adapter'

Putting it all together

For brevity, check out the code in the lib/locations/downloader.rb file. I converted this into a Rake task. If you are new to Rake tasks, check this blog for more information.

Clone this repo and run a few tests with actual SFTP credentials or customize the code to just process a huge CSV file from anywhere. The speed of import will dazzle you.

The archive logic can be found in the lib/locations/archiver.rb .

It’s my hope that reading this blog has given you the tips and confidence required to deal with the menace of importing huge CSV datasets into your database.

Thanks for your time!

The Andela Way

A pool of thoughts from the brilliant people at Andela

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store