Ruby on Rails: Using Tempfile and Fog-AWS Gem to read excel files saved in AWS S3

icelandcheng
Nerd For Tech
Published in
6 min readJul 17, 2021
Image by WilliamsCreativity from Pixabay

AWS S3(AWS Simple Storage Service) has become a popular cloud service for people to store objects like files or images. It is more likely that we need to implement a function as accessing files from AWS S3 or uploading files to AWS S3 in our application. Recently, I also encounter a need to read a bunch of .xlsx files stored in AWS S3 in my Rails application, and I just want to read the content, so it is not a good idea to save all the files in our application forever. After trying some solution, I found Fog-AWS Gem and Ruby Class Method: Tempfile is a useful way to implement the result could meet the need. For sharing how I use them, I wrote this article. In this article, there are two sections: The first section is about how to use Fog-AWS Gem to access files stored in AWS S3 and the second section is about how to use Ruby Class Tempfile method to store .xlsx file temporarily and open it in Rails application.

Access file stored in AWS S3

Fog-AWS is a useful gem to access files from AWS S3 or upload files to AWS S3 from your Rails application. Just add this line to your application’s Gemfile, and run bundle install, then you could start using Fog-AWS.

gem 'fog-aws'

Initialize connection to AWS S3

Before accessing the file from AWS S3, you need to connect it first. Fog-AWS provide a Class Fog::Storage to initialize the connection. There are two ways to connect AWS S3. One is using aws_access_key_id and aws_secret_access_key

s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
aws_access_key_id: YOUR_S3_ACCESS_KEY,
aws_secret_access_key: YOUR_S3_SECRET_KEY
)

If your application is deployed on AWS EC2, then you could use the other way to connect to AWS S3. That is using AWS IAM role. By setting specific EC2 IAM role which could access AWS S3, your application could connect to AWS S3 by setting use_iam_profile = true when initializing a connection.

s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
use_iam_profile: true
)

List all files and access files in AWS S3 bucket

You could list all the files in a specific AWS S3 bucket.

s3_files = s3.directories.new(key: 'my-bucket').files

To access the file in AWS S3, Class Fog::Storage provides get method to get the file we need.

s3_file = s3.directories.new(key: 'my-bucket')
.files
.get('my_folder/test.xlsx')

We can’t directly open the file after we get it from AWS S3, because it is just a Fog::Storage::AWS::File object as below, we should parse it through URL.

<Fog::Storage::AWS::File
key="my_folder/test.xlsx",
cache_control=nil,
content_disposition=nil,
content_encoding=nil,
content_length=5488,
content_md5=nil,
content_type="binary/octet-stream",
etag="38eed92f509046d915583875dd45454f",
expires=nil,
last_modified=2021-07-13 03:41:48 +0000,
metadata={"x-amz-id-2"=>"XXX", "x-amz-request-id"=>"XXX"},
owner=nil,
storage_class=nil,
encryption=nil,
encryption_key=nil,
version=nil,
kms_key_id=nil
>

Create file URL and parse the file

Class Fog::Storage provides URL method to create file URL. It also allows us to set how long the URL will expire.

s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
use_iam_profile: true
)
s3_file = s3.directories.new(key: 'my-bucket')
.files
.get('my_folder/test.xlsx')
url = s3_file.url(Time.zone.now + 60)

After the file URL is ready, we could use the parse method provide by Ruby Class URI to parse the file through its URL. The parsing result is an object of Ruby Class StringIO as below

s3_temp_file = URI.parse(url).open=>#<StringIO:0x0000555e74fe1438 @base_uri=#<URI::HTTPS https://my-bucket.s3-us-west-2.amazonaws.com/my_folder/test.xlsx?X-Amz-Expires=60&X-Amz-Date=20210713T151552Z&X-Amz-Security-Token=XXX#X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIARPKY2DTZXLCOCK65/20210713/us-west-2/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=XXX>, @meta={"x-amz-id-2"=>"XXX", "x-amz-request-id"=>"4MNS2VFY3DT0JKAT", "date"=>"Tue, 13 Jul 2021 15:16:02 GMT", "last-modified"=>"Tue, 13 Jul 2021 03:41:48 GMT", "etag"=>"\"38eed92f509046d915583875dd45454f\"", "accept-ranges"=>"bytes", "content-type"=>"binary/octet-stream", "server"=>"AmazonS3", "content-length"=>"5488"}, @metas={"x-amz-id-2"=>["t+D6xAC6o1MrEwhHjyBFkPdQoBfvTmfQh4ytJ/lAheZ89R/u6yj78d6Secc0AuQhfUODdwedOIQ="], "x-amz-request-id"=>["4MNS2VFY3DT0JKAT"], "date"=>["Tue, 13 Jul 2021 15:16:02 GMT"], "last-modified"=>["Tue, 13 Jul 2021 03:41:48 GMT"], "etag"=>["\"38eed92f509046d915583875dd45454f\""], "accept-ranges"=>["bytes"], "content-type"=>["binary/octet-stream"], "server"=>["AmazonS3"], "content-length"=>["5488"]}, @status=["200", "OK"]>

when reading the object, the content is not easy to use and to understand.

s3_temp_file.read=>"PK\x03\x04\x14\x00\x00\x00\b\x00P\x1C\xEER\aAMb\x81\x00\x00\x00\xB1\x00\x00\x00\x10\x00\x00\x00docProps/app.xmlM\x8E=\v\x021\x10D\xFF\xCAq\xBD\xB7A\xC1Bb@\xD0R\xB0\xB2\x0F{\e/\x90dC\xB2B~\xBE9\xC1\x8Fn\x1Eo\x18F\xDF\ng*\xE2\xA9\x0E-\x86T\x8F\xE3\"\x92\x0F\x00\x15\x17\x8A\xB6N]\xA7n\x1C\x97h\xA5cy\x00;\xE7\x91\xCE\x8C\xCFHI`\xAB\xD4\x1E\xA8\t\xA5\x99\xE6M\xFE\x0E\x8EF\x9Fr\x0E\x1E\xADxN\xE6\xEA\xB1pe'\xC3\xA5!\x05\r\xFFrm\xDE\xA9\xD45\xEF&\xF5\x96\x1F\xD6\xF0;i^PK\x03\x04\x14\x00\x00\x00\b\x00P\x1C\xEERt\x80r,\xEF\x00\x00\x00+\x02\x00\x00\x11\x00\x00\x00docProps/core.xml\xCD\x92QK\xC30\x10\xC7\xBF\x8A\xE4\xBD\xBD\xA6\x15\x1D\xA1\xCB\x8BcO\n\x82\x03\xC5\xB7\x90\xDC\xB6`\x93\x86\xE4\xA4\xDD\xB77\xAD[\x87\xE8\a\xF01w\xFF\xFC\xEEwp\xAD\x0EB\xF7\x11\x9Fc\x1F0\x92\xC5t3\xBA\xCE'\xA1\xC3\x9A\x.....

In order to use the file content and process it, we should either save the parse result as a file by using Ruby Class File or just save it as a temp file by using Ruby Class Tempfile and open it. In the next section, I’ll share how to use Class Tempfile to save the parsing result as a temporary file and open it.

Save Excel file as a temp file and open it

Tempfile has already built-in Ruby Class, so we don’t need to install it. When you create a Tempfile object, it will create a temporary file with a unique filename and you can perform all the usual file operations on it, for example like reading data, writing data, changing its permissions, etc. When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted

Create temp file

We could either use new or create method provide by Tempfile to create a temp file. The below code shows using new method to create a temp file. This will create a temp file which file name includes foo

tempfile = Tempfile.new('foo')
=> #<File:/var/folders/57/m85kgny17n70l64v009zhcfhmyj0p9/T/foo20210715-83391-157to5v>

Writing some content into a temp file, it will return how many char was written in.

tempfile.write('hello world')
=> 11

If we want to use create method to do the same thing as using new method, the code is like below, it also creates a temp file which file name include foo

Tempfile.create('foo') do |f|
f.write('hello world')
end

The only difference between new and create method is that create method accepts a block.

Set the temp file type and read the file

We could create a temp file with the specific file type, for example like .xlsx, .csv…etc.

Tempfile.new(['foo', '.xlsx'], encoding: 'ascii-8bit')=> #<File:/var/folders/57/m85kgny17n70l64v009zhcfhmyj0p9/T/foo20210715-33983-e1vs8.xlsx>

After we create the temp file, the file content should be written to temp file before we use it, and we should set the file to be read from the first line, because after the file was written some content, it will be read in the last new line, and there is no content.

tempfile = Tempfile.new('foo')
tempfile.write('hello world')
tempfile.read
=> ""

We could use rewind method provide by Class Tempfile to set the temp file to be read from line one.

tempfile = Tempfile.new('foo')
tempfile.write('hello world')
tempfile.rewind
tempfile.read
=> "hello world"

So the code to save a .xlsx file from AWS S3 as a temp file is like

s3_temp_file = URI.parse(url).open            
tempfile = Tempfile.new([test, '.xlsx'], encoding: 'ascii-8bit')
tempfile.write(s3_temp_file.read)
tempfile.rewind

Open the excel type tempfile(.xlsx)

Finally, we could start using the temp file. The tempfile is a .xlsx file. To read this kind of file, Gem Roo provides useful methods to open it. Gem Roo also support to open other types of excel file like .csv, .xlsm..etc.

Just add Roo to Gemfile and run bundle install, then we can use the Class Roo::Spreadsheet to open the temp file.

xlsx = Roo::Spreadsheet.open(tempfile)

After the .xlsx file is open, just indicate which sheet need to read

sheet = xlsx.sheet(0)

Then all the rows and cells in this sheet could be accessed by using the method provided by Roo::Spreadsheet All the methods' detail could be found on Gem Roo Github page.

Reference

--

--

icelandcheng
Nerd For Tech

Programming Skill learner and Sharer | Ruby on Rails | Golang | Vue.js | Web Map API