Ruby on Rails: Using Tempfile and Fog-AWS Gem to read excel files saved in AWS S3
AWS S3(AWS Simple Storage Service) has become a popular cloud service for people to store objects like files or images. It is more likely that we need to implement a function as accessing files from AWS S3 or uploading files to AWS S3 in our application. Recently, I also encounter a need to read a bunch of .xlsx files stored in AWS S3 in my Rails application, and I just want to read the content, so it is not a good idea to save all the files in our application forever. After trying some solution, I found Fog-AWS Gem and Ruby Class Method: Tempfile is a useful way to implement the result could meet the need. For sharing how I use them, I wrote this article. In this article, there are two sections: The first section is about how to use Fog-AWS Gem to access files stored in AWS S3 and the second section is about how to use Ruby Class Tempfile method to store .xlsx file temporarily and open it in Rails application.
Access file stored in AWS S3
Fog-AWS is a useful gem to access files from AWS S3 or upload files to AWS S3 from your Rails application. Just add this line to your application’s Gemfile, and run bundle install
, then you could start using Fog-AWS.
gem 'fog-aws'
Initialize connection to AWS S3
Before accessing the file from AWS S3, you need to connect it first. Fog-AWS provide a Class Fog::Storage
to initialize the connection. There are two ways to connect AWS S3. One is using aws_access_key_id
and aws_secret_access_key
s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
aws_access_key_id: YOUR_S3_ACCESS_KEY,
aws_secret_access_key: YOUR_S3_SECRET_KEY
)
If your application is deployed on AWS EC2, then you could use the other way to connect to AWS S3. That is using AWS IAM role. By setting specific EC2 IAM role which could access AWS S3, your application could connect to AWS S3 by setting use_iam_profile = true
when initializing a connection.
s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
use_iam_profile: true
)
List all files and access files in AWS S3 bucket
You could list all the files in a specific AWS S3 bucket.
s3_files = s3.directories.new(key: 'my-bucket').files
To access the file in AWS S3, Class Fog::Storage
provides get
method to get the file we need.
s3_file = s3.directories.new(key: 'my-bucket')
.files
.get('my_folder/test.xlsx')
We can’t directly open the file after we get it from AWS S3, because it is just a Fog::Storage::AWS::File
object as below, we should parse it through URL.
<Fog::Storage::AWS::File
key="my_folder/test.xlsx",
cache_control=nil,
content_disposition=nil,
content_encoding=nil,
content_length=5488,
content_md5=nil,
content_type="binary/octet-stream",
etag="38eed92f509046d915583875dd45454f",
expires=nil,
last_modified=2021-07-13 03:41:48 +0000,
metadata={"x-amz-id-2"=>"XXX", "x-amz-request-id"=>"XXX"},
owner=nil,
storage_class=nil,
encryption=nil,
encryption_key=nil,
version=nil,
kms_key_id=nil
>
Create file URL and parse the file
Class Fog::Storage
provides URL method to create file URL. It also allows us to set how long the URL will expire.
s3 = Fog::Storage.new(
provider: 'AWS',
region: 'us-west-2',
use_iam_profile: true
)
s3_file = s3.directories.new(key: 'my-bucket')
.files
.get('my_folder/test.xlsx')
url = s3_file.url(Time.zone.now + 60)
After the file URL is ready, we could use the parse
method provide by Ruby Class URI
to parse the file through its URL. The parsing result is an object of Ruby Class StringIO
as below
s3_temp_file = URI.parse(url).open=>#<StringIO:0x0000555e74fe1438 @base_uri=#<URI::HTTPS https://my-bucket.s3-us-west-2.amazonaws.com/my_folder/test.xlsx?X-Amz-Expires=60&X-Amz-Date=20210713T151552Z&X-Amz-Security-Token=XXX#X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIARPKY2DTZXLCOCK65/20210713/us-west-2/s3/aws4_request&X-Amz-SignedHeaders=host&X-Amz-Signature=XXX>, @meta={"x-amz-id-2"=>"XXX", "x-amz-request-id"=>"4MNS2VFY3DT0JKAT", "date"=>"Tue, 13 Jul 2021 15:16:02 GMT", "last-modified"=>"Tue, 13 Jul 2021 03:41:48 GMT", "etag"=>"\"38eed92f509046d915583875dd45454f\"", "accept-ranges"=>"bytes", "content-type"=>"binary/octet-stream", "server"=>"AmazonS3", "content-length"=>"5488"}, @metas={"x-amz-id-2"=>["t+D6xAC6o1MrEwhHjyBFkPdQoBfvTmfQh4ytJ/lAheZ89R/u6yj78d6Secc0AuQhfUODdwedOIQ="], "x-amz-request-id"=>["4MNS2VFY3DT0JKAT"], "date"=>["Tue, 13 Jul 2021 15:16:02 GMT"], "last-modified"=>["Tue, 13 Jul 2021 03:41:48 GMT"], "etag"=>["\"38eed92f509046d915583875dd45454f\""], "accept-ranges"=>["bytes"], "content-type"=>["binary/octet-stream"], "server"=>["AmazonS3"], "content-length"=>["5488"]}, @status=["200", "OK"]>
when reading the object, the content is not easy to use and to understand.
s3_temp_file.read=>"PK\x03\x04\x14\x00\x00\x00\b\x00P\x1C\xEER\aAMb\x81\x00\x00\x00\xB1\x00\x00\x00\x10\x00\x00\x00docProps/app.xmlM\x8E=\v\x021\x10D\xFF\xCAq\xBD\xB7A\xC1Bb@\xD0R\xB0\xB2\x0F{\e/\x90dC\xB2B~\xBE9\xC1\x8Fn\x1Eo\x18F\xDF\ng*\xE2\xA9\x0E-\x86T\x8F\xE3\"\x92\x0F\x00\x15\x17\x8A\xB6N]\xA7n\x1C\x97h\xA5cy\x00;\xE7\x91\xCE\x8C\xCFHI`\xAB\xD4\x1E\xA8\t\xA5\x99\xE6M\xFE\x0E\x8EF\x9Fr\x0E\x1E\xADxN\xE6\xEA\xB1pe'\xC3\xA5!\x05\r\xFFrm\xDE\xA9\xD45\xEF&\xF5\x96\x1F\xD6\xF0;i^PK\x03\x04\x14\x00\x00\x00\b\x00P\x1C\xEERt\x80r,\xEF\x00\x00\x00+\x02\x00\x00\x11\x00\x00\x00docProps/core.xml\xCD\x92QK\xC30\x10\xC7\xBF\x8A\xE4\xBD\xBD\xA6\x15\x1D\xA1\xCB\x8BcO\n\x82\x03\xC5\xB7\x90\xDC\xB6`\x93\x86\xE4\xA4\xDD\xB77\xAD[\x87\xE8\a\xF01w\xFF\xFC\xEEwp\xAD\x0EB\xF7\x11\x9Fc\x1F0\x92\xC5t3\xBA\xCE'\xA1\xC3\x9A\x.....
In order to use the file content and process it, we should either save the parse result as a file by using Ruby Class File
or just save it as a temp file by using Ruby Class Tempfile
and open it. In the next section, I’ll share how to use Class Tempfile
to save the parsing result as a temporary file and open it.
Save Excel file as a temp file and open it
Tempfile has already built-in Ruby Class, so we don’t need to install it. When you create a Tempfile object, it will create a temporary file with a unique filename and you can perform all the usual file operations on it, for example like reading data, writing data, changing its permissions, etc. When a Tempfile object is garbage collected, or when the Ruby interpreter exits, its associated temporary file is automatically deleted
Create temp file
We could either use new
or create
method provide by Tempfile to create a temp file. The below code shows using new
method to create a temp file. This will create a temp file which file name includes foo
tempfile = Tempfile.new('foo')
=> #<File:/var/folders/57/m85kgny17n70l64v009zhcfhmyj0p9/T/foo20210715-83391-157to5v>
Writing some content into a temp file, it will return how many char was written in.
tempfile.write('hello world')
=> 11
If we want to use create
method to do the same thing as using new
method, the code is like below, it also creates a temp file which file name include foo
Tempfile.create('foo') do |f|
f.write('hello world')
end
The only difference between new
and create
method is that create method accepts a block.
Set the temp file type and read the file
We could create a temp file with the specific file type, for example like .xlsx, .csv…etc.
Tempfile.new(['foo', '.xlsx'], encoding: 'ascii-8bit')=> #<File:/var/folders/57/m85kgny17n70l64v009zhcfhmyj0p9/T/foo20210715-33983-e1vs8.xlsx>
After we create the temp file, the file content should be written to temp file before we use it, and we should set the file to be read from the first line, because after the file was written some content, it will be read in the last new line, and there is no content.
tempfile = Tempfile.new('foo')
tempfile.write('hello world')
tempfile.read=> ""
We could use rewind
method provide by Class Tempfile
to set the temp file to be read from line one.
tempfile = Tempfile.new('foo')
tempfile.write('hello world')
tempfile.rewind
tempfile.read=> "hello world"
So the code to save a .xlsx file from AWS S3 as a temp file is like
s3_temp_file = URI.parse(url).open
tempfile = Tempfile.new([test, '.xlsx'], encoding: 'ascii-8bit')
tempfile.write(s3_temp_file.read)
tempfile.rewind
Open the excel type tempfile(.xlsx)
Finally, we could start using the temp file. The tempfile is a .xlsx file. To read this kind of file, Gem Roo provides useful methods to open it. Gem Roo also support to open other types of excel file like .csv, .xlsm..etc.
Just add Roo to Gemfile and run bundle install
, then we can use the Class Roo::Spreadsheet
to open the temp file.
xlsx = Roo::Spreadsheet.open(tempfile)
After the .xlsx file is open, just indicate which sheet need to read
sheet = xlsx.sheet(0)
Then all the rows and cells in this sheet could be accessed by using the method provided by Roo::Spreadsheet
All the methods' detail could be found on Gem Roo Github page.