Unloading a file from Redshift to S3 (with headers)

Pericles Theodorou
Carwow Product, Design & Engineering
3 min read · Aug 30, 2016

A couple of months ago, we started using Redshift to solve two scaling problems: we needed a super fast datastore for our BI team to run heavy reports on, and we wanted to start decoupling our production and reporting databases.

A few days ago, we needed to export the results of a Redshift query into a CSV file and upload it to S3 so we could feed a third-party API. Redshift already has an UNLOAD command that does just that!

Connecting to Redshift

Let’s start by adding the Redshift configuration into a file.

The search_path option allows us to specify which schemas should be searched when we reference a DB object, e.g. a table.
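A minimal database.yml could look something like this (the cluster host, database name, credentials and schema below are placeholders; adjust them to your own setup):

redshift:
  adapter: redshift
  host: my-cluster.abc123.eu-west-1.redshift.amazonaws.com
  port: 5439
  database: my_database
  user: my_user
  password: my_password
  search_path: my_schema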

We are going to be working with Ruby but the code should be simple enough to adjust it to whatever language you prefer.

Let’s create a Gemfile. You are going to need two gems:

source 'https://rubygems.org'

gem 'sequel'
gem 'sequel-redshift'

Let’s now connect to Redshift via the sequel gem.

require 'bundler'
require 'yaml'
Bundler.require

config = YAML::load_file(File.join(__dir__, 'database.yml'))
REDSHIFT = Sequel.connect(config['redshift'], max_connections: 10)

We should now be connected to the Redshift cluster 🎉🎉🎉

Running queries on Redshift

Querying the Redshift cluster is now as simple as:

sql = "UPDATE my_schema.my_table SET my_field = 'yay!' WHERE id = 1;REDSHIFT.run(sql)

We can do all sorts of queries but for the sake of simplicity let’s only have the following one for now:

full_sql = "SELECT * FROM my_schema.my_table WHERE my_column > 10"
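As an aside, if we just wanted to pull those rows back into Ruby instead of unloading them, Sequel can treat the raw SQL as a dataset:

REDSHIFT[full_sql].each do |row|
  puts row.inspect # each row comes back as a hash keyed by column name
end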

Using the UNLOAD command

The full syntax of the command can be found in the Redshift documentation, but this is more or less how we could do it:

s3_bucket_file = "'s3://path/to/bucket/file.csv'"
aws_credentials = "'aws_access_key_id=my_key;aws_secret_access_key=my_secret'"

REDSHIFT.run("UNLOAD ('#{full_sql}') TO #{s3_bucket_file} CREDENTIALS #{aws_credentials} DELIMITER AS ',' ADDQUOTES NULL AS '' ALLOWOVERWRITE PARALLEL OFF;")

Running the above should upload the results of the query as a CSV file on S3. Job done! However, if we inspect the file, we’ll find out that the headers are missing! OMG!

If you are familiar with the Postgres COPY command, you might expect a similar HEADER option in UNLOAD, but unfortunately that's not the case.
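For comparison, this is roughly what plain Postgres lets you do (the table and output path here are just placeholders):

COPY my_schema.my_table TO '/tmp/my_table.csv' WITH (FORMAT csv, HEADER);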

Including the headers in the CSV file

This feels a bit cumbersome and dirty, but that's how we managed to get the headers into our CSV file.

Grab the column names for the table we are interested in

column_query = <<QUERY
SELECT column_name FROM information_schema.columns
WHERE table_schema = 'my_schema' AND table_name = 'my_table'
ORDER BY ordinal_position;
QUERY

columns = REDSHIFT[column_query].all.map { |c| c[:column_name] }
=> ["id", "another_field", "yet_another_field"]

UNION between the headers and the SELECT columns

This sounds a bit weird, so let me explain. In SQL, something along the lines of SELECT 'a_string' is a valid query, so we can use that to UNION our column names with the results of the actual query.

To make it easier to visualise the result, here's a simple example that mirrors what we will end up doing:

SELECT '1' UNION SELECT '2';

 ?column?
----------
 1
 2
(2 rows)

Now that we have a better understanding, let’s see how we can actually implement it.

column_names = columns.map { |c| "\\'#{c}\\'" }.join(', ')
column_castings = columns.map { |c| "CAST(#{c} AS text) AS #{c}" }.join(', ')

The first line escapes the single quotes around each header name so that they survive inside the single-quoted query string we pass to UNLOAD.

The second line casts every column to text, because UNION only works when the corresponding columns have compatible types.
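With the three columns from earlier, the two strings come out roughly like this (printed with puts purely to illustrate the escaping):

puts column_names
\'id\', \'another_field\', \'yet_another_field\'

puts column_castings
CAST(id AS text) AS id, CAST(another_field AS text) AS another_field, CAST(yet_another_field AS text) AS yet_another_field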

The final select query is now:

select_query = "SELECT #{column_names} UNION (SELECT #{column_castings} FROM my_schema.my_table) ORDER BY 1 DESC"

The ordering is needed so that the headers of the CSV are on the first line. Otherwise, they could end up anywhere in the CSV.
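Putting it all together, we can feed this query to the same UNLOAD call as before (reusing the bucket and credentials placeholders from above):

REDSHIFT.run("UNLOAD ('#{select_query}') TO #{s3_bucket_file} CREDENTIALS #{aws_credentials} DELIMITER AS ',' ADDQUOTES NULL AS '' ALLOWOVERWRITE PARALLEL OFF;")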

And there you have it. That’s how we managed to upload some Redshift results onto S3 with the headers. You can find a rough implementation here.

Interested in making an impact? Join the carwow team!
Feeling social? Connect with us on Twitter and LinkedIn :-)
