Import CSV Sample (Exoplanets!) Data into PostgreSQL

Lorna Mitchell
Mar 3 · 4 min read

PostgreSQL is and remains one of our most popular and growing storage platforms; other storage technologies come and go but modern Postgres is a solid choice for so many applications. When you spin up your first Aiven PostgreSQL, you’ll want to take some time to play with the features … but there’s a problem. Your new shiny database is empty.

Finding and using some open datasets is a great way to fill this gap, and one option is go try the Kaggle platform. It’s a place to find open data, advice about data science, and some competitions you can participate in to hone your skills. There’s quite a selection of datasets to choose from, but today we’ll be using the exoplanets data from the Kepler mission. You’ll need a (free) account to log in and download the data. Go ahead and extract the zip file too, we’ll be using cumulative.csv for the example in this post.

Get Started with Aiven

We will also be using the Aiven CLI. This tool requires Python 3.6 or later, and can be installed from PyPI:

pip install aiven-client

You will also need to authenticate your Aiven account against the CLI tool. Replace your own details in the command below:

avn user login <email@example.com>

You have everything you need to create an Aiven database in the cloud.

Create PostgreSQL service

avn project create exoplanets

Aiven offers many options when creating services but to get us going quickly, we’ll use the newest postgres available and the smallest package, called hobbyist. One of the most fun things though is being able to choose any cloud platform you like so take a moment to check the list and copy the CLOUD_NAME field of your favorite:

avn cloud list

I chose google-europe-west1 for my example, but you can replace that with the cloud you chose. Here is the command to run to create the postgres database:

avn service create -t pg -p hobbyist --cloud google-europe-west1 pg-exoplanets

It takes a few minutes for the node to be ready, but the Aiven CLI has a handy “wait” command that doesn’t return until the service is ready to talk to us. This is less critical when we’re running the commands by hand as we do here, but it’s super useful when your CI system is spinning up the data platforms by itself!

avn service wait

When the command returns, our PostgreSQL cluster is ready to use. Let’s create a database to hold the sample data; the command below creates one named “exoplanets”:

avn service database-create --dbname exoplanets pg-exoplanets

Now we have our own sad and empty database, let’s look at the sample data and get it imported.

Adding CSV data to PostgreSQL

Here’s how to install the ddlgenerator tool and then generate the CREATE TABLE statement from the CSV we downloaded earlier:

pip install ddlgenerator ddlgenerator postgres cumulative.csv > create.sql

Have a look inside the file and you will see that we have the structure we need to explain to PostgreSQL how to hold the data. The avn service cli command will give us a psql prompt on the new database:

avn service cli pg-exoplanets

From within psql we can connect to the database we created, and then run the SQL file to create the table structure:

\c exoplanets \i create.sql

Adding the final piece to the puzzle, and still from the psql prompt, the next command brings in the CSV data:

\copy cumulative from data/cumulative.csv csv header

Nice work! The cumulative table should now have some data for you to play with!

Dreaming of Exoplanets

You can explore the dataset, which describes observations and compares the Kepler assessment of each exoplanet with its official status in the pre-existing literature. For example, try this to see the false-positives identified by Kepler:

select kepler_name, koi_pdisposition from cumulative where koi_disposition = 'CONFIRMED' and koi_pdisposition = 'FALSE POSITIVE';

You can also connect this data to other tools to use the dataset further. Either grab the connection details from the web console, or use jq with avn for a one-liner:

avn service get pg-exoplanets --json | jq ".service_uri"

Wrapping up

avn service terminate pg-exoplanets

For even more fun and learning, how about one of these resources:

Not using Aiven services yet? Get a free trial at https://console.aiven.io/signup

Originally published at https://aiven.io.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

Sign up for Analytics Vidhya News Bytes

By Analytics Vidhya

Latest news from Analytics Vidhya on our Hackathons and some of our best articles! Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Lorna Mitchell

Written by

Polyglot programmer, technology addict, open source fanatic and incurable blogger (see http://lornajane.net)

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store