Getting Started with Heroku, Postgres and PgAdmin — RUN__ON Part 2

Gregory ‘Grey’ Barkans
15 min read · Aug 27, 2018

CS Weekly 7 — RUN__ON Part 2

This is Part 2 of a mini-series that discusses several topics in creating a full-stack web application called RUN__ON.

This mini-series is part of an effort called CS Weekly, a manifesto committing oneself to weekly programming explorations, described in an article found here:

Working With Databases Locally and Remotely

I’m not going to lie, this topic is a bit challenging to approach in an elegant manner. I’ll attempt to outline the problems of working with various databases and an opinionated method to manage everything using pgAdmin and Heroku.

When developing an application that makes heavy use of a database (or several), at least three databases typically exist:

  • Production database — live on a server somewhere, containing the application data
  • Development database(s) — either locally on each developer’s computer, or hosted centrally somewhere else
  • Test database(s) — like the development databases, for use in continuous integration and running tests.

Given these various versions, one can run into issues making sure the schemas and data are consistent across the board. Imagine for a second that there’s a minor unnoticed inconsistency somewhere between the development and production databases. Code running on the development database seems to work, but might not work as expected live.

It’s thus desirable to avoid the possibility of inconsistencies, so a streamlined mechanism for synchronizing everything is rather valuable.

Working with Heroku Datastores

Thankfully, there are ways to create database dump files which contain all of the information to reproduce a database. Furthermore, Heroku offers ways to both import and export databases by using such dump files.

The problem with the import method is that it requires saving a dump file somewhere with a publicly accessible HTTP URL (i.e. it cannot be done through the Heroku dashboard, unfortunately). The docs quickly add:

For security, we recommend obscuring the filename and also removing the file once the import has completed.

But I’m not going to lie, this doesn’t sit well with me. At all. They recommend setting up an Amazon S3 account to mitigate the risks, but for a simple API this is yet another hoop to jump through and yet another thing to manage, particularly as a solo developer.

Imagine accidentally leaving a dump file with sensitive information sitting somewhere. Yea, not the most pleasant of thoughts.

Thus, I’ve justified to myself working in a rather opposite, non-intuitive manner: I make changes to the Heroku production database and export them locally.

In practice, this pipeline looks as follows:

  • Import the production database locally
  • Play around with proposed changes
  • Run the new SQL in production
  • Re-sync local databases

Given all of this, it’s important to get set up remotely from the start, before development spirals out of control. If there’s anything to be learned from both parts of this series thus far, it’s that big projects require a lot of infrastructure and thought. Diving head-first into code is great for prototyping, but it’s not ideal here and will almost certainly force you to backtrack when actually attempting to get everything working in harmony.

Setting Up Heroku Postgres

Sign into Heroku/create an account and then go to the apps dashboard https://dashboard.heroku.com/apps.

  • new > create new app

Installing the CLI

Heroku warns against installing the CLI with npm because, unlike the other installers, the npm package doesn’t auto-update.

Check the page below and use the appropriate installer for your system:

Once done, optionally add the heroku-pg-extras plugin as follows:

heroku plugins:install heroku-pg-extras

Adding Postgres

From the application dashboard on Heroku, go to the ‘resources’ tab and search for postgres. Provision it and use the free option.

Adding Heroku Postgres

There’s now a clickable link to the Datastore associated with your application.

Clickable link to the Datastore

Go ahead and open the link to your datastore, and take note of exactly how to get here again. Go to the settings tab and notice an option to reveal credentials.

The URI information to connect to your database

It’s important to know how to find these credentials, as they’ll be needed to communicate with the Heroku database locally.

Installing Postgres/pgAdmin locally

If you don’t have postgres locally, download it along with pgAdmin:

As stated on the downloads page, the installer should include pgAdmin (among various other clients). Ensure that this is the case and that these options are checked during installation.

It’s easy to verify that postgres exists by simply running one of its clients:

createdb --version

If for some reason you have postgres installed but not pgAdmin, you can add it separately:

Accessing Remote Databases Locally

Note: For the following, keep your Heroku Postgres datastore credentials open in a tab, as we’ll be using them to connect.

pgAdmin is a GUI for postgresql databases that can be used to access and modify databases that exist not only locally, but also remotely.

For a fresh install of pgAdmin, the dashboard likely contains only one server. This is your local server:

PostgreSQL 10 is the default local server on my version of pgAdmin

Expanding it reveals local databases, existing on your physical device.

ps I wrote about MyMusic in a previous CS Weekly article

However, as one might intuit, there are different credentials when connecting to a remote server. Thus we have to actually configure a new server with those credentials.

  • right click server(s) > create > server …

Fill out the following:

  • Name: This is solely for you. Name it whatever you want; I chose ‘Heroku-Run — On’
  • Under the connection tab: hostname/address. If you go back to your datastore’s ‘reveal credentials’, this is the host credential. It should look like **-**-**...amazonaws.com
  • Keep the port at 5432, unless your credentials list otherwise
  • Maintenance database — this is the database field in the credentials
  • Username — this is the user field in the credentials
  • Password — the password field in the credentials. I highly advise checking save password so that you don’t have to copypasta this every time you want to connect.
  • In the SSL tab, mark SSL mode as require
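As an aside, these same credentials work outside of pgAdmin too. A quick, entirely optional sanity check is connecting with psql, substituting the placeholder values below with the ones from ‘reveal credentials’:

psql "postgres://user:password@host:5432/dbname?sslmode=require"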

At this point, if we were to hit ‘save’ (please don’t), something very strange would happen. You’d see hundreds if not thousands of databases appear in pgAdmin. This has to do with how Heroku configures their servers. You’ll still only have access to your specific database, not those of others. In order to avoid parsing so many databases (don’t believe me? check out the image below), we have to whitelist only the databases we care about.

  • go to the Advanced tab and, under db restriction, enter the database name (it’s the same value as the Maintenance database field filled in earlier).

Failure to whitelist your database:

Preliminary Schema

At this point we have:

  • A Heroku application provisioned with Postgres
  • A connection to the production database

The next step is to start creating the first version of a schema and syncing everything with the development and test environments.

Since the point of this article isn’t database design, let’s just quickly consider what data we’re likely to want to store in such an application:

  • Users — id, name, picture, joined
  • Stories — id, title, author, story, created, edited

While it’s true that Stories could be broken out into phrases, I think it will suffice to keep things simpler for now.

Creating Users Table in pgAdmin

  • Expand databases > [your database] > schemas > public > Tables
  • Right click > create > table…
  • General tab, fill in the name users. I keep mine lowercase, otherwise the generated SQL will use quotation marks. Ensure the owner matches your user credential and the schema is public. Additionally, I like to set the tablespace to pg_default, which just indicates how things are stored on disk.

Under Columns, add

  • id: serial, not null, primary key
  • name: varchar, length 20, not null
  • picture: text
  • joined: timestamp, with a default value (set below)

Use the ‘edit’ icon to the left for the joined column, as we’ll need to specify some extra details. A menu should open up. Click the ‘definition’ tab, which can be used to specify a default value:

Giving the timestamp a default value

Re-click the edit icon to leave the menu. The column definitions look as follows:

A First Version of the User Columns

Now this next step is entirely optional, but I think it’s worth doing just to demonstrate pgAdmin functionality.

  • Go the Constraints tab and the Unique sub-tab
  • Add a unique constraint for the ‘name’ column, which means that users cannot share the same username.
  • Give the constraint a name and optionally a comment in the ‘general’ tab. This is mostly for your own housekeeping.
Username constraint
  • Finally in the Definition tab, list the appropriate column

Before clicking ‘save’ and thus creating the table, there’s one last thing to do. Check out the SQL tab to see the SQL statements that were generated from filling in all of the above. Ensure everything looks as expected, then hit save!
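For reference, the generated SQL should look roughly like the following sketch. Exact details will vary with your choices above; in particular, the constraint name and the joined column’s type (a timestamp defaulting to now()) are assumptions:

-- A sketch of what pgAdmin generates for the users table.
-- The constraint name is whatever you chose in the Constraints tab.
CREATE TABLE public.users
(
    id serial NOT NULL,
    name character varying(20) NOT NULL,
    picture text,
    joined timestamp without time zone DEFAULT now(),
    PRIMARY KEY (id),
    CONSTRAINT users_name_unique UNIQUE (name)
)
TABLESPACE pg_default;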

Creating Stories Table

Alright, we’re pros at using pgAdmin now so I’ll fast-forward a bit.

The columns should look as follows. Note the jsonb type for the story column and the integer type for the author column. Make sure the created and edited columns have the default value filled out as done previously in the users table.

The reason the story is of type jsonb will become clear in the future, but essentially we can represent a single story as a custom-defined object, rather than storing separate text/string phrases or a massive string.

Next we have to associate the author as a foreign key to the users table.

  • Constraints tab > Foreign Key sub-tab

Skip over to the columns tab. Fill out the information and, very importantly, once it’s entered, hit the little ‘plus’ sign hiding to the right.

Finally, head to the Action tab and set update/delete to cascade. For those unfamiliar with cascading, it just keeps a Story’s author reference in sync with the Users table. If an Author is deleted from the application, their stories will be cascade-deleted.

Delete any rows referencing the deleted row, or update the values of the referencing column(s) to the new values of the referenced columns, respectively.
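Putting the columns, the foreign key, and the cascade rules together, the SQL tab should show something along these lines. The column types follow the description above; the constraint name and the title length are assumptions:

-- A sketch of the generated stories table. The author column references
-- users(id) and cascades on update and delete.
CREATE TABLE public.stories
(
    id serial NOT NULL,
    title character varying(50) NOT NULL,
    author integer NOT NULL,
    story jsonb,
    created timestamp without time zone DEFAULT now(),
    edited timestamp without time zone DEFAULT now(),
    PRIMARY KEY (id),
    CONSTRAINT stories_author_fkey FOREIGN KEY (author)
        REFERENCES public.users (id)
        ON UPDATE CASCADE
        ON DELETE CASCADE
)
TABLESPACE pg_default;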

Editing Data with pgAdmin

pgAdmin not only runs SQL scripts, but also provides a visual way to edit data. Using the visual editor, quickly verify the tables work as expected.

  • Right click users > view/edit data > all rows

Fill in an id and a name, then hit save. If the joined column was properly set to use a default value, the current timestamp should automatically be inserted:

Nice, an automatically created timestamp

Next, test out the stories table. Add an id, a title, an author with the same id as the created user, and some placeholder JSON, then click ‘save’. The created and edited timestamps should be filled in automatically.

Now test deleting the user to see the cascade in action.

  • Head to the users table, click the leftmost ‘1’ in order to highlight the row, then hit the delete button at the top.
  • Head back to the stories table tab. You might still see the row there, but hitting the refresh button (the lightning symbol at the top) should now no longer show the previously created row, since the user was deleted!
the execute/refresh button
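For reference, the same round trip can be run from pgAdmin’s query tool. The values below are placeholders:

-- Insert a user and a story referencing that user, then delete the user.
-- The ON DELETE CASCADE rule should remove the story as well.
INSERT INTO users (id, name) VALUES (1, 'grey');
INSERT INTO stories (id, title, author, story)
    VALUES (1, 'First story', 1, '{"phrases": ["once upon a time"]}');
DELETE FROM users WHERE id = 1;
SELECT count(*) FROM stories; -- 0 (assuming the table was empty before)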

Adding an Update Trigger Function in pgAdmin

We have the nice edited column in the stories table, but it won’t actually change when you edit the data (verify this for yourself). Instead, we need to implement a trigger function.

  • trigger functions > right click > create > trigger function…

We’re going to create a trigger function that can be used for any table that has an edited column. Quite simply, a trigger function is some function to run on a specified event. The function is declared separately from the events that trigger it.

  • in the general tab, give it a name. I called mine edited
  • (optionally) leave a comment — I wrote “Updates edited column timestamp”
  • In the definition tab, add the following code
BEGIN
NEW.edited := NOW();
RETURN NEW;
END;
  • In the options tab, select volatile (meaning the function’s result can change even within a single scan, so it must be re-evaluated on every call)
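For reference, pgAdmin wraps the body above in a full CREATE FUNCTION statement, roughly like this sketch (using the name edited chosen above):

-- A sketch of the full trigger function statement pgAdmin generates.
CREATE FUNCTION public.edited()
    RETURNS trigger
    LANGUAGE 'plpgsql'
    VOLATILE
AS $BODY$
BEGIN
    NEW.edited := NOW();
    RETURN NEW;
END;
$BODY$;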

Creating a trigger function was simple enough, but as stated prior, events have to be associated with it. Back to the stories table:

  • expand stories > right click triggers > create > trigger…

Give it a useful name and comment

Associate it to the proper trigger function:

In the Events tab, indicate to run this function on table updates. Furthermore, specify only the title and story columns, as we don’t really care if the Author changes their name and I sure as hell hope the id and created columns aren’t changing:

We only trigger on update of title and story columns

Optionally, inspect/verify the SQL tab to see what will be generated.
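It should be along these lines (the trigger name is an assumption). Note that the trigger needs to fire before the update, since the row returned by an after trigger is ignored:

-- A sketch of the generated trigger: run edited() before any UPDATE
-- of the title or story columns.
CREATE TRIGGER stories_edited
    BEFORE UPDATE OF title, story
    ON public.stories
    FOR EACH ROW
    EXECUTE PROCEDURE public.edited();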

Finally — test things out. Play around with the story data as before, noting that the edited column should update automatically if and only if the story and/or title columns are part of the update.

NOTE: While verifying your trigger function, if you hit ‘save’ and notice the edited column didn’t change, that’s because you need to then re-fetch the data using the ‘execute/refresh’ button (lightning symbol).
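From the query tool, the behavior is easy to see with a pair of statements like these (assuming a story with id 1, authored by user 1, exists):

-- title is listed in the trigger's UPDATE OF clause: edited gets bumped.
UPDATE stories SET title = 'A new title' WHERE id = 1;
-- author is not listed in the UPDATE OF clause: edited stays untouched.
UPDATE stories SET author = 1 WHERE id = 1;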

Exporting Remote Database for Local Development

We’ve done a lot.

  • Created a remote Heroku Postgres database
  • Configured a connection to the remote database in pgAdmin
  • Added users and stories tables
  • Configured constraints (unique users.name and foreign key stories.author)
  • Added a trigger function (edited()) and a trigger event

Finally, let’s discuss how to clone this environment locally and set up ways to do this easily in the future. Then we’ll look at how to recreate the database for CI testing purposes.

There are two steps to clone a database from Heroku:

  • Create a dump file from the remote
  • Restore local database from a dump file

Additionally, we’ll look at how to download a backup of our remote database.

Creating a Dump File

One of the clients accompanying a download of postgres is pg_dump.

Running pg_dump --help, we see some useful options:

  • --data-only — dump only the data, not the schema
  • --clean — clean db objects before recreating
  • --create — include commands to create database in dump
  • --no-owner — skip restoration of object ownership
  • --schema-only — dump only the schema, no data
  • --no-privileges — do not dump privileges (grant/revoke)
  • --if-exists — add IF EXISTS when dropping objects

Furthermore, we’re given connection options:

  • --dbname, --host, --port, --username

Thus, we can actually use pg_dump with a combination of these options (and a few other handy ones) to create a local dump of a remote database.

For testing and development purposes and depending on the API, we most likely don’t want the data. The --schema-only option allows us to get the tables, constraints, trigger functions etc, but without any of the table data. Instead, development and testing seeds can be used to fill tables as necessary.

--no-owner and --no-privileges are lifesavers in that they remove the ownership and grants tied to the username created by Heroku. Instead we can use an owner of our preference when we import the schema locally, such as postgres or runon. --clean is used to remove existing objects when creating from the dump file, so we don’t have to manually destroy a development database; tables will be cleaned and re-created.

Because there’s so much going on with all of these options, I elected to create a simple script. This is entirely cosmetic/optional:

Note: I put each pg_dump option on its own line (with backslash continuations) just to improve readability.

#!/bin/bash
echo -n "dbname > "
read dbname
echo -n "host > "
read host
echo -n "port > "
read port
echo -n "username > "
read username
echo -n "create file with name > "
read filename
pg_dump \
  --dbname="$dbname" \
  --host="$host" \
  --port="$port" \
  --username="$username" \
  --file="$filename" \
  --verbose \
  --clean \
  --no-owner \
  --schema-only \
  --no-privileges \
  --if-exists
echo "done"
exit 0

Depending upon your setup, you can greatly simplify things by storing your credentials as environment variables. If, like me, you went the lazy route of inputting the credentials on each access, note that you don’t have to open your browser. There’s actually a nice way to access your credentials from the command line:

heroku pg:credentials:url --app [name]

If the above gives you any trouble, just verify that you’re logged in (heroku login) and that the app name you’re supplying matches the application you created.

Another tip is that in most bash shells that I’ve ever worked with, you can simply highlight a line and ‘right-click’ to copy. Successive right clicks at an empty prompt will paste.

Heroku Backups

There is a built-in way to not only create a backup on the Heroku server, but also download it. The reason I don’t use this method to generate a dump file, however, is that the API offers no control over the pg_dump options. Sometimes we might want to export only a specific schema or table, for example, which pg_dump supports.

However, it is quite clearly desirable to backup production data.

To create a backup, use:

heroku pg:backups:capture --app [name]

To download a backup, use:

heroku pg:backups:download --app [name] --output [filename]

I combined these into a script:

#!/bin/bash
name=latest_backup.dump
if [[ $1 == "" ]]; then
  echo "requires app name"
  exit 1
fi
if [[ $2 != "" ]]; then
  name=$2
fi
name=".data/backups/$name"
echo -e "creating heroku backup for app $1 called $name \n\n"
heroku pg:backups:capture --app "$1"
heroku pg:backups:download --app "$1" --output "$name"
echo "done"
exit 0
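Assuming the script is saved as scripts/backup (the name is my choice) and made executable, and that the .data/backups directory already exists, usage looks like:

./scripts/backup your-app-name custom-name.dump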

Restoring A Database from a Dump File

There’s a postgres client called pg_restore that does the inverse operation of pg_dump.

However, if you take a look at the dump that was created earlier from running pg_dump with those exact options, it’s actually a plain-text SQL file. Since this is the case, we use psql instead.

Again, let’s examine the command line options:

  • --file=FILENAME — the dump file
  • --echo-all — echo all SQL commands executed
  • --echo-errors — echo any errors encountered
  • --dbname, --host, --port, --username

So, in a very similar manner, let’s create a script that produces a development database using psql from a dump file.

#!/bin/bash
dbname=runon_dev
username=postgres
port=5432
host=localhost
if [[ $1 == "" ]]; then
  echo "provide path to dump file"
  exit 1
fi
echo -e "Restoring database from $1 \n\n"
psql \
  --dbname="$dbname" \
  --file="$1" \
  --echo-all \
  --echo-errors \
  --host="$host" \
  --port="$port" \
  --username="$username"
echo "done"
exit 0

Note: The first time, you need to run createdb or use pgAdmin to create the local database. Optionally, you can run dropdb and createdb before each restore; it’s entirely up to you.
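For example, to start from a clean slate before restoring (both clients ship with postgres):

dropdb --if-exists runon_dev
createdb runon_dev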

Creating Database in Travis Tests

In the previous section, I was adding dumps and backups to a directory .data at the project root. This directory is ignored by source control.

However, we’re going to want to source control specific dumps that can be used for Travis testing.

From root:

mkdir sql

Now run (from root):

./scripts/createdump

supplying a filename of sql/create-tables.sql

Then, in .travis.yml add to the existing before_script and just call this file with psql:

before_script:
- psql ...
- psql ...
- psql --dbname=$DB_NAME --file=sql/create-tables.sql

Wrap Up

Working with multiple databases both remotely and locally presents a challenge. This article demonstrated a workflow using postgres databases both remotely via Heroku and locally. A connection with pgAdmin was used to create and modify the remote database. Tables, constraints, triggers and trigger functions were all examined. Dump files and psql were used to generate local and testing databases for continuous integration.

The code for the entire project is available on GitHub:

The code specific to just this article is found in a pull request:

