Since I always forget (database management isn’t my day-job): a short guide on how to quickly import large datasets in TXT or CSV into PostgreSQL. For smaller sets I still use ruby and FasterCSV to import the set, but nothing beats native DB imports in terms of speed. And speed doesn’t matter when importing a few megabytes of data, but it certainly matters when it gets more than that.

In this example I’ll use my current use-case, importing a large Drive-Time Matrix table, with drive times and distances between two postal codes. The head of the TXT file is formatted as such:

"Frompc4","Topc4","Time","Distance"
"1011","1011",0,0
"1011","1012",6,1737
"1011","1013",9,3378
"1011","1014",13,6056
"1011","1015",10,3198
"1011","1016",10,3112
"1011","1017",6,1706
"1011","1018",5,1791
"1011","1019",7,3146

Now let’s assume we want to import this in a table ‘DTM’ with the following columns: from_pc (integer), to_pc (integer), time (integer), distance (integer). It is recommended to use the psql command, since PgAdmin may disconnect while the copy command is running (run psql -h localhost -U username -d database).

A few ‘problems’ here, which we ‘ll tackle in the example:

  • Column-names aren’t equal
  • The CSV-file has a header row, which shouldn’t be imported
  • And maybe: Value-types aren’t equal for the postal-codes (postgres is rather strict when it comes to types)

The command we’ll be using is copy, which typically is used like copy DESTINATION from SOURCE.

Most sites tell you that you’ll be able to import a csv file using the following command:

\copy dtm from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV;

But that doesn’t solve all issues: column names aren’t equal, the header row is imported as well, and Postgres may not make any assumptions about how to convert from string to integer.

Skipping the header is simple:

\copy dtm from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV HEADER;

To map the columns, mention them in the csv-file’s order:

\copy dtm(frompc,topc,time,distance) from '/path/to/csv/DTM.csv' DELIMITERS ',' CSV HEADER;

Well, and in case you were worrying about type-conversion, for me it worked automagically :)

Gerelateerd

This article was published earlier on my very own blog:: PostgreSQL CSV import //

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.