DB Migrations through CSVs

Pedro Ismael Jara
Dec 6, 2018 · 9 min read

When faced with the problematic situation of migrating a DB we all start panicking because we know we have a complex task ahead of us and usually the biggest problems we face are:

Most of us get stuck even in the first step of the process, I was confronted with the task of migrating an old DB, and even though I could find tools that were capable of adapting the origin DB to the destination DB, they would not solve any of all the other problems that migrations face. After a lot of thought, it occurred to me: Why migrate through SQL? Why not use the power of my application (App) for all the data validation and all the information adaptation into the new schema?

I was much closer now, if I used the DB conversion tools I would only need to connect my App to both DBs and use my App for moving data from one end to the other, but this would require SQL knowledge of which I did not have plenty and I wanted to use the power of Rails to its fullest, there had to be a nicer and cleaner situation that would exempt me from SQL altogether and would only require me to use the knowledge of the field I’m more adept with. After several hours of trial and error, I thought “Why not skip the DBs compatibility issue and just use some universal medium?” So that’s how I ended up exporting all the tables that I needed into CSVs that could be easily parsed by my RoR application, running all my model validations through the easily adapted information to the new schema.

So now we have a universally adaptable method for any DB type to any DB type through any App we utilize:

First Steps

Before we begin any migration, the first step we should realize is to take the time to study the structure of both DBs, both origin, and destination, as to have a clear plan we can follow when we start migrating each table. We should be able to answer questions like:

With this information, the following steps will prove to be really simple, otherwise, migrating a DB with a structure we do not comprehend into another DB we are not familiar with, will be catastrophic.

Having finished a plan where we can see the data we wish to migrate and what it represents in our own structure, we can start exporting each individual table into their own respective .csv. In this case, we used csv due to the Ruby CSV libraries that allow a simple and effective processing and it’s a format very easy to export from any DB, but note that you could use any format that your App can interpret and your DB can export to. There are several ways you can achieve this, through console commands or your favorite visual tool. Here you can find some examples you can follow:

https://support.spatialkey.com/export-data-from-database-to-csv-file/

But I have some tips you should keep in mind that will help with the whole process:

2. Don’t overload your CSV with unnecessary columns or information you do not require in your migration

3. Depending on the data you are going to migrate, you should consider using special characters such as ‘ | ‘ and ‘¬’ for your column and fields separators instead of the default ‘,’ and ‘;’ so it does not cause conflicts with the information that is present in the CSV

4. Save those queries you are using to export each individual table with the required fields and the WHERE clauses you need, somewhere safe, you never know when you might need to revisit or re-use that code!

Now if we need to revisit that table, we know the columns we migrated and the filters and models we exported.

Now that we have a beautiful CSV for our App (in my case RoR) to parse, we can begin the process of moving that information to our DB through the power of magic and computers.

The Data Migration Process

Processing the CSV could be considered the hardest part of this process, but with a few tips and guidelines, you’ll find it’s really simple and easy, I’ll also be providing some examples along the way that you can copy and adapt to your own needs. In my case, I’ll be using Ruby version 2.3.4, Rails version 5.1.3, and the gem pg 0.18 so Ruby can communicate with our PSQL DB.

During the development of this part, we will use a simple example. We have a DB in SQL Server that we want to transfer to Postgres. It contains some Pokemons with a name and an id that will be carried over to the new schema that has the same structure.

$ rails g task your_task_name my_task # create lib/tasks/your_task_name.rake

And to run the code in our task, we will simply do:

$ rake your_task_name:my_task

Following this method, we make sure our DB remains clean and consistent, and the migration only completes when it’s guaranteed to finish correctly.

Remember, once you finish running this task for the first time, you’ll have to check the logs for all records that migrated unsuccessfully to run through the process again with a reduced dataset that contains only the failed records and the fixes for the errors that occurred while migrating. Depending on the consistency of your information and the number of records, this is a very valid alternative, having 10 records fail from 100.000, then fixing those particular cases and running the migration task again with the reduced data set is very simple compared to waiting hours for a task to finish, only to fail at the last moment and having to run it all over again. Although it’s highly recommended that you log the failed resources that fail in the first scenario with a complete log trace to perform the necessary fixes in your task, validations or information.

Leaving us with the following complete task:

This will make sure, all models you instantiate for that class, later on, will not try to step on the IDs you forcibly occupied with the migration.

Now we are ready to run our rake task by doing typing the namespace:task. In our case example, that would translate to:

$ rake pokemon_migrate:import

Tips:

But what about migrating files? Pictures? Buckets? Can we utilize this method for that too? Well of course! Not only that, I would highly recommend that you do. And on the next section, we will go in-depth with the explanation on how to do it and examples that you can follow and adapt to your own needs.

Migrating Files

When it comes to files, our applications could have a lot of validations, some which may not be present in the DB, and the tools required to do this kind of migration can be really hard to implement with SQL or other DB migrations software, but if the medium is our own application we can utilize all the tools provided by our own App.

Let’s see this more clearly with an example, remember our Pokemons? It looks like they have profile photos too:

Here we are using CarrierWave as uploader::

https://github.com/carrierwaveuploader/carrierwave

And we have the following table in our origin DB that links a Pokemon to its ProfilePhoto:

ProfilePhoto id : Integer pokemon_id : Integer url : String

We need to find the Pokemon associated with the profile photo and then migrate its picture to the one linked by the URL. Depending on your situation and requirements, you have two routes you can take, each with its own pros and cons:

Pros: Run all validations on the pictures you are gonna upload to your application. Run the versioning for all the sizes you might need of each photo. Use the new storage or bucket if you have one.

Cons: Slow and very high resource consumption.

2. With the URL, migrate the column to link to the original file location.

Pros: Quicker process and not very consuming when it comes to resources

Cons: The original location might have corrupted files or might not have all the versions you require for the photo. Must update all URLs for each version of the photo and must ensure there is an URL for each version. Might have to recreate the versions of the photos anyways.

Maintains the original location of the photo. Maybe you do not have admin access to the holding bucket.


By now, you’ll have learned that DB migrations don´t have to be a very complex and hard process, you don’t need to learn how to use any migration software, SQL, etc. We can manage to migrate an entire DB and it’s filed, no matter the origin and destination, by using the things we already know and have at our disposal.

I urge you to go ahead and try it for yourself, to give me your opinion and feedback and to add your expertise and knowledge so we can improve upon this.

Wolox

Wolox stands for innovation, engineering and working culture that transforms problems into solutions and ideas into products. www.wolox.com.ar

Pedro Ismael Jara

Written by

Wolox

Wolox

Wolox stands for innovation, engineering and working culture that transforms problems into solutions and ideas into products. www.wolox.com.ar