In this article we will discuss how to migrate hundreds of thousands of attachments from Paperclip to ActiveStorage without downtime.
At Sortlist, one of my first tasks after joining the team was to migrate from Paperclip to either Rails’ built-in ActiveStorage or Shrine (CarrierWave was another candidate). The reason is that we like to keep things up to date, and Rails 5.2 shipped with ActiveStorage. Paperclip had already been deprecated for some time and we wanted to move on with our lives ✈️.
We came to the conclusion that we would migrate to ActiveStorage because we don’t really need the full processing power and configuration options of Shrine. Even though (at the time of writing this article) ActiveStorage is not as mature as Shrine or CarrierWave, it has the Rails community behind it, so we were happy with that.
After reading a few articles about this on the web, the process seemed pretty straightforward. The problem was that all the tutorials we found were aimed at small Ruby on Rails applications with a limited number of attachments. In those cases, the migration is very fast, with no downtime whatsoever. Some examples: GoRails and the RailsConf 2019 video.
Since at Sortlist we have hundreds of thousands of attachments, we can’t afford to wait hours, or possibly even days, for the migration to finish while all attachment actions are unavailable 😱. We had to come up with a different approach, and this is where things become interesting. I’ll start explaining below, but first let’s refresh our knowledge of how Paperclip and ActiveStorage work.
As you may know, Paperclip attaches file data to a model by changing the schema of the model: every attachment adds its own columns to the model’s table. In addition to that, you can also write normal Rails validations for attachments.
If you need more attachment types on a specific model, then your model schema can get out of hand. 🙊
On the other hand, ActiveStorage creates 2 new database tables:
- ActiveStorageBlobs, which is the table handling the attachment data, and
- ActiveStorageAttachments, which is a polymorphic table between the blobs table and your Rails models. This means you can have as many attachment types as you want on your model, without ever changing the schema 🔝.
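To make the difference concrete, here is a small plain-Ruby sketch (no Rails required). It lists the columns Paperclip conventionally adds for each attachment and contrasts that with ActiveStorage, where the attachment name is just data in a polymorphic row. The helper names and the :avatar and :cover attachments are made up for illustration, and the row shape is simplified.

```ruby
# Suffixes of the columns Paperclip conventionally adds for each attachment.
PAPERCLIP_SUFFIXES = %w[file_name content_type file_size updated_at].freeze

# Columns a model's table needs for the given Paperclip attachments.
def paperclip_columns(*attachments)
  attachments.flat_map do |name|
    PAPERCLIP_SUFFIXES.map { |suffix| "#{name}_#{suffix}" }
  end
end

# ActiveStorage keeps the attachment name as data in a polymorphic row,
# so the model's own table gains no columns. (Simplified row shape.)
AttachmentRow = Struct.new(:name, :record_type, :record_id, :blob_id)

# Builds one join row per attachment; blob ids are dummy values here.
def active_storage_rows(record_type, record_id, attachments)
  attachments.each_with_index.map do |name, index|
    AttachmentRow.new(name.to_s, record_type, record_id, index + 1)
  end
end

columns = paperclip_columns(:avatar, :cover)
rows    = active_storage_rows("User", 42, %i[avatar cover])

puts columns.inspect          # schema columns needed with Paperclip
puts rows.map(&:name).inspect # rows needed with ActiveStorage, schema untouched
```

With Paperclip, every extra attachment type widens the model's table; with ActiveStorage it only adds rows.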
So now that we know how they work and because we didn’t want to have any unavailable attachments during the whole migration process, we decided to split the migration into 2 big steps or Pull Requests: Hidden ActiveStorage and ActiveStorage Rollout.
Hidden ActiveStorage (A)
Keep everything from Paperclip as is, but also add ActiveStorage. We are going to use both of them at the same time. This means that during the time the attachments are migrated from Paperclip to ActiveStorage, if someone decides to upload an attachment, the user would still use the working Paperclip implementation (the same seamless flow the user is used to), but in the background, we would also duplicate the new attachment into ActiveStorage by making use of Observers. (The user doesn’t need to know that 😁)
ActiveStorage Rollout (B)
After the migration finishes, remove everything related to Paperclip and only use the new ActiveStorage implementation.
This logic makes sense to me, so let’s do it! (Easier said than done 😅)
Let’s move on to some coding 🎉.
Step A.1. Install ActiveStorage
We will start by installing ActiveStorage. Rails 5.2 already comes with it, so all we need to do is run rails active_storage:install in the terminal.
This will generate the migration that creates the 2 tables mentioned above: ActiveStorageBlobs and ActiveStorageAttachments. Before migrating the tables, we edit the generated migration file to add a new column called :storage_url, of type string, to the ActiveStorageBlobs table.
The :storage_url column stores the direct URL of the attachment in the database. We will use this new column as a getter for the direct URL of the attachment.
Why? Because we can easily clone the database to any environment and still have working attachments. We don’t have to care about different storage configurations between environments: new attachments are uploaded only in the environment where they were created, but they keep working in any other environment that receives a clone of that database. (Great if you are using multiple environments in your development workflow 🥂.)
The second reason is that we use direct SQL queries for some of our pages, and this makes them easier to write, since the table contains the direct URL of the image to be used on the frontend. A query can select storage_url straight from the blobs table instead of rebuilding the URL in application code. Discussing (materialized) views is out of the scope of this article, but we may discuss it in more detail in the future if you are interested 🥳.
Step A.2. Install ActiveStorage validations
Paperclip offers validations (example above) and we wanted something similar. Out of the box, ActiveStorage does not come with validations 😕, but we found an alternative: https://github.com/igorkasyanchuk/active_storage_validations
As the gem’s README explains, if you are using the active_storage gem and you want simple validations for it, like presence or content_type, you would normally need to write a custom validation method. This gem does that for you.
Even though we will not be using the gem yet, we can install it now by adding it to our Gemfile and then running bundle install:
gem "active_storage_validations", "~> 0.6.1"
Step A.3 Configure Cloud Storage Provider & ActiveStorage
In this step we will configure our cloud storage provider. Going into details for this specific task is beyond the scope of this blog post, so to sum it up, we can just configure a new bucket and permissions on AWS S3 (we are using AWS, but the process is similar for other storage providers) and add the necessary environment variables into your Rails project to allow access to your new AWS bucket.
For example, at Sortlist, for Paperclip we were using AWS_S3_REGION etc. In addition to these, for ActiveStorage we created AWS_AS_S3_REGION etc. As you might guess, AS refers to ActiveStorage, so we know which keys are for which service 🔑.
In Rails 5.2 you should already have a config/storage.yml file. We want ActiveStorage to use the newly created AWS credentials, and this is the place to do it: edit the file so that the S3 service entry reads them from the new environment variables.
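A storage.yml along these lines would do that. This is a sketch rather than our exact file: only AWS_AS_S3_REGION is named in this article, so the other three variable names are assumptions that follow the same AS naming pattern. The service is called amazon here so that it can be referenced from the environment configuration.

```yaml
# config/storage.yml
amazon:
  service: S3
  # Variable names other than AWS_AS_S3_REGION are assumed for illustration.
  access_key_id: <%= ENV["AWS_AS_S3_ACCESS_KEY_ID"] %>
  secret_access_key: <%= ENV["AWS_AS_S3_SECRET_ACCESS_KEY"] %>
  region: <%= ENV["AWS_AS_S3_REGION"] %>
  bucket: <%= ENV["AWS_AS_S3_BUCKET"] %>
```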
We are not done yet 😒. We must tell Rails which storage service ActiveStorage should use, so we need to open the production.rb file and edit/add this line:
config.active_storage.service = :amazon
And finally we are finished with all the configurations involving ActiveStorage🤩.
Step A.4 Migration Rake Task
Most of the information/tutorials that we’ve found on the web do this directly in a Rails migration, but as discussed, this can be a long-running action, so we moved it into a rake task. A few notes on how it works:
- There are multiple ways to execute this rake task, such as background jobs (with Sidekiq, for example) or Ruby threads. At Sortlist, an unwritten internal rule we try to abide by is that we don’t want to fill Redis with these kinds of long-running tasks, so we went with the second option: Ruby threads. The work is mostly waiting on downloads and uploads, which is the kind of I/O-bound job Ruby threads handle well. Going into detail on these options is beyond the scope of this article, so for the sake of simplicity we will keep a simple version of the rake task. We can discuss it in a future article if you are interested 😉.
- Firstly, we gather all the models (with the exception of abstract classes) that are used in our application into an array.
- Secondly, we iterate over the array and check whether the schema of each model contains a column_name matching a Regex containing file_name. If it does, we save the matching columns into an array. Paperclip names this column after the attachment, in the form <attachment>_file_name.
- The next step is skipped if the model doesn’t have any columns that match the above example 👍🏼.
- We iterate over the found columns and create an ActiveStorage record only if ActiveStorage does not already contain it. That way, if we cancel the rake task or it crashes, we can restart it and it will continue from where it left off, which saves a lot of time when you have hundreds of thousands of attachments.
- The :duplicate_active_storage method that the task calls is explained in the next step.
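The steps above can be sketched in plain Ruby. This is not our actual rake task: hashes stand in for ActiveRecord models, a Set stands in for the question "does ActiveStorage already have this attachment?", and all method names are hypothetical. What it does show is the column detection by Regex and why a restarted run only picks up what is left.

```ruby
require "set"

# Paperclip names the column that stores the file name "<attachment>_file_name".
FILE_NAME_COLUMN = /\A(.+)_file_name\z/

# Returns the attachment names found in a list of column names.
def attachment_names(column_names)
  column_names.map { |column| column[FILE_NAME_COLUMN, 1] }.compact
end

# Copies every attachment that is not in `migrated` yet and returns how many
# were copied. `migrated` stands in for the ActiveStorage existence check.
def migrate(records, column_names, migrated)
  copied = 0
  attachment_names(column_names).each do |name|
    records.each do |record|
      next if record["#{name}_file_name"].nil? # nothing was ever uploaded
      key = [record["id"], name]
      next if migrated.include?(key)           # already done in an earlier run
      migrated << key                          # real task: create blob + attachment
      copied += 1
    end
  end
  copied
end

columns = %w[id name avatar_file_name avatar_content_type cover_file_name]
records = [
  { "id" => 1, "avatar_file_name" => "a.png", "cover_file_name" => nil },
  { "id" => 2, "avatar_file_name" => "b.png", "cover_file_name" => "c.jpg" }
]
migrated = Set.new

puts attachment_names(columns).inspect
puts migrate(records, columns, migrated) # first run copies everything it finds
puts migrate(records, columns, migrated) # a rerun has nothing left to copy
```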
Step A.5 ActiveSupport Concern
The duplicate_active_storage.rb concern is the one used by the previous rake task to create the ActiveStorage attachment. This concern is also included (include DuplicateActiveStorage) in the models that contain attachments (those models which are configured to use Paperclip).
- In the previous rake task, we set the instance variable @is_migration to make sure that we don’t trigger an after_commit callback after creating the ActiveStorage attachment (which would result in an infinite loop 😱).
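The guard itself is a one-line early return. The following plain-Ruby stand-in is only a sketch of the idea: FakeRecord and its method bodies are invented for illustration, and the real concern runs inside Rails callbacks rather than calling after_commit by hand.

```ruby
# Stand-in for a model that includes the concern.
class FakeRecord
  attr_accessor :is_migration
  attr_reader :callback_runs

  def initialize
    @is_migration = false
    @callback_runs = 0
  end

  # Stand-in for the concern's method: attaching a file commits the record,
  # and every commit fires the after_commit hook.
  def duplicate_active_storage
    after_commit
  end

  private

  def after_commit
    return if is_migration # migration run: the rake task is already in charge
    @callback_runs += 1    # normal run: the callback would duplicate again
  end
end

record = FakeRecord.new
record.is_migration = true
record.duplicate_active_storage
puts record.callback_runs # the callback body never ran during the migration
```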
The duplicate_active_storage method gathers the attachment columns of the models for which we want to duplicate the Paperclip attachments into ActiveStorage, and for each one it creates the ActiveStorage attachment based on some conditions:
Conditions for creation:
- If the record was updated, we check whether the Paperclip attachment itself was updated. If it was, we also update the ActiveStorage attachment. If it wasn’t, we skip all actions related to ActiveStorage. This covers the update callback.
- In case of an after_destroy callback, we check that the instance of the model was not deleted (we use the acts_as_paranoid gem for soft deletion of records, hence the try(:deleted?) check, because we are paranoid 🤥). If the record was deleted, we will also remove it from ActiveStorage.
- The rest of the logic for creating an ActiveStorage attachment is standard and very similar to the other tutorials on the web regarding this topic, but for completeness, I’ll briefly sum it up. We construct the path for the direct URL of the attachment using the key method. This is actually the direct URL of the Paperclip attachment (the one in the old S3 bucket). We then pass this direct URL on to ActiveStorage, which first downloads the file and then uploads it into the newly configured S3 bucket. The download is necessary because ActiveStorage does not know how to represent the attachment, so it needs to analyse it first. We then create the associated polymorphic ActiveStorage attachment record and we are good to go ☀️.
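As a rough illustration of the conditions and of the key construction, here is a plain-Ruby sketch. It is one reading of the rules above, not the concern itself: the function and parameter names are hypothetical, and the key layout assumes the common Paperclip pattern :class/:attachment/:id_partition/:style/:filename, which your own path option may override.

```ruby
# One reading of the conditions above: what should happen on the ActiveStorage
# side for a given callback. Parameter names are hypothetical.
def active_storage_action(event:, attachment_changed: false, soft_deleted: false)
  case event
  when :create  then :attach
  when :update  then attachment_changed ? :attach : :skip
  when :destroy then soft_deleted ? :purge : :skip
  else raise ArgumentError, "unknown event: #{event.inspect}"
  end
end

# Rebuilds the object key of a Paperclip file in the old bucket, ASSUMING the
# ":class/:attachment/:id_partition/:style/:filename" layout. The id is
# zero-padded to nine digits and split into three directories.
def paperclip_key(class_dir:, attachment_dir:, id:, file_name:, style: "original")
  id_partition = format("%09d", id).scan(/\d{3}/).join("/")
  [class_dir, attachment_dir, id_partition, style, file_name].join("/")
end

puts active_storage_action(event: :update, attachment_changed: true)
puts active_storage_action(event: :update)
puts paperclip_key(class_dir: "users", attachment_dir: "avatars", id: 42, file_name: "me.png")
```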
Step A.6 Update the storage_url of the ActiveStorage blob
After the attachment has been attached, we have to figure out what to do with the custom column that we created earlier (storage_url). The most intuitive way to update that column is to create a callback on the ActiveStorage::Blob model. We tried that first, but nothing seemed to happen. After some research into the problem, we realised it was not easily achievable and we would have to monkey-patch 🐒 ActiveStorage (source).
Monkey-patching is generally a bad idea and is recommended only if you really know what you are doing. ActiveStorage is quite new and we are expecting a lot of improvements in the near future, so we decided against it.
But there is some good news ahead 😊. We were already making use of the Observer pattern quite a lot in our app, so we gave it a try and we have a winner! 🍾 An observer now updates storage_url after an attachment has been attached.
The observer pattern is implemented in our project by making use of the Rails Observer gem. More information can be found in the associated links.
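The idea can be shown without Rails. The sketch below uses a hand-rolled observer instead of the Rails Observer gem, and every name in it (Blob, StorageUrlObserver, Attacher, after_attach, the bucket and region values) is hypothetical. It also assumes the direct URL is a virtual-hosted-style S3 URL built from the blob key, which may not match how your bucket is exposed.

```ruby
# Minimal stand-in for a blob record with the custom storage_url column.
Blob = Struct.new(:key, :storage_url)

# Observer that fills in storage_url once a file has been attached.
class StorageUrlObserver
  def initialize(bucket:, region:)
    @bucket = bucket
    @region = region
  end

  # Writes the direct URL onto the blob (assumed virtual-hosted-style S3 URL).
  def after_attach(blob)
    blob.storage_url = "https://#{@bucket}.s3.#{@region}.amazonaws.com/#{blob.key}"
  end
end

# Stand-in for the attach step; it notifies observers when it is done.
class Attacher
  def initialize(observers)
    @observers = observers
  end

  def attach(blob)
    # The real upload would happen here.
    @observers.each { |observer| observer.after_attach(blob) }
    blob
  end
end

observer = StorageUrlObserver.new(bucket: "example-bucket", region: "eu-west-1")
blob = Attacher.new([observer]).attach(Blob.new("abc123", nil))
puts blob.storage_url
```

Keeping this logic in an observer leaves ActiveStorage itself unpatched, which is the property we were after.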
Hidden ActiveStorage (A) conclusion:
With the above code in place, thanks to the concern that we’ve created, no matter which action is taken (create/update/destroy) through Paperclip, it will also be handled by ActiveStorage in the background. In addition, the ActiveSupport::Concern is used by the rake task to migrate the old attachments (in our case it took about 2–3 days, including some minor fixes we had to make along the way). So basically we achieved DRY code and no downtime (our initial goal for this step of the process ⚽️).
Let’s move on to ActiveStorage Rollout (B) 🎊.
Step B.1 Remove Paperclip
Now that the data is ready, thanks to the first step, we can completely remove Paperclip and rollout to ActiveStorage as the main and only way of storing attachments. This step is actually quite short and straightforward. The steps that we need to take here are:
- Remove the Paperclip gem from the Gemfile and run bundle install.
- Remove any Paperclip configurations from development.rb or other environments you may have.
- Change any references of has_attached_file in your models to the ActiveStorage equivalent (e.g. has_one_attached).
- Make sure you update the model validations to use the new versions from the active_storage_validations gem we installed earlier.
- Create a migration to drop all the Paperclip columns from your models (for each attachment: the _file_name, _content_type, _file_size and _updated_at columns).
- At this point we can also remove the rake task, the duplicate_active_storage.rb file and the include DuplicateActiveStorage line from the models in which it was being used.
And this will conclude our journey regarding the migration from Paperclip to ActiveStorage. 🥳
Hopefully this made sense to you, but feel free to leave any comments if you think that some points may need further explanations or if anything is unclear.