The Search For the Perfect Foot: A Tale of Database Normalization
A Tale As Old As Time (Or At Least Since 1840)
Once upon a time, the business that employed you had a beautiful glass slipper, a set of use-cases that accurately represented their business needs. The developers, hopped up on caffeine and optimism, discussed and planned meticulously for many software solutions to aid in these business needs, and after many pros-and-cons lists and some back-and-forth with their product manager, they eventually even settled on one. They found the perfect foot to fill the glass slipper.
Over time, the slipper kept cracking and expanding proportional to the needs of the business, and the once-perfect foot no longer fit the shoe. Frantic, the developers started weighing their options. Should they find another foot? Should they patch it up and pad it until it fits the shoe again? Or should they just simply convince the product manager to find a new glass slipper? If you’ve been a software developer long enough, you know that often times, it is some weird amalgamation of all three of those options — a chimera born from the same good intentions that pave the road to hell.
Just like this happens at most places, it happened to the Advertising Applications team here at GumGum, the team I work on. Our team creates a lot of the internal apps that our GumGum employees use, as well as some external-facing applications used by publishers and advertisers with whom we work. We also maintain a legacy app that used to be a monolith of sorts, doing anything from publisher setup and ad/campaign reporting, to providing engineering tools for our ad server that for all intents and purposes of this article, we shall call “Old Faithful.”
Old Faithful has been slowly pulled apart to produce a bevy of internal and external-facing apps for our various teams to use, as well as a Rest API used by said apps, and is finally on the verge of deprecation after years of tearing it apart piece by piece. The root of our problem at hand is that all of the users for this app — our “Old Faithful users” — had a very specific set of roles and permissions that applied only to this app. It used a different authentication provider than the newer apps, and the roles and permissions didn’t directly map to the behaviors our users would need in the newer apps we were creating.
On top of this, we also had an external, publisher-facing app that shared some of the same user tables as Old Faithful, but had its own route for authentication. Until recently, this was fine, as it worked and had been this way for a long time. But a recent ask was to create another external, advertiser-facing app that would need similar user roles and permissions to both our existing publisher-facing app, as well as some of the internal apps we have developed over time. Do we continue down this road, creating yet another set of tables for users and permissions for yet another app? We suddenly had a lot of feet and the glass slipper didn’t fit any of them.
We Found A Foot! (And Not In That Weird TV Crime Show Way)
Our goal then became to come up with a solution that would allow us to do two things: (1) create easily distinguishable user types, “internal” and “external” users, with the ability to tie external users to publishers or advertisers as needed, and (2) route the authentication for these users through our Advertising Authentication service, without the need for different authentication per user type. We had a new glass slipper, now we just needed a foot.
Next, we compared all of the user-based tables in our MySQL database to each other and noticed a glaring similarity — all of them had common fields like id, name, email, etc that were used in the various authentication processes. Putting all of the common data into one place and getting rid of all the duplicate data, better known as “normalizing”, was deemed the appropriate way to go after much discussion since it would suit both of the business needs described above. We found a foot!
Let’s 👏 Normalize 👏 Normalizing 👏 Database 👏 Tables
To start, it would help if we took a peek at a simplified version of our table schema prior to performing the normalization. The different colored boxes surrounding the tables represent a different authentication process for each color.
We decided to use the advertising_apps_users table to store all of the common user data, which facilitates the eventual deprecation of the columns in that table once they were are moved to user_internal_details table as shown below. We would also be able to deprecate the unused publisher_facing_app_users table once the id, name, and email are ported over to advertising_apps_users and the pub_facing_app_users.pub_facing_app_user_fields are moved to the user_external_details tables. Factoring in that we would also be deprecating Old Faithful soon, it will also allow us to deprecate the unused old_faithful_users table. What we will be left with is this:
As you can see from the colored boxes in the diagrams above, we wanted to move from multiple authentication routes to using one for all of our Advertising Applications tools. We also can clearly define our users as “internal” or “external” by simply doing an inner-join between the users and the internal or external details table, respectively. An added bonus is that the only table we need to add for the new advertiser-facing application was a small table to provide mappings from the user to the advertiser, without having to provide another table to check during the authentication process.
Moving The Duplicate User Columns AKA Migrating Our DUCs
In retrospect, the most difficult part of this whole process was actually the data migration. There was a laundry list of items we would need to take care of:
- The publisher_facing_users table data needed to be split into two tables, with the common data moved into the advertising_apps_users table. Because both tables used auto-generated ids, we had to increment the ids of publisher_facing_users by MAX(id) of the advertising_apps_users table i.e. all the new entries being inserted would increment from the next available id in the advertising_apps_users table. In the instance that the email from publishers_facing_users table already existed in the advertising_apps_users table e.g. a GumGum developer’s email used to sign in to both our internal apps and our external publisher-facing application, we would defer to the existing id in the advertising_apps_users table. This was the trickiest part because it prevented a simple increment across all of the incoming user ids.
- The uncommon publisher_facing_users table data moved into the external_apps_users_details table (mapped to the new ids from step 1).
- The publisher_user_mappings table that maps users and publishers would need to have the ids updated appropriately (mapped to the new ids from step 1).
- The uncommon data from the advertising_apps_users table needed to be moved into the internal_apps_users_details table.
To visualize what these steps do, we basically go from this:
So, picture that, except imagine it with a lot more fields per table, and about 20,000 more external users to port over. We scripted all of this work and ran it through our test database on multiple occasions, each time validating that all of our internal/external mappings were holding up appropriately. Once it ran smoothly in test and we had a chance to evaluate it, we knew we could move onto deploying these changes after ensuring coordination with other teams.
Teamwork Makes the Dreamwork
Since our API and some of the apps were referencing these tables, we had to enlist the help of everyone on our team to make the updates to the respective software. In addition, other teams, such as data engineering and our ad server team had changes to make on their end as well.
Luckily, while discussing the plan with the other teams that also use these tables, we didn’t have to stick our newfound foot in our mouths. However, it was clear that we needed a deployment strategy for these changes, since it became apparent we would need to coordinate a multi-app, multi-team release without interrupting the normal workflow of the business. To do this, we found an opportune time to set a one-hour maintenance window across our apps, allowing us to update the database schemas, migrate the data, and deploy the necessary software changes. The coordination went smoothly thanks to a dedicated Slack channel, and all of the updates/deployments were done successfully well ahead of the maintenance window end-time, so our apps were only down for a fraction of the time we originally planned for.
Time To Wrap Up This Bad Metaphor, Right?
In summary, we did the following:
- Normalized the tables that weren’t part of the deprecation of Old Faithful in hopes of provided a more centralized structure for the data needed for authentication across our various GumGum apps.
- Migrated the data from the tables and columns that went unused after the normalization process.
- Coordinated a multi-team deployment in order for these changes to take effect company-wide, ensuring that these teams/projects were all pointing to the normalized table data.
After doing the above, we now have one service providing authentication for our internal/external users across all of our Advertising Applications tools, with central place for us to also define roles/permissions (tables omitted in the diagrams). The changes went unnoticed from the users of our apps, the apps have been working smoothly since the refactor, and a we are now a few features away from deprecating Old Faithful and its related tables entirely.
For now, we have the perfect foot, and the shoe fits, but who knows how long it will take for this new glass slipper to start cracking?