Journey of adding a new country code to the database

Tayeb BENTERKI
Published in partoo
7 min read · May 29, 2024

Introduction 👋

Kosovo was assigned the ISO 3166–1 alpha-2 country code “XK” on July 3, 2019. This allowed Kosovo to have its own unique country code for various purposes such as international organizations, domain names, and telecommunications. Little did I know that my first journey at Partoo would be a quest to incorporate this tiny two-letter code. I confidently dove in, thinking it would be as easy as adding “XK” somewhere in the code. Oh, was I in for a surprise!

Confronting enums and foreign keys

As I worked on adding Kosovo’s country code to Partoo’s database for new businesses, I faced a maze of database issues. The main problem was the enums used in our database, which were spread across different tables like a tangled mess. These types, although useful, made it hard to keep everything organized. Especially tricky was dealing with foreign keys connected to the main country table, which used this enum. Adding Kosovo’s code turned out to be more complicated than expected, leading to big changes and lots of fine-tuning in our database structure.

Cracking the code

I searched through all available documentation, hoping to find a simple guide on how to update enums in the database. In our current codebase, we use Python enums to manage logical aspects of our app; these enums make the code more readable and maintainable by defining a set of named constants that represent various states or options. On the other hand, database enums are used as column types; they serve a similar purpose by ensuring that only a specific set of values can be stored in a column, thereby maintaining data integrity. Understanding the link between these two kinds of enums is crucial, as it impacts how we keep our application's logic consistent with the underlying database schema.

Legacy: old way to update column of type enums 🕰️

In Python, updating enums is as simple as adding a new value to the existing enum class. No sweat, right? Here’s a snippet of how it’s done:

import enum

class MyEnum(enum.Enum):
    VALUE = 'value'
    CHOICE = 'choice'
    NEW_CHOICE = 'new_choice'

Creating a migration file for SQL type

Ah, but here’s where the plot thickens! Migrating enums in SQL requires a careful dance of renaming, switching, and constraint juggling. Here’s a step-by-step breakdown of the process:

  1. Rename the old enum
  2. Create your new enum
  3. Drop any constraint (foreign key, index, check, …) related to the column you are going to update
  4. Switch the two types on the column
  5. Put back the constraint
  6. Drop the old enum

Here’s a snippet of how it looks in code:

from alembic import op
import sqlalchemy as sa
import enum

# Declare old and new enum classes, do not import them from elsewhere
class MyNewEnum(enum.Enum):
    VALUE = 'value'
    CHOICE = 'choice'
    NEW_CHOICE = 'new_choice'

class MyEnum(enum.Enum):
    VALUE = 'value'
    CHOICE = 'choice'

# Declare variables for the enum names
enum_name = 'my_enum_type'
tmp_enum_name = '_tmp_' + enum_name

# Declare old and new Enum columns
my_enum = sa.Enum(MyEnum, name=enum_name)
my_new_enum = sa.Enum(MyNewEnum, name=enum_name)

def upgrade():
    # 1. rename the enum type you want to change
    op.execute(f'ALTER TYPE {enum_name} RENAME TO {tmp_enum_name}')
    # 2. create the new type
    my_new_enum.create(op.get_bind())
    # 3. drop constraints (there can be several)
    op.execute('ALTER TABLE table_name DROP CONSTRAINT IF EXISTS constraint_name')
    # 4. alter the column to use the new enum type
    op.execute(f'ALTER TABLE table_name ALTER COLUMN column_name TYPE {enum_name} USING column_name::text::{enum_name}')
    # 5. add back the constraints
    op.execute('ALTER TABLE table_name ADD CONSTRAINT constraint_expression')
    # 6. drop the old (renamed) enum type
    op.execute(f'DROP TYPE {tmp_enum_name}')

def downgrade():
    # 1. deal with rows holding the new value, possible actions:
    #    - delete rows with the new value
    #    - switch the new value to one of the old values for rows having the new value
    #    - other actions
    # 2. rename the enum type
    op.execute(f'ALTER TYPE {enum_name} RENAME TO {tmp_enum_name}')
    # 3. re-create the old enum
    my_enum.create(op.get_bind())
    # 4. drop constraints (there can be several)
    op.execute('ALTER TABLE table_name DROP CONSTRAINT IF EXISTS constraint_name')
    # 5. switch the enum type
    op.execute(f'ALTER TABLE table_name ALTER COLUMN column_name TYPE {enum_name} USING column_name::text::{enum_name}')
    # 6. add back the constraints
    op.execute('ALTER TABLE table_name ADD CONSTRAINT constraint_expression')
    # 7. drop the tmp enum
    op.execute(f'DROP TYPE {tmp_enum_name}')

Simplifying database updates 💡

As I embarked on the task of updating Partoo’s database to incorporate Kosovo’s country code, I encountered a formidable challenge: the presence of enums scattered across various columns. This posed a significant hurdle, not just for the current migration but also for potential future updates involving different types of enums. Our primary goal is to transition from enums to varchar types with constraints in PostgreSQL; to achieve this, we’ll perform a complete database migration and establish standards for maintaining existing columns and incorporating new ones.

Understanding the challenge

To tackle this challenge head-on, I began by identifying the columns requiring migration within our PostgreSQL database. Using a SQL query, I extracted essential details such as table names, column names, and the corresponding data types.

-- Query to identify columns using enum types
SELECT table_schema, table_name, column_name, udt_name
FROM information_schema.columns
WHERE data_type = 'USER-DEFINED'
ORDER BY table_name;

The result revealed a multitude of columns spread across numerous tables, each with its own unique characteristics and usage patterns:

 table_schema |   table_name   | column_name |      udt_name
--------------+----------------+-------------+-------------------
 public       | business       | country     | country_code_enum
 ...

Next, I turned my attention to identifying the enums present in the database. Another SQL query provided insights into the enum names and their respective values.

-- Query to identify enums
SELECT t.typname AS enum_name,
       STRING_AGG(e.enumlabel, ', ') AS enum_value
FROM pg_type t
JOIN pg_enum e ON t.oid = e.enumtypid
JOIN pg_catalog.pg_namespace n ON n.oid = t.typnamespace
GROUP BY enum_name;

The result showcased the enums and their corresponding values:

     enum_name      |                     enum_value
--------------------+-----------------------------------------------------------
 country_code_enum  | AF, AL, DZ, AS, AD, AO, AI, AQ, AG, AR, ...
 ...

I devised a systematic approach to generate Alembic migrations. Using a homemade Python script, I grouped columns from the same tables and those without dependencies, generating a series of small, manageable migration files (a simplified sketch of the idea follows). This strategy not only simplified the review process but also made it easier to track any regressions or issues.

Implementing the solution

The migration files followed a standardized format, with each file containing instructions to alter table columns and add constraints to mimic enum behavior. Here’s an example of how each migration file looked:

def upgrade():
    ...
    op.execute(
        "ALTER TABLE business "
        "ALTER COLUMN country TYPE VARCHAR, "
        "ADD CONSTRAINT ck_country_code_enum CHECK (country IN ('XK', 'AF', 'AL', 'DZ', 'AS', 'AD', 'AO', 'AI', 'AQ', 'AG', 'AR', ...));"
    )
    ...

def downgrade():
    ...
    op.execute(
        "ALTER TABLE business "
        "ALTER COLUMN country TYPE country_code_enum USING country::country_code_enum, "
        "DROP CONSTRAINT ck_country_code_enum;"
    )

Once all columns were successfully migrated (after days of reviewing and merging PRs, a story for another article), the final step involved removing the obsolete enums from the database:

def upgrade():
    op.execute(
        "DROP TYPE IF EXISTS country_code_enum, ...;"
    )

def downgrade():
    pass

New standard: how to use enums in the data layer & DB?

Here is the new way to use a varchar column with a constraint instead of a native enum in the database, along with the Alembic migrations to handle the changes.

Create and add a new column

  • Table Definition:
# Assuming a standard declarative Base and Python 3.11+ for enum.StrEnum
from enum import StrEnum

from sqlalchemy import Column, Enum, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class MyEnum(StrEnum):
    VALUE_1 = "VALUE_1"
    VALUE_2 = "VALUE_2"


class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(String, primary_key=True)
    my_column = Column(
        Enum(
            MyEnum,
            create_constraint=True,
            name="ck_name_of_constraint",
            native_enum=False,
        ),
        nullable=False,
    )

Here, we define a table my_table with two columns. my_column is declared as an enum column, using Enum with MyEnum as the set of allowed values. Because native_enum=False, SQLAlchemy creates the column as a plain varchar, and create_constraint=True adds a CHECK constraint that restricts it to the enum values.

  • Alembic Migration:
def upgrade():
    op.create_table(
        "my_table",
        sa.Column("id", sa.String(), nullable=False),
        sa.Column(
            "my_column",
            sa.Enum(
                "VALUE_1",
                "VALUE_2",
                name="ck_name_of_constraint",
                native_enum=False,
                create_constraint=True,
            ),
            nullable=False,
        ),
        sa.PrimaryKeyConstraint("id"),
    )

def downgrade():
    op.drop_table("my_table")

Instead of using a native enum in the database, we opt for a varchar column with a constraint. This has advantages in terms of database compatibility and migration flexibility, as some databases do not support enums natively, and it gives us more flexibility in handling the data.

Update an existing column

If we need to update the enum values, we can do so by changing the data layer code directly; then we need to run a migration to update the constraint.

class MyEnum(StrEnum):
    VALUE_1 = "VALUE_1"
    VALUE_2 = "VALUE_2"
    VALUE_3 = "VALUE_3"  # Add the new value

Next, you’ll need to update the constraint in the database, because the allowed values are now enforced by the CHECK constraint. You can do this using the following Alembic migration script:

def upgrade():
    op.execute(
        "ALTER TABLE my_table "
        "DROP CONSTRAINT ck_name_of_constraint, "
        "ADD CONSTRAINT ck_name_of_constraint CHECK (my_column IN ('VALUE_1','VALUE_2','VALUE_3'));"
    )

def downgrade():
    op.execute(
        "ALTER TABLE my_table "
        "DROP CONSTRAINT ck_name_of_constraint, "
        "ADD CONSTRAINT ck_name_of_constraint CHECK (my_column IN ('VALUE_1','VALUE_2'));"
    )

Note: if the new enumeration value is longer than the existing values, you also need to change the size of the column, because the varchar is sized to the length of the longest value when the column is created (in our case, VALUE_33 requires a size of 8 characters).

op.execute(
    "ALTER TABLE my_table "
    "DROP CONSTRAINT ck_name_of_constraint, "
    "ALTER COLUMN my_column TYPE character varying(8), "
    "ADD CONSTRAINT ck_name_of_constraint CHECK (my_column IN ('VALUE_1','VALUE_2','VALUE_33'));"
)

You can also compute this size automatically by taking the maximum length of the values in the tuple:

enum_values = ('VALUE_1','VALUE_2','VALUE_33')
max_size = max(len(value) for value in enum_values)

Conclusion 🎉

By migrating database columns from enum types to varchar (string) types and adding constraints to emulate enum behavior, we simplified the migration process and ensured database consistency. This comprehensive approach not only addressed the immediate need to update the database but also laid the groundwork for smoother migrations in the future, keeping Partoo’s database agile and adaptable to evolving requirements.

I forgot to tell you that we encountered a few issues during the full migration of the database tables, a few “deadlock issues”… We’ll need another article to cover that part and explain the importance of generating SQL scripts as a fallback strategy for manual execution. See you in Partoo, I mean Part 2 😄
