Getting started with Liquibase

Jean-Louis Della Giustina
Pictet Technologies Blog
7 min read · Nov 16, 2023

3 good practices to empower your application database lifecycle

Introduction

Having used Liquibase for 10 years on various projects deployed with CI/CD pipelines, I’ve had the opportunity to observe different applications and weigh the pros and cons of various implementation choices. When faced with challenges during app deployment, debugging, and understanding why certain changes aren’t taking effect, you start to grasp how to avoid such issues through thoughtful design.

I will present some good practices to help you start using Liquibase without second-guessing yourself.

Contexts

Questions occasionally arise about which contexts to define — or whether to define any at all. The answer lies in a careful look at what actually differs between your environments. For instance, if the same database schema underlies all environments and the main difference is the data (mock data for tests and development, or tables generated to stand in for views that aren't available), I advocate a clear separation between the test environment and development, integration, or production. This strategy is closely tied to my application's configuration profiles.

Ideally, I align context names with my application's configuration profiles, although this isn't always feasible. It is common practice to use a lightweight embedded database such as H2 for tests and a database closer to the production environment — like Oracle or Postgres — for development.

In such scenarios, it proves valuable to define contexts based on both the environment and the database type (e.g., H2 or Oracle). When creating a changeset or changelog, you have the option to specify the context in which it will execute. Liquibase also lets you skip a script when a specific context is enabled, and it supports boolean operations on contexts (AND, OR, NOT). I strongly advise avoiding context negations whenever possible; if you follow the guidance above, you should rarely need them.

It’s important to note that context negations introduce unnecessary complexity and noise, making it challenging to discern which scripts will (or will not) run. Remember, the human brain more easily comprehends affirmations than negations!
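As a minimal sketch of a context-restricted changeset (the file, table, and author names here are hypothetical), an AND expression ensures the mock data only loads when both the "dev" and "h2" contexts are enabled:

```xml
<databaseChangeLog
    xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
        http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-latest.xsd">

    <!-- Runs only when both the "dev" and "h2" contexts are enabled -->
    <changeSet id="insert-mock-users" author="jdoe" context="dev and h2">
        <insert tableName="users">
            <column name="id" valueNumeric="1"/>
            <column name="login" value="mock.user"/>
        </insert>
    </changeSet>
</databaseChangeLog>
```

Contexts are then enabled at run time, for example with `liquibase update --contexts="dev,h2"`; a changeset whose context expression doesn't match is simply skipped.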

Changesets

The structure of your Liquibase script files is crucial. I won’t delve into what a changeset is here — consider it an operation on your database. Always aim for the right granularity in your changesets.

Avoid excessive granularity by consolidating meaningful instructions within a single changeset rather than implementing one instruction per changeset. For instance, tasks such as table creation, constraint addition, index implementation, and permission granting can be encompassed within a unified changeset. This pragmatic approach facilitates swift identification of a problematic SQL script in the event of an error and streamlines the rollback process.
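To illustrate such a consolidated changeset (table, index, and role names are made up for the example), table creation, constraints, an index, and a grant can live together as one logical unit:

```xml
<!-- Inside a <databaseChangeLog> root element -->
<changeSet id="create-customer-table" author="jdoe">
    <createTable tableName="customer">
        <column name="id" type="bigint" autoIncrement="true">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="email" type="varchar(255)">
            <constraints nullable="false" unique="true"/>
        </column>
    </createTable>
    <createIndex tableName="customer" indexName="idx_customer_email">
        <column name="email"/>
    </createIndex>
    <!-- Grants have no dedicated change type, so raw SQL is used -->
    <sql>GRANT SELECT ON customer TO app_reader</sql>
</changeSet>
```

If any of these statements fails, the error points at one clearly named changeset, and the whole "create the customer table" operation can be rolled back as a unit.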

Did you know that in Postgres you can roll back changes to the database structure? DDL statements are transactional there, so Liquibase paired with Postgres makes handling these changes a breeze, turning it into a powerful tool.

Conversely, overly large changesets make it harder to decipher which instruction failed. Picture revisiting code you wrote six months ago — clarity in this context becomes invaluable.

It’s imperative that each changeset is uniquely identified and remains unaltered after execution. Liquibase records each executed changeset in its history table along with a checksum of its content, and any modification will prompt a validation error on the next run. Think of it as the immutable nature of history — you cannot alter the past or rewrite it. The only viable operation is deletion, a practice I would caution against unless specific circumstances warrant it, details of which I won’t delve into here.

In which language should I write my statements? When dealing with databases, writing plain SQL is common. However, subtle differences exist between Microsoft SQL Server, MySQL, Oracle, and Postgres. If you write plain SQL targeting database X and switch to Y months later, you may encounter incompatibilities.

This issue can be avoided by describing operations in another language like YAML or XML. With this abstraction, Liquibase translates the description into the appropriate SQL statement for the targeted database.

Is this a real issue? It can be, if, for example, you use different databases in your test and dev environments. But I confess that I’ve never switched from one database to another while in production. We were more in a “throw away” process, starting both app and database from scratch. Or we migrated from X to Y with a migration tool, not Liquibase.

However, this is also tied to how we use Liquibase. I’ve used Liquibase as a database evolution tracking tool rather than a migration tool, although there were exceptions…

An interesting feature is available when you describe your operations in XML: autocompletion. It depends on your IDE — I personally use IntelliJ IDEA because it lets me code in different languages with a single tool — but you’ll get autocompletion and suggestions thanks to the XSD schema that describes the syntax. Autocompletion is also possible with YAML, but it requires a plugin and only works with a specific (default) file hierarchy.

See the examples: a sample SQL script, and a sample YAML file for the same DB operation.
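To make the comparison concrete (the table and column names are invented for the illustration), here is a plain SQL script using a Postgres-specific type:

```sql
-- Postgres-specific: BIGSERIAL does not exist in Oracle or SQL Server
CREATE TABLE customer (
    id    BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);
```

And the equivalent operation described in a database-agnostic YAML changelog, which Liquibase translates into the right dialect for the target database:

```yaml
databaseChangeLog:
  - changeSet:
      id: create-customer-table
      author: jdoe
      changes:
        - createTable:
            tableName: customer
            columns:
              - column:
                  name: id
                  type: bigint
                  autoIncrement: true
                  constraints:
                    primaryKey: true
                    nullable: false
              - column:
                  name: email
                  type: varchar(255)
                  constraints:
                    nullable: false
```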

How should I structure the file hierarchy?

Folders and changelogs

The way I prefer to structure the hierarchy of scripts is directly linked to how I release an application and manage version control of my source code. Like many companies, we build applications using semantic versioning.

This means the app has an X.Y.Z version, and we deliver X (major) releases less frequently than Y (minor) ones. Every time you release your application (X or Y) — perhaps by pushing the code to a dedicated branch or tagging the source at that version — you should also consider your database.

The best practice I can share is to avoid having overly large folders or changelog files. In my experience, the best granularity is to have one folder per major version of the application. One parent changelog will include all child changelogs, with each child changelog containing script inclusions for a specific application version.
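A layout following this idea might look as follows (all file and folder names are illustrative):

```
db/changelog/
├── db.changelog-master.xml      <- includes each version changelog
├── v1/
│   ├── db.changelog-v1.xml      <- includes the v1 scripts
│   ├── 001-create-tables.xml
│   └── 002-grant-rights.xml
└── v2/
    └── db.changelog-v2.xml
```

The parent changelog then only includes the child changelogs, one per major version:

```xml
<!-- db.changelog-master.xml, inside a <databaseChangeLog> root element -->
<include file="v1/db.changelog-v1.xml" relativeToChangelogFile="true"/>
<include file="v2/db.changelog-v2.xml" relativeToChangelogFile="true"/>
```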

It then becomes easier to deploy the application source code at a specific version together with the matching database. For example, if you encounter a bug in a specific environment, it’s easy to spin up a testing environment with matching source code and database to reproduce and fix the issue.

This approach is also beneficial when you have multiple integration environments and you’re working on different versions of the applications (and the DB). It’s common, for instance, to have different environments — one synchronized with the current release in production for bug fixing, another that represents your next release used for integration, FAT, UAT, etc.

Intent split

It’s interesting to identify the types of operations you’re performing on the database. Am I pushing some data? Am I altering the structure of a table? Am I working on a stored procedure or managing rights? Which operations are applicable in every environment, and which ones are environment-specific?

I can easily say that the structure is quite the same for all environments, the data can be the same (enums, ref labels) or totally different (mock data for test or dev environment, not in production), and the rights granted can be different too. For example, I want to allow deletion in integration but not in production, where only archiving/soft deletion is possible.

What I’m trying to convey here is the importance of splitting DDL, DCL, and DML operations into different files. It makes it easier to choose which one to run in specific environments. Most of the time, I merge DDL and DCL since they are strongly linked and less environment-specific. As a habit, I place all DML in a specific file. It’s a matter of preference, and some may choose to split DML and DDL/DCL changesets into two different folders. Personally, I don’t do this as I prefer having both files side by side. A good tip is also to always use relative file paths, something developers are accustomed to, but it’s still a good reminder!
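A minimal sketch of a version changelog including DDL/DCL and DML files side by side (file names and context expressions are hypothetical; note the relative paths):

```xml
<!-- v2/db.changelog-v2.xml, inside a <databaseChangeLog> root element -->

<!-- Structure and rights: applies to every environment -->
<include file="ddl-orders.xml" relativeToChangelogFile="true"/>

<!-- Data: restricted to the environments that need it -->
<include file="dml-orders.xml" relativeToChangelogFile="true"
         context="dev or integration"/>
```

The `context` attribute on `<include>` restricts every changeset in the included file at once, so the DML file never has to repeat the context on each changeset.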

An exception to the rule

You might be wondering about the purpose of the “runAlways” attribute in a changeset and when to use it. A changeset with “runAlways” set to true will run every time Liquibase is executed (if certain conditions are met, obviously, such as contexts, etc.). I can share a practical use case and maybe an exception to the previously explained file structure.

It involves views, applicable also to procedures and functions.

Staying with the view use case, I prefer keeping views in a dedicated file. Why? Because otherwise, when a development requires a change to a view, you create a new file and a new changeset that recreates the entire view with the modification — it’s not a cumulative update. What if another developer changes the same view in parallel? They create another file that also recreates the view. When the code is merged, no conflict appears, and one set of changes is silently lost as the later script overwrites the earlier one.

You now understand why, when dealing with views, I suggest keeping them in a dedicated file and always making changes to that file. This way, conflicts emerge sooner, and a discussion between developers occurs on how to apply both changes. While Liquibase documentation suggests a similar approach with the “runOnChange” attribute, I still continue to use “runAlways.”

Continuing with the view use case, the reason is that a view is built on a table. If the table changes (implying a view change) but the view script is not updated — perhaps due to an omission — the view script would not run, and we wouldn’t see any error before runtime. Re-running the view changesets on every update secures that part of the application, and to ensure the tables exist first, I run the view changesets last.
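A sketch of such a view changeset (view and table names are invented). Note that a `runAlways` changeset whose body is edited will trigger a checksum validation error unless `runOnChange="true"` is also set, which is why the two attributes are combined here:

```xml
<!-- views.xml: re-applied at the end of every Liquibase update -->
<changeSet id="recreate-active-customer-view" author="jdoe"
           runAlways="true" runOnChange="true">
    <createView viewName="v_active_customer" replaceIfExists="true">
        SELECT id, email
        FROM customer
        WHERE deleted_at IS NULL
    </createView>
</changeSet>
```

`replaceIfExists="true"` makes the operation idempotent, so re-running it on every update is harmless.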
