Database Migration and Replication: On Consistency

Mind the key replication axioms

Definition of database replication consistency

Basic replication

Database consistency

Source database transactions

Database replication consistency

  1. The target database is empty
  2. The target database has some, but not all equivalent source transactions executed each exactly once in the same order starting from the first source transaction
  3. The target database has all equivalent transactions of the source database each exactly once executed in the same order starting from the first source transaction
Application of source database transactions as independent transactions to the target database

Transaction replication and execution

  1. Statements within transactions. Transactions are a series of statements like query, insert, update and delete. The order of their execution in the source database must be preserved in the target database as well since several statements can modify the same data. A different statement order within a target transaction compared to its corresponding source transactions could result in different data values, and hence violate consistency.
  2. Queries. Only the modifying statements in a transaction are relevant: insert, update, and delete (and their order). Queries (read) can be left out. This means that if source transactions contain query statements, those do not have to be executed in the target database. If the source transactions are obtained from a source database transaction log then those might not contain the queries anymore.
  • Same order of in-transaction statements
  • Optionally with the queries left out (if not already removed in the source transactions)

Key replication axioms

  • Completeness. The complete set of source database transactions starting at the first transaction must be executed as corresponding target transactions in the target database. Complete also means no transactions are left out, and that both the source and target database start from an empty state.
  • Exactly once. Each source transaction is executed exactly once as a corresponding target transaction in the target database. Exactly once means that there are no duplicate transaction executions.
  • Ordered. The sequence of target transactions is executed in the exact same order as their corresponding source transactions. This means that there is no change of the target transaction commit order in relation to the corresponding source transaction commit order.
  • Isolated. The target database must be isolated from any other possible changes except those corresponding to the source transactions. If the target database has schemas or sets of tables that are completely independent of those tables that are replicas of the source database then their concurrent modifications are possible. Those independent schemas and tables are isolated by definition.

Error semantics and behavior

  • Access of the source database system fails (might be API invocation error, or also a source database file system corruption, among other reasons)
  • Access of the target database system fails
  • The replication system itself fails
  • Recover from the failure and continue processing, ensuring that none of the replication axioms are violated. This is only possible if the replication system keeps precisely track of the replication progress to ensure that the replication axioms are upheld in all possible failure circumstances.
  • Restart from the beginning, with an empty target database and execute target transactions corresponding to the source transactions starting at the first source transaction. This is a safe option if the replication system cannot ensure that the replication axioms are upheld and not violated.

Database replication vs. database migration

  • Replicate the source database to the target database continuously following the replication axioms
  • Stop any changes to the source database
  • Wait until all source transactions are applied as target transactions in the target database following the replication axioms
  • Turn down the source database

System architecture aspects

Caveats

  • Statement replication. Some replication systems do not execute target transactions in the target database that correspond to source transactions. Instead, they insert a target transaction for each source statement (!) of a source transaction. So if a source transaction has 5 DML statements, that results in 5 transactions in the target database. This leads to continuous inconsistency since the target database is not consistent unless and only when the last of the statements of a transaction is committed.
  • No ordering. Some systems do not guarantee that the order of the target transactions corresponds to the order of the equivalent source transitions. Since a replication axiom is violated the target database cannot be guaranteed to be consistent.
  • Not exactly once or at least once. Some systems state that duplicate target transactions are possible. Since a replication axiom is violated, the target database might reject a transaction or it might lead to inconsistent data.
  • Completeness not guaranteed. Some systems do not guarantee completeness and some source transactions might not get executed in the target database in the form of an equivalent target database transaction. Since this violates the replication axioms an inconsistent target database is possible.
  • Error does not stop the system. Some systems do not stop when they encounter an error but write a log entry and continue with the next target transaction. Since completeness is violated, the target database might become inconsistent.
  • Dead letter queue. Some systems implement the replication functionality based on queuing systems. It is possible that a dead letter queue is present that contains all failed transactions. The system puts a transaction into the dead letter queue and proceeds with the next transaction. Since this violates completeness, the target database might be inconsistent.
  • Statement replication => statement collection. If it is important that the target database is always consistent in the above sense, then the replication system could collect all statements of a transaction in the correct statement order and execute the whole transaction instead of individual statements in the target database.
  • No ordering => ordering. Ordering can be established within the replication system by preserving the order the source database established. If that is not possible, the transaction order could be recreated from metadata if it is available.
  • Not exactly once => duplicate detection. A replication system in general has state and its state could establish an exactly once execution of target database transactions by keeping track of the execution progress.
  • Completeness not guaranteed => completeness verification. The source database is assumed to be complete and provides a complete sequence of source database transactions. This completeness can be preserved by the replication system by keeping track of target database transaction execution in its internal state.
  • Error does not stop the system => immediate halt on error. A non-recoverable error can immediately stop the execution of target transactions so that the replication axioms are complied with.
  • Dead letter queue => remove. A dead letter queue cannot have an entry while target transactions are being executed. As soon as a transaction fails, the system has to stop for error resolution.

Initial load

Optimization

Restart from the beginning

Summary

Acknowledgements

Disclaimer

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store