Enforcing Referential Integrity in Ethereum Smart Contracts.

One-To-Many Joins in Solidity

Code repository: https://bitbucket.org/rhitchens2/soliditystoragepatterns

Photo by fabio on Unsplash

SolidityCRUD lays out a way to organize tabular data in a Solidity Smart Contract. The pattern supports Create, Retrieve, Update and Delete (CRUD) with some features we might take for granted until we first attempt it with Solidity:

  • Count the keys and check if a key exists or not.
  • Enumerate the keys and iterate over them.
  • Fetch, update, or delete a key efficiently.
  • Ensure the keys are unique.
  • All operations are gas-efficient at all scales.

The essence of the technique is a data structure that includes structs for holding data about the object, a mapping for random access by key, and an array containing an unordered list of keys.

Mapped Structs with Pointers to an unordered list of Keys.

In Solidity …

struct AlphabetStruct {
// fields about the alphabet letters
uint index; // points to the row in the index containing the key
}
mapping(bytes32 => AlphabetStruct) public alphabetStructs;
bytes32[] public alphabetIndex;

If you’re not familiar with this structure, have a look at SolidityCRUD, especially part 2.

That covers a lot of use-cases, but what about relationships? It’s commonplace to require strictly-enforced relationships between object instances. Possibly the most common relationship is one-to-many.

  • Customers have many sales. A sale has exactly one customer.
  • Owners can have many cars. A car has exactly one owner.
  • Users can own many projects. A project has exactly owner.

If the data can be said to be “correct” and “valid” without regard to referential integrity, then you don’t need it. If you do need referential integrity then the contract must enforce it. Let’s have a look at how it can be accomplished.

“One” and “Many”

I’m going to use the terms “One” and “Many” to refer to two different types of entity (class). The “Ones” have zero to lots of “Many” and the “Many” have exactly one “One”. If that’s too abstract, think “Customer” (One) and “Invoice” (Many).

struct One {
// fields about this class
uint oneListPointer; // points to a row on the oneList
}
// random access
mapping(bytes32 => One) oneStructs; // key could be address or uint
// sequential access
bytes32[] oneList;
struct Many {
// fields about this class
uint manyListPointer; // points to a row on the manyList
}
mapping(bytes32 => Many) manyStructs; // these are for access by key
bytes32[] manyList;

Refer to the One in the Many

If you’ve ever made a simple database, you’ll know that the key pointing to the “One” goes in the “Many”.

struct Many {
// fields about this class
uint manyListPointer; // points to a row on the manyList
bytes32 oneKey; <-- here.
}

Keep a “Many” list in the “One”

For this to work, we’ll need a list of all the Many instances related to each One instance. We’re going to treat that in a manner that’s very similar to our general treatment of tables. It’s a table with one column (the Many keys) that goes inside each One instance.

struct One {
// fields
uint oneListPointer; // supports the deleteOne() function

// build a table of Many keys with a delete capability
bytes32[] manyKeys;
mapping(bytes32 => uint) manyKeyPointers;
}

Each “One” will have a list of the Many keys that refer to it.

Get all the Many instances that point to this One

The array means we can enable something very similar to:

SELECT manyKeys FROM One WHERE oneId = ...

We’ll need a function to find out how many Many keys point to a given One.

function getOneManyCount(bytes32 oneId) 
public
constant
returns(uint manyCount)
{
if(!isOne(oneId)) throw;
return oneStructs[oneId].manyKeys.length;
}

We’ll need a function to iterate over the Many keys that point to a given One.

function getOneManyAtIndex(bytes32 oneId, uint row) 
public
constant
returns(bytes32 manyKey)
{
if(!isOne(oneId)) throw;
return oneStructs[oneId].manyKeys[row];
}

The mapping supports Delete

If we ever delete one of the Many records, we’ll break referential integrity unless we remove its key from the One it was pointing to. Remember, the Ones keep track of that. We therefore need a way to remove a Many key from a One’s manyKeys array.

As we saw when deleting a record from a table (we’re treating this as a table), we can delete from the unordered list of keys efficiently when we know exactly what row the key is on. That’s what this mapping gives us.

mapping(bytes32 => uint) manyKeyPointers;

This could be a mapping to a struct if there was more than one field involved. In this case, we need the row number and nothing else, so a uint will suffice. The important thing is we can locate a key to delete in the unordered list without difficulty.

The secret to deleting a key from an unordered list is to move the last item into the row to delete.

Efficient delete from an unordered list.

SolidityCRUD is all about how that works and why it’s good.

Enforce Referential Integrity

It’s vital that referential integrity is never broken. For example:

  1. Not allowed to delete a One if any Many refer(s) to it.
  2. Not allowed to insert a Many if the One doesn’t exist. In some cases (Zero-to-Many), it will be okay to have no One (null) but it’s seldom acceptable to refer to oneID that doesn’t actually exist.
  3. If a Many is updated to a different One or if a Many is deleted then the Many key needs to be removed from the manyKeys list in the One that is no longer referred to.

Insert a Many

We insert an instance in the usual way. A Many must have a One, so the oneId is the minimum extra information we need to insert a new Many.

We also update the One record. The One records keep a list of all the Many records that point to them which includes the new Many instance.

function createMany(bytes32 manyId, bytes32 oneId) onlyOwner returns(bool success) {
if(!isOne(oneId)) throw;
if(isMany(manyId)) throw;
manyStructs[manyId].manyListPointer = manyList.push(manyId)-1;
manyStructs[manyId].oneKey = oneId;

oneStructs[oneId].manyKeyPointers[manyId] =
oneStructs[oneId].manyKeys.push(manyId) - 1;
LogNewMany(msg.sender, manyId, oneId);
return true;
}

Let’s step through it.

  1. Can’t do it unless the Many refers to a One that actually exists.
if(!isOne(oneId)) throw;

2. Can’t do it if the manyId already exists.

if(isMany(manyId)) throw;

3. Push the manyId key on to the keys list and set the internal pointer to support possible delete operations.

manyStructs[manyId].manyListPointer = manyList.push(manyId)-1;

4. The Many must have exactly one One. We know it exists because we checked.

manyStructs[manyId].oneKey = oneId;

Now we look after the pointers in the One. It’s quite similar to the usual process but it all takes place inside a One instance. It maintains a list of Many instances that point at a specific One instance.

5. Push the manyId on to the list and keep track of the row it landed on.

oneStructs[oneId].manyKeyPointers[manyId] = 
oneStructs[oneId].manyKeys.push(manyId) - 1;

6. Success! Log the state change and return true.

Delete a Many

We’ll delete the record in the usual way, removing it from the unordered list of Many keys. We’ll also track down the One record that’s affected by this. We’ll remove the Many key from the list of Many keys that refer to that One instance.

In both cases, we use the familiar technique of moving the last item in the unordered list into the row where the key to delete resides, then shortening the list. Since unrelated keys will have moved (whatever happened to be last), we’ll also need to update the relevant pointers to maintain referential integrity.

function deleteMany(bytes32 manyId) onlyOwner returns(bool success) {
if(!isMany(manyId)) throw; // non-existant key

// delete from the Many table
uint rowToDelete = manyStructs[manyId].manyListPointer;
bytes32 keyToMove = manyList[manyList.length-1];
manyList[rowToDelete] = keyToMove;
manyStructs[manyId].manyListPointer = rowToDelete;
manyList.length--;

// we ALSO have to delete this key from the list in the
// ONE that was joined to this Many
bytes32 oneId = manyStructs[manyId].oneKey;
rowToDelete = oneStructs[oneId].manyKeyPointers[manyId];
keyToMove = oneStructs[oneId].manyKeys[oneStructs[oneId].manyKeys.length-1];
oneStructs[oneId].manyKeys[rowToDelete] = keyToMove;
oneStructs[oneId].manyKeyPointers[keyToMove] = rowToDelete;
oneStructs[oneId].manyKeys.length--;
LogManyDeleted(msg.sender, manyId);
return true;
}

Example Implementation

contract Owned {
address public owner;
modifier onlyOwner {if(msg.sender != owner) throw;_;}
function Owned() {owner=msg.sender;}
}
contract OneToMany is Owned {

// first entity is called a "One"

struct OneStruct {
uint oneListPointer; // needed to delete a "One"
// One has many "Many"
bytes32[] manyKeys;
mapping(bytes32 => uint) manyKeyPointers;
// more app data
}

mapping(bytes32 => OneStruct) public oneStructs;
bytes32[] public oneList;

// other entity is called a "Many"

struct ManyStruct {
uint manyListPointer; // needed to delete a "Many"
bytes32 oneKey; // many has exactly one "One"
// add app fields
}

mapping(bytes32 => ManyStruct) public manyStructs;
bytes32[] public manyList;

event LogNewOne(address sender, bytes32 oneId);
event LogNewMany(address sender, bytes32 manyId, bytes32 oneId);
event LogOneDeleted(address sender, bytes32 oneId);
event LogManyDeleted(address sender, bytes32 manyId);

function getOneCount()
public
constant
returns(uint oneCount) {return oneList.length;}

function getManyCount()
public
constant
returns(uint manyCount){return manyList.length;}

function isOne(bytes32 oneId)
public
constant
returns(bool isIndeed)
{
if(oneList.length==0) return false;
return oneList[oneStructs[oneId].oneListPointer]==oneId;
}

function isMany(bytes32 manyId)
public
constant
returns(bool isIndeed)
{
if(manyList.length==0) return false;
return manyList[manyStructs[manyId].manyListPointer]==manyId;
}

// Iterate over a One's Many keys

function getOneManyCount(bytes32 oneId)
public
constant
returns(uint manyCount)
{
if(!isOne(oneId)) throw;
return oneStructs[oneId].manyKeys.length;
}

function getOneManyAtIndex(bytes32 oneId, uint row)
public
constant
returns(bytes32 manyKey)
{
if(!isOne(oneId)) throw;
return oneStructs[oneId].manyKeys[row];
}

function createOne(bytes32 oneId)
onlyOwner
returns(bool success)
{
if(isOne(oneId)) throw; // duplicate key prohibited
oneStructs[oneId].oneListPointer = oneList.push(oneId)-1;
LogNewOne(msg.sender, oneId);
return true;
}

function createMany(bytes32 manyId, bytes32 oneId)
onlyOwner
returns(bool success)
{
if(!isOne(oneId)) throw;
if(isMany(manyId)) throw; // duplicate key prohibited
manyStructs[manyId].manyListPointer = manyList.push(manyId)-1;
manyStructs[manyId].oneKey = oneId;

// We also maintain a list of "Many" that refer to the "One", so ...
oneStructs[oneId].manyKeyPointers[manyId] =
oneStructs[oneId].manyKeys.push(manyId) - 1;
LogNewMany(msg.sender, manyId, oneId);
return true;
}

function deleteOne(bytes32 oneId)
onlyOwner
returns(bool succes)
{
if(!isOne(oneId)) throw;
// the following would break referential integrity
if(oneStructs[oneId].manyKeys.length>0) throw;
uint rowToDelete = oneStructs[oneId].oneListPointer;
bytes32 keyToMove = oneList[oneList.length-1];
oneList[rowToDelete] = keyToMove;
oneStructs[keyToMove].oneListPointer = rowToDelete;
oneList.length--;
LogOneDeleted(msg.sender, oneId);
return true;
}

function deleteMany(bytes32 manyId)
onlyOwner
returns(bool success)
{
if(!isMany(manyId)) throw; // non-existant key

// delete from the Many table
uint rowToDelete = manyStructs[manyId].manyListPointer;
bytes32 keyToMove = manyList[manyList.length-1];
manyList[rowToDelete] = keyToMove;
manyStructs[manyId].manyListPointer = rowToDelete;
manyList.length--;

// we ALSO have to delete this key from the list in the ONE
bytes32 oneId = manyStructs[manyId].oneKey;
rowToDelete = oneStructs[oneId].manyKeyPointers[manyId];
keyToMove =
oneStructs[oneId].manyKeys[oneStructs[oneId].manyKeys.length-1];
oneStructs[oneId].manyKeys[rowToDelete] = keyToMove;
oneStructs[oneId].manyKeyPointers[keyToMove] = rowToDelete;
oneStructs[oneId].manyKeys.length--;
LogManyDeleted(msg.sender, manyId);
return true;
}

}

Many-to-Many

A Many-to-Many is often implemented with a third table and two one-to-many relationships.

Consider an arrangement with two entities; Voters and Issues.

  • Each Voter can cast a ballot on many Issues.
  • Each Issue has (obviously) many Voters.
  • Many-to-Many can be implemented as third type of entity, e.g. Vote.

You might arrive at something along these lines as a starting point:

Many-to-Many using two One-to-Many

The goal should always be to make it as simple as possible, but no simpler. If you need referential integrity, then the contract must enforce it.

This pattern addresses a worst-case scenario in which referential integrity must be enforced and delete is required for any/all entities in the data structure.

Importantly, all update operations use no iteration and minimal conditional logic. Gas prices for insert, update and delete will be predictable, efficient and consistent at any scale.

Rob Hitchens is a Canadian smart contract design consultant, co-founder of Ethereum smart contract auditor Solidified.io and a courseware co-author and mentor of Ethereum, Hyperledger Fabric, Hyperledger Sawtooth Lake, Corda, Quorum and Tezos bootcamps by B9lab.