Solidity CRUD- Epilogue

Published in

Rob Hitchens

9 min readMar 27, 2019

A Reusable Implementation of the popular storage pattern for Ethereum.

I first wrote about Solidity CRUD in 2017. As expected, since that time I have found a lot of use-cases that call for something like Solidity CRUD to deal with dynamic sets in one way or another.

By “dynamic sets” I refer to logical sets with members that come and go. Even though nothing is ever really deleted from Ethereum, there are plenty of cases where a contract needs a way to logically remove an item from a list.

Many nasty coding challenges boil down to focusing on getting the data structure right. This, so the function implementations flow naturally. Through my training work at B9lab I’ve found that getting a feel for storage structure is a challenge for even very experienced developers because this platform is very different from server-based storage.

Certain themes emerge with repetition:

Check existence.
Ensure key uniqueness.
Insert.
Enumerate the members.
Count.
Remove.

Solidity CRUD implements sets (tables, entities, collections if you prefer) with a three-part description:

A struct with everything but the unique identifier. This struct contains a pointer to the unordered key index. The pointer is important for the delete process.
An unordered key index with keys appended as instances (records, rows) are added. This is important for counting instances and enumerating the keys that exist in the set.
A mapping that stores the structs so random access and existence checks are possible without iteration. (We don’t like iteration. See Getting Loopy with Solidity.)

With this data structure, all the concerns mentioned above are easily handled with one-step, a.k.a. “O(1)”, complexity. You get a storage layout that looks something like this:

struct Object {
  type name;
  type name;
  type name;
  uint indexPointer;
}mapping(bytes32 => Object) objects;
bytes32[] objectList;

Unordered Sets for Referential Integrity

I wrote about using such sets in a nested way to make a set of “where used” records in Enforcing Referential Integrity in Ethereum Smart Contracts,

Suffice it to say that if a contract deals with multiple sets and they are related, things are going to get tedious. Here’s the example I gave for a pair of tables with a one-to-many relationship.

// first entity is called a "One"
    
    struct OneStruct {
        uint oneListPointer; // needed to delete a "One"
        // One has many "Many"
        bytes32[] manyKeys; 
        mapping(bytes32 => uint) manyKeyPointers;
        // more app data
    }
    
    mapping(bytes32 => OneStruct) public oneStructs;
    bytes32[] public oneList;
    
    // other entity is called a "Many"
    
    struct ManyStruct {
        uint manyListPointer; // needed to delete a "Many"
        bytes32 oneKey; // many has exactly one "One"
        // add app fields
    }
    
    mapping(bytes32 => ManyStruct) public manyStructs;
    bytes32[] public manyList;

We want Ethereum contracts to be minimalist, clean, readable and importantly, we want to reduce cognitive overhead for both developers and reviewers — simple, well-solved contracts so the workings of the contract are apparent to as many people as possible. Admittedly, nesting the pattern like that starts to push against those ideals.

The Library

Lately, I’ve been thinking about how to keep it DRY, as in “Don’t Repeat Yourself.” There is a repeating pattern here:

a set of keys, the treatment of which is completely consistent, and
an application-level set of variables that could be anything at all, and is of no consequence to the logic that deals with the keys.

We can clean this up by separating those concerns and implementing the repetitive part in a general-purpose UnorderedKeySet library.

Libraries don’t have state. They get that from the contracts that call them. When we implement the Solidity CRUD pattern using a library, our methods will be assigned to variables in the contract. If you’re unfamiliar with libraries in Solidity, have a look at the official documentation and the popular SafeMath library.

We Can Create a Set Type That Will Do Most of the Work

Wouldn’t it be great to handle Solidity CRUD sets with a few concise methods? Even better, wouldn’t it be great to know that we’re working with a well-solved module that doesn’t need customization?

Long story, short, the HitchensUnorderedKeySet rolls up all the housekeeping and integrity concerns while ignoring the application-level variables.

The library implements the Solidity CRUD pattern (I hacked out the error messages to make it little more Medium-friendly):

import "./Ownable.sol";

library HitchensUnorderedKeySetLib {
    
    struct Set {
        mapping(bytes32 => uint) keyPointers;
        bytes32[] keyList;
    }
    
    function insert(Set storage self, bytes32 key) internal {
        require(!exists(self, key));
        self.keyPointers[key] = self.keyList.push(key)-1;
    }
    
    function remove(Set storage self, bytes32 key) internal {
        require(exists(self, key)");
        bytes32 keyToMove = self.keyList[count(self)-1];
        uint rowToReplace = self.keyPointers[key];
        self.keyPointers[keyToMove] = rowToReplace;
        self.keyList[rowToReplace] = keyToMove;
        delete self.keyPointers[key];
        self.keyList.length--;
    }
    
    function count(Set storage self) internal view returns(uint) {
        return(self.keyList.length);
    }
    
    function exists(Set storage self, bytes32 key) 
        internal view 
        returns(bool) 
    {
        if(self.keyList.length == 0) return false;
        return self.keyList[self.keyPointers[key]] == key;
    }
    
    function keyAtIndex(Set storage self, uint index) 
        internal view 
        returns(bytes32) 
    {
        return self.keyList[index];
    }
}

In the simple example, the contract is just maintaining a list of keys without any additional fields:

contract HitchensUnorderedKeySet {
    
    using HitchensUnorderedKeySet... for ... KeySetLib.Set;
    HitchensUnorderedKeySetLib.Set set;
    
    event LogUpdate(address sender, string action, bytes32 key);
    
    function exists(bytes32 key) public view returns(bool) {
        return set.exists(key);
    }
    
    function insert(bytes32 key) public {
        set.insert(key);
        emit LogUpdate(msg.sender, "insert", key);
    }
    
    function remove(bytes32 key) public {
        set.remove(key);
        emit LogUpdate(msg.sender, "remove", key);
    }
    
    function count() public view returns(uint) {
        return set.count();
    }
    
    function keyAtIndex(uint index) public view returns(bytes32) {
        return set.keyList[index];
    }
    
}

To my eye, that’s a little easier to look at.

Using the Library to Hold Records

You can lay out mapped structs to hold data and it will still be easy to look at:

contract Widget {
    
    using HitchensUnordered... for ...KeySetLib.Set;
    HitchensUnorderedKeySetLib.Set widgetSet;
    
    struct WidgetStruct {
        string name;
        bool delux;
        uint price;
    }
    
    mapping(bytes32 => WidgetStruct) widgets;

Functions are intuitive enough, and since the library is overseeing key insertions and removals, most of the “checks” are done for you:

function newWidget(
  bytes32 key, 
  string memory name, 
  bool delux, 
  uint price) 
  public 
{
  widgetSet.insert(key);
  WidgetStruct storage w = widgets[key];
  w.name = name;
  w.delux = delux;
  w.price = price;
  emit LogNewWidget(msg.sender, key, name, delux, price);
}

The insert() method will fail if the key to insert already exists. As a “public service announcement”, if you’re unfamiliar with storage pointers, read this.

Simplified Expression of Relationships

The improved readability of this approach starts to shine in cases where there is a lot going on. Consider a contract with Red Teams and Blue Teams that form inside Games. Imagine each Player is a struct with properties like all-time achievement, privileges, etc. — properties that are important for contract logic.

Such a thing would have more moving parts than the simple one-to-many example, so we would expect the long-hand form of the functions to get even more busy. That’s not ideal. The library let’s us say it more concisely, which is what we want because it means less opportunity for oversight to creep in.

Consider the hypothetical structure of a Game:

HitchensUnorderedKeySet.Set gameSet;struct GameStruct {
  HitchensUnorderedKeySet.Set redTeam;
  HitchensUnorderedKeySet.Set blueTeam;
  // carry on about the game
}mapping(bytes32 => GameStruct) games;

We said a lot there.

Games are known by unique identifiers and contain two sets of players for the two teams. We can give the players the same treatment and easily enforce a rule to ensure that players need to exist before joining a Game team (if we need to). Players can be on different teams for different games at the same time.

Keeping in mind that state doesn’t need to contain data unless it’s logically important to a contract, we can consider that “Where Used” is often important. For example’s sake, let’s say we want each player to encapsulate the games participated in.

HitchensUnorderedKeySet.Set playerSet;struct PlayerStruct {
  HitchensUnorderedKeySet.Set gameSet;
  // carry on about the players
}mapping(bytes32 => PlayerStruct) players;

That was almost too easy. We now have a “Where Used” list so we can know which players must not be deleted: Two-way bindings with concise code. Nice.

By the way, since the library always uses bytes32keys (because bytes32 can hold the types we should want to use for a key) you can use helper functions to convert to/from address. The expressions are easily coded into pure functions in your contract:

function addressToBytes32(address a) public pure returns(bytes32) {
  return bytes32(uint(uint160(msg.sender)));
}function bytes32ToAddress(bytes32 b) public pure returns(address) {
  return address(uint160(uint(key)));
}

Consider using a hash function for bytes and strings if you really need to use those for keys.

Inserting Into Related Records

Let’s carry on. If we want to push a player into a team in a game and we wish to maintain the above-mentioned bi-directional pointers, it would look something like:

function joinGame(bytes32 gameId, bool team)...{
  require(playerSet.exists(msg.sender), "Register please");
  require(gameSet.exists(gameId), "Not a game");
  GameStruct storage g;
  PlayerStruct storage p;
  if(team) g.redTeam.insert(msg.sender);
  if(!team) g.blueTeam.insert(msg.sender);
  p.insert(gameId);
  emit ...
}

Did you notice the bi-directional binding? We’re populating both the game players and the player games. We also addressed an implied requirement. A player can’t join a game twice, not even on opposite teams because the last insert() would fail the uniqueness test. Pick a team, Alice!

We can easily check all sorts of things that might be important at an application level.

Are you on the red team?

require(g.redTeam.exists(msg.sender), "You're not on the red team");

How many players are on the blue team?

return g.blueTeam.count(); // how many players are there?

Who is on the blue team?

return g.blueTeam.keyAtIndex(row); // where row is a uint < count()

The higher level of abstraction means more thinking about what the application should do and less anxiety about what the contract actually does.

Delete

We can also decide what we want to do about deletes. We have some flexibility here.

In the case that a player wants to quit a team, we have to dismantle whatever relationships we assembled in the insertion process. Remove the player from the game and remove the game from the player.

Maybe something like:

function quitTeam(bytes32 gameId, bool team) public {
  GameStruct storage g = games[gameId];
  PlayerStruct storage p = players[msg.sender];
  if(team) g.redTeam.remove(msg.sender);
  if(!team) g.blueTeam.remove(msg.sender);
  p.remove(gameId);
  emit ...
}

Again, the library will revert transactions in case the input is nonsense, because it doesn’t remove() keys that don’t exist. This catches non-existent games and players who aren’t on the teams, without any help from us.

Cascade Delete

Or, we can do a cascade delete and obliterate heavy structures.

As always, it would be a bad idea to delete a master record when other records depend on it. You would want to prevent that with something like:

require(relatedSet.count() == 0, “Something depends on this.");

But, if a big structure was built up …

HitchensUnorderedKeySet.Set myBigThingSet;struct MyBigThing {
  HitchensUnorderedKeySet.Set aBigSet;
  HitchensUnorderedKeySet.Set anotherBigSet;
  HitchensUnorderedKeySet.Set yetAnotherBigSet;
  HitchensUnorderedKeySet.Set whereUsedElsewhere;
}mapping(bytes32 => MyBigThing) myBigThings;

… and you want it gone:

myBigThing storage t = myBigThings[key];
require(t.whereUsedElsewhere.count() == 0, "Relational Integrity.");
myBigThingSet.remove(key);
delete myBigStructs[key];

The remove() function ensures the key exists so we don’t have to. The first three sets could contain considerable information, and this deletes it all in one move. Although there is gas refund for removing non-zero value from storage, delete will not recurse to fully realize the potential. Have a look here for an interesting detail about garbage collection (or the lack thereof):

Is it possible to access the mapping of a deleted struct

Thanks for contributing an answer to Ethereum Stack Exchange! Please be sure to answer the question. Provide details…

ethereum.stackexchange.com

The existence of uncollected garbage doesn’t effect the nature of the logical delete, so it can be generally ignored. There may be an opportunity to be a good citizen and clean up more effectively and earn gas refunds by releasing storage.

Reset All Values in Mapping

Logical sets are useful for addressing difficult problems. For example, have a look at this question about resetting all of the values in a mapping. There is no efficient way to do that as worded, but logical sets can produce the desired effect with ease:

Reset all values in a mapping

For resetting the balance stored in mapping, you have to record the mapping entries, because you cannot use .length…

ethereum.stackexchange.com

Overwrites are Out Of Scope

Overwrites to your own mapped structs are out of scope for three reasons:

They are not about key lists, by definition.
The library would need knowledge about the layout to present an appropriate function interface and that will be different for every application.
It’s trivial to add an existence check at the application level, as shown in the updateWidget() function in the example. Remember to check exists(key). It may help to notice that if you miss that step, then your update function won’t consult the library at all. That should feel wrong.

Stay tuned for another library that will further generalize storage, including updates and application-level fields. Not the last storage solution you will ever need, but a step in that direction.

Complete Examples

The code repo contains two examples. The library itself contains a minimal contract so you can load it Remix and experiment with the raw library functions. There is also a Widget Example to show a more fleshed out implementation.

Enjoy!

Rob Hitchens is a Canadian smart contract design consultant, co-founder of Ethereum smart contract auditor Solidified.io and a courseware co-author and mentor of Ethereum, Hyperledger Fabric, Hyperledger Sawtooth Lake, Corda, Quorum and Tezos bootcamps by B9lab.