Efficient Ethereum Smart Contract Storage


Reducing the cost to execute smart contracts is an important consideration when developing on the Ethereum network. Not only do low execution costs help reduce the load on the network (ahem, CryptoKitties), but they also encourage users to interact more freely with a smart contract. The cheaper it is to transfer tokens, set state, or otherwise modify contract data, the better the user experience will be when using a decentralized app built on the Ethereum blockchain.

However, it’s not always obvious how much of a contract’s execution cost is spent on storage. The cost to call a contract method depends on the gas consumed by the underlying code, and gas consumption is determined by which opcodes the Ethereum Virtual Machine (EVM) executes.

Operation           Gas            Description
ADD/SUB             3              Arithmetic operation
MUL/DIV             5              Arithmetic operation
ADDMOD/MULMOD       8              Arithmetic operation
AND/OR/XOR          3              Bitwise logic operation
LT/GT/SLT/SGT/EQ    3              Comparison operation
POP                 2              Stack operation
PUSH/DUP/SWAP       3              Stack operation
MLOAD/MSTORE        3              Memory operation
JUMP                8              Unconditional jump
JUMPI               10             Conditional jump
SLOAD               200            Storage operation
SSTORE              5,000/20,000   Storage operation
BALANCE             400            Get balance of an account
CREATE              32,000         Create a new account using CREATE
CALL                25,000         Create a new account using CALL

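The above table shows the gas cost of a few of the most common opcodes, as priced when this post was written. Later hard forks (notably Istanbul and Berlin) repriced SLOAD, SSTORE and BALANCE, so treat the storage figures as historical. Most operations are very cheap, but costs climb quickly when interacting with contract storage (SLOAD and SSTORE) or creating new contracts and accounts (CREATE and CALL).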

Although there are plenty of useful optimizations that can be applied to a variety of code patterns, for this post I’m going to focus specifically on reducing the cost and quantity of calls to store contract data (that is, the SSTORE opcode). Reducing the amount of gas spent on storage costs is especially useful for contracts that track a large amount of internal state, and contract methods that update multiple pieces of state each call.

Understanding SSTORE

The SSTORE opcode has two gas costs associated with it: 20,000 and 5,000. Writing a non-zero value into an empty (zero) storage slot costs 20,000 gas, while any other write, such as replacing one non-empty value with another, costs 5,000 gas. Note: setting a non-empty value back to empty does earn a refund (15,000 gas under this schedule), paid out at the end of the transaction and capped at half of the gas the transaction used.
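
To make the rule concrete, here is a small Python model of it. It is a sketch of the pricing described above (the schedule in effect when this post was written), not of today's EVM, which uses net gas metering (EIP-2200) and cold/warm access costs (EIP-2929). The name sstore_cost is just an illustrative helper.

```python
# Sketch of the SSTORE gas rule described above (pre-Istanbul schedule).
# "Empty" means the slot currently holds zero.
SSTORE_SET = 20_000           # zero -> non-zero
SSTORE_RESET = 5_000          # every other write
SSTORE_CLEAR_REFUND = 15_000  # non-zero -> zero, refunded at the end of the tx


def sstore_cost(current: int, new: int) -> tuple[int, int]:
    """Return (gas charged, gas refunded) for writing `new` over `current`."""
    if current == 0 and new != 0:
        return SSTORE_SET, 0
    refund = SSTORE_CLEAR_REFUND if current != 0 and new == 0 else 0
    return SSTORE_RESET, refund


print(sstore_cost(0, 100))    # (20000, 0)
print(sstore_cost(100, 200))  # (5000, 0)
print(sstore_cost(200, 0))    # (5000, 15000)
```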

contract Test {
    uint foo;
    uint bar;

    function setFoo(uint _foo) public {
        foo = _foo;
    }

    function setBar(uint _bar) public {
        bar = _bar;
    }
}

In the Test contract above, the first time setFoo is called with a non-zero value, storing foo costs 20,000 gas. The first call to setBar also incurs 20,000 gas in storage costs, even if setFoo was already called, because foo and bar each occupy their own 32-byte storage slot.

Note: the total execution cost of a method will be higher than just the storage cost, because many other opcodes run even for the simple methods shown here. As the figures below show, storage often accounts for the vast majority of the total. These figures are the execution cost reported by Remix, which excludes the 21,000 gas base cost every transaction pays, and exact numbers will vary with the compiler version. To see more, try running these examples at https://remix.ethereum.org with optimizations turned on.

Test.setFoo(100) // => 20134 gas
Test.setFoo(200) // => 5134 gas
Test.setBar(300) // => 20156 gas
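
To check the claim that storage dominates, the Remix figures can be split into storage and everything else. This Python sketch assumes each call performs exactly one SSTORE, priced as described above, and attributes the remainder to other opcodes.

```python
# Storage share of the measured execution costs for the Test contract.
calls = [
    # (call, measured execution gas, SSTORE gas for that call)
    ("setFoo(100)", 20_134, 20_000),  # foo: empty -> non-empty
    ("setFoo(200)", 5_134, 5_000),    # foo: non-empty -> non-empty
    ("setBar(300)", 20_156, 20_000),  # bar has its own, still-empty slot
]

for name, total, storage in calls:
    share = storage / total
    print(f"{name}: {storage} of {total} gas is storage ({share:.1%})")
```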

However, compilers are smart and can pack values together when the types allow it.

contract Test {
    uint128 foo;
    uint128 bar;

    function setFoo(uint128 _foo) public {
        foo = _foo;
    }

    function setBar(uint128 _bar) public {
        bar = _bar;
    }
}

This example is almost identical to the prior one, except foo and bar are now 16-byte uint128 values instead of 32-byte uint values. As expected, the first call to setFoo costs 20,000 gas to store the new foo value.

Test.setFoo(100) // => 20404 gas

But what happens if setBar is called after a call to setFoo? In that case it will only cost 5,000 gas to store the new bar value.

Test.setBar(100) // => 5384 gas

This is because Ethereum storage slots are a fixed 32 bytes wide. Solidity packs consecutive state variables into a single slot whenever they fit; otherwise each value gets a slot to itself, even if it is smaller than 32 bytes. That’s why, when the storage values are typed as 16-byte integers, calling setBar after setFoo behaves as if the storage slot for bar had already been initialized from an empty value to a non-empty value. It has: foo and bar are stored in the same slot.
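
The packing can be modeled in Python by treating a slot as a 256-bit integer. This sketch assumes Solidity's documented layout rule that the first variable in a slot sits in its low-order bytes, and reuses the 20,000/5,000 pricing from earlier. write_field is a hypothetical helper standing in for the read-modify-write the compiler performs on a packed slot.

```python
# Model of one 32-byte slot holding two packed uint128 state variables.
# Assumption (Solidity's layout rule): foo, declared first, occupies the
# low-order 16 bytes of slot 0; bar occupies the high-order 16 bytes.
MASK128 = (1 << 128) - 1
storage = {}  # slot number -> 256-bit integer; missing slots read as zero


def write_field(slot: int, shift: int, value: int) -> int:
    """Read-modify-write one 128-bit field; return the SSTORE gas charged."""
    old = storage.get(slot, 0)
    new = (old & ~(MASK128 << shift)) | ((value & MASK128) << shift)
    storage[slot] = new
    return 20_000 if old == 0 and new != 0 else 5_000


cost_foo = write_field(0, 0, 100)    # setFoo(100): slot 0 was empty
cost_bar = write_field(0, 128, 100)  # setBar(100): same slot, already non-empty
print(cost_foo, cost_bar)            # 20000 5000
```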

Considering access patterns

Structuring smart contract code so that the compiler can optimize storage is a great way to reduce the overall gas used by a contract. It’s also important to consider how the contract will be used.

contract TestArray {
    uint128[256] public foos;
    uint128[256] public bars;

    function set(uint8 idx, uint128 _foo, uint128 _bar) public {
        foos[idx] = _foo;
        bars[idx] = _bar;
    }
}

In this example, users can call set with an index and two values, and the code will update the corresponding values in the two arrays at the provided index. As expected, the first time set is called, the gas cost incurred from storing the values is high. This is because the storage for foos[0] and bars[0] needs to be initialized. Subsequent calls to modify the values at the same index are cheaper.

TestArray.set(0, 100, 200) // => 40872 gas
TestArray.set(0, 300, 400) // => 10872 gas

But what happens if the contract is invoked to set the values at index 1 after index 0 has already been set?

TestArray.set(1, 500, 600) // => 10972 gas

It’s cheap, as if the storage for foos[1] and bars[1] had already been initialized. In fact, it has. The compiler puts two uint128 values in each storage slot, so foos[0] and foos[1] occupy the same storage slot, as do bars[0] and bars[1]. Calling set(1, ...) after set(0, ...) therefore only incurs the fees for replacing a non-empty value with another non-empty value, because both slots already contain data.
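
The slot arithmetic behind this is easy to sketch. The Python below assumes Solidity's layout for fixed-size arrays, in which elements are laid out back to back from the array's first slot and packed two uint128 values per slot; offsets are counted in bytes from the low-order end of the slot.

```python
# Where does each element of `uint128[256] foos; uint128[256] bars;` live?
# Assumption: static-array elements are packed back to back, two 16-byte
# values per 32-byte slot, so foos fills slots 0..127 and bars starts at 128.
SLOT_BYTES = 32
ELEM_BYTES = 16
PER_SLOT = SLOT_BYTES // ELEM_BYTES      # 2 elements per slot
FOOS_BASE = 0
BARS_BASE = FOOS_BASE + 256 // PER_SLOT  # 128


def location(base_slot: int, index: int) -> tuple[int, int]:
    """Return (slot, byte offset within the slot) of element `index`."""
    return base_slot + index // PER_SLOT, (index % PER_SLOT) * ELEM_BYTES


for i in range(3):
    print(f"foos[{i}] -> {location(FOOS_BASE, i)}, bars[{i}] -> {location(BARS_BASE, i)}")
```

foos[0] and foos[1] map to the same slot, which is exactly why the second index is cheap to initialize.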

While this is a helpful optimization for the compiler to perform, it might not provide the best user experience. Think of a scenario where token 0 and token 1 are owned by separate individuals. It probably does not make sense for one token holder to pay two storage initialization fees when making a change to their token state, while the other token holder only has to pay the much lower storage modification fees.

Enter structs. By using structs a contract can group data in ways that better fit user access patterns, while still allowing the compiler to perform valuable storage optimizations.

contract TestStruct {
    struct Test {
        uint128 foo;
        uint128 bar;
    }

    Test[100] public tests;

    function set(uint idx, uint128 _foo, uint128 _bar) public {
        tests[idx].foo = _foo;
        tests[idx].bar = _bar;
    }
}

TestStruct.set(0, 100, 200) // => 25720 gas
TestStruct.set(0, 300, 400) // => 10720 gas
TestStruct.set(1, 500, 600) // => 25720 gas

This behaves as expected: expensive for storage initialization, cheap for subsequent updates, and each index pays for its own initialization. One interesting thing to note is that the first call to set incurs only 25,000 gas in storage costs rather than 40,000. That’s because setting the foo property initializes the storage slot (20,000 gas), while setting the bar property only modifies it (5,000 gas), since both values are stored within the same 32-byte slot.
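
A Python model of TestStruct reproduces the storage portion of these numbers. It assumes each Test struct fills exactly one slot (so tests[idx] lives in slot idx) and that each member assignment is its own read-modify-write SSTORE, as the measurements suggest. set_struct is a hypothetical stand-in for the contract method.

```python
# Storage-cost model for TestStruct.set(idx, foo, bar).
MASK128 = (1 << 128) - 1
storage = {}  # slot -> 256-bit word; tests[idx] is assumed to live in slot idx


def sstore(slot: int, new: int) -> int:
    """Write a whole slot and return the gas charged (20,000 or 5,000)."""
    old = storage.get(slot, 0)
    storage[slot] = new
    return 20_000 if old == 0 and new != 0 else 5_000


def set_struct(idx: int, foo: int, bar: int) -> int:
    """Two member assignments, each a separate SSTORE to the same slot."""
    gas = 0
    word = storage.get(idx, 0)
    word = (word & ~MASK128) | (foo & MASK128)          # tests[idx].foo = _foo
    gas += sstore(idx, word)
    word = (word & MASK128) | ((bar & MASK128) << 128)  # tests[idx].bar = _bar
    gas += sstore(idx, word)
    return gas


costs = [set_struct(0, 100, 200), set_struct(0, 300, 400), set_struct(1, 500, 600)]
print(costs)  # [25000, 10000, 25000]
```

Subtracting these from the Remix figures leaves the same 720 gas of non-storage overhead in each call.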

Packing bytes

For most smart contracts it is likely enough to use structs to benefit from compiler optimizations while also catering to user-friendly access patterns. However, in some cases it is important to reduce storage costs even further.

For example, apps that allow a user to update the state of many tokens at once might still suffer from expensive transactions without further optimization. With the struct examples above, initializing 50 tokens at once will require over 1,250,000 gas (50 × 25,000 in storage costs alone), and modifying the state of 50 tokens will require over 500,000 gas (50 × 10,000). Not only is that expensive for the user, but it might take a while to get a transaction that large into a block (at the time of writing, the block gas limit is hovering right around 8 million).
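
The arithmetic behind those totals, as a quick Python sketch. It counts storage costs only (25,000 gas per token to initialize and 10,000 to update, from the struct measurements above) and uses the roughly 8 million block gas limit mentioned in the text. Real transactions would also pay the 21,000 base cost plus the cost of the surrounding opcodes.

```python
# Storage-only cost of touching 50 tokens with the struct layout.
TOKENS = 50
INIT_PER_TOKEN = 25_000    # 20,000 to initialize the slot + 5,000 for the second member
UPDATE_PER_TOKEN = 10_000  # two 5,000-gas SSTOREs
BLOCK_GAS_LIMIT = 8_000_000

init_total = TOKENS * INIT_PER_TOKEN
update_total = TOKENS * UPDATE_PER_TOKEN
print(init_total, update_total)  # 1250000 500000
print(f"initializing 50 tokens uses {init_total / BLOCK_GAS_LIMIT:.1%} of a block")
```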

Even though using structs allows the compiler to potentially store multiple values in a single storage location, each modification to a property on a struct still counts as an SSTORE operation.

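contract Test {
    struct TestStruct {
        uint128 foo;
        uint128 bar;
    }

    TestStruct public test;

    function set(uint128 _foo, uint128 _bar) public {
        TestStruct memory _test = test; // copy the struct from storage into memory

        _test.foo = _foo; // memory write only
        _test.bar = _bar; // memory write only

        test = _test; // written back one member at a time: two SSTOREs to one slot
    }
}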

Even if the struct is loaded into memory, modified, and then written back to storage with a single assignment, the compiler used for these examples still emits two SSTORE operations, one per member. That isn’t necessary: the contract is only updating a single 32-byte storage slot, so there is no reason this has to be structured as multiple storage writes.

For contracts that really need to push the limit of optimizing storage costs, manually packing bytes might present a fairly large improvement over storing structs directly. Packing bytes allows a contract to put multiple values into a single storage slot, similar to the optimizations the compiler performs, while also reducing multiple SSTORE calls into a single call.

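// Assumed compiler version (not stated in the original). Solidity 0.5.0+ requires
// `memory` on struct parameters/returns and rejects bytes32(uint128) conversions.
pragma solidity ^0.4.24;

contract TestBytes {
    struct Test {
        uint128 foo;
        uint128 bar;
    }

    bytes32[100] public testBytes;

    function set(uint idx, uint128 _foo, uint128 _bar) public {
        Test memory test = fromBytes(testBytes[idx]); // one SLOAD

        test.foo = _foo;
        test.bar = _bar;

        testBytes[idx] = toBytes(test); // one SSTORE for the whole slot
    }

    // Layout: foo in the high-order 16 bytes, bar in the low-order 16 bytes.
    function fromBytes(bytes32 bs) internal pure returns (Test) {
        bytes16 fooBytes = bytes16(bs);        // keeps the leftmost 16 bytes
        bytes16 barBytes = bytes16(bs << 128); // shifts bar into the leftmost 16 bytes

        uint128 foo = uint128(fooBytes);
        uint128 bar = uint128(barBytes);

        return Test(foo, bar);
    }

    function toBytes(Test test) internal pure returns (bytes32) {
        bytes32 foo = bytes32(test.foo) << 128;
        bytes32 bar = bytes32(test.bar);

        return foo | bar;
    }
}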

The set method in the above contract is very similar to the previous examples. It loads a bit of storage, makes modifications, then puts the new value back into storage. The difference is that instead of loading a struct directly, the contract first loads a bytes32 value, which it then deserializes into the target struct type. By doing this, the contract has direct control over the storage operations, and can put the data back into storage by serializing the struct back into bytes. This allows for a substantial reduction in gas costs.
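
Since a bytes32 value is just a 256-bit number, the same serialization can be written in Python with shifts and masks, which is handy for generating or checking packed values off-chain. This sketch mirrors the layout of toBytes and fromBytes above (foo in the high-order 16 bytes, bar in the low-order 16 bytes). to_word and from_word are illustrative names, and the range check stands in for the guarantee the Solidity version gets from its uint128 parameter types.

```python
# Python mirror of TestBytes.toBytes / fromBytes on a 256-bit integer.
MASK128 = (1 << 128) - 1


def to_word(foo: int, bar: int) -> int:
    """Pack two uint128 values into one 256-bit word (cf. toBytes)."""
    if not (0 <= foo <= MASK128 and 0 <= bar <= MASK128):
        raise ValueError("foo and bar must fit in 128 bits")
    return (foo << 128) | bar


def from_word(word: int) -> tuple[int, int]:
    """Split a 256-bit word back into (foo, bar) (cf. fromBytes)."""
    return word >> 128, word & MASK128


word = to_word(100, 200)
print(word.to_bytes(32, "big").hex())  # the bytes32 value as stored
print(from_word(word))                 # (100, 200)
```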

TestBytes.set(0, 100, 200) // => 21348 gas
TestBytes.set(0, 300, 400) // => 6348 gas

Compare that to 25,720 gas and 10,720 gas in the previous examples using just structs without byte packing. The gas cost of the initial call is dominated by the 20,000 gas to initialize the storage slot, but subsequent calls are ~40% cheaper, simply because the contract is only using SSTORE once.
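
Putting numbers on the comparison, using only the Remix measurements quoted in this post:

```python
# Relative and absolute savings of byte packing versus plain structs.
struct_update, packed_update = 10_720, 6_348
struct_init, packed_init = 25_720, 21_348

update_saving = 1 - packed_update / struct_update
init_saving = 1 - packed_init / struct_init
print(f"update: {update_saving:.0%} cheaper, init: {init_saving:.0%} cheaper")
print("absolute saving:", struct_update - packed_update, struct_init - packed_init)
```

The absolute saving is 4,372 gas in both cases, a little under the 5,000 gas of the SSTORE that packing eliminates, which suggests the serialization code itself costs a few hundred gas.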

The benefit of this approach is even greater when using smaller value types.

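// Contract wrapper and state variable added so the snippet compiles;
// the name TestSmall is illustrative.
contract TestSmall {
    struct Test {
        uint64 w;
        uint64 x;
        uint64 y;
        uint64 z;
    }

    Test public test;

    // Without byte packing, each assignment below is a separate SSTORE.
    function set(uint64 _w, uint64 _x, uint64 _y, uint64 _z) public {
        test.w = _w;
        test.x = _x;
        test.y = _y;
        test.z = _z;
    }
}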

Using byte packing, the above method would see a 75% reduction in storage costs on every call after the first one initializes the values: four 5,000-gas SSTORE operations (20,000 gas) collapse into a single 5,000-gas SSTORE.
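
The same idea extends to four 64-bit fields. In this Python sketch, pack4 and unpack4 are hypothetical helpers using one possible layout (w in the highest 8 bytes, z in the lowest), and the cost figures assume the 5,000 gas price for overwriting a non-empty slot described earlier.

```python
# Four uint64 fields in one 256-bit word, plus the update-cost arithmetic.
MASK64 = (1 << 64) - 1


def pack4(w: int, x: int, y: int, z: int) -> int:
    """Place w in the highest 8 bytes and z in the lowest (hypothetical layout)."""
    word = 0
    for v in (w, x, y, z):
        if not 0 <= v <= MASK64:
            raise ValueError("value does not fit in 64 bits")
        word = (word << 64) | v
    return word


def unpack4(word: int) -> tuple[int, int, int, int]:
    return tuple((word >> shift) & MASK64 for shift in (192, 128, 64, 0))


unpacked_update = 4 * 5_000  # one SSTORE per field
packed_update = 1 * 5_000    # a single SSTORE for the whole word
reduction = 1 - packed_update / unpacked_update
print(unpack4(pack4(1, 2, 3, 4)), f"{reduction:.0%}")  # (1, 2, 3, 4) 75%
```

The first call benefits too: assuming non-zero values, the unpacked version pays 20,000 gas for the first field and 5,000 for each of the other three (35,000 in total), versus a single 20,000-gas SSTORE when packed.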

With this in mind, contract authors would be wise to reduce the sizes of types that will be put in storage whenever possible. For example, address values are 20 bytes in size and pair nicely with uint96 values (12 bytes), making for a tidy 32 bytes in total. Many contracts that expose bidding or auction interfaces store the current price of a token as a 256-bit uint value. Unless a token could plausibly be auctioned off for more than tens of billions of ETH, it might make sense to restrict bids to a smaller type that fits into an existing storage slot.

struct Token {
address owner;
uint96 price;
}

This struct can be serialized into exactly 32 bytes while still allowing a bidding price to reach over 79 billion ETH (about 7.9e28 wei). That should be plenty. Even the price of the most expensive CryptoKitty could fit easily into a 72-bit value (expressed in wei), which tops out at roughly 4,722 ETH.
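
The ranges quoted above are easy to verify, given that 1 ETH is 10^18 wei. The Python below also sketches one hypothetical manual layout for Token (owner in the high 160 bits, price in the low 96). This is a choice made for illustration only; Solidity's own storage layout would place owner, declared first, in the low-order bytes.

```python
# Range checks for the Token { address owner; uint96 price; } struct.
WEI_PER_ETH = 10**18
ADDRESS_BITS, PRICE_BITS = 160, 96  # 20 bytes + 12 bytes

max_price_eth = (2**PRICE_BITS - 1) / WEI_PER_ETH
max_72bit_eth = (2**72 - 1) / WEI_PER_ETH
print(ADDRESS_BITS + PRICE_BITS)  # 256
print(f"uint96 max: {max_price_eth:,.0f} ETH")
print(f"72-bit max: {max_72bit_eth:,.0f} ETH")


def pack_token(owner: int, price_wei: int) -> int:
    """Hypothetical layout: owner in the high 160 bits, price in the low 96."""
    if not 0 <= owner < 2**ADDRESS_BITS or not 0 <= price_wei < 2**PRICE_BITS:
        raise ValueError("owner or price out of range")
    return (owner << PRICE_BITS) | price_wei
```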

Considerations

Using a byte packing approach can potentially save users on gas costs when executing contract methods, but contract authors shouldn’t necessarily default to this pattern.

For one, it adds complexity. Implementing toBytes and fromBytes for a particular struct is error-prone and requires careful testing. During development, when types are changing quickly, byte serialization code may break in unexpected ways, so it is best introduced only in the final stages of contract development.
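
Because hand-written serializers are easy to get subtly wrong, a round-trip check is a cheap safety net. The Python sketch below assumes you have a Python model of the packing code (pack and unpack here are hypothetical stand-ins mirroring toBytes and fromBytes) and checks edge values plus random inputs. The same idea carries over to a Solidity test suite.

```python
# Randomized round-trip check for hand-written packing code.
import random

MASK128 = (1 << 128) - 1


def pack(foo: int, bar: int) -> int:
    return (foo << 128) | bar


def unpack(word: int) -> tuple[int, int]:
    return word >> 128, word & MASK128


def check_round_trip(trials: int = 1_000, seed: int = 0) -> None:
    rng = random.Random(seed)
    edge = [0, 1, MASK128 - 1, MASK128]  # boundary values catch shift/mask bugs
    cases = [(a, b) for a in edge for b in edge]
    cases += [(rng.getrandbits(128), rng.getrandbits(128)) for _ in range(trials)]
    for foo, bar in cases:
        word = pack(foo, bar)
        assert 0 <= word < 2**256, "packed value must fit in one slot"
        assert unpack(word) == (foo, bar), (foo, bar)


check_round_trip()
print("round trip OK")
```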

Also, it might not be necessary. Contracts that are simple tokens which transfer value from one address to another might not benefit very much from this approach. On the other hand, contracts that implement a non-fungible token interface and let users set multiple values within the contract will likely benefit greatly.

In the end, many contracts could probably benefit from byte packing structs into storage slots. Not only does this give users a strong incentive to interact with a decentralized app by lowering the cost per transaction, it also frees up block space on the Ethereum network, which can help lower gas prices and make it easier for transactions to get into blocks.

Hopefully at some point compilers can implement this efficient byte packing approach when storing structs, but until then contract developers should definitely consider using this pattern when creating gas-hungry contracts.