CryptoKitties GeneScience algorithm

The Internets have been abuzz lately with CryptoKitty fever, with many of these digital cats selling for well over $100,000 USD. You can breed and sell your own cats, and after trying my hand at it, I was curious to figure out how the breeding process worked. My own curiosity can be somewhat all-consuming so my winter holiday thus far has basically consisted of staring at disassembled Ethereum bytecode until I had everything figured out.

Needless to say, I’m not the only one trying to crack the CryptoKitties code. It would be reasonable to think that knowing exactly how the genes of the matron and the sire are mixed would give you a significant leg up (spoiler: it doesn’t). And while the gene-mixing algorithm is in fact deterministic, there is still quite a bit of complexity that prevents one from gaming it.

Here is the entry point to the GeneScience contract (at address 0xf97e0A5b616dfFC913e72455Fde9eA8bBe946a2B) from the main CryptoKitties contract source code (at address 0x06012c8cf97BEaD5deAe237070F9587f8E7A266d):

// Call the sooper-sekret gene mixing operation.
uint256 childGenes = geneScience.mixGenes(matron.genes, sire.genes, matron.cooldownEndBlock - 1);

I should give credit where credit is due — while I was busy figuring out the details of the gene-mixing, it turns out Sean Soria had already posted some pseudo-code of how the algorithm works here. And, together with the work by CryptoKitty Scientist kaigani, a new science of CryptoKitties is taking shape. Both of these works should be consulted first, as I will only explain where my understanding differs from theirs.

The main source of randomness in the mixGenes function is actually calculated via the keccak_256 hash of four concatenated big-Endian uint256 numbers: the three arguments of the mixGenes function, in order, and then the BLOCKHASH corresponding to the block matron.cooldownEndBlock — 1. Bits are masked from this number to provide the “randomness” necessary for gene-swapping among dominant and recessive genes as well as for mutations of dominant genes.

The only other difference I noted from Sean’s work is that a mutation will only occur if gene1 (his notation) is less than 0x17 (decimal 23). (EDIT: I took a look again at the bytecode, and realized that if gene1 >=0x17, then a mutation can still occur with 1/2 the probability. Basically, a 3-bit number obtained from masking the above hash now has to be equal to 0 rather than 0 or 1.) I’m not 100% confident in my understanding of mutations, as I haven’t yet accumulated enough test data to include a mutation.

I wrote my code up in a Jupyter notebook which you can get here. If anyone has a bunch of test data that they know includes mutations, I would love to run it through. Or, if you want to try it yourself, please go ahead! Just let me know if you find any bugs.

In the meantime, I hope this was helpful. Here’s my address in case you want to send anything my way (kittehs, Ether, what have you…): 0xE72c2C2A86B21D251033B88d8a8cBBdF1218FfA9

mmmmrreeeowww!!!