Hello, BOB.

Bitcoin OP_RETURN Bytecode

_unwriter
Aug 7, 2019

BOB (Bitcoin OP_RETURN Bytecode) is a new transaction serialization format for dealing with Bitcoin transactions, especially OP_RETURNs.

Until now all existing Planaria systems have been based on a serialization format called TXO.

BOB is a new, modified version of TXO which adopts an abstract machine metaphor, where instructions are written on tapes made up of cells, where each cell contains a single atomic procedure call.

It is a Bitcoin transaction serialization format which fundamentally stops seeing OP_RETURN data as just a dumb data feed, and instead treats it as actual procedure calls in a virtual operating system.

It’s called “OP_RETURN Bytecode” because it treats OP_RETURN scripts like procedure calls, making it dead simple for you to build your OWN protocol Operating System on Bitcoin.

Here’s a preview of the new data structure:

And here’s an example query that makes use of this new schema:

And below is an example view of what a BOB structure looks like when you make the above query (the first “tape” is Metalens with 2 cells, and the second one is Twetch with 3 cells):

You can try the query here:

https://bob.planaria.network/query/1GgmC7Cg782YtQ6R9QkM58voyWeQJmJJzG/ewogICJ2IjogMywKICAicSI6IHsKICAgICJmaW5kIjogewogICAgICAib3V0LnRhcGUuY2VsbC5zIjogIjFQdVFhN0s2Mk1pS0N0c3NTTEt5MWtoNTZXV1U3TXRVUjUiCiAgICB9LAogICAgInByb2plY3QiOiB7CiAgICAgICJvdXQudGFwZS5jZWxsLmxzIjogMSwgIm91dC50YXBlLmNlbGwucyI6IDEsICJ0eC5oIjogMSwgImJsayI6IDEKICAgIH0sCiAgICAibGltaXQiOiAyMAogIH0sCiAgInIiOiB7CiAgICAiZiI6ICJbLltdIHwgLm91dFswXVtdWzE6XSB8IC4gXSIKICB9Cn0=

Before we begin, here is BOB, the first Planaria machine to use this format. Play with it a bit before reading further:

BOB is a fork of Neon Genesis, so you should feel mostly right at home. It crawls the same set of transactions as Neon Genesis, but just in a different structure. The only difference is in the in and out arrays where the internal structure has been re-designed with the new philosophy, adopting the tape + cell metaphor.

Acknowledgements

Special thanks to: jolon, libitix, breavyn for the ideas, feedback, and discussions in Atlantis.

Table of Contents

This article will cover:

  1. TXO
  2. BitCom
  3. BitCom Pipeline
  4. The Need for a New Format
  5. BOB: Bitcoin OP_RETURN Bytecode
  6. BOB Planaria
  7. Conclusion

1. TXO

In order to understand why a new scheme is necessary, we first need to understand how the existing scheme works. This section will explain the current state of Planaria-related systems.

At the center of Planaria is a transaction serialization format called TXO.

TXO has been the fundamental building block for a lot of projects I’ve worked on, such as BitDB, BitSocket, Planaria, Neon Planaria, Bitbus, Eventchain, etc.

To explain briefly, the TXO format takes a raw 1-dimensional Bitcoin transaction and turns it into a 2-dimensional JSON object, which then can be stored in a document database, used as an event, or filtered with a realtime JSON stream processing library.

The main innovation with TXO is in its approach to chunking out every individual push data for indexing, instead of looking at transactions as just 1-dimensional money transfers from A to B.

The generic schema along with a matching powerful query language (Bitquery) made it attractive for usage in many Bitcoin applications.

Here’s what a TXO translated format of a Bitcoin transaction looks like:

The relevant part here is the out array. Each item in the out array represents an output script.

An output script has b prefixed attributes and s prefixed attributes. These represent its push data sequence. The b prefixed attributes represent the base64 encoded version of a push data, and the s prefixed ones represent the UTF8 encoded version of the same push data. For example, b0 means the first (positional index 0) push data in base64 encoding, and s2 means the third (positional index 2) push data in UTF8 encoding.
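As a rough illustration of this naming rule, the following Python sketch turns a list of script items into TXO-style attributes. The function name `to_txo_output` and the input representation (integers for opcodes, bytes for push data) are assumptions made for this example and not part of any Planaria library, and real TXO objects carry more fields than shown here.

```python
import base64


def to_txo_output(script):
    """Map each script item to b<N>/s<N> attributes by positional index.

    `script` is a hypothetical flat list: ints are opcodes, bytes are push data.
    """
    out = {}
    for index, item in enumerate(script):
        if isinstance(item, int):
            # TXO keeps opcodes under the b attribute as {"op": <number>}
            out[f"b{index}"] = {"op": item}
        else:
            out[f"b{index}"] = base64.b64encode(item).decode("ascii")
            out[f"s{index}"] = item.decode("utf-8", errors="replace")
    return out


# 106 is the opcode number of OP_RETURN
example = to_txo_output([106, b"19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut", b"hello"])
```

Here `example["s1"]` holds the protocol prefix, which is exactly the attribute a positional query would target.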

This format makes it easy to query and filter various transaction patterns on the blockchain, after which programmers can process the data to build their own state machine. Let’s say we want to query all transactions with an output script that looks like:

OP_RETURN
19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut
[DATA]
[MEDIA_TYPE]
[ENCODING]
[FILENAME]

In addition, we may want to select only the [FILENAME] field of the result set. Using Bitquery, you could simply query:

{
"v": 3,
"q": {
"find": {
"out.s1": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"
},
"project": {
"out.s5": 1
}
}
}

You can try out the query yourself here:

https://genesis.bitdb.network/query/1FnauZ9aUH2Bex6JzdcV4eNX7oLSSEbxtN/ewogICJ2IjogMywKICAicSI6IHsKICAgICJmaW5kIjogewogICAgICAib3V0LnMxIjogIjE5SHhpZ1Y0UXlCdjN0SHBRVmNVRVF5cTFwelpWZG9BdXQiCiAgICB9LAogICAgInByb2plY3QiOiB7CiAgICAgICJvdXQuczUiOiAxCiAgICB9LAogICAgImxpbWl0IjogMjAKICB9Cn0=
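The long final segment of the link above looks like the base64 encoding of the query JSON itself. Below is a minimal Python sketch of building such a link, assuming the path layout `<endpoint>/<key>/<base64 query>` that can be read off that URL. The helper name `query_url` is hypothetical, and treating the middle segment as a key is a guess.

```python
import base64
import json


def query_url(endpoint, key, query):
    """Build a query link by base64-encoding the pretty-printed query JSON."""
    payload = json.dumps(query, indent=2).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return f"{endpoint}/{key}/{encoded}"


query = {
    "v": 3,
    "q": {
        "find": {"out.s1": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"},
        "project": {"out.s5": 1},
        "limit": 20,
    },
}

url = query_url(
    "https://genesis.bitdb.network/query",
    "1FnauZ9aUH2Bex6JzdcV4eNX7oLSSEbxtN",
    query,
)
```

Decoding the last segment of `url` with `base64.b64decode` gives back the same JSON, which is a handy way to inspect the links shared in this article.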

This is how most Planaria systems work today.

2. BitCom

In the above example, you may have noticed that the first push data after OP_RETURN is a Bitcoin address 19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut. This is called a “protocol prefix”. By prepending transactions with this unique address you’re tagging them to be interpreted in a certain way. Without these prefixes it would be difficult to tell how each transaction pattern should be interpreted, since every application protocol has a different rule for interpreting the output sequence.

In the past, people used to just come up with their own arbitrary prefix and share them on a centralized registry in order to avoid namespace collision. However this was too centralized and not scalable because developers had to always get approval from some committee.

This is why BitCom was born. With BitCom, you simply generate a Bitcoin address and then use that as your protocol prefix. Since randomly generated Bitcoin addresses can be trusted to be unique enough, you can use them to uniquely identify protocols. Also, because it’s a Bitcoin address, you can use its matching private key to prove that you were the creator of the protocol. Lastly, by publishing a transaction that contains the BitCom prefix, sent from the same address, you sign and confirm the creation of your protocol on the blockchain forever.

You can learn more about BitCom here:

You can also check out a decentralized protocol directory for BitCom powered protocols here:

3. BitCom Pipeline

BitCom is a “protocol protocol”, a protocol for defining a protocol structure.

This means it’s not just about unique prefix identifiers, it can also be used to define ways multiple protocols interoperate with one another.

Back in January there was a discussion about implementing a Unix Pipeline-like structure for protocols so multiple protocols can interoperate freely.

Back then it was just an idea, but people have since picked it up and started using it, and today this type of protocol mashup has become prevalent in many Bitcoin applications.

For example posting on Twetch involves 3 protocols separated by pipes ( | ). And many other apps (such as TonicPow, whose founders actually created the Magic Attribute Protocol and Author Identity Protocol) are powered by OP_RETURN pipeline nowadays.

B:// | Magic Attribute Protocol | Author Identity Protocol

You can learn more about how it all started here:

4. The Need for a New Format

The organic adoption of BitCom pipeline was a great step forward for constructing sophisticated OP_RETURN transactions, but it also introduced a big challenge in terms of how to filter the blockchain for various transaction patterns. More specifically,

  1. BitCom was originally designed with the assumption that there is only ONE procedure call per OP_RETURN output.
  2. But with BitCom pipeline, now multiple procedure calls (such as B://, Magic Attribute Protocol, Author Identity Protocol ) can co-exist within a single “command”, binding the procedure calls together as a single atomic unit.

For example, let’s look at a simple B:// example WITHOUT a pipeline:

Thanks to its fixed structure, we can easily query, filter, and process transactions using Bitquery. Since we are looking for the address 19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut at positional index 2, we can query:

{
"v": 3,
"q": {
"find": {
"out.s2": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"
}
}
}

And it will give us all the relevant “commands” we need in order to run a state machine.

But the BitCom pipeline changes the game. For example, let’s look at a case where B:// is used along with another protocol:

Now it’s more complicated. There are two procedure calls within a single command, concatenated with a pipe |.

How do we filter this? Previously, filtering "s2": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" was enough, but this time the "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" exists in another position in the pipeline. Now we need to also take into account "s6": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut":

{
"v": 3,
"q": {
"find": {
"$or": [
{ "out.s2": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" },
{ "out.s6": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" }
]
}
}
}

OK good job, but what if there are three protocols, like this?

Now we have the 19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut at index 11, so I guess we add another condition to the $or, right?

{
"v": 3,
"q": {
"find": {
"$or": [
{ "out.s2": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" },
{ "out.s6": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" },
{ "out.s11": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut" }
]
}
}
}

But at this point we start to realize this is not sustainable: there can be too many combinations.

And this is even before we consider the much more challenging problem of dynamic-length protocols like Magic Attribute Protocol, where a protocol can have a variable number of arguments. With variable-length protocols, we can’t even calculate the index anymore and the $or method won’t cut it at all.
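To make the index problem concrete, here is a small Python sketch that computes where a prefix lands in a flat, TXO-style sequence. The helper `global_index` is made up for this illustration and the argument counts are arbitrary. The position depends on how many arguments every earlier protocol in the pipeline happens to take.

```python
def global_index(pipeline, prefix, leading_opcodes=1):
    """Return the flat positional index of `prefix` in a piped OP_RETURN output.

    `pipeline` is a list of (prefix, argument_count) pairs. Each pipe
    separator between two protocols occupies one position of its own.
    """
    index = leading_opcodes  # positions taken by OP_RETURN (or OP_FALSE OP_RETURN)
    for name, argument_count in pipeline:
        if name == prefix:
            return index
        index += 1 + argument_count + 1  # prefix + arguments + pipe
    return None


B = "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"
MAP = "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"

alone = global_index([(B, 4)], B)                      # B:// on its own
after_short_map = global_index([(MAP, 3), (B, 4)], B)  # after a 3-argument call
after_long_map = global_index([(MAP, 5), (B, 4)], B)   # after a 5-argument call
```

The three results are 1, 6, and 8: the same protocol ends up at a different global index each time, so a fixed list of `$or` positions can never be complete.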

TXO doesn’t have the expressive power to deal with these problems.

We need a format that’s expressive enough to build all kinds of protocol operating systems.

5. BOB: Bitcoin OP_RETURN Bytecode

BOB is a data structure schema for dealing with the challenges explained above.

BOB doesn’t look at OP_RETURN outputs as just data, but as procedure calls.

More specifically, we adopt a “positional association” approach, utilizing the first push data as the Procedure_Name, and the rest of the push data list as parameters.

So how do we implement this in Bitcoin? To understand this, let’s start from an abstract computer and build up.

If you’ve studied computer science you know that it’s possible to create a full fledged abstract computer using tapes. And that’s what we’re going to do here. We look at each Bitcoin input script and output script as a single strip of tape, from which we will run computation.

In this computation model, a “cell” is a single unit of execution. We could think of it as a single “procedure call” (such as B://, Magic Attribute Protocol, and Author Identity Protocol). Let’s look at an example:

Using the “tape” and “cell” metaphor, we have a single “tape” with 4 cells. We can visualize it as follows (the pipes are excluded from BOB):

Each cell represents a single procedure call. Just like any operating system, each process has an address. Coincidentally, BitCom literally uses an address (Bitcoin address) to represent procedure calls.

Each cell also contains additional arguments after the address and local execution scope. For example, the cell 2 has the following arguments: SET, name, and Heuristically Programmed ALgorithmic Computer.

By slightly modifying the TXO convention of b as base64 and s as UTF8, we can represent the above diagram as an “abstract syntax tree” of:

tape := [
{
cell: [
{op:0,ops:"OP_0","ii":0,"i":0},
{op:106,ops:"OP_RETURN","ii":1,"i":1}
]
},
{
cell: [
{s:"19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut","ii":2,"i":0},
{b:<file-base64>,s: <file-utf8>,"ii":3,"i":1},
{b:<mediatype-base64>,s:<mediatype-utf8>,"ii":4,"i":2},
{b:<encoding-base64>,s:<encoding-utf8>,"ii":5,"i":3},
{b:<filename-base64>,s:<filename-utf8>,"ii":6,"i":4}
]
},
{
cell: [
{s:"1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5","ii":8,"i":0},
{s:"name","ii":9,"i":1},
{s:"Heuristically Programmed ALgorithmic Computer","ii":10,"i":2}
]
},
{
cell: [
{s:"15PciHG22SNLQJXMoSUaWVi7WSqc7hCfva","ii":12,"i":0},
{s:"BITCOIN_ECDSA","ii":13,"i":1},
{s:...,"ii":14,"i":2}
]
}
]

And this is the core of BOB. It stores transaction inputs and outputs in a tape + cell format. Check out this query to see the actual data:

https://bob.planaria.network/query/1GgmC7Cg782YtQ6R9QkM58voyWeQJmJJzG/ewogICJ2IjogMywKICAicSI6IHsKICAgICJmaW5kIjoge30sCiAgICAic29ydCI6IHsKICAgICAgImJsay5pIjogLTEsCiAgICAgICJpIjogLTEKICAgIH0sCiAgICAibGltaXQiOiAxMAogIH0KfQ==

There are a couple of notable changes from TXO:

  1. Each procedure is locally scoped into a cell: Instead of a single globally contiguous tape, a tape is chunked up into cells, each containing a single procedure call (such as B://, Magic Attribute Protocol, or Author Identity Protocol), using pipes | as delimiters. This means you not only have the push data position within the global scope, but also the position within the local scope.
  2. Arrays instead of Attributes: No more b0, b1, etc. Instead, push data is stored as items in arrays named cell, and each item is an object with b, s, etc. attributes. This makes it effortless to run existential queries, which used to be impossible.
  3. More compact: In TXO, there used to be an attribute called “str” which contained a full string representation of each transaction. This was redundant and I’m not aware of people ever using it, but it was taking up all the space, so this attribute has been removed.
  4. Local scope: every item has a new attribute named i, which represents the local positional index in which it appears within the parent sequence. For example, a cell item with i of 2 means it’s the third item in the cell.
  5. Global scope: a cell item also has an attribute named ii which represents the global positional index. For example, what used to be s13 with TXO format will have an ii of 13 in BOB.

And just like TXO, this data structure format can be stored in a document database as well as filtered in realtime.
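The tape + cell layout described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BOB implementation: the function name `to_tape` and the input representation (integers for opcodes, bytes for push data) are invented here. Two behaviors are inferred from the syntax tree above rather than documented: OP_RETURN closes the leading opcode cell, and pipes are dropped from the cells while still consuming a global index.

```python
import base64

# Only the two opcodes that appear in this article
OP_NAMES = {0: "OP_0", 106: "OP_RETURN"}


def to_tape(script):
    """Split a flat script into cells, tracking local (i) and global (ii) indexes."""
    tape, cell = [], []
    for ii, item in enumerate(script):
        if isinstance(item, int):
            name = OP_NAMES.get(item, "OP_UNKNOWN")  # placeholder for other opcodes
            cell.append({"op": item, "ops": name, "ii": ii, "i": len(cell)})
            if item == 106:  # assumption: OP_RETURN ends the opcode cell
                tape.append({"cell": cell})
                cell = []
        elif item == b"|":
            # The pipe is a delimiter: it closes the cell and is not stored
            if cell:
                tape.append({"cell": cell})
            cell = []
        else:
            cell.append({
                "b": base64.b64encode(item).decode("ascii"),
                "s": item.decode("utf-8", errors="replace"),
                "ii": ii,
                "i": len(cell),
            })
    if cell:
        tape.append({"cell": cell})
    return tape


tape = to_tape([
    0, 106,
    b"19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut", b"hello",
    b"|",
    b"1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5", b"SET", b"name", b"HAL",
])
```

In this example the tape has three cells, and the second protocol prefix has a local index `i` of 0 while its global index `ii` is 5.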

6. BOB Planaria

Let’s walk through a couple of query examples to understand what this new schema enables.

1. Filter Protocol Prefix ANYWHERE in the Pipeline

Let’s say we wanted to query: “Find all BitCom transactions with the protocol prefix 1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5”

As we saw above, with TXO this becomes extremely complicated: we had to resort to $or queries with hardcoded positions, which does not scale. However, with BOB this query becomes extremely simple.

All you need to do is query the tape.cell array for the existence of the value. You don’t even need to specify the position.

Here’s the query:

{
"v": 3,
"q": {
"find": {
"out.tape.cell.s": "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"
},
"limit": 2
}
}

Ask BOB:

https://bob.planaria.network/query/1GgmC7Cg782YtQ6R9QkM58voyWeQJmJJzG/ewogICJ2IjogMywKICAicSI6IHsKICAgICJmaW5kIjogewogICAgICAib3V0LnRhcGUuY2VsbC5zIjogIjFQdVFhN0s2Mk1pS0N0c3NTTEt5MWtoNTZXV1U3TXRVUjUiCiAgICB9LAogICAgImxpbWl0IjogMgogIH0KfQ==

This single query can find 1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5 regardless of where it shows up in the entire pipeline. Notice how the query doesn’t even include a position. It’s simply asking whether the value exists, anywhere.
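No position is needed because, in MongoDB-style queries (which the `find` clause resembles), a dotted path such as `out.tape.cell.s` is checked against every element of every array along the path. A rough in-memory equivalent in Python, assuming a document shaped like the BOB structure shown earlier (the function name `has_prefix` and the sample document are hypothetical):

```python
def has_prefix(tx, prefix):
    """True if any push data item in any cell of any output equals `prefix`."""
    return any(
        item.get("s") == prefix
        for output in tx.get("out", [])
        for tape in output.get("tape", [])
        for item in tape.get("cell", [])
    )


MAP = "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"

sample_tx = {
    "out": [{
        "tape": [
            {"cell": [{"op": 106, "ops": "OP_RETURN", "ii": 0, "i": 0}]},
            {"cell": [{"s": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut", "ii": 1, "i": 0}]},
            {"cell": [{"s": MAP, "ii": 3, "i": 0}, {"s": "SET", "ii": 4, "i": 1}]},
        ]
    }]
}

found = has_prefix(sample_tx, MAP)
```

Opcode items have no `s` attribute, so they are skipped naturally by `item.get("s")`.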

2. Local Scope Positional Query

Because we have access to the positional index of each push data WITHIN its local execution scope, we no longer need to worry about global position index. All we care about is the local scope (cell) index.

If we want to search for the occurrence of “1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5” as the first item of its own cell, we simply need to filter for cells that simultaneously match i of 0 and s of 1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5, using $elemMatch.

{
"v": 3,
"q": {
"find": {
"out.tape.cell": {
"$elemMatch": {
"i": 0,
"s": "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"
}
}
},
"project": {
"out.tape.cell.s": 1
},
"limit": 5
},
"r": {
"f": "[ .[] | .out[0].tape[] | { cell1: .cell } ]"
}
}

Ask BOB:

https://bob.planaria.network/query/1GgmC7Cg782YtQ6R9QkM58voyWeQJmJJzG/ewogICJ2IjogMywKICAicSI6IHsKICAgICJmaW5kIjogewogICAgICAib3V0LnRhcGUuY2VsbCI6IHsKICAgICAgICAiJGVsZW1NYXRjaCI6IHsKICAgICAgICAgICJpIjogMCwKICAgICAgICAgICJzIjogIjFQdVFhN0s2Mk1pS0N0c3NTTEt5MWtoNTZXV1U3TXRVUjUiCiAgICAgICAgfQogICAgICB9CiAgICB9LAogICAgInByb2plY3QiOiB7CiAgICAgICJvdXQudGFwZS5jZWxsLnMiOiAxCiAgICB9LAogICAgImxpbWl0IjogNQogIH0sCiAgInIiOiB7CiAgICAiZiI6ICJbIC5bXSB8IC5vdXRbMF0udGFwZVtdIHwgIHsgY2VsbDE6IC5jZWxsIH0gXSIKICB9Cn0=
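The reason for `$elemMatch` here is that both conditions must hold on the same cell item. With two separate dotted conditions, MongoDB-style matching lets each condition be satisfied by a different item. The Python sketch below contrasts the two readings on an in-memory document; all function names and the sample data are made up for this illustration.

```python
def cell_items(tx):
    """Yield every item from every cell of every output tape."""
    for output in tx.get("out", []):
        for tape in output.get("tape", []):
            yield from tape.get("cell", [])


def match_same_item(tx, prefix):
    """$elemMatch reading: one item has both i == 0 and s == prefix."""
    return any(
        item.get("i") == 0 and item.get("s") == prefix
        for item in cell_items(tx)
    )


def match_any_items(tx, prefix):
    """Separate conditions: each one may be met by a different item."""
    items = list(cell_items(tx))
    return (any(item.get("i") == 0 for item in items)
            and any(item.get("s") == prefix for item in items))


MAP = "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"

# Here the address string is only an argument (i == 1), not a procedure name
tx_as_argument = {"out": [{"tape": [{"cell": [
    {"s": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut", "ii": 1, "i": 0},
    {"s": MAP, "ii": 2, "i": 1},
]}]}]}

strict = match_same_item(tx_as_argument, MAP)
loose = match_any_items(tx_as_argument, MAP)
```

`strict` is False while `loose` is True, so only the same-item reading filters for cells that actually start with the prefix.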

3. OP_RETURN vs. OP_FALSE OP_RETURN Becomes a Non-problem.

This local scoping approach gives us a cool side effect. If you use BOB, the OP_RETURN vs. OP_FALSE OP_RETURN problem becomes a non-problem.

In case you weren’t aware of this, one thing all OP_RETURN protocol developers need to keep in mind is that in 2020 the OP_RETURN goes from a meaningless opcode used mainly for attaching arbitrary data, to an actual programming language construct.

It goes back to Bitcoin’s original design, where it is now an actual “return” statement. This will open doors to many interesting features, because now functions can have return values.

But this also means there may be vulnerabilities if you don’t fully understand the implication. This is why it is recommended that developers start using OP_FALSE OP_RETURN instead of just OP_RETURN. You can read more about this here:

Anyway, if we wanted to support both the legacy OP_RETURN and the new OP_FALSE OP_RETURN options using the existing Planaria nodes such as Genesis and Babel, we would have to write an $or query that takes into account both cases (because now OP_RETURN can appear as the first opcode but also the second after OP_FALSE).

But because BOB chunks the tape out into multiple cells and keeps a local scope positional index, you no longer need to worry about this: your queries will be against local scope instead of global scope, and you can even query without worrying about positional index at all.

To clarify, the following query will return exactly the same result no matter whether your output starts with OP_RETURN or OP_FALSE OP_RETURN because in both cases they will belong to their own cell and don’t affect the local position of “1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5” within its own cell scope.

{
"v": 3,
"q": {
"find": {
"out.tape.cell": {
"$elemMatch": {
"i": 0,
"s": "1PuQa7K62MiKCtssSLKy1kh56WWU7MtUR5"
}
}
}
}
}

4. Global positional index query

While the local positional index query is very useful, this doesn’t mean the global positional index is gone. You can still query by global index.

If you used to use the following query (look for index 1):

{
"v": 3,
"q": {
"find": {
"out.s1": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"
},
"limit": 10
}
}

Now you can do the following, because each push data in a cell has an additional attribute named ii which keeps the global positional index:

{
"v": 3,
"q": {
"find": {
"out.tape.cell":{
"$elemMatch": {
"ii": 1,
"s": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut"
}
}
},
"limit": 10
}
}
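If you have many existing TXO queries, the translation above is mechanical. Here is a hedged Python sketch of it. The helper `txo_find_to_bob` is hypothetical and only handles simple keys of the form `out.s1` or `in.b0`, nothing more elaborate.

```python
import re


def txo_find_to_bob(key, value):
    """Rewrite a TXO positional condition as a BOB $elemMatch on the global index."""
    match = re.fullmatch(r"(in|out)\.([bs])(\d+)", key)
    if match is None:
        raise ValueError(f"unsupported TXO key: {key}")
    side, encoding, position = match.groups()
    return {
        f"{side}.tape.cell": {
            "$elemMatch": {"ii": int(position), encoding: value}
        }
    }


bob_find = txo_find_to_bob("out.s1", "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut")
```

The resulting `bob_find` has the same shape as the `find` clause shown above.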

5. A dedicated human-readable OPCODE attribute

In 2020, all script limits will be gone and we will be able to finally use Bitcoin script as an actual programming language.

I am sure there will be a lot of need to query and filter these NON-OP_RETURN opcodes as well. Until now, opcodes were stored under b attributes. This was not the cleanest design decision, because it meant b attributes were used both for base64 encoded push data and for opcodes. For example an OP_RETURN opcode may have looked like this:

{
"b0": {
"op": 106
},
..
}

but at the same time, it was being used to store base64, like this:

{
"b0": {
"op": 106
},
"b1": "MTlIeGlnVjRReUJ2M3RIcFFWY1VFUXlxMXB6WlZkb0F1dA",
"s1": "19HxigV4QyBv3tHpQVcUEQyq1pzZVdoAut",
..
}

This is confusing, so with this new schema, there are two additional attributes: op (for opcode number) and ops (for opcode string). Here’s an example:

[
{
"cell": [
{
"op": 0,
"ops": "OP_0",
"ii": 0,
"i": 0
},
{
"op": 106,
"ops": "OP_RETURN",
"ii": 1,
"i": 1
}
]
},
..
]

This makes it much more human readable, and the rule is also much more consistent. Now the attribute naming rule is straightforward:

  • b: base64 encoded push data
  • s: UTF8 encoded push data
  • op: opcode number
  • ops: opcode string

Today we don’t have access to many different types of Opcodes, but it’s exciting to just imagine being able to make queries like this in the near future:

{
"v": 3,
"q": {
"find": {
"out.tape.cell.ops": "OP_CODESEPARATOR"
}
}
}

7. Conclusion

Because BOB is a serialization format, and a very convenient one for dealing with OP_RETURN scripts, you will probably start to see it show up in other future Planaria endpoints as well as related systems such as Bitbus.

This is the first version of BOB and it will keep evolving based on usage patterns and feedback, so please don’t hesitate to send your questions and feedback.

Try Bob:

Ask questions:

1. Join #planaria channel in Atlantis.

2. Find me on Twitter: https://twitter.com/_unwriter
