A Tutorial Introduction to Michelson & Tezos, Part I: Hello Michelson
Michelson is the smart contract language of the Tezos blockchain. Roughly speaking, Michelson is to Tezos as the Ethereum Virtual Machine is to Ethereum. Both languages are stack-based, which means that computation is performed by mutating a sequence of data elements (the stack) according to some sequence of instructions (the program).
The most notable difference between Michelson and EVM is that Michelson is written in a human-legible text format, whereas EVM operations are represented as bytes. For example, if you look up the opcode table for the EVM you’ll see that the opcode 01
takes two numbers (off the top of the stack) and adds them together. The equivalent operation in Michelson is written as ADD
.
(To be fair, the addition opcode 01
in the EVM has ADD
as an intermediate mnemonic representation)
In the above image, we have a stack that we can write out as
20 : 7 : 13 : 45 : []
where :
is our element separator and []
indicates the bottom of the stack.
In the illustration we apply the operation ADD
to the stack, which has the following definition:
ADD / a : b : S => (a + b) : S
In plain English, this definition says “The ADD
operation removes the top two elements of the stack (a
and b
in the definition) and then puts the element (a + b)
back onto the top of the stack:
ADD / 20 : 7 : 13 : 45 : [] =>
(20 + 7) : 13 : 45 : [] =>
27 : 13 : 45 : []
All computation with Michelson works similarly based on this process of stack mutation. We used ADD
in the above example, but we could also mutate the stack with other arithmetic operations like subtraction or multiplication, or logical operations like NOT
, AND
, OR
. We can directly manipulate the stack by explicitly pushing data onto it, or by swapping around or duplicating elements. We have control flow structures like LOOP
or IF
. We can perform some cryptographic operations like hashing or checking signatures, and we can interact with the blockchain by initiating token transfers or creating accounts. Michelson has lot of different operations.
The second major difference between Michelson and EVM, is that Michelson data elements are typed.
Broadly speaking, a type is a piece of information that constrains the possible things that can be done with a given data value.
If the value 1
has the type int
(for integer), then we know that we can safely perform numerical addition on it, but that we can't safely perform list indexing.
For the value "foobar"
with type string
, the situation is reversed. Adding a number to a list 1 + "foobar"
is not well-defined, because addition is an operation on integers (in most languages, some languages overload the +
operator to mean numerical addition when its arguments are numbers and concatenation when its arguments are strings)
Types are useful because they allow the Michelson interpreter to exclude programs that may have problematic behaviour. For example, with the natural number type, nat
, attempting to subtract a larger nat
from a smaller nat
like 4 - 5
will result in an error. The programmer then can determine during testing whether this error is the result of an undesired operation, incorrect arguments, or whether the type signature of the values ought to be changed. The crucial thing is that this error occurred early on and prevented the program from running at all, rather than for the possible bug to slip past testing unnoticed only to cause issues later on in production.
Generally speaking, types allow the programmer to communicate their intentions in more detail to the machine, and allows the machine to communicate to the programmer when execution deviates from those intentions.
Okay, now that we’ve covered a little bit of the theory of how Michelson works, let’s get our hands dirty with some code.
The easiest way to play with Michelson is to install the Tezos client from using docker. In your terminal type the following commands:
$ wget https://gitlab.com/tezos/tezos/raw/alphanet/scripts/alphanet.sh$ chmod +x ./alphanet.sh$ ./alphanet.sh start
This should spit out a bunch of output that you can mostly ignore for now. The important thing is that packaged within the Tezos client is a Michelson runtime we can use to test our programs.
Alternatively, you can build Tezos from source by following the instructions in the Tezos Docs
Open up your favorite editor and write the following program helloTezos.tz
in the same directory you put the alphanet.sh
script.
# helloTezos.tz
parameter unit;
storage string;
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};
First we’re going to check that the script is well-typed:
$ ./alphanet.sh client typecheck script container:helloTezos.tz
We can see more information emitted by the typechecker by adding the --details
flag:
$ ./alphanet.sh client typecheck script container:helloTezos.tz --details
This program should typecheck, but if it doesn’t, possible reasons are if the file was copied incorrectly, or if there have been breaking changes to the Michelson semantics since this document was published.
Now that we know the program typechecks, we’re going to run it. The command for telling the tezos client to run a Michelson program (in a sandbox) is:
$ ./alphanet.sh client run script <path> on storage <data> and input <data>
where <path>
is the path to the program source (since we're using docker this will be prepended with container:
), and <data>
is some Michelson value.
We’ll go over what storage
and input
mean below. For now, try running:
$ ./alphanet.sh client run script container:helloTezos.tz on storage '""' and input Unit
This should return:
storage "Hello Tezos!" emitted operations
Congratulations, you’ve just run your first smart contract in Michelson!
Now let’s get into the details of how the contract works:
All Michelson smart contracts are functions that take two arguments, an input parameter and a storage value, and return a pair of a list of operations and a storage value. The storage value is effectively a return value, and the list of operations are like a continuation, if, for example, the contract was called by another contract.
There are many different ways to notate types signatures but here’s what the Michelson type signature of a contract looks like:
lambda (pair 'parameter 'storage) (pair (list operation) 'storage)
Personally, I prefer type notation more like Haskell’s, but both signatures are equivalent:
contract :: (Parameter p, Storage s) -> ([Operation], Storage s)
Let’s take another look at helloTezos.tz
:
# helloTezos.tz
parameter unit;
storage string;
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};
The parameter unit
and storage string
lines specify the types of the contract's two arguments. If we concretize the above signature for the general type of Michelson contracts, with 'parameter
as unit
and 'storage
as string
, we get the type of our specific helloTezos.tz
contract:
lambda (pair unit string) (pair (list operation) string)
The initial stack of a Michelson contract is its argument pair (pair 'parameter 'storage)
so helloTezos.tz
starts with a stack of type:
:: (pair unit string) : []
At the command line we ran
$ ./alphanet.sh client run script container:helloTezos.tz on storage '""' and input Unit
'""'
is the command-line syntax for ""
, the empty string and Unit
is the data constructor of the single inhabitant of the unit
type. Be advised that input and parameter are synonymous here. So our initial stack has the concrete value:
(Pair Unit "") : []
which has type
:: (pair unit string) : []
The distinction between Pair
and pair
is that Pair
is a data constructor and pair
is a type. Data constructors in Michelson begin with an initial capital, whereas types are all lowercase.
Now we have our initial value, and our return type, we can think about the the execution of our contract as a sequence of operations that transform our initial value into a return value that matches our return type.
In other words, we started with
(Pair Unit "") : [] :: (pair unit string) : []
and we want to end up with
??? :: (pair (list operation) string) : []
Fortunately, our contract is pretty short at only 4 operations, so we can walk through the steps of this transformation here. The operations are written after the code
heading in helloTezos.tz
:
# helloTezos.tz
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};
If we write down this operation sequence and the initial values and types from the previous section, we get the full state of our Michelson stack machine:
STATE
code DROP; PUSH string "Hello Tezos!"; NIL operation; PAIR;
stack (Pair Unit "") : []
type :: (pair unit string) : []
The DROP
operation removes (or "drops") the top element of the stack and has the following definition (slightly rearranged from the Michelson spec)
code DROP
stack _ : S => S
type :: _ : 'A -> 'A
where _
is a wildcard matching any operations, values, or types
Applying this to our state, our new state is:
STATE
code PUSH string "Hello Tezos!"; NIL operation; PAIR;
stack []
type :: []
The PUSH
operation adds a value with a certain type onto the top of the stack and has the following definition:
code PUSH 'a (x :: 'a)
stack S => x : S
type 'A -> 'a : 'A
Our concrete instruction is PUSH string "Hello Tezos!"
, so the transformation is concretized as:
code PUSH string ("Hello Tezos!" :: string)
stack S => "Hello Tezos!" : S
type 'A -> string : 'A
which when applied gives us a new state:
STATE
code NIL operation; PAIR;
stack "Hello Tezos!" : []
type :: string : []
The NIL
operation adds an empty list of a certain type onto the top of the stack and has the following definition:
code NIL 'a
stack S => {} : S
type 'A -> list 'a : 'A
which when applied gives us a new state:
STATE
code PAIR;
stack {} : "Hello Tezos!" : []
type :: list operation : string : []
The PAIR
operation removes the top two elements of the stack, makes a pair of them, and pushes the pair onto the stack. It has the following definition:
code PAIR
stack a : b : S => (Pair a b) : S
type 'a : 'b : 'A -> pair 'a 'b : 'A
which when applied gives us a new state:
STATE
code
stack (Pair {} "Hello Tezos!") : []
type :: pair (list operation) string : []
We are now out of operations, and our stack has type pair (list operation) string : []
which is exactly what we wanted. Since the type matches our expected return type, the contract returns the values in our (pair 'operations 'storage)
:
storage
"Hello Tezos!"
emitted operations
This concludes Part I of our Michelson tutorial. You should now know
- the basics of how a stack machine works
- how to install the Tezos client
- how to write and execute a simple michelson contract
- what Michelson types, values and operations are and some simple examples of each one
In Part II, we will write more contracts and go through their execution (although with less rigor), introducing more operations, types and data constructors.
Exercise 1: Modify our helloTezos.tz
file to output "Hello World!"
instead of "Hello Tezos
. Call this file helloWorld.tz
Exercise 2: Now modify our helloTezos.tz
file to take a string argument as input and output "Hello <input>".
Call this file helloInput.tz
You will need to know two additional operations to do this:
code CAR # select left-hand of pair
stack (Pair a _) : S => a : S
type pair 'a 'b : 'A -> 'a : 'Acode CONCAT # string concatenate
stack a : b => a ^ b : S
type string : string : 'A -> string: 'A
where a ^ b concatenates the end of a to the beginning of b
Exercise 3: Do the same thing as exercise 2, except now the contract takes a unit as its input parameter, and outputs "Hello "
concatenated to its initial storage. So running$./alphanet.sh client run script container:hellotezos.tz on storage '"bar"' and input 'Unit'
should output “Hello bar”. Call this file helloStorage.tz
You will need to know one additional operation:
code CDR # select right-hand of pair
stack (Pair _ b) : S => b : S
type pair 'a 'b : 'A -> 'b : 'A
Exercise 4: Write a contract that concatenates an input string and a storage string and ouputs "Hello <input><storage>"
. call this filehelloInputAndStorage.tz
You will need to know:
code DUP # Duplicate the top of the stack
stack x : S => x : x : S
type :: 'a : 'A -> 'a : 'a : 'Acode DIP ops # Runs code protecting the top of the stack
stack x : S => x : S'
where ops / S => S'
type :: 'b : 'A -> 'b : 'C
iff ops :: [ 'A -> 'C ]
You could also use SWAP
instead of DIP
SWAP # Exchange the top two elements of the stack
stack x : y : S => y : x : S
type :: 'a : 'b : 'A -> 'b : 'a : 'A