A Tutorial Introduction to Michelson & Tezos, Part I: Hello Michelson

Michelson is the smart contract language of the Tezos blockchain. Roughly speaking, Michelson is to Tezos as the Ethereum Virtual Machine is to Ethereum. Both languages are stack-based, which means that computation is performed by mutating a sequence of data elements (the stack) according to some sequence of instructions (the program).

The most notable difference between Michelson and EVM is that Michelson is written in a human-legible text format, whereas EVM operations are represented as bytes. For example, if you look up the opcode table for the EVM you’ll see that the opcode 01 takes two numbers (off the top of the stack) and adds them together. The equivalent operation in Michelson is written as ADD.

(To be fair, the addition opcode 01 in the EVM has ADD as an intermediate mnemonic representation)

In the above image, we have a stack that we can write out as

20 : 7 : 13 : 45 : []

where : is our element separator and [] indicates the bottom of the stack.

In the illustration we apply the operation ADD to the stack, which has the following definition:

ADD / a : b : S => (a + b) : S

In plain English, this definition says “The ADD operation removes the top two elements of the stack (a and b in the definition) and then puts the element (a + b) back onto the top of the stack:

ADD / 20 : 7 : 13 : 45 : [] => 
(20 + 7) : 13 : 45 : [] =>
27 : 13 : 45 : []

All computation with Michelson works similarly based on this process of stack mutation. We used ADD in the above example, but we could also mutate the stack with other arithmetic operations like subtraction or multiplication, or logical operations like NOT, AND, OR. We can directly manipulate the stack by explicitly pushing data onto it, or by swapping around or duplicating elements. We have control flow structures like LOOP or IF. We can perform some cryptographic operations like hashing or checking signatures, and we can interact with the blockchain by initiating token transfers or creating accounts. Michelson has lot of different operations.

The second major difference between Michelson and EVM, is that Michelson data elements are typed.

Broadly speaking, a type is a piece of information that constrains the possible things that can be done with a given data value.

If the value 1 has the type int (for integer), then we know that we can safely perform numerical addition on it, but that we can't safely perform list indexing.

For the value "foobar" with type string, the situation is reversed. Adding a number to a list 1 + "foobar" is not well-defined, because addition is an operation on integers (in most languages, some languages overload the + operator to mean numerical addition when its arguments are numbers and concatenation when its arguments are strings)

Types are useful because they allow the Michelson interpreter to exclude programs that may have problematic behaviour. For example, with the natural number type, nat, attempting to subtract a larger nat from a smaller nat like 4 - 5 will result in an error. The programmer then can determine during testing whether this error is the result of an undesired operation, incorrect arguments, or whether the type signature of the values ought to be changed. The crucial thing is that this error occurred early on and prevented the program from running at all, rather than for the possible bug to slip past testing unnoticed only to cause issues later on in production.

Generally speaking, types allow the programmer to communicate their intentions in more detail to the machine, and allows the machine to communicate to the programmer when execution deviates from those intentions.

Okay, now that we’ve covered a little bit of the theory of how Michelson works, let’s get our hands dirty with some code.

The easiest way to play with Michelson is to install the Tezos client from using docker. In your terminal type the following commands:

$ wget https://gitlab.com/tezos/tezos/raw/alphanet/scripts/alphanet.sh
$ chmod +x ./alphanet.sh
$ ./alphanet.sh start

This should spit out a bunch of output that you can mostly ignore for now. The important thing is that packaged within the Tezos client is a Michelson runtime we can use to test our programs.

Alternatively, you can build Tezos from source by following the instructions in the Tezos Docs

Open up your favorite editor and write the following program helloTezos.tz in the same directory you put the alphanet.sh script.

# helloTezos.tz
parameter unit;
storage string;
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};

First we’re going to check that the script is well-typed:

$ ./alphanet.sh client typecheck script container:helloTezos.tz

We can see more information emitted by the typechecker by adding the --details flag:

$ ./alphanet.sh client typecheck script container:helloTezos.tz --details

This program should typecheck, but if it doesn’t, possible reasons are if the file was copied incorrectly, or if there have been breaking changes to the Michelson semantics since this document was published.

Now that we know the program typechecks, we’re going to run it. The command for telling the tezos client to run a Michelson program (in a sandbox) is:

$ ./alphanet.sh client run script <path> on storage <data> and input <data>

where <path> is the path to the program source (since we're using docker this will be prepended with container:), and <data> is some Michelson value.

We’ll go over what storage and input mean below. For now, try running:

$ ./alphanet.sh client run script container:helloTezos.tz on storage '""' and input Unit

This should return:

storage "Hello Tezos!" emitted operations

Congratulations, you’ve just run your first smart contract in Michelson!

Now let’s get into the details of how the contract works:

All Michelson smart contracts are functions that take two arguments, an input parameter and a storage value, and return a pair of a list of operations and a storage value. The storage value is effectively a return value, and the list of operations are like a continuation, if, for example, the contract was called by another contract.

There are many different ways to notate types signatures but here’s what the Michelson type signature of a contract looks like:

lambda (pair 'parameter 'storage) (pair (list operation) 'storage)

Personally, I prefer type notation more like Haskell’s, but both signatures are equivalent:

contract :: (Parameter p, Storage s) -> ([Operation], Storage s)

Let’s take another look at helloTezos.tz:

# helloTezos.tz
parameter unit;
storage string;
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};

The parameter unit and storage string lines specify the types of the contract's two arguments. If we concretize the above signature for the general type of Michelson contracts, with 'parameter as unit and 'storage as string, we get the type of our specific helloTezos.tz contract:

lambda (pair unit string) (pair (list operation) string)

The initial stack of a Michelson contract is its argument pair (pair 'parameter 'storage) so helloTezos.tz starts with a stack of type:

:: (pair unit string) : []

At the command line we ran

$ ./alphanet.sh client run script container:helloTezos.tz on storage '""' and input Unit

'""' is the command-line syntax for "", the empty string and Unit is the data constructor of the single inhabitant of the unit type. Be advised that input and parameter are synonymous here. So our initial stack has the concrete value:

(Pair Unit "") : []

which has type

:: (pair unit string) : []

The distinction between Pair and pair is that Pair is a data constructor and pair is a type. Data constructors in Michelson begin with an initial capital, whereas types are all lowercase.

Now we have our initial value, and our return type, we can think about the the execution of our contract as a sequence of operations that transform our initial value into a return value that matches our return type.

In other words, we started with

(Pair Unit "") : [] :: (pair unit string) : []

and we want to end up with

??? :: (pair (list operation) string) : []

Fortunately, our contract is pretty short at only 4 operations, so we can walk through the steps of this transformation here. The operations are written after the code heading in helloTezos.tz:

# helloTezos.tz 
code {DROP;
PUSH string "Hello Tezos!";
NIL operation; PAIR;};

If we write down this operation sequence and the initial values and types from the previous section, we get the full state of our Michelson stack machine:

STATE 
code DROP; PUSH string "Hello Tezos!"; NIL operation; PAIR;
stack (Pair Unit "") : []
type :: (pair unit string) : []

The DROP operation removes (or "drops") the top element of the stack and has the following definition (slightly rearranged from the Michelson spec)

code DROP 
stack _ : S => S
type :: _ : 'A -> 'A

where _ is a wildcard matching any operations, values, or types

Applying this to our state, our new state is:

STATE 
code PUSH string "Hello Tezos!"; NIL operation; PAIR;
stack []
type :: []

The PUSH operation adds a value with a certain type onto the top of the stack and has the following definition:

code PUSH 'a (x :: 'a) 
stack S => x : S
type 'A -> 'a : 'A

Our concrete instruction is PUSH string "Hello Tezos!", so the transformation is concretized as:

code PUSH string ("Hello Tezos!" :: string) 
stack S => "Hello Tezos!" : S
type 'A -> string : 'A

which when applied gives us a new state:

STATE 
code NIL operation; PAIR;
stack "Hello Tezos!" : []
type :: string : []

The NIL operation adds an empty list of a certain type onto the top of the stack and has the following definition:

code NIL 'a 
stack S => {} : S
type 'A -> list 'a : 'A

which when applied gives us a new state:

STATE 
code PAIR;
stack {} : "Hello Tezos!" : []
type :: list operation : string : []

The PAIR operation removes the top two elements of the stack, makes a pair of them, and pushes the pair onto the stack. It has the following definition:

code PAIR 
stack a : b : S => (Pair a b) : S
type 'a : 'b : 'A -> pair 'a 'b : 'A

which when applied gives us a new state:

STATE 
code
stack (Pair {} "Hello Tezos!") : []
type :: pair (list operation) string : []

We are now out of operations, and our stack has type pair (list operation) string : [] which is exactly what we wanted. Since the type matches our expected return type, the contract returns the values in our (pair 'operations 'storage):

storage 
"Hello Tezos!"
emitted operations

This concludes Part I of our Michelson tutorial. You should now know

  • the basics of how a stack machine works
  • how to install the Tezos client
  • how to write and execute a simple michelson contract
  • what Michelson types, values and operations are and some simple examples of each one

In Part II, we will write more contracts and go through their execution (although with less rigor), introducing more operations, types and data constructors.

Exercise 1: Modify our helloTezos.tz file to output "Hello World!" instead of "Hello Tezos. Call this file helloWorld.tz

Exercise 2: Now modify our helloTezos.tz file to take a string argument as input and output "Hello <input>". Call this file helloInput.tz You will need to know two additional operations to do this:

code CAR # select left-hand of pair 
stack (Pair a _) : S => a : S
type pair 'a 'b : 'A -> 'a : 'A
code CONCAT # string concatenate 
stack a : b => a ^ b : S
type string : string : 'A -> string: 'A
where a ^ b concatenates the end of a to the beginning of b

Exercise 3: Do the same thing as exercise 2, except now the contract takes a unit as its input parameter, and outputs "Hello " concatenated to its initial storage. So running$./alphanet.sh client run script container:hellotezos.tz on storage '"bar"' and input 'Unit'

should output “Hello bar”. Call this file helloStorage.tz

You will need to know one additional operation:

code CDR # select right-hand of pair 
stack (Pair _ b) : S => b : S
type pair 'a 'b : 'A -> 'b : 'A

Exercise 4: Write a contract that concatenates an input string and a storage string and ouputs "Hello <input><storage>". call this filehelloInputAndStorage.tz

You will need to know:

code DUP # Duplicate the top of the stack
stack x : S => x : x : S
type :: 'a : 'A -> 'a : 'a : 'A
code DIP ops # Runs code protecting the top of the stack
stack x : S => x : S'
where ops / S => S'
type :: 'b : 'A -> 'b : 'C
iff ops :: [ 'A -> 'C ]

You could also use SWAP instead of DIP

SWAP # Exchange the top two elements of the stack
stack x : y : S => y : x : S
type :: 'a : 'b : 'A -> 'b : 'a : 'A