Executing a Ruby program? You need to know this.

PART 1

Arun kumar
Prod.IO
4 min read · Apr 25, 2017


How many times do you think Ruby reads and transforms your code before running it? Once? Twice? Thrice!

Whenever you run a Ruby script what happens beneath is truly amazing. Ruby rips your code apart into small pieces and then puts it back together in a different format… three times! Between the time you type “ruby” and start to see actual output on the console, your Ruby code has a long road to take, a journey involving a variety of different technologies, techniques and open source tools.

How can a computer language be smart enough to understand the code you give it? What does this intelligence really consist of?

Let’s look at the journey your code takes…

In this post we’ll follow the journey of a simple Ruby program as it is lexed, parsed and compiled into bytecode. We’ll use the tools that Ruby gives us to spy on the interpreter every step of the way.

Let’s take the following Ruby code and rip it apart.

10.times do |n| puts n end

Tokenizing

Before the Ruby interpreter can run a program, it converts it from a somewhat free-form programming language into more structured data.

First, the interpreter breaks your program into small chunks, or tokens.

The Ruby standard library provides a module called Ripper that lets us process Ruby code the same way the interpreter does.

require 'ripper'
Ripper.tokenize("10.times do |n| puts n end")
# => ["10", ".", "times", " ", "do", " ", "|", "n", "|", " ", "puts", " ", "n", " ", "end"]

The tokenizer is completely dumb: even if you give it invalid Ruby code, it will still happily tokenize it.

require 'ripper'
Ripper.tokenize("10.times do {n} puts n end")
# => ["10", ".", "times", " ", "do", " ", "{", "n", "}", " ", "puts", " ", "n", " ", "end"]

Lexing

A lexer sorts the pieces of your code into categories like keywords, constants and variables, much like identifying the parts of speech in a sentence, and produces tokens.

# => ["10", ".", "times", " ", "do", " ", "|", "n", "|", " ", "puts", " ", "n", " ", "end"]

When Ruby sees these characters, it tokenizes them. That is, it converts them into a series of tokens or words that it understands by stepping through the characters one at a time.

Let’s use Ripper to lex the (still invalid) script from above.

require 'ripper'
require 'pp'
pp Ripper.lex("10.times do {n} puts n end")
# =>
[[[1, 0], :on_int, "10"],
[[1, 2], :on_period, "."],
[[1, 3], :on_ident, "times"],
[[1, 8], :on_sp, " "],
[[1, 9], :on_kw, "do"],
[[1, 11], :on_sp, " "],
[[1, 12], :on_lbrace, "{"],
[[1, 13], :on_ident, "n"],
[[1, 14], :on_rbrace, "}"],
[[1, 15], :on_sp, " "],
[[1, 16], :on_ident, "puts"],
[[1, 20], :on_sp, " "],
[[1, 21], :on_ident, "n"],
[[1, 22], :on_sp, " "],
[[1, 23], :on_kw, "end"]]

In the example above we are using Ripper to lex our program. As you can see, it’s now tagging each token as being an identifier :on_ident, a keyword :on_kw, an integer :on_int, etc.

There is still no real syntax checking going on at this point. The lexer will happily process invalid code.
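For contrast, we can lex the valid version of our snippet. The pipes around the block parameter now come through as :on_op tokens instead of :on_lbrace and :on_rbrace (a sketch; the token type symbols are stable across recent Ruby versions, though column positions in the output may vary):

```ruby
require 'ripper'
require 'pp'

# Lex the valid version of the snippet; |n| produces :on_op tokens
# for each pipe, rather than brace tokens.
pp Ripper.lex("10.times do |n| puts n end")
```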

Parsing

Once your code is converted into a series of tokens, what does Ruby do? Can it understand and run the code by simply stepping through the tokens and executing them one by one? No!

Ruby still has a long way to go. The next step is parsing, where tokens are grouped into sentences or phrases that make sense to Ruby.

But how does Ruby understand what you are telling it with your code? Ruby uses a parser generator, which takes as input a series of grammar rules that describe the expected order and patterns in which the tokens will appear. The best-known parser generator is Yacc (Yet Another Compiler Compiler), but Ruby uses a newer version of Yacc called Bison.

The grammar rule file for Ruby is parse.y. This file defines the actual syntax and grammar that you have to use while writing Ruby programs. In that sense parse.y is the heart and soul of Ruby, where the language itself is defined.

Now we have the parse.y file, which holds the grammar rules, but how does Ruby analyse and process the incoming tokens?

Next, it uses an LALR parser to convert the input stream of tokens into a data structure called an abstract syntax tree (AST).
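Ripper lets us spy on this stage too. Ripper.sexp returns the syntax tree as nested arrays, and returns nil when the code cannot be parsed (a sketch; the exact node names in the tree vary slightly across Ruby versions):

```ruby
require 'ripper'
require 'pp'

# Parse the valid snippet into an S-expression view of the AST.
pp Ripper.sexp("10.times do |n| puts n end")

# The invalid snippet tokenizes and lexes fine, but it does not
# parse: Ripper.sexp gives us back nil.
p Ripper.sexp("10.times do {n} puts n end")
```

Unlike the tokenizer and the lexer, the parser finally rejects our broken program: this is the first stage where real syntax checking happens.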

To be continued…

Reference:

Ruby Under a Microscope, by Pat Shaughnessy
