I had put off proper register allocation until now, and somehow the 100 commits it took verifies that i wasn’t completely wrong. I think there are quite a few differences to the normal problem and solution, to warrant an explanation, so here it goes: How RubyX does Static Single Assignment and Register Allocation in a ruby compiler.
Simple register allocation, the old way
To understand the problem, i’ll explain the previous, much simpler, approach and it’s problems. Then i’ll explain the new way, with reference to what i read the “normal” way is and the differences i think we have.
Simple register allocation is quite simply hard-coding at least some register names into the generation of assembler-like code, and having very simple ways to deal with the rest. In RubyX the assembler-like layer of the code is called Risc, and that is basically a simplified ARM. ARM has registers from r0 through to r15.
Early on i made two decisions, the current Message object would live in r0. This was one constant hard-coded throughout. The other decision was to design Slot level instructions such, that each instruction had use of all registers. To use registers i implemented a simple stack, but that was reset after every Slot level instruction.
Motivation for change
There are two major problems with the simple solution outlined above. The first is that it is sub-optimal now, the second that it is restrictive in the future.
The current problem, is actually many-fold. There is the obvious difficulty of keeping track of registers as they are used and returned. While i thought that the by hand approach was not too bad, after finishing this work, i checked the automatic way uses only half the amount of registers. This is nevertheless peanuts compared to the second problem, which means that the code is forever locked into a subset of the SlotMachine, ie no optimisations can be done across SlotMachine Instruction borders. That is quite serious, and sort of binds in with the third problem. Namely there were some implicit register assumptions being made, but there were implicit, ie could not be checked and would only show up if broken.
But the real motivation for this work came from the future, or the lack of expandability with this approach. Basically i found from benchmarking that inlining would have to happen quite soon. Even it would only be with more Macros at first. Inlining with this super simple allocation would not only be super hard, but also much less efficient than could be. After all there are 10+ registers that one can keep things in, thus avoiding reloading, and i already noticed constant loading was a thing.
SSA and Register Allocation
So then we come to the way that it is coded now. The basic idea (followed by most compilers) is to assign new names for every new usage of a register. I’ll talk about that first. Then the second step is something called liveliness analysis; basically determining when register are not used anymore, and the third is the allocation.
Static Single Assignment
A Static Single Assignment form of the Instructions is one where every register name is assigned only once. Hence the Single Assignment. The static part means that this single assignment is only true for a static analysis, so at run time the code may assign many times. But it would be the same code doing several assignments.
SSA is often achieved by first naming registers according to the variables they hold, and to derive subsequent names by increasing an subscript index. This did not sound very fitting. For one, RubyX does not really have “floating” variables (that later may get popped on some stack), rather every variable is an instance variable. Assigning to a variable does not create a new register value, but needs to be stored in memory. In a concurrent environment it is not safe to bypass that.
New variables may be created by “traversing” into a instance of the object (if type is known off course). This lead to a dot syntax naming convention, where almost every variable starts off from message or as a constant, eg “message.return_value”. This is as “single” as we need it to be (i think), and implementing this naming scheme was about half the work (more than half of that in tests, where register names were and are checked).
The great benefit from this renaming of registers is that even the risc code is now quite readable, which is great for debugging and tests. This is because the registers now have meaningful names (instance variable names), and it is always clear what a register is used for.
The next big step was to determine liveliness of registers. This is something i have not found good literature on, and in documents about Register Allocation it is often taken as the starting point.
Basic reasoning lead me to believe that a simple backward scan is at least a safe estimate. If you imagine going trough the list of instructions and marking the first occurence of any register use. By going backwards through the list you thus get the last usage, and that is the point where we can recycle that register.
I spent some time trying to figure out if backward branches changes the fact that you can release the register, but could not come to a conclusion (brain melt every time i tried). Intuitively i think that you can, because on the first run through such a loop you could not use results from a register that, because of the ssa, would have had to be created later, but there you go. Even rereading that hurts. My final argument was that a backward jump is a while loop, and a ruby while loop would have to store its data in ruby variables and not new registers, or so i hope).
I did read about phi nodes in ssa and i did not implement that. Phi nodes are a way to ensure that different branches of an if produce the same registers, or the same registers are meaningfully filled after the merge of an if. My hope is that the ruby variable argument from above gets us out of that, and for some risc functions i added some transfers to ensure things work as they should.
The actual Register Allocation is not substantially more complicated now, than before. But there is a good base now to make more analysis and optimisations.
So basically we go through the instruction sequence and assign registers in order. But because of the liveliness analysis, we can release registers after their last use, and reuse them immediately off course. I noticed that this results in surprisingly many registers being used only for a single instruction. And the total number of registers used went down by half.
As i said that this was mostly for the future, what is the future going to hold? Well, inlining is high up, that’s for sure.
But also there is something called escape analysis. This essentially means reclaiming objects that are created in a method, but never get passed out. A sort of immediate GC, thus not only saving on GC time, but also on allocation. Mostly though it would allow to run larger benchmarks, because there is no GC, and integers get created a lot before a meaningful number of milliseconds has elapsed.
On the register front, some low hanging fruits are redundant transfer elimination and double load elimination. Since methods have not grown to exhaust registers, unloading register is undone and thus is looming. Which brings with it code cost analysis. So much more fun to be had!!
I am happy to announce that RubyX is part of Rails Girls Summer of Code and some interest is being show. Since i have enjoyed my last RGSoC summer, i am looking forward to some mentoring, and outside participation.
Originally published at http://ruby-x.org.