Scala 3: A Look at “inline” (and “Programming Scala” is Now Published!)
Update April 3, 2022: I added the code examples below to the book’s code repo and did more extensive testing. See here for details.
Update May 22, 2022: Michel Charpentier correctly pointed out that the arguments don’t need to be by-name if they are inlined. This makes perfect sense if you think about it (which I didn’t 🤓), because we are no longer calling the fail functions; they are now gone, replaced by their bodies! I’ve updated the gist and the text accordingly. Thanks, Michel!
I haven’t blogged yet about the new metaprogramming system in Scala 3, so let’s start that now. First, let’s look at the new
inline keyword, which causes the compiler to “inline” the decorated code.
Programming Scala, Third Edition is now published! It provides a comprehensive introduction to all the new features in Scala 3, while also introducing the Scala 2 features you’ll still need to know for working with an existing code base. Programming Scala, Third Edition is aimed at experienced Scala developers who want to learn what’s new, as well as professional developers getting started with Scala.
“Inlining” means that instead of generating the usual byte code for a construct, like a conditional,
val declaration, or method, the compiler inserts byte code that bypasses the overhead of the construct.
For a conditional, instead of doing the usual if (predicate) true_stuff else false_stuff, the compiler just inserts true_stuff if predicate is determined to be true at compile time, or it inserts false_stuff if predicate is determined to be false. However, inline can’t inline conditionals when the value of predicate can’t be determined at compile time. I’ll show you an example in a moment.
For a val, the actual value, which must also be known at compile time, is inserted everywhere a reference to the val is made.
For a method, the body is inlined instead of calling the method. This could add code bloat for big methods used in many places, so only inline small methods. You won’t gain much runtime performance inlining large methods anyway.
The method arguments (which can also be inlined) don’t have to be compile-time constants, but if there are type parameters (e.g.,
def foo[T](t:T): Unit), the actual types have to be known at compile time, where the method is “called”.
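The three cases above can be sketched in a few lines. This is a minimal illustration, not code from the post; the names InlineBasics, limit, verbose, clamp, and describe are made up for the example.

```scala
object InlineBasics:
  // An inline val must have a literal constant type. Every reference to
  // limit is replaced by the literal 10. (Hypothetical name.)
  inline val limit = 10

  // An inline def: the body replaces each "call". The argument i does not
  // have to be a compile-time constant.
  inline def clamp(i: Int): Int =
    if i > limit then limit else i

  // A compile-time flag for the inline conditional below.
  inline val verbose = false

  // An inline conditional: the condition must be known at compile time,
  // so only the selected branch ends up in the generated code.
  inline def describe(i: Int): String =
    inline if verbose then s"value is $i (limit $limit)"
    else i.toString
```

With verbose set to false, describe(7) reduces to 7.toString at the point of use. Changing verbose to true requires recompiling, which is the trade-off discussed later in this post.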
As an example, suppose I need an invariant checker, a tool that allows me to specify some invariant that should be true before and after some code executes. Here is a possible implementation using the Scala 3 metaprogramming tools:
Notice the many inline keywords. I start by importing scala.quoted.*, then define an object to implement the invariant checker.
First, I inline a flag, ignore, which specifies whether or not to “ignore” invariant checking. This is analogous to how some of the assert-related Scala library methods worked in Scala 2, where you could disable them by passing certain flags to scalac. (At this time, this feature hasn’t been implemented in Scala 3.)
It would be convenient for the user to make this value a var, so it can be changed dynamically at runtime, without recompilation. However, this would prevent inlining, so if you want to disable the runtime checks, you have to recompile with the value set to true.
If you’re playing along at home, try adding the type annotation
:Boolean to the declaration. You get this error:
[error] -- Error: .../InvariantEnabled.scala:5:21
[error] 6 | inline val ignore: Boolean = false
[error] | ^^^^^^^
[error] | inline value must have a literal constant type
The problem is that a
Boolean can have two values, but the compiler only accepts the literal type constant
false here, not
Boolean. So, you could use
false as the type here:
inline val ignore: false = false
Or, just leave off the type annotation. See my post, Scala 3: Dependent Types, Part I for more details on dependent types.
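Here is a small sketch of how the literal type false behaves. It is only an illustration; the names LiteralTypes, off, asBoolean, and status are hypothetical and not part of the invariant checker.

```scala
object LiteralTypes:
  // The literal type false has exactly one value: false.
  val off: false = false

  // A value of type false can be used wherever a Boolean is expected,
  // because false is a subtype of Boolean.
  val asBoolean: Boolean = off

  // With the literal type as the annotation, the inline val is accepted.
  inline val ignore: false = false

  // Because ignore is a compile-time constant, this inline conditional
  // reduces to its else branch.
  inline def status: String =
    inline if ignore then "checks ignored" else "checks enabled"
```

Annotating ignore with Boolean instead would widen the type and lose the constant, which is why the compiler rejects it.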
I won’t show it here, but I also defined a nearly-identical type,
InvariantDisabled, where the only difference is to declare
ignore to be
true. I’ll use both of these types in an example below.
Next, apply and all of its parameters are defined inline. This method, along with fail, needs to be inline for the new macro quoting and splicing to work. I’ll discuss those features in a subsequent post. For our purposes now, declaring these methods inline means that the byte code won’t contain calls to methods with these names, but instead it will contain their bodies inserted inline.
Similarly, the parameters predicate, message, and block will be inlined.
Now we come to the conditional, inline if !ignore then ... else ... If !ignore is true, meaning checking is not ignored, then the byte code for the following four lines will be inserted:
if !predicate then fail(predicate, message, block, "before")
val result = block
if !predicate then fail(predicate, message, block, "after")
result
fail is also inlined, so the actual byte code will contain the call to failImpl, which constructs and throws an InvariantFailure exception.
But what if the condition is false at compile time? Then only the byte code for block is inserted. Hence, there will be no runtime overhead for unused invariant checking! We’ll see this in action shortly.
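To make the branch elimination concrete, here is a simplified, macro-free sketch of the same idea. It is not the implementation discussed above: the names simpleInvariant and SimpleInvariantFailure are hypothetical, and it throws directly instead of going through fail and failImpl, so it can't print the source text of the predicate and block.

```scala
// Hypothetical exception type for this sketch only.
case class SimpleInvariantFailure(msg: String) extends RuntimeException(msg)

object simpleInvariant:
  // Compile-time flag. Set it to true and recompile to drop all checks.
  inline val ignore = false

  inline def apply[T](inline predicate: Boolean, inline message: String)(
      inline block: T): T =
    // !ignore is a compile-time constant, so only one branch is generated.
    inline if !ignore then
      // predicate is an inline parameter, so its expression is
      // re-evaluated at each use, before and after the block.
      if !predicate then throw SimpleInvariantFailure(s"before: $message")
      val result = block
      if !predicate then throw SimpleInvariantFailure(s"after: $message")
      result
    else
      block
```

For example, with a local var count = 0, the expression simpleInvariant(count >= 0, "count must not be negative") { count += 1; count } returns 1, while a block that sets count to a negative value triggers the "after" failure.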
I’ll discuss the rest of this example in a subsequent post that explores quoting and splicing.
Finally, here is another variant that removes most of the inlining, except where needed for quoting and splicing, but still performs invariant checking:
Note that now I need to pass the arguments predicate, message, and block as by-name parameters, so they are only evaluated inside the method bodies for apply and fail, not before calling those methods. This is not necessary when these arguments are inlined! However, even though this implementation still does invariant checking, not inlining the arguments means we won’t get the same expressions output as strings in the error messages. For example, instead of
FAILURE! predicate “i.>=(0)” you get less useful output like
FAILURE! predicate “predicate$proxy6”.
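The difference in evaluation between inline, by-name, and by-value parameters can be checked with a small counter. This is a sketch with hypothetical names (EvalCounter, twiceInline, twiceByName, twiceByValue); it doesn't use macros, so it only shows when the argument is evaluated, not how it is rendered in messages.

```scala
object EvalCounter:
  var evaluations = 0
  // Each call has a visible side effect, so we can count evaluations.
  def next(): Int =
    evaluations += 1
    evaluations

// Inline parameter: the argument expression is copied to each use of x.
inline def twiceInline(inline x: Int): Int = x + x

// By-name parameter on a regular method: evaluated at each use, at runtime.
def twiceByName(x: => Int): Int = x + x

// By-value parameter: evaluated once, before the method is called.
def twiceByValue(x: Int): Int = x + x
```

Calling twiceInline(EvalCounter.next()) or twiceByName(EvalCounter.next()) with a fresh counter evaluates next() twice and yields 1 + 2, while twiceByValue(EvalCounter.next()) evaluates it once and yields 1 + 1.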
So, I claimed that a major advantage of inlining is the ability to remove whole blocks of unneeded code, if that situation can be determined at compile time. The inlined
val ignore triggers this situation. Now let’s see what sort of performance impact this has. Consider the following program:
This program accepts zero or more numbers for the number of trials to run. If none is specified, it defaults to 1000. For each n, the program times the execution using invariantEnabled, invariantDisabled, and invariantNoInline, then prints out the times in nanoseconds and the percentages vs. what should be the fastest execution times, those for invariantDisabled.
Running this program with the arguments 10 100 1000 10000 100000 results in the following:
Elapsed times in nanoseconds:
| N | Enabled (ns) | Disabled (ns) | E/D % | NoInline (ns) | N/D % |
| 10 | 329267 | 140683 | 234.05% | 397920 | 282.85% |
| 100 | 117403 | 33356 | 351.97% | 87826 | 263.30% |
| 1000 | 402782 | 152507 | 264.11% | 532264 | 349.01% |
| 10000 | 882765 | 220922 | 399.58% | 1941655 | 878.89% |
| 100000 | 1288324 | 1140562 | 112.96% | 1555927 | 136.42% |
The numbers can vary quite a lot from run to run, especially for larger N, where JVM HotSpot optimization kicks in. However, the general trend is clear, at least for smaller N: compiling with checking disabled eliminates significant overhead, and so does extensive inlining.
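Measurements like these could be collected with helpers along the following lines. This is a hypothetical sketch, not the benchmark program described above; timeNanos and percent are made-up names, and a single unwarmed pass like this is exactly why the numbers are noisy.

```scala
// Hypothetical helper: run body n times and return the elapsed nanoseconds.
def timeNanos(n: Int)(body: Int => Unit): Long =
  val start = System.nanoTime()
  var i = 0
  while i < n do
    body(i)
    i += 1
  System.nanoTime() - start

// Hypothetical helper: ratio of two elapsed times, as a percentage.
// For the E/D % column, numerator is Enabled and denominator is Disabled.
def percent(numerator: Long, denominator: Long): Double =
  100.0 * numerator / denominator
```

For the first row of the table, percent(329267L, 140683L) is about 234.05, matching the E/D % column.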
By the way, what happens if a check fails? If you change == to != in line 11 of InlinePerf, you’ll get this error:
[error] ...invariantEnabled$InvariantFailure: FAILURE! predicate "thing1.label.!=("label")" failed before evaluation of block: "thing1.count = i.*(2).%(3)". Message = "".
We see an important benefit of using a macro implementation; we can compose an error message that shows the actual code for both the
predicate and the
block that triggered the failure. Note, however, that the operator notation is converted to method invocations. Still, this is very handy when debugging.
Also, recall I said that the method parameters don’t have to be compile time constants, even though we inline them and the method. Note that the
predicate we inlined is
thing1.label == "label" and the
block changes the value of
thing1.count, neither of which is constant at compile time.
You can get carried away with inlining (like anything else). Outside the context of macros, it’s best to profile your code to determine if a) you really need to improve the performance of some section of code and b) using inline actually makes a significant difference in real-world execution scenarios (recall the behavior for large N).
So for example, I gave up the convenience of switching invariant checking on and off at runtime by inlining the
ignore value. The performance gains were noticeable, but do they outweigh the convenience of the runtime flexibility?
See Programming Scala, Third Edition for more information about the new metaprogramming facilities and Scala 3, in general.