Behind-the-scenes improvements in Swift 4.1

Swift 4.1 has been released, and as you can see from the release announcement, there are many important user-visible improvements, including conditional conformances, a new -Osize optimization mode, and much more.

In this post, I wanted to focus on a few smaller, less visible improvements, which are nonetheless interesting and important to document. These improvements fall somewhere between new features and bug fixes. Some of them have no user-visible impact at all. Others address long-standing problems or lay the groundwork for future language enhancements.

The descriptions below get rather technical at times, and I’m sure some of my explanations are unclear; diagrams would probably help in some cases. If anything doesn’t make sense, don’t hesitate to reach out; I’m @slava_pestov on Twitter.

Also, the usual disclaimer applies: this is a post on my personal blog, and only represents my own opinion, and not any kind of official communication from the Swift team or Apple.

Special declaration names

The Swift AST includes a class hierarchy modeling declarations written in the language, rooted in the Decl class. One of the most important subclasses is ValueDecl, an abstract base class which, roughly speaking, means “this declaration has a type and is referenced directly from expressions”. Examples of declarations that are referenced directly include type declarations, functions, properties, subscripts, and constructors.

Examples of declarations which are not subclasses of ValueDecl include extensions, operator precedence groups, and import declarations.

Among the ValueDecls, some are referenced by name with member name lookup, while others use special syntax. Examples of the latter include subscripts and constructors.
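For instance, constructors are only reachable through initialization syntax, and subscripts only through square brackets; neither is spelled as a member name after a dot. Here is a small illustration (the Buffer type is invented for this post):

struct Buffer {
  private var storage: [Int]

  // A constructor: referenced with Buffer(size:) syntax rather than by name.
  init(size: Int) { storage = Array(repeating: 0, count: size) }

  // A subscript: referenced with square-bracket syntax rather than by name.
  subscript(i: Int) -> Int { return storage[i] }
}

let buffer = Buffer(size: 16)
let first = buffer[0]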

In Swift 4.0, each ValueDecl’s name was stored as an Identifier, which is a data type that is roughly equivalent to a string, except instances are uniqued. This created problems for those kinds of declarations which are not accessed directly via member name lookup. For example, subscripts are instances of SubscriptDecl, and had the name subscript. This created two problematic situations:

  • you could write myValue.subscript on a value myValue whose type defines a subscript
  • you could define a function named subscript and attempt to subscript your value with the special syntax myValue[...]

In either case, strange things would occur. What we really wanted was for member name lookup to only find actual function and property declarations, and for subscript syntax to only find bona-fide subscripts. A similar problem occurred with the special name deinit. You really don’t ever want or need to refer to a deinitializer with member dot syntax.

The solution in Swift 4.1 is that names of ValueDecls are no longer plain identifiers, but instead use a new data type called DeclBaseName. A DeclBaseName is either an Identifier, or one of several “special” names, of which Swift 4.1 defines two, one for subscripts and one for deinitializers. The special names are not equal to any identifier and cannot be written as part of member dot syntax, and instead only originate in special circumstances where name lookup is expected to produce a subscript or deinitializer.
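To make the shape of this change concrete, here is a rough sketch of the idea in Swift; the real DeclBaseName is a C++ class inside the compiler, so everything below is illustrative rather than the actual implementation:

// Illustrative only: a Swift model of the DeclBaseName idea.
enum DeclBaseName: Hashable {
  case identifier(String)  // an ordinary name such as "count" or "makeIterator"
  case subscriptName       // the special name carried by subscript declarations
  case destructor          // the special name carried by deinitializers

  // Only ordinary identifiers can appear after a dot in member syntax;
  // the special names never compare equal to any identifier.
  var canAppearInMemberSyntax: Bool {
    if case .identifier = self { return true }
    return false
  }
}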

Bridging peephole

If you’ve spent any time working with Apple’s Objective-C frameworks from Swift, you’ll notice that instead of exposing the raw Objective-C APIs, Swift’s ClangImporter performs some amount of translation to make the APIs feel more Swift-like. This includes translating names, and more crucially, bridging Cocoa reference types to Swift value types. For example, APIs that take or return an NSArray are typically imported as taking or returning a Swift Array<T>, the NSURL class becomes the URL struct, and so on.

Sometimes, you want to call an Objective-C method, and pass in or receive an NSArray, without bridging. Swift casts support bridging, so you can just cast the result of your method call to NSArray:

let result = myObj.someMethod(with: arg) as NSArray

In Swift 4.0, this leads to slightly inefficient code generation; we bridge the result of someMethod(with:) from an NSArray to a Swift array, then generate the code for the cast, converting it back to an NSArray. In some circumstances the bridging is lazy in the Cocoa-to-Swift direction, but if the elements themselves must be bridged, for example when they are Cocoa types that map to Swift value types, an expensive O(n) operation must be performed.

The bridging peephole recognizes such situations and cancels out the two complementary bridging operations; the cast now has the effect of eliminating the bridging performed by default, returning the underlying NSArray as a result. The peephole is implemented in the SILGen phase of the pipeline so you get the benefit even at -Onone, and the implementation also introduced some cleanups to how various value conversions and transformations are modeled in the AST and SILGen.
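As a hedged example of where the peephole applies, consider a Foundation API such as FileManager’s contentsOfDirectory(atPath:), which is imported as returning [String], bridged from the NSArray the Objective-C method actually produces. Casting the call result straight back to NSArray is exactly the pattern described above:

import Foundation

// The cast asks for the Cocoa representation directly; with the Swift 4.1
// peephole, the NSArray -> [String] -> NSArray round trip should be avoided.
func listDirectory(_ path: String) throws -> NSArray {
  return try FileManager.default.contentsOfDirectory(atPath: path) as NSArray
}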

Definite initialization improvements

First, some background. Definite initialization refers to a set of language rules, implemented by a special compiler pass, for enforcing that all memory locations are properly initialized before they are accessed. For example, in the following code example, the local variable x is initialized on all control flow paths, so the call to print(x) is guaranteed to not access uninitialized memory:

let x: String
if y < 0 {
  x = "hello"
} else {
  x = "goodbye"
}
print(x)

On the other hand, the following is invalid, because if y < 0, the string is not initialized:

let x: String
if y >= 0 {
  x = "ok"
}
print(x)

In ordinary functions, definite initialization concerns itself with local variables only. In constructors, definite initialization is also tasked with ensuring that all stored properties of self are fully initialized by the time the constructor returns. This is complicated by the fact that constructors can throw errors, or “fail” by returning nil in Swift. For example, consider the following class and its constructor:

class OnlineDocument {
  let title: String
  let contents: String

  init(title: String, from url: URL) throws {
    self.title = title
    self.contents = try OnlineDocument.load(from: url)
    try validate()
  }

  static func load(from url: URL) throws -> String { ... }
  func validate() throws { ... }
}

A cursory examination should convince you that the self value is definitely initialized by the time the constructor returns successfully — the class has two stored properties, title and contents, both of which are assigned inside the constructor, and neither property is read before being initialized.

Furthermore, the call to the instance method validate() is only performed after both stored properties have been initialized, meaning that another important invariant is maintained: the self value is not permitted to escape from the constructor before all stored properties have been initialized.

What about the failure paths though? There are two places where the constructor can fail:

  • If the static method load() throws
  • If the instance method validate() throws

Note that if load() throws, the self value in the constructor is only partially initialized; the title property has been initialized, but the contents property has not. So in this failure path, we must take care to release any memory associated with title, without touching contents. Then, we must deallocate the partially-initialized instance of OnlineDocument.

On the other hand, if validate() throws, the self value is fully initialized, and a single release operation suffices to deinitialize both stored properties and free the memory associated with the instance.

If OnlineDocument had a superclass, the situation gets even trickier; we might throw before calling the superclass constructor, the superclass constructor might throw, or we might throw after calling the superclass constructor. All of those situations, and every combination of initialized and uninitialized stored properties, must be handled.

While everything described above worked in Swift 4.0, there were a number of corner cases that were not handled correctly, resulting in incorrect generated code or compiler crashes. This includes constructors that were both failable and throwing, or constructors that delegated to a throwing constructor defined in an Objective-C base class. Another case that was handled incorrectly was passing self as a parameter to the superclass constructor; this should have been rejected, but was not, and code that used this pattern could hit undefined behavior at runtime because a partially-initialized self value was permitted to escape.
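To give a flavor of the first of those corner cases, here is a hedged sketch of a constructor that is both failable and throwing; the type and file-loading logic are invented for illustration, not taken from an actual bug report:

import Foundation

class CachedDocument {
  let contents: String

  init?(path: String) throws {
    let text = try String(contentsOfFile: path)  // throwing failure path
    if text.isEmpty { return nil }               // failable failure path; contents is still uninitialized here
    self.contents = text                         // only now is self fully initialized
  }
}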

In Swift 4.1, large parts of the definite initialization compiler pass have been redone completely, consolidating various disparate code paths. This fixes various bugs, including the ones outlined above, while also improving diagnostics and making the logic easier to understand and debug in the future.

Parent pointer removal from type metadata

Swift supports nested types, for example here we define a struct nested inside a class:

class Horse {
  struct Saddle {}
}

Unlike Java, Swift nested types cannot capture the stored properties or the self value of the outer type; Swift nested types are more like static inner classes in Java. However, Swift nested types can capture the outer type’s generic parameters. For example, if I invent a data structure SkipList<T> with a nested type Iterator, the iterator type is also specialized on the element type T:

struct SkipList<T> : Sequence {
  func makeIterator() -> Iterator { return Iterator(list: self) }

  struct Iterator : IteratorProtocol {
    let list: SkipList<T> // we can reference T here
    mutating func next() -> T? { ... } // ...and here
  }
}

Just as SkipList<Int> and SkipList<String> are distinct types, so are SkipList<Int>.Iterator and SkipList<String>.Iterator.

Now, recall how Swift implements generics. Suppose I pass a SkipList<T>.Iterator to a generic function constrained to IteratorProtocol:

func iterate<I : IteratorProtocol>(_ iter: I) { ... }
iterate(mySkipList.makeIterator())

The function iterate() is only compiled once in the general case; the calling convention passes the iterator value together with type metadata for the generic parameter I. You can find details in our talk, Implementing Swift Generics. The relevant detail here is this: when the generic function calls a protocol method on a value of type I, how does that method recover the type metadata for the generic parameter T from the type metadata for SkipList<T>.Iterator?

The answer up through Swift 4.0 is that each type stores the metadata for its immediate generic parameters, as well as a pointer to the type metadata for its parent type, if any.

This meant that if I wanted to create a new instance of SkipList<Int>.Iterator from scratch, I would first call the type metadata constructor for SkipList<T>, passing in the type metadata for Int, and then call the type metadata constructor for SkipList<T>.Iterator, passing in the type metadata for SkipList<Int>.

Inside the implementation of a protocol requirement, which receives the type metadata for the protocol’s conforming type as an argument, we would first follow the pointer to the parent type metadata, and then follow the pointer to the type metadata for T if we wanted to abstractly manipulate a value of type T.

This approach of explicitly modeling nested types in type metadata seems simple and obvious, but it had a number of subtle issues:

  • The most obvious is that recovering generic parameters of the outer type requires an additional pointer indirection.
  • In the case where both the outer and inner type are not generic, we want to emit the metadata for the inner type statically if possible. However, if the parent type metadata requires a call to a runtime function to load, the inner type would also require full runtime initialization. When the Objective-C runtime is used, class metadata must always be obtained by calling a function, to register the class with the runtime. This meant that a struct, even a simple non-generic struct, nested inside an Objective-C class required runtime initialization for its metadata.
  • A more insidious problem related to the above is that a class nested inside another class could not be @objc because only classes whose metadata does not require runtime initialization can be referenced statically from C code or loaded with NSClassFromString() or similar.
  • If the metadata for the outer type itself depended on the inner type, the runtime would deadlock because neither metadata could be initialized without the other. An example is when the superclass of a class is a nested type of the class itself:
class Outer : Outer.Inner {
  class Inner { ... }
}

Now, none of the above problems are deal-breakers on their own. The efficiency hit of the additional indirection and runtime metadata initialization is marginal, and the runtime deadlock is just one specific example of a more general problem that can be observed with value types as well. However, together with a few additional issues not mentioned here, they add up to a non-trivial amount of technical debt that we wanted to address before ABI stability.

Now in Swift 4.1, type metadata for nested generic types no longer points at the parent type, and instead includes all generic parameters for all outer scopes. The main user-visible benefit here is that a certain class of runtime deadlocks with circular type metadata is now fixed. A complete solution for the general runtime deadlock issue with circular type metadata is not part of Swift 4.1, but is being developed on the master branch.

Classes conforming to protocols with default implementations

The next improvement concerns a language restriction that seems rather puzzling at first sight. If a protocol has a method requirement returning Self, Swift 4.0 does not allow a class to use a default implementation of the method as a witness for the requirement, instead requiring the method to be implemented directly in the class, even if the body is just a copy-and-paste of the default implementation:

protocol Widget {
  func print()
  func clone() -> Self
}
extension Widget {
  func print() { Swift.print("a widget of type \(Self.self)") }
  func clone() -> Self { return self }
}
class Box<T> : Widget {
  let contents: T
  init(contents: T) { self.contents = contents }
  // Default implementation of print() is OK!
  // You must define this, or conformance checking fails:
  func clone() -> Self { return self }
}

The reason for this was rather esoteric. To understand why, we need to understand how type metadata for Self is passed from the caller to the extension method.

The protocol method print() has the type <Self : Widget> (Self) -> () -> (). This means the caller must supply type metadata for Self, together with a witness table for the conformance of Self to Widget. The witness table contains a function pointer for each requirement; in our example, print() and clone(). The function pointers do not directly point at the concrete implementations of print() and clone(). Instead, they point at protocol witness thunks, which translate the calling convention of the protocol method to the calling convention of the concrete method. The concrete methods on Box<T> expect the type metadata for T as a parameter; the witness thunks receive the type metadata for Box<T> as Self. So the witness thunk pulls the type metadata for T out of the type metadata of Self and passes that on to the concrete method.

However, if the method was itself implemented in a protocol extension, we would then reconstruct metadata for Box<T> from T, and pass this reconstructed metadata to the protocol extension method. The original metadata for Self was lost. To see why this is a problem, let’s pretend we define a subclass of Box<T>:

class Crate<T> : Box<T> {}
let c = Crate<Int>(contents: 3)
c.print()
// prints: a widget of type Box<Int>

Indeed, the problem here is that we’ve lost the fact that we have a subclass of Box<T>, because we took apart the metadata for Self that was passed in by the caller, and called the default implementation of print() with newly-constructed metadata for Box<T>.

This does not create any type soundness issues in the case of print(), but it might give you unexpected behavior if you look at the type Self directly inside the protocol extension method, as we do here. However, imagine if Swift 4.0 allowed the default implementation of clone() to be used. Then if we call clone() on some abstract generic parameter, say X that is known to be constrained to Widget, the type signature of clone() tells us that we expect to get an object with the same type, X, back at runtime. However, if X is bound to a subclass of Box<T>, such as Crate<Int>, this contract is violated at runtime, because it returns a value of type Box<Int>, not Crate<Int> as we expect.
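To make that contract concrete, here is a hedged sketch of the kind of generic call site involved; duplicate() is a hypothetical helper written for this post:

// The signature promises that whatever concrete type X is bound to, the caller
// gets a value of that exact type back; handing back a Box<Int> when X is
// Crate<Int> would break this promise.
func duplicate<X: Widget>(_ value: X) -> X {
  return value.clone()
}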

This is why Swift 4.0 did not allow the default implementation of a protocol method returning Self to be used on a class. In Swift 4.1, this problem is solved by having protocol witness thunks for classes preserve the metadata for Self they received from the caller, instead of always taking it apart, and pass it on if the protocol requirement is satisfied by a default implementation.

This plugs the soundness hole and simplifies the conceptual model of the language. There is now one less corner case to worry about.

Another related improvement in this area concerns the construction of requirement environments. A requirement environment is a compile-time data structure defining the mapping from the protocol requirement’s type signature to the witness type signature. The type checker now caches requirement environments and constructs fewer of them while checking conformances, which gives a small compile-time speed boost.

Merge modules pass and serialized SIL

Roughly speaking, there are two ways of invoking the Swift compiler:

  • Whole-module optimization (or single-frontend mode), where a single compiler job parses, type checks and generates code for all source files in a project. Whole-module optimization allows greater optimization opportunities because all the code for a module is available to the optimizer for analysis. It also uses more memory, does not support incremental rebuilds, and cannot make use of multiple cores except for the LLVM optimization phase at the very end of the pipeline.
  • Single-file mode, where the driver runs one compiler job per source file. Each job only type checks and generates code for a single file, producing a single object file as output. Single-file mode allows for incremental rebuilds, where only changed source files and those that depend on them need to be rebuilt.

When building an application, single-file mode produces a set of object files, which are then linked together to form the final binary. When building a module, each job produces an object file together with a partial Swift module file. In addition to linking the object files together, the partial Swift modules are linked together to produce the final Swift module by what is known as the merge modules pass.

A Swift module file is a binary serialization of the AST-level declarations from one or more source files, together with the SIL instructions from the bodies of any inlinable functions.

When you import another module from your project, the compiler will lazily deserialize declarations from the module as name lookup is performed, using the serialized types of those declarations and their members during the course of type checking. When generating code, calls to inlinable functions for which serialized SIL is available can be inlined or specialized across module boundaries, giving a performance boost.

In Swift 4.0, the merge modules pass was able to merge AST-level declarations, but it would drop serialized SIL. This meant that only modules built with whole module optimization could export serialized SIL for inlining in other modules.

This wasn’t a huge problem of course — in both Swift 4.0 and 4.1, the functionality for cross-module inlining and specialization was not officially supported. Furthermore, every public definition always had an exported symbol in the resulting object file, so if a serialized SIL body was not available, it was always safe to just call the external symbol.

However, as we work on the stable ABI, both of these are subject to change. A cross-module inlining and specialization proposal was recently accepted for inclusion in a future Swift release. To lay the groundwork for this and other features which may appear in the future, Swift 4.1 now supports merging serialized SIL in the merge modules pass.

While we already had the code to deserialize and re-serialize SIL, this change was not entirely straightforward. The big limiting factor was that deserializing declarations that reference imported types calls into the ClangImporter subsystem of Swift, and the ClangImporter did certain things that required a type checker instance to be running. Most of the time, deserialization happens during type checking, so this was not a problem. However, when deserializing SIL, either in the optimization phase or during the merge modules stage, no type checker is available.

To address this, Swift 4.1 introduces some changes to the ClangImporter, making it no longer depend on the type checker for certain things, instead constructing fully-type checked declarations where possible. This gives a small compile-time performance boost, improves reliability by fixing some crashes that could occur when deserialization occurred during the SIL optimization phase, and finally, enables the merge modules pass to reliably preserve serialized SIL.

Lazier declaration checking in single-file mode

Recall the above description of single-file mode:

Each job only type checks and generates code for a single file, producing a single object file as output.

This is a gross simplification. First of all, every job has to parse all source files in the module, even though only the primary file is going to be type checked. This is because we must be able to find extensions of other types, and perform name lookup of symbols in other files in general.

Also, declarations in a job’s primary file can directly reference declarations in other source files in the module. So while we must type check every declaration (and for functions, their bodies) in the primary file, we may also end up checking a lot of declarations in non-primary files, together with any declarations they reference, and so on.

In the worst case, we can end up performing a quadratic amount of work. To see why, consider a project consisting of a series of source files, where each source file contains a single class, whose superclass is the next source file’s class, and so on, until the last file, which contains a class with no superclass. That is, we have:

// File 1
class C1 : C2 { ... }
// File 2
class C2 : C3 { ... }
// File 3
class C3 : C4 { ... }
// File 4
class C4 { ... }

When we type check the first file in the module, we effectively end up type checking all four class declarations, because each one has a superclass that references the next.

In the particular case of a superclass reference, the architecture of the declaration checker makes it very hard to make the process more incremental. This is because we only have two “type check this declaration” operations:

  • “Validation”, which basically means doing enough work to be able to reference the declaration. For a function, this means checking all parameter and return types; for a class, this checks the superclass and any generic parameters, and so on.
  • “Type checking”, which is what we perform on declarations in the primary file; for functions, this also checks their body.

Making this design more fine-grained is something we’d like to do in the future, but in the meantime, there are some possible improvements.

First of all, we never type check functions in non-primary source files, so even though we have to parse non-primary source files, we can skip function bodies, avoiding building AST for them, since it will never be needed.

Second, we can in some cases avoid validating all members of a type, depending on how it is used.

To understand why we must validate all members of a type, consider the example of a class in one file, with a second file that invokes a method on that class, and a third file with a method that just stores an instance of the class in an array:

// File 1
class MyClass : BaseClass {
  func method1(a: NSObject) { ... very complex code ... }
  func method2(b: UIView) { ... very complex code ... }
  override func method3(c: SomeOtherClass) { ... very complex code ... }
}
// File 2
public func callMethodOnClass(m: MyClass) {
  m.method2(b: UIView())
}
// File 3
public func storeClass(m: MyClass) {
  someArray.append(m)
}

Let’s pretend we’re running a compiler job with “file 2” as the primary file. We have to parse “file 1” and “file 3” as well, but as noted above, we no longer build AST for the complex method bodies in “file 1”.

Now, methods in Swift are virtually dispatched, which means every class has a “vtable” consisting of function pointers. Subclasses can override methods by replacing entries in the base class’s “vtable”.

In order to emit the call to method2 on MyClass from “file 2”, we must know the exact vtable layout of MyClass. Calculating this layout requires validating all the members of MyClass which may be virtually dispatched, figuring out which ones override members from the base class, and whether the override adds a new vtable entry or not (this can occur if the override has a more general, covariant parameter type, or a less general return type; until Swift 4.0, this used to crash — another “behind the scenes” improvement from an earlier era).
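Here is a hedged sketch of the covariant-override situation mentioned in the parenthetical above; the class names are made up, and the actual rules for when an override needs its own vtable entry are more detailed than this example shows:

class Animal {}
class Horse : Animal {}

class Stable {
  func admit(_ animal: Horse) {}
}

class BigStable : Stable {
  // The override accepts a more general parameter type than the base method;
  // this is the kind of override that can require an additional vtable entry.
  override func admit(_ animal: Animal) {}
}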

So while type checking “file 2”, we cannot presently avoid validating every member of MyClass, which means importing NSObject and UIView, and possibly deserializing SomeOtherClass (or validating it, if it’s defined in the same module but in another source file).

However, while type checking “file 3”, we don’t need to know the vtable layout of MyClass, so we don’t have to validate all the members of MyClass, only the class itself.

This was implemented by decoupling “validate a declaration” from “validate the layout of a declaration”; the latter does more work, and is only invoked when we see a method call on a class, an extension of a class, or a subclass of a class.

A final improvement in this area concerns extension binding. In single-file mode, we must locate all extensions in all source files, and “bind” them to their corresponding types. This was a particularly bad source of quadratic behavior, because every source file always had to bind every extension. Previously, we used to validate each extended type, which meant validating all of their members.

Now, not only is validation of members decoupled from the validation of a type itself, but we can do even better, and not validate a type at all when binding an extension of it.

Together, these improvements might lead to shorter compile times with Swift 4.1 on certain projects.

Removal of -sil-serialize-all mode

In Swift 4.0, the standard library was built in a special mode where SIL was serialized for the bodies of all defined functions. This allowed standard library code to be inlined and specialized in clients, which is a major performance win since so much of the standard library consists of generic algorithms on collections that benefit greatly from these optimizations.

Now that Swift 4.1 includes a preliminary, unofficial implementation of the cross-module inlining and specialization proposal, we were able to remove this special compilation mode, and annotate most of the library with the @_inlineable attribute instead.
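As a rough sketch of what that annotation looks like, here is an invented function marked with the underscored attribute; the attribute was later renamed to @inlinable in a subsequent Swift release:

// Serializes SIL for the body so that clients of the module can inline or
// specialize the function across the module boundary.
@_inlineable
public func clamped<T: Comparable>(_ value: T, to range: ClosedRange<T>) -> T {
  return min(max(value, range.lowerBound), range.upperBound)
}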

As we converge on a stable ABI, we will perform a full audit of the standard library implementation and decide which functions are to be inlinable and which ones are not. For now, almost everything is inlinable, to mimic the behavior of the old -sil-serialize-all mode. This still introduces certain advantages over the Swift 4.0 approach.

We were able to remove the implementation of -sil-serialize-all, which was an ongoing source of bugs and corner-case behavior. A related feature we were able to remove is the special @_semantics("stdlib_binary_only") attribute used to “opt out” of -sil-serialize-all. Now that inlinability is “opt-in”, the functions that were formerly marked @_semantics("stdlib_binary_only") are simply not marked inlinable. Both of these changes narrow the gap between “normal Swift” and “the standard library dialect”, which makes the implementation more approachable and easier to maintain.

Conclusion

There really is no big conclusion here. Hopefully you found this post informative or at least entertaining, and as I pointed out in the beginning, don’t hesitate to reach out if you have any questions.
