How To Speed Up Swift By Ordering Conformances
The Swift runtime executes a protocol conformance check when you cast a type to a protocol, such as with as? or as!. This operation is surprisingly slow, as detailed in my previous post. In this article we’ll look at an easy way to speed this up by ~20%, without making any changes to your source code. First, a brief review of protocol conformance checks.
Review + iOS 16 improvements
Records of every conformance you write in source code get stored in the __TEXT/__const section of the binary in a form similar to this:
struct ProtocolConformanceDescriptor {
    // Offset to the protocol definition
    let protocolDescriptor: Int32
    // Offset to the type that conforms to the protocol
    let nominalTypeDescriptor: Int32
    // Offset to the witness table for this conformance
    let protocolWitnessTable: Int32
    // Flags describing the kind of conformance
    let conformanceFlags: UInt32
}
A typical app can have tens of thousands of these. Many are conformances to common protocols such as Equatable, Hashable, Decodable, or Encodable. When the Swift runtime encounters something like myVar as? MyProtocol (which may not be written directly in your code; many common functions like String(describing:) internally do an as?), it loops over every ProtocolConformanceDescriptor in the binary plus any dynamically linked binaries. This operation is O(n). In the worst case, if you need to look up a protocol conformance record for every type, that would be O(n^2).
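The linear scan described above can be sketched in a few lines. This is a simplified model, not the runtime's implementation: the ConformanceRecord type, its string fields, and the findConformance function are hypothetical stand-ins for the offset-based descriptors shown earlier.

```swift
// Hypothetical, simplified record: the real descriptor stores relative
// offsets to metadata, not names.
struct ConformanceRecord {
    let typeName: String
    let protocolName: String
}

// Visits records one by one until a match is found, so a single lookup is
// O(n) in the number of records. A failed lookup visits every record.
func findConformance(of typeName: String,
                     to protocolName: String,
                     in records: [ConformanceRecord]) -> Bool {
    for record in records {
        if record.typeName == typeName && record.protocolName == protocolName {
            return true
        }
    }
    return false
}

let records = [
    ConformanceRecord(typeName: "User", protocolName: "Equatable"),
    ConformanceRecord(typeName: "User", protocolName: "Decodable"),
    ConformanceRecord(typeName: "Trip", protocolName: "Hashable"),
]

let hit = findConformance(of: "Trip", to: "Hashable", in: records)
let miss = findConformance(of: "Trip", to: "Encodable", in: records)
```

Repeating this scan once per type is what produces the O(n^2) worst case.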
iOS 16 greatly improves on this. As I explained in a previous post, iOS 16 precomputes protocol conformances in the dyld closure, and the Swift runtime consults dyld before running the O(n) lookup. At the time of the previous blog post Apple had not released the iOS 16 dyld source code, but now that they have, we can see the actual implementation in the function _dyld_find_protocol_conformance_on_disk. This function is conceptually the same as the zconform library, which speeds up these checks using a hash table that maps types to a list of protocols that they conform to.
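To make the hash table idea concrete, here is a minimal sketch of a table that maps each type to the set of protocols it conforms to. It shows only the concept: TypeProtocolPair and buildConformanceIndex are hypothetical names, and this is neither the zconform API nor the dyld implementation.

```swift
// Hypothetical pair of names standing in for one conformance record.
struct TypeProtocolPair {
    let typeName: String
    let protocolName: String
}

// Builds the table once in O(n). Each later lookup is then an average
// O(1) dictionary access plus a set membership test.
func buildConformanceIndex(_ pairs: [TypeProtocolPair]) -> [String: Set<String>] {
    var index: [String: Set<String>] = [:]
    for pair in pairs {
        index[pair.typeName, default: []].insert(pair.protocolName)
    }
    return index
}

let conformanceIndex = buildConformanceIndex([
    TypeProtocolPair(typeName: "User", protocolName: "Equatable"),
    TypeProtocolPair(typeName: "User", protocolName: "Decodable"),
    TypeProtocolPair(typeName: "Trip", protocolName: "Hashable"),
])

let userIsDecodable = conformanceIndex["User"]?.contains("Decodable") ?? false
let tripIsEncodable = conformanceIndex["Trip"]?.contains("Encodable") ?? false
```

The trade-off is the one-time cost of building the table, which is why precomputing it ahead of launch is attractive.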
However, there are still 3 cases where you might encounter the slow lookup, making it worth optimizing for:
- On the first launch after an app install/update. The dyld closure isn’t built yet, and all conformance lookups are still slow.
- When the conformance lookup results in nil. This could be a _dyld_protocol_conformance_result_kind_definitive_failure, but a quick scan of the source code reveals this is not yet implemented.
- If you aren’t using iOS 16, such as a user on an older OS or using Swift on a non-Apple platform, including server-side Swift.
It’s also difficult to measure this iOS 16 improvement in practice, because this dyld behavior is disabled when running the app from Xcode or Instruments. Emerge has a local performance debugging tool that works around this and can be used to profile apps that do have access to the dyld closure.
Order files
Order files are inputs to the linker which make apps faster by grouping code used together into the same region of the binary. With order files, your app accesses only the memory used by the app launch code rather than reading an entire 100+ MB binary into memory. This principle relies on the concept of a memory page size. To access one byte of the binary, the entire 16 KB page is loaded, so it’s beneficial to have the data you need on as few pages as possible. I previously wrote a deep dive on order files.
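The effect of page granularity can be illustrated with a little arithmetic. The sketch below counts how many distinct 16 KB pages a set of byte offsets falls on; the pagesTouched function and the offsets are made up for illustration.

```swift
// Page size from the text: touching one byte loads its whole 16 KB page.
let pageSize = 16 * 1024

// Counts the distinct pages that contain at least one of the given offsets.
func pagesTouched(byOffsets offsets: [Int], pageSize: Int) -> Int {
    return Set(offsets.map { $0 / pageSize }).count
}

// Illustrative offsets only: four items spread through a binary versus the
// same four items placed next to each other.
let scatteredOffsets = [100, 20_000, 40_000, 70_000]
let groupedOffsets = [100, 200, 300, 400]

let scatteredPages = pagesTouched(byOffsets: scatteredOffsets, pageSize: pageSize)
let groupedPages = pagesTouched(byOffsets: groupedOffsets, pageSize: pageSize)
```

Under these assumptions the scattered layout touches four pages and the grouped layout touches one, even though both read the same amount of data.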
Keeping used memory close together is also important to improve the cache hit rate. iPhones have multiple levels of memory caches. For example, the iPhone 7/A10 has the following structure [1]:
Level | Size
--------------
L1 | 128KB
L2 | 3MB
L3 | 4MB
RAM | 2GB
NAND | 256GB
The specifics of speeds are not published by Apple and vary year to year, but some benchmarks show that moving up a level can increase latency by 5x [2].
Ordering conformances
By default, protocol conformances end up spread throughout the __TEXT/__const section of the binary. This is because each module in an app generates its own static binary. When they are linked into the final app, the binaries are placed side by side. Data from different modules is not interleaved in the executable. Let’s visualize this with the Uber app. The version we’re using has 102,800 conformance records (based on the size of the __TEXT/__swift5_proto section) and a 12.7 MB __TEXT/__const section.
The above figure shows the number of conformances on each page of Uber’s app. A protocol conformance record can vary in size (it depends on details like associated types), but the minimum size is 16 bytes, so a single 16 KB page of memory can hold at most 1024 conformance records. Interestingly, Uber has a few spikes where a page contains nothing but the minimum sized conformances. This might be due to codegen, such as dependency injection or network models, which creates many simple protocols in one module. There are also a couple of regions with no conformances, likely due to non-Swift code in the app. The key takeaway is that conformances are spread throughout the binary, so almost all pages will be loaded from memory when conformances are enumerated.
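The per-page counts in a chart like this can be derived by bucketing each record's offset by page. The following is a rough sketch under that assumption; conformancesPerPage and the sample offsets are hypothetical, and extracting real offsets from a binary is not shown.

```swift
let pageBytes = 16 * 1024   // 16 KB page
let minRecordBytes = 16     // smallest conformance record

// Upper bound on records per page when every record is minimum sized.
let maxRecordsPerPage = pageBytes / minRecordBytes

// Maps page index to the number of records that start on that page.
func conformancesPerPage(recordOffsets: [Int], pageBytes: Int) -> [Int: Int] {
    var counts: [Int: Int] = [:]
    for offset in recordOffsets {
        counts[offset / pageBytes, default: 0] += 1
    }
    return counts
}

// Made-up offsets: three records on page 0, two on page 1, one on page 3.
let histogram = conformancesPerPage(
    recordOffsets: [0, 16, 32, 16_384, 16_400, 49_152],
    pageBytes: pageBytes
)
```

Pages missing from the result, such as page 2 here, correspond to the regions with no conformances.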
Similarly, the above figure shows conformances in Lyft’s app. While there are no large spikes, there are about 250 conformances on every page, with the exception of one region that is likely non-Swift code.
Order files group data onto as few pages as possible, and we can apply the same idea to conformances by generating an order file that moves all conformances onto their own pages.
The above figure shows the result of using an order file to group conformances. Each of the first ~250 pages now contains only protocol conformance descriptors, with about 500 per page. Conformance records vary in size, so the number of conformances on a page is not always the same. With this ordering, less than half of the section needs to be loaded when performing a protocol conformance lookup. In fact, the total memory used by 250 pages is < 4MB, so in this example they can all fit in the L3 cache of an iPhone 7. In our tests, co-locating the conformances like this resulted in an over 20% decrease in protocol conformance lookup time on an iPhone 7 running iOS 15!
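One possible way to produce such an order file is sketched below. It rests on assumptions not stated in this post: that conformance descriptors can be recognized by the Swift mangling suffix Mc, that the linker accepts an order file listing one symbol name per line, and that the symbol list was already extracted from the binary by some other tool. The makeConformanceOrderFile function and the symbol names are illustrative only, and this is not necessarily how the results above were produced.

```swift
// Keeps only symbols that look like Swift protocol conformance descriptors
// (assumed here to be "_$s"-prefixed symbols with the mangling suffix "Mc")
// and emits them one per line, in their original relative order.
func makeConformanceOrderFile(fromSymbols symbols: [String]) -> String {
    let conformanceSymbols = symbols.filter { symbol in
        symbol.hasPrefix("_$s") && symbol.hasSuffix("Mc")
    }
    return conformanceSymbols.joined(separator: "\n")
}

// Made-up symbol names for illustration.
let symbols = [
    "_$s5MyApp4UserVSQAAMc",
    "_$s5MyApp4UserV4nameSSvg",
    "_main",
    "_$s5MyApp4TripVSHAAMc",
]

let orderFile = makeConformanceOrderFile(fromSymbols: symbols)
```

The resulting text would then be passed to the linker as the order file, so that the listed symbols are laid out contiguously.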