Stop Using Structs!
I hear people say ‘prefer value types over reference types’ all the time. What if we are all getting it wrong?
Important: This article was originally published for Swift 4.0 and was updated on 06.09.2018 to reflect the language changes that came with Swift 4.1 and 4.2. Please read the Update sections carefully: some of the statements in this article are directly affected by these changes, and they are NOT valid for newer versions of the language.
I’m preparing another article about what has changed and how it affects our case. By then, I may be able to update this article for newer Swift versions.
Hello everyone! First of all, this is my very first post on Medium. I hope you enjoy it. Here we go!
I know the title of this post sounds somewhat ‘ambitious’, and I admit that it’s intended as a bit of clickbait. But I think this topic needs serious attention, because most of the developer community is unaware of the problems with how value types are used. So I would like to amend the title as follows.
Stop (Mis)using & (Ab)using Value Types
In this post, I will try to address some anti-patterns in how Swift value types are used. Then I will try to explain how they can undermine what we expect from value types in practice.
Value Types & Reference Types
As a reminder here is a list of Swift value types:
- Struct
- Enum
- Tuple
- Primitives (Int, Double, Bool etc.)
- Collections (Array, String, Dictionary, Set)
And reference types:
- Class
- Anything coming from NSObject
- Function
- Closure
*But some value types, like String or Array, indirectly keep their contents on the heap. So they are value types backed by reference types.
*This statement has some exceptions, such as the small string optimization (SSO). With SSO, small strings are kept inside the String struct itself instead of in a separate storage buffer.
Story
In 2014, Apple introduced Swift, a multi-paradigm, compiled, statically and strongly typed language. Swift has powerful, rich value types that support methods, protocol conformance (interfaces), extensions, etc. Even though Apple has recommended value types from the start, the real milestones in Swift’s brief history were two sessions at WWDC 2015.
In these sessions, Apple strongly advised using value types more often. Value semantics serve to remove unintended sharing of state and the side effects that come with it. By providing powerful value types, Swift aims to maximize their use and thereby avoid errors caused by shared state. At the same time, value types offer better performance than reference types.
In this respect, Swift was also reaching out to the functional programming community, since functional programming pursues the same goals, which matter even more in today’s increasingly concurrent world. Functional programming also depends on a paradigm: “thinking in a functional style”. In the functional programming world, there is no place for shared state, mutable state, or the related side effects. That said, it has its disadvantages as well, such as its imperfect fit with the machine model and its inefficiency in cases where mutation is a good choice. This is a huge topic that goes beyond the scope of this article, so I will not go deeper into this heated debate and will leave it here for now.
Here is a quick refresher on value semantics and the features of value types presented in the aforementioned WWDC sessions.
No Shared State (Auto Copying) & Immutability
Mutating an instance will never affect another.
Instances of value types are created on the stack. On each assignment, and each time the value is passed around (between functions or threads), a unique instance is passed along (when the compiler cannot prove that no mutation will happen, this means a new copy). Therefore, you are guaranteed to have no shared state, and it’s not possible to mutate an instance unintentionally.
Now create an instance of our struct, assign it to another variable, and then modify the copy.
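Here is a minimal sketch of this behavior. It assumes a simple User struct with two String fields, since the article’s original definition isn’t shown here.

```swift
// Hypothetical User struct; the field names are illustrative assumptions.
struct User {
    var name: String
    var email: String
}

let user1 = User(name: "Alice", email: "alice@example.com")
var user2 = user1      // assignment copies the whole value
user2.name = "Bob"     // mutates only the copy

print(user1.name) // Alice
print(user2.name) // Bob
```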
As you can see, the assignment automatically copied our struct, and only the copy was mutated. So value types have no shared state, because they are copied automatically.
Swift’s collection types (Dictionary, Array, String, Set, etc.) are value types backed by reference types. These types implement the copy-on-write performance optimization by default, so that copying a collection stays cheap without breaking value semantics. With copy-on-write, a single storage buffer is shared among the variables until one of them is mutated. Only then is a separate copy created for the variable being mutated. So collections are safe to mutate. More information and implementation details of copy-on-write are explained in this great article by Marco Santarossa.
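The sketch below shows how a custom type can implement copy-on-write with the standard library’s isKnownUniquelyReferenced(_:). CoWList and Storage are hypothetical names. This illustrates the technique only; it is not how Array is implemented internally.

```swift
// Heap-allocated buffer shared between copies until one of them mutates.
final class Storage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

struct CoWList {
    private var storage: Storage

    init(_ values: [Int]) { storage = Storage(values) }

    var values: [Int] { return storage.values }

    // True when both lists point at the same buffer.
    func sharesStorage(with other: CoWList) -> Bool {
        return storage === other.storage
    }

    mutating func append(_ value: Int) {
        // Copy the buffer only if another CoWList still references it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.values)
        }
        storage.values.append(value)
    }
}

let a = CoWList([1, 2, 3])
var b = a
print(a.sharesStorage(with: b)) // true: no copy has been made yet
b.append(4)                     // the shared buffer is copied here
print(a.sharesStorage(with: b)) // false
print(a.values, b.values)       // [1, 2, 3] [1, 2, 3, 4]
```

The copy happens lazily, inside the first mutating call. Assignments that are never followed by a mutation cost only a retain of the shared buffer.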
No memory leaks
*Swift value types are kept on the stack, so there is no dynamic memory allocation and, as a consequence, no room for memory leaks.
*This statement has some exceptions, which I plan to cover in the second part of the article.
Thread safety
As I mentioned above, Swift value types are kept on the stack. In a process, each thread has its own stack space, so no other thread can access your value type directly. Hence there are no race conditions, locks, deadlocks, or any related thread synchronization complexity.
Concurrent code that gives each iteration its own copy of a User struct is therefore perfectly safe for multithreading, because each iteration modifies a different copy of User.
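Here is a minimal sketch of such code. It assumes a User struct that holds only value types and uses Dispatch’s concurrentPerform to run the iterations on several threads. The results array is written through an unsafe buffer only so that each thread fills its own slot without extra locking.

```swift
import Dispatch

// Hypothetical User struct containing only value types.
struct User {
    var name: String
    var loginCount: Int
}

let template = User(name: "Alice", loginCount: 0)
var counts = [Int](repeating: 0, count: 100)

counts.withUnsafeMutableBufferPointer { slots in
    let buffer = slots  // a plain pointer value the worker threads can share
    // Runs the iterations in parallel on multiple threads.
    DispatchQueue.concurrentPerform(iterations: buffer.count) { i in
        var copy = template          // every iteration works on its own copy
        copy.loginCount += i         // mutating the copy never touches `template`
        buffer[i] = copy.loginCount  // each iteration writes a distinct slot
    }
}

print(template.loginCount) // 0
print(counts[99])          // 99
```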
Memory Management
For value types, the cost of allocating and deallocating memory is just decrementing and incrementing the stack pointer.
For reference types, the situation is much more complicated. For an allocation, the runtime/OS first needs to find the most suitable location to avoid memory fragmentation. Then it performs the allocation itself, which must be thread-safe. That requires an inefficient lock/unlock or a similar synchronization technique.
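Reference counting must be thread-safe too, and it happens on every assignment and every time an instance is passed around, even inside a for loop. Conceptually, a for loop over reference types retains each element before the loop body uses it and releases it afterwards. Likewise, a function call that receives a reference type is wrapped in a retain/release pair.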
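The following sketch makes those hidden operations visible by writing them out by hand with the standard library’s Unmanaged type. It assumes a hypothetical Account class and only illustrates what unoptimized code conceptually does. The optimizer often removes such pairs, and you should never need to write this yourself.

```swift
// Hypothetical reference type.
final class Account {
    var alias: String
    init(alias: String) { self.alias = alias }
}

func describe(_ account: Account) -> String {
    return "Account(\(account.alias))"
}

let accounts = [Account(alias: "alice"), Account(alias: "bob")]

// What we write:
for account in accounts {
    print(describe(account))
}

// Roughly what it means, with the reference counting spelled out by hand.
for account in accounts {
    let retained = Unmanaged.passUnretained(account).retain() // +1 for the loop body
    print(describe(account))   // passing it to a function may add another pair
    retained.release()         // -1 when the body ends
}
```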
Performance
Value types need neither dynamic memory allocation nor reference counting, both of which are expensive operations. At the same time, methods on value types are dispatched statically. Together, these give value types a huge performance advantage.
(Mis)use & (Ab)use
Thus far everything seems great with using value types. So what is the problem?
The problem starts with thinking that choosing between value types and reference types is a matter of opinion. It’s not! This decision has a strict checklist, and none of its items are based on opinions. They are based on facts. In his brilliant talk Controlling Complexity in Swift: Making Value Types Friends, Andy Matuschak mentions something that may easily be misconstrued and is therefore misleading. He uses the analogy of live objects versus dead objects for reference types and value types, respectively. This approach is quite subjective: its outcome can vary from person to person, and it doesn’t depend on the real difference between the two.
The bigger problem is believing in the cargo cult: believing that just by using value types we get their benefits, without following any constraints. After these WWDC sessions, we all started using value types excessively, and we built architectures, libraries, and tools based on this cargo cult. Even though it was edited as recently as May 2016, the top Stack Overflow answer on the subject still contains the following paragraph:
My personal advice, is to always default to using a struct because they greatly reduce complexity and fallback to classes if the Struct becomes very large or requires some feature that structs and protocols cannot provide, most notably the ability to have multiple variables reference the same data.
The same answer also justifies this advice by suggesting that the official documentation is outdated:
The Swift Programming Language documentation was written before the Protocol Oriented Programming talk was given.
No, it’s not outdated, my friend. We were all wrong.
The one lesson I’ve learned from technology and food is the only time you know you’re doing the wrong thing is when you’re doing what everyone else is doing.
Kimbal Musk
The constraint
Value types can take advantage of these features only as long as they remain “pure value types”. When a value type contains a reference type, or a value type backed by a reference type (String, Array, etc.), this world collapses, leaving behind the wreckage of this cargo cult.
Now I will change Account from a struct to a class, and revisit the benefits of value types in cases where this constraint is not respected.
Immutability & No Shared State
Suppose I create two instances of my struct (the second assigned from the first) and then change the alias of the second user’s account. Wait a second: the first user’s account alias has changed too. But what about immutability and no shared state? As you can see, my struct is mutable. I have failed at the main purpose of choosing a value type.
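Here is a minimal sketch of this scenario. Account is now a class, and the field names are hypothetical.

```swift
// Account is now a class (a reference type); User is still a struct.
final class Account {
    var alias: String
    init(alias: String) { self.alias = alias }
}

struct User {
    var name: String
    let account: Account  // only the reference is stored inside the struct
}

let user1 = User(name: "Alice", account: Account(alias: "alice"))
var user2 = user1              // copies the struct, but not the Account it points to
user2.name = "Bob"             // value field: changes only user2
user2.account.alias = "bob"    // reference field: changes the Account both users share

print(user1.name)                       // Alice
print(user1.account.alias)              // bob
print(user1.account === user2.account)  // true

// Even a `let` struct cannot protect the object behind the reference.
user1.account.alias = "changed"
print(user2.account.alias)              // changed
```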
So my codebase is quite vulnerable to bugs related to shared state. You may say it’s no more vulnerable than it would be with any other reference type, and that may be true. The difference is the false belief that using a value type keeps you safe. If it were a reference type, you would already know that you have shared state and that you must handle it carefully.
No Memory Leaks
Previously I mentioned that value types live entirely on the stack and require no dynamic memory management. But things change when we put a reference type inside a value type. That reference type lives on the heap, so memory leaks are possible, just as with any other reference type. So we lose another advantage of value types over reference types.
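Here is a minimal sketch of such a leak, using hypothetical Session and Screen types. The struct holds a class, the class stores a closure, and the closure captures the struct. As a result, the reference type can never be freed.

```swift
// Hypothetical types: a struct holding a class that stores a closure.
final class Session {
    var onExpire: (() -> Void)?
}

struct Screen {
    let session: Session

    func bind() {
        // The closure captures `self` (a copy of Screen), which strongly
        // references `session`, while `session` strongly owns the closure.
        session.onExpire = { print("Session expired:", self.session) }
    }
}

weak var probe: Session?

do {
    let screen = Screen(session: Session())
    screen.bind()
    probe = screen.session
}

// `screen` is gone, but the Session survives through the retain cycle.
print(probe == nil) // false
```

Capturing the object weakly instead, for example with a capture list such as `[weak session = self.session]`, breaks the cycle.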
Thread safety
Remember when I said that value types provide thread safety? Forget about that once reference types are placed inside value types. If you are not convinced, try mutating a class instance stored inside a struct from several threads at once and see the results. Spoiler: it can crash 🙈
So when you use a reference type inside a value type, you lose the thread safety advantage too. Just as with any reference type, you need an external thread synchronization mechanism.
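The crashing version isn’t reproduced here. Instead, the following sketch shows the external synchronization this requires. The hypothetical Counter class guards its state with Foundation’s NSLock. Every copy of the Stats struct shares the same Counter, so without the lock the concurrent increments would be a data race.

```swift
import Dispatch
import Foundation

// A reference type shared by every copy of the struct below.
final class Counter {
    private let lock = NSLock()
    private var storage = 0

    var value: Int {
        lock.lock(); defer { lock.unlock() }
        return storage
    }

    func increment() {
        lock.lock(); defer { lock.unlock() }
        storage += 1
    }
}

struct Stats {
    let visits: Counter   // copies of Stats share this one Counter
}

let stats = Stats(visits: Counter())

DispatchQueue.concurrentPerform(iterations: 1_000) { _ in
    let copy = stats          // copying the struct does not copy the Counter
    copy.visits.increment()   // so only the lock keeps this safe
}

print(stats.visits.value) // 1000
```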
Memory Management & Performance
Update #1 (06.09.2018)
Swift’s calling convention changed with Swift 4.1, and this directly affects the performance characteristics discussed in this section. I’m preparing another article about what has changed and how it affects these performance characteristics.
So the statements in this section are valid for Swift versions 1.0 – 4.0.
Let’s modify the User struct a little more and add some other fields.
Then we create an instance and pass it to another function.
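Here is a sketch of such a struct and a validation function, with hypothetical field names. UIKit’s UIImage is replaced by a minimal Image class so the example compiles on any platform. Like UIImage, Image is a reference type.

```swift
// Stand-in for UIImage: any class instance is reference counted.
final class Image {
    let name: String
    init(name: String) { self.name = name }
}

// Six stored properties, each of which may need a retain when User is copied.
// (Since Swift 5, short strings can be stored inline and need no retain.)
struct User {
    var firstName: String
    var lastName: String
    var email: String
    var phone: String
    var address: String
    var avatar: Image
}

// Hypothetical validation step that receives a copy of User.
func validate(_ user: User) -> Bool {
    return !user.firstName.isEmpty && user.email.contains("@")
}

let user = User(firstName: "Jane",
                lastName: "Appleseed",
                email: "jane.appleseed@example.com",
                phone: "+1 555 0100",
                address: "1 Infinite Loop, Cupertino",
                avatar: Image(name: "jane.png"))

print(validate(user)) // true
```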
I can almost hear some of you saying: what’s wrong with this code? It’s quite a common scenario. We build our struct from a JSON model, then we apply some common validation, which happens quite often in our use cases.
Since User is a struct, the instance will be copied before it is passed to a function. That happens on the stack, which seems fine. So what about the properties? As expected, the properties are copied too, with one condition: they must be value types. What if one of the properties is a reference type like UIImage? Then our UIImage instance will not be copied. Only another pointer to it will be created. So we have two references to a single instance, which means we need to retain the underlying storage. This operation must be thread-safe, which makes it a little more costly.
What about String? Strings in Swift are value types backed by reference types that indirectly keep their contents on the heap, so the same situation applies to strings too. Their reference count must be increased before the assignment and decreased once they go out of scope. As I mentioned above, this operation must be thread-safe too.
So in my example, I have 6 calls to retain and 6 calls to release, which means 12 thread-synchronized operations.
And thanks to our great application architectures, before actually displaying the User on the screen, I need to pass it through several steps: JSON parsing, field validation, business validation, formatting, interactor, presenter, view controller, view, etc.
Let’s think of an example: imagine an architecture that contains 6 steps, managed by different classes/functions.
JSON Parsing -> Field Validation -> Business Validation -> Formatting -> Interactor -> View Controller
Assume that I pass this User struct, which contains 6 such properties, through each of these steps only once (and none of the calls are inlined). With 6 retains and 6 releases per assignment, I end up with (6 + 6) * 6 = 72 reference counting operations. That is a huge overhead.
What if I made User a reference type? I would need just 1 retain/release pair for each assignment or pass. That would end up with (1 + 1) * 6 = 12 reference counting operations.
So for a basic User instance, we end up with 12 vs. 72 operations. Is it hard to make a choice? I don’t think so. A struct that contains n reference types, or value types backed by reference types, will be *n times slower than a reference type that holds the same storage, because it needs 2n instead of 2 reference counting operations per pass.
*This statement considers memory management only. Software performance has other aspects too.
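The arithmetic above can be written down as a tiny model. It encodes only the counting assumptions stated above (one retain/release pair per reference-counted field for the struct, one pair for the class, no inlining). It is not a measurement.

```swift
// Reference counting operations for passing a value through `steps` stages,
// under the counting assumptions above (no inlining, no optimizer help).
func refCountingOps(refCountedFields n: Int, steps: Int, isClass: Bool) -> Int {
    let pairsPerPass = isClass ? 1 : n   // a class needs one pair, a struct one per field
    return 2 * pairsPerPass * steps      // each pair = one retain + one release
}

let structOps = refCountingOps(refCountedFields: 6, steps: 6, isClass: false)
let classOps = refCountingOps(refCountedFields: 6, steps: 6, isClass: true)

print(structOps)            // 72
print(classOps)             // 12
print(structOps / classOps) // 6
```

Under this model, the ratio between the two is always n, the number of reference-counted fields, regardless of how many steps there are.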
So, finally, I have lost the “being fast” advantage of using a value type too. And this time it’s even worse than with reference types.
Exercise for the reader: think about the performance characteristics of an architecture like ReSwift, a Redux implementation based on state structs that may contain lots of strings and arrays.
When to use value types
Although the documentation makes this quite clear, I want to add a few small, fundamental checks to go through before making this decision.
Of course, the decision tree above is not an indisputable fact. You may have exceptions. For example, you may have a struct with many mixed fields, or other reasons or design decisions to prefer value types. That’s fine. Just take these considerations into account, know what you are doing, and do it wisely.
Update #2 (06.09.2018)
After this article was published, Apple updated the documentation along with the changes in Swift 4.2, and the language internals changed a lot. As I mentioned above, I’m preparing another article about that. Until then, you should know that since Swift 4.2, Apple recommends defaulting to structs and using classes only when Objective-C interoperability or control over the identity of data is required.
So the statements below are correct only for Swift versions 1.0 – 4.1.
The official documentation on Choosing Between Classes and Structures says the following:
However, structure instances are always passed by value, and class instances are always passed by reference. This means that they are suited to different kinds of tasks.
As a general guideline, consider creating a structure when one or more of these conditions apply:
- The structure’s primary purpose is to encapsulate a few relatively simple data values.
- Any properties stored by the structure are themselves value types, which would also be expected to be copied rather than referenced.
Examples of good candidates for structures include:
- The size of a geometric shape, perhaps encapsulating a width property and a height property, both of type Double.
- A way to refer to ranges within a series, perhaps encapsulating a start property and a length property, both of type Int.
- A point in a 3D coordinate system, perhaps encapsulating x, y and z properties, each of type Double.
In all other cases, define a class, and create instances of that class to be managed and passed by reference. In practice, this means that most custom data constructs should be classes, not structures.
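For illustration, here are the documentation’s three example candidates written as Swift structs. The type names are hypothetical. Every stored property is itself a simple value type, so a copy is fully independent and no reference counting is involved.

```swift
struct Size {
    var width: Double
    var height: Double
}

struct SeriesRange {
    var start: Int
    var length: Int
}

struct Point3D {
    var x: Double
    var y: Double
    var z: Double
}

let origin = Point3D(x: 0, y: 0, z: 0)
var moved = origin
moved.x = 10  // changes only `moved`

print(origin.x, moved.x)           // 0.0 10.0
print(MemoryLayout<Point3D>.size)  // 24: three Doubles stored inline
```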
POC for Performance
Update #3 (06.09.2018)
As I mentioned above, the language’s performance is directly affected by the calling convention changes in Swift 4.1. So the POC results aren’t applicable to versions above 4.0.
Here is a POC for the case where a struct and a class each contain 10 other classes.
Important: This POC was created and tuned to prove the aforementioned statements. The results can easily vary in different situations, and they shouldn’t be generalized!
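The original POC source isn’t reproduced here. The following is a hypothetical micro-benchmark in the same spirit, not the author’s code. A struct and a class each hold 10 class instances. Storing a copy of the struct into an array slot must retain all 10 references, while storing the class retains only one object. Timings depend heavily on the compiler version and optimization flags, so no numbers are claimed.

```swift
import Dispatch

// Ten reference-counted fields, as in the POC's setup.
final class Box {
    var value = 1
}

struct StructHolder {
    var b0 = Box(), b1 = Box(), b2 = Box(), b3 = Box(), b4 = Box()
    var b5 = Box(), b6 = Box(), b7 = Box(), b8 = Box(), b9 = Box()
}

final class ClassHolder {
    var b0 = Box(), b1 = Box(), b2 = Box(), b3 = Box(), b4 = Box()
    var b5 = Box(), b6 = Box(), b7 = Box(), b8 = Box(), b9 = Box()
}

// Storing a struct copy retains every Box; storing a class retains one object.
@inline(never)
func fill(_ slots: inout [StructHolder], with holder: StructHolder) {
    for i in slots.indices { slots[i] = holder }
}

@inline(never)
func fill(_ slots: inout [ClassHolder], with holder: ClassHolder) {
    for i in slots.indices { slots[i] = holder }
}

// Returns elapsed wall-clock time in milliseconds.
func measure(_ body: () -> Void) -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    body()
    return Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000
}

let count = 100_000
let structHolder = StructHolder()
let classHolder = ClassHolder()
var structSlots = [StructHolder](repeating: structHolder, count: count)
var classSlots = [ClassHolder](repeating: classHolder, count: count)

let structTime = measure { fill(&structSlots, with: structHolder) }
let classTime = measure { fill(&classSlots, with: classHolder) }
print("struct: \(structTime) ms, class: \(classTime) ms")
```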
I ran the performance tests on my computer with the following specs:
- MacBook Pro 15" (Mid 2014)
- 2.2 GHz Intel Core i7
- 16 GB 1600 MHz DDR3
and the following toolchain:
- Apple Swift version 4.0.3 (swiftlang-900.0.74.1 clang-900.0.39.2)
- Target: x86_64-apple-macosx10.9
I tested each configuration with 100 iterations. The results are averages in milliseconds, and smaller is better 🙃
When I disassembled these 3 binaries, I saw that the single-file optimized version provides the better POC for my assumption. In other configurations, compiler optimizations may take the output far away from the conceptual expectations. In the whole module optimization case in particular, the compiler optimizes the code in favor of classes. On the other hand, when I profiled the executions for all configurations, I could clearly see that most of the execution time is consumed by the retain and release methods.
TLDR & Moral
Since this is a long post, I want to summarize the main point. By creating value types that contain reference types, or value types backed by reference types, we lose most of the advantages of value semantics. So in these cases, you should think twice or thrice, taking these constraints into account, before using value types. Unless value types give you some other benefit (if any) that makes them worth using, you should prefer reference types. This is not about performance, optimization, or any other single perspective. This is about doing things right, avoiding unexpected errors, and using the features provided by the programming language wisely.
In the forthcoming second part of the article, I will go into more detail and try to explain what happens when we use value types with protocols and/or generics.
Thanks for reading this long article ❤️ Please feel free to 💬 and share this article if you 👍 it. You can 🏃 🏃 me on Medium/Twitter for the second part of the article.