Concurrency in iOS. Async Data Source.

Sasha Terentev
Feb 9, 2024

The previous article: GCD. Readers–Writers Problem

The next article: Async and Declarative Layouts. Async Rendering.

In the previous article we delved into concurrency essentials, exploring the Readers-Writers problem and its solution leveraging GCD.

Building upon this foundation, let’s delve deeper into the concept of a Shared Resource and its application as the Data Source of a UI component.

In this discussion, we aim to address related challenges such as Inconsistency and the synchronous retrieval of the data source state from the UI thread.

As always, let’s start with an illustrative example.

If you’re looking for the final GCD-based solution with all the fixes and improvements, including immediate (synchronous and non-blocking) access to the data source state from a predefined queue such as the Main queue, scroll to the Specific Queue Observer section at the end of the article.

Please also read the section Internal Thread-Safety of State.

Example

As you may recall from our previous article, we presented an implementation of a Shared Resource utilizing GCD. Let’s make some adjustments to this resource to adapt it for a specific scenario: serving as the data source for a UI component that manages a collection of items.

Summarizing the requirements:

  • UI State Description: The object must accurately describe the state of the UI.
  • Updates After Persistent Data Reading: It should update accordingly after reading persistent data.
  • Handling Network Events: Capable of handling events received from the network.
  • Dedicated Queue for Heavy Operations: Requires its own queue to handle resource-intensive operations efficiently.
  • UI Reflects DataSource Changes: Changes to the DataSource should be promptly reflected in the UI.

Note: We won’t delve into the details of persistent data reading or specific network operations. They were merely mentioned as examples to highlight heavy and asynchronous operations with the DataSource.

Note 2: Unobserving functionality is intentionally omitted from the article to keep the code as concise as possible.

Solution 1

As a starting point, we’ll leverage the solution provided in the previous article and make minor adjustments to the information stored within the shared resource to align with the specified requirements.

The key modification will be the implementation of a mechanism to observe changes of the resource:

protocol DataSourceObserver {
func dataSourceChanged(_ dataSource: DataSource)
}

typealias Item = String
typealias Header = String
typealias Footer = String

class DataSource {
var header = Header()
var items = [Item]()
var footer = Footer()

var observers = [DataSourceObserver]()

func change(_ block: @escaping (DataSource) -> Void) {
queue.async(flags: .barrier) {
block(self)
self.observers.forEach { $0.dataSourceChanged(self) }
}
}

func read(_ block: @escaping (DataSource) -> Void) {
queue.async {
block(self)
}
}

private let queue = DispatchQueue(label: "com.data_source", qos: .userInitiated, attributes: .concurrent)
}

And here is the UI component code provided:

class CollectionComponent: DataSourceObserver {
let dataSource: DataSource
init(dataSource: DataSource) {
self.dataSource = dataSource
dataSource.observers.append(self)
}

func dataSourceChanged(_ dataSource: DataSource) {
reload()
}

func reload() {}
}

You may notice that I intentionally violated the rule introduced in the previous article: the properties of DataSource are made public for mutation. This was done to underscore the problems resulting from such a violation.

Thread-Safety

Since the mutable properties are public, nothing at compile time or at runtime prevents them from being accessed and mutated from any thread.

class CollectionComponent: DataSourceObserver {    
func dataSourceChanged(_ dataSource: DataSource) {
DispatchQueue.main.async { self.reload() }
}

func reload() {
// UI thread
let items = dataSource.items // bad access is possible
}
}

I presume that such problems will eventually be detected, as they often lead to crashes such as memory access violations, and the call stack of each crash should provide informative clues. This type of issue can be addressed by using a Swift actor, although it comes at the cost of losing parallel reads, since an actor runs its isolated code serially, one operation at a time.

Inconsistency

The properties may exhibit inconsistency during reading, as they can be changed between successive readings.

These issues may go unnoticed if they don’t immediately result in crashes during reading.

Note: Swift Actors do not address inconsistency problems.

To tackle inconsistency, let’s revisit the definition of Data consistency: it refers to the compliance of the data with each other, ensuring data integrity and internal non-contradiction.

Example: It is essential to keep the states of a UICollectionView, which include both its Data Source and Layout, consistent during transitions. If the updates made to a collection view do not correspond to the changes between the previous data source (or layout) and the final one, the collection view can raise an NSInternalInconsistencyException. I have described some other possible reasons for such exceptions thrown from a UICollectionView here.

To illustrate that the problem remains even with Swift Actors, let’s convert our data source to an actor:

actor DataSourceActor {
// Observing will be added further

var header = Header()
var items = [Item]()
func setItems(items: [Item]) {
self.items = items
}
func clear() {
items = []
}
var footer = Footer()
}

@MainActor
class CollectionComponentForActor {
let dataSource: DataSourceActor
init(dataSource: DataSourceActor) {
self.dataSource = dataSource
}

func updateItem(at index: Int, update: (Item) -> Item) async {
var updatedItems = await dataSource.items
updatedItems[index] = update(updatedItems[index])
await dataSource.setItems(items: updatedItems)
}
}

The issue here arises from the fact that dataSource.items may be altered by some other operation between var updatedItems = await dataSource.items and await dataSource.setItems(items: updatedItems).

For instance, if this “wedged” operation removes all items (dataSource.clear()), the subsequent await dataSource.setItems(items: updatedItems) will put all the removed items back into the data source.

Due to reentrancy, such a situation is possible even if similar code is located within a method of the actor, particularly if this method contains a suspension point (await someAsyncFunc()):

actor DataSourceActor {
...
func reloadItem(at index: Int) async {
let item = items[index]
let updatedItem = await loadUpdates(for: item)
// At this moment self.items may not contain the initial item,
// and `index` may even be out of bounds!
items[index] = updatedItem
}
func loadUpdates(for item: Item) async -> Item {
// Placeholder: load and return the updated item
item
}
}
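One common way to make such a method tolerate reentrancy is to look the item up again after the suspension point instead of trusting the index captured before it. The following is only a minimal sketch under assumptions that are not part of the article: `IdentifiedItem`, its `id` field, and `ReentrancySafeDataSource` are hypothetical names, and `loadUpdates(for:)` is a stand-in for real asynchronous work.

```swift
// Hypothetical item type with a stable identifier (the article's Item is a plain String)
struct IdentifiedItem: Equatable {
    let id: Int
    var title: String
}

actor ReentrancySafeDataSource {
    private(set) var items: [IdentifiedItem]

    init(items: [IdentifiedItem]) {
        self.items = items
    }

    func clear() {
        items = []
    }

    /// Returns true if the item was still present after the suspension and was updated.
    @discardableResult
    func reloadItem(withID id: Int) async -> Bool {
        guard let item = items.first(where: { $0.id == id }) else { return false }
        let updatedItem = await loadUpdates(for: item)
        // Other calls may have run on the actor during the await,
        // so the item is looked up again before writing.
        guard let index = items.firstIndex(where: { $0.id == id }) else { return false }
        items[index] = updatedItem
        return true
    }

    private func loadUpdates(for item: IdentifiedItem) async -> IdentifiedItem {
        await Task.yield() // suspension point standing in for real work
        return IdentifiedItem(id: item.id, title: item.title + " (updated)")
    }
}
```

If `clear()` runs while `loadUpdates(for:)` is suspended, the second lookup fails and the stale update is dropped rather than written back.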

Reading Thread

Both implementations, whether GCD-based or actor-based, impose the same restriction on the reading code: any access to a mutable property should occur within the Data source’s thread/queue. This constraint necessitates asynchronous reading of the data source info. It would be beneficial to find a safe workaround for this limitation.

The Safest Non-Blocking Solution. State Object

To mitigate or minimize the occurrence of the aforementioned problems, we must address their root cause: public scattered mutable properties.

In other words, we need to invert each of these qualities. The properties of the data source should be:

  • Private;
  • Immutable;
  • Combined together.

Another factor contributing to the problem is a non-atomic change, or in other words, the sequence of operations: { read, change, write }.

As a solution, I propose employing the following technique: encapsulating the properties within a private State object.

And handling the communication between the data source and the UI:

The code implementing this approach would be as follows:

typealias Item = String
typealias Header = String
typealias Footer = String
struct State {
// All properties are constants: a change always means building a whole new State
let header: Header
let items: [Item]
let footer: Footer
}
struct StateUpdate { // To differentiate state=nil and no update cases
let newState: State?
}

protocol DataSourceObserver {
func dataSourceChanged(_ newState: State?)
}

For a GCD-based data source, multiple reading operations are permitted to occur in parallel:

class DataSource {
private var state: State? {
didSet {
observers.forEach { $0.dataSourceChanged(state) }
}
}

private var observers = [DataSourceObserver]()
func add(observer: DataSourceObserver, notifyOnAdded: Bool) {
update { _ in
self.observers.append(observer)
if notifyOnAdded { observer.dataSourceChanged(self.state) }
return nil
}
}

func update(_ update: @escaping (State?) -> (StateUpdate?)) {
queue.async(flags: .barrier) {
guard let update = update(self.state) else { return }
self.state = update.newState
}
}

func read(_ block: @escaping (State?) -> Void) {
queue.async {
block(self.state)
}
}

private let queue = DispatchQueue(label: "com.data_source", qos: .userInitiated, attributes: .concurrent)
}

In contrast to the GCD case, in the actor variant, both types of operations, read and write, are sequential (serial):

actor DataSourceActor {
private(set) var state: State? {
didSet {
observers.forEach { $0.dataSourceChanged(state) }
}
}
func updateState(_ update: (State?) -> (StateUpdate?)) {
guard let update = update(state) else { return }
state = update.newState
}

private var observers = [DataSourceObserver]()
func add(observer: DataSourceObserver, notifyOnAdded: Bool) {
observers.append(observer)
if notifyOnAdded { observer.dataSourceChanged(self.state) }
}
}

Undoubtedly, it’s still possible to conceive of inconsistent code, but now it’s significantly more challenging due to the design choice of having immutable state properties and always requiring a full state reset.
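To make the { read, change, write } point concrete, here is a hedged usage sketch of the state-based API. It uses a stripped-down copy of the GCD data source without observers; the queue label and the sample values are assumptions made only for this illustration.

```swift
import Dispatch

struct State: Equatable {
    let header: String
    let items: [String]
    let footer: String
}

struct StateUpdate { // distinguishes "set state to nil" from "no update"
    let newState: State?
}

final class DataSource {
    private var state: State?
    private let queue = DispatchQueue(label: "com.data_source.sketch", attributes: .concurrent)

    func update(_ update: @escaping (State?) -> StateUpdate?) {
        queue.async(flags: .barrier) {
            guard let update = update(self.state) else { return }
            self.state = update.newState
        }
    }

    func read(_ block: @escaping (State?) -> Void) {
        queue.async { block(self.state) }
    }
}

let dataSource = DataSource()

// 1. Replace the whole state.
dataSource.update { _ in
    StateUpdate(newState: State(header: "Header", items: ["a", "b"], footer: "Footer"))
}

// 2. Read, change, and write inside one barrier operation:
//    no other operation can run between the read and the write.
dataSource.update { state in
    guard let state = state, state.items.indices.contains(1) else {
        return nil // nothing to change: no update
    }
    var items = state.items
    items[1] = items[1].uppercased()
    return StateUpdate(newState: State(header: state.header, items: items, footer: state.footer))
}
```

Returning `StateUpdate(newState: nil)` from the closure would clear the state, while returning `nil` leaves it untouched.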

The issue with reading from different threads has also been mitigated. Since the state is immutable, there’s no risk associated with accessing it from any thread. To avoid making asynchronous calls each time we need to read the state property, we can simply store a copy of the state in any thread, particularly in the UI code, whenever the state is updated.

class CollectionComponent: NSObject, DataSourceObserver, UICollectionViewDataSource {
func setup(with dataSource: DataSource) {
dataSource.add(observer: self, notifyOnAdded: true)
}

func setup(with dataSource: DataSourceActor) {
Task {
await dataSource.add(observer: self, notifyOnAdded: true)
}
}

func dataSourceChanged(_ newState: State?) {
// The only place where we hand the state over to the UI thread, so that self.state does not need to be atomic
DispatchQueue.main.async {
self.state = newState
}
}

private var collectionView: UICollectionView?
private var state: State? {
didSet {
collectionView?.reloadData()
}
}
private var items: [Item] { state?.items ?? [] }

func numberOfSections(in collectionView: UICollectionView) -> Int {
1
}

func collectionView(_ collectionView: UICollectionView, numberOfItemsInSection section: Int) -> Int {
items.count
}

func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
let item = items[indexPath.item]
// set up and return the cell for `item`
}
}

The introduced solution is the safest of those considered in this article. There’s only one place in the code where we need to wait for an asynchronous read of the state: during UI setup.

In the Compromises section of the article, I’ll outline various workarounds for enabling synchronous reading of the data source information. However, it’s important to note that each workaround will come with its own set of obstacles. In some cases, the code may become less safe or less performant as a result.

I hope the following timeline scheme will aid in better understanding the process of state reading and switching, as well as clarify what is happening in each queue:

In the case of the actor, the scheme remains similar, but it’s essential to note that all data source operations are sequential (serial), as mentioned previously.

Internal Thread-Safety of State

The proposed solution is entirely thread-safe only if the State object is completely immutable. In other words, we guarantee thread safety only for changes to the dataSource.state property. If the State is a reference type, we ensure thread safety only for operations with the corresponding pointer (reference) placed inside the data source. If the State is a value type, such as a Struct, the struct value itself is safe.

However, if the State contains any mutable information inside, even if it’s deeply nested, thread safety is not guaranteed for this mutable information. It’s crucial to note that the problem may still exist even if the State is a value type.

To clarify, let’s consider some examples.

Value-type State:

import Foundation

struct State {
let string: String
let nsString: NSString
let array: NSArray
let dictionary: NSDictionary
}

// Safe
_ = State(string: "test", nsString: "test", array: [NSObject()], dictionary: [:])

// Not safe: the mutable string may still be changed from another thread
let someMutableStringFromOtherThread = NSMutableString(string: "test")
let string = someMutableStringFromOtherThread
_ = State(string: string as String, nsString: string, array: [string], dictionary: ["key": string])

Reference-type State:

// Safety remains the same as for a value-type State.
class ImmutableState {
let string: String
let nsString: NSString
let array: NSArray
let dictionary: NSDictionary
}


/// Every property access is not safe and also
/// inherits all safety problems from the previous variants
class MutableState {
var string: String
var nsString: NSString
var array: NSArray
var dictionary: NSDictionary
}

Just to note, if the arguments provided to the init method are not thread-safe at the moment of initialization of State, the code is not thread-safe.
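One way to narrow this gap is to take a defensive copy of Foundation values inside the initializer. The sketch below is an illustration, not part of the original solution: `SafeState` is a hypothetical name, and the copy is only safe if nothing mutates the argument while `init` runs, which is exactly the condition stated above.

```swift
import Foundation

struct SafeState {
    let nsString: NSString

    init(nsString: NSString) {
        // `copy()` yields an immutable snapshot even when the caller
        // passes an NSMutableString typed as NSString.
        self.nsString = nsString.copy() as! NSString
    }
}

let mutable = NSMutableString(string: "test")
let state = SafeState(nsString: mutable)

// Later mutations of the original no longer reach the stored value.
mutable.append(" changed")
```

The same idea applies to NSArray and NSDictionary properties, although a shallow copy does not protect mutable objects stored inside the collection.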

Compromises. Synchronous Access

Using Locks

One potential solution that comes to mind is to use locks for reading or changing the data source properties. However, using locks will inevitably lead to blocking the UI for some duration. In the worst-case scenario, if a deadlock occurs, certain threads may remain locked indefinitely.

If I’m not mistaken, this solution is not applicable for the actor variant. Let’s therefore focus on the GCD implementation.

And in this case, we have two options to consider.

Atomic Properties

We may make the state property atomic. However, as we’ve learned in the section Internal Thread-Safety of State, it’s crucial to remember that safety is guaranteed only for accessing the dataSource.state property itself, not for the nested memory of State. This solution is similar to the Objective-C atomic property attribute:

class DataSource {
private let lock = NSLock()
/// - Warning: never use this property even in this class except `state` property implementation!
private var _state: State?
private(set) var state: State? {
set {
lock.lock()
_state = newValue
lock.unlock()
// Pass `newValue` to the observers and notify them outside the lock, so observer code never runs while the lock is held
observers.forEach { $0.dataSourceChanged(newValue) }
}
get {
lock.lock()
defer {
lock.unlock()
}
return _state
}
}

private var observers = [DataSourceObserver]()
func add(observer: DataSourceObserver, notifyOnAdded: Bool) {
update { _ in
self.observers.append(observer)
if notifyOnAdded { observer.dataSourceChanged(self.state) }
return nil
}
}

func update(_ update: @escaping (State?) -> (StateUpdate?)) {
serialQueue.async {
guard let update = update(self.state) else { return }
self.state = update.newState
}
}

private let serialQueue = DispatchQueue(label: "com.data_source", qos: .userInitiated)
}

Note: We may modify this solution and place self.state = update.newState inside the lock. This would transform the solution into the approach described in the compromise Using Locks. However, calling the block provided via the public API, update: @escaping (State?) -> (StateUpdate?), and even notifying the observers inside the lock, is unsafe as it increases the possibility of facing a deadlock in the API user’s code.

In this case, the queue is serial, and we use it only for calculations of a new State. We no longer need the async read() function. State updates are atomic as well, which is guaranteed by the implementation of the update() function. However, the inconsistency problem may arise in the following UI code:

func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
// `title` and `detailedTitle` are illustrative properties of a custom cell and item type
let cell = UICollectionViewCell()
cell.title = dataSource.state?.items[indexPath.item].title
// The state may be changed between these lines from the data source queue
cell.detailedTitle = dataSource.state?.items[indexPath.item].detailedTitle
return cell
}
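One way to avoid this particular inconsistency is to read the atomic property once and take every field from that single snapshot. The following is a minimal sketch without UIKit; `AtomicStateBox`, `CellContent`, and the `Item` fields are hypothetical stand-ins for the data source, the cell, and a richer item type.

```swift
import Foundation

struct Item {
    let title: String
    let detailedTitle: String
}

struct State {
    let items: [Item]
}

/// Stand-in for a data source with a lock-protected `state` property
final class AtomicStateBox {
    private let lock = NSLock()
    private var _state: State?
    var state: State? {
        get {
            lock.lock()
            defer { lock.unlock() }
            return _state
        }
        set {
            lock.lock()
            _state = newValue
            lock.unlock()
        }
    }
}

/// Stand-in for the values a cell would display
struct CellContent: Equatable {
    let title: String?
    let detailedTitle: String?
}

func cellContent(at index: Int, from box: AtomicStateBox) -> CellContent {
    // One atomic read: both fields below come from the same State value.
    let snapshot = box.state
    guard let items = snapshot?.items, items.indices.contains(index) else {
        return CellContent(title: nil, detailedTitle: nil)
    }
    return CellContent(title: items[index].title, detailedTitle: items[index].detailedTitle)
}
```

This keeps the two fields of one cell consistent with each other, but it does not guarantee that the snapshot matches the item count the collection view was last told about.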

Sync GCD Operations

As a different solution, we may allow operations to be added to the data source queue synchronously. To avoid deadlocks, we must ensure that operations running on this queue never wait in the opposite direction, that is, on a lock or a synchronous call into one of the data source user’s threads.

class DataSource {
// Warning! Never use locks in the `self.queue` operations

private var state: State? {
didSet {
observers.forEach { $0.dataSourceChanged(state) }
}
}
private var observers = [DataSourceObserver]()
func add(observer: DataSourceObserver, notifyOnAdded: Bool, sync: Bool = false) {
update(sync: sync) { _ in
self.observers.append(observer)
if notifyOnAdded { observer.dataSourceChanged(self.state) }
return nil
}
}
func update(sync: Bool = false, _ update: @escaping (State?) -> (StateUpdate?)) {
let operation = {
guard let update = update(self.state) else { return }
self.state = update.newState
}
sync ? queue.sync(flags: .barrier, execute: operation) : queue.async(flags: .barrier, execute: operation)
}
func read(sync: Bool = false, _ block: @escaping (State?) -> Void) {
let operation = {
block(self.state)
}
sync ? queue.sync(execute: operation) : queue.async(execute: operation)
}

private let queue = DispatchQueue(label: "com.data_source", qos: .userInitiated, attributes: .concurrent)
}
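A cheap way to catch one class of such deadlocks early is to assert that a synchronous call is not made from the data source queue itself. The sketch below uses the Dispatch function `dispatchPrecondition(condition:)`; the `SyncReadableStore` type, its `Int` value, and the queue label are assumptions made for brevity.

```swift
import Dispatch

final class SyncReadableStore {
    private var value = 0
    private let queue = DispatchQueue(label: "com.data_source.sync_sketch", attributes: .concurrent)

    func set(_ newValue: Int) {
        queue.async(flags: .barrier) {
            self.value = newValue
        }
    }

    func readSync() -> Int {
        // Stops execution if this method is called from an operation
        // that is already running on `queue`.
        dispatchPrecondition(condition: .notOnQueue(queue))
        return queue.sync { value }
    }
}

let store = SyncReadableStore()
store.set(42)
// The sync read is enqueued after the barrier write, so it observes the new value.
let current = store.readSync()
```

The check does not cover the opposite direction, where a queue operation waits on the calling thread, so the warning in the code comment above still applies.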

Non-Blocking Reading from a Specific Queue

The last compromise I want to consider involves optimizing for non-blocking reading from a specific queue.

If we are certain that we are going to use our Data source from a specific serial queue, such as the Main queue, we can modify our code to enable immediate reading access without requiring a lock from this queue.

To achieve this, we have different options, and I want to consider a couple of them.

Providing Target Queue for Writing Operations

If we aim to ensure reading safety from a specific queue different from the data source, we can synchronize writing operations between two queues. I utilized this solution in a scenario where we already had an asynchronous data source with numerous public state properties that was used in a lot of places of the project.

class DataSource {
private let safeReadingQueue: DispatchQueue?
init(safeReadingQueue: DispatchQueue? = nil) {
self.safeReadingQueue = safeReadingQueue
}

// Safe to read from `self.safeReadingQueue` and `self.queue`
private(set) var state: State?

func update(_ update: @escaping (State?) -> (StateUpdate?)) {
let operation = {
guard let update = update(self.state) else { return }
if let safeReadingQueue = self.safeReadingQueue {
safeReadingQueue.sync(flags: .barrier) { // SYNC!
self.state = update.newState
}
} else {
self.state = update.newState
}
}
queue.async(flags: .barrier, execute: operation)
}

// the same as before
func read(_ block: @escaping (State?) -> Void) {
queue.async { block(self.state) }
}

/// Please ensure to never synchronously add operations to this queue
/// to avoid encountering a deadlock with `self.safeReadingQueue`
private let queue = DispatchQueue(label: "com.data_source", qos: .userInitiated, attributes: .concurrent)
}

So, if we provide safeReadingQueue: .main, the update scheme will be as follows:

The solution involves no lock inside safeReadingQueue, at the same time ensuring that immediate reading access is safe. However, we use locks inside the data source queue, with all the attendant consequences for this queue (but not for the safeReadingQueue).

Specific Queue Observer

The idea is as follows:

  1. Create an observer responsible for transferring the observed state to the designated queue.
  2. Place the observer inside the data source and keep it up-to-date from the very beginning to ensure that the observer always has an up-to-date accessible State value.

Here’s a code snippet implementing the described idea:

class SpecificQueueStateObserver: DataSourceObserver {
private let queue: DispatchQueue
init(queue: DispatchQueue, state: State? = nil) {
self.queue = queue
self.state = state
}
private(set) var state: State? {
didSet {
observers.forEach { $0.dataSourceChanged(state) }
}
}
/// Observers for `self.state`. The observers are notified on `self.queue` (the Main queue when `.main` is provided)
var observers = [DataSourceObserver]()

func dataSourceChanged(_ newState: State?) {
queue.async(flags: .barrier) {
self.state = newState
}
}
}

An atomic state observer covers the remaining case: reading the state from any thread. It guards its state property with a lock, so reads and writes of the property itself are thread-safe (with the caveats from the section Internal Thread-Safety of State), at the cost of briefly taking the lock on every access:

class AtomicStateObserver: DataSourceObserver {
init(state: State?) {
self.state = state
}
private var _state: State?
private let lock = NSLock()
private(set) var state: State? {
set {
lock.lock()
_state = newValue
lock.unlock()
observers.forEach { $0.dataSourceChanged(newValue) }
}
get {
lock.lock()
defer {
lock.unlock()
}
return _state
}
}
/// Observers for `self.state`. The observers are notified on the thread that calls `dataSourceChanged(_:)`, i.e. the data source queue
var observers = [DataSourceObserver]()

func dataSourceChanged(_ newState: State?) {
state = newState
}
}

In case you need the complete data source code, here it is:

class DataSource {
/// - parameter setupMainQueueObserver: provide true if you need to observe `self` from Main queue. Providing true leads to adding operations on Main queue on every `self.state` update
/// - parameter setupAtomicObserver: provide true if you need to atomically read `self.state` from any thread. Providing true leads to waiting for the `atomicObserver` lock on every `self.state` update
init(initialState: State?, setupMainQueueObserver: Bool = false, setupAtomicObserver: Bool = false) {
state = initialState
mainQueueObserver = setupMainQueueObserver ? .init(queue: .main, state: initialState) : nil
atomicObserver = setupAtomicObserver ? .init(state: initialState) : nil
mainQueueObserver.flatMap { self.add(observer: $0, notifyOnAdded: false) }
atomicObserver.flatMap { self.add(observer: $0, notifyOnAdded: false) }
}

/// The method creates a copy of `self`.
/// The method is async because we are not able to safely read `self.state` after `self` initialization
/// - parameter setupMainQueueObserver: provide true if you need to observe `self` from Main queue. Providing true leads to adding operations on Main queue on every `self.state` update
/// - parameter setupAtomicObserver: provide true if you need to atomically read `self.state` from any thread. Providing true leads to waiting for the `atomicObserver` lock on every `self.state` update
func copy(setupMainQueueObserver: Bool = false, setupAtomicObserver: Bool = false, _ completion: @escaping (DataSource) -> Void) {
read { state in
completion(.init(initialState: state, setupMainQueueObserver: setupMainQueueObserver, setupAtomicObserver: setupAtomicObserver))
}
}

/// It is a constant (`let`) because the value cannot be changed thread-safely after initialization
let mainQueueObserver: SpecificQueueStateObserver?
func observerOnMainQueue(with observer: DataSourceObserver) {
if let mainQueueObserver {
mainQueueObserver.observers.append(observer)
observer.dataSourceChanged(mainQueueObserver.state)
return
}
/// We cannot set `self.mainQueueObserver = newObserver` here because `newObserver.state` is not actualized
/// and cannot be synchronously actualized as we cannot get the actual `self.state` in sync manner
/// Supporting not initialized state for `mainQueueObserver` is solvable but complex task as it leads us to some State machine for it
let newObserver = SpecificQueueStateObserver(queue: .main)
newObserver.observers.append(observer)
add(observer: newObserver, notifyOnAdded: true)
}

/// It is a constant (`let`) because the value cannot be changed thread-safely after initialization
let atomicObserver: AtomicStateObserver?

private var state: State? {
didSet {
observers.forEach { $0.dataSourceChanged(state) }
}
}

private var observers = [DataSourceObserver]()
func add(observer: DataSourceObserver, notifyOnAdded: Bool) {
update { _ in
self.observers.append(observer)
if notifyOnAdded { observer.dataSourceChanged(self.state) }
return nil
}
}

func update(_ update: @escaping (State?) -> (StateUpdate?)) {
queue.async(flags: .barrier) { // writes need a barrier on the concurrent queue
guard let update = update(self.state) else { return }
self.state = update.newState
}
}

func read(_ block: @escaping (State?) -> ()) {
queue.async {
block(self.state)
}
}


private let queue = DispatchQueue(label: "com.data_source", qos: .userInitiated, attributes: .concurrent)
}

Below, you’ll find the UI code:

/// Block-based DataSourceObserver
struct BlockStateObserver: DataSourceObserver {
typealias Block = (State?) -> Void
private let block: Block
init(block: @escaping Block) {
self.block = block
}

func dataSourceChanged(_ newState: State?) {
self.block(newState)
}
}

class CollectionComponent: NSObject, UICollectionViewDataSource {
func setup(with dataSource: DataSource) {
let stateObserver = BlockStateObserver { [weak self] newState in
self?.state = newState
}
dataSource.observerOnMainQueue(with: stateObserver)
}

/// The rest of the code is the same

private var collectionView: UICollectionView?
private var state: State? {
didSet {
collectionView?.reloadData()
}
}
private var items: [Item] { state?.items ?? [] }

func numberOfSections(in collectionView: UICollectionView) -> Int {
1
}

func collectionView(_ collectionView: UICollectionView, numberOfItemsInSection section: Int) -> Int {
items.count
}

func collectionView(_ collectionView: UICollectionView, cellForItemAt indexPath: IndexPath) -> UICollectionViewCell {
let item = items[indexPath.item]
// set up and return the cell for `item`
}
}

The queue scheme for this solution is illustrated below:

Conclusion

In this article, we introduced an implementation of an asynchronous data source and explored various challenges that may arise during its implementation.

We discussed several compromises concerning synchronous access to the asynchronous data source.

In the next article, we will delve into adding asynchronous layout functionality to our data source.
