Understanding and Addressing Data Race in iOS Development

E-Zou Shen
8 min read · Aug 9, 2024



What is a Data Race?

A data race is a common issue in parallel programming. In our daily development, a data race is likely to occur when all of the following conditions are met:

  1. Two or more threads simultaneously access the same memory location
  2. At least one thread is performing a WRITE operation
  3. These threads are not using any synchronization mechanism

Data races can lead to unpredictable behavior, including data corruption, program crashes, and inconsistency issues that degrade the user experience. In iOS development, because GCD (Grand Central Dispatch) makes it so easy to dispatch work across threads, data races are a particularly noteworthy problem.

It’s important to clarify a common point of confusion: data race versus race condition. A race condition is a situation where the system’s substantive behavior depends on the sequence or timing of events it does not control, leading to unexpected or inconsistent results. For example, two perfectly synchronized bank transfers can still complete in an unintended order: a race condition with no data race at all. Race conditions cover a broader scope than data races, but as they’re not the focus of this article, we won’t elaborate further. The following content focuses on data races.

Common data race scenarios in iOS development

In iOS development, operating on shared resources from different threads can easily lead to issues. The following scenarios are where data races most often slip in unnoticed:

class DownloadManager {
    var downloadedFiles: [String: Data] = [:]

    func downloadFile(_ file: String, completion: @escaping (Data?, Error?) -> Void) {
        DispatchQueue.global().async {
            do {
                let data: Data = try ... // Downloading...
                Thread.sleep(forTimeInterval: 0.1)
                // ❌ Data Race might happen here
                self.downloadedFiles[file] = data
                completion(data, nil)
            } catch {
                completion(nil, error)
            }
        }
    }
}

let manager = DownloadManager()

for i in 1...100 {
    manager.downloadFile("File\(i)") { _, _ in }
}

In this example, a data race may occur when multiple threads write to downloadedFiles at the same time: Swift’s Dictionary is not thread-safe, so concurrent mutation can corrupt its internal state or crash the app. Another common scenario is lazy initialization of a singleton:

class Singleton {
    static var shared: Singleton?

    private init() {}

    static func getInstance() -> Singleton {
        if shared == nil {
            // ❌ Data Race might happen here
            shared = Singleton()
        }
        return shared!
    }
}

DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    let instance = Singleton.getInstance()
    ...
}
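
Worth noting: for the singleton case, Swift has a built-in fix. A static let stored property is initialized lazily and atomically (the compiler gives it dispatch_once-like semantics), so the idiomatic singleton avoids this race entirely:

final class Singleton {
    // Initialized lazily and exactly once, even when first
    // accessed from multiple threads concurrently
    static let shared = Singleton()

    private init() {}
}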

Methods to Solve Data Race

There are two main approaches to solving data races: locks and lock-free techniques. Each has its advantages, disadvantages, and suitable use cases.

Lock

Locks are the most direct and commonly used way to prevent data races. They work by ensuring that only one thread can access a shared resource at a time. The protected code segment is often referred to as the “Critical Section”.

In iOS development, we can use primitives such as NSLock or os_unfair_lock directly. However, I’ll demonstrate the various lock applications using the Lock library, which I maintain. It wraps POSIX locks, os_unfair_lock, and GCD behind a consistent API.

UnfairLock (os_unfair_lock)

UnfairLock is an efficient lock implementation widely used in iOS development. Here’s how we can rewrite the problematic function from the first case:

let lock = UnfairLock()

func downloadFile(_ file: String, completion: @escaping (Data?, Error?) -> Void) {
    DispatchQueue.global().async {
        do {
            let data: Data = try ... // Downloading...
            Thread.sleep(forTimeInterval: 0.1)
            // Protect the write with the lock and
            // unlock as soon as the operation completes
            lock.lock()
            self.downloadedFiles[file] = data
            lock.unlock()
            completion(data, nil)
        } catch {
            completion(nil, error)
        }
    }
}

The lock in this snippet can be swapped for a different lock type to achieve the same effect.

MutexLock (pthread_mutex)

MutexLock is a wrapper for POSIX thread mutex locks, providing more configuration options.

let lock = MutexLock(type: .default)
lock.lock()
// Critical Section
lock.unlock()
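
For example, a recursive mutex lets the thread that already holds the lock acquire it again, which matters when one locked method calls another. A minimal sketch, assuming the library spells the option .recursive (the exact case name is an assumption, mirroring MutexLock(type: .default) above):

let recursiveLock = MutexLock(type: .recursive) // assumed case name

func outer() {
    recursiveLock.lock()
    inner() // re-entering here would deadlock a non-recursive mutex
    recursiveLock.unlock()
}

func inner() {
    recursiveLock.lock()
    // Critical Section
    recursiveLock.unlock()
}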

RWLock (pthread_rwlock)

RWLock provides read-write lock functionality, allowing concurrent read operations, while write operations require exclusive access.

let lock = RWLock()
// Read lock
lock.rdlock()
lock.unlock()
// Write lock
lock.wrlock()
lock.unlock()
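
Applied to a cache like the one in the first example, that might look like the following sketch (Cache is a hypothetical wrapper; reads take the read lock, writes take the write lock):

final class Cache {
    private let lock = RWLock()
    private var storage: [String: Data] = [:]

    func value(for key: String) -> Data? {
        // Multiple readers can hold the read lock at once
        lock.rdlock()
        defer { lock.unlock() }
        return storage[key]
    }

    func set(_ data: Data, for key: String) {
        // Writers get exclusive access
        lock.wrlock()
        defer { lock.unlock() }
        storage[key] = data
    }
}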

ConditionVariable (pthread_cond)

ConditionVariable provides a synchronization mechanism between threads, letting a thread sleep until another thread signals that a specific condition has been met.

let lock = MutexLock()
let condition = ConditionVariable()
var sharedResource = false

// Thread A
DispatchQueue.global().async {
    lock.lock()
    while !sharedResource {
        condition.wait(mutex: lock)
    }
    print("Condition met!")
    lock.unlock()
}

// Thread B
DispatchQueue.global().async {
    lock.lock()
    sharedResource = true
    condition.signal()
    lock.unlock()
}

A quick summary of these locks:

+-------------+-------------------------------------+-------------------------------------+
| Type | Advantages | Disadvantages |
+-------------+-------------------------------------+-------------------------------------+
| UnfairLock | Superior performance, especially | Does not guarantee fairness, may |
| | in situations with low contention. | cause some threads to wait for |
| | Simple implementation and easy | long periods |
| | to use | |
+-------------+-------------------------------------+-------------------------------------+
| MutexLock | Supports recursive locking, | Slightly lower performance |
| | suitable for complex nested | compared to UnfairLock |
| | scenarios. Can be configured with | |
| | different lock types (e.g., | |
| | normal, recursive, error checking) | |
+-------------+-------------------------------------+-------------------------------------+
| RWLock | Allows multiple concurrent read | Relatively complex implementation, |
| | operations, improving performance | may lead to delays in write |
| | in read-heavy scenarios. Suitable | operations |
| | for situations where reads are | |
| | more frequent than writes | |
+-------------+-------------------------------------+-------------------------------------+
| Condition | Provides a more flexible thread | Improper use may lead to deadlocks. |
| Variable | synchronization mechanism, | Compared to simple locks, |
| | suitable for scenarios requiring | understanding and usage are more |
| | waiting for specific conditions | complex |
+-------------+-------------------------------------+-------------------------------------+

GCD

We can use the characteristics of dispatch queues to achieve effects similar to a Mutex or an RWLock: a serial queue behaves like a Mutex, while a concurrent queue combined with a dispatch barrier behaves like an RWLock.

Credit: https://betterprogramming.pub/the-complete-guide-to-concurrency-and-multithreading-in-ios-59c5606795ca

A concurrent queue, backed by GCD’s thread pool, may run multiple tasks in parallel when resources are available, which satisfies the requirement for concurrent reads. When a write is required, a barrier task is submitted to guarantee that no other task in the queue executes at the same time, giving us concurrent reads and exclusive, sequential writes.

// RWLock-like
let concurrentQueue = DispatchQueue(
    label: "concurrent.queue",
    attributes: .concurrent
)
var storage: [String: Data] = [:]

// Read "lock": concurrent sync reads
let snapshot = concurrentQueue.sync {
    // Safe to read
    storage
}

// Write "lock": exclusive barrier write
concurrentQueue.async(flags: .barrier) {
    // Safe to write
    storage["key"] = Data()
}

As for the mutex, we only need to ensure that there’s only one thread working in the critical section at a time, which is exactly what a serial queue guarantees by design. Simply calling queue.async/sync is enough.

// Mutex-like
let serialQueue = DispatchQueue(label: "serial.queue")

serialQueue.async {
    // Critical Section
}

When using locks, we must carefully design the scope of the critical section: protect exactly the operations that touch shared state, neither expanding the scope so far that threads pile up waiting, nor narrowing it so much that consistency can no longer be guaranteed. Well-designed critical sections improve concurrent performance and reduce thread waiting time, but they don’t by themselves guarantee fairness between threads. Developers need to weigh lock granularity, holding time, and contention frequency to strike a balance between protecting shared resources, performance, and reasonable thread scheduling.
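
As a rule of thumb: do expensive work outside the lock and hold it only for the actual mutation. A quick sketch of the difference, reusing the UnfairLock from earlier (downloadAndDecode is a hypothetical expensive helper):

// ❌ Coarse-grained: the lock is held for the whole download,
// serializing work that never touches shared state
lock.lock()
let data = downloadAndDecode(file)
downloadedFiles[file] = data
lock.unlock()

// ✅ Fine-grained: the expensive work runs concurrently,
// and the lock only guards the shared dictionary
let data = downloadAndDecode(file)
lock.lock()
downloadedFiles[file] = data
lock.unlock()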

Lock-free

Lock-free techniques attempt to achieve thread safety without using locks. These techniques can often provide better performance, especially in high concurrency situations, but are more complex to implement.

Swift Concurrency

The Swift Concurrency model introduced in Swift 5.5 provides a new way to handle concurrency and avoid data races.

actor BankAccount {
    private var balance: Double

    init(initialBalance: Double) {
        balance = initialBalance
    }

    func deposit(_ amount: Double) {
        balance += amount
    }

    func withdraw(_ amount: Double) throws {
        guard balance >= amount else {
            throw NSError(domain: "InsufficientFunds", code: 1, userInfo: nil)
        }
        balance -= amount
    }

    func checkBalance() -> Double {
        balance
    }
}

// Usage
Task {
    let account = BankAccount(initialBalance: 100)
    await account.deposit(50)
    do {
        try await account.withdraw(75)
        let balance = await account.checkBalance()
        print("Current balance: \(balance)")
    } catch {
        print("Withdrawal failed: \(error)")
    }
}

The Actor Model provides a safe and easy-to-use concurrency model, and the compiler can catch potential data races at compile time. Swift Concurrency can be used on iOS 13 and later thanks to back-deployment. A potential downside is the need to use the await keyword, which may affect code structure.
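
Applied to our first example, DownloadManager could become an actor: the compiler then guarantees that downloadedFiles is only touched from actor-isolated code, with no explicit lock. A sketch (the actual download step stays elided, as before):

actor DownloadManager {
    private var downloadedFiles: [String: Data] = [:]

    // Actor isolation serializes access to downloadedFiles
    func store(_ data: Data, for file: String) {
        downloadedFiles[file] = data
    }

    func data(for file: String) -> Data? {
        downloadedFiles[file]
    }
}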


swift-atomics

Swift Atomics mainly relies on two key mechanisms: hardware-level atomic instructions and memory ordering. Hardware-level instructions like Compare-and-Swap (CAS) guarantee the atomicity of operations, while memory orderings such as relaxed, acquiring, releasing, acquiringAndReleasing, and sequentiallyConsistent allow developers to balance performance and safety. Together, these mechanisms implement efficient and safe lock-free concurrent operations.

import Atomics

// ManagedAtomic is a class that wraps a value with atomic operations
let counter = ManagedAtomic<Int>(0)
DispatchQueue.concurrentPerform(iterations: 100) { _ in
    counter.wrappingIncrement(ordering: .relaxed)
}
print(counter.load(ordering: .relaxed)) // 100
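
The compare-and-swap primitive mentioned above is exposed directly as compareExchange, which atomically replaces the stored value only if it still equals the expected one:

// Succeeds only if the counter still holds the expected value
let (exchanged, original) = counter.compareExchange(
    expected: 0,
    desired: 1,
    ordering: .relaxed
)
print(exchanged, original) // false 100 (the counter is no longer 0)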

Since atomic operations ultimately map to hardware instructions, the section of the repo’s README explaining lock-free and wait-free guarantees specifically notes that the result depends on the instructions the target CPU supports:

All atomic operations exposed by this package are guaranteed to have lock-free implementations. However, we do not guarantee wait-free operation — depending on the capabilities of the target platform, some of the exposed operations may be implemented by compare-and-exchange loops. That said, all atomic operations map directly to dedicated CPU instructions where available — to the extent supported by llvm & Clang.

Hardware-level lock-free techniques require care on many fronts, including understanding how hardware and memory access affect performance, and being aware of the ABA problem (the value changes A -> B -> A, so a compare-and-swap sees it as unchanged even though the state was modified in between). While pursuing high performance, the problems become more complex and the chance of subtle errors grows. In practice, the use of lock-free techniques should be weighed against your specific needs and scenarios.

Credit: CppCon 2016: JF Bastien, “No Sane Compiler Would Optimize Atomics”

Conclusion

Data races are a common problem in iOS development and can lead to serious consequences. Correctly understanding and handling them is crucial for building stable, high-performance iOS applications. By fully understanding your needs (performance, deployment target, third-party libraries, how often you handle concurrency, etc.) and choosing an appropriate synchronization mechanism, developers can effectively prevent and solve data races with either locks or lock-free techniques.

However, eliminating data races is not a one-time task. It requires a deep understanding of concurrent program design and attention to thread safety at both the design and implementation stages. Continuous code review and unit testing are also important means of keeping programs free of data races.

With the continuous development of Swift and its community (thanks to all the mighty developers), we have ever more tools for handling concurrency. Continuing to learn and practice the most suitable concurrent programming techniques will help you build more robust and performant programs.

However, regardless of the method, designing your program concurrently undoubtedly makes your business logic more complex. It’s therefore worth examining whether concurrent execution is truly needed. Sequential execution can save us a lot of effort and makes the program easier to maintain.

“Design programs synchronously as much as possible, whenever synchronous execution is feasible.”


E-Zou Shen

Working at Seekrtech as CTO | A passionate full-stack developer.