Mastering HealthKit: Common Pitfalls and Solutions

Published in MobilePeople · 16 min read · Sep 19, 2023

Some time ago, I was asked to write a library to replace a third-party library that handled health data in a project. The main goals were to cut costs by dropping the existing library and to improve security by having the app talk to the back end directly.
PS: The information below is not a production-ready solution; it only reflects the general pitfalls you may run into. I wrote separate, abstract examples to highlight the main pitfalls of the HealthKit framework based on my experience.

The problem

The basic idea of the task is to send new data from HealthKit to the back end. Let me start with the simplest diagram below.

At first glance, it sounds pretty simple: we only need to query data from HealthKit and send it to the server. But a few things make it more complex. Let me split them into two categories: local and server-side challenges.

Local challenges:

How do we fetch only the needed portion of data from HealthKit? Fetching all of the data at once is redundant and inefficient, and the app can even crash. This challenge applies at least to the first app launch.

Server-related challenges:

What if the server requires us to create a concrete object that aggregates data from different sources? For example, a 'Summary' object that contains some information (Sleep, Steps, Weight, etc.) for a specific day. How should we manage IDs for the created objects?
How should we cache results fetched from HealthKit if the server returns an error?
How should we manage mapped objects (created from different sources, like 'Summary' in the previous example) when a user edits or removes some of the underlying data?

Solution

My first intuition was to use footprints (a UUID value for each object) stored on the server. The diagram below shows the basic flow.

For example, on app launch, we fetch the footprints from the server. Then we fetch data from HealthKit and compare it with the server data. If HealthKit has items the server lacks, we add them to the server. If the server has footprints that are missing from HealthKit, we remove them from the server.
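
The comparison step above boils down to two set differences. Here is a minimal sketch, assuming footprints are plain strings; `FootprintDiff` and `diffFootprints` are hypothetical names used only for illustration.

```swift
import Foundation

/// Result of comparing server footprints with local HealthKit IDs.
struct FootprintDiff: Equatable {
    let toAdd: Set<String>     // present locally, missing on the server
    let toRemove: Set<String>  // present on the server, gone locally
}

func diffFootprints(server: Set<String>, local: Set<String>) -> FootprintDiff {
    FootprintDiff(
        toAdd: local.subtracting(server),
        toRemove: server.subtracting(local)
    )
}

let diff = diffFootprints(server: ["a", "b", "c"], local: ["b", "c", "d"])
print(diff.toAdd.sorted(), diff.toRemove.sorted()) // prints ["d"] ["a"]
```

Note that both sets must be fully loaded before the comparison can run, which is exactly the cost discussed next.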

But how do we manage footprints in such cases? For non-grouping data objects, we can use the ID from HealthKit, which is also a UUID. But for grouping data objects (like 'Summary'), we must create a unique UUID on the mobile side, because such an object is composed from other data objects and HealthKit doesn't manage it.

The solution with separate footprints has several drawbacks. First, we need to fetch all data on each app launch (or any other user action) to compare footprints, which adds considerable performance overhead, especially if the user has a lot of data in HealthKit. We also need to map and filter HealthKit objects into server objects; with millions of data objects, this causes a noticeable performance hit in the app. Second, the server must store all footprints and expose a separate endpoint that we would call often. So how do we avoid redundant server-client communication and fetch only the needed data from HealthKit?

Luckily, the HealthKit has an object that gives us a snapshot of the last changes in the database. This object is named HKQueryAnchor.

An object used to identify all the samples previously returned by an anchored object query.

The updated flow looks as follows.

With HKQueryAnchor, we no longer get redundant results from the HealthKit database, and we got rid of the additional API request and the separate storage for footprints.

But where do we store the HKQueryAnchor? Depending on your needs, you can save it in the cloud, on your server, or locally on the device (for instance, in the Keychain). Also, if your app has authorization, remember to store each HKQueryAnchor under the user's unique ID.

It is essential to mention that an HKQueryAnchor can't be reused across different HKSampleTypes. For instance, say you have two queries to HealthKit: the first for HKWorkoutType and the second for HKQuantityType. In such a scenario, you need two different HKQueryAnchor objects. So, if you have many users and each of them queries several types, the storage complexity is O(m * n), where m is the number of users and n is the number of HKSampleTypes in use.
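
One way to keep the anchors apart is to build the storage key from both the user ID and the sample type identifier. The sketch below is only an illustration: `AnchorKey` and the `hk-anchor.` prefix are made-up names, and the identifiers are passed as plain strings so the example does not depend on HealthKit.

```swift
import Foundation

/// Identifies one stored anchor: one per user per sample type.
struct AnchorKey: Hashable {
    let userId: Int
    let sampleTypeIdentifier: String  // e.g. the `identifier` of an HKSampleType

    /// String form usable as a Keychain account or a dictionary key.
    var storageKey: String { "hk-anchor.\(userId).\(sampleTypeIdentifier)" }
}

let userIds = [1, 2]
let typeIdentifiers = ["HKQuantityTypeIdentifierStepCount", "HKWorkoutTypeIdentifier"]

// m users x n sample types gives m * n distinct keys.
let anchorKeys = Set(userIds.flatMap { id in
    typeIdentifiers.map { AnchorKey(userId: id, sampleTypeIdentifier: $0).storageKey }
})
print(anchorKeys.count) // prints 4
```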

The updated flow will have the following diagram.

The code for saving and retrieving the data from the Keychain is presented below.

import Foundation
import HealthKit
import Security

func save(anchor: HKQueryAnchor, userId: Int) {
    do {
        // HKQueryAnchor supports NSSecureCoding, so secure coding can stay enabled.
        let data = try NSKeyedArchiver.archivedData(withRootObject: anchor, requiringSecureCoding: true)
        var attributes: [String: Any] = [
            kSecAttrAccount as String: String(userId), // the account attribute is a string
            kSecValueData as String: data
        ]
        guard let oldAnchor = retrieve(userId: userId) else {
            // Nothing is stored for this user yet: add a new item.
            attributes[kSecClass as String] = kSecClassInternetPassword
            let addStatus = SecItemAdd(attributes as CFDictionary, nil)
            if addStatus != errSecSuccess {
                assertionFailure("*** SecItemAdd error: \(addStatus)")
            }
            return
        }
        guard oldAnchor != anchor else { return }
        // An item already exists: update it in place.
        let query = retrieveQuery(userId: userId, returnData: false)
        let updateStatus = SecItemUpdate(query, attributes as CFDictionary)
        if updateStatus != errSecSuccess {
            assertionFailure("*** SecItemUpdate error: \(updateStatus)")
        }
    } catch let error {
        assertionFailure("*** Unable to store new anchor \(error)")
    }
}

func retrieve(userId: Int) -> HKQueryAnchor? {
    do {
        let query = retrieveQuery(userId: userId)
        var item: CFTypeRef?
        let copyStatus = SecItemCopyMatching(query, &item)
        guard copyStatus == errSecSuccess else {
            // Also reached when nothing has been stored yet (errSecItemNotFound).
            debugPrint("*** SecItemCopyMatching error: \(copyStatus)")
            return nil
        }
        guard let existingItem = item as? [String: Any],
              let data = existingItem[kSecValueData as String] as? Data else {
            debugPrint("*** Something went wrong trying to find the anchor in the keychain")
            return nil
        }
        let anchor = try NSKeyedUnarchiver.unarchivedObject(ofClass: HKQueryAnchor.self, from: data)
        return anchor
    } catch let error {
        assertionFailure("*** Unable to retrieve an anchor \(error)")
        return nil
    }
}

// Builds the lookup query; `returnData: false` gives the bare query needed by SecItemUpdate.
private func retrieveQuery(userId: Int, returnData: Bool = true) -> CFDictionary {
    var query: [String: Any] = [
        kSecClass as String: kSecClassInternetPassword,
        kSecAttrAccount as String: String(userId)
    ]
    if returnData {
        query[kSecMatchLimit as String] = kSecMatchLimitOne
        query[kSecReturnAttributes as String] = true
        query[kSecReturnData as String] = true
    }
    return query as CFDictionary
}

The Keychain can be a good place if your server is limited and you want the data to survive an app uninstall. But remember that in such a case you can easily clutter the Keychain, and the data will stay there even when your app is no longer used.

It is also essential to mention apps that can make HealthKit requests simultaneously for the same user. If the user has different Apple accounts on their devices, we will simply send two different data sets from two accounts. But what if the user has the same account on iOS and iPadOS, for example? In such a case, the result can be tricky and unpredictable. The Keychain should not cause an issue, because it synchronizes keys within the same iCloud account, unless Apple throttles Keychain updates between devices for a long time. But since this is rare, I won't dwell on it now.

If we have a massive amount of data, for example on the first launch, we probably should not send it to our backend in one request, primarily because we could overload the backend. There are several solutions for this, but perhaps the most common is to split the data into batches and send them one by one. But what if one request fails? I see at least two solutions. The first is to store failed requests and resend them on the next iteration. The second is to fetch the data from HealthKit in batches rather than all at once, for example, by querying each month separately. But this could be overcomplicated for most apps, so solution #1 is a reasonable starting point. The code for the batched request can look like this:

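// chunks(ofCount:) comes from the swift-algorithms package (import Algorithms)
Task {
    // One task, one loop: each batch is awaited before the next one is sent.
    for chunk in request.chunks(ofCount: 200) {
        let result = await HKAPI.request(.postHKData(chunk))
        // `footprints` stands for the footprints of the items in the current chunk
        switch result {
        case .success:
            // Only a retried batch has to be removed from the failed list.
            guard isFailedData else { continue }
            database.updateFailedFootprints(remove: footprints)
        case .failure:
            database.updateFailedFootprints(add: footprints)
        }
    }
}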

If you follow the batching approach, don't send the batches in parallel: the time you save comes at the cost of more error handling and, possibly, more bugs.
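
To make the "send one by one and remember failures" idea concrete, here is a small self-contained sketch. It is synchronous for brevity (a real transport would be async), and `chunked(into:)`, `sendInBatches`, and the `send` closure are hypothetical helpers rather than part of HealthKit or any networking library.

```swift
import Foundation

extension Array {
    /// Splits the array into consecutive chunks of at most `size` elements.
    func chunked(into size: Int) -> [[Element]] {
        precondition(size > 0, "chunk size must be positive")
        return stride(from: 0, to: count, by: size).map {
            Array(self[$0..<Swift.min($0 + size, count)])
        }
    }
}

/// Sends `items` batch by batch and returns the batches that failed,
/// so the caller can persist them and retry on the next sync.
func sendInBatches<Item>(
    _ items: [Item],
    batchSize: Int,
    send: ([Item]) -> Bool  // hypothetical transport: returns true on success
) -> [[Item]] {
    var failed: [[Item]] = []
    for batch in items.chunked(into: batchSize) where !send(batch) {
        failed.append(batch)
    }
    return failed
}

// Simulate a server that rejects any batch containing the value 3.
let failedBatches = sendInBatches(Array(1...5), batchSize: 2) { batch in
    !batch.contains(3)
}
print(failedBatches) // prints [[3, 4]]
```

Because the loop is sequential, a failure affects exactly one known batch, which keeps the retry bookkeeping simple.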

But what if a user removes the app and we lose all failed requests? Does it mean they may lose some of their historical data? In this case, unfortunately, yes. One solution could be Keychain storage, which would keep all the pending requests. However, the Keychain is a poor fit for that kind of data. Another option is to keep the last successful HKQueryAnchor object together with a boolean property indicating that a request failed. After that, we can fetch data from the server, work out the differences, and send them again. But both solutions look like a spike with a lot of error handling that could add more bugs than benefit. So I can't recommend them, because this is likely a rare case in production. It's a topic to discuss with the client instead of trying to cover every rare edge case.

Good, we resolved some of the potential issues. Next, let's move to authorization. When a user starts the app for the first time, we need to handle the authorization flow. Also, to improve UX, we can add the possibility of turning HealthKit access off inside the app. This is not a native approach, because iOS doesn't allow us to change the permission from within the app.

There are a few ways to trigger a fetch of new data from HealthKit: on each app launch, manually when the user wants it, or on any other trigger in your app. Because of this, users can spam requests to HealthKit. So, one more thing we need to add is throttling.
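
Throttling can be as small as remembering when the last sync ran. The sketch below is one possible shape, not the implementation from the project; `Throttler` and the 60-second interval are assumptions, and the clock is injected so the behaviour is easy to test.

```swift
import Foundation

/// Allows an action at most once per `minimumInterval`.
struct Throttler {
    let minimumInterval: TimeInterval
    private(set) var lastRun: Date?

    init(minimumInterval: TimeInterval) {
        self.minimumInterval = minimumInterval
    }

    /// Returns true and records `now` if enough time has passed since the last run.
    mutating func shouldRun(now: Date = Date()) -> Bool {
        if let lastRun, now.timeIntervalSince(lastRun) < minimumInterval {
            return false
        }
        lastRun = now
        return true
    }
}

var syncThrottler = Throttler(minimumInterval: 60)
if syncThrottler.shouldRun() {
    // Start the HealthKit fetch here.
}
```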

Since we let the user turn off the authorization in the app, we need to decide where to store their choice. I'd rely on UserDefaults because it's simple, doesn't require any relationships, and can be easily replaced or removed. It's a boolean value (or several values if you have multiple users), so losing it on app removal will not affect our flow. The updated flow will have the following diagram. The grey zone on the diagram below is the part of the flow that could be extracted into a separate target or library.

With authorization, we must decide which data types we will request read and write access for. The read types are also helpful when querying HealthKit, because we can prefetch the needed sources and then fetch data with predicates that use these sources. The access request code looks as follows.

let requestAccess: () async -> Result<Void, Error> = {
    guard HKHealthStore.isHealthDataAvailable() else {
        return .failure(HKClient.Error.healthDataUnavailable)
    }

    do {
        let typesToShare: Set<HKSampleType> = .init(SourceType.writeTypes.map(\.sampleType))
        let typesToRead: Set<HKObjectType> = .init(SourceType.readTypes.map(\.objectType))

        let beforeRequestStatus = try await store.statusForAuthorizationRequest(
            toShare: typesToShare,
            read: typesToRead
        )

        switch beforeRequestStatus {
        case .shouldRequest:
            break
        case .unnecessary:
            return .success(())
        case .unknown:
            return .failure(HKClient.Error.unknownStatusForAuthorizationRequest)
        @unknown default:
            return .failure(HKClient.Error.unknownDefaultStatusForAuthorizationRequest)
        }
        
        try await withUnsafeThrowingContinuation { (continuation: UnsafeContinuation<Void, Error>) in
            store.requestAuthorization(
                toShare: typesToShare,
                read: typesToRead
            ) { _, error in
                if let error {
                    continuation.resume(throwing: error)
                } else {
                    continuation.resume(returning: ())
                }
            }
        }
        return .success(())
    } catch let error {
        return .failure(error)
    }
}

Apart from the read types, we also need two inputs: a user ID to fetch the appropriate HKQueryAnchor, and a start date, which your manager or server should provide because the historical data can be vast and useless for the app. The detailed HealthKit flow will have the following diagram.

After we have received the new or changed data and the removed IDs, we can map them to server data and save the user's updated HKQueryAnchor.

The next part of my solution focuses on workarounds for grouping objects. If you already have an existing backend with grouping objects, as I had, and you cannot modify it, then this part is for you.

First, let's add a new term: checksum, a footprint (ID) of a grouping object. The difference between a footprint ID and a checksum ID is that a checksum, when applied to a grouping object, is formed by the developer and contains many footprints. A footprint is created by HealthKit, and the framework is responsible for returning the same ID for the same object. A checksum, in contrast, is the developer's responsibility; it can be managed manually and saved locally, remotely, or in both places.

PS: If you need to hash your IDs for security reasons, for example, using md5, the best place to do it is right before sending them to the server, by overriding the encode method. This way you avoid many possible bugs in the rest of your app.
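
A rough sketch of that idea follows. The raw ID stays untouched in memory, and the hash is applied only inside `encode(to:)`. To keep the example dependency-free, `placeholderDigest` uses 64-bit FNV-1a as a stand-in for md5 (it is not a cryptographic hash), and the `Footprint` type is hypothetical.

```swift
import Foundation

/// Placeholder digest (64-bit FNV-1a), standing in for md5 or another real hash.
/// It is NOT cryptographically secure; swap in a proper hash for production use.
func placeholderDigest(_ text: String) -> String {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in text.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return String(hash, radix: 16)
}

/// Keeps the raw HealthKit ID in memory and hashes it only when encoded.
struct Footprint: Encodable, Hashable {
    let id: String

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(placeholderDigest(id))
    }
}
```

Comparisons, set lookups, and database keys in the app keep using the raw `id`, so only the payload sent to the server carries the hashed value.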

struct Checksum: Hashable {
    let checksum: String // ID generated by the developer
    let type: String // type, e.g. Summary
    let footprints: Set<String> // IDs from HealthKit
    let timestamp: Date // the day of the checksum
}

In our example, I suggest saving checksums in a database because they can have relationships and there can be many of them. But if we keep them only in the database, a user may remove the app while the database still holds entities that failed to reach the server. For this reason, you need a workaround, for example, encoding this data into the Keychain. Another approach is to force-fetch all HealthKit data objects and search for diffs.

Let's step back to our flow. We fetched the new objects and the removed IDs from HealthKit. Now we need to identify which objects have changed relative to the server. The changed objects are the IDs that HealthKit reports as deleted plus the objects that HealthKit returns as new. And the new objects can be either grouping or non-grouping.

For this reason, we have to map the new objects to new non-grouping IDs and new grouping IDs, where a grouping ID belongs to an object that can have many IDs inside. Then we take all checksum objects from our storage (e.g., the database) and keep those that contain at least one footprint from the changed objects. This is the first part of the objects we must remove from the server. The code for this part looks as follows.

let (newItems, removedIds) = await requestHKChanges(from: startDate)
guard !newItems.isEmpty || !removedIds.isEmpty else { return }
// non-grouping objects will have only 1 ID, which is returned from the HealthKit
// grouping objects will consist of info from many non-grouping objects
let newFootprints = newItems.flatMap({
    $0.isGrouping ? $0.footprints : [$0.uuid.uuidString]
})
let changedFootprints = Set(removedIds + newFootprints)
// here I fetch it from the database but you can choose any type
let checksums = fetchChecksums(for: userId)
let changedDBObjects = Set(checksums
    .filter({ 
      $0.footprints.contains(where: { 
          changedFootprints.contains($0)
      }) 
    })
)
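
For readers who want to run the filtering step in isolation, here is a self-contained variant with sample data. `StoredChecksum` is a simplified, hypothetical stand-in for the checksum model, and `isDisjoint(with:)` replaces the nested `contains(where:)` with the same result.

```swift
import Foundation

struct StoredChecksum: Hashable {
    let id: String
    let footprints: Set<String>
}

/// Returns the stored checksums touched by at least one changed footprint.
func affectedChecksums(
    in stored: [StoredChecksum],
    changedFootprints: Set<String>
) -> Set<StoredChecksum> {
    Set(stored.filter { !$0.footprints.isDisjoint(with: changedFootprints) })
}

let storedChecksums = [
    StoredChecksum(id: "summary-1", footprints: ["a", "b"]),
    StoredChecksum(id: "summary-2", footprints: ["c"])
]
let affected = affectedChecksums(in: storedChecksums, changedFootprints: ["b", "z"])
print(affected.map(\.id)) // prints ["summary-1"]
```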

This flow will have the following diagram representation.

The next part is to find the dates of the grouping objects that have changed and to fetch the HealthKit data for these dates one more time. This is essential when we query the HealthKit database with an existing HKQueryAnchor, that is, when this is not the first query. Otherwise, we would send only the updated part of each grouping object to the server. If your server accepts that, good. In my case, I couldn't change the server logic and needed to remove the updated objects first and then create them again.

let (newGroupingObjects, newNonGroupingServerData) = newItems.grouped()
let removedChecksums = Set(checksums
    .filter({
        $0.footprints.contains(where: {
            removedIds.contains($0)
        })
    })
)
let groupingServerData: [Content]
let groupingObjectsToRemoveAfterDayFetch: Set<Checksum>
// Optimisation for the first fetch of historical data to avoid a double fetch
if hasAnchor(for: userId) {
    let existingChecksumDates = Set(newGroupingObjects.map(\.timestamp.dayStart) + removedChecksums.map(\.timestamp.dayStart))
    // Assumption: grouped() returns (grouping, non-grouping) as above; only the grouping part is needed here
    let (objectsFromDates, _) = await requestChanges(from: existingChecksumDates.sorted()).grouped()
    groupingServerData = objectsFromDates.filter({ existingChecksumDates.contains($0.timestamp.dayStart) })
    let groupingServerDataSet = Set(groupingServerData.flatMap(\.groupingFootprints))
    // if we need to add an existing footprint then we HAVE TO remove its checksum object first
    groupingObjectsToRemoveAfterDayFetch = .init(checksums
        .filter({ groupingSupportedTypes.contains($0.type) }) // only grouping objects were fetched, so only grouping objects can be removed
        .filter({ el in !el.footprints.intersection(groupingServerDataSet).isEmpty }))
} else {
    groupingServerData = newGroupingObjects
    groupingObjectsToRemoveAfterDayFetch = []
}

let objectsToRemove = changedDBObjects.union(groupingObjectsToRemoveAfterDayFetch)
guard !newNonGroupingServerData.isEmpty || !groupingServerData.isEmpty || !objectsToRemove.isEmpty else { return }

In the code above, I added one more request: by dates. This code is pretty simple. We create an HKAnchoredObjectQuery with the type, predicate, and anchor, and set a limit.

func fetchSamples(
    store: HKStore,
    dates: [Date],
    // Important: don't use array and fetch at once because if one permission is denied then the final result will be empty.
    sampleType: HKSampleType,
    sources: HKSourceQuery.Result
) async -> HKSampleQuery.Result {
    return await withUnsafeContinuation { (continuation: UnsafeContinuation<HKSampleQuery.Result, Never>) in
        let query: HKAnchoredObjectQuery = .init(
            type: sampleType,
            predicate: .samplesFromDates(sources, dates),
            anchor: nil,
            limit: HKObjectQueryNoLimit
        ) { _, samples, deleted, _, error in
            continuation.resume(returning: samples ?? [])
            guard let error else { return }
            assertionFailure("*** HKAnchoredObjectQuery error: \(error.localizedDescription)")
        }
        store.execute(query)
    }
}

Also, note the comment in the code: don't pass an array of types and fetch everything at once, because if one permission is denied, the final result will be empty. This is crucial because it is easy to miss this case during testing. So, keep it in mind.

The predicate parameter allows us to fetch only the needed data from HealthKit. In my example, I created a predicate that queries only specific dates and sources. The source of an object identifies the app that saved it. For example, you may need to request only data from the Health app.

var samplesFromDates: (HKSourceQuery.Result, [Date]) -> NSPredicate {
    { sources, dates in
        let dateRangePredicate = dates.map({
            HKQuery.predicateForSamples(withStart: $0.startOfTheDay(), end: $0.endOfTheDay())
        })
        let datePredicate = NSCompoundPredicate(orPredicateWithSubpredicates: dateRangePredicate)
        let sourcePredicate = HKQuery.predicateForObjects(from: sources)
        let predicate = NSCompoundPredicate(andPredicateWithSubpredicates: [datePredicate, sourcePredicate])
        return predicate
    }
}
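
The `startOfTheDay()` and `endOfTheDay()` helpers used by the predicate are not shown in this post. One possible shape, assuming the end of the day means the first instant of the next day, is sketched below; the `calendar` parameter is my addition so the time zone can be controlled in tests.

```swift
import Foundation

extension Date {
    /// Midnight at the start of this date's day in `calendar`.
    func startOfTheDay(in calendar: Calendar = .current) -> Date {
        calendar.startOfDay(for: self)
    }

    /// Midnight at the start of the following day, usable as an upper bound.
    func endOfTheDay(in calendar: Calendar = .current) -> Date {
        let start = calendar.startOfDay(for: self)
        return calendar.date(byAdding: .day, value: 1, to: start) ?? start
    }
}
```

Going through `Calendar` instead of adding 86,400 seconds keeps the boundaries correct on days with daylight saving transitions.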

But since this post is about pitfalls, let me highlight one more issue you may face. Imagine you try to fetch too many dates at once. In such a scenario, the query's predicate becomes very long, which overcomplicates the request to HealthKit. As a result, the app crashes.

I see at least two possible resolutions. The first is to adjust the limit parameter of HKAnchoredObjectQuery. The second is to use the chunks method.

func requestChangesFromDates(_ dates: [Date]) async -> [ServerRequest.Content] {
    await withTaskGroup(of: Result<HKClient.Response, Error>.self) { group in
        // 900 max non-crashable value from tests - you can change it or improve
        dates.chunks(ofCount: 900).forEach { datesChunk in
            group.addTask {
                await hkClient.requestDataFromDates(
                    .init(
                        dates: .init(datesChunk),
                        userId: userId
                    )
                )
            }
        }
        var results = [ServerRequest.Content]()
        for await result in group {
            switch result {
            case .success(let data):
                results += data
            case .failure(let error):
                debugPrint("*** Chunk HK error for dates", error)
            }
        }
        return results
    }
}

The updated flow will have the following diagram.

The final part is about mapping objects and sending them to the server. I'll start with the code snippet.

let deleteObjects: [DeleteType] = objectsToRemove.map({ el in
    // if checksum == first footprint then this is non-grouped type
    var checksum = el.checksum
    if el.checksum == (el.footprints.first ?? "") {
        checksum = checksum.uuidString
    }
    return .init(footprint: checksum, type: el.type)
})

let createdGroupedChecksumObjects: [Checksum] = groupingServerData.map({
    let data = $0.footprint
    return .init(
        checksum: data.uuid.uuidString,
        type: data.type,
        footprints: .init($0.groupingFootprints),
        timestamp: data.timestamp
    )
})

let createdNonGroupedChecksumObjects: [Checksum] = newNonGroupingServerData.map({
    let data = $0.footprint
    return .init(
        checksum: data.uuid.uuidString,
        type: data.type,
        footprints: [data.uuid.uuidString],
        timestamp: data.timestamp
    )
})

let createdChecksumObjects = createdGroupedChecksumObjects + createdNonGroupedChecksumObjects

if !deleteObjects.isEmpty {
    let succeedRemoved = await repository.deleteHKData(deleteObjects) != nil
    // If a server fails to remove grouped objects, then, stop the process
    // Otherwise, you try to add new items that have the previous version on the server
    guard succeedRemoved else { return }
}
saveChecksums(.init(userId: userId,
                    removedChecksums: objectsToRemove.map(\.checksum),
                    newFootprints: createdChecksumObjects))
saveHKAnchors(changes.anchors.map({ .init(userId: userProvider.userId, sampleType: $0.1, anchor: $0.0) }))
repository.postHKData(groupingServerData + newNonGroupingServerData, isFailedData: false)

First, we map objectsToRemove to server objects. Then, in the same way, we map the objects that have to be created on the server. If the delete request fails, don't go any further: we can't add new data to the server while it still holds the old version, so the previous version has to be deleted first. After success, you can save the HKQueryAnchor and the last fetch date.
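
The ordering rule (delete first, stop on failure, only then persist and create) can be isolated into a small function. This is a sketch under assumptions: `SyncOutcome`, `syncChanges`, and the three closures are hypothetical stand-ins for the repository and storage calls, and everything is synchronous to keep the example short.

```swift
import Foundation

enum SyncOutcome: Equatable {
    case nothingToDo
    case deleteFailed  // stop: the server still holds the old versions
    case completed
}

/// Deletes outdated objects first and only then saves local state and posts new data.
func syncChanges<Deleted, Created>(
    toDelete: [Deleted],
    toCreate: [Created],
    delete: ([Deleted]) -> Bool,    // hypothetical server call, true on success
    persistLocalState: () -> Void,  // e.g. save checksums and anchors
    create: ([Created]) -> Void     // hypothetical server call
) -> SyncOutcome {
    guard !toDelete.isEmpty || !toCreate.isEmpty else { return .nothingToDo }
    if !toDelete.isEmpty, !delete(toDelete) {
        return .deleteFailed
    }
    persistLocalState()
    create(toCreate)
    return .completed
}
```

Since the anchor is saved only after a successful delete, a failed run leaves the old anchor in place and the same changes are picked up again on the next sync.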

When testing your final logic with different apps that write data to HealthKit, you may face unexpected behavior, because some apps update HealthKit differently. Take the Connect app (Garmin) as an example. You add new steps in this app, and when you query HealthKit from your app, you expect to receive only the updated steps. But this app works with HealthKit differently: when new steps are added, it removes the previous objects for the current date and adds new items instead of editing the existing ones. As a result, your app receives removed objects plus new objects. So, be careful here too, and don't rely on the expected behavior only. As for my earlier examples with grouping objects, in this scenario you expect only some new data, but instead you need to run all of this data through your flow one more time. That sounds terrible, unless you cache data from HealthKit, which is overcomplicated and useless.

Summary

The moral of this post is very simple. If you'd like to avoid most of these bugs, discuss moving this logic to the server. Otherwise, if you don't have that option, like me, this post should cover most of the edge cases you may face. Also, my recommendation is to use HealthKit's native structures and avoid combining them on the client side.

PS: The presented solutions are far from ideal and may not be suitable for production. My goal is to highlight as many pitfalls as possible before you start using HealthKit, so you can estimate the complexity of your task and avoid possible bugs in advance.


Written by Dmytro Barabash