Apple’s New CloudKit-Based Core Data Sync
At WWDC19, Apple opened another chapter in their struggle to get Core Data to sync reliably. Is it a case of “third time’s a charm”, or more like “fool me twice”?
Full disclosure: I am the developer of Ensembles, a Core Data sync framework that works with many different services, including CloudKit, Dropbox, WebDAV, and even peer-to-peer. So I am certainly not unbiased when it comes to discussing sync frameworks. But it may come as a surprise to hear that I am rooting for Apple to get this right.
Ensembles has been a great success, in that (a) it works, and (b) I managed to sell some licenses. An indie developer selling licenses for a source code framework is as rare as hen’s teeth, so the 6 years I have been developing the framework have been very rewarding. Now, my development effort is much more firmly focused on Agenda, and I would be happy for Apple to take over what really should have been in their court from day one.
Have no fear, I have no plans to stop developing Ensembles, even if Apple is successful. I will keep using it in my existing products, and keep maintaining it. I don’t anticipate any big new features, but I am invested in keeping it working, so Ensembles is not going anywhere.
Let’s go back to the Monday of WWDC, when I first saw a session in the schedule mentioning Core Data and CloudKit in the same text. At first, I confess to thinking it was going to be some tips for setting up your own sync engine using CloudKit, something which many, many developers have had to do. But it soon became clear via the Twitters that it was actually the third incarnation of Core Data sync.
You might think that I would be shocked and worried by this, but I really wasn’t. For a start, Apple had clued me in to the fact they might be working on such a solution a few years back. They gave me no details, of course, but if I were shocked about anything, it was that it only arrived this year, rather than two years ago. I had actually given up on ever seeing it, figuring they probably decided not to ship it in the end. I guess the product planning at Apple spans many, many years (…as we are also hearing regarding SwiftUI).
Given I have a lot of experience working on Core Data sync, you are probably wondering what my impressions of the new approach are. Will it work? What are the potential issues? I haven’t taken a close look at it, but I have watched the session and discussed it on Twitter, and I will now offer my opinion on what they are doing. Since I have no direct experience with the framework, you should take some of this with a grain of salt, but at least it might point to potential pitfalls. (I wasted many months trying to get the original Core Data sync to work reliably, before giving up and reinventing the wheel. Let’s hope Apple have their bases covered this time.)
The general approach Apple are taking seems sound enough to me. They are using their new generational storage (with history) to track changes, and updating the cloud from that. Effectively having “versions” in your store is very powerful, because you can bring in new sync changes while the UI of the app continues on, oblivious, and merge the changes in later. (If you are interested in pushing this approach to the extreme, check out my experimental LLVS project.) This is preferable to the nightmare we used to have with concurrent changes, where one context would be forced to merge in change notifications, and failing to do so would lead to exceptions.
The new generational approach to storage in Core Data may in fact be what they were waiting for the whole time. Apple probably realized the old system of storing transaction logs for every change was not ideal, and so planned first to introduce solid generational support, before adding direct CloudKit sync. I haven’t used the generational support in any of my projects yet — they are mostly legacy apps — but I have not heard anything bad about it, and it seems the “right” approach, so I think this aspect is on the money.
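To make the moving parts a little more concrete, here is a rough sketch of how a container might be configured so that history tracking and query generations are both in play. It is an illustration rather than a recommended setup: the function name and the model name passed in are placeholders, and error handling is reduced to a crash.

```swift
import CoreData

/// Builds a CloudKit-backed container with history tracking switched on.
/// `modelName` is a placeholder for your own .xcdatamodeld name.
func makeSyncingContainer(modelName: String) -> NSPersistentCloudKitContainer {
    let container = NSPersistentCloudKitContainer(name: modelName)

    guard let description = container.persistentStoreDescriptions.first else {
        fatalError("No persistent store description found")
    }

    // Record every transaction in the store's persistent history.
    description.setOption(true as NSNumber,
                          forKey: NSPersistentHistoryTrackingKey)
    // Ask for a notification when another coordinator writes to the store.
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)

    container.loadPersistentStores { _, error in
        if let error = error {
            fatalError("Failed to load store: \(error)")
        }
    }

    // Pin the UI context to the current generation of the store, so imports
    // landing in the background do not change data under the UI mid-read.
    let viewContext = container.viewContext
    viewContext.automaticallyMergesChangesFromParent = true
    do {
        try viewContext.setQueryGenerationFrom(.current)
    } catch {
        fatalError("Failed to pin view context: \(error)")
    }

    return container
}
```

The point of the pinning step is the one made above: the store can move on to a newer "version" while the UI keeps reading a stable one until it is ready to merge.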
Assuming it works as advertised, why would you not immediately jump on Core Data CloudKit integration? Well, there are a bunch of “policy” related issues you probably want to consider first, which may sway your decision, at least in the short term.
To begin with, as with any new Apple technology, it will only work on new operating systems. If you are developing a new app now, it is probably fine; if you have an existing app, it will probably be a year or two before you can consider adding it (…unless you are prepared to tell customers to upgrade to get sync).
Now to the nitty gritty. The nature of Apple’s sync, where you effectively have distributed stores, means you can’t globally validate data like you can with a single central store. Apple have chosen to work around this by disallowing validation on relationships. It’s something to keep in mind. For example, if you have a one-to-one relationship that was previously non-optional, once you add sync, you will have to make it optional, and concurrent changes could lead to an orphaned object where the relationship is nil. This is not really avoidable in a decentralized syncing system, though how it is handled can vary. In Ensembles, the same problem can arise, and a delegate method is called to allow the app code to correct the issue (eg delete an object); Apple have opted to just disallow validation of relationships, which means you will probably need to add your own checks to “correct” the data.
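What might such a check look like? A minimal sketch, assuming a hypothetical to-one relationship that used to be required: after an import, fetch the objects where the relationship is nil and correct them. Here the correction is simply to delete the orphan; reattaching it to a default parent would be an equally valid choice, depending on the app. The entity and relationship names are passed in, and are not taken from any Apple sample.

```swift
import CoreData

/// Deletes objects of `entityName` whose to-one relationship
/// `relationshipKey` is nil. Returns the number of orphans removed.
/// Both names are supplied by the caller; nothing here is model-specific.
@discardableResult
func deleteOrphans(entityName: String,
                   relationshipKey: String,
                   in context: NSManagedObjectContext) throws -> Int {
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    // %K substitutes a key path, so this matches a nil relationship.
    request.predicate = NSPredicate(format: "%K == nil", relationshipKey)

    let orphans = try context.fetch(request)
    for orphan in orphans {
        context.delete(orphan)
    }
    if context.hasChanges {
        try context.save()
    }
    return orphans.count
}
```

You would call something like this on a background context after remote changes arrive, for example `try deleteOrphans(entityName: "Note", relationshipKey: "folder", in: context)`, where "Note" and "folder" stand in for your own model.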
The last problem I see with Apple’s approach is perhaps the most important. Like the original Core Data sync, the new CloudKit sync does not introduce any concept of global identity for your objects. Why does this matter? Not being able to match objects on one device to those on another means you have to do unnecessary, and often complex, deduplication of your data. It seems Apple didn’t really learn from their mistakes here. If you take a look at the sample code from WWDC19, you will find a relatively simple app which is littered with complex deduplication logic. Hardly an easy sync solution.
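To give a feel for what deduplication involves, here is a stripped-down sketch in plain Swift. It is not Apple’s sample code; the `Tag` type, the choice of the name as the matching key, and the “oldest wins” rule are all assumptions made for illustration. The essential requirement is that every device picks the same winner independently, which is why the ordering has to be fully deterministic.

```swift
import Foundation

// Hypothetical record type: two devices may each create a "Work" tag.
struct Tag {
    let id: UUID
    let name: String
    let createdAt: Date
}

/// Groups tags that should be the same object and picks one winner per group.
/// The losers are what the app would have to repoint relationships away from
/// and then delete.
func deduplicate(_ tags: [Tag]) -> (winners: [Tag], losers: [Tag]) {
    // Assumption: tags match if their names are equal, ignoring case.
    let groups = Dictionary(grouping: tags) { $0.name.lowercased() }

    var winners: [Tag] = []
    var losers: [Tag] = []
    for group in groups.values {
        // Oldest wins; the UUID string breaks ties so that all devices agree.
        let ordered = group.sorted {
            ($0.createdAt, $0.id.uuidString) < ($1.createdAt, $1.id.uuidString)
        }
        winners.append(ordered[0])
        losers.append(contentsOf: ordered.dropFirst())
    }
    return (winners, losers)
}
```

Even in this toy form, you can see the work that remains: deciding what counts as “the same”, moving relationships over to the winner, and making sure two devices running the logic at the same time do not undo each other.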
The sad part of this is that Apple could help a lot here by simply exposing the CloudKit record names to the developer (Submitted as Feedback Issue: FB6120135). Allowing the developer to choose the record names would introduce a configurable global identity, and the deduplication could be eliminated.
Ensembles has always had the notion of global identity. You can add sync to a Core Data app with Ensembles in about 20 lines of code, and that is because the framework itself is able to map objects on one device to those on another. As a practical example, if you have already been syncing using some other mechanism, and move to Ensembles, you can provide the global identifiers you have already set up, and Ensembles will merge the data automatically, regardless of how many devices initially have a copy of the data. And if you have a singleton-like object, such as something for app settings, you can assign a predetermined identifier (eg “Settings”), so that all devices share the same set of options. Having control over global identity makes things much simpler, and I hope Apple realize this, and open up control over the record names in CloudKit.
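The difference global identity makes can be shown with a toy example. This is neither the Ensembles API nor its merge algorithm; the `Setting` type and the “most recent change wins” rule are assumptions chosen to keep the sketch short. What matters is that once every device uses the same identifier for the same object, merging is a keyed lookup, and duplicates never arise.

```swift
import Foundation

// Hypothetical record carrying a developer-chosen global identifier.
struct Setting {
    let globalID: String
    var value: String
    var modifiedAt: Date
}

/// Merges records from several devices. Records sharing a global identifier
/// collapse into one entry; the most recently modified copy is kept.
func merge(_ stores: [[Setting]]) -> [String: Setting] {
    var merged: [String: Setting] = [:]
    for record in stores.joined() {
        if let existing = merged[record.globalID],
           existing.modifiedAt >= record.modifiedAt {
            continue // the copy we already hold is at least as new
        }
        merged[record.globalID] = record
    }
    return merged
}
```

With a predetermined identifier such as “Settings”, two devices that each created their own settings object end up with a single merged entry, with no matching or cleanup pass required.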
Conclusions? I think there is promise here. It is certainly the best approach we have seen to Core Data sync from Apple. They seem to have the foundation in place (ie generational versioning) to do it properly. But even if they get it completely right, you will probably want to weigh the negatives: Can you be content with just supporting the latest operating systems? Are you happy being locked in to CloudKit? Is it OK to drop relationship validation in your model, and adopt a lot of deduplication complexity? On the plus side, CloudKit offers free and fast sync, so there is plenty going for it. The ball is in your court.