Core Data: don’t store large files as binary data

Alexander Edge
Jul 31, 2014

TL;DR: if you ever plan to change your data model (let’s face it, who doesn’t?), store large files (>100 KB) in the file system instead.

I’m a fan of Core Data. I used it four years ago in my first app, Twitcher (sadly no longer on sale), and we’re using it today in Peeps: fast, fun, simple group video messaging for iOS.

Data Model

Viewing messages in a conversation

Peeps’ data model contains an entity representing a short, 10-second video clip sent to other participants in the conversation.

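For performance reasons, large files such as video clips (>100 KB) are normally not stored in a database as BLOBs; instead, they are stored in the file system and referenced from the database, for example by file name. Since iOS 5.0, Core Data has offered the option to store binary data attributes externally (the “Allows External Storage” checkbox in the data model editor).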

From What’s New in iOS 5.0:

Managed objects support two significant new features: ordered relationships, and external storage for attribute values. If you specify that the value of a managed object attribute may be stored as an external record, Core Data heuristically decides on a per-value basis whether it should save the data directly in the database or store a URI to a separate file that it manages for you.

We are using the external storage option in Peeps and can confirm that video data is indeed saved externally, in a separate hidden directory (on iOS 8.0 its path looks like .YourAppName_SUPPORT/_EXTERNAL_DATA).

Migration

Most developers who have used Core Data in production apps know that changes to the data model are sometimes unavoidable. To keep disruption to our beta testers to a minimum, we used lightweight migration for each change along the way, and the model is now on version 9.

Our users love sending and receiving messages. We are still working on managing the disk space these messages occupy, but it is not a problem for our users just yet. Migration, however, is.

In the latest update (1.2.0), we performed lightweight migration on application launch. We ran it asynchronously because we already knew it would take a significant amount of time. Unfortunately, some of our users are encountering errors: a corrupt data file here, insufficient disk space there. Copying the SQLite store as a backup during migration makes sense, but on closer inspection, Core Data appears to copy the contents of the entire external data directory, too. I have filed rdar://17869205 for further clarification.
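If the backup really does include the external data directory, one defensive measure is to estimate how much extra space that copy could need before starting migration. Below is a minimal Foundation sketch under that assumption: it sums the SQLite store, its -wal/-shm sidecar files, and everything under the external data directory. The function names are hypothetical, and the estimate is only as accurate as the assumption about what Core Data copies.

```swift
import Foundation

/// Size in bytes of a single file, or 0 if it is missing. Foundation may box
/// `.size` as NSNumber or as a Swift integer depending on platform, so both are handled.
func fileSize(atPath path: String) -> Int64 {
    guard let attrs = try? FileManager.default.attributesOfItem(atPath: path),
          let raw = attrs[.size] else { return 0 }
    switch raw {
    case let n as NSNumber: return n.int64Value
    case let n as UInt64: return Int64(n)
    case let n as Int: return Int64(n)
    default: return 0
    }
}

/// Sum of the sizes of all regular files below `directory`, recursively.
func totalFileSize(inDirectory directory: URL) -> Int64 {
    let fm = FileManager.default
    guard let enumerator = fm.enumerator(atPath: directory.path) else { return 0 }
    var total: Int64 = 0
    for case let relativePath as String in enumerator {
        let fullPath = directory.appendingPathComponent(relativePath).path
        var isDirectory: ObjCBool = false
        // Skip subdirectories themselves; their files are visited individually.
        if fm.fileExists(atPath: fullPath, isDirectory: &isDirectory), !isDirectory.boolValue {
            total += fileSize(atPath: fullPath)
        }
    }
    return total
}

/// Hypothetical estimate of the extra space a migration backup may need,
/// ASSUMING it copies the store, its WAL sidecars, and the external data directory.
func estimatedBackupBytes(storeURL: URL, externalDataDirectory: URL) -> Int64 {
    let storePath = storeURL.path
    let storeBytes = fileSize(atPath: storePath)
        + fileSize(atPath: storePath + "-wal")
        + fileSize(atPath: storePath + "-shm")
    return storeBytes + totalFileSize(inDirectory: externalDataDirectory)
}
```

The result could then be compared with the free space reported by `FileManager.attributesOfFileSystem(forPath:)` (the `.systemFreeSize` key) before kicking off migration, so the app can warn the user instead of failing halfway.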

Benchmarking

I benchmarked migration for two versions of the same model: one storing each video as a binary data attribute with external storage, and one storing just a unique identifier. The sample project is on GitHub.

The test creates 1000 objects, each with a unique identifier and, in the binary data variant, a 165 KB video file stored as binary data. Adding a data model version with a new attribute then triggers lightweight migration on the next application launch.
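To reproduce a measurement like this, one option is to time the call that opens (and therefore migrates) the persistent store. The helper below is a hypothetical sketch covering only the timing part; in an app, the closure would wrap the store-opening call made with the automatic migration options (NSMigratePersistentStoresAutomaticallyOption and NSInferMappingModelAutomaticallyOption). That setup is an assumption, not necessarily how the sample project measures it.

```swift
import Foundation
import Dispatch

/// Runs `work` once and returns the elapsed time in seconds, read from a
/// monotonic clock so wall-clock adjustments do not skew the measurement.
func measureSeconds(_ work: () throws -> Void) rethrows -> Double {
    let start = DispatchTime.now().uptimeNanoseconds
    try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return Double(end - start) / 1_000_000_000
}
```

A single run is enough to spot a 37.5 s vs 0.6 s gap, but for smaller differences it is worth repeating each variant several times from a fresh copy of the old store.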

When using binary data with external storage, migration took 37.5 seconds.

Without binary data (storing only the unique identifier), migration took just 0.6 seconds.
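For scale, a quick back-of-the-envelope calculation from the figures above (treating 1 MB as 1000 KB):

```latex
\begin{aligned}
\text{video data} &= 1000 \times 165\ \text{KB} = 165{,}000\ \text{KB} \approx 165\ \text{MB} \\
\text{slowdown} &= \frac{37.5\ \text{s}}{0.6\ \text{s}} = 62.5
\end{aligned}
```

So migrating roughly 165 MB of externally stored video made lightweight migration about 62 times slower than migrating the identifiers alone.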

The results make it clear: storing large files in the file system, with a naming scheme linking each file to its object, is the way to go. I have already begun working on an on-disk cache to replace the binary attributes.
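A minimal sketch of such a naming scheme, assuming each message object stores only a unique identifier (a UUID string) and the clip bytes live in one file per identifier. `ClipFileStore`, the `.mp4` extension, and the directory location are hypothetical choices for illustration, not the Peeps implementation; the sketch uses only Foundation file APIs and never touches Core Data.

```swift
import Foundation

/// Hypothetical on-disk store for video clips: Core Data keeps only a unique
/// identifier per message, and the clip bytes live in one file per identifier.
struct ClipFileStore {
    let directory: URL

    init(directory: URL) throws {
        self.directory = directory
        // Create the cache directory if it does not exist yet.
        try FileManager.default.createDirectory(at: directory,
                                                withIntermediateDirectories: true)
    }

    /// Naming scheme: <identifier>.mp4 inside the store directory.
    /// Assumes identifiers are UUID strings, which are safe to use as file names.
    func url(for identifier: String) -> URL {
        return directory.appendingPathComponent(identifier).appendingPathExtension("mp4")
    }

    func save(_ data: Data, for identifier: String) throws {
        // Atomic write avoids leaving a half-written clip behind on failure.
        try data.write(to: url(for: identifier), options: .atomic)
    }

    /// Returns nil if no clip exists for this identifier.
    func load(for identifier: String) -> Data? {
        return try? Data(contentsOf: url(for: identifier))
    }

    func remove(for identifier: String) throws {
        let fileURL = url(for: identifier)
        if FileManager.default.fileExists(atPath: fileURL.path) {
            try FileManager.default.removeItem(at: fileURL)
        }
    }
}
```

Because the SQLite store then holds only short identifier strings, a lightweight migration never has to copy the video bytes. Deleting a message should also call `remove(for:)`, so that orphaned clips do not accumulate on disk.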

Summary

Peeps is available in 45 countries, and our users are sending thousands of messages. Speed is one of the most important aspects of the app, and our users love how fast Peeps is: fast to receive messages (we use background fetching) and fast to send them (we use fine-grained video compression). Core Data has been at the heart of it all and will continue to be for the foreseeable future.