Prefetching image sizes without downloading them [entirely] in Swift

A journey into popular image formats to find out how to get their size by fetching as little data as needed

The original article is also available on my blog.

Working with custom layouts and remote images can be tricky; it can easily become a chicken-and-egg problem where you need the size of your images to reserve the correct space in your layout, but you have to download all of them to get it right. It’s a bit of a mess, and even if you use tables / collections you cannot do proper prefetching.

Adjusting the layout incrementally while you are downloading produces the well-known web-like effect where every element jumps around like a crazy clown until the download ends.

The final result is a poor user experience and many disappointed users. So, if you are not lucky enough to work things out with your backend colleague (or possibly kill your designer), you may be in trouble.

But don’t worry, I’m here to say a compromise is possible.

You know, every image is a well-structured stream of binary data in which the size of the canvas, along with some other interesting info, is stated explicitly near the beginning of the file. So if you download just enough data to read this info, you can stop the download immediately and obtain the size of your image. Typically this requires only a few bytes per image (even if the block you actually receive may be larger, around 50 KB or less), regardless of the real file size.

Obviously the structure of this data is strictly related to the format of the image itself. So the first thing we need is to know how to read what is called the header of a file; we’ll do it for the most common web image formats (PNG, JPEG, GIF and BMP), but you can expand it to support even more file types.

Each of these formats starts with a unique signature that tells us which format was used to encode the data, followed at some point by another chunk of data with the size of the image canvas. Let’s start exploring!

PNG

A PNG file consists of a PNG signature followed by a series of chunks (complete specs are here). The first 8 bytes of a PNG file always contain a fixed signature, which just indicates that the remainder of the file contains a single PNG image. The rest of the file is organized into chunks; each chunk is composed of 4 parts:

  • Length (4 bytes) which defines the length of the chunk’s data field
  • Type (4 bytes) which defines type of chunk
  • Data (variable) which contains the chunk’s data
  • CRC (4 bytes) redundancy code to validate the correctness of the data above

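The chunk we are interested in is called IHDR and, according to the specs, must always appear first, just after the signature. It contains the following ordered data:

  • Width (4 bytes)
  • Height (4 bytes)
  • Bit depth (1 byte)
  • Color type (1 byte)
  • Compression method (1 byte)
  • Filter method (1 byte)
  • Interlace method (1 byte)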

An example of a PNG file in a hex editor. The red section is the IHDR chunk with the width and height fields highlighted.

Clearly we are interested in the width and height attributes only: they are 4-byte big-endian integers (zero is an invalid value). The maximum for each is 2^31 − 1 in order to accommodate languages that have difficulty with unsigned 4-byte values.

So in order to get the size of a PNG file we need just 33 bytes (the 8-byte signature plus the whole 25-byte IHDR chunk), regardless of the total size of the image.
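To make the layout concrete, here is a minimal sketch of reading the size from the head of a PNG stream. The function name `pngSize(from:)` is my own choice for illustration and is not taken from the project. Since width and height are the first two IHDR fields, the sketch only asks for the first 24 bytes (signature, chunk length, chunk type, width, height) rather than the full 33.

```swift
import Foundation

/// Hypothetical helper: returns the PNG canvas size, or nil when the data
/// is too short or does not look like a PNG.
func pngSize(from data: Data) -> (width: Int, height: Int)? {
    // 8 (signature) + 4 (chunk length) + 4 (chunk type) + 4 (width) + 4 (height)
    let bytes = [UInt8](data.prefix(24))
    guard bytes.count == 24 else { return nil }

    // Fixed 8-byte PNG signature.
    let signature: [UInt8] = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]
    guard Array(bytes[0..<8]) == signature else { return nil }

    // The first chunk must be IHDR; its type sits at offset 12.
    guard Array(bytes[12..<16]) == Array("IHDR".utf8) else { return nil }

    // PNG integers are stored big endian (most significant byte first).
    func readUInt32BE(at offset: Int) -> Int {
        return bytes[offset..<(offset + 4)].reduce(0) { ($0 << 8) | Int($1) }
    }

    let width = readUInt32BE(at: 16)
    let height = readUInt32BE(at: 20)
    // Zero is an invalid value for both fields.
    guard width > 0, height > 0 else { return nil }
    return (width, height)
}
```

Returning nil for short input lets the caller simply wait for more data and try again.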

GIF

GIF is a bitmap image format; it starts with a fixed-length header (GIF87a or GIF89a, where 87a and 89a identify the version) immediately followed by a fixed-length Logical Screen Descriptor giving the size and other characteristics of the logical display.

An example of a GIF file with the initial width/height fields just after the GIF signature.

With 10 bytes we are ready to catch the size of a GIF even before downloading it. If you are interested in the GIF format, the original specification is a great start.
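The 10 bytes mentioned above can be decoded with a sketch like the following. The name `gifSize(from:)` is hypothetical. One detail worth noting: unlike PNG, the GIF width and height are 2-byte little-endian values, stored right after the 6-byte header.

```swift
import Foundation

/// Hypothetical helper: returns the logical screen size of a GIF, or nil
/// when the data is too short or the header is not GIF87a / GIF89a.
func gifSize(from data: Data) -> (width: Int, height: Int)? {
    // 6 (header) + 2 (width) + 2 (height)
    let bytes = [UInt8](data.prefix(10))
    guard bytes.count == 10 else { return nil }

    let header = Array(bytes[0..<6])
    guard header == Array("GIF87a".utf8) || header == Array("GIF89a".utf8) else {
        return nil
    }

    // GIF stores 16-bit values little endian (least significant byte first).
    let width = Int(bytes[6]) | (Int(bytes[7]) << 8)
    let height = Int(bytes[8]) | (Int(bytes[9]) << 8)
    return (width, height)
}
```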

JPEG

A JPEG image file can come in two different formats: if we read the FF D8 FF E0 signature it’s a JPEG File Interchange Format file (the most common), while with FF D8 FF E1 it’s an Exchangeable Image File Format file (which is a bit more complicated to parse).
We’re dealing with the first one; a JPEG file consists of a sequence of segments, each beginning with a marker (a 0xFF byte followed by a byte indicating what kind of marker it is). The frame dimensions are located in a segment called SOF[n] (Start Of Frame n, where n identifies the encoding process used); these segments do not appear at a fixed offset, so we need to iterate over our data searching for one of these patterns: FFC0, FFC1 or FFC2.
The values are big endian, so we may have to reverse the bytes on our system; once we find this segment, we can decode it to find the image height and width.

A JPEG file is a bit more complex; we need to find the SOFn segment, which is not placed at a fixed position, so we should iterate over the data until we find it.
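Here is one possible sketch of that search, under a few assumptions of mine: the name `jpegSize(from:)` is hypothetical, and instead of scanning for the raw FFC0/FFC1/FFC2 byte pattern it walks the file segment by segment using each segment’s 2-byte length field, which avoids matching those bytes by accident inside another segment’s payload. Inside the SOF segment the layout is: marker (2 bytes), length (2), sample precision (1), height (2), width (2).

```swift
import Foundation

/// Hypothetical helper: returns the frame size of a JPEG, or nil when the
/// SOF segment has not been received yet or the data is not a JPEG.
func jpegSize(from data: Data) -> (width: Int, height: Int)? {
    let bytes = [UInt8](data)
    // Every JPEG starts with the SOI marker FF D8.
    guard bytes.count >= 4, bytes[0] == 0xFF, bytes[1] == 0xD8 else { return nil }

    // JPEG stores 16-bit values big endian.
    func readUInt16BE(at offset: Int) -> Int {
        return (Int(bytes[offset]) << 8) | Int(bytes[offset + 1])
    }

    var offset = 2
    // Each iteration needs the marker (2 bytes) and the length (2 bytes).
    while offset + 4 <= bytes.count {
        // Every segment must begin with 0xFF; otherwise we lost sync.
        guard bytes[offset] == 0xFF else { return nil }
        let marker = bytes[offset + 1]
        // 0xFF may be repeated as padding before the real marker byte.
        if marker == 0xFF { offset += 1; continue }

        if marker == 0xC0 || marker == 0xC1 || marker == 0xC2 {
            // marker(2) + length(2) + precision(1) + height(2) + width(2)
            guard offset + 9 <= bytes.count else { return nil }
            let height = readUInt16BE(at: offset + 5)
            let width = readUInt16BE(at: offset + 7)
            return (width, height)
        }

        // Skip this segment: the length counts its own 2 bytes, not the marker.
        let length = readUInt16BE(at: offset + 2)
        guard length >= 2 else { return nil }
        offset += 2 + length
    }
    // SOF not reached yet: the caller should wait for more data.
    return nil
}
```

Note that this sketch returns nil both for "not a JPEG" and for "need more data"; a real parser would probably want to distinguish the two.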

Our Project

Now that we know some dirty secrets of our image formats we can go further by writing a program which acts as a generic pre-fetcher. The goal is to provide a GCD-based class where you can enqueue your requests and receive the result in a callback function.
Additionally, our class may maintain an internal cache to immediately return the results of already performed URL requests (NSCache is good enough for our demo, but in a real scenario we should keep this data on disk; nothing complex, but it’s outside our scope right now).
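As a rough idea of what such an in-memory cache could look like, here is a small sketch built on NSCache. The types `CachedImageSize` and `ImageSizeCache` are names I made up for this example; the actual project may structure its cache differently.

```swift
import Foundation

/// Hypothetical value box: NSCache stores objects, so the size is wrapped in a class.
final class CachedImageSize {
    let width: Int
    let height: Int

    init(width: Int, height: Int) {
        self.width = width
        self.height = height
    }
}

/// Hypothetical in-memory cache of already fetched sizes, keyed by the URL string.
final class ImageSizeCache {
    private let storage = NSCache<NSString, CachedImageSize>()

    /// Returns the cached size for the URL, or nil if it was never stored
    /// (or if NSCache evicted it under memory pressure).
    func size(for url: URL) -> CachedImageSize? {
        return storage.object(forKey: NSString(string: url.absoluteString))
    }

    /// Remembers the size found for the URL.
    func store(_ size: CachedImageSize, for url: URL) {
        storage.setObject(size, forKey: NSString(string: url.absoluteString))
    }
}
```

The fetcher would consult the cache before enqueuing a request and call the callback right away on a hit.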

We can use it to perform fetching in a wide range of cases, like pre-fetching for UITableView or UICollectionView.

Conceptually our project has 3 different entities: an ImageFetcher (which just exposes a function to enqueue our request by passing at least a URL and a callback; internally it also manages the queue), an ImageFetcherOperation (a subclass of Operation, which is responsible for the async data download via URLSessionTask) and an ImageParser (which evaluates partial data and eventually returns the format and size of the image).

ImageFetcher

As we said, the fetcher is essentially a class which manages a queue of operations, keeps a cache and manages a URLSession to download our data efficiently. This class must be an NSObject subclass in order to conform to URLSessionDataDelegate, which is used by the URLSession instance to report each new portion of data.

ImageFetcherOperation

ImageFetcherOperation is just a subclass of Operation and it’s used to encapsulate the logic behind the data download and lookup.
An operation receives data from the URLSession instance via its parent (the fetcher) and calls the ImageParser until a valid result is returned (or it fails with an error). Once a result is available the operation is immediately cancelled (avoiding the download of further image data) and the result is passed to the callback.
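The accumulate-and-retry logic can be sketched without any networking. `PartialDataAccumulator` is a hypothetical stand-in for the buffering part of the operation, not the project’s actual class: it appends each received chunk to a buffer and asks a parser closure whether the buffered bytes are enough.

```swift
import Foundation

/// Hypothetical stand-in for the operation's buffering logic (no networking).
struct PartialDataAccumulator<Output> {
    /// All the bytes received so far.
    private(set) var buffer = Data()
    /// Returns a value once the buffer is sufficient, nil to ask for more data.
    let parse: (Data) -> Output?

    init(parse: @escaping (Data) -> Output?) {
        self.parse = parse
    }

    /// Appends a newly received chunk and retries the parser on the whole buffer.
    mutating func receive(_ chunk: Data) -> Output? {
        buffer.append(chunk)
        return parse(buffer)
    }
}

// Toy parser for demonstration only: succeeds once at least 4 bytes have arrived.
var accumulator = PartialDataAccumulator<Int> { data in
    data.count >= 4 ? data.count : nil
}
let firstAttempt = accumulator.receive(Data([0x01, 0x02]))        // not enough yet
let secondAttempt = accumulator.receive(Data([0x03, 0x04, 0x05])) // enough now
```

In a real fetcher, `receive` would be driven from the delegate method `urlSession(_:dataTask:didReceive:)`, and the data task would be cancelled as soon as it returns a non-nil value.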

ImageParser

The parser is the core of the project; it takes a bunch of data and attempts to parse it as one of the supported formats.

First of all it checks the file signature at the very beginning of the stream; if no known signature is found, the parent operation is cancelled and an unsupportedFormat error is returned.

When a known signature has been found, the second check is performed on the data length: only when enough data for the format has been collected can the parser go further and attempt to retrieve the size of the frame (until then it returns nil and the operation continues to accumulate image data from the server).

If enough data is available, the last step is to parse the stream and search for the frame size; obviously it’s strictly related to the file format, as we have seen above. The code is pretty straightforward and fast: except for JPEG (where we need to make a small iteration) all the remaining formats have fixed-length fields.

The code below illustrates this class:

Now you can call the image parser just by doing:
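What follows is a minimal sketch of the first two checks (signature, then data length), and not the project’s actual source. The names `ImageFormat`, `ParserStep` and `nextStep(for:)`, as well as the minimum lengths, are assumptions of mine; BMP is left out because its layout is not covered in this article.

```swift
import Foundation

/// Hypothetical list of formats recognized by their signature.
enum ImageFormat {
    case png, gif, jpeg

    /// Assumed minimum number of bytes before size parsing is worth attempting.
    var minimumLength: Int {
        switch self {
        case .png: return 24   // signature + IHDR up to the height field
        case .gif: return 10   // header + width + height
        case .jpeg: return 4   // SOF position varies, so parsing may still need more
        }
    }

    /// Detects the format from the leading bytes, or fails if none matches.
    init?(signatureOf data: Data) {
        let bytes = [UInt8](data.prefix(8))
        if bytes.starts(with: [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]) {
            self = .png
        } else if bytes.starts(with: Array("GIF87a".utf8)) || bytes.starts(with: Array("GIF89a".utf8)) {
            self = .gif
        } else if bytes.starts(with: [0xFF, 0xD8]) {
            self = .jpeg
        } else {
            return nil
        }
    }
}

/// Hypothetical outcome of the preliminary checks.
enum ParserStep: Equatable {
    case unsupportedFormat
    case needsMoreData
    case readyToParse(ImageFormat)
}

/// Runs the signature check, then the length check, on the bytes received so far.
func nextStep(for data: Data) -> ParserStep {
    // The longest signature (PNG) is 8 bytes; wait until we can compare all of them.
    guard data.count >= 8 else { return .needsMoreData }
    guard let format = ImageFormat(signatureOf: data) else { return .unsupportedFormat }
    guard data.count >= format.minimumLength else { return .needsMoreData }
    return .readyToParse(format)
}
```

Once `nextStep(for:)` reports `.readyToParse`, the format-specific decoding of width and height can run on the same buffer.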

Conclusion

How much can you save? Due to the nature of the JPEG format the interesting segment does not have a fixed position, so the amount of downloaded data may vary; moreover, you are not in control of the length of the downloaded packets, so most of the time you receive more data than you really need (but even so you do not need to fetch much of the image to find the size).
However, in most cases the downloaded data is below 50 KB.

Where can we use this class? Probably the best place to use this fetcher is inside the UICollectionView/UITableView pre-fetching methods introduced in iOS 10: with them you can fetch the size of an image before showing it to the user.

The project, along with a CocoaPods/SPM structure, is available on its GitHub page. Feel free to use it for your needs and tell me if you like it. Any PR is welcome!

Bonus Track

If you are interested in learning more about file formats, Synalyze It! Pro is a great piece of software which also highlights the grammar of lots of popular file formats!