Crash analysis: "Unable to load publicsuffixes.gz resource from the classpath" and the 64K limit of ZIP

Wan Xiao
15 min read · Jul 9, 2022


The Android client at my company is divided into a main app and a series of feature components integrated as SDKs. We have different build variants for different markets: for example, th is for the Thai market, tw for the Taiwan market, and vn for the Vietnam market. At the beginning of the year, the main app hit a very strange build problem: when SDK A and SDK B were integrated into the main app at the same time, the internal APKs for th and vn would crash at runtime, making them unusable. The root cause of the crash is that the publicsuffixes.gz file from okhttp is missing from the APK, which makes okhttp throw an exception and crashes the app. And publicsuffixes.gz is not the only casualty: many other java resource files are missing as well.

java.lang.IllegalStateException: Unable to load publicsuffixes.gz resource from the classpath

The investigation was inconclusive at the time, but we found the following two workarounds:

  • After removing the Fabric crash SDK and unifying the Firebase SDK to a certain version, the problem disappeared.
  • Updating the Android Gradle Plugin version also made the problem go away.

At that time, we suspected that one of the Gradle plugins applied during the app build had triggered a bug in the Android Gradle Plugin, causing the java resources not to be packaged into the APK. Since the problem had disappeared, the investigation went no further.

A month later, the main app integrated a new SDK C, and the internal APKs of th and vn were missing java resources again. Notably, SDK A happened to be integrated only into th and vn.

After a day of trying, I found that no matter which plugins I commented out or which dependencies I excluded, the problem would not go away.

Given the huge impact of this problem, even if we patched around it now, without a clear root cause it would be hard to guarantee it would not recur. So I decided to dig deeper.

To be clear, the java resources mentioned here are not the resources in the res directory of an Android project. Java resources typically live inside a jar; for example, publicsuffixes.gz can be found in okhttp's jar:

publicsuffixes.gz is a java resource
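
For context, this is roughly how such a resource is consumed at runtime. A minimal sketch, not okhttp's actual implementation (the class name here is made up):

import java.io.InputStream;

public class PublicSuffixExample {
    public static void main(String[] args) {
        // Java resources are resolved against the classpath, i.e. against the
        // files packaged into the APK next to the classes. If packaging dropped
        // the file, getResourceAsStream returns null.
        InputStream resource =
                PublicSuffixExample.class.getResourceAsStream("/publicsuffixes.gz");
        if (resource == null) {
            throw new IllegalStateException(
                    "Unable to load publicsuffixes.gz resource from the classpath");
        }
        System.out.println("resource found");
    }
}

This is why the bug only surfaces at runtime: compilation sees nothing wrong, and the failure appears the first time the code tries to resolve the missing resource.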

Investigate in depth

Check third-party Gradle plugins

Since the build problem only occurred in internal APKs that integrate SDK A, and integrating SDK A requires applying a Gradle plugin provided by its vendor, I first suspected that this third-party Gradle plugin had a bug that interfered with the build process.

Generally, third-party Gradle plugins use the Transform API to process compiled intermediate artifacts. From the code, a Transform can handle both CLASSES and RESOURCES, where RESOURCES means java resources.

I checked each of the plugins introduced in the app, examined how it uses the Transform API, and found that every transform only processes classes. For example, the following is from the source code of aspectj's plugin; the return value of Transform.getInputTypes clearly contains only CLASSES, not RESOURCES.

override fun getInputTypes(): Set<QualifiedContent.ContentType> {
    return Sets.immutableEnumSet(QualifiedContent.DefaultContentType.CLASSES)
}
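
For contrast, a transform that wanted to touch java resources would have to declare RESOURCES among its input types. A hypothetical sketch (ResourceAwareTransform is a made-up name):

import com.android.build.api.transform.QualifiedContent;
import com.android.build.api.transform.Transform;
import com.google.common.collect.ImmutableSet;
import java.util.Set;

public abstract class ResourceAwareTransform extends Transform {
    @Override
    public Set<QualifiedContent.ContentType> getInputTypes() {
        // Declaring RESOURCES is what would give a plugin the chance to
        // (mis)handle java resources; none of our plugins did this.
        return ImmutableSet.<QualifiedContent.ContentType>of(
                QualifiedContent.DefaultContentType.CLASSES,
                QualifiedContent.DefaultContentType.RESOURCES);
    }
}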

So the missing java resources should not be caused by the custom Gradle plugins; most likely, a bug in the Android Gradle Plugin itself was being triggered.

Check the compilation intermediates

I built the th internal APK locally with the internal build type, checked the build artifacts, and found that a large number of java resources were missing from the output APK.

As the app’s gradle.properties has the following configuration:

android.enableR8=false

we know that proguard is used for code obfuscation when building the internal APK. According to the build log, the proguard version is 6.0.3.

Checking the intermediate outputs, I found that the output of the proguard step, app/build/intermediates/transforms/proguard/appThailand/0.jar, contains all of the app's java resources and class files. The proguard stage is the last Transform step, and since its output still contains all java resource files, this also proves that none of the plugins using the Transform API deleted java resources by mistake; my earlier suspicion of the third-party plugins was wrong.

So my guess was that the java resources most likely go missing while the APK file is being packaged.

Investigate the Android Gradle Plugin (AGP)

The Android Gradle Plugin (hereinafter AGP) used for assembling the app is version 3.4.1. I downloaded the source code of that version and inspected PackageApplication.java, the source file of the APK packaging task. I inserted lots of logging into the parts related to packaging java resources, compiled a 3.4.1 AGP myself, replaced the AGP used for assembling with my locally compiled version, and then built the th internal APK to reproduce the problem.

The anomaly showed up in the following method:

private static ImmutableMap<RelativeFile, FileStatus> getJavaResourcesChanges(
        Iterable<File> javaResourceFiles, File incrementalFolder) throws IOException {
    ImmutableMap.Builder<RelativeFile, FileStatus> updatedJavaResourcesBuilder =
            ImmutableMap.builder();
    for (File javaResourceFile : javaResourceFiles) {
        try {
            updatedJavaResourcesBuilder.putAll(
                    javaResourceFile.isFile()
                            ? IncrementalRelativeFileSets.fromZip(javaResourceFile)
                            : IncrementalRelativeFileSets.fromDirectory(javaResourceFile));
        } catch (Zip64NotSupportedException e) {
            ...
        }
    }
    return updatedJavaResourcesBuilder.build();
}

There is only one file in javaResourceFiles: app/build/intermediates/transforms/proguard/appThailand/0.jar, the output of the proguard phase. We already know this file contains all the java resources and class files.

But after updatedJavaResourcesBuilder.putAll, only about 300 items ended up in the builder, while the tw assembling task stored all of them. The key question is why IncrementalRelativeFileSets.fromZip does not return all the files. Tracing the IncrementalRelativeFileSets.fromZip source code shows that it eventually calls the following method:

@NonNull
public static ImmutableSet<RelativeFile> fromZip(@NonNull File zip) throws IOException {
    Preconditions.checkArgument(zip.isFile(), "!zip.isFile(): %s", zip);
    Set<RelativeFile> files = Sets.newHashSet();
    try (ZFile zipReader = ZFile.openReadOnly(zip)) {
        for (StoredEntry entry : zipReader.entries()) {
            if (entry.getType() == StoredEntryType.FILE) {
                files.add(new RelativeFile(zip, entry.getCentralDirectoryHeader().getName()));
            }
        }
    }
    return ImmutableSet.copyOf(files);
}

So the question becomes why ZFile.entries does not correctly return all the files in 0.jar. ZFile is a class from apkzlib, the library inside AGP that is mainly used to read and create ZIP files.
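
Incidentally, the undercount can be reproduced directly with apkzlib. A minimal sketch; the package name is what apkzlib used around AGP 3.4.x, so treat it as an assumption:

import com.android.tools.build.apkzlib.zip.ZFile;  // package name assumed for AGP 3.4.x
import java.io.File;
import java.io.IOException;

public class CountEntries {
    public static void main(String[] args) throws IOException {
        // Prints the number of entries apkzlib sees in the given archive.
        // For th's 0.jar this is far smaller than the real file count.
        try (ZFile zf = ZFile.openReadOnly(new File(args[0]))) {
            System.out.println(zf.entries().size());
        }
    }
}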

Investigate apkzlib

The source code of ZFile.entries is as follows:

public Set<StoredEntry> entries() {
    Map<String, StoredEntry> entries = Maps.newHashMap();
    for (FileUseMapEntry<StoredEntry> mapEntry : this.entries.values()) {
        StoredEntry entry = mapEntry.getStore();
        assert entry != null;
        entries.put(entry.getCentralDirectoryHeader().getName(), entry);
    }

    /*
     * mUncompressed may override mEntriesReady as we may not have yet processed all
     * entries.
     */
    for (StoredEntry uncompressed : uncompressedEntries) {
        entries.put(uncompressed.getCentralDirectoryHeader().getName(), uncompressed);
    }

    return Sets.newHashSet(entries.values());
}

Since we are only reading the jar here, not creating one, uncompressedEntries is empty and all elements come from this.entries, so we only need to find out how the values in this.entries are populated.

When the ZFile is opened, this.entries is populated by readData:

private void readData() throws IOException {
    ...

    if (directoryEntry != null) {
        CentralDirectory directory = directoryEntry.getStore();
        assert directory != null;

        entryEndOffset = 0;

        for (StoredEntry entry : directory.getEntries().values()) {
            ...
            FileUseMapEntry<StoredEntry> mapEntry = map.add(start, end, entry);
            entries.put(entry.getCentralDirectoryHeader().getName(), mapEntry);
            ...
        }

        directoryStartOffset = directoryEntry.getStart();
    } else {
        ...
    }
    ...
}

Obviously, the number of entries in this.entries depends on the number of entries in the return value of CentralDirectory.getEntries:

Map<String, StoredEntry> getEntries() {
    return ImmutableMap.copyOf(entries);
}

Similarly, we just need to know how the entries in the CentralDirectory are populated. ZFile calls CentralDirectory.makeFromData, which in turn calls CentralDirectory.readEntry in a loop to fill the entries; each call to readEntry adds one more entry.

static CentralDirectory makeFromData(ByteBuffer bytes, int count, ZFile file, ByteStorage storage)
        throws IOException {
    ...
    CentralDirectory directory = new CentralDirectory(file);

    for (int i = 0; i < count; i++) {
        try {
            directory.readEntry(bytes, storage);
        } catch (IOException e) {
            ...
        }
    }

    return directory;
}

The count parameter of the makeFromData method therefore determines how many elements end up in ZFile.entries. When opening a ZFile, the ZFile.readCentralDirectory method calls makeFromData:

private void readCentralDirectory() throws IOException {
    ...

    Eocd eocd = eocdEntry.getStore();

    ...

    CentralDirectory directory =
            CentralDirectory.makeFromData(
                    ByteBuffer.wrap(directoryData), eocd.getTotalRecords(), this, storage);
    ...
}

Eocd.getTotalRecords supplies the count parameter, so now we just need to figure out where its return value comes from:

int getTotalRecords() {
    return totalRecords;
}

Eocd.getTotalRecords returns totalRecords directly, and totalRecords is assigned when Eocd is constructed:

Eocd(ByteBuffer bytes) throws IOException {
    F_SIGNATURE.verify(bytes);
    F_NUMBER_OF_DISK.verify(bytes);
    F_DISK_CD_START.verify(bytes);
    long totalRecords1 = F_RECORDS_DISK.read(bytes);
    ...
    Verify.verify(totalRecords1 <= Integer.MAX_VALUE);
    totalRecords = Ints.checkedCast(totalRecords1);
    ...
}

The value of totalRecords comes from totalRecords1, which is the return value of F_RECORDS_DISK.read. To know what’s going on here, you need to understand what EOCD is.

EOCD

EOCD (End Of Central Directory record) is a structure in the ZIP file format.

According to the ZIP documentation, the general structure of a ZIP file is as follows:

[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
[central directory header 1]
.
.
.
[central directory header n]
[zip64 end of central directory record]
[zip64 end of central directory locator]
**[end of central directory record]**

Note that 0.jar is not a ZIP64 file, and apkzlib does not support ZIP64 anyway, so the ZIP64-related structures can be ignored here. Compared with ZIP, ZIP64 can hold more and larger files.

The EOCD is located at the end of the ZIP file and consists of the following data:

EOCD structure (as defined in PKWARE's APPNOTE, the ZIP specification):

end of central dir signature                                    4 bytes (0x06054b50)
number of this disk                                             2 bytes
number of the disk with the start of the central directory     2 bytes
total number of entries in the central directory on this disk  2 bytes
total number of entries in the central directory                2 bytes
size of the central directory                                   4 bytes
offset of start of central directory with respect to
    the starting disk number                                    4 bytes
.ZIP file comment length                                        2 bytes
.ZIP file comment                                               (variable size)

The EOCD signature is 4 bytes long, and its content is fixed at 0x06054b50. Back to the constructor of Eocd:

Eocd(ByteBuffer bytes) throws IOException {
    F_SIGNATURE.verify(bytes);
    F_NUMBER_OF_DISK.verify(bytes);
    F_DISK_CD_START.verify(bytes);
    long totalRecords1 = F_RECORDS_DISK.read(bytes);
    ...
    Verify.verify(totalRecords1 <= Integer.MAX_VALUE);
    totalRecords = Ints.checkedCast(totalRecords1);
    ...
}

In the Eocd constructor, the 4-byte EOCD signature is verified first. Then 2 bytes are read: the "number of this disk" field. Since apkzlib does not support multi-part (e.g. spanned floppy-disk) archives, this value must be 0. Then 2 more bytes are read: the "number of the disk with the start of the central directory", which must also be 0 for the same reason.

Not supporting multi-part archives is fine; that feature was designed for floppy disks, which is also why these field names mention "disk".

F_RECORDS_DISK.read then reads the "total number of entries in the central directory on this disk" field from the table above, which is equivalent to the total number of compressed files.

F_RECORDS_DISK.read reads two bytes in little-endian order and treats them as a 16-bit unsigned integer, which is finally stored in totalRecords. Note that totalRecords is of type int, which is more than enough to hold any value F_RECORDS_DISK.read can return. Since apkzlib does not support multi-part archives, "total number of entries in the central directory on this disk" and "total number of entries in the central directory" should be equal.

So when a ZFile is constructed, it first reads the EOCD, then reads exactly totalRecords StoredEntry records to fill CentralDirectory.entries; the data returned by ZFile.entries() comes from CentralDirectory.entries.

Plugins have no way to affect this ZFile logic, and there is no obvious bug in the process. The only thing that caught my attention is that the raw data behind totalRecords is stored in just two bytes.

Validating the guess: EOCD field overflow

Android developers should be sensitive to anything stored in two bytes; after all, we live with the 64K method reference limit. Could there be a 64K problem here as well?

If more than 65535 files are compressed into one ZIP file, the total-number-of-compressed-files field is bound to overflow when the EOCD is written, since only two bytes are available for it. After the unsigned short overflows, the stored value is much smaller than the real one, so when apkzlib reads the total number of compressed files, it gets a number far below the actual count. Because apkzlib relies entirely on this number to read the files in the ZIP, a large fraction of the compressed files are never read by apkzlib at all. This could be the root cause of the problem.

Evidence 1: Number of compressed files in 0.jar

Unzip the intermediate 0.jar outputs of the tw and th internal APKs on a computer:

Unzipped 0.jar of tw internal APK
Unzipped 0.jar of th internal APK

There are 56,560 items in tw’s 0.jar, and 70,410 items in th’s 0.jar.

Note that after decompressing a jar on macOS, the item count shown by Command+I is much larger than the number of compressed files in the jar, because every folder also counts as an item, and depending on the jar's compression configuration, the folders along a file path may not be counted as compressed files at all.

Therefore, the total-number-of-compressed-files field in the EOCD of tw's 0.jar (fewer than 65536 items) definitely did not overflow, while the same field in th's 0.jar (more than 65536 items, but inflated by folders) may or may not have overflowed; this needs to be confirmed.

Evidence 2: Binary data of 0.jar

To find out whether the total number of compressed files in the EOCD of th's 0.jar overflowed, we just need to inspect the binary data of 0.jar.

Below is a snippet of binary data at the end of th’s 0.jar file:

6e69 6665 2f52 2464 7261 7761 626c 652e
636c 6173 7350 4b05 0600 0000 0062 0162
0197 1a64 000e 783c 0400 00

Search backwards from the end of the 0.jar file for the EOCD signature 0x06054b50. Since data in a ZIP is stored in little-endian order, the byte sequence to look for is 50 4b 05 06. It is followed by a 2-byte 0, then another 2-byte 0, and the next 2-byte value is the total number of compressed files:

6e69 6665 2f52 2464 7261 7761 626c 652e
636c 6173 7350 4b05 06 00 0000 0062 0162
0197 1a64 000e 783c 0400 00

The bytes 62 01 are the total number of compressed files. Since the value is stored in little-endian order, it is actually 0x0162 = 354.

Similarly, the following is the binary fragment at the end of tw’s 0.jar file:

2464 7261 7761 626c 652e 636c 6173 7350
4b05 0600 0000 000c ce0c cee4 544d 008b
172d 0300 00

Using the same method, the total number of compressed files in tw's 0.jar can be found: 0xce0c = 52748.

Obviously, the total-number-of-compressed-files field in the EOCD of th's 0.jar overflowed, which is why it is so much smaller than the corresponding value for tw's 0.jar.
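
This manual inspection can also be scripted. A rough sketch that scans backwards for the EOCD signature and prints the 16-bit record counts (assumes the archive is not ZIP64; the class name is mine):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EocdDump {
    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        // The EOCD is at least 22 bytes, so start scanning 22 bytes from the end.
        for (int i = data.length - 22; i >= 0; i--) {
            ByteBuffer buf = ByteBuffer.wrap(data, i, data.length - i)
                    .order(ByteOrder.LITTLE_ENDIAN);
            if (buf.getInt() != 0x06054b50) {
                continue;                              // not the EOCD signature
            }
            buf.getShort();                            // number of this disk
            buf.getShort();                            // disk where the central directory starts
            int onThisDisk = buf.getShort() & 0xFFFF;  // entries in the CD on this disk
            int total = buf.getShort() & 0xFFFF;       // total entries in the CD
            System.out.println("records on this disk: " + onThisDisk + ", total: " + total);
            return;
        }
        System.out.println("EOCD signature not found");
    }
}

Run against th's 0.jar it should print 354, and against tw's 0.jar 52748, matching the hex dumps above.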

Evidence 3: How proguard writes EOCD

0.jar is the output of the proguard phase, so check the EOCD-writing method in the source code of proguard 6.0.3:

private void writeEndOfCentralDirectory() throws IOException
{
    ...
    writeInt(MAGIC_END_OF_CENTRAL_DIRECTORY);
    writeShort(0);                 // Number of this disk.
    writeShort(0);                 // Number of disk with central directory.
    writeShort(zipEntries.size()); // Number of records on this disk.
    writeShort(zipEntries.size()); // Total number of records.
    ...
}

private void writeShort(int value) throws IOException
{
    outputStream.write(value);
    outputStream.write(value >>> 8);
}

zipEntries is an ArrayList, and its size() method returns an int. writeShort writes only the lower 16 bits and discards the upper 16. So once zipEntries.size() exceeds 65535, writeShort effectively writes an overflowed unsigned short value.
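
To see the truncation concretely, here is a tiny demonstration. The entry count is hypothetical: from the EOCD dump we only know the low 16 bits are 354, and 65890 = 65536 + 354 assumes the count wrapped exactly once:

public class WriteShortOverflow {
    public static void main(String[] args) {
        int entryCount = 65890;                // hypothetical: 65536 + 354
        // writeShort emits value and value >>> 8, so only the low 16 bits survive.
        int written = entryCount & 0xFFFF;
        System.out.println(written);           // 354, matching th's EOCD
    }
}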

How to fix

Although the field recording the total number of compressed files in the EOCD of the ZIP file overflowed, the compressed files themselves are not lost; they still exist in the ZIP file. Review the file structure of ZIP:

[local file header 1]
[encryption header 1]
[file data 1]
[data descriptor 1]
.
.
.
[local file header n]
[encryption header n]
[file data n]
[data descriptor n]
[archive decryption header]
[archive extra data record]
**[central directory header 1]**
.
.
.
**[central directory header n]**
[zip64 end of central directory record]
[zip64 end of central directory locator]
[end of central directory record]

We can ignore the total number of compressed files in EOCD and directly read all the central directory headers in the ZIP.

Modify the CentralDirectory.makeFromData method to ignore count and change the loop condition to keep reading as long as readable data remains. In theory, all files can then be read out.

Change i < count in the following code to bytes.hasRemaining():

static CentralDirectory makeFromData(ByteBuffer bytes, int count, ZFile file, ByteStorage storage)
        throws IOException {
    ...
    for (int i = 0; i < count; i++) {
        try {
            directory.readEntry(bytes, storage);
        } catch (IOException e) {
            ...
        }
    }

    return directory;
}

The modified code is:

static CentralDirectory makeFromData(ByteBuffer bytes, int count, ZFile file, ByteStorage storage)
        throws IOException {
    ...
    for (int i = 0; bytes.hasRemaining(); i++) {
        try {
            directory.readEntry(bytes, storage);
        } catch (IOException e) {
            ...
        }
    }

    return directory;
}

I recompiled AGP and used the modified version to package the th internal APK. It runs normally, with no loss of java resources.

Upgrade AGP version

Still, carrying a modified AGP is not a long-term solution; the final fix should be to upgrade AGP. The latest version at the time was 4.1.2.

From the analysis above, we know the EOCD field overflow stems from the small field sizes in the original ZIP design.

An ordinary ZIP file can hold at most 65535 files; exceeding that is not fatal in itself, it merely overflows the fields in the EOCD. A ZIP file also has 4 GB (2³² - 1 bytes) limits: the uncompressed size of a single file, the compressed size of a single file, and the size of the whole ZIP file must all stay under 4 GB.

Later, PKWARE introduced ZIP64. ZIP64 and ZIP are very similar in file structure; the differences are:

  • ZIP64 uses the extra field of each local file header to attach a "zip64 extended information" extra field to each compressed file
  • ZIP64 inserts a zip64 EOCD record and a zip64 EOCD locator before the original EOCD

In these two ways, ZIP64 raises the file size limit to 16 EB (2⁶⁴ - 1 bytes) and the number of compressed files to more than 4 billion (2³² - 1).

Unless ZIP64 is used, these fields are bound to overflow. So does AGP 4.1.2 fix this problem?

View the code for obtaining java resources in the source code of AGP 4.1.2:

@NonNull
private static Map<RelativeFile, FileStatus> getChangedJavaResources(
        SplitterParams params,
        Map<File, String> cacheKeyMap,
        KeyedFileCache cache,
        Set<Runnable> cacheUpdates)
        throws IOException {
    Map<RelativeFile, FileStatus> changedJavaResources = new HashMap<>();
    for (SerializableChange change : params.getJavaResourceFiles().get().getChanges()) {
        if (change.getNormalizedPath().isEmpty()) {
            try {
                IncrementalChanges.addZipChanges(
                        changedJavaResources, change.getFile(), cache, cacheUpdates);
            } catch (Zip64NotSupportedException e) {
                ...
            }
        } else {
            IncrementalChanges.addFileChange(changedJavaResources, change);
        }
    }
    return Collections.unmodifiableMap(changedJavaResources);
}

Here AGP 4.1.2 no longer uses apkzlib to process the ZIP; it has switched to its own ZipCentralDirectory implementation. Tracing IncrementalChanges.addZipChanges shows that it eventually calls RelativeFiles.fromZip:

@NonNull
public static Set<RelativeFile> fromZip(@NonNull ZipCentralDirectory zip) throws IOException {
    Collection<DirectoryEntry> values = zip.getEntries().values();
    Set<RelativeFile> files = Sets.newHashSetWithExpectedSize(values.size());
    for (DirectoryEntry entry : values) {
        files.add(new RelativeFile(zip.getFile(), entry.getName()));
    }
    return Collections.unmodifiableSet(files);
}

We just need to know how ZipCentralDirectory.getEntries is implemented. It is written in Kotlin, and the entries property actually calls the readZipEntries method:

val entries: Map<String, DirectoryEntry> by lazy { readZipEntries() }

private fun readZipEntries(): Map<String, DirectoryEntry> {
    val buffer = directoryBuffer
    val entries = mutableMapOf<String, DirectoryEntry>()
    while (buffer.remaining() >= CENTRAL_DIRECTORY_FILE_HEADER_SIZE &&
            buffer.int == CENTRAL_DIRECTORY_FILE_HEADER_MAGIC) {
        ...
    }
    return Collections.unmodifiableMap(entries)
}

The number of loop iterations here does not depend on any recorded file count; the loop just keeps reading data from the central directory header area until none remains. So it seems java resources will not be lost here.

However, if ZipCentralDirectory read and used the total number of compressed files from the EOCD anywhere else, the bug could still occur. Check the ZipCentralDirectory source code that reads the EOCD:

private fun readEOCDFromBuffer(buffer: ByteBuffer): CdrInfo {
    val info = CdrInfo()
    // Read the End of Central Directory Record and record its position in the map.
    // For now skip fields we don't use.
    buffer.position(buffer.position() + SHORT_BYTES * 4)
    //short numDisks = bytes.getShort();
    //short cdStartDisk = bytes.getShort();
    //short numCDRonDisk = bytes.getShort();
    //short numCDRecords = buffer.getShort();
    info.cdSize = uintToLong(buffer.int)
    info.cdOffset = uintToLong(buffer.int)
    //short sizeComment = bytes.getShort();
    return info
}

By the time the buffer is passed in, its position is already past the EOCD signature, and the method immediately skips 4 SHORT_BYTES, i.e. 8 bytes, which covers exactly the four 2-byte fields including the total number of compressed files. In other words, ZipCentralDirectory never reads the field in the EOCD that records the total number of compressed files, so it is unaffected by that field's overflow.

Therefore, upgrading the AGP version to 4.1.2 can indeed solve the problem of java resource loss.

Conclusion

The root cause of the java resource loss in the APK is that the field in the ZIP EOCD that records the number of compressed files is only 16 bits, while the 0.jar generated in the proguard phase contains more than 65535 files, so the field overflowed. The apkzlib in AGP 3.4.1 relies on that count, so while collecting java resources during APK packaging it discarded a large number of files in the zip, which ultimately caused the loss of many java resources.

To fix this problem, consider:

  • Reduce the number of compressed files in 0.jar. Not very feasible.
  • Manually modify and compile AGP. Workable as a temporary solution.
  • Upgrade AGP to 4.1.2. This fixes the issue permanently.
