How to compare .apk / .aab files | Part 1

Anatoliy Varyvonchyk
Published in Bumble Tech
9 min read · Jul 3, 2023

Have you ever thought about how your .apk / .aab from your current release differs from a previous one?

The Android team suggests using the APK Analyzer for this, which is part of Android Studio. That’s fine if you only make such comparisons every now and again, but what if you want to compare on a regular basis? Automating the comparison lets you go further and integrate analysis reports into your CI pipeline, giving you a detailed understanding of how a single commit affects the final artifact.

In this article, I’ll explain how you could analyse different aspects of a build: size, manifest, apk splits, dex info etc.

Please note that I’ll be sharing several code examples. It’s possible to develop solutions utilising Gradle plugins and Kotlin, and therefore, I will share Kotlin code that uses Gradle APIs for process operations. These APIs can be substituted with a standard ProcessBuilder if necessary. You could also implement similar logic using other programming languages.
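For readers outside Gradle, the ProcessBuilder substitution mentioned above could look roughly like this. It is a minimal sketch: runCommand is a hypothetical helper name, not part of the article’s code, and it assumes the tool you call is available on the PATH.

```kotlin
/**
 * Runs an external command and returns its trimmed standard output.
 * Hypothetical helper: a plain-JVM stand-in for ExecOperations.exec.
 */
fun runCommand(command: List<String>): String {
    val process = ProcessBuilder(command)
        // Keep stderr out of the captured text so warnings do not pollute parsed output
        .redirectError(ProcessBuilder.Redirect.INHERIT)
        .start()
    // Read stdout fully before waiting, so a full pipe buffer cannot block the child process
    val output = process.inputStream.bufferedReader().use { it.readText() }
    val exitCode = process.waitFor()
    check(exitCode == 0) { "Command $command failed with exit code $exitCode" }
    return output.trim()
}
```

For example, runCommand(listOf("bundletool", "get-size", "total", "--apks=app.apks")) would return the same text that the Gradle-based snippets capture through standardOutput.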

Let’s take a look at how we can visually represent a comparison:

Let’s compare

AppBundle

The most important and most obvious element for comparison would be the AppBundle (.aab) file itself. An AppBundle contains all your app’s compiled code and resources. If needed, we can extract .apk files from the AppBundle for further analysis. If you haven’t yet migrated to AppBundle, you can also compare .apk files.

Analysers

Application size

An obvious comparison to make would be the sizes of the new and current .apk installed on a user device. However, .aab contains resources for different device configurations, making it less informative about the actual impact of the changes. Instead, we’re only going to compare installation size based on the device configuration we provide — e.g. Pixel 6.

To work with AppBundle, the Android team has provided a command line tool which covers our needs: bundletool. If we use the get-size total command to understand the download size for an end user, we will get a range of possible values.

bundletool get-size total --apks={apkset.apks}
MIN_bytes,MAX_bytes

This range might be quite big and doesn’t help us to understand the difference between the download size of two .aab files. We can specify optional parameters to get ranges for specific configurations (SDK, ABI, SCREEN_DENSITY, LANGUAGE). That would allow us to get precise data for each configuration, but it would still be awkward to compare two tables with numbers. To get a more measurable metric, let’s instead compare download size for a specific device:

bundletool get-size total --device-spec={device_spec.json} --apks={apkset.apks}
MIN_bytes,MAX_bytes
# MIN == MAX, as we are specifying an exact device configuration

import org.gradle.process.ExecOperations
import java.io.ByteArrayOutputStream
import java.io.File
import java.nio.charset.StandardCharsets

fun getTotalSize(execOperations: ExecOperations, apks: File, deviceSpec: File): Long {
    val outputStream = ByteArrayOutputStream()
    execOperations.exec {
        it.commandLine("bundletool")
        it.args(
            "get-size",
            "total",
            "--device-spec=${deviceSpec.absolutePath}",
            "--apks=${apks.absolutePath}"
        )
        it.standardOutput = outputStream
    }
    // The last line of the output holds the "min,max" values
    val output = outputStream.toString(StandardCharsets.UTF_8).trim()
    val minMax = output.lines().last().split(",")
    return minMax[0].toLong()
}

device_spec.json: a file that specifies the device configuration you want to target. You can either generate this file from an existing device, or create the .json file manually if you don’t have access to the device for which you want to build a target apk set.
apkset.apks: an apk set archive with the .apks file extension. It is a container archive which includes an apk set for all device configurations our app supports. It can be generated from the .aab file using the following command:

bundletool build-apks --bundle={appBundle.aab} --output={path_to_apkset.apks}

fun generateApks(execOperations: ExecOperations, appBundle: File, outputFile: File) {
    execOperations.exec {
        it.commandLine("bundletool")
        it.args(
            "build-apks",
            "--bundle=${appBundle.absolutePath}",
            "--output=${outputFile.absolutePath}"
        )
    }
}
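As for device_spec.json, one way to get it is from a connected device with bundletool get-device-spec --output={device_spec.json}. If you write it by hand instead, the sketch below shows one possible way to produce it from Kotlin. The four field names follow the bundletool documentation; the DeviceSpec class, the helper names and the sample values in the usage note are illustrative assumptions.

```kotlin
import java.io.File

/** Hypothetical holder for the four fields of a bundletool device spec. */
data class DeviceSpec(
    val supportedAbis: List<String>,
    val supportedLocales: List<String>,
    val screenDensity: Int,
    val sdkVersion: Int
)

/** Renders the spec as the JSON document that bundletool reads via --device-spec. */
fun DeviceSpec.toJson(): String {
    fun List<String>.toJsonArray() = joinToString(prefix = "[", postfix = "]") { "\"$it\"" }
    return """
        {
          "supportedAbis": ${supportedAbis.toJsonArray()},
          "supportedLocales": ${supportedLocales.toJsonArray()},
          "screenDensity": $screenDensity,
          "sdkVersion": $sdkVersion
        }
    """.trimIndent()
}

fun writeDeviceSpec(spec: DeviceSpec, outputFile: File) = outputFile.writeText(spec.toJson())
```

A call such as writeDeviceSpec(DeviceSpec(listOf("arm64-v8a"), listOf("en"), 420, 33), File("device_spec.json")) would describe a single-ABI, English-only device; adjust the values to the device you actually want to track.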

Now we know how to compare the size of an old and a new version for a specific device, and we can report the difference to the developer. As a further, albeit minor, improvement, you can report the difference in a format that suits your needs rather than as a plain number of bytes (example). Let’s explore what else might be interesting for analysis inside the .apks file.
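One possible formatting of the size difference is sketched below. The function name, the choice of binary units and the percentage suffix are assumptions for illustration, not the format the article’s report uses.

```kotlin
import java.util.Locale
import kotlin.math.abs

/** Formats the size change between two builds, e.g. "+512.00 KiB (+50.00%)". */
fun formatSizeDiff(baselineBytes: Long, currentBytes: Long): String {
    val diff = currentBytes - baselineBytes
    if (diff == 0L) return "no change"
    val sign = if (diff > 0) "+" else "-"
    val magnitude = abs(diff).toDouble()
    val formatted = when {
        magnitude >= 1024.0 * 1024.0 -> String.format(Locale.ROOT, "%.2f MiB", magnitude / (1024.0 * 1024.0))
        magnitude >= 1024.0 -> String.format(Locale.ROOT, "%.2f KiB", magnitude / 1024.0)
        else -> "${abs(diff)} B"
    }
    // The relative change is only meaningful when the baseline size is known and non-zero
    val percent = if (baselineBytes > 0) {
        String.format(Locale.ROOT, " (%+.2f%%)", diff * 100.0 / baselineBytes)
    } else {
        ""
    }
    return "$sign$formatted$percent"
}
```

Feeding it the two values returned by getTotalSize for the baseline and current .apks gives a single line that is easy to drop into a CI comment.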

Apk splits

If we unzip the .apks archive, we’ll find a splits folder which contains multiple .apk files.
We can categorise those into a few groups:

  • Language related .apk files: base-en.apk, base-pl.apk etc.
  • Density related .apk files: base-mdpi.apk, base-hdpi.apk etc.
  • ABI related: base-armeabi-v7a.apk, base-arm64-v8a.apk etc.
  • Main apk: base-master.apk

For developers, it would be interesting to see if we lose support for an ABI between app versions, or when we add support for a new language.

Let’s compare the two sets of files by name and report new / removed files.
But first, we need to extract the .apk files from the .apks archive. It is a simple zip archive which we can unzip:

fun extractApks(
    fileSystemOperations: FileSystemOperations,
    archiveOperations: ArchiveOperations,
    apks: File,
    outputDir: File
) {
    fileSystemOperations.copy {
        it.from(archiveOperations.zipTree(apks))
        it.into(outputDir)
    }
}

Now we have a directory which contains multiple .apk files, and we can compare its contents before and after the changes.

import java.io.File
import java.util.SortedSet

data class ArtifactPair(
    val current: File, // folder containing apk splits
    val baseline: File // folder containing apk splits
)

// SplitsData is not shown in the article; this shape is inferred from how it is constructed below.
data class SplitsData(
    val currentSplits: List<File>,
    val baselineSplits: List<File>,
    val currentSplitsByName: Map<String, File>,
    val baselineSplitsByName: Map<String, File>,
    val newSplits: SortedSet<String>,
    val removedSplits: SortedSet<String>,
    val matchingSplits: SortedSet<String>
)

private fun compareSplits(extractedApks: ArtifactPair): SplitsData {
    val currentSplits = findSplits(extractedApks.current)
    val baselineSplits = findSplits(extractedApks.baseline)

    val currentSplitsByName = currentSplits.associateBy { it.nameWithoutExtension }
    val baselineSplitsByName = baselineSplits.associateBy { it.nameWithoutExtension }

    // Set differences by split name give us the added and removed splits
    val newSplits = (currentSplitsByName.keys - baselineSplitsByName.keys).toSortedSet()
    val removedSplits = (baselineSplitsByName.keys - currentSplitsByName.keys).toSortedSet()
    val matchingSplits = currentSplitsByName.keys.intersect(baselineSplitsByName.keys).toSortedSet()

    return SplitsData(
        currentSplits = currentSplits,
        baselineSplits = baselineSplits,
        currentSplitsByName = currentSplitsByName,
        baselineSplitsByName = baselineSplitsByName,
        newSplits = newSplits,
        removedSplits = removedSplits,
        matchingSplits = matchingSplits
    )
}

private fun findSplits(extractedApks: File): List<File> =
    extractedApks.resolve("splits")
        .walkTopDown()
        .filter { it.name.endsWith(".apk") }
        .toList()

SplitsData now contains all the required information about .apk splits, and we can report it to developers. We can also compare matching splits by size and report those with significant differences.
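The size comparison of matching splits could be sketched as follows. SplitSizeChange, findSignificantSizeChanges and the 10 KiB default threshold are hypothetical; pick whatever counts as significant for your app.

```kotlin
import kotlin.math.abs

/** Size of one split in both builds; a hypothetical type for this sketch. */
data class SplitSizeChange(val name: String, val baselineBytes: Long, val currentBytes: Long) {
    val diffBytes: Long get() = currentBytes - baselineBytes
}

/**
 * Returns the splits present in both builds whose size changed by at least
 * thresholdBytes, sorted by split name.
 */
fun findSignificantSizeChanges(
    baselineSizes: Map<String, Long>,
    currentSizes: Map<String, Long>,
    thresholdBytes: Long = 10L * 1024
): List<SplitSizeChange> =
    baselineSizes.keys.intersect(currentSizes.keys)
        .sorted()
        .map { name -> SplitSizeChange(name, baselineSizes.getValue(name), currentSizes.getValue(name)) }
        .filter { abs(it.diffBytes) >= thresholdBytes }
```

The input maps can be derived from the by-name maps built earlier, for example currentSplitsByName.mapValues { it.value.length() }.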

Manifest Changes

One of the important sources of data for analysis is the AndroidManifest.xml file. Analysing it automatically lets us catch unexpected changes in the manifest (like new Permissions or Services we didn’t intend to add).
bundletool can extract the manifest from an AppBundle to standard output, and we can then save it as plain .xml and analyse it later. To do so, you can use this command:

bundletool dump manifest --bundle={path_to_bundle.aab}

fun extractManifest(execOperations: ExecOperations, appBundle: File, outputFile: File) {
    outputFile.outputStream().buffered().use { output ->
        execOperations.exec {
            it.commandLine("bundletool")
            it.args(
                "dump",
                "manifest",
                "--bundle=${appBundle.absolutePath}"
            )
            it.standardOutput = output
        }
    }
}

After extracting manifests from both bundle files, we can proceed with making a comparison.
There might be multiple possible options for how to report changes between two .xml documents. The final decision depends on your needs, and the complexity of the solution will depend on how you want to present the results.

If you use the XMLUnit library for comparison, you’ll get structural differences between 2 XML documents. From those structural differences, you can report new, missing or changed nodes.
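If a full structural diff is more than you need, a narrower check on one kind of node may be enough. The sketch below uses only the JDK’s built-in XML parser (not XMLUnit) to report added and removed uses-permission entries; the function and class names are hypothetical, and it assumes the dumped manifest uses the usual android: prefix.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

/** Collects the android:name values of all <uses-permission> elements. */
fun extractPermissions(manifestXml: String): Set<String> {
    val document = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(ByteArrayInputStream(manifestXml.toByteArray(Charsets.UTF_8)))
    val nodes = document.getElementsByTagName("uses-permission")
    return (0 until nodes.length)
        .mapNotNull { nodes.item(it).attributes.getNamedItem("android:name")?.nodeValue }
        .toSortedSet()
}

/** Permissions added to and removed from the current manifest; hypothetical type. */
data class PermissionChanges(val added: Set<String>, val removed: Set<String>)

fun comparePermissions(baselineXml: String, currentXml: String): PermissionChanges {
    val baseline = extractPermissions(baselineXml)
    val current = extractPermissions(currentXml)
    return PermissionChanges(added = current - baseline, removed = baseline - current)
}
```

The same pattern extends to other tags such as service or activity by changing the tag name passed to getElementsByTagName.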

In this example, let’s focus on a simple solution which relies on git diff and its standard output. To get the differences between two files, you can execute the following command (git must be installed on the machine):

git diff --no-index {path_to_first_manifest.xml} {path_to_second_manifest.xml}

fun generateDiff(execOperations: ExecOperations, baselineManifest: File, currentManifest: File): String {
    val outputStream = ByteArrayOutputStream()
    execOperations.exec {
        it.commandLine("git")
        it.args(
            "diff",
            "--no-index",
            baselineManifest.absolutePath,
            currentManifest.absolutePath
        )
        it.standardOutput = outputStream
        // git diff --no-index exits with 1 when the files differ; don't treat that as a failure
        it.isIgnoreExitValue = true
    }
    return outputStream.toString(StandardCharsets.UTF_8).trim()
}

The --no-index parameter lets git compare two paths that are not under git control. Note that in this mode git exits with code 1 when the files differ.
The results are reported as a plain string in unified diff format. See an example of a git diff output:

--- /path/to/original timestamp
+++ /path/to/new timestamp
@@ -1,3 +1,9 @@
+This is an important notice!
+It should therefore be located at the beginning of this document!

This part of the document has stayed
the same from version to
@@ -8,13 +14,8 @@
compress the size of the changes.

-This paragraph contains text that is outdated.
-It will be deleted in the near future.

It is important to spell
-check this dokument. On
+check this document. On
the other hand, a misspelled word isn't
the end of the world.

As we’re not interested in the header prefix, we can exclude it. Additionally, instead of displaying everything as a plain string, we prefer to present changes individually. To accomplish this, we can use the following code:

fun String.gitDiffToChanges(): List<String> {
    if (this.isEmpty()) {
        return emptyList()
    }

    val groupedChanges = mutableListOf<String>()
    val outputLines = this.lines()
    var diffBlockStart = -1
    outputLines.forEachIndexed { index, currentLine ->
        // A hunk header such as "@@ -1,3 +1,9 @@" closes the previous block and opens a new one
        if (currentLine.startsWith("@@") && currentLine.endsWith("@@")) {
            if (diffBlockStart != -1) {
                groupedChanges.add(outputLines.subList(diffBlockStart, index).joinToString(separator = "\n"))
            }
            diffBlockStart = index + 1
        }
    }
    if (diffBlockStart != -1) {
        groupedChanges.add(outputLines.subList(diffBlockStart, outputLines.size).joinToString(separator = "\n"))
    }
    return groupedChanges
}

A downside of the git diff approach is that you can’t control the output, for example to add custom padding or to skip unaffected nodes. On the upside, this solution is much easier to maintain than a custom one, which requires implementing logic to analyse structural differences between two .xml documents.

Changes in dex files

Now that we know how to extract splits .apk files, let’s go further and extract individual .dex files from each .apk, and compare matching splits. To decide what we want to report, we first need to understand what data we can extract from a dex file. You can read more about .dex file content here.

Let’s focus on reporting the number of methods and the number of fields stored in the .dex file. We’re lucky enough to have an existing implementation for parsing dex files, written by Google. However, it parses the whole dex file, while we are only interested in the header item. So we can simplify the parser and speed it up by skipping the unnecessary work.

As a first step, we want to unzip the .apk and collect the .dex files from it:

unzip {apk_file_path} "*.dex"

fun extractDexes(
    fileSystemOperations: FileSystemOperations,
    archiveOperations: ArchiveOperations, // needed for zipTree; missing from the original signature
    apk: File,
    outputDir: File
): List<File> {
    fileSystemOperations.copy {
        it.from(archiveOperations.zipTree(apk))
        it.include("*.dex")
        it.into(outputDir)
    }
    return outputDir
        .walkTopDown()
        .filter { it.name.endsWith(".dex") }
        .toList()
}

The next step would be to parse the .dex file. I won’t include the full code here, but you can find it in the full gist. Now, let’s discuss the key elements of parsing.

The header starts with the constant DEX_FILE_MAGIC: a sequence of bytes that must appear at the beginning of a .dex file in order for it to be recognized as such. That value also encodes a format version number, which is expected to increase monotonically over time as the format evolves. As a first step of parsing the .dex file, we should verify the magic constant.

@Throws(IOException::class)
private fun verifyMagic(dexFile: RandomAccessFile) {
    val magic = ByteArray(8)
    dexFile.readFully(magic)

    val possibleMagicValues = listOf(
        "dex\n035\u0000".toByteArray(StandardCharsets.US_ASCII),
        "dex\n037\u0000".toByteArray(StandardCharsets.US_ASCII),
        "dex\n038\u0000".toByteArray(StandardCharsets.US_ASCII),
        "dex\n039\u0000".toByteArray(StandardCharsets.US_ASCII)
    )

    // Arrays are compared by reference with contains(), so compare their contents explicitly
    if (possibleMagicValues.none { it.contentEquals(magic) }) {
        throw IOException("Magic number is wrong -- are you sure this is a DEX file?")
    }
}

Next, we should read the endian constant.
It tells us the byte order to use when reading INT values.

// dexFile, readInt() and the constants are members of the surrounding parser class (see the gist)
fun isBigEndian(): Boolean {
    dexFile.seek(ENDIAN_TAG_POSITION)
    val endianTag = readInt()
    return if (endianTag == ENDIAN_CONSTANT) {
        false
    } else if (endianTag == REVERSE_ENDIAN_CONSTANT) {
        true
    } else {
        throw IOException("Endian constant has unexpected value " + Integer.toHexString(endianTag))
    }
}

Now we just need to read int values (which is done differently depending on the endianness) from the file header at specific offsets to get the information we’re looking for. As a result, we get an object which contains the number of methods and fields in that specific .dex file. After processing all the dexes from an .apk, we sum up the results to get the total count of methods / fields for that .apk file.
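The header reads described above can be sketched with a ByteBuffer instead of a RandomAccessFile. This is not the code from the gist: the names are hypothetical, and the offsets (endian_tag at 0x28, field_ids_size at 0x50, method_ids_size at 0x58, header size 0x70) are taken from the published DEX format layout.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

/** Field and method counts of one or more dex files; a hypothetical type. */
data class DexCounts(val fields: Int, val methods: Int) {
    operator fun plus(other: DexCounts) = DexCounts(fields + other.fields, methods + other.methods)
}

private const val HEADER_SIZE = 0x70
private const val ENDIAN_TAG_OFFSET = 0x28
private const val FIELD_IDS_SIZE_OFFSET = 0x50
private const val METHOD_IDS_SIZE_OFFSET = 0x58
private const val ENDIAN_CONSTANT = 0x12345678
private const val REVERSE_ENDIAN_CONSTANT = 0x78563412

/** Reads the counts from the first HEADER_SIZE bytes of a dex file. */
fun readDexCounts(header: ByteArray): DexCounts {
    require(header.size >= HEADER_SIZE) { "DEX header must be at least $HEADER_SIZE bytes" }
    // Start as little-endian; switch if the tag shows the file is byte-swapped
    val buffer = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN)
    when (val tag = buffer.getInt(ENDIAN_TAG_OFFSET)) {
        ENDIAN_CONSTANT -> Unit
        REVERSE_ENDIAN_CONSTANT -> buffer.order(ByteOrder.BIG_ENDIAN)
        else -> throw IllegalArgumentException("Unexpected endian tag " + Integer.toHexString(tag))
    }
    return DexCounts(
        fields = buffer.getInt(FIELD_IDS_SIZE_OFFSET),
        methods = buffer.getInt(METHOD_IDS_SIZE_OFFSET)
    )
}
```

The per-.apk total is then a fold over its dex files, for example headers.map(::readDexCounts).fold(DexCounts(0, 0), DexCounts::plus), where headers holds the first 0x70 bytes of each extracted .dex file. Magic verification is left out here and should still run first.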

Reporting changes in app bundle files

We can also unzip .aab files and report their contents for comparison. Some of the data can be filtered out of the report:

  • /res/ — contains all the resources defined in the app (icons / layouts / fonts etc.)
  • /dex/classes/ — contains dex files

Both will produce noise on release builds due to obfuscation, and changes here don’t provide us additional insights compared to file size.

import java.io.File
import java.util.zip.ZipFile

private fun listBundleContents(
    appBundle: File,
    ignorePaths: Set<String> = setOf("/res/", "/dex/classes")
): Set<String> =
    ZipFile(appBundle).use { zip ->
        zip.entries()
            .asSequence()
            // Skip directories and any entry whose path contains an ignored fragment
            .filter { entry -> !entry.isDirectory && !ignorePaths.any { entry.name.contains(it) } }
            .map { it.name }
            .toSet()
    }

After that, we can compare those two sets of strings and report missing / new entries to developers.

Closing note

An .aab file contains more data which can be used for analysis. To get some ideas about what else you can extract and analyse, I suggest checking The Android App Bundle format. One potential option for a new analyser is to learn how to read and report changes in .pb files. For example, resources.pb describes the code and resources in each module, and it might be useful to highlight changes there for developers.

Today we discussed analysers which help developers clearly see the difference between the current and new versions. However, we focused on analysers that process AppBundle files. Another source of data (that you could build different analysers around) is the project module structure, plus the external dependencies used in the project.

We will discuss what we can do with that in the second part of this article. Stay tuned!

In the meantime, let me know what you thought of this in the comments :)
