How to compare .apk /.aab files | Part 2

Anatoliy Varyvonchyk
Published in Bumble Tech
8 min read, Nov 2, 2023

In part one of this series, we discussed how we can analyse bundle contents and what data can be extracted for analysis. Now, let’s cover what else we can do to help developers understand important changes in .apk contents.

Dependency changes

One of the most important parts of comparison is to know which new dependencies were introduced or how the transitive dependencies were updated. When a developer makes a minor update to a library version, this could lead to unexpected changes in multiple transitive dependencies, and could upgrade them to a newer version.

Dependency Analysis Android plugin

This Gradle plugin gives you detailed insights into your project dependencies and much more. Even if you’re not going to generate automatic reports, I suggest applying the plugin to your current project and running its health checks. This might give you ideas for what to improve in your project (removing unused dependencies, changing dependencies to compile-only, or fixing a suboptimal dependency structure). Here, let’s focus on GraphViewTask. This task gives us structured data describing the state of the project’s dependencies.
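As a rough sketch, applying the plugin in the root build.gradle.kts could look like the following. The plugin id is the published one; the version number is only an assumption for illustration, so check the plugin’s releases for the current one.

```kotlin
// Root build.gradle.kts
plugins {
    // Version is an assumption for illustration; use the latest release.
    id("com.autonomousapps.dependency-analysis") version "1.25.0"
}
```

With the plugin applied, running ./gradlew buildHealth executes the health checks mentioned above.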

Here’s an example of a report which we’ll use for analysis:

{
  "variant":{"variant":"debug","kind":"MAIN"},
  "configurationName":"debugCompileClasspath",
  "nodes":[
    {"type":"project","identifier":":libraries:Network"},
    {"type":"module","identifier":"org.jetbrains.kotlin:kotlin-stdlib","resolvedVersion":"1.7.10"}
    // Other modules and libraries
  ]
}

To parse such a .json file we can use kotlinx.serialization or any other library. The following snippet parses this JSON into models (DependencyGraphJson):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.decodeFromStream
import java.io.File

// internal, because parse() returns an internal model
internal class DependencyGraphParser(
    private val json: Json = Json { ignoreUnknownKeys = true }
) {
    @OptIn(ExperimentalSerializationApi::class)
    fun parse(jsonFile: File): DependencyGraphJson =
        jsonFile.inputStream().use { json.decodeFromStream(it) }
}

@Serializable
internal data class DependencyGraphJson(
    val variant: BuildVariantJson,
    val configurationName: String,
    val nodes: List<DependencyGraphNodeJson>
)

@Serializable
internal data class BuildVariantJson(
    val variant: String,
    val kind: String
)

@Serializable
internal data class DependencyGraphNodeJson(
    val type: String,
    val identifier: String,
    val resolvedVersion: String? = null // absent for project nodes
)

DependencyGraphDifference

The first thing that comes to mind is reporting changes in the project structure to the developer. These might include:

  • Which modules have been added or removed.
  • Whether a new dependency to the project has been introduced.
  • How the library version has changed (upgraded or downgraded, and the old and new values).

To achieve that, we need to compare 2 DependencyGraphJson models and report results to the developer. Let’s define the main models required for that.

sealed class Dependency {
    data class ProjectDependency(val path: String) : Dependency()
    data class MavenDependency(val coordinates: MavenCoordinates) : Dependency()
}

data class MavenCoordinates(val groupId: String, val artifactId: String, val version: String?)

sealed class DependencyChange {
    abstract val dependency: Dependency
    data class Added(override val dependency: Dependency) : DependencyChange()
    data class Changed(
        override val dependency: Dependency,
        val oldVersion: String?,
        val changeType: DependencyChangeType
    ) : DependencyChange()
    data class Removed(override val dependency: Dependency) : DependencyChange()
}

enum class DependencyChangeType { Upgrade, Downgrade, Other }

// final model which contains all changes
internal data class DependencyChanges(
    val changes: Map<String, List<DependencyChange>> // configurationName to changes
)

We can define 2 types of Dependency in the project:

  • ProjectDependency: when your moduleA depends on moduleB
  • MavenDependency: when your module depends on an external library

We can also define 3 possible variants for DependencyChange:

  • Added: when a new dependency has been added
  • Removed: when a dependency has been removed
  • Changed: when a dependency’s version has changed; the change type is Upgrade, Downgrade, or Other (when the versions could not be compared, for example because the old version is unknown)

Now that we have the main models defined, we can compare results and produce human-readable reports for developers.

class DependencyGraphDiffer {
    private fun calculateDependencyChangesInternal(
        graph1: DependencyGraphJson,
        graph2: DependencyGraphJson
    ): DependencyChanges {
        val dependenciesById1 = graph1.dependenciesById // defined below
        val dependenciesById2 = graph2.dependenciesById // defined below

        val addedKeys = dependenciesById2.keys - dependenciesById1.keys
        val removedKeys = dependenciesById1.keys - dependenciesById2.keys
        val sameKeys = dependenciesById1.keys.intersect(dependenciesById2.keys)

        val dependencyChanges = mutableListOf<DependencyChange>()
        dependencyChanges += addedKeys.sorted().map { key ->
            DependencyChange.Added(dependenciesById2.getValue(key).toDependency())
        }
        dependencyChanges += removedKeys.sorted().map { key ->
            DependencyChange.Removed(dependenciesById1.getValue(key).toDependency())
        }
        // Present in both graphs: report only those whose resolved version differs
        dependencyChanges += sameKeys.sorted().mapNotNull { key ->
            val dep1 = dependenciesById1.getValue(key)
            val dep2 = dependenciesById2.getValue(key)
            if (dep1.resolvedVersion != dep2.resolvedVersion) {
                DependencyChange.Changed(
                    dep2.toDependency(),
                    dep1.resolvedVersion,
                    getChangeType(dep1.resolvedVersion, dep2.resolvedVersion)
                )
            } else {
                null
            }
        }
        return DependencyChanges(mapOf(graph1.configurationName to dependencyChanges))
    }

    private fun getChangeType(version1: String?, version2: String?): DependencyChangeType {
        if (version1 == null || version2 == null) return DependencyChangeType.Other
        // VersionNumber is assumed to be org.gradle.util.VersionNumber (the import is not shown here)
        val comparisonResult = VersionNumber.parse(version1).compareTo(VersionNumber.parse(version2))
        return when {
            comparisonResult < 0 -> DependencyChangeType.Upgrade
            comparisonResult > 0 -> DependencyChangeType.Downgrade
            // Different strings that parse to the same version
            else -> DependencyChangeType.Other
        }
    }

    // Key is "<type>:<identifier>", so projects and modules never collide
    private val DependencyGraphJson.dependenciesById: Map<String, DependencyGraphNodeJson>
        get() = nodes.associateBy { "${it.type}:${it.identifier}" }

    private fun DependencyGraphNodeJson.toDependency(): Dependency =
        when (type) {
            "project" -> Dependency.ProjectDependency(identifier)
            "module" -> Dependency.MavenDependency(MavenCoordinates(identifier, resolvedVersion))
            else -> error("Unknown type of dependency: $type")
        }

    // Factory overload: splits "groupId:artifactId" into its two parts
    private fun MavenCoordinates(groupAndArtifactId: String, version: String?): MavenCoordinates {
        val parts = groupAndArtifactId.split(":")
        return MavenCoordinates(parts[0], parts[1], version)
    }
}
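
VersionNumber in the snippet above is presumably Gradle’s own version parser, which is only available where the Gradle API is on the classpath (an assumption, since the import isn’t shown). If the differ runs outside Gradle, a simplified comparison could be sketched as below. numericParts and changeType are hypothetical helpers; they handle only dot-separated numeric versions and ignore any qualifier after a dash.

```kotlin
// Mirrors the enum defined earlier so that the sketch is self-contained.
enum class DependencyChangeType { Upgrade, Downgrade, Other }

// Splits "2.9.0-beta1" into [2, 9, 0]; the qualifier after '-' is ignored.
// Returns null when any part is not a number, so the caller can fall back to Other.
fun numericParts(version: String): List<Int>? {
    val parts = version.substringBefore('-').split('.').map { it.toIntOrNull() }
    return if (parts.any { it == null }) null else parts.filterNotNull()
}

fun changeType(oldVersion: String?, newVersion: String?): DependencyChangeType {
    val oldParts = oldVersion?.let(::numericParts) ?: return DependencyChangeType.Other
    val newParts = newVersion?.let(::numericParts) ?: return DependencyChangeType.Other
    // Compare part by part, treating missing trailing parts as 0 ("1.0" equals "1.0.0").
    for (i in 0 until maxOf(oldParts.size, newParts.size)) {
        val cmp = oldParts.getOrElse(i) { 0 }.compareTo(newParts.getOrElse(i) { 0 })
        if (cmp < 0) return DependencyChangeType.Upgrade
        if (cmp > 0) return DependencyChangeType.Downgrade
    }
    // Numerically equal, e.g. only the qualifier differs.
    return DependencyChangeType.Other
}

fun main() {
    println(changeType("2.6.0", "2.9.0")) // Upgrade
    println(changeType("2.9.0", "2.6.0")) // Downgrade
    println(changeType(null, "2.9.0"))    // Other
}
```

Comparing numerically rather than as strings matters for versions such as 1.10.0 and 1.9.0, where a plain string comparison would report a downgrade.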

After comparing the differences between two models, we have full information about the project dependency changes and we can report the changes to the developer for each configuration that was collected.

Below, you’ll see an example of upgrading retrofit libraries from version 2.6.0 to 2.9.0 and how they’re reported to the developer.
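
As a text-only sketch of what such a report could look like, the snippet below renders changes as plain lines. Change, render and the output format are hypothetical stand-ins chosen for illustration, not the actual report described in this article; only the Retrofit versions come from the example above.

```kotlin
// Simplified stand-ins for the DependencyChange models; names are illustrative.
sealed class Change {
    abstract val id: String
    data class Added(override val id: String, val version: String?) : Change()
    data class Removed(override val id: String, val version: String?) : Change()
    data class Changed(
        override val id: String,
        val oldVersion: String?,
        val newVersion: String?,
        val type: String // "Upgrade", "Downgrade" or "Other"
    ) : Change()
}

// Renders one line per change, prefixed with +, - or ~.
fun render(configurationName: String, changes: List<Change>): String = buildString {
    appendLine("Dependency changes for $configurationName:")
    changes.forEach { change ->
        val line = when (change) {
            is Change.Added -> "+ ${change.id}:${change.version ?: "?"}"
            is Change.Removed -> "- ${change.id}:${change.version ?: "?"}"
            is Change.Changed ->
                "~ ${change.id}: ${change.oldVersion ?: "?"} -> ${change.newVersion ?: "?"} (${change.type})"
        }
        appendLine(line)
    }
}

fun main() {
    val changes = listOf(
        Change.Changed("com.squareup.retrofit2:retrofit", "2.6.0", "2.9.0", "Upgrade")
    )
    print(render("debugCompileClasspath", changes))
    // Dependency changes for debugCompileClasspath:
    // ~ com.squareup.retrofit2:retrofit: 2.6.0 -> 2.9.0 (Upgrade)
}
```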

License Analyser

A library’s licence might stop you from adding it to the project. Most libraries in the Android community are open source and free to use, but for others you might be obliged to disclose your project’s source code or pay to use them. You definitely don’t want to add such libraries to the codebase before understanding the consequences. It would be great to receive reports that draw developers’ attention to newly added libraries with a suspicious licence.

However, this is a complex task to implement. You’d need to manually review the licence of each and every library that you’re going to add. And you’d need to be careful when a library is updated, as its licence might also change. You might be interested in Black Duck Software Composition Analysis, which is designed to solve that problem (note that it’s not free and that the analysis takes time). You can also take a look at Licensee, a free alternative from Jake Wharton. But let’s see what we can do by ourselves.

To get information about the licence, we can inspect the pom.xml file. Here is the pom.xml file of the com.squareup.retrofit2:retrofit:2.9.0 dependency. As you’ll notice, it has the <licenses> block we’re looking for.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.squareup.retrofit2</groupId>
  <artifactId>retrofit</artifactId>
  <version>2.9.0</version>
  <name>Retrofit</name>
  <description>A type-safe HTTP client for Android and Java.</description>
  <url>https://github.com/square/retrofit</url>
  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  <!-- other content -->
</project>

Now we can propose the following solution:

  • Define a whitelist of licence URLs that we have examined and approved for use.
  • For each dependency newly added to (or changed in) the project, look up its licence URL.
  • If the URL is missing or unknown (not whitelisted), report it to the developers.
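
The three steps above can be sketched in a few lines. The LicenseCheck type, the checkLicenses function and the URL normalisation are illustrative assumptions; a real whitelist would hold whatever URLs your team has reviewed.

```kotlin
// Hypothetical result of checking one dependency's licence URLs.
sealed class LicenseCheck {
    object Approved : LicenseCheck()
    object Missing : LicenseCheck() // pom.xml declares no licence URL
    data class Unknown(val urls: List<String>) : LicenseCheck() // URLs not on the whitelist
}

// Normalise so that http/https and trailing slashes don't cause false alarms.
private fun normalise(url: String): String =
    url.trim().lowercase().removePrefix("https://").removePrefix("http://").removeSuffix("/")

fun checkLicenses(licenseUrls: List<String?>, whitelist: Set<String>): LicenseCheck {
    val approved = whitelist.map(::normalise).toSet()
    val declared = licenseUrls.filterNotNull().filter { it.isNotBlank() }
    if (declared.isEmpty()) return LicenseCheck.Missing
    val unknown = declared.filter { normalise(it) !in approved }
    return if (unknown.isEmpty()) LicenseCheck.Approved else LicenseCheck.Unknown(unknown)
}

fun main() {
    val whitelist = setOf("https://www.apache.org/licenses/LICENSE-2.0.txt")
    // URL as declared in the Retrofit pom.xml above (http rather than https).
    val retrofit = checkLicenses(listOf("http://www.apache.org/licenses/LICENSE-2.0.txt"), whitelist)
    println(retrofit is LicenseCheck.Approved) // true
    println(checkLicenses(emptyList(), whitelist) is LicenseCheck.Missing) // true
}
```

Returning a small result type rather than a boolean keeps the "missing" and "unknown" cases apart, which makes the report for developers more specific.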

Developers can later manually review library licences and decide whether they can be used (and update the whitelist as well). The only remaining question is: how can we get the pom.xml file? The answer may differ from company to company. If you use a Nexus repository that is configured as a proxy cache for external dependencies, you can request and parse pom.xml files from it. Another option is to mimic Gradle’s dependency resolution behaviour and obtain the artefact metadata.

In the next code example we assume that we can get a proper URL to pom.xml from the dependency. Requests and parsing use the ktor and fasterxml (Jackson) libraries.

internal class MavenApiImpl(
    engine: HttpClientEngine = CIO.create()
) : MavenApi {

    private val xmlMapper = XmlMapper.Builder(XmlMapper())
        .defaultUseWrapper(false)
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
        .build()
        .registerKotlinModule()

    private val httpClient: HttpClient = HttpClient(engine) {
        install(JsonPlugin) {
            serializer = JacksonSerializer(xmlMapper)
            accept(ContentType.Application.Xml)
        }
    }

    // Then we can traverse project, extract url from licences and compare with our whitelist
    override suspend fun getLicenses(dependency: MavenCoordinates): Project =
        httpClient.get(dependency.toPomXmlUrl()).body()

}

@JacksonXmlRootElement(localName = "license")
internal data class License(
    @field:JacksonXmlProperty(localName = "url")
    val licenseUrl: String?,

    @field:JacksonXmlProperty(localName = "name")
    val name: String?,
)

@JacksonXmlRootElement(localName = "project")
internal class Project {

    @field:JacksonXmlElementWrapper(localName = "licenses") // wrapper doesn't work with constructor parameters
    @field:JacksonXmlProperty(localName = "license")
    val licenses: MutableList<License> = mutableListOf()

}
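
The toPomXmlUrl() helper isn’t shown here. As a sketch, assuming the repository follows the standard Maven layout (as Maven Central does), it could be built as below. The repositoryUrl parameter and its default value are assumptions; for a Nexus proxy you’d substitute your own base URL.

```kotlin
// Same shape as the MavenCoordinates model defined earlier, repeated to keep the sketch self-contained.
data class MavenCoordinates(val groupId: String, val artifactId: String, val version: String?)

// Standard Maven repository layout:
// <repo>/<groupId with dots replaced by slashes>/<artifactId>/<version>/<artifactId>-<version>.pom
fun MavenCoordinates.toPomXmlUrl(
    repositoryUrl: String = "https://repo1.maven.org/maven2" // assumption: Maven Central
): String {
    val resolved = requireNotNull(version) {
        "Cannot build a pom.xml URL without a version: $groupId:$artifactId"
    }
    val groupPath = groupId.replace('.', '/')
    return "${repositoryUrl.trimEnd('/')}/$groupPath/$artifactId/$resolved/$artifactId-$resolved.pom"
}

fun main() {
    println(MavenCoordinates("com.squareup.retrofit2", "retrofit", "2.9.0").toPomXmlUrl())
    // https://repo1.maven.org/maven2/com/squareup/retrofit2/retrofit/2.9.0/retrofit-2.9.0.pom
}
```

Note that many Android libraries (AndroidX, for example) are published to Google’s Maven repository rather than Maven Central, so a real implementation would likely need to try more than one base URL.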

Common Vulnerabilities and Exposures (CVE) Analyser

Let’s now think about vulnerabilities in external libraries. You can find many tools on the market that detect them and can analyse your whole project. As with the previous analyser, you may want a lightweight solution focused solely on newly added or updated libraries, to prevent vulnerable ones from appearing in new code. DependencyCheck can be used for that. It uses the National Vulnerability Database (NVD) to get information about known vulnerabilities.

It can be used in several ways (command-line interface, plugins, self-hosted solution, and more). We can use the core engine artefact directly and get the results from it. It might be a challenge to configure it in a way that completely avoids false positives, so we can define a configuration file listing the false-positive vulnerabilities to ignore. Let’s discuss what needs to be done to use the tool.

First, create a dependency (in DependencyCheck terms) from a Maven library. It is later passed to the engine’s addDependency method:

// Dependency here is DependencyCheck's own class, not the sealed class defined earlier
fun createDependency(file: File, coordinates: MavenCoordinates): Dependency {
    val depsPath = coordinates.toString()
    // The API forces us to specify a file. It is not actually used, but it will later appear in the .json report
    val dependencyToAdd = Dependency(file, true)
    // Checksums are computed from the coordinates string, not from file contents
    dependencyToAdd.sha1sum = Checksum.getSHA1Checksum(depsPath)
    dependencyToAdd.sha256sum = Checksum.getSHA256Checksum(depsPath)
    dependencyToAdd.md5sum = Checksum.getMD5Checksum(depsPath)
    dependencyToAdd.displayFileName = depsPath
    dependencyToAdd.name = coordinates.artifactId
    dependencyToAdd.version = coordinates.version
    dependencyToAdd.packagePath = coordinates.toString()

    dependencyToAdd.addAsEvidence(
        "gradle",
        MavenArtifact(coordinates.groupId, coordinates.artifactId, coordinates.version),
        Confidence.HIGHEST
    )
    return dependencyToAdd
}

Set up the DependencyCheck engine and run the analysis.

DependencyCheck will process the results and save them to a file in one of the supported formats. Let’s save the report as a JSON file and parse it later. After parsing, we need to filter out the vulnerabilities that have been identified as false positives and stored in our config.
Here’s how to do that:

@OptIn(ExperimentalSerializationApi::class) // required by decodeFromStream
internal fun analyseDependencies(
    jsonOutputFile: File,
    dependencies: Set<Dependency.MavenDependency>,
    whitelistedConfig: List<String>
) {
    Engine(Engine.Mode.STANDALONE, Settings()).use { engine ->
        dependencies.forEach {
            engine.addDependency(DependencyCreator.createDependency(jsonOutputFile, it.coordinates))
        }
        try {
            engine.analyzeDependencies()
        } catch (ex: ExceptionCollection) {
            // add logs for developers
        }
        engine.writeReports("", jsonOutputFile, "JSON", null)
    }
    val json: Json = Json { ignoreUnknownKeys = true }
    val reportJson: ReportJson = jsonOutputFile.inputStream().use { json.decodeFromStream(it) }

    // Report every vulnerability that is not a known false positive
    reportJson.dependencies.forEach { jsonDependency ->
        val filteredVulnerabilities = jsonDependency.vulnerabilities.filter { it.name !in whitelistedConfig }
        filteredVulnerabilities.forEach {
            // Vulnerability found -> report it to developers
        }
    }
}
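
ReportJson isn’t shown here. Based on the fields used above (dependencies, vulnerabilities, name), the filtering step can be sketched with plain data classes. The class names, the fileName and severity fields, and the sample CVE ids are hypothetical placeholders; a real model would also carry @Serializable annotations.

```kotlin
// Hypothetical stand-ins for the parsed report; only fields used by the filter are modelled.
data class VulnerabilityJson(val name: String, val severity: String? = null)

data class ReportDependencyJson(
    val fileName: String,
    // Defaults to empty, since the list may be absent when nothing was found.
    val vulnerabilities: List<VulnerabilityJson> = emptyList()
)

// Returns, per dependency, the vulnerabilities that are not listed as known false positives.
// Dependencies with nothing left to report are dropped from the result.
fun findReportable(
    dependencies: List<ReportDependencyJson>,
    ignoredVulnerabilities: Set<String>
): Map<String, List<VulnerabilityJson>> =
    dependencies
        .associate { dep -> dep.fileName to dep.vulnerabilities.filter { it.name !in ignoredVulnerabilities } }
        .filterValues { it.isNotEmpty() }

fun main() {
    val report = listOf(
        ReportDependencyJson(
            "com.example:lib-a:1.0.0",
            listOf(VulnerabilityJson("CVE-0000-0001"), VulnerabilityJson("CVE-0000-0002"))
        ),
        ReportDependencyJson("com.example:lib-b:2.0.0")
    )
    val reportable = findReportable(report, ignoredVulnerabilities = setOf("CVE-0000-0001"))
    reportable.forEach { (dependency, vulnerabilities) ->
        println("$dependency: ${vulnerabilities.joinToString { it.name }}")
    }
    // prints: com.example:lib-a:1.0.0: CVE-0000-0002
}
```

Using a Set for the ignore list keeps the lookup cheap when the list of known false positives grows.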

Closing note

It’s worth noting that this isn’t an exhaustive list of analysers, and there are many approaches you can take to develop them. Try to focus on what’s most beneficial for your specific scenario. You should also think about integration into your CI, so that the analysis runs automatically on feature branches and the results are reported in a readable way. Once all of this is in place, developers will be able to check the results before merging a feature branch into master and fully understand how their changes affect the final .apk. This will allow your team to catch unexpected changes before they reach real users’ devices.

Thanks for reading and good luck with reporting changes! Let me know your thoughts or questions in the comments.
