Data classes in Kotlin: how do they impact application size
Kotlin has numerous excellent features: null safety, smart casts, string interpolation and more. However, the feature I have observed developers love most is data classes. They are so well-loved that they are often used where no data class functionality is required.
In this article, I will use an experiment to find out the real cost of having a large number of data classes in an application. I am going to strip the data class functionality from all data classes without breaking compilation, and then share the experiment’s results and conclusions. The application will be broken during the experiment, but that is not a problem, because we only want to measure the impact.
Data classes and their functionality
During development, we often create classes whose main purpose is to store data. In Kotlin, such classes can be declared as data classes to obtain additional functionality:
- componentX() for destructuring assignment (val (name, age) = person);
- copy() for creating a copy of the object, with or without changes;
- toString() with the name of the class and the values of all its fields;
- equals() and hashCode().
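To make the list concrete, here is a minimal sketch using a hypothetical Person class. Every call below relies only on methods the compiler generates for a data class.

```kotlin
// Hypothetical class, used only to illustrate the generated methods
data class Person(val name: String, val age: Int)

fun main() {
    val person = Person("Alice", 30)

    // component1() and component2() make destructuring possible
    val (name, age) = person
    println("$name is $age")                  // Alice is 30

    // copy() creates a new instance, optionally changing some fields
    val older = person.copy(age = 31)
    println(older)                            // Person(name=Alice, age=31)

    // equals() and hashCode() compare by property values, not by identity
    println(person == Person("Alice", 30))    // true
    println(person.hashCode() == Person("Alice", 30).hashCode()) // true
}
```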
However, we don’t pay for all of this functionality, not by a long shot. Release builds are processed by optimisers such as R8, ProGuard and DexGuard, which can delete unused methods and can therefore optimise data classes.
This is what will be deleted:
- componentX(), provided that destructuring assignment is not used (even if it is, more aggressive optimisation settings can replace these methods with direct access to the class fields);
- copy(), if it is not used.
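Destructuring is syntax sugar over componentX() calls, which is why an optimiser can replace those calls with direct field access and then drop the methods. A sketch with a hypothetical Person class, showing the two equivalent hand-written forms:

```kotlin
data class Person(val name: String, val age: Int)

// `val (name, age) = person` is compiled into component1()/component2() calls
fun describe(person: Person): String {
    val name = person.component1()
    val age = person.component2()
    return "$name ($age)"
}

// Reading the properties directly has the same effect; once every call site
// looks like this, component1() and component2() are unused and can be removed
fun describeDirectly(person: Person): String = "${person.name} (${person.age})"

fun main() {
    val person = Person("Bob", 40)
    println(describe(person))          // Bob (40)
    println(describeDirectly(person))  // Bob (40)
}
```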
This is what will not be deleted:
- toString(), since the optimiser cannot know whether this method will be used somewhere (for example, for logging); the class and field names it contains are not obfuscated either;
- equals() and hashCode(), since deleting these functions could change the behaviour of the app.
In other words, toString(), equals() and hashCode() always remain in release builds.
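Why removing equals() and hashCode() is unsafe: collections rely on them. In this sketch with two hypothetical classes, the same values behave differently in a Set depending on whether the class is a data class.

```kotlin
data class PointData(val x: Int, val y: Int)   // value-based equals()/hashCode()
class PointPlain(val x: Int, val y: Int)       // identity-based methods inherited from Any

fun main() {
    // Equal values collapse into a single element...
    println(setOf(PointData(1, 2), PointData(1, 2)).size)   // 1
    // ...while two plain instances are always distinct
    println(setOf(PointPlain(1, 2), PointPlain(1, 2)).size) // 2
}
```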
Scale of the changes
To measure the impact that data classes have on application size, I put forward a hypothesis: not all the data classes in a project are necessary, and they can be replaced with ordinary ones. Since release builds use an optimiser that can delete the componentX() and copy() methods, turning data classes into ordinary ones comes down to getting rid of the generated toString(), equals() and hashCode().
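In code, the hypothesis amounts to the following change, sketched here for a hypothetical User class. With componentX() and copy() already removed by the optimiser, what remains of the data class is its value-based toString(), equals() and hashCode().

```kotlin
// Before: the compiler generates toString(), equals() and hashCode()
// (componentX() and copy() are generated too, but R8 can strip them if unused)
data class UserData(val id: Long, val name: String)

// After: an ordinary class that inherits the default implementations from Any
class User(val id: Long, val name: String)

fun main() {
    println(UserData(1, "Ann") == UserData(1, "Ann")) // true: compared by value
    println(User(1, "Ann") == User(1, "Ann"))         // false: compared by identity
}
```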
However, there is no switch that stops the compiler from generating these methods. The only way to remove them from the code is to redefine them in every data class in the project, so that they fall back to the default implementations inherited from Any.
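A sketch of such a redefinition for one hypothetical class. When a data class declares equals(), hashCode() or toString() explicitly, the compiler does not generate them, so delegating to super restores the default identity-based behaviour from Any.

```kotlin
data class Person(val name: String, val age: Int) {
    // Explicit declarations stop the compiler from generating these methods;
    // super refers to Any, so the behaviour matches an ordinary class
    override fun toString(): String = super.toString()
    override fun equals(other: Any?): Boolean = super.equals(other)
    override fun hashCode(): Int = super.hashCode()
}

fun main() {
    val a = Person("Alice", 30)
    println(a == Person("Alice", 30)) // false: equality is now by identity
    println(a)                        // default Any.toString(): class name, '@' and a hex hash code
}
```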
And this would have to be done manually, for 7,749 data classes in the project.
Using a monorepo for our apps makes the situation worse: I don’t know how many of these 7,749 classes I would need to change to measure the impact of data classes on just one app. So I have to change all of them!
Making this volume of changes manually is impossible, so this is where compiler plugins come in: they are wonderful, yet undocumented. We have already described our experience of creating a compiler plugin in the article “Fixing serialization of Kotlin objects once and for all”. There, however, we generated new methods, whereas here we need to delete them.
The Sekret plugin, freely available on GitHub, lets you hide annotated data class fields in toString(). This is what I used as the basis for my new plugin.
In terms of project structure, practically nothing has changed. This is what we will need:
- Gradle plugin for simple integration;
- Compiler plugin, to be connected via a Gradle plugin;
- A project with an example, on which we can run various tests.
The most important part of the Gradle plugin is the KotlinGradleSubplugin declaration. This subplugin is picked up via ServiceLoader. Using the base Gradle plugin, we can configure KotlinGradleSubplugin, which in turn configures the behaviour of the compiler plugin.
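Compiler plugin options reach the compiler as command-line arguments of the form -P plugin:<pluginId>:<option>=<value>, which the CommandLineProcessor then handles. The sketch below is a simplified model of that round trip in plain Kotlin, not the Kotlin Gradle plugin API; the plugin ID and the option name are hypothetical.

```kotlin
// Gradle side (simplified model): turn extension settings into compiler arguments
fun toCompilerArgs(pluginId: String, options: Map<String, String>): List<String> =
    options.flatMap { (key, value) -> listOf("-P", "plugin:$pluginId:$key=$value") }

// Compiler side (simplified model): recover the key/value pairs meant for our plugin
fun parseCompilerArgs(pluginId: String, args: List<String>): Map<String, String> {
    val prefix = "plugin:$pluginId:"
    return args
        .filter { it.startsWith(prefix) }
        .map { it.removePrefix(prefix).split("=", limit = 2) }
        .filter { it.size == 2 }
        .associate { (key, value) -> key to value }
}

fun main() {
    val pluginId = "com.example.no-data-methods" // hypothetical ID
    val args = toCompilerArgs(pluginId, mapOf("enabled" to "true"))
    println(args)                               // [-P, plugin:com.example.no-data-methods:enabled=true]
    println(parseCompilerArgs(pluginId, args))  // {enabled=true}
}
```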
A compiler plugin has two important components:
CommandLineProcessor. The former is responsible for integrating our logic into the compilation stages; the latter, for handling our plugin’s parameters. I won’t describe them in detail here, but you can view the implementation in the repository. I would just like to point out that, unlike the method described in another article, we will be registering
At this stage, we need to prevent the compiler from generating certain methods. For this, we use DelegatingClassBuilder. It delegates all calls to the original ClassBuilder while allowing us to redefine the behaviour of newMethod. If the compiler tries to create the toString(), equals() or hashCode() methods, we return an empty MethodVisitor. The compiler writes the code for these methods to it, but that code never gets into the class being created.
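The compiler-side types are not reproduced here: the real newMethod takes further parameters, such as the access flags and the method descriptor, and returns an ASM MethodVisitor. The self-contained sketch below models only the delegation idea with hypothetical, simplified interfaces (ClassSink, MethodSink): calls are forwarded to the original builder, except for the filtered method names, which get a sink that swallows their code. For brevity it filters by method name only.

```kotlin
// Hypothetical, simplified stand-ins for ASM's MethodVisitor and the compiler's ClassBuilder
interface MethodSink {
    fun instruction(text: String)
}

interface ClassSink {
    fun newMethod(name: String): MethodSink
}

// Stands in for the original ClassBuilder: records every method and its body
class RecordingClassSink : ClassSink {
    val methods = linkedMapOf<String, MutableList<String>>()

    override fun newMethod(name: String): MethodSink {
        val body = methods.getOrPut(name) { mutableListOf() }
        return object : MethodSink {
            override fun instruction(text: String) {
                body += text
            }
        }
    }
}

// Delegating builder: forwards every call except the filtered methods,
// which get a sink that silently discards the code written to it
class DroppingClassSink(
    private val delegate: ClassSink,
    private val dropped: Set<String> = setOf("toString", "equals", "hashCode"),
) : ClassSink {
    override fun newMethod(name: String): MethodSink =
        if (name in dropped) NoOpMethodSink else delegate.newMethod(name)

    private object NoOpMethodSink : MethodSink {
        override fun instruction(text: String) = Unit
    }
}

fun main() {
    val target = RecordingClassSink()
    val builder = DroppingClassSink(target)
    builder.newMethod("getName").instruction("return name")
    builder.newMethod("toString").instruction("build the string")
    builder.newMethod("hashCode").instruction("combine field hashes")
    println(target.methods.keys) // [getName]
}
```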
Thus, we intervened in the process of creating data classes and completely excluded the above-mentioned methods from them. You can make sure these methods are no longer there by using the code in the sample project. You can also check the JAR/DEX bytecode to make sure it doesn’t contain any of these methods.
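For a quick check without a decompiler, plain Java reflection lists the methods declared directly in a class. This sketch (hypothetical classes, not the sample project’s actual code) compares a data class with an ordinary one; with the plugin applied, toString, equals and hashCode would be missing from the first result.

```kotlin
data class WithData(val value: Int)
class Plain(val value: Int)

// Names of interest among the methods declared directly in the class
// (methods inherited from Any are not included in declaredMethods)
fun generatedMethods(type: Class<*>): Set<String> {
    val interesting = setOf("toString", "equals", "hashCode", "copy", "component1")
    return type.declaredMethods.map { it.name }.filter { it in interesting }.toSet()
}

fun main() {
    println(generatedMethods(WithData::class.java).sorted()) // [component1, copy, equals, hashCode, toString]
    println(generatedMethods(Plain::class.java))             // []
}
```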
The entire code is available in the repository, where you will also find an example of plugin integration.
For the purposes of comparison, we will use the Bumble and Badoo release builds. The results were obtained using the Diffuse tool, which outputs detailed information on the difference between two APK files: the sizes of the DEX files and resources, and the number of strings, methods and classes in the DEX file.
The number of data classes was determined heuristically by analysing the strings deleted from the DEX file: the toString() implementation of a data class always begins with the short name of the class, an opening parenthesis and the first field of the data class, and there is no such thing as a data class without fields.
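A sketch of this heuristic, assuming the deleted strings have already been extracted into a list. The regex, the function name and the sample strings are illustrative, not the exact tooling used for the measurements.

```kotlin
// A data class toString() starts with "ShortName(firstField=", so a deleted DEX string
// with that shape is counted as one removed data class (a heuristic: false positives are possible)
val dataClassPrefix = Regex("""^[A-Za-z_]\w*\([A-Za-z_]\w*=""")

fun countDataClasses(deletedStrings: List<String>): Int =
    deletedStrings.count { dataClassPrefix.containsMatchIn(it) }

fun main() {
    // Hypothetical sample of strings reported as removed from the DEX file
    val removed = listOf("Person(name=", ", age=", "User(id=", "Loading", "https://example.com")
    println(countDataClasses(removed)) // 2
}
```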
From the results, you can conclude that, on average, each data class accounts for 120 bytes compressed and 400 bytes uncompressed. At first sight this doesn’t seem like much, so I decided to check what it adds up to for the application as a whole. It turned out that all the data classes in the project account for 4% of the size of the DEX file.
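These averages allow a rough back-of-the-envelope estimate for another project. The data class count below is hypothetical and the per-class figures are simply the averages quoted above, so treat the result as an order of magnitude only.

```kotlin
// Rough estimate based on the averages above: ~120 B compressed, ~400 B uncompressed per data class
fun estimateDataClassCost(
    dataClassCount: Int,
    compressedBytesPerClass: Int = 120,
    uncompressedBytesPerClass: Int = 400,
): Pair<Long, Long> =
    dataClassCount.toLong() * compressedBytesPerClass to dataClassCount.toLong() * uncompressedBytesPerClass

fun main() {
    // Hypothetical app with 5,000 data classes
    val (compressed, uncompressed) = estimateDataClassCost(5_000)
    println("~$compressed bytes compressed, ~$uncompressed bytes uncompressed")
    // ~600000 bytes compressed, ~2000000 bytes uncompressed
}
```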
It is also worth clarifying that, because of our MVI architecture, we tend to use more data classes than applications built with other architectures, so the impact on your application may well be smaller.
Using data classes
By no means am I urging you to avoid data classes, but when deciding whether to use one, you need to take every aspect into consideration. Here are some questions worth asking before declaring a data class:
- Are the equals() and hashCode() implementations needed? If they are, it is better to use a data class, but remember that toString() is not obfuscated.
- Do we need to use a destructuring assignment? Using data classes for this reason alone is not the best solution.
- Is the toString() implementation needed? There is unlikely to be business logic that depends on toString(), so sometimes you can generate this method manually using the IDE.
- Do you need a simple DTO to send data to another layer, or to hold some configuration information? An ordinary class is suitable for these purposes if the previous points are not relevant.
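For the DTO case, an ordinary class is enough, and where a readable toString() is still useful (for logging, say), it can be written or IDE-generated by hand. A sketch with a hypothetical ProfileDto:

```kotlin
// Hypothetical DTO: no destructuring, copying or value equality needed
class ProfileDto(val userId: Long, val displayName: String) {
    // Hand-written toString(), kept only because it helps when logging
    override fun toString(): String = "ProfileDto(userId=$userId, displayName=$displayName)"
}

fun main() {
    val dto = ProfileDto(42, "Ann")
    println(dto) // ProfileDto(userId=42, displayName=Ann)
}
```

Keep in mind that, like the generated version, a hand-written toString() embeds the class and field names as string literals, which survive obfuscation.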
We cannot give up data classes in the project entirely, and the plugin described above breaks the application: methods were deleted only to assess the impact of a large number of data classes. In our case, this was 4% of the size of the app’s DEX file.
If you want to assess how much space data classes take up in your own application, you can do it yourself using my plugin. If you have carried out the same experiment, please feel free to share your feedback!