How the right equals() method can improve the performance of your application
When we talk about performance improvements, we usually think about algorithms and their complexity, but some less obvious details are just as crucial.
Let’s pretend that we have a database with thousands of student records. Each student has the following fields:
public class Student {
    private String city;
    private String university;
    private String fullName;
    private int classNumber;
}
We will generate students from the following values:
private static final String[] cities = {"New York", "Los Angeles", "Chicago", "Houston", "Phoenix"};
private static final String[] names = {"James", "Mary", "John", "Patricia", "Robert"};
private static final String[] lastName = {"Smith", "Johnson", "Williams", "Brown", "Jones"};
private static final String[] university = {"Liberty University", "California State University, Fullerton","Texas A&M University — College Station", "University of Central Florida","The Ohio State University — Columbus"};
All student fields are set randomly from the corresponding array: the class number is a random value from 1 to 5, and the full name is built from a random first name and a random last name.
Now we have to write an equals() method for the Student class. Usually we don’t care much about what our equals() method looks like and simply generate it with the IDE (IntelliJ IDEA in my case), but is this solution the best from a performance perspective?
Here are three equals() methods. Which one do you think is the fastest?
Generated by IntelliJ IDEA
public boolean ideaEquals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Student student = (Student) o;
    if (classNumber != student.classNumber) return false;
    if (!Objects.equals(city, student.city)) return false;
    if (!Objects.equals(university, student.university)) return false;
    return Objects.equals(fullName, student.fullName);
}
Custom #1
public boolean custom1equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Student student = (Student) o;
    if (!Objects.equals(fullName, student.fullName)) return false;
    if (classNumber != student.classNumber) return false;
    if (!Objects.equals(city, student.city)) return false;
    return Objects.equals(university, student.university);
}
Custom #2
public boolean custom2Equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Student student = (Student) o;
    if (!Objects.equals(city, student.city)) return false;
    if (!Objects.equals(fullName, student.fullName)) return false;
    if (!Objects.equals(university, student.university)) return false;
    return (classNumber == student.classNumber);
}
If you chose the IntelliJ-generated equals(), you are thinking in the right direction, but…
Let’s create 1000 student entities:
public static Student generateStudent() {
    final Student student = new Student();
    student.setCity(new String(cities[generateInt(0, 4)]));
    student.setClassNumber(generateInt(1, 5));
    student.setFullName(generateFullName());
    student.setUniversity(new String(university[generateInt(0, 4)]));
    return student;
}
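The generateInt() and generateFullName() helpers used above are not shown in the article. A minimal sketch of what they might look like, assuming both bounds of generateInt() are inclusive (as the calls generateInt(0, 4) and generateInt(1, 5) suggest) and that the full name is a first name and a last name joined by a single space. The class name StudentGeneratorHelpers is made up for this example.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper class; only the two method names come from the article.
class StudentGeneratorHelpers {
    private static final String[] names = {"James", "Mary", "John", "Patricia", "Robert"};
    private static final String[] lastName = {"Smith", "Johnson", "Williams", "Brown", "Jones"};

    // Assumption: min and max are both inclusive.
    static int generateInt(int min, int max) {
        // nextInt(origin, bound) excludes the upper bound, hence max + 1.
        return ThreadLocalRandom.current().nextInt(min, max + 1);
    }

    // Assumption: "first last" separated by one space.
    static String generateFullName() {
        return names[generateInt(0, 4)] + " " + lastName[generateInt(0, 4)];
    }

    public static void main(String[] args) {
        System.out.println(generateInt(1, 5));
        System.out.println(generateFullName());
    }
}
```

Because the full name is concatenated at run time, every call returns a new String object, so two students with the same name still hold different references, just like the values wrapped in new String().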
Then we count how many entities are equal using this rather heavy method:
private static int equalStudentsCount(List<Student> students) {
    int similar = 0;
    final int size = students.size();
    // Compare every pair of students exactly once.
    for (int i = 0; i < size; i++) {
        for (int j = i + 1; j < size; j++) {
            if (students.get(i).equals(students.get(j))) {
                similar++;
            }
        }
    }
    return similar;
}
To get comparable results, we test all the methods on the same list of values. Each method is executed 100 times, and we calculate the average time:
public static void countAverage() {
    int size = 1500;
    int loops = 100;
    final List<Student> students = generateStudentList(size);
    final long start = System.currentTimeMillis();
    for (int i = 0; i < loops; i++) {
        equalStudentsCountIDEA(students);
    }
    // Average time of one full pass over the list, in milliseconds.
    final long result = (System.currentTimeMillis() - start) / loops;
    System.out.println("Average time, ms: " + result);
}
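The countAverage() method above calls equalStudentsCountIDEA(), which the article does not show. One way to avoid a separate counting method per equals() variant is to pass the comparison in as a BiPredicate. This is only a sketch under my own assumptions: the names PairBenchmark, countEqualPairs and averageMillis are hypothetical, and the timing is a rough wall-clock measurement without JIT warm-up, not a rigorous benchmark.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical harness; not the author's original benchmark code.
class PairBenchmark {

    // Counts pairs (i, j), i < j, that the given equality function treats as equal.
    static <T> int countEqualPairs(List<T> items, BiPredicate<T, T> equality) {
        int equalPairs = 0;
        final int size = items.size();
        for (int i = 0; i < size; i++) {
            for (int j = i + 1; j < size; j++) {
                if (equality.test(items.get(i), items.get(j))) {
                    equalPairs++;
                }
            }
        }
        return equalPairs;
    }

    // Average wall-clock time of one full pass, in milliseconds.
    static <T> double averageMillis(List<T> items, BiPredicate<T, T> equality, int loops) {
        long sink = 0; // accumulate the counts so the work is actually used
        final long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            sink += countEqualPairs(items, equality);
        }
        final long elapsedNanos = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("pair count overflowed");
        }
        return elapsedNanos / 1_000_000.0 / loops;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "a", "a");
        System.out.println(countEqualPairs(sample, String::equals)); // prints 3
        System.out.println(averageMillis(sample, String::equals, 100));
    }
}
```

With the Student class from this article, the three variants could then be timed on the same list with calls such as averageMillis(students, Student::ideaEquals, 100). For numbers you intend to rely on, a dedicated harness such as JMH is usually a safer choice than hand-written timing loops.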
Here are the test results:
As you can see, the IntelliJ IDEA generated equals() method is pretty fast, but not in all cases. Why? This method compares the integer value first and then all the strings. The main reason for the differences is the uniqueness of the field values.
Comparing primitive values before strings is good for performance, but if those values are common across entities, it adds unnecessary checks to the method. For example, if we have a thousand entities and all of them have the same boolean value, there is no reason to put the boolean check above the checks of other fields, even though a boolean comparison is much cheaper than a string comparison.
In our case the integer field classNumber is between 1 and 5, so it has low uniqueness, and there is no reason to put it at the top of the checks. The city and university fields have the same uniqueness, with five possible values each. The most unique field in the Student class is fullName: there are 25 possible combinations of first name and last name. But why, then, is Custom #1 not always faster than the IDE-generated method?
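To put rough numbers on the uniqueness argument, the sketch below estimates how many field checks each ordering runs on average for two random students. It assumes that every field is drawn uniformly and independently (which matches the generator described in this article) and it ignores the reference and class checks at the top of each method. The class and method names are made up for the illustration.

```java
// Hypothetical illustration; not part of the original benchmark.
class ExpectedFieldChecks {

    // distinctValues[i] is the number of equally likely values of the i-th compared field.
    static double expectedChecks(int... distinctValues) {
        double expected = 0.0;
        double reachProbability = 1.0; // probability that all earlier fields matched
        for (int distinct : distinctValues) {
            expected += reachProbability;   // this check runs only if we got this far
            reachProbability /= distinct;   // probability that this field matches as well
        }
        return expected;
    }

    public static void main(String[] args) {
        // IDEA order: classNumber, city, university, fullName
        System.out.println(expectedChecks(5, 5, 5, 25));
        // Custom #1 order: fullName, classNumber, city, university
        System.out.println(expectedChecks(25, 5, 5, 5));
        // Custom #2 order: city, fullName, university, classNumber
        System.out.println(expectedChecks(5, 25, 5, 5));
    }
}
```

Under these assumptions the IDEA order runs 1 + 1/5 + 1/25 + 1/125 = 1.248 checks per comparison on average, Custom #1 runs 1 + 1/25 + 1/125 + 1/625 = 1.0496, and Custom #2 runs 1 + 1/5 + 1/125 + 1/625 = 1.2096. This only counts how many checks run, not how expensive each check is, and that cost is where the String comparison comes into play.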
Let’s take a look at the equals() method of the String class:
public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
The first “if” checks whether both references point to the same object, which is the case for equal literals taken from the string pool (this is why I set the student city and university using new String() during generation). After that, the method checks whether the strings have the same length, which sometimes makes String equals() really quick. But if the strings have the same length, the bad part comes: the method compares the strings char by char. This is the most expensive part of String equals() and the reason why people usually put string comparisons at the bottom of the equals() method.
This is also the reason why the Custom #1 equals() method is not always faster than the IntelliJ IDEA one. Some combinations have the same length but different values (for example “Mary Brown” and “Mary Jones”, or “Patricia Smith” and “Patricia Jones”), and before String equals() figures out that these strings are different, it has already compared a number of chars.
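To make that cost visible, here is a small sketch that counts how many char comparisons the loop shown above would perform for a pair of strings. The helper charComparisons() is hypothetical and only mirrors the logic of the listing in this article. Newer JDK versions implement String equals() differently internally, so treat the counts as an illustration of the idea, not as a measurement.

```java
// Hypothetical helper that mirrors the char-by-char loop from the listing above.
class StringCompareCost {

    static int charComparisons(String a, String b) {
        if (a == b) {
            return 0; // same reference: answered without touching any chars
        }
        if (a.length() != b.length()) {
            return 0; // different lengths: rejected without touching any chars
        }
        int compared = 0;
        for (int i = 0; i < a.length(); i++) {
            compared++;
            if (a.charAt(i) != b.charAt(i)) {
                break; // first mismatch ends the loop
            }
        }
        return compared;
    }

    public static void main(String[] args) {
        System.out.println(charComparisons("Mary Brown", "Mary Jones"));         // 6
        System.out.println(charComparisons("Patricia Smith", "Patricia Jones")); // 10
        System.out.println(charComparisons("James Smith", "Mary Smith"));        // 0
        System.out.println(charComparisons("very-very-long-string-1",
                                           "very-very-long-string-2"));          // 23
    }
}
```

Names that share a first name and have last names of equal length are exactly the pairs that pass the length check and then walk through the whole common prefix before the first mismatch.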
Now let’s test these methods with just unique first names, without last names:
private static final String[] uniqueNames = {"James", "Mary", "John", "Patricia", "Robert", "William", "David",
"Richard", "Thomas", "Charles", "Daniel", "Matthew", "Anthony", "Donald", "Mark", "Paul", "Steven", "Andrew",
"Kenneth", "Joshua", "George", "Kevin", "Brian", "Edward", "Ronald"};
As we can see, the results are quite impressive. Let’s compare the Custom #1 and IDEA-generated equals() methods in more detail.
In some cases the more unique values make the Custom #1 equals() method faster. The gain over the IntelliJ-generated method is not significant, but overall this change increased the speed of all the equals() methods, by as much as 600 milliseconds or more.
Of course, this speed increase is not huge, and it’s not worth writing a custom equals() method everywhere. I also prefer to use Lombok wherever possible, which helps to keep the code cleaner. But this is just a simple example; in real projects we have entities with many more fields and more complex hierarchies, and a wrong equals() method can significantly reduce the speed of the application.
The IntelliJ equals() generator makes a smart choice: it puts primitive values like boolean, int, short, etc. first. But this doesn’t work in all cases.
So here are a few rules that can speed up code that relies on equals():
- Put the checks of the most unique fields first, even if the field is a String.
- Pay attention to the length of the strings and to where they differ before putting a String field at the top of the equals() checks. If every string is unique but very long and differs only at the end (for example very-very-long-string-1 and very-very-long-string-2), putting this comparison at the top of the equals() checks is a bad idea.
- If you are not sure about the uniqueness of each field in the database, put booleans, enums, and integers first.
- Don’t forget the null check before comparing the fields of entities.
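The rules above can be sketched as a single equals() method. The example assumes the data distribution from this article, where fullName is the most selective field and the names are short. The class name OrderedStudent is made up, and the field order would need to be re-evaluated for real data.

```java
import java.util.Objects;

// Hypothetical variant of the Student class, ordered by assumed field uniqueness.
class OrderedStudent {
    private final String city;
    private final String university;
    private final String fullName;
    private final int classNumber;

    OrderedStudent(String city, String university, String fullName, int classNumber) {
        this.city = city;
        this.university = university;
        this.fullName = fullName;
        this.classNumber = classNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Null and type check before touching any field.
        if (o == null || getClass() != o.getClass()) return false;
        OrderedStudent other = (OrderedStudent) o;
        // Assumed most selective field first; Objects.equals() is null-safe.
        return Objects.equals(fullName, other.fullName)
                && classNumber == other.classNumber
                && Objects.equals(city, other.city)
                && Objects.equals(university, other.university);
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals(): same fields, order does not matter here.
        return Objects.hash(city, university, fullName, classNumber);
    }

    public static void main(String[] args) {
        OrderedStudent a = new OrderedStudent("Chicago", "Liberty University", "Mary Brown", 3);
        OrderedStudent b = new OrderedStudent("Chicago", "Liberty University", "Mary Jones", 3);
        System.out.println(a.equals(b)); // false, decided by the first field check
    }
}
```

The && chain stops at the first field that differs, so the order of the operands is the order of the checks.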
I hope this article helps you to improve the speed of your application. We usually prefer less code over application performance, and for some people these test results might not look significant. But nowadays many applications keep millions of entries in the database, with hundreds of fields, and run heavy algorithms of high complexity that may call equals(), so these milliseconds can easily turn into seconds or even minutes.