A Java Tour

Published in Analytics Vidhya, 10 min read, May 5, 2020

Let us make the most of the free time we have these days by revisiting the Java programming language!

Java is widely recognized as a powerful language that produces high-quality code… well, at least if you can handle the beast! Because it is also true that Java is a (quite) verbose language that does not try to hide anything from you: the lack of syntactic sugar may make you feel close to what a computer actually does, and this can be a source of frustration for beginners, or even for advanced scripters who usually work with JavaScript or Python. Even so, the Java community is well aware of this and makes great efforts to improve the language in a fully backward-compatible way.

Maybe you learned programming with Java, and maybe it was a complete disaster! Then, like many people, you discovered Python, TypeScript, Kotlin or whatever, and it felt like a breeze! All these years, you kept thinking that Java was an outdated language, slow and still in use only because of the existing code base. Well, let’s see what modern Java really looks like.

In this short talk, we are going to study a realistic use case. A customer has a data source (database, file, whatever) that we can query to obtain information of the form

people_name ; zip_code ; waste_weight_yesterday

As is often the case, we have no direct access to the data source. Rather, we need to query it through an external service:

public interface RemoteService {
   InputStream datasource() throws IOException;
}

The customer requires us to find, for each zip code (String), the top 3 people (String) who produced the highest amount of waste yesterday, in decigrams (int). The data source is expected to be free of corrupted data (that usually never happens, but let’s pretend!). The customer asks us to do it in Java, for integration into its own code base. Luckily, I’m just writing about this problem, so you won’t have to code anything, just copy/paste: lucky you.

The minimal requirement for a good architecture is at least a sketch of the interface that describes the result of our computation. Basically, our report will be some object we can manipulate as follows:

interface Report {
   enum TopPosition {
      TOP_1, TOP_2, TOP_3;
   }
   Optional<Entry> getTop(String zipCode, TopPosition position);
   List<String> getAllZipCodes();
}

record Entry(String name, String zipCode, int wasteAmount) {}

Note: We’ll discuss the record type later on. For now, let’s skip this!

Note: We should have designed the interface with a Set instead of a List, but it would not fit well with the paragraph below! So after that paragraph, make the mental exercise of changing List to Set yourself!

Before implementing the mechanisms any further, let’s warm up by querying the Report in a non-trivial way. The small exercise we address is to find all the top 1 entries. Let’s see different implementations. The bad Java first:

List<Entry> top1s = new ArrayList<>(); // array-like, with mut. size
List<String> codes = report.getAllZipCodes();
for (int i = 0; i < codes.size(); i++) {
   Optional<Entry> top1 = report.getTop(codes.get(i), TOP_1);
   if (top1.isPresent()) {
      top1s.add(top1.get());
   }
}

Needless to say, this implementation (which sadly still exists in the community) is a declaration of war on good sense. If your List is a LinkedList, the above iteration is likely to be of order O(n²), since each get(i) has to walk the list. In addition, the ArrayList implementation may not be the best choice for as many pushes as we do, since its internal array buffer has to be reallocated and copied every time it fills up. Unfortunately, students are often told to code like this.

Let’s see another implementation, in a more modern way:

Stack<Entry> top1s = new Stack<>(); // push is amortized O(1)
for (String zipCode : report.getAllZipCodes())
   report.getTop(zipCode, TOP_1).ifPresent(top1s::push);

Here above, we stress the fact that a method reference top1s::push may be thought of as a lambda expression, which in turn may be thought of as an anonymous interface instantiation. The interface here is inferred by Java, since ifPresent on Optional<Entry> simply requires a Consumer<Entry>, so

Consumer<Entry> c = top1s::push;

is equivalent to

Consumer<Entry> c = entry -> top1s.push(entry);

which is in turn equivalent to

Consumer<Entry> c = new Consumer<>() {
   @Override public void accept(Entry entry) {
      top1s.push(entry);
   }
};

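Lambda expressions and method references may be thought of as syntactic sugar for the latter. In fact, there are also compilation differences: an anonymous class is compiled to its own class file, whereas lambda expressions and method references are compiled to an invokedynamic call site and materialized at run time. In general, the method reference form is therefore preferred over the anonymous class one. For lambda expressions, what is produced really depends on the content of the lambda, for instance on whether it captures local variables.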
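To make the equivalence concrete, here is a minimal sketch that fills a list through the three forms and yields the same content each time. The class ConsumerForms and its helper feed are hypothetical names introduced only for this illustration, and a plain ArrayList stands in for the Stack of the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ConsumerForms {
    // Hypothetical helper: feeds two sample values to the given consumer.
    static void feed(Consumer<String> consumer) {
        consumer.accept("a");
        consumer.accept("b");
    }

    static List<String> viaMethodReference() {
        List<String> sink = new ArrayList<>();
        feed(sink::add);                    // method reference
        return sink;
    }

    static List<String> viaLambda() {
        List<String> sink = new ArrayList<>();
        feed(value -> sink.add(value));     // lambda expression
        return sink;
    }

    static List<String> viaAnonymousClass() {
        List<String> sink = new ArrayList<>();
        feed(new Consumer<String>() {       // anonymous class
            @Override public void accept(String value) {
                sink.add(value);
            }
        });
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(viaMethodReference());
        System.out.println(viaLambda());
        System.out.println(viaAnonymousClass());
    }
}
```

All three methods return a list containing "a" then "b"; only the way the Consumer is written differs.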

We can go further with the loop above by avoiding the boilerplate of initializing an empty list up front:

List<Entry> top1s = report.getAllZipCodes()
       .stream()
       .map(zipCode -> report.getTop(zipCode, TOP_1))
       .flatMap(Optional::stream)
       .collect(Collectors.toList());

It is interesting to note that, although written in a functional style, the latter is actually more verbose than the second, for-loop-based approach, without being much more expressive. We have also lost the benefit of the explicit choice of list implementation.

Let us now turn to the implementation of our report and the data processing itself. We are going to proceed step by step. First we will process the lines as quickly as possible to free the resource:

Report process(RemoteService service) throws IOException {
   Stack<String> rows = new Stack<>();
   try(InputStream inputStream = service.datasource();
       Scanner scanner = new Scanner(inputStream)
   ) {
      while(scanner.hasNextLine())
         rows.push(scanner.nextLine());
   }

Maybe you are not familiar with the try-with-resources syntax, although it has been a language feature since JDK7. The try-with-resources header accepts a list of variables of any AutoCloseable subtype, and at the end of the try block, or if any exception occurs, each of the provided AutoCloseable instances will be… automatically closed. This means that in Java, you should rarely need to call the close method yourself. This is similar to the with syntax of Python.
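As a small illustration of that behavior, the sketch below opens two resources and logs what happens. The Tracked class is a hypothetical resource invented for this example; it only records when it is opened and closed.

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    // Hypothetical resource that logs its own life cycle.
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
            log.add("open " + name);
        }

        @Override public void close() {
            log.add("close " + name);
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (Tracked first = new Tracked("first", log);
             Tracked second = new Tracked("second", log)) {
            log.add("body");
        } // no explicit close(): both resources are closed here
        return log;
    }

    public static void main(String[] args) {
        // prints [open first, open second, body, close second, close first]
        System.out.println(run());
    }
}
```

Note that the resources are closed in the reverse order of their declaration, which is why the Scanner of the article is closed before the InputStream it wraps.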

We are now going to process each line. The client’s needs require us to aggregate the String rows according to their zip code, and then find the top 3. Let us wrap each String in a more convenient class:

class Row {
   private final String source, zipCode;
   private final int wasteAmount;
   Row(String source) {
      this.source = source;
      // zip code: the text between the first and the last ';'
      zipCode = source.substring(
                        source.indexOf(";")+1,
                        source.lastIndexOf(";")
                    ).trim();
      // waste amount: the text after the last ';' (hence the +1)
      wasteAmount = Integer.parseInt(
                        source.substring(source.lastIndexOf(";")+1)
                              .trim()
                    );
   }
   String getZipCode() { return zipCode; }
   String getName() {
      return source.substring(0, source.indexOf(";")).trim();
   }
   int getWasteAmount() { return wasteAmount; }
}

As you can see, this class is constructed from a String but keeps a more complex internal state, as it pre-extracts the zip code and converts the amount of waste to an integer. It finally provides methods to access this information, and the name, in a more transparent way. Also note the encapsulation pattern, which fully protects the internal state of the instance. From the external point of view, only the getters exist: a client of this class should not know more.

Is such a wrapping of the String source efficient? As a rough estimate, the cost of a String is 64 bits for the reference to its internal char[] array + 16 bits x length for the characters in the array, so

String_weight ~ 64 + length * 16

In a common use case, the length of the name is of order 2⁴ and that of the zip code of order 2². The waste amount should be quite small in decimal representation, let’s say again 2² digits. Hence the source string has about 2⁵ characters, and the weight of the source field is about

source_weight ~ 64 + 2^5 * 16 ~ 2^9

The weight of our class is 64 bits per internal object reference (and we have 2 of them, the 2 String fields), 32 bits for the int primitive type, and an additional cost for the content of each string, so:

row_weight ~ 64 + (64 + 2^5 * 16) + 64 + (64 + 2^2 * 16) + 32 ~ 2^10
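The estimate can be reproduced with a few lines of arithmetic. This is only a sketch of the rough model used here (one 64-bit reference plus 16 bits per character, ignoring object headers and any JVM-specific layout); the class WeightEstimate and its method names are made up for the illustration.

```java
class WeightEstimate {
    // Rough model from the text: 64-bit reference + 16 bits per character.
    static int stringBits(int length) {
        return 64 + length * 16;
    }

    // The source line is assumed to hold about 2^5 = 32 characters.
    static int sourceBits() {
        return stringBits(32);
    }

    static int rowBits() {
        return 64 + stringBits(32)   // reference to source + its content
             + 64 + stringBits(4)    // reference to zipCode + its content
             + 32;                   // the int wasteAmount
    }

    public static void main(String[] args) {
        System.out.println(sourceBits());                    // prints 576
        System.out.println(rowBits());                       // prints 864
        System.out.println((double) rowBits() / sourceBits()); // prints 1.5
    }
}
```

So 576 bits is what was rounded to 2^9, and 864 bits is what was rounded to 2^10.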

The ratio is 1.5, which is not that big. Conversion from String to Row should therefore not be much of a concern, and it will greatly improve the algorithmic aspect. Conversion can be done very smoothly in Stream fashion:

rows.stream().map(Row::new)  // Stream<Row>

See how we called the constructor of Row via a method reference. We now have a stream of rows. There is no need to convert it back to a List: we can already aggregate using the collect facilities of Stream:

rows.stream().map(Row::new)
             .collect(Collectors.groupingBy(Row::getZipCode))

The result of grouping is a Map<String,List<Row>> that maps every zip code (String) to a list of corresponding Row . We are now going to iterate over each key of this map, and for each key, sort the list according to the waste amount and limit the result to 3 elements. Again, the Stream interface is rich enough to handle this computation:

rows.stream().map(Row::new)
             .collect(Collectors.groupingBy(Row::getZipCode))
             .entrySet().stream()
             // Stream<Map.Entry<String, List<Row>>>
             .collect(Collectors.toMap(
                 Map.Entry::getKey,
                 entry -> entry.getValue().stream() // Stream<Row>
                            .sorted(???)            // to do
                            .limit(3)               // Stream<Row>
                            .toArray(Row[]::new)    // Row[]
             ));

Here above, the sorted method on Stream requires a Comparator<Row>. The definition is pretty clear:

@FunctionalInterface public interface Comparator<T>

The @FunctionalInterface annotation means that a comparator has exactly one abstract method. It is an indication that one is invited to instantiate such an interface either with a lambda expression or via a method reference, rather than with a full class implementation.

In our case, it would not make much sense to implement the Comparable interface at the level of the Row, because rows are not all comparable from the business point of view: we are only asked to compare them zip code by zip code. (Of course it would make sense to compare them globally, but that is an extrapolation.) Digging a bit, we discover that there exists a facility in JDK8 that extracts an int from a Row and uses it as the comparison key. Since it sorts in ascending order and we want the highest amounts first, we reverse it:

.sorted(Comparator.comparingInt(Row::getWasteAmount).reversed())
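One detail deserves attention when combining sorted with limit: comparingInt orders from the smallest key to the largest, so a "top 3" needs the reversed comparator. The sketch below shows the difference on a few made-up words ordered by length; the class ComparatorOrderDemo is hypothetical and only serves this illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class ComparatorOrderDemo {
    static List<String> firstThree(List<String> words, Comparator<String> order) {
        return words.stream()
                    .sorted(order)
                    .limit(3)
                    .collect(Collectors.toList());
    }

    // Ascending order: keeps the three SHORTEST words.
    static List<String> shortestThree(List<String> words) {
        return firstThree(words, Comparator.comparingInt(String::length));
    }

    // Reversed order: keeps the three LONGEST words.
    static List<String> longestThree(List<String> words) {
        return firstThree(words, Comparator.comparingInt(String::length).reversed());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("aa", "a", "aaaa", "aaa");
        System.out.println(shortestThree(words)); // prints [a, aa, aaa]
        System.out.println(longestThree(words));  // prints [aaaa, aaa, aa]
    }
}
```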

To sum up, we have now

var mapping
  = rows.stream().map(Row::new)
        .collect(Collectors.groupingBy(Row::getZipCode))
        .entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey,
            entry -> entry.getValue().stream()
                          // highest waste amount first
                          .sorted(Comparator.comparingInt(Row::getWasteAmount)
                                            .reversed())
                          .limit(3)
                          .toArray(Row[]::new)
        ));

In the last collect above, we could have collected into a List, but the array solution is usually cheaper.

We have also used the special var keyword, because the compiler can infer by itself that the right-hand side is a Map<String,Row[]> and there is no need to repeat it. var is a JDK10 feature.
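For readers who want to try the grouping pipeline in isolation, here is a self-contained sketch under simplifying assumptions: each sample is a plain String[] of the form {zipCode, name, amount} instead of a Row, the result maps each zip code to a list of names instead of a Row[], and all data is invented. The class TopThreeByGroup and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopThreeByGroup {
    // Made-up samples: {zipCode, name, amount}.
    static List<String[]> sampleData() {
        return Arrays.asList(
            new String[] {"1000", "ann", "50"},
            new String[] {"1000", "bob", "70"},
            new String[] {"1000", "cid", "20"},
            new String[] {"1000", "dan", "90"},
            new String[] {"2000", "eve", "10"});
    }

    // Groups by zip code, then keeps the names of the 3 largest amounts.
    static Map<String, List<String>> topNames(List<String[]> samples) {
        return samples.stream()
            .collect(Collectors.groupingBy((String[] s) -> s[0]))
            .entrySet().stream()
            .collect(Collectors.toMap(
                Map.Entry::getKey,
                entry -> entry.getValue().stream()
                    .sorted(Comparator
                        .comparingInt((String[] s) -> Integer.parseInt(s[2]))
                        .reversed())
                    .limit(3)
                    .map(s -> s[1])
                    .collect(Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> top = topNames(sampleData());
        System.out.println(top.get("1000")); // prints [dan, bob, ann]
        System.out.println(top.get("2000")); // prints [eve]
    }
}
```

A zip code with fewer than three people simply yields a shorter list, which is the case the Optional in the Report interface is meant to cover.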

All the lines have been processed successfully, well done! Let us now implement our Report and Entry types. We already have all the information collected in a Map<String, Row[]>, and we already have full control over the Map since we created it. Let’s just enclose this information in a Report shell:

class ReportImpl implements Report {
   private final Map<String, Row[]> mapping;
   ReportImpl(Map<String, Row[]> mapping) {
      this.mapping = mapping;
   }
   @Override public Set<String> getAllZipCodes() {
      return mapping.keySet();
   }
   @Override public Optional<Entry> getTop(String zipCode,
                                           TopPosition position) {
      var idx = switch(position) {
         case TOP_1 -> 0;
         case TOP_2 -> 1;
         case TOP_3 -> { yield 2; }
      };
      return Optional.ofNullable(mapping.get(zipCode))
                     .map(arr -> arr.length > idx ? arr[idx]: null)
                     .map(???);  // Row -> Entry, to do
   }
}

Note the use of the var keyword to declare a variable whose type is given by its initializer on the right-hand side. Here, the compiler can infer by itself that idx is of type int.

Note also the use of the switch expression above. Since JDK14, switch expressions allow one to match on primitive types, strings, and enumerations (as the usual switch does, actually), with the following differences:

there is no fallthrough behavior: no need to break between cases, it’s more like a mapping
the right side of a mapping can be a block ended by a yield to provide a value, like in our third case
in case of matching on enumeration, there is no need for a default case: the compiler detects if the mapping is exhaustive.
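The three points above can be seen in a minimal, self-contained sketch (it requires JDK14 or later). The class SwitchExpressionDemo is a hypothetical wrapper, and the enum is redeclared locally so that the example compiles on its own.

```java
class SwitchExpressionDemo {
    enum TopPosition { TOP_1, TOP_2, TOP_3 }

    static int indexOf(TopPosition position) {
        // No break, no default: the three cases cover the whole enum.
        return switch (position) {
            case TOP_1 -> 0;
            case TOP_2 -> 1;
            case TOP_3 -> {
                // A block must hand its value back with yield.
                int last = TopPosition.values().length - 1;
                yield last;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(indexOf(TopPosition.TOP_1)); // prints 0
        System.out.println(indexOf(TopPosition.TOP_3)); // prints 2
    }
}
```

Removing one of the cases makes the compiler reject the switch expression as non-exhaustive, which is the safety net mentioned in the last point.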

Switch expressions are meant to be improved further, together with sealed classes (a preview feature in JDK15), but that’s for later!

For our purposes, all we need now is to be able to create an Entry. Since an Entry is really the last processing step exposed to the client, we are not going to pollute our Row class by making it an Entry, although it already contains all the info! Why? Because we want to keep these two separated from each other: a Row is meant to represent a processable entity on which we can compute, while an Entry is just a data transfer object, like a snapshot of the data.

That’s a perfect fit for the record, introduced as a preview feature in JDK14! A record is nothing else than a class that cannot contain any instance field other than the ones provided in its constructor. Remember how this was not the case for our Row class? That’s because a Row is really something we manipulate and that may contain a lot of information. It is clearly different from the sum of its constructor parameters. Things are different for an Entry, which is more like a heterogeneous tuple we want to share with the client.

record Entry(String name,String zipCode,int wasteAmount) {
   Entry { // compact constructor: runs on the parameters,
           // just before they are assigned to the fields
      assert name != null;
      assert zipCode != null;
   }
}

The record mechanism automatically generates an accessor method x() for each field x (component) provided in the constructor. It also accepts a compact constructor block, whose body runs on the provided components just before they are assigned to the fields. We can use it for very basic validations (here we protect against null references; keep in mind that assert statements are only checked when assertions are enabled with the -ea flag).

The record mechanism also automatically generates hashCode, equals and toString methods for us, based on the components. The record is a kind of Scala case class, with the difference that:

it cannot contain instance fields other than the ones provided at construction
neither a class nor a record can inherit from a record
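Here is a minimal sketch of what the generated members look like in practice (records are a standard feature since JDK16; on JDK14 and 15 they need --enable-preview). The wrapper class RecordDemo and the sample values are made up, and Objects.requireNonNull is swapped in for the assert statements so that the null check also fires when assertions are disabled.

```java
import java.util.Objects;

class RecordDemo {
    record Entry(String name, String zipCode, int wasteAmount) {
        Entry {
            // Compact constructor: validates the components.
            Objects.requireNonNull(name, "name");
            Objects.requireNonNull(zipCode, "zipCode");
        }
    }

    public static void main(String[] args) {
        Entry first = new Entry("Ann", "1000", 50);
        Entry second = new Entry("Ann", "1000", 50);

        System.out.println(first.name());          // prints Ann
        System.out.println(first.equals(second));  // prints true
        System.out.println(first);
        // prints Entry[name=Ann, zipCode=1000, wasteAmount=50]
    }
}
```

Two entries built from the same components are equal and share the same hash code, which is exactly what one expects from a snapshot of data.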

Here is the final code, all together:

import java.io.IOException;
import java.io.InputStream;
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.Scanner;
import java.util.Set;
import java.util.Stack;
import java.util.stream.Collectors;

record Entry(String name,String zipCode,int wasteAmount) {
   Entry {
      assert name != null;
      assert zipCode != null;
   }
}

class Row {
   private final String source, zipCode;
   private final int wasteAmount;
   Row(String source) {
      this.source = source;
      // zip code: the text between the first and the last ';'
      zipCode = source.substring(
                        source.indexOf(";")+1,
                        source.lastIndexOf(";")
                    ).trim();
      // waste amount: the text after the last ';' (hence the +1)
      wasteAmount = Integer.parseInt(
                        source.substring(source.lastIndexOf(";")+1)
                              .trim()
                    );
   }
   String getZipCode() { return zipCode; }
   String getName() {
      return source.substring(0, source.indexOf(";")).trim();
   }
   int getWasteAmount() { return wasteAmount; }
}

class ReportImpl implements Report {
   private final Map<String, Row[]> mapping;
   ReportImpl(Map<String, Row[]> mapping) {
      this.mapping = mapping;
   }
   @Override public Set<String> getAllZipCodes() {
      return mapping.keySet();
   }
   @Override public Optional<Entry> getTop(String zipCode,
                                           TopPosition position) {
      var idx = switch(position) {
         case TOP_1 -> 0;
         case TOP_2 -> 1;
         case TOP_3 -> 2;
      };
      return Optional.ofNullable(mapping.get(zipCode))
                     .map(arr -> arr.length > idx ? arr[idx]: null)
                     .map(row -> new Entry(
                                        row.getName(),
                                        row.getZipCode(),
                                        row.getWasteAmount()
                     ));
   }
}

Report process(RemoteService service) throws IOException {
   var rows = new Stack<String>();
   // read all the lines first, to free the resource quickly
   try(InputStream inputStream = service.datasource();
       Scanner scanner = new Scanner(inputStream)
   ) {
      while(scanner.hasNextLine())
         rows.push(scanner.nextLine());
   }
   var mapping = rows.stream()
        .map(Row::new)
        .collect(Collectors.groupingBy(Row::getZipCode))
        .entrySet().stream()
        .collect(Collectors.toMap(Map.Entry::getKey,
            entry -> entry.getValue().stream()
                          // highest waste amount first
                          .sorted(Comparator.comparingInt(Row::getWasteAmount)
                                            .reversed())
                          .limit(3)
                          .toArray(Row[]::new)
        ));
   return new ReportImpl(mapping);
}

Hope you learned something about Java in this small chat we had!

I hope you’re now convinced that Java is not a pure-evil, ultra-verbose language. Don’t hesitate to share your feelings about it!

Written by Justin Dekeyser