VocabHunter — A tool for learners of foreign languages

By Adam Carroll

Working here at King you meet people from all over the world. In my case, I’m English but living and working in Barcelona. I’m busy learning Catalan and, being a techie, I’ve developed some software to help in the learning process. One of the important steps in learning a language is starting to read in that language. However, a problem that many of us face is that we don’t have enough vocabulary to understand the material we want to read. My idea was to make this easier by providing a simple way to find new vocabulary in a document so that someone can learn it before reading. Enter VocabHunter…

VocabHunter is an Open Source desktop application that runs on Mac, Linux, or Windows and can be used for analysing documents. It can read a variety of document formats and the user can easily step through the words in a document to find the important words that they don’t yet know. Here’s an example of VocabHunter in use, in this case looking at the English vocabulary in Great Expectations:

Java 8

Apart from wanting a tool to help with learning foreign languages, I wanted to try building something from the ground up in Java 8. Lambdas and streams are two of the stand-out features of this version of Java, and internally at King we’re starting to use them right across our Java codebase. Analysing text provides a great chance to play with these new features, and creating VocabHunter was the perfect opportunity to put these skills into action.

Java 8 streams provide facilities for mapping, filtering and grouping. You can think of them as being a bit like working with SQL for Java collections. One of the tasks that VocabHunter performs while analysing a document is to take an individual sentence and find the word uses within that sentence. These uses are then combined with those from all the other sentences in the document to form the results of the textual analysis.
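To make the SQL comparison a little more concrete, here is a minimal sketch that is not taken from the VocabHunter source. The class name StreamBasics, the method countByFirstLetter and the sample words are all made up for illustration. It filters out short words, maps the rest to lower case and groups them by first letter, roughly as a WHERE, a SELECT expression and a GROUP BY with COUNT would in SQL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example class; it is not part of VocabHunter.
class StreamBasics {

    // Counts the words that have at least minLetters letters,
    // grouped by their (lower-case) first letter.
    static Map<Character, Long> countByFirstLetter(List<String> words, int minLetters) {
        return words.stream()
                // Filtering: comparable to a WHERE clause
                .filter(w -> !w.isEmpty() && w.length() >= minLetters)
                // Mapping: comparable to an expression in a SELECT
                .map(String::toLowerCase)
                // Grouping: comparable to GROUP BY with COUNT(*)
                .collect(Collectors.groupingBy(w -> w.charAt(0), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Pip", "pocket", "Estella", "estate", "a", "Joe");

        // Prints one count per first letter; "a" is dropped as too short.
        System.out.println(countByFirstLetter(words, 2));
    }
}
```

Each step returns a new stream, so the pipeline reads top to bottom in the order the data flows through it.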

When we work with Java 8 streams, we work in a declarative way. It is often helpful to think of the use of streams as being like a recipe made from a sequence of steps. In order to find the word uses in a specific sentence, VocabHunter uses a chain of Java 8 stream operations which can be visualised and understood as follows:

String: a sentence from the document being analysed
  |
  | Split: break the sentence into words using a regular expression
  v
Stream<String>: the words in the sentence, possibly including blanks
  |
  | Filter: remove any blanks and words with fewer than the minimum letters
  v
Stream<String>: the words in the sentence
  |
  | Map: convert each word to lower case
  v
Stream<String>: the lower-case words in the sentence
  |
  | Group: create a map from each word to the number of times it is used
  |        in the sentence
  v
Map<String, Long>: a use count for each word in the sentence
  |
  | Map: create a WordUse object for each of the counted words from the sentence
  v
Stream<WordUse>: a Java 8 stream of WordUse objects for the sentence, all ready to be combined

The Java 8 code that does this is expressed in the source code as follows:

    private Stream<WordUse> uses(String line, int minLetters) {
        Map<String, Long> counts = PATTERN.splitAsStream(line)
            .filter(w -> !w.isEmpty() && w.length() >= minLetters)
            .map(String::toLowerCase)
            .collect(groupingBy(Function.identity(), counting()));

        return counts.entrySet().stream()
            .map(e ->
                new WordUse(e.getKey(), e.getValue().intValue(), line));
    }

You can see this code in its context in the following class: SimpleAnalyser.java

The above is the heart of VocabHunter! In a small class we have the majority of the logic for analysing documents. Much of the rest of the program code is taken up with building the Graphical User Interface.
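For readers who want to run something straight away, here is a self-contained sketch built around the uses method shown earlier. Several parts are assumptions made for illustration rather than the real VocabHunter code: the AnalysisSketch class, the simplified WordUse stand-in, the regular expression used for PATTERN, and the totalCounts method, which shows just one possible way of combining the uses from several sentences (flattening them into one stream and summing the counts per word).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the real classes live in the VocabHunter repository.
class AnalysisSketch {

    // Assumed delimiter: any run of characters that are not letters.
    private static final Pattern PATTERN = Pattern.compile("[^\\p{L}]+");

    // Simplified stand-in for VocabHunter's WordUse class.
    static final class WordUse {
        private final String word;
        private final int count;
        private final String sentence;

        WordUse(String word, int count, String sentence) {
            this.word = word;
            this.count = count;
            this.sentence = sentence;
        }

        String getWord() { return word; }
        int getCount() { return count; }
        String getSentence() { return sentence; }
    }

    // Same shape as the uses(...) method from the article.
    static Stream<WordUse> uses(String line, int minLetters) {
        Map<String, Long> counts = PATTERN.splitAsStream(line)
                .filter(w -> !w.isEmpty() && w.length() >= minLetters)
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .map(e -> new WordUse(e.getKey(), e.getValue().intValue(), line));
    }

    // Assumed combination step: flatten the uses from every sentence
    // and add up the counts for each word.
    static Map<String, Integer> totalCounts(List<String> sentences, int minLetters) {
        return sentences.stream()
                .flatMap(s -> uses(s, minLetters))
                .collect(Collectors.groupingBy(
                        WordUse::getWord, Collectors.summingInt(WordUse::getCount)));
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList("The cat sat.", "A dog sat; the cat ran!");

        // Prints the total number of uses of each word of three or more letters.
        System.out.println(totalCounts(sentences, 3));
    }
}
```

The flatMap call is what lets the per-sentence streams be treated as a single stream of word uses for the whole document.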

JavaFX

VocabHunter has a simple and, I hope, user-friendly GUI written in JavaFX. This rich client platform now comes as standard with Java 8 and is intended to replace Swing. If you’ve worked with Swing in the past, you’ll really appreciate the more modern approach of JavaFX. To start with, as I hope you can see from the screenshot at the beginning of this blog post, JavaFX has a fresh new look. The framework uses CSS to style the user interface, and this is put to good effect in VocabHunter to transform a boring button like this:

into something much brighter and easier to understand, like this:

Working with CSS in this way, instead of embedding the styles in the Java code, provides a great separation of concerns. It means that someone more skilled than me at design can help to improve the look of the program without needing to be a Java programmer. The VocabHunter logo shown on the front screen is simple text styled with CSS to impressive effect:

This is achieved with the following CSS:

    .title.left {
        -fx-fill: linear-gradient(
            from 0% 0% to 100% 200%, repeat, #EBFFF0 0%, #008000 90%);
    }

    .title.right {
        -fx-fill: linear-gradient(
            from 0% 0% to 100% 200%, repeat, #FFF0ED 0%, #a43030 90%);
    }

In a similar vein, the user interface is laid out using FXML, an XML-based layout language that avoids the need to describe layout in Java as people used to do with Swing. There is even a free graphical editor for FXML, the Scene Builder, to help put this markup together.

JavaFX has a lot of other features that are worth exploring and in VocabHunter I’ve made extensive use of the Property Binding feature to connect up parts of the UI. Please feel free to have a look around the source code on GitHub to see how it all works.

Open Source

At King, we use a lot of Open Source software and it seemed natural to make this personal project Open Source too. As a Java developer, it was the perfect chance for me to expand my understanding of Java 8 while creating a tool that my colleagues, friends and the wider Open Source community could use regularly and build on as they wish. You can find it on GitHub here: https://github.com/VocabHunter/VocabHunter.

One of the great things about Open Source is the wealth of libraries available out there. VocabHunter, for example, makes use of the Apache Tika library to read documents. Using this fantastic software, the program is able to read all sorts of different file formats, including Microsoft Word, PDF, OpenDocument, and many more!

If you think you can help out, your code contributions would be most welcome. Or perhaps you’re a graphic designer and can improve on my layouts, icons, colour schemes and so forth. Please feel free to send me a pull request.


Originally published by Adam Carroll at techblog.king.com on March 21, 2016.