Autocompletion in the Apache Beam Playground

Published in Akvelon, Aug 30, 2023

Interesting tasks often arise where several technologies meet, especially ones that are not typically integrated. This blog post covers just such a case. Today, you will learn how combining several technologies allowed us to add autocompletion for the Apache Beam Java SDK to a code editor written in Flutter.

First, let us introduce the project I contributed to: the Apache Beam Playground. Apache Beam is an open-source unified model for defining batch and streaming data processing pipelines. It simplifies large-scale data processing by letting developers build a program that defines the pipeline using one of the Apache Beam SDKs. For our story today, it suffices to say that Apache Beam allows you to create data processing pipelines. You can use it for everything from a simple migration to parallel processing and analysis of data streams, in very different scenarios: from processing server logs and detecting data anomalies in analytics to preparing data for machine learning models.

You can download Apache Beam and run it locally, but if you only need to test small things, you can work with it directly in your browser in a specially created environment called Beam Playground, whose responsive frontend is built with Flutter. The Playground provides an interactive, feature-rich code editor where developers can try pipeline examples and their own code with the Apache Beam SDKs. When you click Run, the code you’ve written is compiled and run on the server, and the result is displayed in a console as well as in the form of a graph. To help you get a better understanding of the SDKs, Playground already has a lot of ready-to-run examples in each of the available languages: Java, Go, Scala, and Python.

Akvelon’s team is proud of our contributions to making Playground functional and user-friendly: everything you see on the Playground page was made by us, and we also developed a Flutter-based code editor as part of the project, which you can download as a separate package and use in your own project.

Code Autocompletion Functionality

Idea

Now I’ll tell you about one of the features we created, one that is familiar to all programmers: code autocompletion, where you complete the code by choosing one of the options suggested in a pop-up window.

When we first started working on this feature, we had a question: from Dart, how can we know about all the classes and methods contained in the Java SDK for Apache Beam? Apache Beam is an open-source project, so you can look at the code as I go along if you want. Our entry point into the project is the folder that contains the SDK for each language, from which we take information about its classes and methods. Of course, we can’t work with Java directly from Dart, so we have to pass the data through some kind of bridge. In our project we chose a YAML file as that bridge. It contains all the information about the SDK in a hierarchical structure. This file is generated by a Java program, and that program is started automatically from a Gradle task in some scenarios.
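To make the idea of a hierarchical bridge file more concrete, here is a minimal sketch of how a Java program might render class information as YAML. The layout (a class name as the key, with `methods` and `properties` lists under it) and the names `SymbolsYamlSketch` and `ClassSymbols` are assumptions made for illustration, not the actual format of the generated file. The sketch also writes the text by hand to stay dependency-free, whereas the real program uses a YAML library.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SymbolsYamlSketch {
    /** Hypothetical holder for what we know about one class. */
    static final class ClassSymbols {
        final List<String> methods;
        final List<String> properties;

        ClassSymbols(List<String> methods, List<String> properties) {
            this.methods = methods;
            this.properties = properties;
        }
    }

    /** Renders a class-name -> symbols map as a small YAML document. */
    static String toYaml(Map<String, ClassSymbols> classes) {
        StringBuilder yaml = new StringBuilder();
        // Copying into a TreeMap gives alphabetical order of class names.
        for (Map.Entry<String, ClassSymbols> entry : new TreeMap<>(classes).entrySet()) {
            yaml.append(entry.getKey()).append(":\n");
            appendList(yaml, "methods", entry.getValue().methods);
            appendList(yaml, "properties", entry.getValue().properties);
        }
        return yaml.toString();
    }

    /** Appends one sorted list under the given key; empty lists are skipped. */
    private static void appendList(StringBuilder yaml, String key, List<String> values) {
        if (values.isEmpty()) {
            return;
        }
        yaml.append("  ").append(key).append(":\n");
        // Java identifiers need no YAML escaping, so plain scalars are enough here.
        values.stream().sorted().forEach(v -> yaml.append("    - ").append(v).append('\n'));
    }

    public static void main(String[] args) {
        Map<String, ClassSymbols> classes = Map.of(
            "Pipeline", new ClassSymbols(List.of("run", "apply"), List.of()),
            "Create", new ClassSymbols(List.of("of"), List.of()));
        System.out.print(toYaml(classes));
    }
}
```

Whatever the exact layout, the point is that the Dart side only needs to read a plain data file and never has to talk to Java directly.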

Creating the Java program

Now that the task is clear, we must write a Java program that extracts information about the contents of a third-party library. There are two ways to do this: Reflection and an abstract syntax tree (AST). If you have worked with Java, you have probably already heard about Reflection: it is a mechanism built into the language itself that allows a program to analyze its own structure at runtime. It’s probably the most popular method for Java, and we tried it too, but we periodically ran into class-loading errors such as NoClassDefFoundError, which is one of the most common problems that arise when you inspect a library at runtime.

We can get rid of this problem by gathering information about the library solely from its source files using the AST. This is the method we settled on. The AST stores the components of the program code down to its individual operands.

For clarity, let’s look at the simplest example, from GeeksforGeeks, of how “hello, world” is split into an AST. It all starts with the definition of the class, its modifier, and its name. Then, a block with the contents of the class is defined. It contains information about the method definition with all of its modifiers. There is also information about the return type and the arguments that the method takes. Finally, there is information about the contents of the method: the call that prints “hello, world”. This is more than enough, because for our case we only need the class and method modifiers and their names.
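As a rough illustration of that breakdown, the sketch below parses a “hello, world” class and lists the class, its method, and the call inside it. It uses the Compiler Tree API that ships with the JDK as a stand-in, because it needs no extra dependencies. The library our program actually uses is a separate one from Maven, and the name `AstOutline` is made up for this example.

```java
import com.sun.source.tree.ClassTree;
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class AstOutline {
    /** In-memory source file, so the sketch needs nothing on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (no compilation) and lists classes, methods, and calls. */
    static List<String> outline(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("This sketch needs a JDK, not just a JRE");
        }
        JavacTask task = (JavacTask) compiler.getTask(
            null, null, null, null, null, List.of(new StringSource(className, code)));
        List<String> lines = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitClass(ClassTree node, Void p) {
                        lines.add("class " + node.getSimpleName() + " " + node.getModifiers().getFlags());
                        return super.visitClass(node, p);
                    }

                    @Override
                    public Void visitMethod(MethodTree node, Void p) {
                        lines.add("method " + node.getName() + " " + node.getModifiers().getFlags());
                        return super.visitMethod(node, p);
                    }

                    @Override
                    public Void visitMethodInvocation(MethodInvocationTree node, Void p) {
                        lines.add("call " + node.getMethodSelect());
                        return super.visitMethodInvocation(node, p);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String code = "public class HelloWorld {\n"
            + "    public static void main(String[] args) {\n"
            + "        System.out.println(\"hello, world\");\n"
            + "    }\n"
            + "}\n";
        outline("HelloWorld", code).forEach(System.out::println);
    }
}
```

Because the source is only parsed and never loaded, nothing has to be on the classpath, which is exactly what lets this approach avoid the class-loading errors we saw with Reflection.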

Now we have all the information we need to figure out how our Java program should work. Its principle is quite simple: the main method contains just four lines. First we read the path to the SDK, then we collect all the data about classes, along with their methods and properties, via the AST, then we write it into a file, and finally we send it to the output channel.

Despite the apparent simplicity, there were a few things to consider:

We can’t just ask the library to return the classes and methods of an entire package given only a folder path. We had to go through the entire file structure in the folder, extract class names from the file names, and then take all public methods and properties from each class.
It was important to discard all the test classes, because they’re useless in autocompletion.
We decided to keep all the classes and methods in the file in alphabetical order, so that it would be easier to keep track of them and there would be fewer conflicts.
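The three points above can be sketched roughly as follows. This is a simplified illustration, not our actual program: the class name `SdkClassFinder` and the rule for spotting test files (anything under a `test` directory or named `*Test.java`) are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SdkClassFinder {
    /** Assumed heuristic: files under a "test" directory or named *Test.java are tests. */
    static boolean isTestFile(Path root, Path file) {
        // Only look at the part of the path below the SDK root.
        for (Path part : root.relativize(file)) {
            if (part.toString().equals("test")) {
                return true;
            }
        }
        return file.getFileName().toString().endsWith("Test.java");
    }

    /** Walks the whole folder and returns class names in alphabetical order. */
    static List<String> findClassNames(Path sdkRoot) {
        try (Stream<Path> files = Files.walk(sdkRoot)) {
            return files
                .filter(Files::isRegularFile)
                .filter(file -> file.getFileName().toString().endsWith(".java"))
                .filter(file -> !isTestFile(sdkRoot, file))
                // The class name is taken from the file name, as described above.
                .map(file -> file.getFileName().toString().replaceFirst("\\.java$", ""))
                // package-info.java and module-info.java do not declare classes.
                .filter(name -> !name.equals("package-info") && !name.equals("module-info"))
                // Classes with the same simple name in different packages collapse into one.
                .distinct()
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The SDK path comes from the command line rather than being hard-coded.
        findClassNames(Path.of(args[0])).forEach(System.out::println);
    }
}
```

Sorting at this stage keeps the generated file stable between runs, so a regenerated file only differs where the SDK itself changed.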

Now that we have the code of the program, we need to understand the mechanism by which it runs. As I said before, we run it through a Gradle task. We give the program two parameters. The first one is the path to the SDK. We did not hard-code it in the program, so that we would have more flexibility and easier testing. The second is the output channel, which we specify in the task: a file with the .yaml extension in the folder where the YAML files for all languages are stored.

Finally, we can run the task and see the result: a generated file that is nearly 250 KB in size and contains 14,000 lines. We can already read this file from Dart. However, that is not enough to let us insert the generated methods into the tooltip. After all, we can’t be sure that our Java program works as it should without testing it.

Test coverage

Since we have three other languages besides Java, we test everything from Dart code to keep the tests unified. Here is how we do it: we create a mock library with empty classes and methods and run our program, feeding it the path to this mock library as input. Then we compare the result of the program with a pre-created golden YAML file that contains the correct description of the classes and methods of the mock library.

However, there is one problem with this way of testing: we can’t just run the Java code, because we need to compile it first. We had never thought about this before because Gradle did everything for us: we just gave it the dependencies, and it built the build folder for us. When running from Dart, it doesn’t work that way. We could run Gradle directly from Dart, but we don’t want the test code to depend on Gradle, so we do everything on our own.

First, we should define the dependencies. We have two libraries: one for creating the AST and one for generating the YAML file. We found the jar paths on Maven and download them into our project with the versions we specified, if they are not already downloaded. By the way, this has yet to be automated. If you have any ideas on how to do this, please post them in the comments, as I’d be interested to hear your thoughts. Then, we need to generate a classpath, which is a string that contains the paths to all the dependencies we need to compile the program.

After that, we run the javac command with all the parameters we need to compile. Now we’re ready to run the method I mentioned before, which runs the program on the mock library and compares the result with the golden YAML file.
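For illustration, here is roughly what the classpath string and the javac invocation look like. In our project this is driven from Dart test code. The sketch below is in Java only to stay consistent with the other examples, and the jar names, the output folder, and the class name `JavacCommandSketch` are placeholders rather than the real ones.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class JavacCommandSketch {
    /** Joins jar paths with the platform separator (":" on Linux/macOS, ";" on Windows). */
    static String classpath(List<String> jarPaths) {
        return String.join(File.pathSeparator, jarPaths);
    }

    /** Builds the argument list: javac -cp <classpath> -d <outputDir> <sources...> */
    static List<String> javacCommand(List<String> jarPaths, String outputDir, List<String> sources) {
        List<String> command = new ArrayList<>();
        command.add("javac");
        command.add("-cp");
        command.add(classpath(jarPaths));
        command.add("-d");
        command.add(outputDir);
        command.addAll(sources);
        return command;
    }

    public static void main(String[] args) {
        // Placeholder paths, used only to show the shape of the command.
        List<String> command = javacCommand(
            List.of("libs/ast-parser.jar", "libs/yaml-writer.jar"),
            "build/classes",
            List.of("src/Main.java"));
        System.out.println(String.join(" ", command));
        // To actually compile, the list can be handed to a process launcher, for example:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```

The same classpath string is needed again when running the compiled program, with the output folder added to it.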

Finally, we have a fully working and tested program that extracts all the information we need about the Java SDK and writes it into a YAML file. Now we can move on to Dart. Let’s look at how the YAML files for the languages are loaded from the Flutter application itself.

Display suggestions

Let’s start with the fact that our entire project can be roughly divided into two big parts: the Flutter Code Editor and the Playground frontend. Everything related to working with code lives in a separate, independent project. It is useful not only for us: it already has about 70 stars on GitHub. For the client, Code Editor is just a widget that you can embed anywhere. It can highlight code for many languages, collapse blocks, work with autocomplete, make parts of the code read-only or invisible for different needs, and do many other useful things.

Since Code Editor is a single widget, all the logic of working with it is contained in one controller, the Code Controller. It, in turn, contains other controlling components: for folding, text highlighting, and so on. For autocompletion, our editor has the autocompleter class. In addition to methods that are used exclusively inside the library, autocompleter exposes the setCustomWords method. This is the method through which all the information collected in java.g.yaml is passed to autocompleter. The code editor then controls all the logic of displaying suggestions, so on the Playground side it is enough to pass the data to this method.

Speaking of the Playground part itself, one important decision is that we don’t load all the files at once, in order to save resources. The files are big and processing them takes time, while users are unlikely to use several languages at once and are more likely to stick with the one they know best. Since our hints do not take context into account yet, we store them in a list of strings wrapped in the SymbolsDictionary class. This list is filled by the YamlSymbolsLoader class. It is a very simple class: its main task is to take the path, use it to read the file, parse it properly, and put all values from the file into our SymbolsDictionary.

What remains to be understood is when we should trigger the loading of dictionaries. We have only two such events in our project: the initial load and a change of the example language. To have a single entry point for notifications about the need to load the dictionary for a language, we created the SymbolsNotifier class. It is a simple class that provides two methods: getDictionary, which returns the dictionary for a language, and addLoaderIfNot, which is called when the first example is loaded or the language changes. It loads the SDK data through the provided loader and stores it in an internal private dictionary, which it then gives back via getDictionary.
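The load-once-per-language idea can be sketched as below. The real SymbolsNotifier is a Dart class. This Java version only mirrors the two method names from the text to keep the examples in one language, and everything else (the `Supplier` loader, the listener list, returning an empty list for an unknown language) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class SymbolsNotifierSketch {
    // Internal private dictionary: language name -> loaded symbols.
    private final Map<String, List<String>> dictionaries = new HashMap<>();
    private final List<Runnable> listeners = new ArrayList<>();

    /** Registers a callback that runs whenever a new dictionary has been loaded. */
    void addListener(Runnable listener) {
        listeners.add(listener);
    }

    /** Runs the loader only the first time a language is requested. */
    void addLoaderIfNot(String language, Supplier<List<String>> loader) {
        if (dictionaries.containsKey(language)) {
            return; // Already loaded: switching back to a language costs nothing.
        }
        dictionaries.put(language, List.copyOf(loader.get()));
        listeners.forEach(Runnable::run);
    }

    /** Returns the loaded symbols, or an empty list if nothing was loaded yet. */
    List<String> getDictionary(String language) {
        return dictionaries.getOrDefault(language, List.of());
    }

    public static void main(String[] args) {
        SymbolsNotifierSketch notifier = new SymbolsNotifierSketch();
        notifier.addListener(() -> System.out.println("dictionary loaded"));
        // A stand-in loader; the real one reads and parses the YAML file.
        notifier.addLoaderIfNot("java", () -> List.of("Create", "Pipeline"));
        notifier.addLoaderIfNot("java", () -> List.of("never loaded"));
        System.out.println(notifier.getDictionary("java"));
    }
}
```

In the actual application, the listener is what forwards the freshly loaded words to the editor’s autocompleter.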

The listener for SymbolsNotifier changes is the SnippetFileEditingController, which is responsible for the state of an example file on the Playground side (an example can contain multiple files). It passes the data on to the autocompleter, because it also contains the Code Controller of the Code Editor.

Conclusion

We now have a complete picture of how this feature works in our project, and I hope that you found this brief overview of Apache Beam Playground and Flutter Code Editor interesting. Going back to autocomplete: this is not the final version of the feature. At the moment, we don’t analyze context, so we show all methods and classes that match the substring being typed. In the future, we can work on displaying only the methods that belong to the class of the variable, or only the classes that are in the imports. In addition, we hope to add method signatures and their documentation to the tooltips.

Autocomplete is certainly an interesting feature, but it’s not the only one. If you’re interested in hearing more about Playground or Code Controller, reach out to us at welcome@akvelon.com.

We invite you to try Flutter Code Editor on pub.dev or directly from https://github.com/akvelon/flutter-code-editor, and we encourage you to share your feedback or report any issues you may face there.

About Akvelon

To get in touch with Akvelon, request a custom feature, or share your use case, send a message at https://akvelon.com/contact-us/.

Akvelon provides Flutter, Web, and Mobile application development services - get in touch to request a custom app or modernize your existing app experience!

To learn more about Akvelon’s ML, Data and Analytics, and other services, visit Akvelon.com or check out Akvelon’s Blog.