OutSystems Compiler Data Optimizations

Published in

OutSystems Engineering

8 min readApr 22, 2016

The OutSystems platform uses a visual programming language to represent our applications. It is this visual language that enables our customers to easily create and maintain amazing applications.

However, that language can’t be used directly by the machine, and so it needs to be translated. This translation is a phase that happens when you publish your applications. We call this translation phase the compilation, and like many other programming languages that are compiled, we use this phase to perform some optimizations.

In this blog post I will talk about one class of optimizations that we perform during compilation, which we’ll call compiler data optimizations.

First, I will present to you the problems that we are addressing with the optimizations and next I will explain how we achieved that.

The need for Compiler Data Optimizations

Typical server-client interaction

OutSystems applications are server-rendered web applications.

This means that the typical flow of our applications is like this:

NOTE: For AJAX interactions, the flow is more or less the same. In AJAX interactions we skip the preparation after the screen action and only render part of the screen.

If you analyse the picture you’ll see one arrow that represents data that is sent from the server to the client and one arrow that represents data that is sent from the client back to the server. Those data exchanges happen a lot in our modern interactive applications.

Although we know that Internet bandwidth is increasing every day, there is still the need to keep the size of data that travels back and forth at a minimum so the interactivity and the overall responsiveness of the application stays great.

Data exchange between server and client

Consider the following screen of a simple application built with the OutSystems Platform:

And the following screen action associated with the click of the Apply button:

Some things of notice:

We read two attributes of the Contact entity list in the UpdateContactPhone screen action. This data must be kept somewhere so it’s available when the screen action executes. Since the HTTP protocol is stateless this data is sent to the client along with the screen, so when we execute the server action it is available. This data is what we call viewstate (for historical reasons, since the first ASP.NET implementations also called it that).
The UpdateContactPhone screen action may be called for any row of the list, so the entire list needs to be stored in the viewstate.
We only need two attributes of the Contact entity in the screen action. This is especially important because the Contact entity contains the binary that represents the photo of the Contact.
We never need the Photo attribute in the screen. So we don’t even need to get it from the database.
After UpdateContactPhone screen action ends, the preparation is called again and the screen is re-rendered. During this we read the Contact entity list again. However we don’t need the to keep that list because is going to be re-created in the preparation when we run the aggregate to fetch the Contact entity list again.

Without any compilation optimizations, we must assume that anything can be read anywhere and all the screen data must go into the viewstate. However in practice this is a waste of resources because, as we said above, we usually only need a part of it.

In this example, only the attributes Id and Phone of the Contact entity list are needed in the viewstate. There’s no need to allocate resources to save all those photos to the viewstate and bounce them back and forwards between the client and the server.

How the Compiler Data Optimizations work

General Strategy

The Compiler Data Optimizations work by performing what is known in compiler theory as live variable analysis. From wikipedia:

In compiler theory, live variable analysis (or simply liveness analysis) is a classic data-flow analysis performed by compilers to calculate for each program point the variables that may be potentially read before their next write, that is, the variables that are live at the exit from each program point. Stated simply: a variable is live if it holds a value that may be needed in the future.

We need to analyse variable liveness in two specific points: At the end of the screen rendering, so we know exactly what needs to be stored in the viewstate; and on every aggregate execution so we know exactly what is needed from the database.

We made some small changes to the live variable analysis algorithm, but the basic concepts still remain.

This algorithm uses a graph to represent the data-flow of what we need to analyse and then perform a “backwards way” analysis (The analysis is done in a backwards order, i.e., in the opposite of the normal execution order).

Graph representation

Like we said in the previous section, we start by constructing a graph that represents the data-flow during the execution of the screen. This graph is constructed as follows:

Note that each circle represents a sub graph:

The preparation sub graph represents the preparation action. Since in the OutSystems language the actions are already represented as graphs, this sub graph is usually the same as the preparation action graph itself. The end nodes of the preparation are connected to the first node of the screen rendering sub graph. The destination nodes are not connected to anything.
The screen rendering sub graph is in fact a sequence of nodes representing the widgets of the screen. The last node of this sequence is connected to each of the screen action sub graph.
The screen action sub graphs represent the flow of each screen action. Like the preparation sub graph, the screen action sub graphs are usually the same as the screen action graphs themselves. The end nodes of each screen action are connected to the preparation sub graph in the loop-back.

Identifier Concept

Instead of working directly with variables like any common live variable analysis algorithm, we created a concept called Identifier.

Those Identifiers represent unique pieces of basic types of information, be it attributes of a compound object or list or simply variables of basic types. In the case of composite objects or lists, the identifier is a sequence, starting with the variable reference and containing all the attributes names in the path.

This will help us optimize the data, down to the minimum indivisible data type. As our applications have typically large objects containing multiple fields (and like we said in the previous section, sometimes with large binary chunks of data), this has significant impact in the performance of the algorithm.

Examples:

Variable called Enabled, datatype Boolean, has Identifier [Enabled]
Variable called Count, datatype Integer, has Identifier [Count]
Variable called AddressInfo, whose datatype is a composite object (Address) containing attributes City (datatype: Text) and Country (datatype: date), has two Identifiers: [AddressInfo, City] and [AddressInfo, Country]. Since the AddressInfo is a composite object we flat-out it’s 2 attributes into 2 identifiers. Those identifiers are both sequences.
Variable called UserInfo, whose datatype is a composite object (User) containing attributes Name (datatype: Text) and Address (datatype: complex object Address), has three Identifiers: [UserInfo, Name], [UserInfo, Address, City] and [UserInfo, Address, Country]. Note that we flat out any composite object attributes in the path.

Graph Node Information

Each node contains the following information:

In[N]: a set of Identifiers alive at the point immediately before the node N. This is also sometimes called LiveIN[N] or Input[N] in some live variable analysis implementations.
Out[N]: a set of Identifiers alive at the point immediately after the node N. This is also sometimes called LiveOUT[N] or Output[N] in some implementations.
Assignment[N]: this can be either empty or a pair of (identifier, expression) if the node N contains an assignment. The first element of this pair is the identifier of the left-side of the assignment. The second element of this pair is the expression of the right-side of the assignment. Expressions in OutSystems are like in any other language, basically any legal combination of symbols that represents a value. This isn’t used in a common live variable analysis implementation.
Def[N]: a set of Identifiers that are assigned a value in N (in some implementations, Def[N] is also defined as a set of Identifiers assigned a value in N before any use, but this doesn’t change the solution of the dataflow equation). This is also sometimes called Kill[N] in some implementations.
Use[N]: the set of Identifiers that are used in N before any assignment. Sometimes called Gen[N] in some implementations. What is different from common live variable analysis implementations is that the right side Identifiers on any assignment are usually not considered uses. The exceptions are Identifiers used as arguments to functions and the right side Identifiers in a assign to a global variable (like a session variable).

Algorithm

The algorithm proceeds as follows:

While changes to any of the in’s occur do
 For each node N do Begin
  Out[N] = Disjunction of all In[s], where s is a successor of N
  In[N] = Use[N] + (Out[N] — Def[N])  If exist assignment[N]
   For each outputID in Out[N]
    If left-side[assignment[N]] is a prefix of outputID
     If right-side[assignment[N]] is a Identifier Begin
      suffix = suffix of outputID not in left-side[assignment[N]]
      Add right-side[assignment[N]].suffix to In[N]
     End Else
      Add all Identifiers in rigth-side[assignment[N]] to In[N]
End

Algorithm Notes

The algorithm basically runs for all nodes while there are changes in In[N] for any node N.

Note that the algorithm treats assignments in a special way: if we assign from one identifier to another, we only transport the liveness of the identifier on the right side to the identifier on the left side. So a flow containing only assignments doesn’t affect the liveness of variables at all and may be seen as a NOP.

What this means in practice is that we can assign one composite object to another one and this won’t mark the whole object as being used. Like we said before, since the objects in our applications tend to have lots of attributes, this also has a big impact in the output of the algorithm.

Conclusions

The usage of this algorithm assures that we keep viewstate usage as minimum. For the example screen, if we assume that each contact photo has an average of 40KB (if we use thumbnails instead of original resolution images) and the contact list contains 50 entries at a time (the default Line Count for OutSystems list widgets), we save 2MB on the viewstate and we also manage those 2MB in the Application Server<->Database communication, which help us make our applications more scalable while also giving a better experience to the end users of those applications.