Austin Brougher
Data Weekly by Jumpmind
4 min readSep 10, 2018

--

Increase METL Flow Performance with ‘Temp RDBMS’ Indexes

When combining data from multiple sources, I tend to reach for the ‘Temp RDMBS’ component. My familiarity with SQL and the ease of use make the decision almost automatic.

The ‘Temp RDBMS’ component is a quick way to build a temporary database that exists only while the flow is running. An H2 database is created during the initialization. Users define the input model at design time. When the database is created, tables are created to match the input model.

Once the flow is running, source components will feed the ‘Temp RDBMS’ with entity messages. Each entity will be inserted into the correct table. At a minimum, the user will define the select SQL to determine the contents of the output messages. There are situations where some additional DDL configuration can dramatically speed up the query. Once the flow is complete, the database is cleaned up leaving no trace.

Let's work through a quick example to demonstrate when adding an index to the temporary database is useful. Consider a situation where we have two data sources. The first is a database. The second is a data file. Our database has a People table. The people table has three columns; id, name, and age. The ID and name are assigned to each Person but the age is missing. We need to look up the age of each person by matching the name in the database to the name in the data file.

In order to test how much an index can speed up our flow, we want to use a lot of data. The best way to test with a lot of data is to use a script component that will generate data for both our database source and our file source. This script will feed the random entity messages to our Temp RDMBS and we’ll output the results of the Temp RDBMS to a ‘No Op’ component. In reality, we would use an ‘RDBMS Reader’ and a File reader as sources and an ‘RDBMS Writer’ to write the result back to the source database in place of the ‘No Op’.

To create our random data we will use the following script.

The above script will create records for both the database and the file source. The only thing in common between the two sources is the Person.name.

Once our script is complete, we should be stuffing lots of random data into our ‘Temp RDBMS’. The next step is to configure a query to pull that data back out of our temporary database.

We can use the following query to join on the name field.

In the above SQL you can see we query the id, and the name from source a (our database) and we return the age from source b (our file).

When we run this flow, the script will mock 10,000 database records and 10,000 file records. All of this random data will be loaded into the ‘Temp RDBMS’ and the SQL query will be executed. The join will take some time since the name is not a primary key and is therefore not automatically indexed.

Running the flow as-is took roughly 24 seconds for me. Of course, if we were responsible for looking up the age of 25,000 employee’s, this would take exponentially more time. The solution is to add an index to the database so names can be looked up quickly.

From within the ‘Temp RDBMS’ advanced editor window, we see the SQL query by default. If we click the dropdown, we can change the SQL option to DDL. For our example, we will add the following DDL. This DDL will be executed after the database tables are created but before any data is loaded.

Now when we run the flow, the join is much quicker and the flow completes in 1 second. As we add more random records, the flow run time increases exponentially without the index. With the index, the runtime remains close to 1 second.

Adding indexes to the ‘Temp RDMBS’ is one of the easiest ways to dramatically improve flow performance, but don’t limit yourself to only creating indexes. You can use any legal H2 DDL grammar including adding a sequence or even trigger logic.

The ‘Temp RDBMS’ component is one of my favorite go-to tools in the Metl toolbox. Once a user can define their own DDL, this tool becomes even more useful. Let us know in the comments how you use this component and if there is anything we can do to make it even better.

--

--