How to Decide Between Algorithm Outputs Using the Validation Error Rate

Monument
The Startup
Published in
3 min readAug 12, 2020

--

Monument (www.monument.ai) enables you to quickly apply algorithms to data in a no-code interface. But, after you drag the algorithms onto data to generate predictions, you need to decide which algorithm or combination of algorithms is most reliable for your task.

In the ocean temperature tutorial, we cleaned open remote sensing data and fed the data into Monument in order to forecast future ocean temperatures. In that case, we used visual inspection to evaluate the accuracy of different algorithms, which was possible because the historical data roughly formed a sine curve. Visual inspection is one tool in the data science toolbox, but there are other tools as well.

The Validation Error Rate is another useful tool in cases where you want to get more fine-grained or where visual inspection does not yield obvious insights. There are other error functions that can be used, but Validation Error Rate is the default error function in Monument.

What Is The Validation Error Rate And Why Is It Important?

The Validation Error Rate measures the distance between “out of sample” values and estimates produced by the algorithm. You can find this metric in the INFO box in the lower-left corner of the MODEL workspace.

As a general rule of thumb, the “more negative” your Validation Error Rate is, the more accurate the model is. Negative infinity would be a perfect model. In the real world, as we will see with our ocean temperature data, sometimes the best you can do is a small, but nevertheless positive number.

Currently, Monument only displays one Validation Error Rate at a time. To view the Validation Error Rate for other algorithms that you have trained, click the drop-down arrow on the right side of the algorithm pill and select SHOW ERROR RATE.

To compare the performance of the models, I have pasted below a table of all the Validation Error Rates applied to the ocean temperatures data, sorted from lowest to highest.

Algorithm Performance On The HABSOS Data

As we discovered in the tutorial, with default parameters, AR and G-DyBM perform the best on the cleaned and transformed data.

How To Improve Algorithm Performance

Typically, we can improve the Validation Error Rate — i.e. make it “more negative” — by adjusting the algorithms’ parameters. You can access an algorithm’s parameters by selecting PARAMETERS in the algorithm pill drop-down.

Choosing which parameters to edit to improve performance depends heavily on your business objectives and the nature of the data you’re looking at. We will cover common cases in future tutorials, but the best approach is to experiment yourself to develop an intuition around which parameters most improve results for different kinds of data.

Certain algorithms allow for automated parameter adjustment. In Monument, the LSTM and LightGBM algorithms also have “AutoML,” which is short for Automated Machine Learning. AutoML automatically adjusts an algorithm’s parameters to optimize performance. You can select AUTOML from the algorithm drop-down to access these capabilities.

For example, when we run AutoML on the HABSOS data, we can lower the Validation Error Rate by 0.04 from 3.273 to 3.233. Not a huge improvement on this particular data, but an improvement nonetheless. Often, the gains are much greater.

There are other reports within Monument that we can use to improve algorithm performance, including, dependent variables, forecast training convergence, and feature importance. We’ll explore these topics in future tutorials.

Interested in learning more about Monument? Book a free introductory Zoom call here.

--

--