TL;DR: I played around a bit with Rust+wasm and wrote a little crate to perform Levenberg-Marquardt fits. I then compared the native binary, the wasm version, a JS implementation and a SciPy approach. To my delight, I found that the wasm version is equivalent to SciPy in results & execution speed.
In experimental science, one of the basic recurring tasks is to compare experimental data with a parametrized model, either to search for physics in the data beyond the considered model or to characterize the data by fitting the model parameters. In the latter case, the fitting process is repeated for each new sample batch, so the clever students go ahead and write short evaluation and plotting scripts that can be quickly reused each time. As I have been looking into Rust & WebAssembly (wasm) lately, I was wondering how a curve fitting implementation along these lines compares to other approaches.
Now, I know one could use Django to write a web application that runs NumPy/SciPy on the server to quickly deliver similar results to the browser… but really I was searching for a reason and a project to play around with Rust+wasm. And so I went ahead, and thought I'd write about my experience for anybody who is interested.
The Levenberg-Marquardt (LM) algorithm is effectively the standard when it comes to fitting the parameters of a model to data. It uses a combination of gradient descent and the Gauss-Newton algorithm to find a local minimum close to some given starting values. The basic idea is to minimize the sum of squared distances between data and model, weighted by the measurement errors (the bigger the error, the less important the distance).
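In the usual notation, with data points (xᵢ, yᵢ), measurement errors σᵢ and model f(x; p), the quantity being minimized is

```latex
\chi^2(\mathbf{p}) = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i; \mathbf{p})}{\sigma_i} \right)^2 .
```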
This is performed iteratively by updating p → p + δ, where the change δ is obtained by solving the system of linear equations
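which, in the Marquardt form used in Gavin's write-up, reads

```latex
\left[ J^\top W J + \lambda \, \mathrm{diag}\!\left( J^\top W J \right) \right] \delta
  = J^\top W \left( \mathbf{y} - f(\mathbf{p}) \right) .
```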
Here J is the Jacobian of the model with respect to the parameters, W is the weighting matrix, effectively a diagonal matrix with 1/σᵢ² as elements, and λ is an iteratively varied damping factor. A nice recent write-up on implementing the algorithm, which I followed, can be found here (by H. P. Gavin, Duke University).
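To make the update concrete, a single LM step can be sketched in a few lines of NumPy. This is a minimal illustration with a forward-difference Jacobian, not the rusfun implementation; the function name `lm_step` is mine:

```python
import numpy as np

def lm_step(f, p, x, y, sigma, lam, h=1e-8):
    """One LM update: solve (J^T W J + lam * diag(J^T W J)) delta = J^T W r."""
    r = y - f(x, p)                      # residuals
    W = np.diag(1.0 / sigma**2)          # weights from the measurement errors
    # forward-difference Jacobian of the model w.r.t. the parameters
    J = np.empty((x.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = h * max(1.0, abs(p[j]))
        J[:, j] = (f(x, p + dp) - f(x, p)) / dp[j]
    JtWJ = J.T @ W @ J
    delta = np.linalg.solve(JtWJ + lam * np.diag(np.diag(JtWJ)), J.T @ W @ r)
    return p + delta
```

A full implementation additionally checks whether the step actually lowered χ², shrinking λ after a successful step and growing it after a rejected one.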
The default approach: SciPy
To compare the different implementations, I start with the one I know as the default approach: just throw Python at it. But first we need some data to fit; today, simulated data will have to do. I chose a Gaussian curve with a constant offset as the model for this discussion.
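That is, f(x) = A·exp(−(x−μ)²/(2σ²)) + c. Simulated data along these lines can be generated with NumPy; the true parameter values below are assumptions for illustration, not necessarily the exact ones used for the dataset in this post:

```python
import numpy as np

def gaussian(x, A, mu, sigma, c):
    """Gaussian peak with a constant offset."""
    return A * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) + c

# illustrative true parameters (assumed): A = 42, mu = 4.2, sigma = 0.6, c = 10
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 300)
y_true = gaussian(x, 42.0, 4.2, 0.6, 10.0)
y_err = np.sqrt(y_true)                 # Poisson-like counting errors
y = y_true + rng.normal(0.0, y_err)     # noisy measurements

# three-column file: x, y, y_err
np.savetxt("simulated_gaussian.txt", np.column_stack([x, y, y_err]))
```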
Fitting with SciPy is quite straightforward. One just needs to define the distance function that is squared and summed in the χ² function, and then throw it together with the initial parameters into scipy.optimize.leastsq. This function is a wrapper around routines from the battle-tested Fortran package MINPACK and can be considered a reliable reference for a correct implementation of the LM algorithm. I uploaded a short script that calls scipy.optimize.leastsq here. The start parameters are chosen as
A = 20, μ = 3, σ = 0.2, c = 0 in this and each subsequent fit.
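The call itself takes only a few lines. Here is a self-contained sketch (the inline data generation stands in for reading the three-column file; leastsq squares and sums the residual vector internally):

```python
import numpy as np
from scipy.optimize import leastsq

def gaussian(x, p):
    A, mu, sigma, c = p
    return A * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) + c

def residuals(p, x, y, y_err):
    # error-weighted distances; leastsq squares and sums these internally
    return (y - gaussian(x, p)) / y_err

# stand-in data; illustrative true parameters (assumed): A=42, mu=4.2, sigma=0.6, c=10
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 300)
truth = np.array([42.0, 4.2, 0.6, 10.0])
y_err = np.sqrt(gaussian(x, truth))
y = gaussian(x, truth) + rng.normal(0.0, y_err)

p0 = np.array([20.0, 3.0, 0.2, 0.0])   # the start parameters from above
p_fit, cov, info, msg, ier = leastsq(residuals, p0, args=(x, y, y_err),
                                     full_output=True)
dof = x.size - p0.size
chi2_red = (residuals(p_fit, x, y, y_err) ** 2).sum() / dof
p_err = np.sqrt(np.diag(cov) * chi2_red)  # uncertainties, scaled by chi^2/d.o.f
```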
The fit yields
A = 41.3 +/- 1.6
μ = 4.200 +/- 0.028
σ = 0.630 +/- 0.030
c = 10.77 +/- 0.52
χ²/d.o.f = 0.922
Within the error bounds, this reproduces the real parameters that were used for data generation. The χ² value divided by the number of degrees of freedom (number of data points minus number of varied parameters) is close enough to one; the deviations come from the randomization and rounding of the data. On my machine (a desktop with an AMD Ryzen 5 1600X), the execution of scipy.optimize.leastsq on said dataset takes 1.2 ms on average.
Curve Fitting in Rust
Rust is a young programming language similar in its intent to C, but with a clever design that helps to avoid most of the pitfalls that can occur at runtime or when writing parallel code. Anybody not familiar with it is encouraged to take a look at The Rust Programming Language book, which is a great resource to learn about the whys and hows.
Using the ndarray crate as a basis, I wrote a small crate that I named rusfun, with the sole purpose of defining one-dimensional functions, running the Levenberg-Marquardt algorithm over them, and packaging the functions for wasm with wasm_bindgen. The crate is at an early stage, so don't expect a lot of documentation & further functionality yet.
Given the data as a three-column file and using the same starting parameters, running the fit in Rust is also just a few lines, which I uploaded here. Timing the initialization & execution of the LM algorithm, a total of 0.4 ms execution time is measured on my machine. Without having really thought about how to optimize the code, the Rust implementation is already faster, which makes sense, as all the overhead of the Python↔Fortran interface in SciPy is saved. In a later step it will be interesting to see how the two compare for more computationally intensive models, but one step after the other. The fit results obtained from rusfun are analogous to the ones from SciPy with
A = 41.3 +/- 1.6
μ = 4.200 +/- 0.027
σ = 0.630 +/- 0.030
c = 10.77 +/- 0.51
χ²/d.o.f = 0.922
and only differ at values below the parameter uncertainty. The deviation could come from some minor implementation difference between MINPACK and the resource I followed.
Packing it into the Browser
I prepared an app I called FuncFit and, using Netlify, made it available within a few minutes here: https://funcfit.netlify.com/ . To load data, one basically needs a two- or three-column file, similar in format to the one I generated before with NumPy. Then select one of the pre-defined models (the list could easily be extended if desired) and press Fit.
The result should then look a bit like this. So we see the same parameters are reproduced (yes, I didn't bother rounding them sensibly… sorry), and the execution time of 1.1 ms is on a similar level as scipy.optimize.leastsq. So we lost the edge we had in the pure Rust version, but it seems the Rust+wasm combination is on the same level in computational speed as the Python+Fortran combination.
This seems like a good observation to conclude on for the day. I hope someone had fun reading this. Rust doesn't seem to have a large user base in the scientific community yet, but maybe this will change some day. It's definitely a nice feeling to have nearly no run-time errors once one is able to satisfy the compiler.