Rust and its stance in data science

Integration!

Rust is an amazing programming language. Its focus on memory safety, efficiency and performance makes it a great candidate for constructing frameworks and tools for machine learning and data analysis, making the most of a machine's available resources. With that said, let’s stop that thought for a moment and keep in mind that many mature technologies for data science exist today. We have R, with a reasonably wide ecosystem designed for statistics. We have MATLAB (and its free sibling Octave), which, like it or not, is still extensively used in research and widely taught in science degrees, both inside and outside computer science. We even have Julia, which I like to call MATLAB’s cool younger cousin, and it boasts some interesting perks of its own. And of course, Python currently holds a pretty large piece of the DS cake. Not because it was specifically designed for these purposes, but because the language is simple enough to attract the less code-savvy, and because every library you’d ever need is in there. And many people would rather keep defying gravity than choose a stack without the necessary tools for the job.

Maturity!

I have come to realise throughout my years as a PhD student that the wrong shiny tool for the job can waste much more of your time than the right but unshiny one. So it happens that, although the number of crates and crate creators is steadily increasing, it’s not hard to find functionality commonly needed in data science that is still missing. One of them, although not necessarily one that would strike you as a major flaw, is reading and writing files in the HDF5 format. The crates we have today are either incomplete or very difficult to use. hdf5-rs seems the closest to becoming usable, however, and one of my wishes for 2018 is that a new feature-complete release is made for this particular crate.

Bridging!

I will end with a semi-open question: what makes an ideal tool or library for data scientists? In my opinion, we can outline a few points.

  • They are easy to use without crippling performance. One way to achieve this is to make interfaces that users are familiar with from other technologies, while retaining what makes the code idiomatic. Can we make plt.plot(x,y) in Rust just as easy? Sure thing!
  • They are fast, efficient, and can be hardware-accelerated. Many algorithms employed in machine learning are much faster when run on one or more GPUs, and there are no signs of that changing in the near future. While we can claim that Rust code is pretty well optimised, the difference matters less when relying on GPU-accelerated computation APIs such as CUDA and OpenCL. It’s good that we currently have open solutions to this, to some extent, but library developers should not forget to use them. Even when just doing things on the CPU, consider whether SIMD can be used. If you do not wish to deal with low-level intrinsics, how about using a middle-level crate such as faster, or even BLAS and LAPACK bindings?
  • If it’s a library or a framework, the programming language it is built on should be good. For a language that is not even three years past its 1.0 release, Rust is doing pretty well. This point echoes what so many other Rust2018 blog posts have already said about the future of Rust.
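On the first point above, the familiar-interface idea can be sketched in pure std Rust. `Figure`, `plot`, and `show` here are hypothetical names for illustration, not any real crate's API; they merely show that a matplotlib-style builder can feel natural in idiomatic Rust:

```rust
// A hypothetical, matplotlib-inspired plotting facade, std-only.
// `Figure` and its methods are illustrative names, not a real crate's API.
struct Figure {
    series: Vec<(Vec<f64>, Vec<f64>)>,
    title: String,
}

impl Figure {
    fn new() -> Self {
        Figure { series: Vec::new(), title: String::new() }
    }

    // Mirrors matplotlib's plt.plot(x, y): accumulate one line series.
    fn plot(&mut self, x: &[f64], y: &[f64]) -> &mut Self {
        assert_eq!(x.len(), y.len(), "x and y must have equal length");
        self.series.push((x.to_vec(), y.to_vec()));
        self
    }

    fn title(&mut self, t: &str) -> &mut Self {
        self.title = t.to_string();
        self
    }

    // A real backend would rasterise here; this sketch just summarises.
    fn show(&self) {
        println!("figure '{}' with {} series", self.title, self.series.len());
    }
}

fn main() {
    let x: Vec<f64> = (0..100).map(|i| i as f64 * 0.1).collect();
    let y: Vec<f64> = x.iter().map(|v| v.sin()).collect();

    let mut fig = Figure::new();
    fig.plot(&x, &y).title("sine");
    fig.show(); // prints: figure 'sine' with 1 series
}
```

The chained `&mut Self` returns keep the call site close to what a matplotlib user expects, while ownership and slice borrows stay idiomatic Rust underneath.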
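On the SIMD point, even before reaching for intrinsics, the faster crate, or BLAS bindings, writing numeric kernels as simple iterator chains gives the compiler a clean shot at auto-vectorisation. The function names below are illustrative (the classic axpy and dot kernels), not taken from any library:

```rust
// Element-wise a*x + y (the classic "axpy" kernel). Iterator chains avoid
// bounds checks in the hot loop, which helps LLVM emit SIMD code.
fn axpy(a: f64, x: &[f64], y: &[f64]) -> Vec<f64> {
    x.iter().zip(y).map(|(&xi, &yi)| a * xi + yi).collect()
}

// Dot product in the same auto-vectorisation-friendly style.
fn dot(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(&xi, &yi)| xi * yi).sum()
}

fn main() {
    let x = vec![1.0, 2.0, 3.0, 4.0];
    let y = vec![10.0, 20.0, 30.0, 40.0];
    println!("{:?}", axpy(2.0, &x, &y)); // [12.0, 24.0, 36.0, 48.0]
    println!("{}", dot(&x, &y));         // 300
}
```

Whether these loops actually vectorise depends on the target and optimisation level, so measuring (or inspecting the generated assembly) is still warranted before claiming a speedup.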
