Slicing [:] in NumSharp

Meinrad Recheis
Published in SciSharp STACK
May 8, 2019

The .NET community is one step closer to having a powerful Open Source Machine Learning platform thanks to NumSharp’s awesome new array slicing feature.

Python is the language for Machine Learning, in part because of its great libraries like NumPy and TensorFlow. However, C# developers are in dire need of powerful Open Source libraries for ML and Data Science too. NumSharp, a best-effort C# port of NumPy by the SciSharp STACK organization, has recently taken a huge step forward by fully implementing slicing, which allows the creation of arbitrary subsets of N-dimensional arrays as highly efficient views over the original data. This makes it a useful tool for Machine Learning in C# in conjunction with TensorFlow.NET.

What is the big deal?

If you haven’t worked with NumPy you might not know how awesome slicing is. Python lists let you take a slice by indexing a range of elements like this: a[start:stop:step]. But only with NumPy’s sophisticated array implementation does slicing become a really powerful data manipulation technique, one without which Machine Learning and Data Science can hardly be imagined any more.
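As a minimal sketch of the a[start:stop:step] notation, the following compares a plain Python list with a NumPy array. The variable names are only illustrative. The point is that the list slice is an independent copy, while the NumPy slice is a view onto the original data.

```python
import numpy as np

# Plain Python list: slicing copies the selected elements.
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
every_second = items[1:8:2]      # start=1, stop=8 (exclusive), step=2
print(every_second)              # [1, 3, 5, 7]

# NumPy array: the same notation returns a view, not a copy.
a = np.arange(10)
view = a[1:8:2]
view[0] = 100                    # writes through to the original array
print(a[1])                      # 100
```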

Luckily for those who can not or do not want to switch to Python for Machine Learning (and I am guilty of that charge too), NumSharp brings that power into the .NET world. As one of the developers of NumSharp, I present to you a few important use cases for slicing with example code snippets in C#. Note that in C# it is not possible to index in the same way as in Python due to differences in the language syntax. We decided to keep the Python syntax for slice definitions, however, so we use strings to index slices in C#. Check out this example and see how close NumSharp gets to NumPy.

Slicing columns out of a matrix in Python / NumPy
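The following is a minimal sketch of what column slicing looks like in NumPy; the matrix shape and values are chosen only for illustration and are not taken from the original snippet.

```python
import numpy as np

# A 3x4 matrix holding the values 0..11.
matrix = np.arange(12).reshape(3, 4)

# All rows, columns 1 and 2 (the stop index 3 is exclusive).
columns = matrix[:, 1:3]
print(columns.shape)             # (3, 2)
print(columns.tolist())          # [[1, 2], [5, 6], [9, 10]]
```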

When written in C# with NumSharp the code is almost the same. Note the slight difference where the slice is indexed using a string as parameter for the indexer.

Slicing columns out of a matrix in C# / NumSharp

As you can see, the NumSharp team put a lot of effort into keeping the code as similar to Python as possible. This is very important because it means existing Python code that relies on NumPy can now easily be ported to C#.

Use case: Working with multiple Views of the same Data

Being able to pass only local portions of the underlying data (e.g. small patches of a big image) in and out of functions without copying is essential for runtime performance, especially with big data sets. A slice is indexed with local coordinates, so your algorithms don’t need to know about the global structure of your data. This simplifies your life and avoids the cost of unnecessary copying.
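To illustrate the idea in NumPy terms, here is a small sketch in which a function receives only a patch of a larger image. The function brighten and the image dimensions are hypothetical; the function knows nothing about where the patch sits in the full image.

```python
import numpy as np

def brighten(patch, amount):
    """Hypothetical helper that only sees the patch's local coordinates."""
    patch[0, 0] += amount        # (0, 0) is the patch's own top-left corner

image = np.zeros((6, 8), dtype=np.int32)
patch = image[2:4, 3:6]          # 2x3 view into the image, no data copied
brighten(patch, 5)

# The write landed at the corresponding global position of the image.
print(image[2, 3])               # 5
```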

Use case: Sparse Views and Recursive Slicing

By specifying a step in addition to the start and stop of the slice range, sparse views of the array can be created. This is something that not even C# 8.0 with its new array slicing syntax can do (to my knowledge). When working with interleaved data this feature becomes incredibly important. You can keep the complexity of your algorithms as low as possible by designing them to work on contiguous data and feeding them a sparse slice which simulates a contiguous data source.
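A minimal NumPy sketch of this idea, assuming hypothetical interleaved stereo samples (left, right, left, right, ...). The function peak is written as if its input were contiguous and never needs to know about the interleaving.

```python
import numpy as np

def peak(channel):
    """Hypothetical algorithm written for a simple contiguous 1D input."""
    return int(np.abs(channel).max())

# Interleaved samples: L0, R0, L1, R1, L2, R2
samples = np.array([10, -1, 20, -2, 30, -3])

left = samples[0::2]             # sparse view: every second element from 0
right = samples[1::2]            # sparse view: every second element from 1

print(peak(left), peak(right))   # 30 3
```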

Slices can be sliced further, which is a very important feature if you work with high-dimensional data. This also helps reduce algorithmic complexity, as you reduce the dimensions of your data by recursive slicing.
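As a rough NumPy sketch of recursive slicing, the following takes a slice of a slice of a slice. The shapes are arbitrary; each step yields a view, so a write at the innermost level still reaches the original volume.

```python
import numpy as np

volume = np.arange(24).reshape(2, 3, 4)   # 3D volume with values 0..23

plane = volume[1]                # 2D slice of the volume, shape (3, 4)
row = plane[2]                   # 1D slice of the plane: 20, 21, 22, 23
tail = row[1:3]                  # slice of the row: 21, 22

tail[0] = -1                     # still writes into the original volume
print(volume[1, 2, 1])           # -1
```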

Use case: Efficiently handling High-Dimensional Data

If you need to treat a data array as a volume and work with parts of it without having to do mind-boggling coordinate transformation calculations, then .reshape() is your friend.
All arrays created by .reshape() or slicing operations are only views of the original data. When you iterate over, read or write elements of a view you access the original data array. NumSharp transparently does the appropriate index transformations for you, so you can index into the slice using relative coordinates.
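The following NumPy sketch shows the same mechanics under the assumption of a flat array of 27 values treated as a 3x3x3 volume. The names are illustrative. Writing through a slice of the reshaped volume changes the flat array, and the slice is addressed with its own relative coordinates.

```python
import numpy as np

data = np.arange(27)                 # flat data array
volume = data.reshape(3, 3, 3)       # view of the same data as a 3x3x3 volume

center = volume[1:2, 1:2, 1:2]       # 1x1x1 view of the central element
center[0, 0, 0] = -1                 # relative coordinates inside the slice

# Global position (1, 1, 1) maps to flat index 1*9 + 1*3 + 1 = 13.
print(data[13])                      # -1
```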

Use case: Reversing the order of elements at no extra cost

Slicing with a negative step effectively reverses the slice’s order. What’s nice about it is that no copying or enumeration of the data is required to complete this operation, much like IEnumerable.Reverse(). The difference is that the view (which is the result of the operation a["::-1"]) presents the data in reversed order, and you can index into that reversed sequence without ever having to enumerate it at all.
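In NumPy terms, the equivalent of a["::-1"] can be sketched as follows. The array contents are arbitrary; the point is that the reversed result shares memory with the original and can be indexed directly.

```python
import numpy as np

a = np.arange(5)                 # 0, 1, 2, 3, 4
reversed_view = a[::-1]          # view in reversed order, nothing is copied

print(reversed_view[0])          # 4, direct indexing without enumeration

a[4] = 99                        # a change to the original ...
print(reversed_view[0])          # 99, ... is visible through the view
```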

Use case: Reducing complexity by reducing dimensions

When working with high-dimensional data, algorithms on that data can get highly complicated too. When working on the .ToString() method of NumSharp’s NDArray, which can print out arbitrarily high-dimensional volumes, I noticed how simple and beautiful that algorithm gets by systematically and recursively slicing the ND-volume into (N-1)D-volumes and so forth.

This divide et impera approach is made possible by slicing with NumSharp’s index notation, which, unlike the range notation, returns lower-dimensional sub-volumes.
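The recursion can be sketched in a few lines of NumPy. This is not NumSharp’s actual .ToString() implementation; render is a hypothetical formatter that assumes an array with at least one dimension and only illustrates how indexing reduces an N-dimensional problem to (N-1)-dimensional ones.

```python
import numpy as np

def render(array):
    """Hypothetical recursive formatter for arrays with ndim >= 1."""
    if array.ndim == 1:
        # Base case: a vector is printed element by element.
        return "[" + ", ".join(str(x) for x in array) + "]"
    # Recursive case: indexing the first axis yields (N-1)-dimensional views.
    parts = [render(array[i]) for i in range(array.shape[0])]
    return "[" + ", ".join(parts) + "]"

text = render(np.arange(8).reshape(2, 2, 2))
print(text)                      # [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
```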

Range Notation vs. Index Notation

The range notation ["start:stop:step"] allows you to access a sub-range of the given volume with the same dimensionality. So even slicing out only one column of a 2D matrix still gives you a 2D matrix with just one column. Here is a little C# snippet that demonstrates this:

Slicing a column using range notation
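For comparison, a minimal NumPy sketch of the same range slice, where [":,2:3"] in NumSharp corresponds to [:, 2:3] in NumPy. The matrix values are chosen only for illustration.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

# Range notation keeps the dimensionality: the result is a 3x1 matrix.
column_2d = matrix[:, 2:3]
print(column_2d.shape)           # (3, 1)
print(column_2d.tolist())        # [[2], [6], [10]]
```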

The index notation gives you an (N-1)-dimensional slice at the specified position of the N-dimensional parent volume. So carving out a column from a 2D matrix with index notation gives you a 1D vector:

Slicing a column using index notation
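For comparison, a minimal NumPy sketch of the same index slice, where [":,2"] in NumSharp corresponds to [:, 2] in NumPy. The matrix values are chosen only for illustration.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

# Index notation drops one dimension: the result is a 1D vector.
column_1d = matrix[:, 2]
print(column_1d.shape)           # (3,)
print(column_1d.tolist())        # [2, 6, 10]
```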

If you haven’t spotted the difference at a glance, here are the two slicing definitions from above side by side: range [":,2:3"] vs. index [":,2"]. That small change makes a big difference in the result. A full reference of the new slicing notation is available at the NumSharp wiki.

Side Note: ArraySlice<T>

While implementing slicing of N-dimensional views, I came to the conclusion that it might be interesting for a whole lot of other domains in .NET, so I factored it out into its own standalone library called SliceAndDice. It features ArraySlice<T>, which is a lightweight wrapper around any indexed C# data structure (like T[] or IList<T>) and allows you to make use of the same reshaping, slicing and view mechanics without all the other heavy numerical computation stuff. It is a nice and clean implementation of slicing awesomeness in just a few hundred lines of code!

Summary

NumSharp has just recently been empowered with the same slicing and view mechanics that arguably make NumPy one of the most important libraries of Python’s Machine Learning ecosystem. SciSharp STACK, being an Open Source organization consisting of only a handful of skilled developers, tries very hard to bring that same power to the .NET world. This recent improvement of NumSharp is an important stepping stone towards this goal.

Meinrad Recheis is a seasoned software architect, entrepreneur and open source contributor, currently working on open source ML/AI projects.