Universal Functions [ufuncs]: A weapon of “NumPy”

Saijal Shakya
Published in The Startup
5 min read, May 23, 2020

NumPy, or Numerical Python, provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.

NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.

Is NumPy important in the Python data science world?

NumPy provides an easy and flexible interface for optimized computation with arrays of data. Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use vectorized operations, which are generally implemented through NumPy’s universal functions (ufuncs).

So, what are ufuncs?

For many types of operations, NumPy provides a convenient interface into statically typed, compiled routines. This is known as a vectorized operation. We can accomplish this by simply performing an operation on the array, which will then be applied to each element. This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.

Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible: they can operate between a scalar and an array, and they can also operate between two arrays.
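Here is a minimal sketch of both cases. The array names and values are my own assumptions for illustration, not the article's original snippet:

```python
import numpy as np

# Two small example arrays (illustrative values)
a = np.arange(5)       # [0, 1, 2, 3, 4]
b = np.arange(1, 6)    # [1, 2, 3, 4, 5]

# Scalar and array: the scalar is applied to every element
scalar_result = a * 2

# Array and array: the operation is applied element by element
pairwise = a / b

print(scalar_result)
print(pairwise)
```

The arithmetic operators here are thin wrappers around ufuncs: `a * 2` calls `np.multiply` and `a / b` calls `np.divide`.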

And ufunc operations are not limited to one-dimensional arrays — they can act on multidimensional arrays as well:
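A small sketch of a ufunc acting on a two-dimensional array. The 3×3 shape is an assumption chosen to keep the output readable:

```python
import numpy as np

# A 3x3 array holding the integers 0 through 8
x = np.arange(9).reshape((3, 3))

# The power operation is applied to every element, whatever the shape
powers = 2 ** x

print(powers)
```

The result has the same shape as the input, with each entry replaced by 2 raised to that entry.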

Computations using vectorization through ufuncs are nearly always more efficient than their counterparts implemented through Python loops, especially as the arrays grow in size. Any time you see such a loop in a Python script, you should consider whether it can be replaced with a vectorized expression.

Let me demonstrate with an example. Imagine we have an array of values and we’d like to compute the reciprocal of each. A straightforward approach is to loop over the array in Python and compute each reciprocal one at a time.
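The loop-based approach might be sketched as follows. The function name `compute_reciprocals` and the small random input are assumptions for illustration:

```python
import numpy as np

def compute_reciprocals(values):
    """Compute 1/x for each element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Small array of random integers from 1 to 9 (zero is excluded
# so that no division by zero occurs)
rng = np.random.default_rng(0)
small = rng.integers(1, 10, size=5)

loop_result = compute_reciprocals(small)
print(loop_result)
```

Every pass through the loop pays for Python's type checks and function dispatch, which is where the time goes on large arrays.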

This approach took 44.6 ms per loop.

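Now compare this with the vectorized version, which divides 1.0 by the whole array in a single operation. Looking at the execution time for our big array, we see that the vectorized operation completes orders of magnitude faster than the Python loop.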
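One way to reproduce the comparison yourself is with the standard library's `timeit` module. This is a sketch under assumptions: the array size and the helper name `compute_reciprocals` are mine, and the timings will differ from machine to machine:

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    """Loop-based reciprocal, used here as the slow baseline."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# The array size is an assumption; increase it to widen the gap
rng = np.random.default_rng(0)
big_array = rng.integers(1, 100, size=100_000)

# Time three runs of each approach
loop_time = timeit.timeit(lambda: compute_reciprocals(big_array), number=3)
vec_time = timeit.timeit(lambda: 1.0 / big_array, number=3)

print(f"Python loop: {loop_time:.4f} s for 3 runs")
print(f"Vectorized:  {vec_time:.4f} s for 3 runs")
```

Both approaches return the same values; only the place where the loop runs changes.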

Now it took just 68.9 µs per loop.

Specialized features of ufuncs

Specifying Output

For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored. Rather than creating a temporary array, we can use this to write computation results directly to the memory location where we would like them to be. For all ufuncs, we can do this using the out argument of the function:
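A minimal sketch of the `out` argument, with array names and values assumed for illustration:

```python
import numpy as np

x = np.arange(5)     # [0, 1, 2, 3, 4]
y = np.empty(5)      # preallocated array that will receive the result

# Write x * 10 directly into y instead of allocating a new array
np.multiply(x, 10, out=y)

print(y)
```

Note that the named ufunc (`np.multiply`) is needed here, since the `*` operator has no way to accept an `out` argument.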

This can even be used with array views. For example, we can write the results of a computation to every other element of a specified array:
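As a sketch, assuming a five-element input `x` and a ten-element target `y`:

```python
import numpy as np

x = np.arange(5)      # [0, 1, 2, 3, 4]
y = np.zeros(10)      # target array, twice as long as x

# y[::2] is a view onto every other element of y, so the
# results of 2 ** x are written straight into those slots
np.power(2, x, out=y[::2])

print(y)
```

The odd-indexed entries of `y` are left untouched at zero.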

If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of a temporary array to hold the results of 2 ** x, followed by a second operation copying those values into the y array. This doesn’t make much of a difference for such a small computation, but for very large arrays the memory savings from careful use of the out argument can be significant.

Aggregates

For binary ufuncs, there are some interesting aggregates that can be computed directly from the object. For example, if we’d like to reduce an array with a particular operation, we can use the reduce method of any ufunc. A reduce repeatedly applies a given operation to the elements of an array until only a single result remains. For example, calling reduce on the add ufunc returns the sum of all elements in the array:
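A short sketch, using the integers 1 through 5 as assumed example data:

```python
import numpy as np

x = np.arange(1, 6)   # [1, 2, 3, 4, 5]

# Repeatedly apply addition until a single value remains
total = np.add.reduce(x)

print(total)
```

For a one-dimensional array this gives the same result as `np.sum(x)`.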

Similarly, calling reduce on the multiply ufunc results in the product of all array elements:
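The same pattern with multiplication, again on assumed example data:

```python
import numpy as np

x = np.arange(1, 6)   # [1, 2, 3, 4, 5]

# Repeatedly apply multiplication until a single value remains
product = np.multiply.reduce(x)

print(product)
```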

If we would like to store all the intermediate results of the computation, we can instead use accumulate:
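A sketch of `accumulate` for both addition and multiplication, with the same assumed input:

```python
import numpy as np

x = np.arange(1, 6)   # [1, 2, 3, 4, 5]

# Running totals: each entry is the sum of all elements so far
running_sum = np.add.accumulate(x)

# Running products: each entry is the product of all elements so far
running_product = np.multiply.accumulate(x)

print(running_sum)
print(running_product)
```

The last entry of each accumulated array equals the corresponding `reduce` result.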

Outer Products

Finally, any binary ufunc can compute the output of all pairs of two different inputs using the outer method. This allows us, in one line, to do things like create a multiplication table:
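A sketch of a 5×5 multiplication table. The input range is an assumption for illustration:

```python
import numpy as np

x = np.arange(1, 6)   # [1, 2, 3, 4, 5]

# Entry [i, j] of the table is x[i] * x[j]
table = np.multiply.outer(x, x)

print(table)
```

Swapping `np.multiply` for another binary ufunc such as `np.add` gives the corresponding table of pairwise sums.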

Another extremely useful feature of ufuncs is the ability to operate between arrays of different sizes and shapes, a set of operations known as broadcasting. We will catch up with “Broadcasting” next week.
