Half Derivatives: An Operator Theory Perspective

T.J. Gaffney
Mar 29, 2020 · 14 min read


This article goes along with a general interest talk I prepared on “half derivatives,” defined below. Though there are lots of ways to talk about this topic, I’ve chosen to structure this to touch on a few math topics, making potentially surprising connections. The talk assumes some familiarity with intro calculus and linear algebra.

Half Derivative Definition

When we’re taught calculus, we’re taught that the second derivative is the derivative of the derivative, that the third derivative is the derivative of that, and so on.

A natural question is: Is there a “half derivative,” an operator that, applied twice, amounts to applying the derivative just once? Concretely, is there an operator H for which H(Hf) = Df? The answer is yes, at least for smooth functions f.

Furthermore, you can define a 1/7 derivative (which, applied seven times, gives a first derivative) or a 2/3 derivative (which, applied three times, gives a second derivative). In fact, for any rational number m/n, you can define the m/n derivative with the property that:
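In symbols, writing D for the derivative operator (as we’ll do throughout), that property should read:

\[ \left(D^{m/n}\right)^{n} f = D^{m} f. \]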

I’ll call this property the “implicit” property. Operators that satisfy this implicit property are called “fractional derivatives.” Throughout the article, we will at times focus specifically on the half derivative, but at other times point out that the calculation can be done with more general fractions.

Actually, this can be defined for irrational numbers as well, via limits of rational numbers. Fractional derivatives are weird, but they do have the nice property of interpolating the whole-number derivatives that we already know and love. In this picture, the green line goes continuously from the 0th derivative of 1/2*x² to its 2nd derivative.

Image source: Wikimedia Commons.

The article will walk through the calculation of the fractional derivative of some common functions:

  • The exponential
  • Sine and cosine
  • Polynomials

Calculating the Half Derivative: Special Functions

In general half derivatives can be difficult to calculate, but let’s look at one case where it’s not difficult: The exponential function is an “eigenfunction” of the derivative, meaning that its derivative is the same function up to a constant multiplier.
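In symbols, for a constant c (c here is just a generic constant):

\[ D\, e^{cx} = c\, e^{cx}, \qquad \text{and more generally} \qquad D^{n} e^{cx} = c^{n} e^{cx}. \]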

You may then guess that we can replace the whole number n with any real number α.
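That is, the guess is to define:

\[ D^{\alpha} e^{cx} = c^{\alpha} e^{cx}. \]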

If you check, you’ll see that an operator defined in this way satisfies the implicit property for the fractional derivative. For example:
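With H = D^{1/2} applied to the exponential, the check should go:

\[ H\big(H e^{cx}\big) = H\big(c^{1/2} e^{cx}\big) = c^{1/2}\, c^{1/2}\, e^{cx} = c\, e^{cx} = D\, e^{cx}. \]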

But what about a function like sine? An expert trigonometer may remember a helpful trig identity here:
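Namely, the exponential form of sine:

\[ \sin(x) = \frac{e^{ix} - e^{-ix}}{2i}. \]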

Given what we know about the fractional derivative of the exponential, we can conclude that:
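Applying the exponential rule term by term, this should give:

\[ D^{\alpha} \sin(x) = \frac{i^{\alpha} e^{ix} - (-i)^{\alpha} e^{-ix}}{2i}. \]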

But check this out: Using the identity that i = e^(iπ/2) and -i = e^(-iπ/2), we can write this as:
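Carrying out that substitution, the result should simplify to:

\[ D^{\alpha} \sin(x) = \frac{e^{i(x + \alpha\pi/2)} - e^{-i(x + \alpha\pi/2)}}{2i} = \sin\!\left(x + \frac{\alpha\pi}{2}\right). \]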

This is a really cool representation, because we see that at α=1, this is cos(x), and as α increases from 0 to 1, the graph slowly changes phase.

We got this fractional derivative by writing sine as a sum of exponentials. Cosine can be handled similarly. In fact this representation shows that sine and cosine are just pit stops on a pair of continuous rotations through the complex unit circle. For other functions, we aren’t as lucky. Most functions can’t be written as a sum of exponentials, but this example motivates our approach.

The last function that we promised to calculate a half-derivative of is the power function x^m. But doing this will take more work. In particular, we will need to take the Fourier transform. If you’re happy to accept this approach as a lucky guess, you can jump to the last section. Otherwise we will take a long detour in an attempt to explain why the Fourier transform may be a natural thing to try.

Matrix Operators

Before we go forward, let’s first talk about matrices. The reason is that we’ve been talking about derivatives as linear operators, which makes them generalizations of matrices. (We’ve seen this analogy already when we mentioned eigenfunctions.) Basic operator theory is motivated by things known about matrices. For example, the half derivative is the square root of the derivative in the matrix sense: If the square root of a matrix M is A, then A(Av) = Mv for any vector v. Similarly, if we call the square root of the derivative “H”, then H(Hf) = Df; so H is the half derivative.

Before we go on, I want to introduce bra-ket notation for matrices. We write a vertical bar followed by an angle bracket for a vector, and an angle bracket followed by a vertical bar for its conjugate transpose.
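For a vector v with entries v_1 through v_n (the entry labels are my choice here), that looks like:

\[ |v\rangle = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}, \qquad \langle v| = \begin{pmatrix} v_1^{*} & \cdots & v_n^{*} \end{pmatrix}. \]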

This makes writing inner and outer products easy. For example:
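For two vectors u and v, the inner and outer products come out as:

\[ \langle u | v \rangle = \sum_i u_i^{*} v_i, \qquad |u\rangle\langle v| = \big( u_i\, v_j^{*} \big)_{ij}. \]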

So how exactly do we take the square root of a matrix? One way is with the Taylor series.

Unfortunately, the Taylor series for the square root around zero doesn’t exist, because √x isn’t differentiable at zero. We use a common trick, where we instead think about the Taylor series of √(1+x). We will worry about the “1+” later, but for now, we recall (or look up) the Taylor series.
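The standard binomial series gives:

\[ \sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \frac{x^3}{16} - \frac{5x^4}{128} + \cdots \]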

This Taylor series is defined for real numbers, x. But it ought to also work for matrices; that is, for a matrix, M, it ought to hold that:
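That is, presumably:

\[ \sqrt{I+M} = I + \frac{M}{2} - \frac{M^2}{8} + \frac{M^3}{16} - \frac{5M^4}{128} + \cdots \]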

Now √(I+M) should be a matrix which, if multiplied by itself, returns I+M. It’s not obvious that the Taylor series, which worked for number variables, will work the same way for matrices, but it’s a good guess.

To verify that it’s a correct guess, we can take the Taylor series and multiply it by itself. The resulting Taylor series should equal I+M, which would tell us that we indeed have the square root. To multiply two infinite series, we have to do a lot of distributing. By writing in a grid and multiplying all terms by all terms, we see that all terms containing a certain power of M lie in the same diagonal. For example, in the grid below all the powers of M³ fall in the purple diagonal. (We’ve only filled out the grid up to fourth powers.)

We see that the first two diagonals add to I (the yellow diagonal) and M (the green diagonal). The remaining diagonals cancel out and are all zero! This is a fun calculation to see, but actually not surprising because this is what would happen if we did this multiplication on real number Taylor series.
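As a quick spot check (my arithmetic, reading the coefficients off the series above), the M² and M³ diagonals collect:

\[ M^2:\;\; 2\cdot 1\cdot\left(-\tfrac{1}{8}\right) + \left(\tfrac{1}{2}\right)^{2} = 0, \qquad M^3:\;\; 2\cdot 1\cdot\tfrac{1}{16} + 2\cdot\tfrac{1}{2}\cdot\left(-\tfrac{1}{8}\right) = 0. \]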

A similar Taylor series could be done for matrices raised to other powers.

Singular Value Decomposition

If you knew already that Taylor series apply to matrices, then this is a natural first approach to taking a square root. But Taylor series are difficult to calculate; in practice you could only ever get an approximation, because you can’t calculate all the terms. There is, however, a more clever way to take the square root, which we will explore in this section.

To get there, we take the singular value decomposition (SVD). Though the SVD has a more general form than we cover, we will talk about a simplified version that only applies to Hermitian matrices; these are matrices that are equal to their own conjugate transpose.

The SVD says: If a Hermitian matrix, M, has unit eigenvectors φ_i with corresponding eigenvalues λ_i, then it can be written as:
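In bra-ket notation, the decomposition should read:

\[ M = \sum_i \lambda_i\, |\varphi_i\rangle\langle\varphi_i|. \]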

Another similar formula (not part of the SVD) that will be useful later is:
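This is presumably the resolution of the identity in the same basis:

\[ I = \sum_i |\varphi_i\rangle\langle\varphi_i|. \]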

This is getting very abstract at this point. So let’s focus on a specific matrix.
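Working backwards from the eigenvalues and eigenvectors listed below, the matrix should be:

\[ M = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}. \]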

This matrix has eigenvectors φ_1 = (1/√2, -1/√2) and φ_2 = (1/√2, 1/√2), with eigenvalues λ_1 = 1 and λ_2 = 9. You can check this. And you can verify that therefore,
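In the bra-ket notation above, this decomposition should be:

\[ M = 1\,|\varphi_1\rangle\langle\varphi_1| + 9\,|\varphi_2\rangle\langle\varphi_2|, \]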

where the outer products are
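Computed from the eigenvectors, these should be:

\[ |\varphi_1\rangle\langle\varphi_1| = \begin{pmatrix} \tfrac{1}{2} & -\tfrac{1}{2} \\ -\tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}, \qquad |\varphi_2\rangle\langle\varphi_2| = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{pmatrix}. \]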

But watch what happens when we square the equation.
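Squaring the decomposition and distributing, we should get:

\[ M^2 = \sum_{i,j} \lambda_i \lambda_j\, |\varphi_i\rangle\langle\varphi_i|\varphi_j\rangle\langle\varphi_j|. \]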

If you look, the middle of each term ends up being an inner product! But φ_1 and φ_2 are orthonormal: The inner product of φ_1 and φ_2 is 0, and the inner product of each φ_i with itself is 1. So the equation simplifies dramatically to:
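Keeping only the i = j terms:

\[ M^2 = 1^2\,|\varphi_1\rangle\langle\varphi_1| + 9^2\,|\varphi_2\rangle\langle\varphi_2|. \]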

The square just got applied to the eigenvalues! You can again verify this is true.

But you can see if you multiply by the matrix again, the same thing will happen again, so you generally have:
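In general, then:

\[ M^n = \sum_i \lambda_i^{\,n}\, |\varphi_i\rangle\langle\varphi_i|. \]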

For this remarkable property, all we needed was for the vectors making up the outer products to be orthonormal. And the unit eigenvectors of a Hermitian matrix can always be chosen to be orthonormal. So we have a rule for any Hermitian matrix in terms of its eigenvalues and eigenvectors.

You may see where this is going: If I can do this for any whole number n, then I ought to be able to do it for any real number α. And that’s absolutely right. But let’s check why that’s true. Watch these equations below: We just rewrite √(I+M) with the Taylor series, then we can replace each M^n with the same equation on λ^n. After we factor out the outer products, this becomes the same Taylor series on λ, so we can replace back with √(I+λ).
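Written out, the chain should look like:

\[ \sqrt{I+M} = I + \frac{M}{2} - \frac{M^2}{8} + \cdots = \sum_i \left( 1 + \frac{\lambda_i}{2} - \frac{\lambda_i^2}{8} + \cdots \right) |\varphi_i\rangle\langle\varphi_i| = \sum_i \sqrt{1+\lambda_i}\; |\varphi_i\rangle\langle\varphi_i|. \]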

I’ll now tell you that you can multiply I in that equation by some small number, ε, and get the same result.
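That is, presumably:

\[ \sqrt{\varepsilon I + M} = \sum_i \sqrt{\varepsilon + \lambda_i}\; |\varphi_i\rangle\langle\varphi_i|. \]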

If you’re careful with the analysis, you can let that small number approach zero, and you get the desired result that:
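Namely:

\[ \sqrt{M} = \sum_i \sqrt{\lambda_i}\; |\varphi_i\rangle\langle\varphi_i|. \]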

Checking on our matrix from above, we get:
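Using λ_1 = 1 and λ_2 = 9, this should give:

\[ \sqrt{M} = 1\,|\varphi_1\rangle\langle\varphi_1| + 3\,|\varphi_2\rangle\langle\varphi_2| = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}^{2} = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}. \]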

And calculating that was much easier than calculating the Taylor series! Again, you can do the same trick with any power of a matrix, and again get the power to apply to the eigenvalue coefficients.

In fact, any function with a power series can be written this way.
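That is, for such a function f:

\[ f(M) = \sum_i f(\lambda_i)\, |\varphi_i\rangle\langle\varphi_i|. \]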

Fourier Transform

Returning now to the derivative. Derivatives are linear operators on the vector space of smooth functions. But unlike 2-D vectors like we used in our example, function spaces are infinite dimensional, meaning that they don’t have any finite basis.

The SVD on matrices is a summation over a finite number of eigenvectors. In this section we make an analogy with an infinite number of eigenfunctions.

For matrices, we worked with eigenvectors as a basis. For operators, let’s choose a basis composed of eigenfunctions. Let’s try taking this family of eigenfunctions: {exp(2πiξx) : ξ a real number}.
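Each of these is indeed an eigenfunction of D, since for each fixed ξ:

\[ D\, e^{2\pi i \xi x} = 2\pi i \xi\, e^{2\pi i \xi x}. \]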

The first question you should ask is: Can you write functions as a combination of this family of functions? For matrices, we wrote vectors as summations of basis vectors. But now, we need to write an integral, because there’s a continuous spectrum of basis functions. We write the coefficients as a function, f-hat, to emphasize that there’s a coefficient for every real number. In the right-most expression, we write exp(2πiξx) as a vector to remind us that it lives in a vector space.
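In symbols, the expansion should be:

\[ f(x) = \int_{-\infty}^{\infty} \hat{f}(\xi)\, e^{2\pi i \xi x}\, d\xi = \int_{-\infty}^{\infty} \hat{f}(\xi)\, \big| e^{2\pi i \xi x} \big\rangle\, d\xi. \]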

This may look familiar; f-hat is the Fourier transform! It turns out f can be written this way if it is integrable; that is, if the integral of its absolute value over the whole real line is finite.

The second question you should ask is: Are these eigenfunctions orthonormal? Recall that the standard inner product on functions is defined as:
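For two functions g and f (g is just a second placeholder function here), this should be:

\[ \langle g | f \rangle = \int_{-\infty}^{\infty} g(x)^{*}\, f(x)\, dx, \]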

where the * means the complex conjugate.

The answer here is: Yes, but… Rather than get into the gory details of that “but,” let’s instead state the Fourier inversion theorem, which says that:
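In the present notation, the theorem should read:

\[ f(x) = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} f(y)\, e^{-2\pi i \xi y}\, dy \right) e^{2\pi i \xi x}\, d\xi. \]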

Another way to say this is:
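That is:

\[ \hat{f}(\xi) = \big\langle e^{2\pi i \xi x} \big| f \big\rangle = \int_{-\infty}^{\infty} e^{-2\pi i \xi x}\, f(x)\, dx. \]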

The sign change in the exponent is because the left-hand side of the bra-ket notation takes the complex conjugate. What this is saying, though, is that applying ⟨exp(2πiξx)| to the left side of our equation above “cancels” all the terms except for the ξ-th term. This is exactly what we desire from orthonormality, and it is enough for our purposes.

Above we called f-hat the Fourier transform. Formally the Fourier transform, marked with a calligraphic F, is the operator that takes f to f-hat.
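In symbols:

\[ (\mathcal{F} f)(\xi) = \hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i \xi x}\, dx. \]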

Let’s see what happens when we apply the Fourier transform to the derivative.

We’re going to do integration by parts with the function f and exp(-2πiξx).
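Carrying out the integration by parts, we should have:

\[ (\mathcal{F} Df)(\xi) = \int_{-\infty}^{\infty} f'(x)\, e^{-2\pi i \xi x}\, dx = \Big[ f(x)\, e^{-2\pi i \xi x} \Big]_{-\infty}^{\infty} + 2\pi i \xi \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i \xi x}\, dx. \]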

The first term must approach zero as x goes to both +infinity and -infinity. This isn’t obvious, but it turns out that if this isn’t true, then f isn’t going to zero fast enough to be integrable. So this becomes simply:
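Namely:

\[ (\mathcal{F} Df)(\xi) = 2\pi i \xi\, \hat{f}(\xi). \]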

Before we extend this, let’s notice what it’s saying. Because Df is just a function, we can write it as:
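That is:

\[ (Df)(x) = \int_{-\infty}^{\infty} \widehat{Df}(\xi)\, e^{2\pi i \xi x}\, d\xi. \]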

But what we just showed is that the coefficients are simply 2πiξ times the coefficients we’d have if we just put in f.
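So:

\[ (Df)(x) = \int_{-\infty}^{\infty} 2\pi i \xi\, \hat{f}(\xi)\, e^{2\pi i \xi x}\, d\xi. \]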

But remember from above that f-hat is just a particular inner product on f.
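Substituting that in:

\[ Df = \int_{-\infty}^{\infty} 2\pi i \xi\, \big| e^{2\pi i \xi x} \big\rangle \big\langle e^{2\pi i \xi x} \big| f \big\rangle\, d\xi. \]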

To write D as an operator, we can just leave the f off.
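That gives:

\[ D = \int_{-\infty}^{\infty} 2\pi i \xi\, \big| e^{2\pi i \xi x} \big\rangle \big\langle e^{2\pi i \xi x} \big|\, d\xi. \]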

Written like this, we now see how this generalizes the SVD. Not surprisingly, the coefficient 2πiξ is the eigenvalue corresponding to the eigenfunction.

Continuing, simple repeated integration by parts tells us that:
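Namely:

\[ D^{n} = \int_{-\infty}^{\infty} (2\pi i \xi)^{n}\, \big| e^{2\pi i \xi x} \big\rangle \big\langle e^{2\pi i \xi x} \big|\, d\xi. \]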

We can do the usual trick to extend to other powers. For square root, for instance, we have:
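Presumably:

\[ H = D^{1/2} = \int_{-\infty}^{\infty} \sqrt{2\pi i \xi}\; \big| e^{2\pi i \xi x} \big\rangle \big\langle e^{2\pi i \xi x} \big|\, d\xi. \]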

Using the appropriate Taylor series for the power, then, you can conclude:
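That is, for any α:

\[ D^{\alpha} = \int_{-\infty}^{\infty} (2\pi i \xi)^{\alpha}\, \big| e^{2\pi i \xi x} \big\rangle \big\langle e^{2\pi i \xi x} \big|\, d\xi, \qquad \text{i.e.} \qquad (\mathcal{F} D^{\alpha} f)(\xi) = (2\pi i \xi)^{\alpha}\, \hat{f}(\xi). \]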

Fractional Derivatives: Polynomials

We’re now finally ready to calculate the fractional derivative of the power function, x^m. We want to do this calculation for a general m, so we will only consider positive x, because x^m is not defined when x is negative and m is irrational.

Let’s start with the Fourier transform of f(x)=x^m, for x > 0.
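That transform is:

\[ \hat{f}(\xi) = \int_{0}^{\infty} x^{m}\, e^{-2\pi i \xi x}\, dx. \]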

The lower bound is zero because we only consider positive values of x, and define f to be zero elsewhere. As an integration rule, whenever we see x appear as both a base and an exponent in an integrand, we should try a gamma function. And actually only a simple substitution is needed to get:
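With the substitution u = 2πiξx, the integral should evaluate to:

\[ \hat{f}(\xi) = \frac{\Gamma(m+1)}{(2\pi i \xi)^{m+1}}. \]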

The Fourier transform of a power function on positive x is a function with ξ only in the denominator with an exponent of at least 1. Because we can multiply the input and the output by the same constant (remember it’s just an integral), any function that transforms to have a ξ only in the denominator with an exponent of at least 1 must be a power function. We’ll take for granted that two smooth functions with the same transform are the same.

Using the rule we made in the last section, we have for any α < m:
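Namely:

\[ (\mathcal{F} D^{\alpha} f)(\xi) = (2\pi i \xi)^{\alpha} \cdot \frac{\Gamma(m+1)}{(2\pi i \xi)^{m+1}} = \frac{\Gamma(m+1)}{(2\pi i \xi)^{m-\alpha+1}}. \]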

Because α < m, this again has ξ only in the denominator with an exponent of at least 1. So it’s the Fourier transform of a different power function. Specifically,
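Matching it against the transform computed above, with m replaced by m - α:

\[ \frac{\Gamma(m+1)}{(2\pi i \xi)^{m-\alpha+1}} = \frac{\Gamma(m+1)}{\Gamma(m-\alpha+1)} \cdot \frac{\Gamma(m-\alpha+1)}{(2\pi i \xi)^{m-\alpha+1}} = \frac{\Gamma(m+1)}{\Gamma(m-\alpha+1)} \cdot \big(\mathcal{F}\, x^{m-\alpha}\big)(\xi). \]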

This tells us that:
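So, for x > 0:

\[ D^{\alpha} x^{m} = \frac{\Gamma(m+1)}{\Gamma(m-\alpha+1)}\; x^{m-\alpha}. \]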

Look at how cool this formula is, though! If you’ve encountered the gamma function before, then you know that it interpolates the factorial. When m and α are both integers then this boils down to:
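Namely:

\[ D^{\alpha} x^{m} = \frac{m!}{(m-\alpha)!}\; x^{m-\alpha}. \]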

This is the formula you already know for repeated derivatives of power functions. If you had guessed that the way to generalize to fractional derivatives is to replace the factorials with the gamma function, your guess would have been exactly correct!

To review: We found with our earlier examples (the exponential and sine) that a fractional derivative is easy to compute when you’re able to write a function as a linear combination of eigenfunctions. A detour through matrices showed why eigenvectors are so useful for this. Though we lucked out with those two functions, we wanted a more general way to write a function as a combination of eigenfunctions. To do this we looked at the Fourier transform. And we finally used this to calculate the fractional derivative of the power function.

These techniques help explore fractional derivatives, which are interesting in their own right, but the techniques work for smooth functions (more general than powers) and for operators (more general than derivatives).

To read more about fractional derivatives (more concrete properties and an application), I recommend this blog. And you can also check out the wiki.

Thanks to Melanie Beck and Sisi Song for feedback.
