Why Is an Orthogonal Matrix's Inverse Equal to Its Transpose?

Fedor Selenskiy
Feb 11, 2023


Where did that come from?

My objective is to understand why, for an orthogonal matrix A, its transpose is its inverse.

· Definitions
Orthogonal Vectors
Normal Vectors
Orthonormal Vectors
Orthogonal Matrix
· Expressing Orthonormal Conditions as a Sum of Elements of Matrix A
Rows are Orthogonal
Rows are Normal
Columns are Orthogonal
Columns are Normal
· Calculating A^T A
Calculating Diagonal Elements
Calculating Non-Diagonal Elements
· Calculating AA^T
Calculating Diagonal Elements
Calculating Non-Diagonal Elements
· Combining the Results
· Conclusion

Definitions

Orthogonal Vectors

Vectors x and y are orthogonal if their dot product is zero:

x^T y = 0

Normal Vectors

Vector x is normal if it has unit length, i.e. the sum of its squared elements is 1:

x^T x = 1

Orthonormal Vectors

Normal vectors that are orthogonal are called orthonormal.
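For example, in two dimensions the vectors (1, 0) and (0, 1) are orthonormal: each has length 1, and their dot product is 0.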

Orthogonal Matrix

Matrix A is an n × n matrix whose element in row i and column j is written a_ij:

A =
| a_11  a_12  …  a_1n |
| a_21  a_22  …  a_2n |
|  ⋮     ⋮         ⋮  |
| a_n1  a_n2  …  a_nn |

We can say matrix A is orthogonal if all the rows are mutually orthonormal and all the columns are mutually orthonormal [1].
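A familiar concrete example is the 2D rotation matrix:

R(θ) =
| cos θ  -sin θ |
| sin θ   cos θ |

Each row and each column has length cos^2 θ + sin^2 θ = 1, and the dot product of the two rows (or of the two columns) is cos θ sin θ - sin θ cos θ = 0, so R(θ) is orthogonal.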

Expressing Orthonormal Conditions as a Sum of Elements of Matrix A

Rows are Orthogonal

Let's take the first and third rows as an example and express the condition for them to be orthogonal as a sum. For the first and third rows to be orthogonal, their dot product must be 0:

(a_11, a_12, …, a_1n) · (a_31, a_32, …, a_3n) = 0

Performing the multiplication:

a_11 a_31 + a_12 a_32 + … + a_1n a_3n = 0

Writing it as a summation:

Σ_k a_1k a_3k = 0, where k runs from 1 to n

Of course this has to be true for any pair of rows, not just the first and third. We can generalize the summation by using i instead of 1 and j instead of 3:

Σ_k a_ik a_jk = 0, for i ≠ j

i and j have to be different: a unit vector is never orthogonal to itself, since its dot product with itself is 1, not 0.

Rows are Normal

If we take the third row as an example, it is normal if the sum of all its elements squared is 1:

a_31^2 + a_32^2 + … + a_3n^2 = 1

Writing it as a summation:

Σ_k a_3k^2 = 1

Of course this has to be true for any row, not just the third. We can generalize the summation by using i instead of 3:

Σ_k a_ik^2 = 1

Columns are Orthogonal

Using the same approach as for rows, we can arrive at the condition for any pair of columns to be orthogonal and express it as a sum:

Σ_k a_ki a_kj = 0, for i ≠ j

Columns are Normal

Using the same approach as for rows, we can arrive at the condition for a column to be normal and express it as a sum:

Σ_k a_ki^2 = 1

Calculating A^T A

A^T is obtained by swapping rows and columns, so its element in row i and column j is a_ji:

A^T =
| a_11  a_21  …  a_n1 |
| a_12  a_22  …  a_n2 |
|  ⋮     ⋮         ⋮  |
| a_1n  a_2n  …  a_nn |

In other words, the rows of A^T are the columns of A.

Then A^T multiplied by A is an n × n matrix in which each element is the dot product of a row of A^T with a column of A.

Calculating Diagonal Elements

Let's say we wanted to calculate the element in the third row and the third column of the resulting matrix.

To get that element, we have to multiply the third row of A^T by the third column of A. Both of these are (a_13, a_23, …, a_n3), so the product is:

a_13^2 + a_23^2 + … + a_n3^2 = Σ_k a_k3^2

Of course, this can be generalized. If we wanted to calculate the element in the ith row and the ith column of the resulting matrix, we could replace 3 with i:

Σ_k a_ki^2

We already know what that is equal to! It is equal to 1: this is exactly the same summation that we arrived at when expressing the condition for every column to be normal.

This shows that all the diagonal elements of the product A^T A are 1.

Calculating Non-Diagonal Elements

Let's say we wanted to calculate the element in the second row and the third column of the resulting matrix.

To get that element, we have to multiply the second row of A^T, which is (a_12, a_22, …, a_n2), by the third column of A, which is (a_13, a_23, …, a_n3):

a_12 a_13 + a_22 a_23 + … + a_n2 a_n3 = Σ_k a_k2 a_k3

Of course, this can be generalized. If we wanted to calculate the element in the ith row and the jth column, we could replace 2 with i and 3 with j:

Σ_k a_ki a_kj, for i ≠ j

We've already seen this: it's equal to 0. This is exactly the same summation that we arrived at when expressing the condition for every pair of columns to be orthogonal.

This shows that all the non-diagonal elements of the product A^T A are 0.

This means that A^T A is the identity matrix, with 1s on the diagonal and 0s everywhere else:

A^T A = I

Calculating AA^T

A multiplied by A^T is an n × n matrix in which each element is the dot product of a row of A with a column of A^T (and a column of A^T is just a row of A).

Calculating Diagonal Elements

Using the same approach as before, if we wanted to calculate the element in the ith row and ith column of the resulting matrix, we would have to multiply the ith row of A by the ith column of A^T. Both of these are (a_i1, a_i2, …, a_in), so the product is:

Σ_k a_ik^2

As this is the same summation that we arrived at when expressing the condition for every row to be normal, we know it must therefore equal 1.

This shows that all diagonal elements of AA^T must be 1.

Calculating Non-Diagonal Elements

Using the same approach as before, if we wanted to calculate the element in the ith row and jth column of the resulting matrix, we would have to multiply the ith row of A by the jth column of A^T (which is the jth row of A):

Σ_k a_ik a_jk, for i ≠ j

As this is the same summation that we arrived at when expressing the condition for every pair of rows to be orthogonal, we know it must therefore equal 0.

This shows that all non-diagonal elements of AA^T must be 0.

This means that AA^T is also the identity matrix:

AA^T = I

Combining the Results

Since we showed that A^T A = I and AA^T = I, we can write:

A^T A = AA^T = I

Since left-multiplying the orthogonal matrix A by A^T and right-multiplying it by A^T both give the identity matrix I, A^T satisfies the definition of the inverse of A:

A^-1 = A^T
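If you want to sanity-check the result numerically, here is a minimal sketch using NumPy. It isn't part of the derivation above; the QR factorization is just a convenient way to produce an orthogonal matrix to test with.

```python
import numpy as np

# Build an orthogonal matrix Q: the QR factorization of any full-rank
# square matrix yields a Q whose columns are orthonormal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

identity = np.eye(4)

# Left product: Q^T Q should be the identity matrix.
print(np.allclose(Q.T @ Q, identity))        # True

# Right product: Q Q^T should also be the identity matrix.
print(np.allclose(Q @ Q.T, identity))        # True

# Therefore Q^T behaves as the inverse of Q.
print(np.allclose(np.linalg.inv(Q), Q.T))    # True
```

All three checks print True (up to floating-point tolerance), mirroring the A^T A = AA^T = I result derived above.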

Conclusion

I wanted to dive into the maths behind orthogonal matrices.

My goal was to assume as little as possible, and understand the reasoning. In the “Deep Learning” book (Chapter 2.6 “Special Kinds of Matrices and Vectors”), there was a reference to this property (equation 2.38), but there wasn’t any explanation, so I decided to derive it.

Please feel free to comment or shoot me a message directly if you find any mistakes; I'll be happy to correct them and mention you in the edit.

Sources:

[1] Goodfellow, I., Bengio, Y. and Courville, A., 2016. Deep Learning. Cambridge, MA: The MIT Press. [online] Available at: http://www.deeplearningbook.org
