[Computer Vision] Coordinate Transformations

PHIL
4 min read · Nov 25, 2022


Use a transformation matrix to map points in one space to points in another: (x’, y’, z’) = f(x, y, z).

Two-dimensional transformations

Given a vector p and a transformation matrix M, we get p’ = Mp, where p’ is the result of transforming p by M.

Commonly used transformation matrices
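As a minimal sketch of the usual 2D operations, assuming NumPy and column vectors: the function names scale, rotate and translate are our own, chosen for illustration. Note that scaling and rotation are 2x2 matrix products, while translation is an addition.

```python
import math
import numpy as np

def scale(sx, sy):
    """2x2 matrix that scales x by sx and y by sy."""
    return np.array([[sx, 0.0],
                     [0.0, sy]])

def rotate(theta):
    """2x2 matrix that rotates counter-clockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def translate(p, tx, ty):
    """Translation adds an offset; it is not a 2x2 matrix product."""
    return p + np.array([tx, ty])

p = np.array([1.0, 0.0])
scaled = scale(2.0, 3.0) @ p         # (2, 0)
rotated = rotate(math.pi / 2) @ p    # approximately (0, 1)
moved = translate(p, 5.0, -1.0)      # (6, -1)
```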

Homogeneous Coordinates

The “Translate” operation uses addition, but we often want every transformation to be a multiplication so that several operations can be chained together. How do we do it? Append a third component equal to 1 to the vector (the augmented form). Once the 2x1 vector becomes 3x1, a 3x3 matrix (the identity matrix with the translation values in its last column) transforms the vector by matrix multiplication alone, with no separate addition. This representation is called homogeneous coordinates.
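A small sketch of translation as a single matrix product, assuming NumPy; translation_matrix is a hypothetical helper name, not something from a library.

```python
import numpy as np

def translation_matrix(tx, ty):
    """3x3 identity with the translation in the last column."""
    T = np.eye(3)
    T[0, 2] = tx
    T[1, 2] = ty
    return T

p = np.array([2.0, 3.0])
p_h = np.append(p, 1.0)                      # augmented vector (2, 3, 1)
moved_h = translation_matrix(5.0, -1.0) @ p_h  # (7, 2, 1)
```

The last component stays 1, so the result can be fed straight into the next transformation.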

In the homogeneous representation, a series of operations is combined by multiplying the transformation matrices in the proper order: with column vectors, the matrix applied first sits rightmost in the product. The commonly used matrices can all be written in homogeneous coordinates as 3x3 matrices.
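To see why the order matters, here is a short sketch, assuming NumPy and the column-vector convention p’ = Mp. The helper names translation and rotation are ours.

```python
import math
import numpy as np

def translation(tx, ty):
    """Homogeneous 3x3 translation matrix."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    """Homogeneous 3x3 counter-clockwise rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

p_h = np.array([1.0, 0.0, 1.0])
T = translation(2.0, 0.0)
R = rotation(math.pi / 2)

# The rightmost matrix is applied to the vector first.
rotate_then_translate = T @ R
translate_then_rotate = R @ T

a = rotate_then_translate @ p_h  # (1,0) -> (0,1) -> (2,1)
b = translate_then_rotate @ p_h  # (1,0) -> (3,0) -> (0,3)
```

The two products give different points, so the composition order has to match the intended sequence of operations.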

We can then define “turn homogeneous (2 -> 3)” and “restore homogeneous (3 -> 2)” operations to convert vectors between the two forms as needed.
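One possible sketch of these two conversions, assuming NumPy and points stored as rows of an (N, 2) array. The names to_homogeneous and from_homogeneous are illustrative. Dividing by the last component on the way back is an assumption that also covers vectors whose last component is not exactly 1.

```python
import numpy as np

def to_homogeneous(points):
    """Append a 1 to every row: (N, 2) -> (N, 3)."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    return np.hstack([points, ones])

def from_homogeneous(points_h):
    """Divide by the last component and drop it: (N, 3) -> (N, 2)."""
    points_h = np.asarray(points_h, dtype=float)
    return points_h[:, :-1] / points_h[:, -1:]

pts = np.array([[1.0, 2.0],
                [3.0, 4.0]])
pts_h = to_homogeneous(pts)        # rows (1, 2, 1) and (3, 4, 1)
restored = from_homogeneous(pts_h)  # back to the original points
```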

Affine Transformation

The general combination of translation, scaling and rotation is called an affine transformation. Mathematically, given a vector x, a rotation matrix R, and a translation vector T, an affine transformation maps x to x’ = Rx + T. (When scaling is involved, R stands for the whole 2x2 linear part rather than a pure rotation.)

For convenience we often write an affine transformation in homogeneous coordinates: we merge R and T into a single 3x3 matrix, written row by row as [R T; 0 1], and set the last component of the vector to 1.

Some general properties of affine transformations

In short, a 3x3 matrix represents an affine transformation if its last row is (0, 0, 1).
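The merge of R and T and the last-row check can be sketched as follows, assuming NumPy. make_affine and is_affine are hypothetical helper names.

```python
import numpy as np

def make_affine(R, T):
    """Pack a 2x2 matrix R and a 2-vector T into a 3x3 matrix [R T; 0 1]."""
    M = np.eye(3)
    M[:2, :2] = R
    M[:2, 2] = T
    return M

def is_affine(M):
    """True if M is 3x3 and its last row is (0, 0, 1)."""
    M = np.asarray(M, dtype=float)
    return M.shape == (3, 3) and np.allclose(M[2], [0.0, 0.0, 1.0])

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # 90 degree counter-clockwise rotation
T = np.array([2.0, 3.0])
M = make_affine(R, T)

x = np.array([1.0, 0.0])
x_prime = (M @ np.append(x, 1.0))[:2]  # equals R @ x + T = (2, 4)
```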

Inverting an affine transformation matrix

Sometimes we’d like to invert an affine transformation to map points back. There are two approaches.

1 Brute force - invert it like any other matrix.

2 Use the affine structure - handle the rotation and the translation separately.

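First, since the affine transformation maps x -> x’ as x’ = Rx + T, we derive the relationship x’ -> x: x = R⁻¹(x’ - T) = R⁻¹x’ - R⁻¹T.

Here we can view R⁻¹ as the new R and -(R⁻¹T) as the new T. The key is that, when R is a pure rotation, R⁻¹ is simply the transpose of R. If R also contains scaling, R⁻¹ has to be computed as a regular 2x2 inverse.

Then, as we did for homogeneous coordinates, we merge R⁻¹ with -(R⁻¹T) and set the last component of the vector to 1 to obtain the inverse of the affine transformation.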

The general form of the inverse, with each matrix written row by row, is [R T; 0 1]⁻¹ = [R⁻¹ -R⁻¹T; 0 1].
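Both approaches can be compared in a few lines. This is a sketch assuming NumPy and assuming R is a pure rotation, so that its inverse equals its transpose. invert_rigid is an illustrative name.

```python
import math
import numpy as np

def invert_rigid(M):
    """Invert [R T; 0 1], assuming R is a pure rotation (R^-1 == R^T)."""
    R = M[:2, :2]
    T = M[:2, 2]
    M_inv = np.eye(3)
    M_inv[:2, :2] = R.T        # new R is the transpose of R
    M_inv[:2, 2] = -R.T @ T    # new T is -(R^-1 T)
    return M_inv

theta = math.pi / 6
c, s = math.cos(theta), math.sin(theta)
M = np.array([[c,  -s,  2.0],
              [s,   c, -1.0],
              [0.0, 0.0, 1.0]])

brute = np.linalg.inv(M)      # approach 1: generic matrix inverse
structured = invert_rigid(M)  # approach 2: use the affine structure
```

The structured version avoids a general matrix inversion, but it is only valid under the pure-rotation assumption stated above.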

2D homography

The most general linear mapping between images is called a 2D projective mapping, or a 2D homography.
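As a rough sketch, assuming NumPy: a homography is a 3x3 matrix whose last row need not be (0, 0, 1), so after multiplying we divide by the third component to get back to 2D. apply_homography and the example matrix H are made up for illustration.

```python
import numpy as np

def apply_homography(H, points):
    """Map (N, 2) points through a 3x3 homography H."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    mapped = np.hstack([points, ones]) @ H.T  # each row is H @ (x, y, 1)
    return mapped[:, :2] / mapped[:, 2:3]     # divide by the third component

# Last row is not (0, 0, 1), so this is projective rather than affine.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.0, 1.0]])

pts = np.array([[2.0, 4.0]])
out = apply_homography(H, pts)  # third component is 0.5*2 + 1 = 2, giving (1, 2)
```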
