Calibrating Somebody Else’s Camera

Camera calibration estimates an accurate model for a camera, which is essential for geospatial computer vision tasks like photogrammetry and visual positioning services (VPS). Calibration is typically as easy as waving a chessboard calibration pattern in front of the camera and passing the chessboard images to OpenCV’s calibration routines.

Fig. 1: Standard chessboard calibration pattern (Source: OpenCV)

But how can you calibrate a camera if you don’t have the camera, but only have images (none of them containing chessboards)?

While chessboards are especially convenient, camera calibration can use any set of 3D points and their corresponding pixel locations in imagery. We found LIDAR data collected in the same region as our imagery, and formed a calibration pattern of 3D points from this data. This process and its many coordinate transforms are detailed below.

Fig. 2: We have images, but not the camera. Unfortunately, not a chessboard is in sight

Camera Models and Calibration Basics

A pinhole camera model offers a simple, useful mapping from 3D points in the world to 2D pixels in the image by projection onto the focal plane. Using homogeneous coordinates, an elegant matrix multiplication relates a 3D world coordinate 𝐗ʷ = [Xʷ, Yʷ, Zʷ, 1]ᵀ to the camera’s pixel coordinate 𝐱ᶜ = [wxᶜ, wyᶜ, w]ᵀ:

𝐱ᶜ = 𝐾ᶜ [𝑅ᶜʷ | −𝑅ᶜʷ𝐎ᶜʷ] 𝐗ʷ    Eq. (1)

where 3×1 vector 𝐎ᶜʷ denotes the camera’s origin (focal point), in world coordinates, and 3×3 rotation matrix 𝑅ᶜʷ describes the camera’s orientation within the world coordinate system. The 3×3 matrix 𝐾ᶜ contains the intrinsic parameters of the camera. We don’t know the intrinsic parameters for our unknown camera, so the purpose of camera calibration is to estimate them.

Fig. 3: Pinhole camera model maps a point Xʷ to point xᶜ on the focal plane (Source: Hartley and Zisserman)
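As a concrete illustration, Eq. (1) is only a few lines of NumPy. The helper name project and the argument conventions here are our own choices for this sketch, not code from the project:

```python
import numpy as np

def project(X_w, K, R_cw, O_cw):
    """Pinhole projection of Eq. (1): x^c = K^c [R^cw | -R^cw O^cw] X^w."""
    P = K @ np.hstack([R_cw, -R_cw @ O_cw.reshape(3, 1)])  # 3x4 projection matrix
    wx = P @ np.append(X_w, 1.0)  # homogeneous pixel [w*x, w*y, w]
    return wx[:2] / wx[2]         # divide out w to recover (x, y)
```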

Using many annotated correspondences between 3D points, 𝐗ʷ, and pixels, 𝐱ᶜ, across several images, the OpenCV function calibrateCamera estimates the unknown intrinsic parameters, 𝐾ᶜ, and extrinsic parameters 𝑅ᶜʷ, 𝐎ᶜʷ. In the common chessboard calibration, the known world points 𝐗ʷ are the corners of the chessboard squares, and their pixel locations 𝐱ᶜ can be detected automatically in the imagery, resulting in a highly automated camera calibration.
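In code, the chessboard workflow is a short loop. This is a minimal sketch assuming a 9×6 inner-corner board and a hypothetical calib_images/ folder (corner refinement omitted for brevity):

```python
import glob
import cv2
import numpy as np

PATTERN = (9, 6)  # inner-corner count; an assumption for this sketch

# Known world points X^w: a grid of unit squares on the board's Z=0 plane.
grid = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
grid[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):  # hypothetical image folder
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_points.append(grid)     # world points X^w for this view
        img_points.append(corners)  # detected pixels x^c

# Estimate intrinsics K^c plus per-view extrinsics R^cw, O^cw.
rms, K_c, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error (px):", rms)
```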

LIDAR Calibration Pattern

To collect the point/pixel pairs needed for camera calibration, we extract a set of 3D world points {X₁ʷ, X₂ʷ, …} from LIDAR data, and annotate their pixel locations {x₁ᶜ, x₂ᶜ, …} in several images. We are fortunate enough to have co-collected imagery and LIDAR (Fig. 4 top) in the same area as the unknown camera images we seek to calibrate. We thus annotate the 3D points 𝐗ʷ indirectly, by first annotating pixels xᴸ in the LIDAR-aligned image (Fig. 4 bottom left), and then translating these pixels to corresponding points in the 3D LIDAR cloud.

Fig. 4: (Top left) Pixel image co-collected with (right) LIDAR points. (Bottom left) annotated image; (bottom right) annotations transferred to LIDAR point “lookup table”

Unlike our unknown camera, the LIDAR camera is precisely calibrated, with known intrinsics, 𝐾ᴸ, and extrinsics 𝑅ᴸʷ, 𝐎ᴸʷ. We can thus find the pixel 𝐱ᴸ corresponding to every LIDAR point 𝐗ʷ via the pinhole projection matrix multiplication from Eq. (1): 𝐱ᴸ=𝐾ᴸ[𝑅ᴸʷ|−𝑅ᴸʷ𝐎ᴸʷ]𝐗ʷ.

We compute the pixel for every LIDAR point to “render” a lookup table giving the source LIDAR point at each pixel. As shown in the upper right of Fig. 4, LIDAR data is sparse and does not densely cover the face of surfaces, leaving visible “holes”. If an annotated point falls in one of these holes, there will be no lookup value for the LIDAR point 𝐗ʷ. For this reason, we render our LIDAR points with a wider “radius” to fill in the holes and better simulate occlusions, as shown in Fig. 4 bottom right.
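A sketch of that rendering step is below. The projection follows Eq. (1); the function name, the 2-pixel splat radius, and the simple per-point z-buffer are illustrative choices of ours, not the project’s actual code:

```python
import numpy as np

def render_lidar_lookup(pts_w, K_L, R_Lw, O_Lw, h, w, radius=2):
    """Map each pixel x^L to the index of the nearest LIDAR point X^w that
    projects onto it (-1 where nothing lands). Points are splatted with a
    pixel `radius` to fill the sparse-coverage holes."""
    # Pinhole projection, Eq. (1): x^L = K^L [R^Lw | -R^Lw O^Lw] X^w
    cam = (pts_w - O_Lw) @ R_Lw.T            # points in the LIDAR camera frame
    uvw = cam @ K_L.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = cam[:, 2]

    lookup = np.full((h, w), -1, dtype=np.int64)
    zbuf = np.full((h, w), np.inf)
    for i in np.where(depth > 0)[0]:         # only points in front of the camera
        v0, v1 = max(v[i] - radius, 0), min(v[i] + radius + 1, h)
        u0, u1 = max(u[i] - radius, 0), min(u[i] + radius + 1, w)
        patch = zbuf[v0:v1, u0:u1]
        win = depth[i] < patch               # z-buffer: keep the nearest point
        lookup[v0:v1, u0:u1][win] = i
        patch[win] = depth[i]
    return lookup
```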

Fig. 5: Pixel annotations in images from the unknown camera

Next, we annotate several pixels 𝐱ᴸ in the LIDAR image, then use the LIDAR lookup image to retrieve the point 𝐗ʷ associated with each pixel 𝐱ᴸ. The annotated pixels are chosen at visually distinct locations, like corners, which can be easily recognized and annotated in the images from the unknown camera. Pixel annotations 𝐱ᶜ on a few unknown camera images are shown in Fig. 5.

Planar Calibration Pattern

The collection of annotated LIDAR points {X₁ʷ, X₂ʷ, …} and their corresponding pixels {x₁ᶜ, x₂ᶜ, …} in the images can now be used for calibration. Using these points as inputs to calibrateCamera, the function returns... not the desired intrinsic camera parameters, but a disheartening error message:

Bad Argument: For non-planar calibration rigs the initial intrinsic matrix must be specified in function 'cvCalibrateCamera2Internal'

OpenCV is stating that calibrateCamera expects the points {X₁ʷ, X₂ʷ, …} to be co-planar unless the user has a good initial guess for the intrinsic matrix 𝐾ᶜ. We certainly don't have a good guess for 𝐾ᶜ of the unknown camera, but our annotated LIDAR points 𝐗ʷ lie on the planar face of an overpass, so why does calibrateCamera reject them? First, slight noise in the LIDAR data displaces points from the face of the overpass. Second, the overpass itself has rivets and other features that protrude from the planar surface. Our points therefore do not meet OpenCV's stringent demand that the points be exactly co-planar.

Least-Squares Planar Fit

Instead, we’ll find an alternative set of exactly co-planar 3D points by fitting a plane that is a least-squares approximation of the annotated LIDAR points {X₁ʷ, X₂ʷ, …}. A plane is defined mathematically by a single point, 𝐏ʷ, on the plane (describing the plane’s position) and a vector, 𝐧ʷ, normal to the plane (describing the plane’s orientation). A least-squares fit to N ≥ 3 LIDAR points takes only a few simple calculations. The point on the plane is taken to be the centroid of the points:

𝐏ʷ = (1/N) ∑ᵢ₌₁ᴺ 𝐗ᵢʷ    Eq. (2)

To find the normal vector, first assemble a 3×N matrix, A, of the LIDAR points relative to the centroid:

𝐴 = [𝐗₁ʷ−𝐏ʷ, 𝐗₂ʷ−𝐏ʷ, …, 𝐗ₙʷ−𝐏ʷ]    Eq. (3)

then define the 3×3 matrix 𝐵 = 𝐴𝐴ᵀ. The estimated normal vector, 𝐧ʷ, is calculated as

𝐧ʷ = the unit eigenvector of 𝐵 associated with its smallest eigenvalue    Eq. (4)
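In NumPy, Eqs. (2)–(4) amount to a few lines (the helper name fit_plane is ours):

```python
import numpy as np

def fit_plane(pts_w):
    """Least-squares plane through an (N, 3) array of points.
    Returns the centroid P^w (Eq. 2) and unit normal n^w (Eq. 4)."""
    P_w = pts_w.mean(axis=0)         # Eq. (2): centroid
    A = (pts_w - P_w).T              # Eq. (3): 3xN centered points
    B = A @ A.T                      # 3x3 scatter matrix
    _, eigvecs = np.linalg.eigh(B)   # eigh sorts eigenvalues ascending
    n_w = eigvecs[:, 0]              # Eq. (4): smallest-eigenvalue eigenvector
    return P_w, n_w
```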

Fig. 6 below shows the fitted plane, normal nʷ, point Pʷ, and the source LIDAR points {X₁ʷ, X₂ʷ, …}.

Fig. 6: Illustration of least-squares planar fit to LIDAR, and its relationship to the three discussed coordinate systems: world, LIDAR, and planar

Finding Co-Planar Points

As shown in Fig. 6, the LIDAR points 𝐗ʷ derived from the annotated pixels 𝐱ᴸ will not necessarily fall on the fitted plane Pʷ, nʷ. We now find points on the fitted plane which correspond to each annotated pixel 𝐱ᴸ. For simplicity, transform the plane Pʷ, nʷ from world coordinates to a more convenient expression Pᴸ, nᴸ in the LIDAR camera’s coordinates:

𝐏ᴸ = 𝑅ᴸʷ(𝐏ʷ − 𝐎ᴸʷ),  𝐧ᴸ = 𝑅ᴸʷ𝐧ʷ    Eq. (5)

where the extrinsics 𝑅ᴸʷ, 𝐎ᴸʷ are known for the LIDAR camera. The known intrinsic matrix 𝐾ᴸ includes a known focal length fᴸ, the distance from the camera center to the focal plane. A ray extending from the camera center (the origin in LIDAR camera coordinates) through a pixel 𝐱ᴸ = [xᴸ, yᴸ]ᵀ on the focal plane is expressed as 𝐗ᴸ = d𝐯ᴸ, where d is a scalar distance along the unit vector 𝐯ᴸ ∝ [xᴸ, yᴸ, fᴸ]ᵀ (scaled to unit length). Fig. 6 plots this ray as a dashed line through the point 𝐱ᴸ on the focal plane.

All points 𝐗ᴸ on this ray are projected to pixel 𝐱ᴸ by the pinhole projection from Eq. (1). We want to find a point on the fitted plane Pᴸ, nᴸ which is also on this ray; i.e. the intersection of the ray and the plane. This point is found to be

𝐗ᴸ = d𝐯ᴸ,  where d = (𝐧ᴸ·𝐏ᴸ)/(𝐧ᴸ·𝐯ᴸ)    Eq. (6)

Computing this planar intersection for every annotated pixel {x₁ᴸ, x₂ᴸ, …} provides a set of points {X₁ᴸ, X₂ᴸ, …} which are exactly co-planar, as required by OpenCV. Fig. 6 displays these points Xᴸ as red dots at the termination of the rays on the least-squares plane.
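Here’s a sketch of Eqs. (5)–(6). The names are our own, and we back-project the pixel through (𝐾ᴸ)⁻¹, which reduces to the [xᴸ, yᴸ, fᴸ]ᵀ direction above when pixels are measured from the principal point:

```python
import numpy as np

def coplanar_point_for_pixel(x_L, P_w, n_w, K_L, R_Lw, O_Lw):
    """Intersect the ray through pixel x^L with the fitted plane,
    returning an exactly co-planar point X^L in LIDAR camera coordinates."""
    P_L = R_Lw @ (P_w - O_Lw)    # Eq. (5): plane point in camera coordinates
    n_L = R_Lw @ n_w             # Eq. (5): plane normal in camera coordinates
    v = np.linalg.inv(K_L) @ np.array([x_L[0], x_L[1], 1.0])
    v /= np.linalg.norm(v)       # unit ray direction v^L
    d = (n_L @ P_L) / (n_L @ v)  # Eq. (6): distance where the ray meets the plane
    return d * v                 # X^L = d v^L
```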

Let’s input these refined 3D points {X₁ᴸ, X₂ᴸ, …} and the annotated pixels {x₁ᶜ, x₂ᶜ, …} from our unknown camera into calibrateCamera. Once again, it returns a disappointing error message:

Bad Argument: For non-planar calibration rigs the initial intrinsic matrix must be specified in function 'cvCalibrateCamera2Internal'

So what happened this time?

Fig. 7: OpenCV still won’t calibrate the camera

Planar Coordinate System

It turns out that not only does OpenCV demand co-planar world points, it specifically expects them to be expressed in a coordinate system where the plane is Z = 0. The LIDAR camera coordinate system used to express the points 𝐗ᴸ does not satisfy this requirement. As a final step, we must translate and rotate the LIDAR coordinate system, similar to Eq. (5), so that the fitted plane aligns with the X-Y plane, guaranteeing that the points {X₁ᴸ, X₂ᴸ, …} on the plane have a Z-component of zero. In other words, we wish to find a 3×1 translation vector 𝐎ᴾᴸ and a 3×3 rotation matrix 𝑅ᴾᴸ so that

𝑅ᴾᴸ(𝐏ᴸ − 𝐎ᴾᴸ) = 𝟎  and  𝑅ᴾᴸ𝐧ᴸ = [0, 0, 1]ᵀ    Eq. (7)

The first constraint is satisfied by defining 𝐎ᴾᴸ = Pᴸ so that the translation places the coordinate system’s origin on the plane, as shown by the planar coordinate axes in Fig. 6 above.

We’ll next solve for each row of the rotation matrix 𝑅ᴾᴸ individually. Let the unit vectors 𝐞₁, 𝐞₂, 𝐞₃ express the rows, so that 𝑅ᴾᴸ = [𝐞₁, 𝐞₂, 𝐞₃]ᵀ. The rotation matrix is, by definition, orthogonal (its transpose is its inverse), so the second constraint from Eq. (7) may be re-written as

𝐧ᴸ = (𝑅ᴾᴸ)ᵀ[0, 0, 1]ᵀ = 𝐞₃    Eq. (8)

This means that the final row of 𝑅ᴾᴸ is the desired direction of the new Z-axis, as expressed in the LIDAR coordinate system; and, as desired, this is the plane’s normal vector, 𝐧ᴸ. Similarly, the other rows 𝐞₁ᵀ and 𝐞₂ᵀ depict the desired directions of the new X-axis and Y-axis, expressed in LIDAR coordinates. We can define 𝐞₁ as any direction orthogonal to 𝐞₃ = 𝐧ᴸ = [n₁ᴸ, n₂ᴸ, n₃ᴸ]ᵀ, so let 𝐞₁ ∝ [−n₂ᴸ, n₁ᴸ, 0]ᵀ (then scaled to unit length). Finally, 𝐞₂ must be orthogonal to both 𝐞₃ and 𝐞₁, so define it as the cross-product 𝐞₂ = 𝐞₃ × 𝐞₁. With all three rows of 𝑅ᴾᴸ defined, it can be written as

𝑅ᴾᴸ = [𝐞₁, 𝐞₂, 𝐞₃]ᵀ,  𝐞₃ = 𝐧ᴸ,  𝐞₁ ∝ [−n₂ᴸ, n₁ᴸ, 0]ᵀ,  𝐞₂ = 𝐞₃ × 𝐞₁    Eq. (9)

The set of co-planar points {X₁ᴸ, X₂ᴸ, …} can now be transformed as 𝐗ᴾ = 𝑅ᴾᴸ(𝐗ᴸ − 𝐎ᴾᴸ). The resulting points {X₁ᴾ, X₂ᴾ, …} are expressed in the new coordinate system, where they all fall on the X-Y plane with Zᴾ = 0. Now, using the co-planar 3D points {X₁ᴾ, X₂ᴾ, …} and the annotated pixels {x₁ᶜ, x₂ᶜ, …} as inputs to calibrateCamera, it finally returns an estimate for the unknown intrinsics 𝐾ᶜ.
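A sketch of Eqs. (7)–(9) and the final call is below, with placeholder names (pts_L, pixel_sets, image_size) standing in for this project’s data:

```python
import numpy as np
import cv2

def planar_frame(P_L, n_L):
    """Rotation R^PL (Eq. 9) and origin O^PL = P^L taking LIDAR camera
    coordinates to a frame whose X-Y plane is the fitted plane."""
    e3 = n_L / np.linalg.norm(n_L)       # new Z-axis is the plane normal
    e1 = np.array([-e3[1], e3[0], 0.0])  # orthogonal to e3 (assumes the
    e1 /= np.linalg.norm(e1)             #   normal isn't the camera Z-axis)
    e2 = np.cross(e3, e1)                # completes a right-handed frame
    return np.vstack([e1, e2, e3]), P_L

R_PL, O_PL = planar_frame(P_L, n_L)      # plane from the earlier fit
X_P = (pts_L - O_PL) @ R_PL.T            # X^P = R^PL (X^L - O^PL), pts_L is (N, 3)
X_P[:, 2] = 0.0                          # clamp tiny numerical residue to Z=0

obj = [X_P.astype(np.float32)] * len(pixel_sets)  # same rig in every view
img = [np.asarray(p, np.float32).reshape(-1, 1, 2) for p in pixel_sets]
rms, K_c, dist, rvecs, tvecs = cv2.calibrateCamera(obj, img, image_size, None, None)
```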

The calibration can be verified by using the estimates of 𝐾ᶜ, 𝑅ᶜᴾ, and 𝐎ᶜᴾ from calibrateCamera to reproject the points {X₁ᴾ, X₂ᴾ, …} back to camera pixels using Eq. (1). As intended, these pixels fall very close to the annotated pixels {x₁ᶜ, x₂ᶜ, …} (see Fig. 8). We’ve done it! We calibrated somebody else’s camera using OpenCV.

Fig. 8: Annotated calibration pixels (dots) and reprojection of 3D points (x’s) using calibrated parameters
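That verification is a one-liner per view with cv2.projectPoints, reusing the outputs of the sketch above:

```python
import numpy as np
import cv2

# Reproject the planar points with the estimated parameters and compare
# against the annotated pixels, as plotted in Fig. 8.
for X_view, x_c, rvec, tvec in zip(obj, img, rvecs, tvecs):
    reproj, _ = cv2.projectPoints(X_view, rvec, tvec, K_c, dist)
    err = np.linalg.norm(reproj - x_c, axis=2)
    print(f"mean reprojection error: {err.mean():.2f} px")
```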
