Camera Image Formation

From naive to pin-hole to lens-based camera models

Published in Geek Culture · 4 min read · Feb 13, 2022

So, you know what a digital camera is and what its various components are.
If you don’t, you should go through this awesome blog.

Now you want to know how a camera actually captures an image.

Understanding exactly how a digital camera captures an image is not straightforward. You have to build up from the basics to the digital cameras we have today to get a sound understanding of why the various components of the camera are necessary.

Image Formation

Suppose you have a real-world 3D object (like a super realistic person in Figure-1) that you want to capture using a camera.

You might assume that all you need to do is place a light-sensitive film (the camera sensor) in front of the object, and you are done: you have an image.

NO!

The issue with exposing a light-sensitive film directly to an object is that each point of the object reflects light rays in many directions, so the same point is projected onto many places on the film.

In such a case what our film/chip would capture is an average of intensity values of the object under consideration. We would end up with just a blob without any structure.
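To make the "blob" concrete, here is a minimal toy sketch in Python. It assumes a 1D scene and, as a deliberate simplification, that every scene point contributes equally to every cell of the film; the function name `expose_bare_film` is made up for this illustration.

```python
def expose_bare_film(scene, film_size):
    """Expose a film with no barrier in front of it.

    Toy assumption: every scene point sends the same amount of light
    to every film cell, so each cell records the scene's average.
    """
    mean_intensity = sum(scene) / len(scene)
    return [mean_intensity] * film_size


# A bright bar on a dark background (intensities between 0 and 1).
scene = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
film = expose_bare_film(scene, film_size=6)
print(film)  # every cell holds the same value
```

Every film cell ends up with the same intensity, so the bright bar in the scene leaves no trace of its position or shape.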

Figure 1: How light rays are projected from an object to a light-sensitive film (Figure by Author)

Pinhole Camera Model: Intuition

We need to restrict the number of reflections from a single point being captured by our light-sensitive film in order for us to get some meaningful structure.

The simplest way to do that is to limit which reflected rays are allowed to reach the light-sensitive film in the first place.

To achieve this, we add a barrier between the object and the light-sensitive film as depicted in figure-2. The size of the opening in the barrier that allows light rays to pass through is very small. This opening is known as the aperture of a camera.

Figure-2: Adding barrier before the light-sensitive film (PinHole camera model, simplified)

Adding the barrier reduces blurring, which helps capture the structure of the object.

One more important thing to note is that the image captured by the film is flipped upside-down.

Pinhole Camera Model

Okay, so now we know we need a barrier with a small opening between the object being captured and the light-sensitive film to actually get an image.

This is exactly what a pinhole camera model is. A pinhole camera is a camera without a lens but with a tiny aperture (the opening in the barrier) that projects the object under consideration onto the image plane. The captured image is flipped upside down (depicted in figure-3).

To describe the pinhole camera model mathematically, we will make use of figure-4.

Figure-4: Pinhole camera model mathematics (Source)

In figure-4, the vertical line is the barrier, everything on the left is the 3D real world and everything on the right is the 2D image world.

Because every point is projected onto the image plane, which sits at a fixed distance z from the barrier, the depth dimension is lost when 3D objects are projected to the image.
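The projection itself follows from similar triangles. Below is a minimal sketch, not taken from the figure: it assumes the pinhole sits at the origin, names the pinhole-to-image-plane distance `film_dist`, and names the distance of the 3D point from the barrier `depth`. All of these names are hypothetical choices for this example.

```python
def project_pinhole(point, film_dist):
    """Project a 3D point (x, y, depth) through a pinhole at the origin.

    The image plane is assumed to sit film_dist behind the pinhole.
    The minus signs encode the upside-down (and left-right) flip.
    """
    x, y, depth = point
    if depth <= 0:
        raise ValueError("the point must be in front of the barrier (depth > 0)")
    return (-film_dist * x / depth, -film_dist * y / depth)


near_head = (0.0, 1.8, 4.0)  # 1.8 units tall, 4 units away
far_head = (0.0, 3.6, 8.0)   # twice as tall, twice as far away

print(project_pinhole(near_head, film_dist=0.05))
print(project_pinhole(far_head, film_dist=0.05))
```

Both points land at the same place on the image plane, with a negative vertical coordinate. The sign shows the flip, and the coincidence shows the lost depth: from the image alone you cannot tell a small nearby object from a large distant one.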

Camera with thin lens

Issues with pinhole camera model

Now we have a camera model which can capture images, so why do modern digital cameras use lenses?

The issue with the pinhole camera model is that the pinhole in the barrier has to be very small to produce crisp images.

Because the pinhole is so small, you have to wait a long time for enough light to pass through it to capture the image.
This is impractical, as the objects being captured by a camera are almost always moving.
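A rough way to see how long "a long time" is: the light passing through a circular opening scales with its area, so the exposure time needed for the same amount of light scales with one over the diameter squared. The sketch below assumes exactly that and nothing else; the function name and the example diameters are made up for illustration.

```python
def relative_exposure_time(diameter, reference_diameter):
    """Exposure time needed with `diameter`, relative to `reference_diameter`.

    Assumes the light collected is proportional to the opening's area,
    so the time needed is proportional to 1 / diameter**2.
    """
    if diameter <= 0 or reference_diameter <= 0:
        raise ValueError("diameters must be positive")
    return (reference_diameter / diameter) ** 2


# A 0.5 mm pinhole compared with a 16 mm opening (illustrative numbers).
ratio = relative_exposure_time(diameter=0.5, reference_diameter=16.0)
print(ratio)  # 1024.0
```

Under this assumption, the 0.5 mm pinhole needs roughly a thousand times longer than the 16 mm opening to collect the same amount of light.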

To remedy this, we could try increasing the size of the pinhole, but then multiple light rays from the same point of the object would pass through the pinhole and land at different places on the light-sensitive film.
This would result in a blurry image.
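The amount of blur can be estimated with similar triangles: the rays from one object point that pass through the edges of the hole keep spreading until they hit the film. The sketch below is a purely geometric estimate that ignores diffraction; the function name and the numbers (in millimetres) are assumptions for this example.

```python
def blur_spot_diameter(hole_diameter, object_dist, film_dist):
    """Diameter of the spot that a single object point paints on the film.

    object_dist: distance from the object point to the barrier.
    film_dist:   distance from the barrier to the film.
    Geometric optics only; diffraction is ignored.
    """
    if object_dist <= 0 or film_dist <= 0:
        raise ValueError("distances must be positive")
    return hole_diameter * (object_dist + film_dist) / object_dist


# Same scene (object 2000 mm away, film 50 mm behind the barrier),
# two different hole sizes.
small = blur_spot_diameter(0.5, object_dist=2000.0, film_dist=50.0)
large = blur_spot_diameter(16.0, object_dist=2000.0, film_dist=50.0)
print(small, large)
```

The spot is always slightly larger than the hole itself, so in this model the blur grows in direct proportion to the size of the opening.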

Solution: Use Lens

Instead of a very small opening (pinhole) in the barrier, make a large opening and place a lens in it.

A lens takes multiple rays of light from the same point of the object and maps them to the same point on the light-sensitive film.

When using a lens we can have a bigger opening in the barrier (reducing the exposure time) and we can still get crisp images.

One thing to note is that a camera with a thin lens only approximates the pinhole camera model, and it has its own issues.
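For the curious, the standard thin-lens equation from introductory optics (not derived in this post) says where those rays meet: 1/f = 1/object_dist + 1/image_dist. The sketch below simply solves it for the image distance; the function name and the millimetre values are illustrative assumptions.

```python
def image_distance(focal_length, object_dist):
    """Distance behind a thin lens at which an object point comes into focus.

    Solves the thin-lens equation 1/f = 1/object_dist + 1/image_dist
    for image_dist. Only valid for objects beyond the focal length.
    """
    if object_dist <= focal_length:
        raise ValueError("object must be farther away than the focal length")
    return focal_length * object_dist / (object_dist - focal_length)


# A 50 mm lens looking at objects 2000 mm and 500 mm away.
far_focus = image_distance(50.0, 2000.0)
near_focus = image_distance(50.0, 500.0)
print(far_focus, near_focus)
```

The size of the opening does not appear in the formula, which is why the opening can be made large without blurring an in-focus point. The two printed distances differ, though: objects at different depths come into focus at different distances behind the lens, so a film at one fixed position renders only one depth perfectly sharp.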

Summary and Conclusion

Now we know the theory behind how a camera captures an image, what the pinhole camera model is, and why a lens is needed.

The theory of lenses and the different types of lenses used by various cameras are left for the reader to explore.

It should be noted that most computer vision tasks (like 3D scene reconstruction) use a pinhole camera model even though the camera has a lens.
This pinhole camera assumption works fine for most computer vision tasks.