3D Imaging with Lensless Camera

Reconstructing 3D scenes from lensless data and simulating the collection of such lensless data from virtual scenes — Part 1/3

Julien Sahli
8 min read · Mar 14, 2023

This work is in line with the LenslessPiCam project conducted at the LCAV lab at EPFL. Before continuing, if you wish to delve into the subject in more detail, a general introduction to this project as well as links to specific parts of its implementation can be found in this post, and the corresponding code is available at this link. The current work aims at extending it to 3D, as well as at building a lensless data simulator. (The corresponding code will be available soon.)

This work is divided into three parts:

Traditional cameras focus light through lenses in order to render sharp images. Lensless cameras, on the other hand, use diffusers that scatter the light into patterns which must then be processed to reconstruct the captured scene.

This form of photography is especially useful for applications where it may be difficult or impossible to install an optical lens due to space constraints or other considerations. However, this is still a developing topic, and many research possibilities are still to be explored.

First, we are going to look into one of the possible use cases of lensless imaging: reconstructing 3D scenes, where traditional cameras usually only capture 2D images. As a first step, we will introduce how to produce such reconstructions from lensless data.

Then, we will take the opposite approach and see how we can simulate the behaviour of lensless cameras in order to produce new lensless data from virtual 3D scenes.

Quick refresher on 2D lensless reconstruction

We will now briefly review the functioning of the 2D reconstruction algorithms already implemented in the LenslessPiCam project.

As seen in this post, the generation of lensless data from a 2D scene can be modelled as a linear shift-invariant (LSI) system. Its impulse response, as is common in optics, is called the Point Spread Function, which we will refer to as the PSF in the rest of this post.

LSI is a common assumption in the 2D case. It allows us to model the lensless data as the convolution of the light coming from the scene with a single PSF, whose shape does not change regardless of where the light is coming from, as illustrated in the following figure:

Source: https://waller-lab.github.io/DiffuserCam/tutorial/algorithm_guide.pdf
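
Written out explicitly, in a notation chosen here for convenience (the figure does not fix one), the LSI assumption says that the measurement b is a single 2D convolution of the scene intensity v with the PSF h:

b(x, y) = (h * v)(x, y) = \iint h(x - x',\, y - y')\, v(x', y')\, \mathrm{d}x'\, \mathrm{d}y'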

As such a model is usually not directly invertible, we must find another way to reconstruct the original scene. One way of doing so is through an iterative algorithm, which typically has the following steps:

  • Pass the current estimate of the scene through our model of the camera;
  • Compare the result with our measurement;
  • Update the estimate so as to reduce the discrepancy between the two.
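
A minimal sketch of this loop in Python could look as follows. The `forward` and `adjoint` callables are hypothetical placeholders for the camera model and its adjoint (they are not part of the LenslessPiCam API), and the fixed step size is purely illustrative:

```python
import numpy as np

def iterative_reconstruction(measurement, forward, adjoint, n_iter=100, step=1e-3):
    """Generic projected gradient descent for || forward(x) - measurement ||^2.

    `forward` maps a scene estimate to simulated lensless data and
    `adjoint` applies the adjoint of that linear operator; both are
    placeholders for whichever camera model is used.
    """
    estimate = np.zeros_like(adjoint(measurement))
    for _ in range(n_iter):
        # 1. Pass the current estimate through our model of the camera
        simulated = forward(estimate)
        # 2. Compare the result with our measurement
        residual = simulated - measurement
        # 3. Update the estimate to reduce the discrepancy,
        #    enforcing the non-negativity prior on light intensities
        estimate = np.maximum(estimate - step * adjoint(residual), 0)
    return estimate
```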

In practice, we define the problem as such:

  • The forward model that represents our system is the convolution between the light emitted by the original scene and the PSF of the diffuser, followed by a cropping;
  • The estimate is computed by minimising the resulting error metric, while taking into account certain priors such as the non-negativity of the original scene. This can be done by applying gradient descent on the error metric, or any variant of this algorithm (both expressions are written out below).
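
In the original post both expressions are shown as images; reconstructed here in LaTeX with an assumed notation (b the measurement, v the scene, h the PSF, C the crop), they read approximately:

b = C\,(h * v), \qquad \hat{v} = \arg\min_{v \ge 0} \; \tfrac{1}{2}\,\lVert C\,(h * v) - b \rVert_2^2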

By doing so, we are able to produce accurate reconstructions of 2D scenes such as the following:

Source: https://medium.com/@bezzam/image-similarity-metrics-applied-to-diffusercam-21998967af8d

Extension of 2D reconstruction to 3D scenes

Let’s now turn to extending the above algorithm to 3D scenes. One could think that reconstructing a 3D scene with a 3D PSF simply consists of applying the same method, computing the convolution of the 3D scene with the system’s PSF in order to generate 3D data. Unfortunately, that is not the case.

To understand why, we have to think about how the collected image forms on the sensor. In the 2D case, we can model the scene as an image that is parallel to the camera sensor. As we have seen, any point on this image that is a light source will project the PSF on the sensor at a position which will depend on the position of the point in the scene.

Increasing the horizontal or vertical distance between the light source and the centre of the scene by a certain factor will therefore increase the horizontal or vertical distance between the centre of the projection on the sensor and the centre of the sensor. We can thus retrieve the scene by performing a deconvolution of the gathered data, provided we know the PSF of the system.

When we translate this system to 3D, we add an additional dimension to the scene, which means that, of course, the light sources which compose the scene acquire a new coordinate. However, the sensor of the camera does not change: it still collects 2D data, stacking together the light coming from multiple depths.

Retrieving the 3D scene by performing a deconvolution on the sensor data as in the 2D case would require a device that could collect data in three dimensions, which our camera does not!

Wrong camera model for 3D reconstruction

Indeed, if we naïvely try to reconstruct a 3D scene in this way, the output values will all be equal along the depth axis, as the same data collected on the 2D sensor would simply be broadcast to match the dimensions of the scene. We therefore need another method to reconstruct 3D scenes. Since our sensor is only 2D, moving a point source along the depth axis does not simply translate its representation along that axis. Instead, similar to what happens with a magnifying lens, an object is rescaled as its distance to the camera changes, and moving a point source in depth produces something akin to a rescaling of its caustic pattern on the sensor.

Even if we consider the diffuser to be a simple 2D mask whose PSF would be its projection onto the sensor plane, the dimensions of this projection would depend on the distance of the light source casting it: the closer the source is to the diffuser, the bigger the projection. In practice, when using actual diffusers, the shape of the PSF can change even more due to how the caustic patterns form on the sensor (allowing for even better reconstructions), but the same observation still holds.

Source: https://opg.optica.org/optica/fulltext.cfm?uri=optica-5-1-1&id=380297

We can use this property to our advantage by splitting the 3D scene into multiple planes along the depth axis. Reconstructing the content of these layers then amounts to reconstructing several 2D scenes, each with its own PSF. The 3D PSF that we will use is therefore not the PSF of a system that would output 3D data from the 3D scene, as we do not have such a device. Instead, it is a stack of 2D PSFs, each corresponding to a system that outputs 2D data from one specific 2D layer of the scene, situated at a given depth:

Right camera model for 3D reconstruction

The equation to solve the system therefore becomes the following:
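
The equation itself is shown as an image in the original post; an approximate reconstruction in the same notation as above, where v_z and h_z denote the scene layer and the PSF at depth z, is:

\hat{v} = \arg\min_{v \ge 0} \; \tfrac{1}{2}\,\Bigl\lVert C\Bigl(\sum_{z} h_z * v_z\Bigr) - b \Bigr\rVert_2^2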

In terms of code, most of the changes with respect to the 2D case consist of adding an extra dimension to the input PSF and to the scene estimate, but not to the lensless data.
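
As an illustration (this is a sketch and not the actual LenslessPiCam implementation; the function name and the circular-convolution shortcut are assumptions), the 3D forward model can be written as a sum of per-depth 2D convolutions:

```python
import numpy as np
from numpy.fft import rfft2, irfft2

def forward_3d(scene, psf_stack):
    """Simulate 2D lensless data from a 3D scene estimate.

    scene:     array of shape (depth, height, width), one layer per depth
    psf_stack: array of shape (depth, height, width), one 2D PSF per depth
    Returns a single (height, width) measurement: each depth layer is
    convolved with its own PSF, and the results are summed over depth.
    """
    assert scene.shape == psf_stack.shape
    # 2D convolution of each layer with its PSF, done in the frequency
    # domain (circular convolution; real code would zero-pad and crop)
    product = rfft2(scene, axes=(-2, -1)) * rfft2(psf_stack, axes=(-2, -1))
    per_depth = irfft2(product, s=scene.shape[-2:], axes=(-2, -1))
    # The sensor stacks the light from all depths into one 2D image
    return per_depth.sum(axis=0)
```

Only the PSF and the scene estimate gain a depth axis; the returned measurement keeps the same 2D shape as in the 2D model.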

In order to verify the correctness of the reconstructions, a scene from the Diffuser3D dataset was reconstructed once with their algorithm and software (ADMM implemented in MATLAB), and once with LenslessPiCam’s gradient descent extended to 3D. The results are the following:

Several observations can be made. First, the general shape of the two reconstructions is quite similar, especially when summing the data along the Z-axis (the main image in the upper left corner of both figures).

However, we can notice small visual artefacts on either side of the DiffuserCam reconstruction, which also appear faintly in the LenslessPiCam reconstruction but are far less noticeable. This is likely due to ADMM, which, compared to classic gradient descent, trades a little precision for a faster convergence rate. Indeed, the ADMM reconstruction took only 40 iterations to give a fairly good estimate of the original scene, which appears to be a resolution test chart, whereas the gradient descent was given more iterations to converge.

Moreover, the Diffuser3D reconstruction concentrates the scene’s elements on a few of the closest available depth levels, whereas the LenslessPiCam reconstruction is spread more broadly across the available depths. This is likely due to regularization terms that penalise several light sources aligned along the depth axis (which would hide each other in real life), which have not been implemented in LenslessPiCam yet.

Nevertheless, the similarities between the two reconstructions are quite promising! Once we have implemented the lensless data simulator, we will reconstruct other scenes with LenslessPiCam and directly compare them to the original scenes.
