Where and how many images are required for a good photogrammetry 3D scan

I wrote this answer for one person on the RealityCapture forum, but I think this tutorial can be useful for many people.

So, how should we acquire images? Here are the basic rules that work for all modern photogrammetry tools.

First of all, we must remember that photogrammetry needs not the silhouette of an object but the surfaces themselves, with their details (textures).

Now let’s imagine a simple chair:
 It has surfaces that face the top, the sides and the bottom.

So we must have at least 1 image pointed directly at every surface.

And no fewer than 2 more images at 10–15 degrees to the first camera.

The central camera will give you a perfect texture. The other two will give clean depth maps for this central camera (and later a clean dense cloud), which are required for calculating the 3D topology.

And this must be done for every surface you want to scan! In ideal conditions, every surface must have 3 shots.
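The camera triplet described above can be sketched in a few lines of Python. This is only a minimal 2D illustration, not how any photogrammetry tool places cameras: the function name `triplet_positions`, its parameters and the 12.5 degree default (the middle of the 10–15 degree range) are my own assumptions for the example.

```python
import math

def triplet_positions(target, normal, distance, side_angle_deg=12.5):
    """Return three 2D camera positions (side, center, side) for one surface.

    target: (x, y) point on the surface that all three cameras look at
    normal: (x, y) outward normal of the surface at that point
    distance: camera-to-target distance, same units as target
    side_angle_deg: angle between the central view and each side view
                    (hypothetical default, middle of the 10-15 degree range)
    """
    nx, ny = normal
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length  # normalize the surface normal

    positions = []
    for angle_deg in (-side_angle_deg, 0.0, side_angle_deg):
        a = math.radians(angle_deg)
        # Rotate the unit normal by the view angle around the target point.
        dx = nx * math.cos(a) - ny * math.sin(a)
        dy = nx * math.sin(a) + ny * math.cos(a)
        positions.append((target[0] + distance * dx,
                          target[1] + distance * dy))
    return tuple(positions)

# Example: a surface at the origin facing "up", shot from 2 units away.
side_a, center, side_b = triplet_positions((0.0, 0.0), (0.0, 1.0), 2.0)
```

All three cameras stay at the same distance from the target, so the central one looks straight at the surface and the two side ones see it from a slightly different angle.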

But if we have surfaces that meet at a steep angle (90 degrees, like in the example), we need additional shots for “stitching” the dense clouds at the angles between the main camera triplets.
 Like this.

The “final” scheme will look like this:

So we have 15 cameras for only 3 surfaces!
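The count of 15 can be checked with simple arithmetic: 3 surfaces with a triplet each give 9 shots, which leaves 6 "stitching" shots. How those 6 are split between the junctions is not stated in the text, so the sketch below takes it as a parameter. The function name and the default of 2 stitching shots per junction are assumptions for illustration only.

```python
def ideal_shot_count(surfaces, junctions,
                     shots_per_surface=3, stitching_per_junction=2):
    """Ideal number of shots: one triplet per surface plus stitching shots.

    surfaces: number of surfaces to scan
    junctions: number of steep-angle junctions between those surfaces
    stitching_per_junction: extra shots per junction (assumed value,
                            the source only gives the total)
    """
    triplet_shots = surfaces * shots_per_surface
    stitching_shots = junctions * stitching_per_junction
    return triplet_shots + stitching_shots

# 3 surfaces, 3 junctions, 2 stitching shots each: 9 + 6 shots.
total = ideal_shot_count(surfaces=3, junctions=3)
```

With 3 junctions and 2 stitching shots each, or with 2 junctions and 3 stitching shots each, the total comes out to the same 15.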

OK, in the real world, with a good camera like the Nikon D810 and a good lens, we can “cheat” and use only 5 cameras.
 But for this example, with 3 surfaces all at 90 degrees, the result will not be perfect even with a D810.

So I can’t recommend shooting fewer than 11 images. Otherwise there will not be enough data for clean depth maps -> dense clouds -> mesh and textures, and as a result the final topology will have less detail or will have problems (especially if the object has weak surfaces).

And now, if we see any nice object that we want to scan, we can plan where and how many images we need for clean topology and textures.

We should also remember the real camera and lens. They can have limited DOF, aberrations and non-linear distortions (the last two problems are common in the areas near the corners and edges of a photo). So the really good data in an image is about the central 75–80% (sometimes less). All of this can require additional images for a good 3D reconstruction.

After this post, I received another question, about “a flat / rock, bumpy / wall”. And my scheme of camera triplets can be confusing if we want to scan “flat” surfaces.
 I also see that I did not explain how the depth map calculation part of MVS works.

So I will explain it with a “scanning a flat surface” scenario.

We have two cameras looking toward a wall. The distance between these 2 cameras is chosen so that their images have about 60–70% overlap.

The light red area is where we have “stereo” information and can reconstruct depth.
 As you can see, this is only 60–70% of the image we took with 1 camera.
 So we can reconstruct only about 70% of the depth map for each camera.

To reconstruct the depth of the full wall, we should take images with enough overlap to reconstruct the depth data of the whole surface without any gaps.

Here we have a central camera and two “side” cameras. For camera #1 we can recreate 100% of the depth. For cameras 2 and 3, only 70%.
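The 100% and 70% figures above can be reproduced with a simplified 1D model of the wall. In this sketch every camera sees an interval of the same width along the wall, and "stereo coverage" is the part of that interval that at least one other camera also sees. The function names and the 1D simplification are my assumptions. A real MVS pipeline works on 2D images and has more conditions than plain overlap.

```python
def row_of_cameras(count, overlap, footprint=1.0):
    """Centers of `count` cameras along a wall with the given overlap (0..1)."""
    step = (1.0 - overlap) * footprint
    return [i * step for i in range(count)]

def stereo_coverage(centers, index, footprint=1.0):
    """Fraction of camera `index`'s footprint that another camera also sees."""
    half = footprint / 2.0
    lo, hi = centers[index] - half, centers[index] + half

    # Clip every other camera's footprint to this camera's footprint.
    pieces = []
    for j, c in enumerate(centers):
        if j == index:
            continue
        a, b = max(lo, c - half), min(hi, c + half)
        if b > a:
            pieces.append((a, b))

    # Sum the union of the clipped intervals, left to right.
    pieces.sort()
    covered, cursor = 0.0, lo
    for a, b in pieces:
        a = max(a, cursor)
        if b > a:
            covered += b - a
            cursor = b
    return covered / footprint

centers = row_of_cameras(count=3, overlap=0.7)
middle = stereo_coverage(centers, 1)  # central camera, two neighbours
side = stereo_coverage(centers, 0)    # side camera
```

In this model the central camera is fully covered by its two neighbours, while each side camera has stereo information for only about 70% of its frame. The same happens with any overlap of 50% or more.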

But don’t forget that in a real image, due to the real sensor, lens distortions, focus etc., we can “trust” only the central part of the image. Depending on the camera, lens and other conditions, this can be only about a 50–80% crop.

That’s why any photogrammetry software recommends taking photos with 60–80% overlap for good results.
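One way to connect the trusted crop with the overlap recommendation is the rough sketch below. It assumes that the trusted central parts of neighbouring images should themselves overlap by at least 50%, so that a camera between two neighbours is fully covered. That 50% threshold, the pinhole footprint formula and all function names are my assumptions for illustration, not a rule taken from any particular software.

```python
import math

def footprint_width(distance, hfov_deg):
    """Width of wall seen by one camera (simple pinhole model)."""
    return 2.0 * distance * math.tan(math.radians(hfov_deg) / 2.0)

def nominal_overlap(trusted_crop, trusted_overlap=0.5):
    """Full-frame overlap needed so the trusted crops overlap enough.

    trusted_crop: usable central fraction of the frame (e.g. 0.5 to 0.8)
    trusted_overlap: required overlap between trusted crops (assumed 0.5)
    """
    # Step between cameras, as a fraction of the full frame width.
    step_fraction = (1.0 - trusted_overlap) * trusted_crop
    return 1.0 - step_fraction

def shots_for_wall(wall_width, distance, hfov_deg, overlap):
    """Number of frames in one row needed to span the wall.

    Counts frames only; the two ends of the wall still get less
    stereo coverage than the middle.
    """
    width = footprint_width(distance, hfov_deg)
    if wall_width <= width:
        return 1
    step = (1.0 - overlap) * width
    return math.ceil((wall_width - width) / step) + 1

overlap_good_lens = nominal_overlap(trusted_crop=0.8)
overlap_poor_lens = nominal_overlap(trusted_crop=0.5)
frames = shots_for_wall(wall_width=5.1, distance=1.0,
                        hfov_deg=90.0, overlap=0.7)
```

Under these assumptions, a trusted crop of 80% leads to about 60% full-frame overlap and a trusted crop of 50% leads to about 75%, which is in line with the 60–80% range above.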