The dumb reason your fancy Computer Vision app isn’t working: Exif Orientation
I’ve written about lots of computer vision and machine learning projects like object recognition systems and face recognition projects. I also have an open source Python face recognition library that is somehow one of the top 10 most popular machine learning libraries on GitHub. Together, that means I get asked a lot of questions by people new to Python and computer vision.
In my experience, there is one technical problem that trips people up more often than any other. No, it’s not a complicated theoretical issue or an issue with expensive GPUs. It’s the fact that almost everyone is loading their images into memory sideways without even knowing it. And computers are less than excellent at detecting objects or identifying faces in sideways images.
How Digital Cameras Auto-Rotate Images
When you take a picture, the camera will sense which end you have tilted up. This is so the picture will appear in the correct orientation when you look at it again in another program:
But the tricky part is that your camera doesn’t actually rotate the image data inside the file that it saves to disk. Because image sensors inside digital cameras are read line-by-line as a continuous stream of pixel information, it’s easier for a camera to always save the pixel data in the same order no matter which way the camera was held.
It’s actually up to the image viewer application to rotate the image correctly before displaying it. Along with the image data, your camera also saves metadata about each picture — lens settings, location data, and of course, the camera’s rotation angle. The image viewer is supposed to use this information to display the image correctly.
The most common format for image metadata is called Exif (short for Exchangeable image file format). The Exif-formatted metadata is shoved inside the jpeg file that your camera saves. You can’t see Exif data as part of the image itself, but it is readable by any program that knows where to look for it.
Here’s the Exif metadata inside our Goose jpeg image as displayed by
Notice the ‘Orientation’ data element. This tells the image viewer program that the image needs to be rotated 90 degrees clockwise before being displayed on screen. If the program forgets to do this, the image will be sideways!
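If you’d rather check the tag from Python than from a metadata viewer, here’s a minimal sketch, assuming the Pillow library is installed. The helper names `read_orientation` and `describe_orientation` are made up for this example. The tag ID (0x0112) and the eight possible values come from the Exif spec, and the descriptions follow the wording metadata tools commonly use:

```python
from PIL import Image

ORIENTATION_TAG = 0x0112  # Exif tag ID for "Orientation" (274 in decimal)

# What a viewer has to do to show the image upright for each tag value.
ORIENTATION_MEANINGS = {
    1: "Horizontal (normal)",
    2: "Mirror horizontal",
    3: "Rotate 180",
    4: "Mirror vertical",
    5: "Mirror horizontal and rotate 270 CW",
    6: "Rotate 90 CW",
    7: "Mirror horizontal and rotate 90 CW",
    8: "Rotate 270 CW",
}


def read_orientation(path):
    """Return the Exif Orientation value stored in the file, or 1 if there is none."""
    with Image.open(path) as img:
        return img.getexif().get(ORIENTATION_TAG, 1)


def describe_orientation(value):
    """Turn an Orientation value into a human-readable description."""
    return ORIENTATION_MEANINGS.get(value, "Unknown")
```

A value of 6 is what you would expect for an image like the one above that needs a 90 degree clockwise rotation before display.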
Why does this break so many Python Computer Vision Applications?
Exif metadata is not a native part of the Jpeg file format. It was an afterthought taken from the TIFF file format and tacked onto the Jpeg file format much later. This maintained backwards compatibility with old image viewers, but it meant that some programs never bothered to parse Exif data.
Most Python libraries for working with image data like numpy, scipy, TensorFlow, Keras, etc, think of themselves as scientific tools for serious people who work with generic arrays of data. They don’t concern themselves with consumer-level problems like automatic image rotation — even though basically every image in the world captured with a modern camera needs it.
This means that when you load an image with almost any Python library, you get the original, unrotated image data. And guess what happens when you try to feed a sideways or upside-down image into a face detection or object detection model? The detector fails because you gave it bad data.
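You can see this for yourself with a small experiment. The sketch below assumes Pillow and numpy are installed, and the file name and image size are invented for the demo. It writes a JPEG the way a sideways camera would (pixels stored 40 wide by 20 tall, Orientation tag set to 6) and then loads it the way most tutorials do:

```python
import os
import tempfile

import numpy as np
from PIL import Image

# Build a test photo the way a camera held sideways would store it:
# pixels saved 40 wide by 20 tall, plus an Orientation tag of 6
# ("rotate 90 degrees clockwise to display").
exif = Image.Exif()
exif[0x0112] = 6
path = os.path.join(tempfile.mkdtemp(), "sideways.jpg")
Image.new("RGB", (40, 20), "white").save(path, exif=exif.tobytes())

# Load it without looking at the Exif data.
with Image.open(path) as img:
    pixels = np.asarray(img)

# numpy image arrays are (height, width, channels), so this prints
# (20, 40, 3): the sideways layout from disk, not the 40-tall image
# that a viewer would show.
print(pixels.shape)
```

Nothing in that array tells your model the image is meant to be turned. The Orientation tag stays behind in the file.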
You might think this problem is limited to Python scripts written by beginners and students, but that’s not the case! Even Google’s flagship Vision API demo doesn’t handle Exif orientation correctly:
And while Google Vision still manages to detect some of the animals in the sideways image, it detects them with a non-specific “Animal” label. This is because it is a lot harder for a model to detect a sideways goose than an upright goose. Here’s what Google Vision detects if the image is correctly rotated before being fed into the model:
With the correct image orientation, Google detects the birds with the more specific “Goose” label and a higher confidence score. Much better!
This is a super obvious problem if you can see that the image is sideways, like in this demo. But this is where things get insidious: normally you can’t see it! Nearly every program on your computer displays the image in its properly rotated form instead of how it is actually stored on disk. So when you open the image to figure out why your model isn’t working, it looks perfectly fine and gives you no hint that the pixel data your code loaded is sideways!
This inevitably leads to people posting issues on GitHub complaining that the open source projects they are using are broken or that the models aren’t very accurate. But the problem is so much simpler: they are feeding in sideways and/or upside-down images!
Fixing the Problem
The solution is that whenever you load images in your Python programs, you should check them for Exif Orientation metadata and rotate the images if needed. It’s pretty simple to do, but surprisingly hard to find examples of code online that does it correctly for all orientations.
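"All orientations" means eight cases: the Orientation tag can ask for rotations, mirror flips, or both. As a rough sketch of what each value means for pixel data that is already in a numpy array (height x width, with or without a channel axis), here is one way to write the mapping. The function name `apply_orientation` is made up for this example:

```python
import numpy as np


def apply_orientation(pixels, orientation):
    """Return `pixels` (height x width [x channels]) turned upright.

    `orientation` is the Exif Orientation value (1-8). A value of 1, or any
    unknown value, leaves the array unchanged.
    """
    if orientation == 2:
        return np.fliplr(pixels)                        # mirror left-right
    if orientation == 3:
        return np.rot90(pixels, 2)                      # rotate 180 degrees
    if orientation == 4:
        return np.flipud(pixels)                        # mirror top-bottom
    if orientation == 5:
        return np.swapaxes(pixels, 0, 1)                # flip across the main diagonal
    if orientation == 6:
        return np.rot90(pixels, -1)                     # rotate 90 degrees clockwise
    if orientation == 7:
        return np.swapaxes(np.rot90(pixels, 2), 0, 1)   # flip across the anti-diagonal
    if orientation == 8:
        return np.rot90(pixels, 1)                      # rotate 90 degrees counterclockwise
    return pixels
```

These numpy calls return views rather than copies, so wrap the result in `np.ascontiguousarray` if a downstream library insists on contiguous memory.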
Here is code to load any image into a numpy array with the correct rotation applied:
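A minimal sketch of that approach, assuming a reasonably recent Pillow plus numpy. It leans on Pillow’s built-in `ImageOps.exif_transpose` helper rather than hand-written rotation logic, and the function name `load_image_upright` is just an illustrative choice, not the name of any library function:

```python
import numpy as np
from PIL import Image, ImageOps


def load_image_upright(path, mode="RGB"):
    """Load an image file as a numpy array with its Exif Orientation applied.

    `load_image_upright` is an illustrative name, not part of any library.
    """
    with Image.open(path) as img:
        # exif_transpose returns an upright copy: it rotates and/or mirrors
        # the pixels according to the Orientation tag, if one is present.
        upright = ImageOps.exif_transpose(img)
        # Convert to a predictable channel layout (RGB by default) and
        # copy the pixels into a numpy array of shape (height, width, 3).
        return np.array(upright.convert(mode))
```

Images with no Orientation tag come back unchanged, so it is safe to route every image load through a function like this.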
From there, you can pass the array of image data to any standard Python ML library that expects arrays of image data, like Keras or TensorFlow.
Since this comes up so often, I published this function as a library on pip called image_to_numpy. You can install it like this:
pip3 install image_to_numpy
You can use it in any Python program to load an image correctly, like this:
import matplotlib.pyplot as plt
import image_to_numpy

# Load your image file
img = image_to_numpy.load_image_file("my_file.jpg")

# Show it on the screen (or whatever you want to do)
plt.imshow(img)
plt.show()
Check out the readme file for more details.