Concept of Modified Adaptive Thresholding Using Integral Image to Decompose Text under Illumination Effects in Natural Scenes

Text in natural scene

Adaptive thresholding, a simple way to perform image segmentation, is a form of image thresholding that classifies pixels as dark or light. Taking a grayscale image as input works well only when text appears in a low intensity area. When text appears in a high intensity area, it leads to a lower recall rate in the text detection process.

However, this can be fixed by taking an inverted-grayscale image instead. The problem is how to determine automatically whether the normal-grayscale or the inverted-grayscale image is the better input for each original image.

This article proposes a simple way to do that by means of adaptive thresholding using the integral image itself, with some additional steps based on its principle. The proposed method consists of two main processes: low/high intensity area segmentation and modified adaptive thresholding.


Illumination effects are one of the grand challenges for text detection and recognition in natural scenes. They usually affect the text detection step by causing some details to be lost or by making text hard to extract from the background. Many applications with text detection and recognition processes, especially mobile applications, need a solution for this problem.

Shadow and lighting are the most common illumination effects that lower the recall rate of text detection and recognition in real applications. Adaptive thresholding methods have been developed for decades to binarize images while accounting for variations in illumination.

In 1993, Pierre D. Wellner [1] proposed a fast adaptive thresholding method that uses the moving average of the last s pixels seen as the local threshold. He treated the image as though it were a single row of pixels composed of all rows in the image lined up next to each other, and used 1/8th of the image’s width for the value of s, where s denotes the size of the window used to calculate the average.
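The moving-average idea can be sketched in a few lines of Python. This is my own minimal illustration, not code from [1]: the function name `moving_average_threshold` is hypothetical, the window is assumed to include the current pixel, and the "T percent lower than the average" test is borrowed from the thresholding condition given later in this article.

```python
from collections import deque


def moving_average_threshold(gray, t):
    """Binarize a 2-D list of intensities with a running average.

    The rows are scanned one after another as if they formed a single
    long row, and s is taken as 1/8 of the image width.
    """
    width = len(gray[0])
    s = max(width // 8, 1)
    window = deque(maxlen=s)   # the last s pixels seen
    running = 0                # sum of the values currently in the window
    out = []
    for row in gray:
        out_row = []
        for v in row:
            if len(window) == s:
                running -= window[0]   # oldest pixel is about to drop out
            window.append(v)
            running += v
            # Integer form of: v <= average * (1 - t / 100)
            is_dark = v * len(window) * 100 <= running * (100 - t)
            out_row.append(0 if is_dark else 255)
        out.append(out_row)
    return out


# Two rows of 16 pixels: uniform background with one dark pixel.
gray = [[200] * 16 for _ in range(2)]
gray[0][10] = 50
binary = moving_average_threshold(gray, t=15)
```

Because the window only looks backwards along the scan line, the result depends on scan order, which is the weakness the integral-image approach addresses.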

In 2007, adaptive thresholding using the integral image was proposed by Derek Bradley et al. [2]. It is a simple extension of Wellner’s method that gives a better representation of the surrounding pixels than a moving average, at the cost of one additional iteration through the image. This method is clean, straightforward and easy to code.

Both Bradley’s and Wellner’s methods were designed for document images. Fortunately, they can also be applied to natural scene images. However, adaptive thresholding based on these ideas does not work well when text appears in a high intensity area. The methods were aimed at the shadow problem for black print on a white sheet of paper, that is, text appearing in a low intensity area. Therefore, this article proposes a method that works well in both cases: text appearing in high and in low intensity areas.

Image binarization

Related Works

Here I’ll describe the adaptive thresholding using an integral image proposed by Derek Bradley et al. [2]. The integral image (or summed-area table) is a powerful and efficient technique that speeds up the calculation of sums over areas of an image. Mathematically, the value at each point is the sum of all values above and to the left of that point, inclusive.

(left) An original grayscale image and (right) an integral image

Where I is the integral value and i is the intensity at any point (x,y), the integral image I(x,y) can be computed efficiently in a single pass by using

I(x,y) = i(x,y) + I(x-1,y) + I(x,y-1) - I(x-1,y-1)
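As a minimal sketch of this recurrence in plain Python (my own illustration, with the hypothetical name `integral_image`), the table can be padded with one extra row and column of zeros so that I(x-1,y) and I(x,y-1) are defined at the image border. That padding is an implementation assumption, not something the formula requires.

```python
def integral_image(gray):
    """Return the summed-area table of a 2-D list of intensities.

    The table has one extra leading row and column of zeros, so
    table[y + 1][x + 1] holds the sum of gray over rows 0..y and
    columns 0..x.
    """
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (gray[y][x]
                                   + table[y + 1][x]    # I(x-1, y)
                                   + table[y][x + 1]    # I(x, y-1)
                                   - table[y][x])       # I(x-1, y-1)
    return table


gray = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
table = integral_image(gray)
# table[3][3] is the sum of the whole image: 45
# table[2][2] is the sum of the top-left 2x2 block: 1 + 2 + 4 + 5 = 12
```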

Once the integral image has been computed, pixel classification is performed by the following steps.

(i) determining the sum of pixel values, Rs, over a rectangle R covered by a moving window of size S×S. The window size depends on the image’s width W, and the maximum window width is 1/8th of W. Rs, defined by the following formula, is the basis of the local threshold. In the figure, the shaded area R is drawn on the original grayscale image.

Rs = I(x2,y2) - I(x1,y2) - I(x2,y1) + I(x1,y1)

where x1 = x - S/2, x2 = x + S/2, y1 = y - S/2 and y2 = y + S/2.

The sum over rectangle R in the original image

(ii) obtaining the number of pixels in R, denoted C, as follows

C = (x2 - x1) × (y2 - y1)
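Steps (i) and (ii) can be checked on a small example. The sketch below is my own and assumes a zero-padded summed-area table, so that the four-corner formula covers columns x1..x2-1 and rows y1..y2-1 and the count (x2 - x1) × (y2 - y1) matches exactly. The helper names are hypothetical.

```python
def integral_image(gray):
    """Zero-padded summed-area table of a 2-D list of intensities."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (gray[y][x] + table[y + 1][x]
                                   + table[y][x + 1] - table[y][x])
    return table


def rect_sum_and_count(table, x1, y1, x2, y2):
    """Return (Rs, C) for columns x1..x2-1 and rows y1..y2-1."""
    rs = table[y2][x2] - table[y2][x1] - table[y1][x2] + table[y1][x1]
    count = (x2 - x1) * (y2 - y1)
    return rs, count


gray = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
table = integral_image(gray)

# Bottom-right 2x2 block: pixels 5, 6, 8 and 9.
rs, count = rect_sum_and_count(table, 1, 1, 3, 3)
local_mean = rs / count
```

Whatever the window size, the sum costs four table lookups, which is why the local mean Rs/C is cheap to evaluate at every pixel.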

Finally, the adaptive thresholding condition is defined as

i(x,y) ≤ (Rs/C) × (1 - T/100)

where T is a percentage value. In the binarization step, each pixel is set to black if its value satisfies the condition; otherwise, it is set to white.
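Putting the pieces together, the whole procedure might look like the following sketch. It is my own illustration rather than the reference implementation from [2]: the function names are hypothetical, the window is assumed to be clamped at the image borders, and the values of s and t in the example are chosen only for the toy image.

```python
def integral_image(gray):
    """Zero-padded summed-area table of a 2-D list of intensities."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (gray[y][x] + table[y + 1][x]
                                   + table[y][x + 1] - table[y][x])
    return table


def adaptive_threshold(gray, s, t):
    """Set a pixel to 0 (black) if i <= (Rs / C) * (1 - t / 100), else 255."""
    height, width = len(gray), len(gray[0])
    table = integral_image(gray)
    half = s // 2
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Window around (x, y), clamped to the image (an assumption).
            x1, x2 = max(x - half, 0), min(x + half + 1, width)
            y1, y2 = max(y - half, 0), min(y + half + 1, height)
            rs = (table[y2][x2] - table[y2][x1]
                  - table[y1][x2] + table[y1][x1])
            count = (x2 - x1) * (y2 - y1)
            # Cross-multiplied form of the condition, avoiding division.
            if gray[y][x] * count * 100 <= rs * (100 - t):
                out[y][x] = 0
    return out


# 5x5 bright background with a single dark pixel in the centre.
gray = [[200] * 5 for _ in range(5)]
gray[2][2] = 50
binary = adaptive_threshold(gray, s=3, t=15)
```

In this toy image only the centre pixel ends up black, because it is the only one that is at least 15 percent darker than the mean of its window.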

Proposed Idea

A. Analysis of illumination effects

The illumination effects considered are shadow, which decreases the intensity of the affected area, and lighting, which increases the intensity of the affected area. Shadow makes text extraction harder because it reduces the contrast between text and background, while lighting causes detail to be lost within the spot of light.

Adaptive thresholding is an efficient technique that gives good results for text-background decomposition when text appears in a low intensity area. As mentioned in the introduction, binarization using adaptive thresholding does not work well when text appears in a high intensity area.

(left) An original grayscale image with text appearing in low intensity area and (right) the result of image binarization using adaptive thresholding.
(left) An original grayscale image with text appearing in high intensity area and (right) the result of image binarization using adaptive thresholding.

For text appearing in a high intensity area, image binarization using adaptive thresholding loses some details, especially in the text area. This problem can be fixed by taking the complement-grayscale image as input instead, because inverting the grayscale image in this case makes the area around the text dark. In other words, it makes the text appear in a low intensity area.

(left) A complement-grayscale image with text appearing in low intensity area and (right) the result of image binarization using adaptive thresholding.
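The effect of taking the complement can be seen on a toy patch. The sketch below is my own and makes two simplifying assumptions: the lost detail is modelled as one bright pixel on a darker surround (values I picked, not data from the article), and the local mean is taken over the whole 3×3 patch instead of a sliding window. Both function names are hypothetical.

```python
def complement(gray, max_value=255):
    """Invert intensities: i'(x, y) = max_value - i(x, y)."""
    return [[max_value - v for v in row] for row in gray]


def threshold_against_patch_mean(patch, t):
    """Apply i <= mean * (1 - t / 100), with the mean of the whole patch."""
    values = [v for row in patch for v in row]
    total, count = sum(values), len(values)
    return [[0 if v * count * 100 <= total * (100 - t) else 255
             for v in row]
            for row in patch]


patch = [[100, 100, 100],
         [100, 250, 100],
         [100, 100, 100]]

# On the original values no pixel is 15 percent below the mean,
# so the patch comes out entirely white and the detail is lost.
direct = threshold_against_patch_mean(patch, t=15)

# On the complement the bright pixel becomes the darkest one
# and is the only pixel set to black.
inverted = threshold_against_patch_mean(complement(patch), t=15)
```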

As described in the Related Works section, the integral image is used to calculate the sum of values in the upper-left region for each pixel and Rs for the adaptive thresholding condition. These are only part of the binarization; I combine them with an area segmentation method to overcome both the shadow and lighting problems.

Next, I’ll discuss two approaches applied in sequence: (i) segmentation using integral image (SUI), which first divides the image into high and low intensity areas, and (ii) modified adaptive thresholding (MAT), which classifies each pixel as black or white. The MAT algorithm works differently on each of the divided areas.

B. Modified adaptive thresholding

In this subsection, the combination of SUI and MAT for segmentation and binarization is proposed.

The combination between SUI and MAT

First, the method takes a grayscale image for segmentation. The outcome of SUI is the segmented image, which is used as input for the thresholding step. Then, MAT applies adaptive thresholding with different conditions for different areas. SUI classifies each part of the image into one of two areas: high intensity or low intensity. To do this, the sum over the rectangular area is used to calculate a local mean, which is then compared to the mean value of the whole image with the following condition.

Rs/C ≤ M

where M is the global mean, that is, the mean value of the whole image. A pixel is classified as part of the low intensity area, represented with black, if its local mean is less than or equal to the global mean; otherwise, it is classified as part of the high intensity area, represented with white.

(left) Grayscale image and (right) its segmented image
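The segmentation step can be sketched as below. This is my own illustration under a few assumptions: the names are hypothetical, the window is clamped at the image borders, and the comparison of the local mean Rs/C with the global mean M is cross-multiplied to stay in integer arithmetic.

```python
def integral_image(gray):
    """Zero-padded summed-area table of a 2-D list of intensities."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (gray[y][x] + table[y + 1][x]
                                   + table[y][x + 1] - table[y][x])
    return table


def segment_by_intensity(gray, s):
    """Mark low intensity pixels as 0 (black) and high intensity as 255."""
    height, width = len(gray), len(gray[0])
    table = integral_image(gray)
    total = table[height][width]    # sum of all pixels
    n = height * width              # global mean M = total / n
    half = s // 2
    seg = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            x1, x2 = max(x - half, 0), min(x + half + 1, width)
            y1, y2 = max(y - half, 0), min(y + half + 1, height)
            rs = (table[y2][x2] - table[y2][x1]
                  - table[y1][x2] + table[y1][x1])
            count = (x2 - x1) * (y2 - y1)
            # Rs / C <= M, written without division.
            if rs * n <= total * count:
                seg[y][x] = 0
    return seg


# Left half in shadow (50), right half brightly lit (200).
gray = [[50] * 4 + [200] * 4 for _ in range(4)]
segmented = segment_by_intensity(gray, s=3)
```

For this image the global mean is 125, so the four left columns are labelled low intensity and the four right columns high intensity.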

After a pixel is classified, it is binarized using the modified adaptive thresholding. For the high intensity area, the binarization method is similar to Bradley’s adaptive thresholding, but the values are taken from the complement-grayscale image, as described by the following condition.

i’(x,y) ≤ (R’s/C) × (1 - T/100)

where i’ is the value at point (x,y) in the complement-grayscale image and R’s is Rs at point (x,y) computed from the complement-grayscale image. A pixel whose complement value is at least T percent lower than the local mean over the area R’ is set to black; otherwise, it is set to white.

For the low intensity area, the binarization method is different. The condition compares each pixel’s value with the local mean of the values within the area R. All values are taken from the original grayscale image.

i(x,y) ≤ Rs/C

A pixel whose value is lower than or equal to the local mean of the surrounding pixels is set to black; otherwise, it is set to white.

(left) A segmented image and (right) a binarized image
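The two stages can be combined in one pass, as in the following sketch. It is my own reading of the description above, not the author's code: the function names are hypothetical, 8-bit intensities and border clamping are assumed, and R’s is obtained from the identity R’s = 255 × C - Rs, which gives the same value as summing the complement image over the window.

```python
def integral_image(gray):
    """Zero-padded summed-area table of a 2-D list of intensities."""
    height, width = len(gray), len(gray[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            table[y + 1][x + 1] = (gray[y][x] + table[y + 1][x]
                                   + table[y][x + 1] - table[y][x])
    return table


def modified_adaptive_threshold(gray, s, t, max_value=255):
    """Segment each pixel (SUI), then binarize it with the matching rule (MAT)."""
    height, width = len(gray), len(gray[0])
    table = integral_image(gray)
    total, n = table[height][width], height * width
    half = s // 2
    out = [[255] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            x1, x2 = max(x - half, 0), min(x + half + 1, width)
            y1, y2 = max(y - half, 0), min(y + half + 1, height)
            rs = (table[y2][x2] - table[y2][x1]
                  - table[y1][x2] + table[y1][x1])
            count = (x2 - x1) * (y2 - y1)
            if rs * n <= total * count:
                # Low intensity area: i <= Rs / C on the original image.
                is_black = gray[y][x] * count <= rs
            else:
                # High intensity area: i' <= (R's / C) * (1 - t / 100)
                # on the complement image.
                value_c = max_value - gray[y][x]
                rs_c = max_value * count - rs
                is_black = value_c * count * 100 <= rs_c * (100 - t)
            if is_black:
                out[y][x] = 0
    return out


# Dark region with a darker pixel, bright region with a brighter pixel.
gray = [[50] * 4 + [200] * 4 for _ in range(3)]
gray[1][1] = 10
gray[1][6] = 250
binary = modified_adaptive_threshold(gray, s=3, t=15)
```

In this toy image both the darker pixel in the low intensity half and the brighter pixel in the high intensity half are set to black, which is the behaviour the two conditions are meant to give.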

In this way, the proposed method increases the success rate of text localization. That is, as much of the text in the scene image as possible is kept for recognition.


[1] Pierre D. Wellner, “Adaptive thresholding for the DigitalDesk,” Tech. Rep. EPC-93-110, EuroPARC, 1993.
[2] Derek Bradley and Gerhard Roth, “Adaptive thresholding using the integral image,” Journal of Graphics Tools, Vol. 12, No. 2, 2007, pp. 13–21.