Denoising Documents with Background Noise
Nowadays, the ability to convert documents into digital, readable formats has become a real necessity. Printed documents can be converted into digital form by scanning them. A common problem when scanning documents is the "noise" that appears in the image due to the quality of the paper or of the typewriter used to produce it. Noise reduction is therefore one of the preprocessing steps.
Types of Noise in Optical Character Recognition
Gaussian noise
The main source of Gaussian noise in a digital image arises during acquisition. The sensor has inherent noise due to the level of illumination and its own temperature, and the electronic circuits connected to the sensor inject their share of electronic noise.
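As a minimal sketch (assuming NumPy and an 8-bit grayscale image; the noise level sigma is an illustrative choice, not taken from the text), Gaussian sensor noise can be simulated as follows:

import numpy as np

def add_gaussian_noise(image, sigma=15.0):
    # Add zero-mean Gaussian noise with standard deviation `sigma` (assumed value)
    # to an 8-bit grayscale image.
    noise = np.random.normal(loc=0.0, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)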
Salt-and-pepper noise
Fat-tail-distributed, or "impulsive", noise is called salt-and-pepper noise. An image containing salt-and-pepper noise shows dark pixels in bright regions and bright pixels in dark regions. This type of noise is caused by analog-to-digital converter errors and bit errors in transmission. It can be reduced by dark-frame subtraction, median filtering, and dark/bright pixel interpolation.
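For illustration only (the corruption fraction `amount` is an assumed parameter), salt-and-pepper noise can be simulated by flipping a random subset of pixels to the extreme intensities:

import numpy as np

def add_salt_and_pepper(image, amount=0.05):
    # Corrupt a random fraction `amount` of pixels: half become 0 (pepper),
    # half become 255 (salt).
    noisy = image.copy()
    mask = np.random.rand(*image.shape)
    noisy[mask < amount / 2] = 0                          # pepper: dark pixels
    noisy[(mask >= amount / 2) & (mask < amount)] = 255   # salt: bright pixels
    return noisy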
Shot noise
The noise in the darkest parts of an image is usually caused by quantum statistical fluctuations, that is, the variation in the number of photons detected at a given exposure level. This noise is called photon shot noise.
Mathematical Filters Used for Noise Reduction
Median Filtering
Median filtering is a non-linear method used to remove noise from images. It is widely used because it removes noise while preserving edges, and it is particularly effective against impulse-type noise.
The median filter works by moving through the image pixel by pixel, replacing each value with the median value of the neighboring pixels. The neighborhood pattern is called a window, which slides, pixel by pixel, over the entire image area. The median is calculated by first sorting all the pixel values of the window in numerical order and then replacing the pixel under consideration with the median value.
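A minimal sketch of this sliding-window procedure in Python/NumPy (the window size and the edge-replication padding are assumptions, not specified in the text):

import numpy as np

def median_filter(image, window=3):
    # Replace each pixel with the median of its window x window neighborhood.
    # The border is handled by edge replication (an assumed choice).
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + window, j:j + window])
    return out

In practice, an optimized equivalent such as scipy.ndimage.median_filter would normally be used instead of the explicit loops.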
Average Filtering
Average (or mean) filtering is a method of "smoothing" images by reducing the intensity variation between neighboring pixels. The average filter works by moving through the image pixel by pixel, replacing each value with the average value of its neighboring pixels.
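An analogous sketch for the average filter, under the same assumptions about window size and padding:

import numpy as np

def average_filter(image, window=3):
    # Replace each pixel with the mean of its window x window neighborhood.
    pad = window // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    out = np.empty(image.shape, dtype=np.float64)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + window, j:j + window].mean()
    return np.clip(out, 0, 255).astype(image.dtype)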
Filter Comparison (Average and Median Filter)
Existing Models for Background-Noise Removal
Binarization and Thresholding Based Methods
One approach to improving the background quality of grayscale images uses thresholding and binarization techniques. Some sources divide thresholding techniques into two groups. The methods in the first group are global algorithms that use global image characteristics to determine thresholds that separate the pixels of the image into objects and background. The second group uses local image information to compute the thresholds, as in local adaptive thresholding, which uses neighborhood characteristics such as the mean and the standard deviation of the pixels. The methods of the second group are much slower than those of the first, but their accuracy is greater.
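As a hedged illustration of the two groups (the global threshold value, the window size, and the constant k below are assumed parameters; the local rule is a Niblack-style mean-plus-k-times-standard-deviation threshold, named here only as one example of the second group):

import numpy as np

def global_threshold(image, t=128):
    # Group 1: one global threshold for the whole image (t is an assumed value).
    return ((image > t).astype(np.uint8)) * 255

def local_threshold(image, window=25, k=-0.2):
    # Group 2: Niblack-style local threshold, mean + k * std of each pixel's
    # neighborhood (window and k are assumed; k is often negative for dark text
    # on a light background).
    pad = window // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    out = np.zeros(image.shape, dtype=np.uint8)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + window, j:j + window]
            t = patch.mean() + k * patch.std()
            out[i, j] = 255 if image[i, j] > t else 0
    return out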
Fuzzy Logic Based Methods
Improving image quality with fuzzy logic operators is based on mapping the gray levels of the image into fuzzy space; defining a suitable membership function requires experience and prior knowledge. Enhancement with fuzzy operators applies weights proportional to certain image characteristics, such as the average intensity, in order to obtain higher contrast.
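Purely as an illustration of the idea (the membership mapping and the classic intensification operator below are textbook choices, not necessarily the operators the text refers to):

import numpy as np

def fuzzy_intensify(image, max_gray=255):
    # Map gray levels to membership values in [0, 1], apply the classic
    # intensification (INT) operator, and map back to gray levels.
    mu = image.astype(np.float64) / max_gray
    mu_int = np.where(mu < 0.5, 2.0 * mu ** 2, 1.0 - 2.0 * (1.0 - mu) ** 2)
    return np.clip(mu_int * max_gray, 0, 255).astype(np.uint8)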
Histogram-Based Methods
An image histogram is a graphical representation of the intensity distribution in an image: it plots the number of pixels at each intensity value. The histogram of a very dark image has most of its data points on the left and in the center of the graph. Conversely, the histogram of a very bright image with few dark areas has most of its data points on the right and in the center of the graph. Histogram equalization spreads this distribution and thereby enhances the contrast of the image.
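A minimal NumPy sketch of histogram equalization for an 8-bit grayscale image:

import numpy as np

def equalize_histogram(image):
    # Map each gray level through the normalized cumulative histogram,
    # which spreads the intensity distribution and increases contrast.
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)         # lookup table
    return lut[image]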
Approach to remove pepper noise:
Adaptive Median Algorithm
In a natural image, neighboring pixels are strongly correlated: the gray value of each pixel is quite close to those of its neighbors, and edge pixels share the same property. If the value of a pixel is much greater or much less than the values in its neighborhood, the pixel is contaminated by noise; otherwise, it is a valid pixel. In the noise-reduction process, we check each pixel sequentially: if the value of a pixel is greater than the average value in the mask, we judge that the pixel is contaminated by noise and replace it with the median value of the mask; otherwise, we keep the original pixel value unchanged.
This method not only reduces the computation time but also preserves the details of the image as much as possible. The original pixel value is replaced with the median value of the mask, and in the next calculation step the average can make full use of this new pixel value. This forms an iterative process that not only decreases the time complexity but also improves the noise-reduction effect.
The steps of the mixed algorithm are shown below; a short code sketch follows the list:
(1) Slide the mask over the image, overlapping the center of the mask with the image pixel under consideration, the center element f(i, j);
(2) Read the values of the pixels covered by the mask;
(3) Compute the average value of the mask;
(4) Compare the value of the center pixel with the average; if it is greater than the average, find the median value of the mask and set f(i, j) = med; otherwise, keep the original pixel value unchanged;
(5) Repeat steps (1)–(4) until i = j = n.
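A sketch of these steps in Python/NumPy (the mask size n is an assumed parameter; following the text, a pixel is replaced with the mask median only when its value exceeds the mask average, and replacements are written back in place so later masks reuse the updated values):

import numpy as np

def mixed_adaptive_median(image, n=3):
    # Adaptive median "mix" filter as described above.
    pad = n // 2
    f = np.pad(image.astype(np.float64), pad, mode="edge")
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            mask = f[i:i + n, j:j + n]
            if f[i + pad, j + pad] > mask.mean():       # step (4): compare with the average
                f[i + pad, j + pad] = np.median(mask)   # replace with the median
    return np.clip(f[pad:pad + h, pad:pad + w], 0, 255).astype(image.dtype)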
Convolutional neural networks
Convolutional neural networks are formed by neurons with learnable weights and biases. Each neuron receives some inputs, performs a dot product with its weights, and is usually followed by a non-linearity. The whole network still expresses a single differentiable score function, from the raw image pixels at one end to the class scores at the other. However, a CNN differs from an ordinary neural network in that it explicitly assumes its inputs are images, which allows certain properties to be encoded into the architecture. This reduces the number of parameters in the network and makes the forward function more efficient.
Pseudo code for the CNN (convolutional neural network) is shown below; a runnable sketch follows the list.
- First, import pandas, NumPy, and Keras from the Python libraries.
- Initialize the paths of the training, cleaned, and test sample directories.
- Visualize the training samples using the matplotlib library.
- Define the autoencoder, i.e., the layer specification of our neural network.
- The input image array has shape (420, 540, 1). Feed it to the first hidden convolution layer with 32 nodes, followed by another convolution layer with 64 nodes, both using a (3, 3) matrix as their window size and the rectified linear unit as their activation function.
- After every convolution layer, max pooling of size (2, 2) is applied to reduce the data and make the system work faster.
- The decoder is the exact reverse of the encoder: a convolution layer with 64 nodes followed by a layer with 32 nodes, with upsampling to bring the image back to its original size.
- Finally, an output convolution layer with one node, a (3, 3) window, and the sigmoid activation function is applied to produce the output.
- The training images are loaded into X and the cleaned images into Y to train our CNN model.
- Now split the X and Y data into training and test sets using the train_test_split function.
- The model is trained using the fit function with 10 epochs and a batch size of 8.
- Finally, to check the model's results, we feed it the test sample images and compare the output with the input images using matplotlib's plt.subplots function.
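A hedged, runnable sketch of the autoencoder described above, using Keras (the data loading, the optimizer, and the loss function are assumptions; only the layer sizes, window sizes, activations, epochs, and batch size come from the steps above):

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

def build_denoising_autoencoder(input_shape=(420, 540, 1)):
    # Encoder: Conv(32) -> pool -> Conv(64) -> pool; the decoder mirrors it.
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Encoder
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2), padding="same"),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2), padding="same"),
        # Decoder (reverse order, upsampling back to the original size)
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.UpSampling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.UpSampling2D((2, 2)),
        # Output: one node, (3, 3) window, sigmoid activation
        layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")  # assumed choices
    return model

# X holds the noisy scans and Y the cleaned targets, scaled to [0, 1] and shaped
# (num_samples, 420, 540, 1); the loading code depends on the dataset layout.
# X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# model = build_denoising_autoencoder()
# model.fit(X_train, Y_train, epochs=10, batch_size=8,
#           validation_data=(X_test, Y_test))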
Results:
Future Work:
In the future, we wish to explore other types of deep neural networks as a potential solution to the image denoising problem.
We also wish to work on a hardware-based solution for blind people that can perform optical character recognition in real time, so that they can read or listen to printed or even handwritten text. We envision a wearable computer-vision device in the form of glasses that processes the scene in front of a visually impaired person, converts any text to sound, and whispers it into the ear through a hearing aid. Visually impaired users would then be able to walk around unfamiliar places, avoid obstacles, and read newspapers, hoardings, or any other text in real time, gaining much-needed independence.