Document Scanner using OpenCV (HoughLines Approach)

Joel
Analytics Vidhya
Published in
3 min readApr 30, 2020
Original image (left) , Scanned image (right)

In this article steps to make document scanner using python and OpenCV from scratch has been explained. An interesting feature of this article is that Houghlines has been used for border detection.The popular contour method has performance issues with ill lit pictures with noisy boundaries so this is an alternate method you can work with.

The entire process has been explained step wise below.

  1. Import all required libraries
from skimage.filters import threshold_local
import numpy as np
import cv2 as cv

2. Read the image and process the image for edge detection

Image is converted to gray scale and blurring is carried for smoothing the image.Morphological dilation and erosion has been done to remove unwanted features(helps in edge detection).Then canny edge detection is used to find edges in the image.

img=cv.imread("document.png")
gray=cv.cvtColor(img,cv.COLOR_BGR2GRAY)
kernel = np.ones((5,5),np.uint8)
dilation = cv.dilate(gray,kernel,iterations =5)
blur =cv.GaussianBlur(dilation,(3,3),0)
blur= cv.erode(blur,kernel,iterations =5)
edge=cv.Canny(blur,100,200)

3. Using Houghlines to find document edges in the image after canny edge detection

Houghlines has been used to find top n (here,8) lines from the image after canny edge detection. Initially a high threshold (here,300) has been chosen and threshold is lowered iteratively to get the required number of Houghlines. Redundant lines (lines have similar rho and theta values)are removed and top four lines (corresponding to the four edges) are selected.

t=300;j=0while(j<8 and t>0):     
try:linesP=cv.HoughLines(edge,1,np.pi/180,t);j=linesP.shape[0]
except:j=0
t=t-10
lines=linesP.reshape(linesP.shape[0],2)
t=0;c=0;lu=[]
for l in lines:
c=c+1;rho,theta=l
for lt in lines[c:]:
t=0
if(lt[0]!=l[0]):
rhot,thetat=lt;k=abs(lt-l)<[50,0.5]
if(k[0] and k[1]):
t=-1;break
lu.append(l)
lr=np.asarray(lu[:4]);j=np.reshape(lr,[lr.shape[0],1,2])

4. Find intersection of four most likely edges to obtain the corner points of document

Intersections of edges corresponding to perpendicular sides of the border are found and these will be the four corners of the rectangle.

def l_inter(line1, line2):
r1, t1 = line1;r2,t2 = line2
A= np.array([[np.cos(t1),np.sin(t1)],[np.cos(t2),np.sin(t2)]])
b= np.array([[r1],[r2]]);x0,y0=(0,0)
if(abs(t1-t2)>1.3):
return [[np.round(np.linalg.solve(A, b))]]
def points_inter(lines):
intersections = []
for i, g in enumerate(lines[:-1]):
for g2 in lines[i+1:]:
for line1 in g:
for line2 in g2:
if(l_inter(line1, line2)):
intersections.append(l_inter(line1, line2))
return intersections
p=np.asarray(points_inter(j)).reshape(4,2)
Canny edge detection on pre-processed image (left) , Houghlines and corners drawn on original image (right)

5. To flatten the document, we use the methods perspective transform and warp perspective functions are used

Maximum and minimum sum of point coordinates correspond to bottom right and top left corners respectively.Similarly max. and min. difference gives top right and bottom left corners.These are then used to find width and height of the scanned document for the transformation matrix.

r= np.zeros((4,2), dtype="float32")
s = np.sum(p, axis=1);r[0] = p[np.argmin(s)];r[2] = p[np.argmax(s)]
d = np.diff(p, axis=1);r[1] = p[np.argmin(d)];r[3] = p[np.argmax(d)]
(tl, tr, br, bl) =r
wA = np.sqrt((tl[0]-tr[0])**2 + (tl[1]-tr[1])**2 )
wB = np.sqrt((bl[0]-br[0])**2 + (bl[1]-br[1])**2 )
maxW = max(int(wA), int(wB))
hA = np.sqrt((tl[0]-bl[0])**2 + (tl[1]-bl[1])**2 )
hB = np.sqrt((tr[0]-br[0])**2 + (tr[1]-br[1])**2 )
maxH = max(int(hA), int(hB))
ds= np.array([[0,0],[maxW-1, 0],[maxW-1, maxH-1],[0, maxH-1]], dtype="float32")transformMatrix = cv.getPerspectiveTransform(r,ds)
scan = cv.warpPerspective(gray, transformMatrix, (maxW, maxH))

6. Image is converted to B&W and is saved in the system

Binarization is done to convert scanned document to black and white.This is achieved through thresholding. Threshold and offset values can be set according to your requirements.

T = threshold_local(scan,21, offset=10, method="gaussian")
scanBW = (scan > T).astype("uint8")* 255
cv.imwrite("Scan.png",scanBW)

Note:

Various parameters used in the given example has been chosen as per their performance on this case.You may have to try different set of parameters for your case.

You can see the full source code on https://github.com/Joel1712/Document_scanner

--

--