Detecting the vanishing point in one-point perspective images using Computer Vision algorithms

8 min readDec 27, 2021

An image or photograph is called one-point perspective when it contains one and only one vanishing point. All parallel lines extending into the distance would converge to this vanishing point. This type of perspective is very easy to be found in images of roads, railway tracks or buildings.

Computer vision algorithms could be leveraged to identify this vanishing point. As the vanishing point is the intersection of all parallel lines, the line detection algorithm — Canny edge detector + Hough transform would be very helpful. Furthermore, RANSAC is used to locate the most likely vanishing point. In this article, I will explain these algorithms and demonstrate the steps to implement this simple application.

Canny edge detector

Canny edge detector detects edges based on the gradient of each pixel in the image and uses extra steps to improve the edge map. Edges are pixels with a high change in intensity, which is equivalent to the high gradient. The gradient at a single point (a pixel) (x,y) is a vector where the direction is the direction of maximum slope and the magnitude indicates how large the intensity changes. In a 2D image, the gradient is represented by the combination of partial derivatives w.r.t x and y:

Formula to calculate the gradient for each pixel in an image

i is the unit vector in the x-direction and j is the unit vector in the y-direction. f indicates the pixel values/intensity (so delta f indicates the intensity changes).

An extra step before calculating the gradient is blurring the image using a Gaussian filter. This step smooths the image and removes noises. In real life, we normally apply the derivative of the Gaussian filter directly instead of applying the filter and then calculate the derivative. This makes use of the associative property of convolution and reduces the calculation burden.

In the Canny edge detector, there are two extra steps to improve the accuracy of detected edges: non-maximum suppression and thresholding with hysteresis. Non-maximum suppression solves the problem of several detected edges where only one of them should be the true one (as shown below).

From left to right: The image, detected (many) edges, the true edge

Non-maximum suppression claims that if nearby pixels are part of the same edge, only the one with the maximum gradient is kept. We firstly identify the orientation of each edge pixel. Then, for each edge pixel (said E), we assert the two adjacent pixels orthogonal to this edge pixel E. If either adjacent pixel has the same edge orientation as E and a higher magnitude, this pixel E is not an edge (we just remove it from the detected edge pixels).

The next step is thresholding with hysteresis. We need two hyperparameters to do this step: a threshold T1 for strong edges and a threshold T2 for weak edges. For each edge pixel, if its magnitude is greater than T1, it will be classified as a strong edge; if the magnitude is less than T1 and greater than T2, it will be classified as a weak edge. After this, we will update weak edges (this is how “hysteresis” comes!) based on one simple criterion: for each weak edge, if any of the 8-pixel neighbourhoods is a strong edge, we should relabel the weak edge pixel as a strong one. Finally, we have an edge map with all strong edges.

Hough transform for straight lines

Hough transform identify lines in an image based on the edge pixels (which can be found using the Canny edge detector). Each edge pixel would “vote” for the lines that pass through it. Lines are identified based on the number of “votes”. The key idea in the Hough transform is transforming the straight line y = mx + b in the image space into a point (θ, ρ) in the parameter(Hough) space. For each edge pixel, there are infinite lines that pass through it. Those lines can be represented as a single line in the parameter space as shown below.

From left to right: 3 points in the image space, corresponding 3 lines in the hough(parameter) space

The intersection of lines in the Hough space(the blue point in the right figure) represents a line in the image space(the red dotted line in the left figure). It is common to bin the Hough space, i.e. divide the space into small bins and count the number of lines that pass through each bin as the number of “votes”. Each bin in the Hough space corresponds to a line in the image space. Binning can alleviate the effects of noises in the edge map.

We determine lines in the original image based on the number of “votes” of each bin, by comparing it with a preset threshold. Alternatively, we can rank bins based on the number of votes and select the top N bins. Other criteria such as the minimum line length and the maximum allowed gap (to ensure the segments from a line are treated as one) can be used to further justify lines.

RANSAC — RANdom SAmple Consensus

RANSAC is a popular algorithm to iteratively estimate a model based on the observed data that may involve outliers. In each iteration, we sample N data at random from the observed data, calculate the model based on the sampled data and then count the number of inliers, i.e. data that can be explained by the model. After N iterations (or any of the models is satisfactory), the algorithm stops and the best model (the one with most inliers) is returned.

To apply RANSAC in our problem, we iteratively do:

Sample 2 lines at random from N detected lines by Hough transform
Calculate the intersection of two lines, treat the intersection point as the vanishing point (the “model”)
Count the number of lines (inliers) that are explained by the intersection point. The inliers are those lines that go through or very close to the estimated point.

The best model, i.e. the most possible vanishing point is the one with the most inliers.

Algorithm to detect the vanishing point

Understanding the above algorithms, it is very simple and easy to implement a program in Python to estimate the vanishing point. Canny edge detector and Hough transform are provided in the OpenCV library. We can simply call these methods to locate lines in an image.

# Use Canny edge detector to locate edges. Parameters are tuned with some sample images, it may performs unexpected for other images
edge_image=cv2.Canny(original_image,40,70,apertureSize=3,L2gradient=True)
# Use hough transform to detect all lines
lines=cv2.HoughLines(edge_image, 1, np.pi/120, 55)

We do simple filtering to remove horizontal and vertical lines as they would not converge to the vanishing point.

valid_lines = [] # Useful lines for applying RANSAC
for line in lines:
    rho,theta = line[0]
    if (theta>0.4 and theta < 1.47) or (theta > 1.67 and theta < 2.74):
        valid_lines.append(line)

To implement RANSAC, we need two helper functions: A function to calculate the intersection point based on two lines, and a function to calculate the distance from a line to a point which is used to decide inliers later on.

# Function 1: Find the intersection point given 2 lines
def find_intersection_point(line1,line2):
    """Implementation is based on code from https://stackoverflow.com/questions/46565975, Original author: StackOverflow contributor alkasm 
    Find an intercept point of 2 lines model
    Args: line1,line2: 2 lines using rho and theta (polar coordinates) to represent 
    Return: x0,y0: x and y for the intersection point
    """
    # rho and theta for each line
    rho1, theta1 = line1[0]
    rho2, theta2 = line2[0]
    # Use formula from https://stackoverflow.com/a/383527/5087436 to solve for intersection between 2 lines 
    A = np.array([
        [np.cos(theta1), np.sin(theta1)],
        [np.cos(theta2), np.sin(theta2)]
    ]) 
    b = np.array([[rho1], [rho2]])
    det_A = np.linalg.det(A)
    if det_A != 0:
        x0, y0 = np.linalg.solve(A, b)
        # Round up x and y because pixel cannot have float number
        x0, y0 = int(np.round(x0)), int(np.round(y0))
        return x0, y0
    else:
        return None
        
# Function 2: Find the distance from a point to a line
def find_dist_to_line(point,line):
    """Implementation is based on Computer Vision material, owned by the University of Melbourne
    Find an intercept point of the line model with a normal from point to it, to calculate the distance betwee point and intercept
    Args: point: the point using x and y to represent
    line: the line using rho and theta (polar coordinates) to represent 
    Return: dist: the distance from the point to the line
    """
    x0,y0 = point
    rho, theta = line[0]
    m = (-1*(np.cos(theta)))/np.sin(theta)
    c = rho/np.sin(theta)
    # intersection point with the model
    x = (x0 + m*y0 - m*c)/(1 + m**2)
    y = (m*x0 + (m**2)*y0 - (m**2)*c)/(1 + m**2) + c
    dist = math.sqrt((x - x0)**2 + (y - y0)**2)
    return dist

The main RANSAC loop is implemented as mentioned above

def RANSAC(lines,ransac_iterations,ransac_threshold,ransac_ratio):
    """Implementation is based on code from Computer Vision material, owned by the University of Melbourne
    Use RANSAC to identify the vanishing points for a given image
    Args: lines: The lines for the image
    ransac_iterations,ransac_threshold,ransac_ratio: RANSAC hyperparameters
    Return: vanishing_point: Estimated vanishing point for the image
    ""
    inlier_count_ratio = 0.
    vanishing_point = (0,0)
    # perform RANSAC iterations for each set of lines
    for iteration in range(ransac_iterations):
        # randomly sample 2 lines
        n = 2
        selected_lines = random.sample(lines,n)
        line1 = selected_lines[0]
        line2 = selected_lines[1]
        intersection_point = find_intersection_point(line1,line2)
        if intersection_point is not None:
            # count the number of inliers num
            inlier_count = 0
            # inliers are lines whose distance to the point is less than ransac_threshold
            for line in lines:
                # find the distance from the line to the point
                dist = find_dist_to_line(intersection_point,line)
                # check whether it's an inlier or not
                if dist < ransac_threshold:
                    inlier_count += 1

            # If the value of inlier_count is higher than previously saved value, save it, and save the current point
            if inlier_count/float(len(lines)) > inlier_count_ratio:
                inlier_count_ratio = inlier_count/float(len(lines))
                vanishing_point = intersection_point

            # We are done in case we have enough inliers
            if inlier_count > len(lines)*ransac_ratio:
                break
    return vanishing_point

Canny edge detector, Hough transform and RANSAC parameters (ransac_iterations, ransac_threshold and ransac_ratio) can be tuned based on the use cases. The values I use work well for a small set of images but may perform undesirable for other images.