Computer Vision and Autopilots — Car Dash Camera Calibration

Dmytro Brazhnyk
21 min read · Feb 2, 2024


The problem we are trying to solve here is programmatically finding the alignment between the car's orientation and its camera. The most useful representation of that alignment is the point on the camera's video frame toward which the car's velocity is directed.

Typically such a problem can be solved easily with deep learning and neural networks. However, I realized that it can also be solved with a non-trivial but very elegant approach: my background in mathematics led me to apply Least Square Optimization to the optical flow representation of the captured video stream, with no need to train the algorithm on real videos. This approach automatically adapts to any environment, as long as the optical flow provides accurate output.

Problem Statement

Let's say you are developing a car autopilot and using some sort of dash camera for computer vision and navigation. Because there is no standardized way to install that camera, each installation can have a different alignment between the camera and the car's orientation, so dash camera calibration is an important step for car auto-piloting in general.

Before we jump into all the hassle of mathematical formulas and computer algorithms, in this YouTube playlist you can see 10 video samples of the algorithm's execution results:

Solution Overview

To solve the problem, I mainly relied on optical flow combined with Least Square Optimization to detect the point on the video frame toward which the car is moving. Since a car usually moves straight forward most of the time, detecting the car's velocity direction from the camera's video stream is the best way to represent the car-camera alignment: it tells us how the camera is actually oriented relative to the car.

The challenge is that the car does not move straight all the time; due to rocking and shaking, the measured velocity can occasionally be chaotic and scattered randomly around the values we are actually looking for.

So once we can extract a stream of car velocity points over time from the dash camera's video stream, the next step is to filter out outliers and find a good percentile of data close to the real position.

Optical Flow

So the first question is: how can we capture the car's velocity from video? How can we find the point on the video frame toward which the car's velocity is headed?

To solve that problem, there is a very handy representation of visual motion data called optical flow.

Wikipedia explains optical flow as: “Optical flow or optic flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and a scene.”

https://en.m.wikipedia.org/wiki/Optical_flow

Let me rephrase that a bit to make it more understandable. A video stream typically captures how objects move in real life. By processing two or more consecutive video frames with a suitable algorithm, we can compute a vector for every pixel of the frame that describes how and where the object behind that pixel is moving in the video.

For example, suppose we have a static camera recording a street, and a car is moving from the left side of the frame to the right. Feeding two consecutive frames to the algorithm, the optical flow vectors of every pixel belonging to the car will point to the right, and their magnitude will be the car's real 3D velocity projected onto the 2D video frame, to the extent that a 2D pixel-processing algorithm can extract that motion.

What happens if the camera is not static but moving itself, as a car dash camera is? Intuitively, we can expect all optical flow vectors to radiate away from a distant point, the one our velocity is headed toward, and to point backward relative to the car's motion, like in a Star Wars movie when a starship enters hyperspace.

The reason it happens this way: when we drive straight toward a very distant point, all objects move backward relative to us along straight parallel lines, and under perspective projection, which is what a real camera performs, parallel lines of the real 3D world are no longer parallel in the 2D image:

  • On one side, all these parallel 3D lines meet in a single point of the 2D perspective projection, the point toward which the lines recede far away in 3D space.
  • The other side of each 3D parallel line goes outside the camera's 2D clipping space.

Please see the illustration of perspective projection below, where you can clearly see how parallel lines in 3D space intersect in the 2D projection:

So optical flow simply captures this relative backward parallel motion of the objects we pass by, radiating from the distant point toward which our car's movement is directed.
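
As a quick side illustration (a minimal sketch with made-up numbers, not part of the original material), you can verify with a simple pinhole projection model that points travelling along parallel 3D lines project to image points that converge to a single vanishing point:

import numpy as np

# Pinhole projection: a 3D point (X, Y, Z) maps to the image point (f*X/Z, f*Y/Z).
def project(p, f=1.0):
    return np.array([f * p[0] / p[2], f * p[1] / p[2]])

# Two parallel 3D lines that share the direction d but start at different points.
d = np.array([0.2, -0.1, 1.0])       # direction of motion (toward +Z)
p1 = np.array([-2.0, 1.5, 1.0])
p2 = np.array([3.0, -0.5, 1.0])

for t in [1, 10, 100, 1000]:
    print(t, project(p1 + t * d), project(p2 + t * d))

# Both projections converge to the same vanishing point (f*dx/dz, f*dy/dz) = (0.2, -0.1),
# no matter where the lines start.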

In Python it is handy to use the OpenCV library to capture the video stream and convert two consecutive frames into optical flow:

import cv2

cap = cv2.VideoCapture(video_file)  # video_file: path to the dash camera recording

prev_gray = None

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flow = None

    if prev_gray is not None:
        # Dense optical flow: one (dx, dy) vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)

        # flow is ready for further processing

    prev_gray = gray

cap.release()

This picture is a visualization of the optical flow vectors with the largest magnitudes from the video capture I used to test the algorithm, and I think you can see enough optical flow vectors to calculate the car velocity point.

You can see the tree on the right and the one on the left being pushed out of the camera's view space as the car keeps moving forward.

Least Square Optimization

Now that we know what optical flow is, to solve our problem we need to find the point where all the lines projected by the optical flow vectors intersect, since that is the point toward which the car is headed.

In the simplest case, finding the intersection of two straight non-parallel lines projected by two non-collinear vectors is a very simple mathematical problem.
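
For reference, here is a minimal sketch (not from the original code) of that two-line case, representing each line by a point and a direction vector and solving the resulting 2x2 system:

import numpy as np

# Two lines, each given by a point p_i and a non-collinear direction vector d_i:
#   line i: p_i + t_i * d_i
p1, d1 = np.array([0.0, 0.0]), np.array([1.0, 1.0])
p2, d2 = np.array([4.0, 0.0]), np.array([-1.0, 1.0])

# Solve p1 + t1*d1 = p2 + t2*d2 for (t1, t2).
A = np.column_stack([d1, -d2])
t = np.linalg.solve(A, p2 - p1)
print(p1 + t[0] * d1)   # -> [2. 2.], the intersection point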

However, in our case we have an entire image of such vectors, where each pixel has its own optical flow vector and corresponding projected line, and the challenge is that, due to noise and the limited accuracy of the optical flow calculation, not every vector points at the same point.

So the problem we are now trying to solve is to find the point that is closest to all of the straight lines projected by all of the optical flow vectors.

I mentioned that we will use Least Square Optimization (LSO) for that, and for LSO we need to define an evaluation function. To find the point closest to all projected lines, the evaluation function will measure the distance between the candidate point and the projected lines, and we will minimize its value using LSO.

To define the distance between a point and a line, we will use the formula below:

ax + by + c = d

Where

  • d - the distance from the point (x, y) to the line
  • (a, b) - a unit vector perpendicular to the line
  • c - the signed distance from the coordinate origin (0, 0) to the closest point on the line

And it should be obvious that any point satisfying the equation ax + by + c = 0 lies on that line.

In this article I don't intend to go too deep into the details of linear algebra and geometry, but if you want to learn more about finding the distance between a point and a line, you can refer to the Wikipedia article below:

https://en.wikipedia.org/wiki/Distance_from_a_point_to_a_line

The meaning of a unit vector is explained here:

https://en.wikipedia.org/wiki/Unit_vector

Now, if we know an optical flow vector (vx, vy) and its pixel position (px, py), we can find (a, b, c) in the following way: the line passes through the pixel in the direction of the flow, so its normal is (a, b) = (vy, -vx) and c = -(a*px + b*py).
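
A quick sanity check of this construction, with purely illustrative numbers: both the pixel itself and a point one flow step further along the vector satisfy ax + by + c = 0, so the line really does pass through the pixel in the direction of the flow.

# Illustrative values only: one optical flow vector (vx, vy) at pixel (px, py).
px, py = 120.0, 80.0
vx, vy = 3.0, -2.0

a, b = vy, -vx                 # normal of the line along the flow direction
c = -(a * px + b * py)

print(a * px + b * py + c)                # 0.0, the pixel lies on the line
print(a * (px + vx) + b * (py + vy) + c)  # 0.0, one flow step further also lies on it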

For least square optimization we need to define the function we are trying to minimize. We want a point whose distance to all the lines projected by the optical flow vectors is minimal, so as the optimization function we take the sum of squared distances between the point and the lines:

LSO(x, y) = Σ (a_i*x + b_i*y + c_i)^2

and we want to find the point (x, y) where LSO is minimal, where a_i, b_i, c_i are known arguments calculated from the optical flow vectors.

To find the x, y where LSO takes its minimum value, we find the roots of the derivatives of that expression. Please refer to the wiki to learn more about what derivatives are:

https://en.wikipedia.org/wiki/Derivative

As a result, we get an easily solvable system of linear equations:

Σ a_i*(a_i*x + b_i*y + c_i) = 0
Σ b_i*(a_i*x + b_i*y + c_i) = 0

It can be written in matrix form:

| Σ a_i*a_i   Σ a_i*b_i |   | x |   | -Σ a_i*c_i |
| Σ a_i*b_i   Σ b_i*b_i | * | y | = | -Σ b_i*c_i |

and solved by Gaussian elimination or any other method:

https://en.wikipedia.org/wiki/Matrix_(mathematics)

https://en.wikipedia.org/wiki/System_of_linear_equations

https://en.wikipedia.org/wiki/Gaussian_elimination

And of course, Python has nice library functions to solve such equations:

https://numpy.org/doc/stable/reference/generated/numpy.linalg.solve.html

As a result, (x, y) is the point on the screen toward which the car is directed.
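
For readers who prefer a plain NumPy version, here is a minimal sketch of that 2x2 solve (assuming a, b, c are 1-D arrays already computed from the optical flow vectors as described above; the full TensorFlow-based implementation follows in the next sections):

import numpy as np

def solve_flow_intersection(a, b, c):
    # Normal equations of the least squares problem: a single 2x2 linear system.
    lhs = np.array([
        [np.sum(a * a), np.sum(a * b)],
        [np.sum(a * b), np.sum(b * b)],
    ])
    rhs = np.array([-np.sum(a * c), -np.sum(b * c)])
    x, y = np.linalg.solve(lhs, rhs)
    return x, y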

Precise car motion detection

For a more sophisticated scenario where the car moves forward while rotating around one of its axes, a slightly more elaborate formula is needed, with a few extra variables (u, v) in addition to (x, y). Here (x, y) still represents the point toward which the car is directed, while (u, v) represents the car's rotational motion.

However, in our case, to find the camera alignment we only need a few good frames where the car is moving straight forward; any rotation or change of driving direction just creates noise for the camera alignment calculation.

Below I describe a few tricks for filtering that noise out of the calculation in order to calibrate the camera.

LSO Code

How to capture the video stream and calculate optical flow with the OpenCV library was shown above.

Here is the code that calculates the intersection point of the lines projected by the optical flow, which is also the car velocity point, using the formulas described above:

import numpy as np
import tensorflow as tf

def extract_flow_points_tf(flow):
    """This method converts the 3D flow array calculated by OpenCV
    into a set of 1D arrays where:
    (qvx[i], qvy[i]) - is the i-th optical flow vector
    (qx[i], qy[i]) - is the pixel coordinate of the i-th optical flow vector
    """
    h, w, _ = flow.shape

    flow_tf = tf.constant(flow, dtype=tf.float64)
    qy, qx = tf.meshgrid(tf.range(h), tf.range(w), indexing='ij')

    qx = tf.cast(tf.reshape(qx, (-1, )), dtype=tf.float64)
    qy = tf.cast(tf.reshape(qy, (-1, )), dtype=tf.float64)
    flattened_flow = tf.reshape(flow_tf, [-1, tf.shape(flow_tf)[-1]])
    qvx = flattened_flow[:, 0]
    qvy = flattened_flow[:, 1]

    return {
        "qx": qx,
        "qy": qy,
        "qvx": qvx,
        "qvy": qvy
    }

def list_of_lines_tf(q):
    """Calculates (a, b, c) per the formula described above."""
    l_a = q["qvy"]
    l_b = -q["qvx"]
    l_c = -(l_a * q["qx"] + l_b * q["qy"])

    return {
        "l_a": l_a,
        "l_b": l_b,
        "l_c": l_c
    }


def find_move_direction_by_lines_tf(l):
    """Finds the point (x, y) by solving the normal equations."""

    cnt = np.linalg.solve(
        np.array([
            [tf.math.reduce_sum((l["l_a"] * l["l_a"])), tf.math.reduce_sum((l["l_a"] * l["l_b"]))],
            [tf.math.reduce_sum((l["l_a"] * l["l_b"])), tf.math.reduce_sum((l["l_b"] * l["l_b"]))],
        ]), np.array(
            [-tf.math.reduce_sum((l["l_a"] * l["l_c"])), -tf.math.reduce_sum((l["l_b"] * l["l_c"]))]
        )
    )

    return {
        "c_x": cnt[0],
        "c_y": cnt[1]
    }

You may notice that I did not normalize the optical flow vectors into unit vectors, unlike the way I described the straight line function. I mentioned that (a, b) should be a unit vector; however, the reason I did not normalize (a, b) is explained in more detail in the next chapter, about noise suppression.

Noise Suppression

The sample optical flow visualization I shared above actually shows only a few optical flow vectors, not all of them, even though the optical flow provides a vector for every pixel on the screen. To make the visualization as representative as possible, I drew only those optical flow vectors whose magnitude was above a certain threshold.
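
For illustration, here is a minimal sketch of how such a thresholded visualization could be drawn; the threshold and grid step are arbitrary illustrative values, not the ones used for the pictures in this article:

import cv2
import numpy as np

def draw_flow_above_threshold(frame, flow, threshold=2.0, step=16):
    # Draw only the optical flow vectors whose magnitude exceeds the threshold,
    # sampled on a sparse grid so the picture stays readable.
    vis = frame.copy()
    h, w = flow.shape[:2]
    for y in range(0, h, step):
        for x in range(0, w, step):
            vx, vy = flow[y, x]
            if np.hypot(vx, vy) > threshold:
                cv2.arrowedLine(vis, (x, y), (int(x + vx), int(y + vy)),
                                (0, 255, 0), 1, tipLength=0.3)
    return vis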

Obviously, vectors with small magnitude are not good descriptors of the car's velocity. For example, you can see the optical flow vectors at the bottom of the screen, which belong to the car's dashboard: the dashboard is static and has no motion relative to the camera, so its small noisy vectors point randomly in different directions, and their magnitude barely crosses the threshold.

On the other side, we see quite large magnitudes for the road markings and the trees on the roadsides: the faster the car moves, the larger the magnitude of these objects, and it is precisely these objects that represent the car's motion best.

If I instead draw the optical flow with all vectors normalized to the same length, the picture looks like this:

Obviously the optical flow vector magnitude matters; however, it is not the only parameter we should take into account to filter the noise.

Let's enumerate which attributes of the optical flow we can use to measure the quality of each optical flow vector.

Quality of model evaluation list

First, let's recall which parameters we have:

  1. At this point we are able to find the point (x, y) toward which the car is moving
  2. We can keep a history of the series of points (x, y) calculated in the past
  3. Each optical flow vector has a magnitude
  4. Each optical flow vector has a direction
  5. Each optical flow vector is attached to a certain pixel position on the video frame

Using the parameters above, we can compute certain quality measures, which I have split into two categories:

  • We can evaluate the quality of each individual optical flow vector and drop from the calculation those vectors that appear to be outliers and only degrade the accuracy of our model
  • The series of calculated car direction points (x, y) is also a subject of quality evaluation.

Optical flow vector magnitude

The first and most obvious parameter to take into consideration is magnitude.

I tested a couple of approaches here:

  1. One approach is to introduce a threshold and drop any optical flow vector whose magnitude is below it
  2. Another approach is to use the magnitude as a weight, so that a longer optical flow vector has more impact on the calculation than a smaller one.

Of course, I started with the first approach as the most obvious one. I began dropping vectors that did not pass a certain threshold; dropping such vectors is actually very good for visualization, so my assumption was: if I can see it, I can calculate it. However, when I ran a few more tests, I did not see much difference whether I applied the magnitude cut-off or not.

However, recall this function, which calculates the distance between any point on the screen and the line projected by an optical flow vector, and which is used in the LSO:

ax + by + c = d

I found a very significant difference between the case where my arguments (a, b, c) were normalized and the case where they were proportional to the optical flow vector magnitude. Normalizing (a, b, c) significantly reduced the calculation accuracy compared to the proportional approach, even with a cut-off.

The rationale: if you multiply all the arguments by some coefficient m, you get

(m*a)x + (m*b)y + (m*c) = m*d

This still represents the same straight line, but the distance value used by the LSO is increased proportionally to the coefficient m, so the LSO pays attention to the larger (a, b, c) first, and only then to the smaller ones.

If you remember the code sample above, when I calculated (a, b) I intentionally did not normalize that vector, and as a result the length of the vector (a, b) equals the length of the optical flow vector, so a longer optical flow vector has more voting power in the LSO calculation.
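
For comparison, a normalized variant of list_of_lines_tf might look like the sketch below (a hypothetical helper, not part of the code above; eps guards against zero-length vectors), which makes the difference explicit:

import tensorflow as tf

def list_of_lines_tf_normalized(q, eps=1e-9):
    # Same construction as list_of_lines_tf, but (a, b, c) is divided by the
    # flow magnitude, so every line gets equal weight in the LSO.
    mag = tf.sqrt(q["qvx"] * q["qvx"] + q["qvy"] * q["qvy"]) + eps
    l_a = q["qvy"] / mag
    l_b = -q["qvx"] / mag
    l_c = -(l_a * q["qx"] + l_b * q["qy"])
    return {"l_a": l_a, "l_b": l_b, "l_c": l_c}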

Optical flow vector magnitude threshold

Regarding cutting off optical flow vectors by magnitude: it might seem like a good idea, but during my tests, in most cases I did not find a significant difference. In one case I did notice one: when there was a lot of road traffic moving in the opposite direction to me, that traffic generated many optical flow vectors with high magnitude, so many smaller vectors with the right direction were not taken into account, even though they were the majority. Since we already use the optical flow vector magnitude as voting power in the calculation, the cut-off threshold is redundant.

And regarding that oncoming traffic, the best way to suppress that noise is to rely on the Law of Large Numbers, which states that the average of the results obtained from a large number of independent and identical random samples converges to the true value, if it exists. So to increase accuracy, instead of cutting off, I decided to increase the number of optical flow vectors in the calculation and not apply any cut-off at all.

https://en.wikipedia.org/wiki/Law_of_large_numbers

Initially I tried to normalize all optical flow vectors, and the results were unsatisfying on almost every video: the number of outliers is quite significant, and even though they are not the majority, they still degraded the accuracy of the calculation.

However, when I used the optical flow vector magnitude as voting power, without any cut-off, the accuracy of the calculation increased, especially on the video samples with oncoming traffic.

I then found a few more ways to increase the accuracy of the calculation, described in the next chapters below.

Law of large numbers and Gaussian distribution

I have already touched on the Law of Large Numbers; now let me show the distribution of squared deviations from my calculations. The histogram below represents data from the first pass of the optical flow LSO. To build the histogram buckets, I used the squared distance from the line projected by each optical flow vector to the point that was found.

https://en.wikipedia.org/wiki/Law_of_large_numbers

The axes of the chart are described below; to compute the x-bucket of each value, I used the "d" variable from the formula below.

ax + by + c = d

  • x-axis is the deviation in squared pixels: for each optical flow vector, it is the squared distance between the line projected by that vector and the point we just found with the LSO method.
  • y-axis is the percentage, i.e. the frequency of optical flow vectors with that deviation.

As you can see from the chart below, the largest share of the data has a small deviation, and the amount of data decreases as the deviation grows, which confirms that the Law of Large Numbers is applicable here. A Gaussian distribution normally has the recognizable bell shape; since my LSO uses squared values, we observe only the positive half of that bell, squared.
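
Here is a minimal sketch of how such a histogram could be produced from the structures defined earlier (not the exact plotting code used for the chart); note that since (a, b) is not normalized, the value is scaled by the flow magnitude:

import matplotlib.pyplot as plt

def plot_squared_distance_histogram(l, c, bins=50):
    # Squared residual d^2 = (a*x + b*y + c)^2 for every optical flow line,
    # evaluated at the point (c_x, c_y) found by the LSO.
    d = l["l_a"] * c["c_x"] + l["l_b"] * c["c_y"] + l["l_c"]
    d2 = (d * d).numpy()
    plt.hist(d2, bins=bins, density=True, color='skyblue', edgecolor='black')
    plt.xlabel("squared distance to the LSO point")
    plt.ylabel("frequency")
    plt.show()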

Optical flow angle deviation

Since we are looking for the intersection of many lines projected by the optical flow vectors, one metric for deciding whether a given optical flow vector is an outlier is the angle between the optical flow vector and the vector pointing from the corresponding pixel to the intersection point of the projected lines. Remember that optical flow points in the direction opposite to the car's motion, so ideally that angle should be close to 180 degrees.

The easiest way to get the angle is to use the vector dot product. The vectors need to be normalized to obtain the angle accurately.

https://en.m.wikipedia.org/wiki/Dot_product

The code for filtering outliers by angle is below.

Meaning of the variables:

  • (qvx[i], qvy[i]) - the i-th optical flow vector
  • (qx[i], qy[i]) - the pixel coordinate of the i-th optical flow vector
  • (c_x, c_y) - the intersection point of the lines projected by the optical flow

The calculation, briefly explained:

  1. To calculate the angle accurately, I normalize the vectors first
  2. As the 1st vector, I take the optical flow vector
  3. As the 2nd vector, I take the vector from the intersection point to the optical flow pixel (so a well-directed flow vector is nearly parallel to it, and the cosine of the angle is close to +1)
  4. As the threshold, instead of the angle itself, the cosine of the angle is used, which is exactly what the dot product of the normalized vectors gives
  5. The function is parametrized with a percentile: instead of filtering to some hardcoded angle, it filters out a given percentile of the most deviated data before the next calculation

import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp

plot_debug_mode = False  # set to True to plot the cosine histogram on every call

def vector_len(v_x, v_y):
    return tf.sqrt(v_x*v_x + v_y*v_y)

def vector_normalize(v_x, v_y):
    v_len = vector_len(v_x, v_y)
    return (v_x/v_len, v_y/v_len)

def filter_wrong_directed(q, c, percentile):

    # 1st vector: the optical flow vector itself.
    v1_x, v1_y = vector_normalize(q["qvx"], q["qvy"])
    # 2nd vector: from the intersection point (c_x, c_y) to the pixel (qx, qy).
    v2_x, v2_y = vector_normalize((q["qx"] - c["c_x"]), (q["qy"] - c["c_y"]))

    # Cosine of the angle between the two normalized vectors.
    direction_cos = (v1_x * v2_x + v1_y * v2_y)

    # Zero-length vectors produce NaN after normalization; compute the
    # percentile threshold over the finite values only.
    finite_cos = tf.boolean_mask(direction_cos, tf.math.is_finite(direction_cos))
    threshold = tfp.stats.percentile(finite_cos, percentile).numpy()

    if plot_debug_mode:
        print("Direction histogram")
        plt.hist(finite_cos.numpy(), bins=50, color='skyblue', edgecolor='black', density=True)
        plt.show()

    # Keep only vectors above the threshold (NaN compares as False,
    # so non-finite entries are dropped as well).
    accuracy_index = tf.where(direction_cos > threshold)

    return {
        "thr": threshold,
        "qx": tf.gather_nd(q["qx"], accuracy_index),
        "qy": tf.gather_nd(q["qy"], accuracy_index),
        "qvx": tf.gather_nd(q["qvx"], accuracy_index),
        "qvy": tf.gather_nd(q["qvy"], accuracy_index)
    }

Cycle of diversified models for outlier reduction

I have always been excited by the philosophical concept proposed by Hegel of thesis, antithesis, and synthesis; ancient Chinese philosophy noticed the same metaphysical pattern and expressed it in the concepts of Yin and Yang.

This concept is often used in engineering to create motion or to describe cycles of evolution, and in my case I also found an application for it: increasing calculation accuracy by iterating through evaluation cycles that alternate between two different models, like Yin and Yang.

I used two different mathematical models to describe the same problem, where each model describes deviation and mean in a different way, and then alternated one approach after the other in a cycle; the precision increases greatly with each evaluation.

Initially I was trying to use a single model, taking both the mean and the deviation from the same formula:

ax + by + c = d

However, when the same formula is used both for the mean and for the deviation that sorts out outliers, the mean almost never improved from one evaluation to the next.

But when I started using the second model, measuring deviation and sorting out outliers by angle, the accuracy of the mean found by the LSO with ax + by + c = d increased with every cycle of evaluation.

How to filter out outliers using the angle deviation was described above:

def filter_wrong_directed(q, c, percentile):
....

The code that handles the evaluation cycles is provided below; the listings of the reused methods can be found above:

def find_move_direction_by_flow_tf_v2(
        flow,
        direction_percentile = 30,
        stop_dot_value = 0.95
):
    # First pass: use every optical flow vector.
    q = extract_flow_points_tf(flow)
    l = list_of_lines_tf(q)
    c = find_move_direction_by_lines_tf(l)

    r = None

    while True:
        # Drop the worst-directed percentile of vectors and re-run the LSO.
        q = filter_wrong_directed(q, c, direction_percentile)
        l = list_of_lines_tf(q)
        c = find_move_direction_by_lines_tf(l)

        r = {
            "c": c,
            "q": q
        }

        # Stop once the remaining vectors are all well aligned with the found point.
        thr = q["thr"]
        if thr > stop_dot_value:
            break

    return r

As you may notice, instead of a fixed angle threshold I use a percentile, so the angle threshold is dynamic: with every new iteration of the cycle, taking a percentile tightens the threshold, narrowing the calculation down to only the good optical flow vectors. By using a smaller percentile and running more iterations, I was able to increase the calculation accuracy significantly.

Empirically, I found that filtering out only 10% per iteration gave the best accuracy, so there is no reason to go below 10%; however, 30% required fewer iterations, and 30% offers the best balance between performance and calculation accuracy. Whatever is slightly less accurate at 30% compared to 10% is noise that will still be removed during the post-processing phase described below.

Here is a series of video frames, one for each iteration of that cycle. This visualization shows which optical flow vectors remain at each iteration.

1st iteration: every optical flow vector is considered in the calculation

2nd iteration: 30% of the optical flow vectors have been dropped

4th iteration: a bit more of the optical flow vectors dropped

Here we are done with all iterations; almost all optical flow noise has been cleaned out, keeping only the quality vectors. The 6th iteration is the last one:

Camera calibration

And now, the last part of noise suppression.

Since we are processing video, we can think of it as a stream of calibration points over time. How well the car direction point is calculated depends on many factors, such as:

  • Car speed
  • Street brightness and video contrast
  • Amount of visible objects that form optical flow vectors
  • Accuracy of the optical flow algorithm
  • Objects that have their own velocity, like other cars
  • Car rocking and shaking
  • Changes of driving direction, road turns.

As part of camera calibration, since the camera is mounted statically most of the time, the stream of calculated points needs to be post-processed to obtain the actual car-camera alignment.

The logic for post-processing is quite straightforward:

  1. At all times we keep a couple of seconds of the best data
  2. To get the calibrated value, we take the best 30 percent of that data by quality and compute its mean
  3. Once a new data point arrives, we replace the worst value in the history with the new value
  4. As a result, the history keeps only the best-quality data, and the mean of the best 30 percent is used as the result, cutting off all outliers.

import numpy as np

def sort_by_deviation(data, mean = None):
    # Sort samples by their squared deviation from the mean (best first).
    if mean is None:
        mean = np.mean(data)

    deviation = (data - mean)
    deviation = deviation * deviation
    indices = np.argsort(deviation)
    sorted_data = data[indices]
    return sorted_data

def smooth_calib(data):
    result = np.copy(data)
    # Warm-up window: the first 100 samples (a couple of seconds of video).
    prev_data = sort_by_deviation(data[:100])
    result[:100] = np.mean(prev_data[:30])
    for i in range(100, len(data)):
        # Replace the worst (most deviated) sample with the new data point,
        # re-sort by deviation, and take the mean of the best 30 samples.
        prev_data[-1] = data[i]
        prev_data = sort_by_deviation(prev_data)
        result[i] = np.mean(prev_data[:30])
    return result
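
A hypothetical usage sketch (cx_history and cy_history are placeholder names for the per-frame points collected from find_move_direction_by_flow_tf_v2 over the whole video):

# calibrated horizontal and vertical coordinates of the car direction point
calib_x = smooth_calib(np.array(cx_history))
calib_y = smooth_calib(np.array(cy_history))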

Here you can see a few different examples of the calculated car point, before and after post-processing.

1st sample: the best quality achieved by the LSO

Before post-processing, the noise range was 0.0 to 0.1

After post-processing, the noise range was reduced about 3x, to 0.030 to 0.035

2nd sample: the car turns to the right and then to the left somewhere between 30 and 35 seconds.

Before post-processing, the noise range was 0.0 to 0.2

After post-processing, the noise range was reduced about 10x, to 0.02 to 0.04

3rd sample: the worst quality captured by the LSO:

Before post-processing, the noise range was -0.6 to 0.4

After post-processing, the noise range was reduced about 50x, to -0.01 to 0.03

Averaging

Averaging the data is another way to post-process it. The idea is to take the mean of the last n points.

import numpy as np

def smooth_data(data, n):
    """Sliding-window mean: each output sample is the mean of n input samples
    centered (as far as possible) around the corresponding position."""
    result = np.copy(data)

    if n <= 1:
        return result

    for i in range(len(data)):
        # Window [arr_st, arr_end) of size n around position i,
        # clamped to the array boundaries.
        arr_st = i - int(n / 2)
        if arr_st < 0:
            arr_st = 0
        arr_end = arr_st + n
        if arr_end > len(data):
            arr_end = len(data)
            arr_st = arr_end - n
            if arr_st < 0:
                arr_st = 0
        if arr_st == arr_end:
            break
        result[i] = np.sum(data[arr_st: arr_end]) / (arr_end - arr_st)
    return result

It is not that good for camera calibration, but it can describe the direction of the car quite accurately, especially when the car changes its direction of movement; please see the chart and video below:

At the beginning of the video you can notice that the direction cursor is shifted slightly to the left. If you look closely at where the left lane is overlapped by the car hood pixels, you will notice that the driver drifts slightly to the left for a couple of seconds before making a right turn onto the ramp; during the right turn, the direction cursor moves to the right. So in averaging mode the cursor quite accurately shows where the wheels are steering the car in that video.

Conclusion

At this point we are done with all four steps of camera calibration:

  • We generated optical flow from video frames
  • We used LSO to find the car's motion direction
  • We used angle deviation to drop outlier optical flow vectors and increase the accuracy of the LSO calculation
  • And finally we used post-processing to perform the actual calibration

As you can see, the approach is very logical and mathematically grounded. It provides a certain level of calculation accuracy, and I have pointed out a few places where it can be improved even further.

Besides the theoretical grounding, the provided videos also demonstrate the feasibility of the approach.

ML or not ML

Machine learning and neural network image processing can, of course, also solve this camera calibration problem. A neural network is itself a very accurate, but slow, way to compute optical flow.

Which approach is actually better: ML, or the elegant solution described above?

There are pros and cons to both.

The main disadvantage of a neural network, which is also its advantage, is that it can only learn from what it has seen before. The advantage is that, if there is any problem with the NN model, fixing it is as simple as retraining on a few more videos where the NN made a mistake.

Using LSO with optical flow requires no videos to learn from: the algorithm is supposed to work in any conditions, since it requires no training at all, which is good. On the downside, every algorithm has its own limitations; with proper dedication it can be improved, but that may not be as simple as just feeding more videos to an NN. On the other hand, this algorithm can show much better resilience than an NN in very unusual conditions, since it uses well-defined logic for the calculation.

For reliability it is worth considering redundancy: if the system is built from multiple different approaches, where the final evaluation is fed by the results of several models, the overall system becomes more resilient than one depending on a single approach.

About Author

https://dmytrobrazhnyk.wordpress.com
