On May 1, 2017, I asked myself the question: Can I learn the necessary computer science to build the software part of a self-driving car in one month?

On May 22, 2017, after 26 hours of learning and coding, I found out that the answer was yes.

During the month of May, I documented my entire learning process in a series of 31 daily blog posts, which are compiled here into a single narrative. In this article, you can relive my month of insights, frustrations, learning hacks, and triumphs, as I strive towards monthly mastery.

Today, I start a new month and a new challenge: Can I learn the necessary computer science to build the software part of a self-driving car in one month?

Defining success

To fully build a self-driving car, I would need to build 1. Self-driving car software, and 2. Self-driving car hardware. The software would use sensor and input data to algorithmically generate driving instructions, and the hardware would execute these driving instructions within the actual car.

Since much of the technical challenge is actually in the software part (and also because I don’t own a car), I will be exclusively focusing on the software part for this month’s challenge.

In particular, I want to build self-driving software that can do two things:

  1. Based on video input of the road, the software can determine how to safely and effectively steer the car.
  2. Based on video input of the road, the software can determine how to safely and effectively use the car’s acceleration and braking mechanisms.

I may attempt to tackle each of these pieces separately or together. I’m not sure yet. I’m also not completely sure about the specific details of each of these sub-challenges, but I will flesh out these details more seriously once I’ve done some initial research.

There may be other important considerations for self-driving car software that aren’t included in these two buckets, but these are the items that I will be focusing on this month. From these items, it’s clear that I’m mainly looking to learn how to use machine learning/deep learning to solve computer vision problems (I will explain what this all means in a future post).

My starting point

Building self-driving car software clearly requires some amount of computer science knowledge, and, in this regard, I’m not starting from scratch.

Firstly, my degree from Brown is in math, which is quite helpful for this particular branch of computer science. I also have a general coding/computer science background, which will certainly help me.

Most interestingly, last summer, I published a few pieces of fan-fiction that I generated using a relevant machine learning technique — like this AI-written Harry Potter chapter — but this was mostly a testament to the availability of high-quality open-source code, and not my machine learning knowledge.

Last summer, I also took a math-based course on deep learning (deep learning is a sub-category of machine learning… which is a subcategory of artificial intelligence… which is a subcategory of computer science). This course was interesting, but it was purely theoretical, not practical.

This month, I’m heavily focused on application, not theory, and on computer vision, which is something I have zero experience with.

Basically, I have some foundational experience, but not enough experience to know where to start (this will require some research over the next few days).


Anyway, I have no reasonable estimate of how hard this is actually going to be, so this will certainly be a fascinating month. I’m excited to get started…

The Linear Method of Learning

When attempting to understand a broad field of study (like the underlying computer science of self-driving cars), it’s often difficult to know where the right entry point is.

As a result, most people assume that the best path forward is to start with the basics, build up a general foundation of knowledge, and then continue towards finer and finer levels of detail.

I call this the Linear Method of Learning.

Using the linear method to learn the computer science of self-driving cars would look something like this:

  1. Learn multivariable calculus
  2. Learn linear algebra
  3. Learn basic computer science fundamentals
  4. Learn about general machine learning concepts
  5. Learn about computer vision concepts
  6. Learn how to code in Python (a coding language commonly used for machine learning)
  7. Learn how to use TensorFlow (a special machine learning library for Python)
  8. Learn how computer vision is applied to create self-driving car software
  9. Learn how to write Python and TensorFlow code to build the relevant programs
  10. Etc…

While this method may eventually work, it’s inefficient and probably not effective.

Firstly, if I start by learning multivariable calculus, how do I know which parts of multivariable calculus are relevant to self-driving cars and which parts aren’t? I don’t. So, I’ll have to learn all of it. Same for linear algebra, and computer science fundamentals, etc.

In other words, if I start with the most general pieces of knowledge, I have no way to prioritize what I’m learning, and so, I’m ultimately forcing myself to learn everything just in case.

Additionally, because I’m first learning the foundational concepts in a general, abstract sense, it’s much harder for me to relate what I’m learning to things I already know. Since effective learning is essentially figuring out how to attach new pieces of information to currently existing knowledge in a meaningful way, the Linear Method of Learning also fails in this regard.

So, while most people approach learning in this linear fashion, it’s a pretty poor method to actually learn anything in a reasonable timeframe.

The V-Method of Learning

Instead, I use a different method, which I call the V-Method of Learning.

Here’s how the V-Method of Learning works:

  1. I start with a specific, well-documented example of my end goal
  2. I try to understand how this example works
  3. For everything I don’t understand about the example, I research the underlying concepts
  4. If I don’t understand the underlying concepts, I research the underlying concepts of the underlying concepts, until I feel I’ve exhausted this path (either by reaching understanding or by reaching a point of diminishing returns)
  5. Eventually, I burrow down enough different paths to start seeing patterns in the important underlying concepts
  6. I study these relevant underlying concepts, slowly working my way up the knowledge chain, until I’m back at the level of detail of the original example
  7. Finally, I reproduce the example based on my new hierarchical knowledge

I call this the “V-Method” because I start at the finest level of detail, dive deep towards the directly-applicable foundational concepts, and then work my way back up towards the finest level of detail — a conceptual V.

The V-Method is much more effective than the Linear Method because I’m able to 1. Learn in the order of relevance to my ultimate goal, 2. Learn the foundational concepts in the context of something tangible, and 3. Build and organize my knowledge in a hierarchical, interrelated way.

As a result, this method is much more efficient, effective, and engaging.

So, here’s how I plan to apply the V-Method to this month’s challenge:

  1. Look for sample, open-source self-driving car code on Github (Github is a popular repository for code, which basically means I can find a lot of other people’s software projects there)
  2. Work my way line-by-line through the code
  3. For every line of code I don’t understand at an intuitive level (which will be most of them), begin my descent through the layers of underlying concepts
  4. Identify patterns in what I’m constantly looking up / researching and determine the most important foundational concepts
  5. Study these foundational concepts
  6. Work my way back up the layers of underlying concepts until I can effectively explain to myself each line of code from the sample Github project

If this still sounds a bit confusing, hopefully it will start making more sense once I actually start.

My first step is to search Github for a good sample project…

Yesterday, I introduced the primary method I use for learning new technical skills, which I call the V-Method. Using this method, I start my studies with a highly-specific example (that should closely simulate my desired end result), and use this as an entry point to learn the relevant underlying concepts in a tangible, organized, and hierarchical way.

Therefore, today, my goal was to get some code running on my computer that I may be able to use for my self-driving car.

Finding some code

After Googling around a little bit, I found a project on Github that suited my needs well. The code takes an input image of the road, and attempts to identify where the lane lines are.

So, from this…

To this…

My goal for today was to try to replicate this result with the code running on my own computer.

Getting set up

Before I could run any code, I needed to make sure my computer was set up with the appropriate software libraries. In particular, I needed to install the numpy, matplotlib, and OpenCV libraries for Python.
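For reference, installing these usually comes down to a few pip commands along these lines (the exact package names and steps depend on your Python setup, so treat this as a sketch):

```shell
# Install the three Python libraries the lane-finding code depends on.
# "opencv-python" is the usual pip package name for the OpenCV bindings.
pip install numpy matplotlib opencv-python
```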

After getting oriented in Terminal (the command line on Mac) and finding some instructions online, I ran into my first error…

Rather than trying to figure out exactly what this error means or how to fix it myself, I used the most effective debugging technique I know: I copied and pasted the entire error message into Google.

I clicked on the third link and found this answer:

After running these few commands (by copying and pasting them into Terminal and pressing “Enter”), everything seemed to work properly.

I was officially all set up (at least for now).

Running the code

Now that I was set up, it was time to run the code. After using Google again to augment my limited Terminal knowledge, I got the code to run, and nothing seemed to break.

I got this output…

Cool! So, these numbers are essentially the mathematical representation of the two lane lines.

So far, so good. But, where are the visuals?

In the Github project I was trying to replicate, the code also outputted these nice plots…

As well as the image with the red overlays…

Sadly, my code wasn’t outputting either of these, nor was it saving any images to my local directory.

So, once again, I turned back to Google, and searched “save image python”, in hopes of figuring out how to save an image of the output.

Google nicely told me to use the function cv2.imwrite(), so I did, and it worked. And by “worked”, I mean… I was able to save a greyscale image of the photo with the lane lines visualized in white.

And here’s another…

And one more…

Now, what?

This is a good start.

Basically, once I can identify the lane lines effectively, I can use this info to teach my self-driving car how to steer (within the lane lines). Also, since video is just a collection of many photos, processing video should work in the same way (as long as I can figure out how to break apart a video into photos in real-time for processing).

Tomorrow, since the code is more or less working, I will try to go through the project line-by-line and start uncovering how it actually works.

Until then, the lesson is this: If you are willing to accept that you often don’t have all the answers, but are willing to Google around and experiment a little bit, you can make progress anyway.

Sure, I don’t have a strong conceptual understanding yet, but I now have a functional example that I can use as my starting point.

Yesterday, I figured out how to identify lane lines in a forward-facing image of the road. Well… I at least figured out how to run code that could do this.

The output from yesterday

Truthfully, I didn’t understand how the code actually worked, so today I tried to change that.

Below is the main block of code I used yesterday. In particular, I’ve copied the primary function, which is called “draw_lane_lines”. Basically, a function is a block of code that takes some input (in this case a photo), manipulates the input in some way, and then outputs the manipulation (in this case the lane lines).

This primary function uses some other helper functions defined elsewhere in the code, but these helper functions are mostly just slightly cleaner ways of consuming the pre-made functions from the libraries I downloaded yesterday (like OpenCV, for example).

def draw_lane_lines(image):
    imshape = image.shape

    # Greyscale image
    greyscaled_image = grayscale(image)

    # Gaussian Blur
    blurred_grey_image = gaussian_blur(greyscaled_image, 5)

    # Canny edge detection
    edges_image = canny(blurred_grey_image, 50, 150)

    # Mask edges image
    border = 0
    vertices = np.array([[(0, imshape[0]), (465, 320), (475, 320),
                          (imshape[1], imshape[0])]], dtype=np.int32)
    edges_image_with_mask = region_of_interest(edges_image, vertices)

    # Hough lines
    rho = 2
    theta = np.pi / 180
    threshold = 45
    min_line_len = 40
    max_line_gap = 100
    lines_image = hough_lines(edges_image_with_mask, rho, theta,
                              threshold, min_line_len, max_line_gap)

    # Convert Hough from single channel to RGB to prep for weighted
    hough_rgb_image = cv2.cvtColor(lines_image, cv2.COLOR_GRAY2BGR)

    # Combine lines image with original image
    final_image = weighted_img(hough_rgb_image, image)

    return final_image

The comments in the code (the lines starting with #) describe the main parts of the image processing pipeline, which basically means these are the seven manipulations performed sequentially on the input image in order to output the lane lines.

Today, my goal was to understand what each of these seven steps did and why they were being used.

Actually, I only focused on the first five, which output the mathematical representation of the lane lines. The last two manipulations just create the visuals so we humans can visually appreciate the math (in other words, these steps aren’t necessary when a self-driving car is actually consuming the outputted data).

Thus, based on my research today, I will now attempt to explain the following sequence of image processing events: Input image → 1. Greyscale image, 2. Gaussian Blur, 3. Canny edge detection, 4. Mask edges image, 5. Hough lines → Lane line output

Input image

Here’s the starting input image.

It’s important to remember that an image is nothing more than a bunch of pixels arranged in a rectangle. This particular rectangle is 960 pixels by 540 pixels.

The value of each pixel is some combination of red, green, and blue, and is represented by a triplet of numbers, where each number corresponds to the value of one of the colors. The value of each of the colors can range from 0 to 255, where 0 is the complete absence of the color and 255 is 100% intensity.

For example, the color white is represented as (255, 255, 255) and the color black is represented as (0, 0, 0).

So, this input image can be described by 960 x 540 = 518,400 triplets of numbers ranging from (0, 0, 0) to (255, 255, 255).

Now that this image is just a collection of numbers, we can start manipulating these numbers in useful ways using math.
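In code, this is literal: when a library like OpenCV loads the image, it hands back an array of exactly those numbers. A small sketch with a made-up image of the same dimensions:

```python
import numpy as np

# A stand-in for the 960x540 road photo: stored as a
# height x width x 3 array of 8-bit color values (0-255).
image = np.zeros((540, 960, 3), dtype=np.uint8)

# Paint one "sky" pixel with an RGB triplet.
image[0, 0] = (120, 172, 209)

height, width, channels = image.shape
total_pixels = height * width  # 960 x 540 = 518,400 triplets
```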

1. Greyscale image

The first processing step is to convert the color image to greyscale, effectively downgrading the color space from three dimensions to one dimension. It’s much easier (and more effective) to manipulate the image in only one dimension: This one dimension is the “darkness” or “intensity” of the pixel, with 0 representing black, 255 representing white, and 126 representing some middle grey color.

Intuitively, I expected that a greyscale filter would just be a function that averaged the red, blue, and green values together to arrive at the greyscale output.

So for example, here’s a color from the sky in the original photo:

It can be represented in RGB (red, green, blue) space as (120, 172, 209).

If I average these values together I get (120 + 172 + 209)/3 = 167, or this color in greyscale space.

But, it turns out, when I convert this color to greyscale using the above function, the actual outputted color is 164, which is slightly different than what I generated using my simple averaging method.

While my method isn’t actually “wrong” per se, the common method used is to compute a weighted average that better matches how our eyes perceive color. In other words, since our eyes have many more green receptors than red or blue receptors, the value for green should be weighted more heavily in the greyscale function.

One common method, called colorimetric conversion, uses this weighted sum: 0.2126 × Red + 0.7152 × Green + 0.0722 × Blue.
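That weighted sum is easy to check in code, using the sky pixel from above (whether OpenCV uses these exact coefficients internally is an assumption on my part, but they are the standard luminance weights):

```python
def to_grey(r, g, b):
    """Colorimetric greyscale conversion: weight green most heavily."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def naive_grey(r, g, b):
    """The simple averaging approach."""
    return (r + g + b) / 3

sky = (120, 172, 209)
weighted = round(to_grey(*sky))   # matches the library's output of 164
simple = round(naive_grey(*sky))  # my naive average of 167
```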

After processing the original image through the greyscale filter, we get this output…

2. Gaussian Blur

The next step is to blur the image using a Gaussian Blur.

By applying a slight blur, we can remove the highest-frequency information (a.k.a noise) from the image, which will give us “smoother” blocks of color that we can analyze.

Once again, the underlying math of a Gaussian Blur is very basic: A blur just takes more averages of pixels (this averaging process is a type of kernel convolution, which is an unnecessarily fancy name for what I’m about to explain).

Basically, to generate a blur, you must complete the following steps:

  1. Select a pixel in the photo and determine its value
  2. Find the values for the selected pixel’s local neighbors (we can arbitrarily define the size of this “local region”, but it’s typically fairly small)
  3. Take the value of the original pixel and the neighbor pixels and average them together using some weighting system
  4. Replace the value of the original pixel with the outputted averaged value
  5. Do this for all pixels

This process is essentially saying “make all the pixels more similar to the pixels nearby”, which intuitively sounds like blurring.

For a Gaussian Blur, we are simply using the Gaussian Distribution (i.e. a bell curve) to determine the weights in Step 3 above. This means that the closer a pixel is to the selected pixel, the greater its weight.
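Steps 1-4 can be sketched directly for a single pixel. The 3×3 kernel below is a hypothetical example of Gaussian-style weights (center weighted most, normalized to sum to 1):

```python
import numpy as np

# A 3x3 Gaussian-style kernel: the center pixel gets the most
# weight, its neighbours less. Normalize so the weights sum to 1.
kernel = np.array([[1, 2, 1],
                   [2, 4, 2],
                   [1, 2, 1]], dtype=float)
kernel /= kernel.sum()

# A 3x3 neighbourhood of greyscale values around one selected
# pixel: a bright (255) pixel surrounded by darker (100) ones.
patch = np.array([[100, 100, 100],
                  [100, 255, 100],
                  [100, 100, 100]], dtype=float)

# Steps 3 and 4: replace the center pixel with the weighted average.
blurred_center = (kernel * patch).sum()  # pulled down toward its neighbours
```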

Anyway, we don’t want to blur the image too much, but just enough so that we can remove some noise from the photo. Here’s what we get…

3. Canny edge detection

Now that we have a greyscaled and Gaussian Blurred image, we are going to try to find all the edges in this photo.

An edge is simply an area in the image where there is a sudden jump in value.

For example, there is a clear edge between the grey road and the dashed white line: the grey road may have a value of something like 126, the white line has a value close to 255, and there is no gradual transition between these values.

Again, the Canny Edge Detection filter uses very simple math to find edges:

  1. Select a pixel in the photo
  2. Identify the value for the group of pixels to the left and the group of pixels to the right of the selected pixel
  3. Take the difference between these two groups (i.e. subtract the value of one from the other).
  4. Change the value of the selected pixel to the value of the difference computed in Step 3.
  5. Do this for all pixels.

So, pretend that we are only looking at the one pixel to the left and to the right of the selected pixel, and imagine these are the values: (Left pixel, selected pixel, right pixel) = (133, 134, 155). Then, we would compute the difference between the right and left pixel, 155 − 133 = 22, and set the new value of the selected pixel to 22.

If the selected pixel is an edge, the difference between the left and right pixels will be a greater number (closer to 255) and therefore will show up as white in the outputted image. If the selected pixel isn’t an edge, the difference will be close to 0 and will show up as black.

Of course, you may have noticed that the above method would only find edges in the vertical direction, so we must do a second process where we compare the pixels above and below the selected pixel to address edges in the horizontal direction.

These differences are called gradients, and we can compute the total gradient by essentially using the Pythagorean Theorem to add up the individual contributions from the vertical and horizontal gradients. In other words, we can say that total gradient² = vertical gradient² + horizontal gradient².

So, for example, let’s say the vertical gradient = 22 and the horizontal gradient = 143, then the total gradient = sqrt(22²+143²) = ~145.
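Both worked examples above, in a few lines of code (following the post’s naming, where the left/right difference is the vertical gradient):

```python
import math

# Difference between the right and left neighbours of the selected pixel.
left, selected, right = 133, 134, 155
vertical_gradient = right - left  # 22

# Combine with a horizontal gradient via the Pythagorean Theorem.
horizontal_gradient = 143
total_gradient = math.sqrt(vertical_gradient ** 2
                           + horizontal_gradient ** 2)  # ~145
```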

The output looks something like this…

The Canny Edge Detection filter now completes one more step.

Rather than just showing all edges, the Canny Edge Detection filter tries to identify the important edges.

To do this, we set two thresholds: A high threshold and a low threshold. Let’s say the high threshold is 200 and the low threshold is 150.

For any total gradient that has a value greater than the high threshold of 200, that pixel is automatically considered an edge and is converted to pure white (255). For any total gradient that has a value less than the low threshold of 150, that pixel is automatically considered “not an edge” and is converted to pure black (0).

For any gradient in between 150 and 200, the pixel is counted as an edge only if it is directly touching another pixel that has already been counted as an edge.

The assumption here is that if this soft edge is connected to the hard edge, it is probably part of the same object.
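A sketch of that two-threshold rule on a single row of gradient values, using the thresholds from above (real Canny does this in 2D and propagates connectivity more thoroughly; this simplified version just checks immediate neighbours):

```python
HIGH, LOW = 200, 150

# Gradient magnitudes along one row of pixels.
gradients = [30, 210, 180, 120, 160, 40]

# Pass 1: anything at or above HIGH is definitely an edge;
# anything below LOW is definitely not.
edge = [g >= HIGH for g in gradients]

# Pass 2: in-between pixels count only if they touch an edge pixel.
for i, g in enumerate(gradients):
    if LOW <= g < HIGH:
        neighbours = edge[max(i - 1, 0):i] + edge[i + 1:i + 2]
        if any(neighbours):
            edge[i] = True
```

Here the 180 survives (it touches the 210 hard edge) while the isolated 160 is discarded.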

After completing this process for all pixels, we get an image that looks like this…

4. Mask edges image

This next step is very simple: A mask is created that eliminates all parts of the photo we assume not to have lane lines.

We get this…

It seems like this is a pretty aggressive and presumptuous mask, but this is what is currently written in the original code. So, moving on…
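A minimal version of such a mask (the original code uses OpenCV helpers; this pure-numpy sketch with illustrative trapezoid coordinates shows the idea of keeping only the region where we expect lane lines):

```python
import numpy as np

def trapezoid_mask(image, top_y, top_left_x, top_right_x):
    """Zero out everything outside a trapezoid reaching from the
    bottom corners of the image up to a short horizontal segment."""
    height, width = image.shape
    masked = np.zeros_like(image)
    for y in range(top_y, height):
        # Linearly interpolate the left/right bounds between the
        # top segment and the bottom corners of the image.
        t = (y - top_y) / (height - 1 - top_y)
        left = int(round(top_left_x * (1 - t)))
        right = int(round(top_right_x + (width - 1 - top_right_x) * t))
        masked[y, left:right + 1] = image[y, left:right + 1]
    return masked

edges = np.full((540, 960), 255, dtype=np.uint8)  # pretend every pixel is an edge
masked = trapezoid_mask(edges, top_y=320, top_left_x=465, top_right_x=475)
```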

5. Hough lines

The final step is to use the Hough transform to find the mathematical expression for the lane lines.

The math behind the Hough transform is slightly more complicated than all the weighted average stuff we did above, but only barely.

Here’s the basic concept:

The equation for a line is y = mx + b, where m and b are constants that represent the slope of the line and the y-intercept of the line respectively.

Essentially, to use the Hough transform, we determine some 2-dimensional space of m’s and b’s. This space represents all the combinations of m’s and b’s we think could possibly generate the best-fitting line for the lane lines.

Then, we navigate through this space of m’s and b’s, and for each pair (m,b), we can determine an equation for a particular line of the form y = mx + b. At this point, we want to test this line, so we find all the pixels that lie on this line in the photo and ask them to vote if this is a good guess for the lane line or not. The pixel votes “yes” if it’s white (a.k.a part of an edge) and votes “no” if it’s black.

The (m,b) pair that gets the most votes (or in this case, the two pairs that get the most votes) are determined to be the two lane lines.
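The voting idea can be sketched in a few lines. Here, a tiny binary edge image contains white pixels along the line y = 2x + 1, and candidate (m, b) pairs from a small grid compete for votes (this grid search over m and b is the conceptual version, not the production algorithm):

```python
import numpy as np

size = 20
edges = np.zeros((size, size), dtype=np.uint8)

# Draw a white "edge" along the line y = 2x + 1.
for x in range(size):
    y = 2 * x + 1
    if y < size:
        edges[y, x] = 255

best, best_votes = None, -1
for m in range(-3, 4):        # candidate slopes
    for b in range(0, 10):    # candidate intercepts
        # Every white pixel lying on y = m*x + b votes "yes".
        votes = sum(1 for x in range(size)
                    if 0 <= m * x + b < size and edges[m * x + b, x] == 255)
        if votes > best_votes:
            best, best_votes = (m, b), votes
```

The winner is (m, b) = (2, 1), the line the white pixels actually lie on.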

Here’s the output of the Hough Transform…

I’m skipping over the part where, rather than using the formula y = mx + b to represent the line, the Hough transform instead uses a polar-coordinate (trigonometric) representation with rho and theta as the two parameters.

This distinction isn’t super important (for our understanding), since the space is still being parametrized in 2-dimensions, and the logic is exactly the same, but this trigonometric representation does help with the fact that we can’t express completely vertical lines with the y = mx + b equation.

Anyway, this is why rho and theta are being used in the above code.

The Final Output

And we are done.

The program outputs the two parameters to describe the two lane lines (interestingly, these outputs are converted back to the m, b parametrization). The program also provides the coordinates of each lane line’s end points.

Lane line 1

Slope: -0.740605727717; Intercept: 664.075746144

Point one: (475, 311) Point two: (960, 599)

Lane line 2

Coef: -0.740605727717; Intercept: 664.075746144

Point one: (475, 311) Point two: (0, 664)

Overlaying these lines on the original image, we see that we’ve effectively identified the lane lines, using basic math operations.

Yesterday, I deconstructed a piece of code that identifies lane lines in forward-facing images of the road.

Like this…

Perhaps more interesting than the picture: this block of code generates the mathematical representations of the lane lines using only very basic mathematical operations (essentially a string of functions that find weighted averages):

Lane line 1 = Slope: -0.740605727717; Intercept: 664.075746144

Lane line 2 = Coef: -0.740605727717; Intercept: 664.075746144.

Going through this exercise helped me better intuit the underlying mechanics of a self-driving car, which don’t seem quite as mystical anymore.

Based on what I’ve experimented with so far, it seems like there are two main steps to creating self-driving car software:

Step 1: Manipulate the input image into a set of useful numeric representations of the driving environment (i.e. lane lines, other cars, traffic signs, pedestrians, etc.)

Step 2: Feed this numeric representation of the world into a function that, based on these inputs, computes the correct steering angle and acceleration.

I suspect as the function in Step 2 improves (as increasing amounts of computational power become available in the future), the pre-processing and object categorization in Step 1 become less and less important.

Thus, it seems I should shift my attention away from Step 1, which focuses largely on object recognition and digitization, and instead focus the majority of my attention on the function from Step 2, which maps inputs from the world into driving instructions. I’ll call this function the “Magic Function”.

This seems like a reasonable plan, especially because the image processing techniques I described yesterday were invented decades ago, whereas the Magic Function seems to have only recently become feasible.

I’m not sure if this analysis is right, or this path is optimal, but this is my intuition right now based on what I’ve researched so far and on the exercise from the past few days.

Starting tomorrow, I will start exploring how a Magic Function might work and how I might be able to build it.

Yesterday, I realized that I needed to shift my focus and devote my efforts to building some version of the Magic Function, which is the mathematical algorithm that maps input images of the road into explicit driving/steering instructions for the self-driving car.

Today, I found a 223GB dataset that was open-sourced by Udacity (a company that makes online courses about technical topics, including self-driving cars), which contains exactly what I need:

Udacity took a video from a dash cam, broke it down into individual frames, and then labelled each frame with the corresponding steering angle that the human driver was actually executing.

For example, frame: 1479425444933328937; steering angle: -0.346924703940749

As a result, this dataset is perfect to create and test the Magic Function — I can ask the Magic Function to guess the steering angle based on the frame of video, and then I can compare the guess to the actual recorded steering angle.

If the guess is far from the actual value, I can update the function and retest it. Theoretically, I can then use the computer to help me iterate on this process thousands of times, and eventually find a reasonable way to accurately steer the car.
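That guess-compare-update loop is the heart of it. Here is a toy sketch of just the “compare” part, with a deliberately dumb stand-in for the Magic Function (the frame ID format mirrors Udacity’s, but these angles are invented for illustration):

```python
# Hypothetical labelled frames: frame id -> true steering angle.
true_angles = {
    "1479425444933328937": -0.3469,
    "1479425444933328938": -0.3300,
    "1479425444933328939": -0.3100,
}

def magic_function(frame_id):
    """Stand-in model: always guesses 'steer slightly left'."""
    return -0.3

# Score the model: average squared difference between guess and truth.
# A better Magic Function would drive this number toward zero.
errors = [(magic_function(f) - angle) ** 2
          for f, angle in true_angles.items()]
mean_squared_error = sum(errors) / len(errors)
```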

Since I only “theoretically” know how to do this right now, I returned to Github today to see if I could find anyone making use of Udacity’s dataset in this way.

I found a few projects, and hoped that I could get them up and running fairly quickly. After about two hours of trying though, I continued to get fatal errors in Terminal — not even from running the code, but just from trying to set up my computer with the libraries needed to run the code in the first place.

A few days ago, I made it seem like whenever Terminal throws an error, it’s trivially easy to just Google the answer and proceed forward. While this was my experience a few days ago, this experience probably isn’t the norm, and it certainly wasn’t today.

So despite all my efforts, I effectively made zero progress today, which is frustrating but also part of the game.

Tomorrow, I will continue to play this game and see if I can get my computer to cooperate. For now, I’ll remain hopeful…

Sometimes, I think building software is the cruelest form of punishment. Today was one of those days.

Unlike other pursuits, where, even when things are tough, I can make slow steady progress (or at least learn something valuable from my failures), trying to get my computer to successfully run a few lines of code is like banging my head against the wall, hoping that, if I do it for long enough, something will eventually change.

When venturing into new coding territory, here is the common plot arc: 1. Spend two hours trying every possible permutation of code / environment setup to see if some random combination will work, 2. Nothing works and you realize that you’re further away from your goal than when you started (because you messed around with so much stuff), 3. Keep trying for another hour, 4. Magically, the stars align and some inexplicable combination of things makes everything work.

The problem with this model is that… you need to grit your teeth and suffer through the pain for an unknown amount of time until you magically reach nirvana on Step 4. You don’t know when nirvana is coming nor do you believe that it will ever come, but you have to keep going on the off-chance that it might. There is no gradual pay-off, just soul-sucking frustration until it magically works.

And then, when it works, nothing is more blissful. All the pain is worth it for this one moment, assuming this one moment comes.

Today, I was barely hanging on.

I’m deep into an area I have little grounding in, and as a result, I felt lost and helpless for much of the day.

In these cases, it’s often useful to rely on video tutorials or online courses, but I haven’t found an accessible tutorial (or any tutorial) yet that would help ground me.

So instead, I spent around 2.5 hours today drowning in the Coding Pit of Despair. It was brutal and confidence-threatening, and I wanted to admit defeat many times today, but I was able to hold on.

Somehow, I found my way towards the end of the day, and it seems I’ve made tangible progress. I can’t say for sure though until tomorrow morning: My computer is currently working away, and will probably keep working through the night.

Tomorrow, I hope I wake up to a successfully run program.

Last night, after drowning in the “Coding Pit of Despair” for a few hours, I finally made some forward progress.

As a reminder, over the past few days, I’ve been trying to find and run code that can generate steering instructions (for my self-driving car) based on input images of the road.

After many hours of searching and playing around, I found an academic paper written by researchers at NVIDIA (a company that makes self-driving car hardware and software, amongst other things).

As described in the Abstract of the paper, their devised system can “map raw pixels from a single front-facing camera directly to steering commands”.

This is exactly what I need!

I then found a TensorFlow implementation of NVIDIA’s system on Github, and after a few attempts, I was actually able to “train the model” based on some of NVIDIA’s data.

(Update: This isn’t actually NVIDIA’s data, but rather a dataset produced by Sully Chen. He collected the data by duct-taping a webcam to the windshield of his car and capturing the ‘drive data’ from his car’s CAN-BUS port).

A quick aside to clarify some terms: 1. “The model” is the function that describes how to convert pixels into steering instructions, and 2. “To train the model” means to iteratively improve this function using machine learning techniques. I’ll explain this process in more detail in a future post.

Anyway, NVIDIA’s dataset includes 25 minutes of video broken down frame-by-frame, where each frame is labelled with the true steering angle (i.e. the steering angle the human driver was using).

Frame 45,522

In Terminal, I ran the program to train the model, and, despite some warnings, it started working:

In this screenshot, a “Step” describes each time a batch of the data is fed through the system for training. An “Epoch” is one full pass through the training data, made up of many Steps.

To train this model, I used 30 Epochs with a few dozen Steps per Epoch.

In the screenshot, the “Loss” describes how accurate the model (or function) is. Conceptually, to calculate Loss, the true steering angle is compared with the steering angle predicted by the model. The larger the difference, the greater the Loss.

Ultimately, when training the model, the program uses a few mathematical tricks (which I’ll describe in a future post) to try to reduce the Loss via each iterative step.

Thus, “training the model” is just “reducing the Loss”.
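To make this concrete for myself, here’s a toy sketch of what “reducing the Loss via each iterative step” looks like. This is not the actual TensorFlow code: it’s a hypothetical one-weight, one-bias model with made-up data, using numerical gradients instead of the real system’s backpropagation.

```python
# Minimal sketch of "training = reducing the Loss" on a toy model.
# All names and data here are illustrative, not from the real codebase.

def predict(w, b, x):
    return w * x + b  # a stand-in for the real model

def loss(w, b, data):
    # mean squared error between true and predicted steering angles
    return sum((y - predict(w, b, x)) ** 2 for x, y in data) / len(data)

# toy "frames" (x) labelled with true steering angles (y = 2x + 1)
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

w, b, lr = 0.0, 0.0, 0.05
for step in range(500):
    # numerical gradients; real training uses backpropagation instead
    eps = 1e-6
    grad_w = (loss(w + eps, b, data) - loss(w, b, data)) / eps
    grad_b = (loss(w, b + eps, data) - loss(w, b, data)) / eps
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # approaches 2.0 and 1.0
```

Each step nudges the coefficients in the direction that shrinks the Loss, which is exactly what the Loss-vs-Step plot below is showing at a much larger scale.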

Here’s a plot with Steps on the X-axis and Loss on the Y-axis. (Last night, using something called Tensorboard, my computer plotted this while it was training).

At Step 126, for example, the Loss had a value of 5.708.

While, at Step 3,241, almost six hours later, the Loss had a significantly better value of 0.1615.

Now that the training has officially completed, the model is now theoretically ready to steer a car, which is super cool.

When I got home today from work, I tried to test the model (a.k.a. “steer a car”) and see how it performed. Sadly, when I tried to run the program, I got this error…

I spent a good hour trying to overcome this problem, but it seems like some of the code I downloaded can’t be run on a Mac (and if it can, I could not find a way to make it work).

I have an idea for a workaround, but it will have to wait until tomorrow.

In the meantime, I can celebrate the fact that I (likely) have a reasonably functional, already trained self-driving car model ready to be used. Hopefully, tomorrow, I can figure out how to actually use it…

Yesterday, I figured out how to train my self-driving car, but I struggled to confirm that the training was actually effective.

Today, I quickly realized that the part of the program that wasn’t working only had to do with visualization. In other words, I was able to delete all the visualization code, and still successfully output the real-time steering commands for the car.

While these numbers are enough to get a self-driving car to execute the instructions, as a human, this output is challenging to appreciate.

Luckily, a few days ago, I figured out how to save individual processed frames to my local machine.

So, I decided to have the program output the individual frames of input video and the predicted steering wheel animation.

I then combined the individual frames, overlaid the two videos, and pressed play. Here’s the result…

I’m really excited about this!

To clarify what’s happening: The program watched the low-quality dash cam footage and then autonomously animated the steering wheel based on the self-driving model I trained yesterday. In other words, the computer is completely steering this car, and doing a pretty solid job.

The next step is to learn more about the underlying code, optimize it for general use, and then see how it performs on different datasets (i.e. on different roads). I’ll start with the Udacity data set.

I’m still not quite sure I’m ready to sleep in the back of my self-driving car just yet, but today definitely marks a big step forward.

Now that I have working self-driving car code (see the video from yesterday), over the next few days, I plan to deconstruct the code and try to understand exactly how it works.

Today, I’ll be looking specifically at “the model”, which can be considered the meat of the code: The model defines how input images are converted into steering instructions.

I don’t have too much time today, so I won’t be describing fully how the code works (since I don’t yet know and still need to do plenty of research). Instead, I’ll make some hypotheses about what the lines of code might mean and then document the open questions that I’ll need to further research.

This will set me up to learn the material in a structured way.

Here’s the code for the self-driving model in its entirety. It’s only 50 lines of code plus comments and spaces (which is pretty nuts, since it’s driving a car and stuff…)

import tensorflow as tf
import scipy
def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

def conv2d(x, W, stride):
    return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding='VALID')
x = tf.placeholder(tf.float32, shape=[None, 66, 200, 3])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
x_image = x
#first convolutional layer
W_conv1 = weight_variable([5, 5, 3, 24])
b_conv1 = bias_variable([24])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 2) + b_conv1)
#second convolutional layer
W_conv2 = weight_variable([5, 5, 24, 36])
b_conv2 = bias_variable([36])
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2) + b_conv2)
#third convolutional layer
W_conv3 = weight_variable([5, 5, 36, 48])
b_conv3 = bias_variable([48])
h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 2) + b_conv3)
#fourth convolutional layer
W_conv4 = weight_variable([3, 3, 48, 64])
b_conv4 = bias_variable([64])
h_conv4 = tf.nn.relu(conv2d(h_conv3, W_conv4, 1) + b_conv4)
#fifth convolutional layer
W_conv5 = weight_variable([3, 3, 64, 64])
b_conv5 = bias_variable([64])
h_conv5 = tf.nn.relu(conv2d(h_conv4, W_conv5, 1) + b_conv5)
#FCL 1
W_fc1 = weight_variable([1152, 1164])
b_fc1 = bias_variable([1164])
h_conv5_flat = tf.reshape(h_conv5, [-1, 1152])
h_fc1 = tf.nn.relu(tf.matmul(h_conv5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
#FCL 2
W_fc2 = weight_variable([1164, 100])
b_fc2 = bias_variable([100])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)
#FCL 3
W_fc3 = weight_variable([100, 50])
b_fc3 = bias_variable([50])
h_fc3 = tf.nn.relu(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)
h_fc3_drop = tf.nn.dropout(h_fc3, keep_prob)
#FCL 4
W_fc4 = weight_variable([50, 10])
b_fc4 = bias_variable([10])
h_fc4 = tf.nn.relu(tf.matmul(h_fc3_drop, W_fc4) + b_fc4)
h_fc4_drop = tf.nn.dropout(h_fc4, keep_prob)
#Output
W_fc5 = weight_variable([10, 1])
b_fc5 = bias_variable([1])
y = tf.mul(tf.atan(tf.matmul(h_fc4_drop, W_fc5) + b_fc5), 2)

Line-by-line commentary

Now, I’ll work through the code in chunks and describe what I think each chunk means/does.

import tensorflow as tf
import scipy

The first two lines are straightforward.

We’re importing the TensorFlow library (which we will refer to as “tf” elsewhere in the code) and the SciPy library. TensorFlow is a python library written by Google, which will help abstract away most of the ground-level machine learning implementations. SciPy will help with the math stuff.

Nothing new to learn here.

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)

def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

Okay, so here I think we are defining new objects, which basically means we can use the notion of “weight_variable” and “bias_variable” elsewhere in our code without having to redefine them every single time.

In machine learning, the function we are trying to solve is typically represented as Wx+b = y, where we are given x (the list of input images) and y (the list of corresponding steering instructions), and want to find the best combination of W and b to make the equation balance.

W and b aren’t actually single numbers, but instead collections of coefficients. These collections are multidimensional and the size of these collections corresponds to the number of nodes in the machine learning network. (At least, this is how I understand it right now).

So, in the above code, the weight_variable object represents W and the bias_variable object represents b, in the generalized sense.

These objects take an input called “shape”, which basically defines the dimensionality of W and b.

These W and b objects are initialized with a function called “truncated_normal”. I’m pretty sure this means that, when a collection of W’s and b’s is initially created, the values of the individual coefficients should be randomly assigned based on the normal distribution (i.e. a bell curve) with a standard deviation of 0.1. The standard deviation more or less defines how random we want the initial coefficients to be.

So, surprisingly, I think I mostly understand this code. At first glance, I wasn’t sure what was going on, but writing this out helped me collect my thoughts.

What I still need to learn: I need to learn more about the Wx + b = y structure, why it is used, how it works, etc., but I understand the fundamentals of the code.
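To check my understanding, here’s a plain-Python sketch (no TensorFlow) of what I think these two helpers are doing. The resample-beyond-2-standard-deviations loop is my reading of what “truncated” means here.

```python
# A sketch of the weight/bias initialization in plain Python.
# Assumption: "truncated" normal means redrawing any sample that
# falls more than 2 standard deviations from the mean.
import random

def truncated_normal(stddev=0.1):
    while True:
        v = random.gauss(0.0, stddev)
        if abs(v) <= 2 * stddev:  # keep only values within 2 stddevs
            return v

def weight_variable(shape):
    # shape like [5, 5, 3, 24] -> nested lists of random coefficients
    if len(shape) == 1:
        return [truncated_normal() for _ in range(shape[0])]
    return [weight_variable(shape[1:]) for _ in range(shape[0])]

def bias_variable(shape):
    # biases start as the constant 0.1 rather than random values
    if len(shape) == 1:
        return [0.1] * shape[0]
    return [bias_variable(shape[1:]) for _ in range(shape[0])]

b = bias_variable([24])
W = weight_variable([5, 5, 3, 24])
print(len(W), len(W[0]), len(W[0][0]), len(W[0][0][0]))  # 5 5 3 24
```

In other words, “shape” just tells the helper how big a nested collection of coefficients to build.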

def conv2d(x, W, stride):
    return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding='VALID')

I believe that this conv2d thing is a function that performs a kernel convolution on some input. Kernel convolutions are a more general class of the image manipulations I learned about a few days ago.

As far as I’m concerned, a kernel convolution manipulates the image to highlight some characteristic of that image, whether that is the image’s edges, corners, etc.

This particular characteristic is defined by “the kernel”, which seems to be defined using strides=[1, stride, stride, 1] from above, though I don’t know what strides means or exactly how this works.

It seems like there are three inputs to this image manipulation function: 1. The kernel/strides (to say how to manipulate the image); 2. x (which is the image itself); and 3. W (which I guess is a set of coefficients that are used to blend different image manipulations together in some capacity).

I have to learn more about W’s role in all of this.

At a high-level though, this function is manipulating the image in some way to automatically reduce the image into distinct features that are more conducive to training the model.

What I still need to learn: How exactly is the convolution function being defined mathematically, and how does W play a role in this?
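Based on what I learned a few days ago about image manipulations, here’s a from-scratch sketch of a single-channel convolution with ‘VALID’ padding and a stride. The edge-detecting kernel and tiny image below are just examples I made up, not values from the model.

```python
# A sketch of a single-channel 2D convolution with 'VALID' padding:
# the kernel slides across the image, and each output pixel is a
# weighted sum of the patch of pixels under the kernel.
def conv2d_valid(image, kernel, stride):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# a vertical-edge-detecting kernel applied to a tiny image with an edge
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d_valid(image, kernel, 1))  # [[0, 18, 0], [0, 18, 0]]
```

The output lights up exactly where the edge is, which is the sense in which a convolution “highlights a characteristic” of the image.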

x = tf.placeholder(tf.float32, shape=[None, 66, 200, 3])
y_ = tf.placeholder(tf.float32, shape=[None, 1])
x_image = x

These next few lines seem pretty straightforward. Once again, we refer back to the equation Wx + b = y.

Here we are essentially defining placeholders for the x and y variables. These placeholders set the variables’ dimensions (remember: these variables represent a collection of values, not just a single number).

We are setting up x to expect to receive an image of certain dimensions, and we are setting up y to expect a single number as an output (i.e. the steering angle).

We then rename x to “x_image” to remind ourselves that x is an image, because… why not.

Nothing new to learn here.

#first convolutional layer
W_conv1 = weight_variable([5, 5, 3, 24])
b_conv1 = bias_variable([24])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1, 2) + b_conv1)

Okay, we are now onto our first convolutional layer.

We define W_conv1, which is just a specific instance of the weight_variable I explained above (with the shape [5, 5, 3, 24]). I’m not sure how or why the shape was set in this particular way.

We then define b_conv1, which is just a specific instance of the bias_variable I explained above (with the shape [24]). This 24 likely needs to match the 24 from the W_conv1 shape, but I’m not sure why (other than this is going to help make the matrix multiplication work).

h_conv1 is an intermediate object that applies the convolution function to the inputs x_image and W_conv1, adds b_conv1 to the output of the convolution, and then processes everything through a function called relu.

This relu thing sounds familiar, but I can’t remember exactly what it does. My guess is that it’s some kind of “squasher” or normalizing function, that smooths everything out in some capacity, whatever that means. I’ll have to look into it.
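One thing I did quickly look up: relu turns out to be trivially simple. It isn’t really a “squasher” so much as a gate that zeroes out negative values and passes positive ones through unchanged.

```python
# relu: pass positive values through, clip negatives to zero.
def relu(x):
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]
```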

While I can read most of the code, I’m not exactly sure why a “convolutional layer” is set up in this way.

What I still need to learn: What is a convolutional layer, what is it supposed to do, and how does it do it?

#second convolutional layer
W_conv2 = weight_variable([5, 5, 24, 36])
b_conv2 = bias_variable([36])
h_conv2 = tf.nn.relu(conv2d(h_conv1, W_conv2, 2) + b_conv2)
#third convolutional layer
W_conv3 = weight_variable([5, 5, 36, 48])
b_conv3 = bias_variable([48])
h_conv3 = tf.nn.relu(conv2d(h_conv2, W_conv3, 2) + b_conv3)
#fourth convolutional layer
W_conv4 = weight_variable([3, 3, 48, 64])
b_conv4 = bias_variable([64])
h_conv4 = tf.nn.relu(conv2d(h_conv3, W_conv4, 1) + b_conv4)
#fifth convolutional layer
W_conv5 = weight_variable([3, 3, 64, 64])
b_conv5 = bias_variable([64])
h_conv5 = tf.nn.relu(conv2d(h_conv4, W_conv5, 1) + b_conv5)

We proceed to have four more convolutional layers, which function in the exact same way as the first layer, but instead of using x_image as an input, they use the output from the previous layer (i.e. the h_conv thing).

I’m not sure how we decided to use five layers and how and why the shapes of each W_conv are different.

What I still need to learn: Why five layers, and how do we pick the shapes for each?
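As a partial answer to the shape question, I can at least trace the image dimensions through the five layers, assuming the standard ‘VALID’ output-size rule (output = (input − kernel) // stride + 1). Satisfyingly, this explains where the 1152 in the next section comes from.

```python
# Trace the 66x200x3 input image through the five convolutional
# layers, using the 'VALID' padding rule for the output size.
def valid_out(size, kernel, stride):
    return (size - kernel) // stride + 1

h, w = 66, 200                                  # input height and width
layers = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]  # (kernel, stride) per layer
depths = [24, 36, 48, 64, 64]                   # output depth per layer

for (k, s), d in zip(layers, depths):
    h, w = valid_out(h, k, s), valid_out(w, k, s)
    print(h, w, d)

print(h * w * depths[-1])  # 1152: the flattened size fed into FCL 1
```

The last layer ends up 1×18×64, and 1 × 18 × 64 = 1152, which matches the reshape in the first fully connected layer below.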

#FCL 1
W_fc1 = weight_variable([1152, 1164])
b_fc1 = bias_variable([1164])
h_conv5_flat = tf.reshape(h_conv5, [-1, 1152])
h_fc1 = tf.nn.relu(tf.matmul(h_conv5_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
#FCL 2
W_fc2 = weight_variable([1164, 100])
b_fc2 = bias_variable([100])
h_fc2 = tf.nn.relu(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
h_fc2_drop = tf.nn.dropout(h_fc2, keep_prob)
#FCL 3
W_fc3 = weight_variable([100, 50])
b_fc3 = bias_variable([50])
h_fc3 = tf.nn.relu(tf.matmul(h_fc2_drop, W_fc3) + b_fc3)
h_fc3_drop = tf.nn.dropout(h_fc3, keep_prob)
#FCL 4
W_fc4 = weight_variable([50, 10])
b_fc4 = bias_variable([10])
h_fc4 = tf.nn.relu(tf.matmul(h_fc3_drop, W_fc4) + b_fc4)
h_fc4_drop = tf.nn.dropout(h_fc4, keep_prob)

Next, we have four FCLs, which I believe stands for “Fully Connected Layers”.

The setup for these layers seems similar to the convolution steps, but I’m not exactly sure what’s happening here. I think this is just vanilla neural network stuff (which I write as if I fully understand “vanilla neural network stuff”).

Anyway, I’ll have to look more into this.

What I still need to learn: What is a FCL, and what is happening in each FCL step?
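From what I can tell, though, each FCL line follows the same Wx + b pattern from before, just with a plain matrix multiply: every output node is a weighted sum of every input node, plus a bias, passed through relu. Here’s a toy sketch with hypothetical sizes (the real layers are 1152→1164, 1164→100, and so on).

```python
# A sketch of one fully connected layer: each output node is a
# weighted sum over all input nodes, plus a bias, through relu.
# Sizes and numbers below are made up for illustration.
def fully_connected(inputs, weights, biases):
    outputs = []
    for j in range(len(biases)):
        total = sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(max(0.0, total + biases[j]))  # relu
    return outputs

inputs = [1.0, 2.0]                          # 2 input nodes
weights = [[0.5, -1.0, 0.0],                 # 2x3 weight matrix
           [0.25, 1.0, -2.0]]
biases = [0.1, 0.1, 0.1]                     # one bias per output node

out = fully_connected(inputs, weights, biases)
print(out)  # [1.1, 1.1, 0.0]
```

“Fully connected” just means no input is left out of any output’s weighted sum, unlike a convolution, which only looks at a small patch at a time.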

#Output
W_fc5 = weight_variable([10, 1])
b_fc5 = bias_variable([1])
y = tf.mul(tf.atan(tf.matmul(h_fc4_drop, W_fc5) + b_fc5), 2)

Finally, we take the outputs of the final FCL layer, do some crazy trigonometric manipulations and then output y, the predicted steering angle.

This step seems to just be “making the math work out”, but I’m not sure.

What I still need to learn: How and why is the output being calculated in this way?
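For what it’s worth, here’s my current guess, sketched out: atan squashes any real number into (−π/2, π/2), so multiplying by 2 bounds the predicted steering angle to (−π, π) radians, no matter what the final layer spits out.

```python
# My guess at the output step: 2 * atan(x) maps any real number
# into the open interval (-pi, pi), bounding the steering angle.
import math

def output_layer(raw):
    return 2 * math.atan(raw)

for raw in [-1000.0, -1.0, 0.0, 1.0, 1000.0]:
    print(round(output_layer(raw), 3))  # always strictly within (-pi, pi)
```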


Done.

That took longer than expected — mostly because I was able to parse more than I expected.

It’s sort of crazy how much of the implementation has been abstracted away by the TensorFlow library, and how little of the underlying math knowledge is necessary to build a fully capable self-driving car model.

It seems like the only important thing for us to know as model constructors is how to set the depth (e.g. number of layers) of the model, the shapes of each layer, and the types of the layers.

My guess is that this might be more of an art than science, but likely an educated art.

I’ll start digging into my open questions tomorrow.

I’ve made sizable progress on this month’s challenge over the past eleven days, so today, I decided to take a break.

This break mostly included watching YouTube videos on Convolutional Neural Networks and kernel convolutions, which still helped me get closer to my goal, but in a more passive and restful form.

In particular, I watched a bunch of videos from the YouTube channel Computerphile, which is personally my favorite channel on topics related to computer science.

Here is the Computerphile video on Convolutional Neural Networks…

And here’s the first Computerphile video in a series about kernel convolutions…

While less relevant to this month’s challenge, Computerphile has a sister YouTube channel called Numberphile, which discusses topics in number theory and math in general.

Here’s one of my favorite Numberphile videos — it features a super cool way to approximate the value of pi.

Anyway, I enjoyed my break today, and will be getting back to work tomorrow.

Two days ago, I went through the meat of the self-driving car code and identified the open questions that I had about how it works.

As I’ve been digging into these questions, I’ve begun to realize that, from a purely application standpoint, it’s not super necessary to understand most of the underlying math / computer science to build an effective deep learning system. Naturally, I’m interested, so I’ve been learning as I go, but it’s certainly not essential at all.

Basically, here are the important things to know:

  1. In a Convolutional Neural Network, at the top of “the stack”, there are convolutional layers that learn which convolution operations (i.e. image manipulations) highlight the best features to learn on.
  2. Then, there are some Fully Connected Layers that try to learn how to produce the correct output based on the features highlighted by the convolutional layers.
  3. In between, there are some mathematical tricks (like downsampling and rectifiers) that speed up the learning process.
  4. Determining the shapes and parameters of the layers is usually done empirically (i.e. try different options and pick the configuration that produces the best results).

In regards to #4, in Nvidia’s paper, which describes the deep learning system I’m using, they explain that “the convolutional layers were designed to perform feature extraction and were chosen empirically through a series of experiments that varied layer configurations.”

So, here’s the punchline: The secret to setting up an effective Convolutional Neural Network is to try a bunch of different combinations of the normally used components and see what works the best.

Basically, as long as you understand the basic components, building an effective self-driving car (Nvidia’s is 98% autonomous) is not much more than “guess and check”. (It could be argued that if we want to get the car from 98% to 100% autonomous, we must do a little bit more than guess and check, which is true today, but will probably become less true over time as we ever-increasingly apply more processing power to the problem).

Of course, under the hood, the implementations of all these “basic components” are more complex, but luckily the TensorFlow library has essentially abstracted away all of this work. Plus, we’ve gotten to a point where hobbyists are publishing full open-sourced self-driving car models on GitHub.

I predict that in 18–24 months we get to a level of abstraction where a self-driving car can be created in one line of code — which I guess means that this month’s learning challenge isn’t going to age very well.

In this regard, this month’s challenge seems to point to the following lesson: Sometimes “learning hard things” is just peeling away the ambiguity or intimidation, and finding that, once you are oriented, things aren’t too challenging.

Since I’m now oriented, I won’t spend any more time digging into the theory (at least for the next week or so). Instead, I’ll shift my focus back to pure application and see if I can expand the capabilities of my self-driving car.

A few days ago, I was able to run some code and get my self-driving car to successfully steer through the streets (at least virtually, based on NVIDIA’s video footage).

Today, I wanted to take the next step, and see if I can use the self-driving car model on a different set of a data. After all, a self-driving car should be able to drive on any road, not just the roads it was trained on.

Thus, I decided to see if the model I trained last week could perform on Udacity’s dataset, which features a different set of roads.

The first step was to format the Udacity dataset, so that it could be processed by my self-driving car model.

Unlike the NVIDIA dataset, which contains a video clip nicely broken down into sequentially numbered frames (0.jpg, 1.jpg, 2.jpg, 3.jpg, etc.), the Udacity dataset is a collection of confusingly numbered images: There are odd, inconsistently-sized gaps between the numbers, and the dataset starts counting at 1479425441182877835.

Udacity probably had some rationale for this numbering scheme, but I can’t seem to figure out what that rationale is.

Therefore, my first step was to rename all of these files to conform to the NVIDIA-style numbering scheme (i.e. start at zero and count up by ones).

At first, I thought about putting in my headphones, turning on an audiobook, and manually renaming each file. Then, I realized that there were over 5000 files, and that I should probably figure out how to automatically do it with some code (after all, I’m trying to improve my coding skills this month).

After about 12 minutes of noodling around, I was able to write a small Python script to rename all the Udacity files.

import os

def rename(directory):
    i = 0
    # sort the listing so the frames get numbered in a consistent order
    for file_name in sorted(os.listdir(directory)):
        new_file_name = str(i) + '.jpg'
        # os.rename needs full paths, not just the bare file names
        os.rename(os.path.join(directory, file_name),
                  os.path.join(directory, new_file_name))
        i += 1

PATH = os.path.abspath('/Users/maxdeutsch/Desktop/nvidia/udacity_data')
rename(PATH)

I ran the script, and within a few moments, all the files were renamed (I’m glad I didn’t do it by hand).

Next, with the dataset prepared, it was time to test the model on the new dataset, output the graphics of the autonomously controlled steering wheel, and then overlay the graphic on top of the original footage to see how the car performs.

Sadly, the performance is very bad: If the self-driving car actually followed these instructions in real-life, it would crash almost immediately.

Tomorrow, I’ll experiment with the model and figure out how to better generalize my self-driving car so it doesn’t crash every time it’s introduced into a new driving environment.

Yesterday, I tried to drive my self-driving car on new roads (from the Udacity dataset). However, the car wasn’t prepared for this, and virtually crashed repeatedly.

Clearly, the model as trained on the NVIDIA dataset isn’t suited for the Udacity dataset.

One reasonable explanation for this discrepancy is that, while both datasets feature forward-facing videos of the road, the videos noticeably differ in regards to viewing angle, framing, and lens distortion. As a result, a model trained from one view shouldn’t work well on the other (since the camera has been moved, zoomed in, and reframed).

Here’s a still frame from the NVIDIA dataset:

And here’s a still from Udacity’s dataset:

Clearly, these views of the road are different enough that a model trained on one dataset isn’t usable on the other. (If both cars were shooting video from the same vantage point, theoretically the model would be usable across both sets of data).

Thus, to continue forward, I must train the self-driving car model also on a section of the Udacity data, and then see how the model performs when tested on the rest of the new data.


Today’s task was to reformat Udacity’s dataset so that it can be used for training.

Yesterday, I completed part of this reformatting when I renamed the Udacity JPEGs from long, random numbers to sequentially-labelled images.

From this…

To this…

With the images prepared, I next needed to create a text file called “data.txt” for the Udacity dataset, which lists the correct steering angle next to the name of the corresponding image.

Here’s what NVIDIA’s data.txt file looks like:

This isn’t the most compelling screen grab, since the first 22 steering angles are zeroed.

Yet, the Udacity data was compiled in a spreadsheet like this:

So, to create the data.txt file for the Udacity dataset, I needed to accomplish two things: 1. Rename the frame_ids to match yesterday’s numbering scheme; 2. Figure out how to convert a spreadsheet into a text file without any of the table-style formatting.

Rather than using a Python script, I tried to figure out how to do both steps within Google Sheets (there is likely a more efficient way).

First, I typed out a small sequence of consecutive numbers, and then pulled down the sequence to populate the other ~5000 cells.

Next, I used the built-in functions TO_TEXT and CONCATENATE to convert the integers into strings and then concatenate those strings with the file extension .jpg.

On to Step 2 — Converting the table into an unformatted text document.

I used CONCATENATE again to combine the image names and the steering angles into a single cell (with a one space separator).

Then, I concatenated each cell with char(10), which is the character code for a line break. And lastly, I concatenated all the cells into a single cell.

I copied the contents of this single cell into a text document, and my data.txt file was ready to go.
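(In retrospect, the same thing could be done with a few lines of Python. Here’s a sketch; the column layout below is illustrative, not the actual Udacity CSV schema.)

```python
# A sketch of the data.txt preparation in Python instead of Sheets.
# The column layout (frame_id, angle) is a stand-in for the real CSV.
import csv

csv_text = ["frame_id,angle",
            "1479425441182877835,0.0",
            "1479425441232704425,0.05"]

rows = list(csv.reader(csv_text))[1:]        # parse and skip the header

# renumber frames from 0 and pair each with its steering angle
lines = ["%d.jpg %s" % (i, row[1]) for i, row in enumerate(rows)]
data_txt = "\n".join(lines)                  # one "<name> <angle>" per line
print(data_txt)
```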

The one weird thing is that Udacity’s steering angle numbers seem very different than NVIDIA’s steering angle numbers.

It appears that the companies are using significantly different steering angle representations, but I’m not exactly sure how the two methods mathematically relate.

I may be able to successfully train the model without ever finding out this mathematical relationship, but I’m skeptical that’s the case.

I’ll explore this more tomorrow, and then start training the model on the Udacity data.

Yesterday, I finished formatting the new dataset, so, when I got home today from work, I was all ready to start training the model.

Yet, when I executed the train command in Terminal, my computer stalled for a second and then spit out an error. Specifically, a “list index out of range” error.

Typically, you get this kind of error when the program is expecting a list of things that is longer than the actual list of things.

Since the NVIDIA dataset is longer/larger than the Udacity dataset, I figured that the value for list length must be hardcoded, and I could adjust this value accordingly.

However, after looking through the code, I couldn’t find the problem. Everything seemed like it should work.

So, I added in a few print statements to the code, which would help me see what’s going on under the hood and exactly where the program is breaking.

I ran the program with the print statements, and got this output:

The program successfully iterated through all the lines of actual data, and then seemed to attempt to parse an extra line of data that doesn’t exist.

So, I opened the data.txt file, and sure enough… I had accidentally copied a few empty lines at the end of file.
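(In code form, a blank-line guard while parsing would have prevented this entirely. This is just a sketch, not the actual repo code.)

```python
# Sketch: parse data.txt defensively, skipping blank lines so that
# trailing empties can't cause a "list index out of range" error.
def parse_data_txt(text):
    pairs = []
    for line in text.splitlines():
        if not line.strip():          # this guard is what was missing
            continue
        parts = line.split()
        pairs.append((parts[0], float(parts[1])))
    return pairs

sample = "0.jpg 0.0\n1.jpg -0.4\n\n\n"   # note the empty lines at the end
print(parse_data_txt(sample))  # [('0.jpg', 0.0), ('1.jpg', -0.4)]
```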

I deleted these three empty lines, and reran the program in Terminal.

It worked, and the model started training.


While the model trains (we’ll check in on it tomorrow), I thought I’d share a quick, fun aside:

Today, on my commute into work, I passed a Google/Waymo self-driving car near the Mountain View train station.

Then, on my commute home, a few blocks away from my apartment, I saw two self-driving Ubers in row.

Here’s a slightly clearer picture of the lead Uber: It looks like it is currently being human-driven, likely for training purposes. The Google car was driving itself.

Almost every day during my commute, I see a few self-driving cars, but only thought today about taking and sharing a few photos. The fact that I’m already numb to the sight of a self-driving car is pretty crazy — they are clearly not so far away from being a ubiquitous reality (regulation aside).

Anyway, it’s pretty cool to think that the software I’m running on my personal computer is essentially powerful enough to control these actual cars.

Today was a sad collection of mistakes and roadblocks (with no happy ending). As I tried to redeem myself, I kept falling deeper and deeper down the rabbit hole.

It started yesterday — I successfully reformatted the Udacity data and began training the self-driving car model.

After the model finished training, I took a quick look at the Loss graph (Loss measures the “accuracy” of the model — the lower the loss, the better the model… for the most part).

After the 30 epochs of training, the Loss didn’t even dip below 1.00, whereas, when I trained the model on the NVIDIA data, the Loss dipped significantly below 1.00, all the way to ~0.16.

I’m not sure why I expected something different to happen — The Udacity dataset I used was only 1/8 the size of the NVIDIA dataset.

This was my first mistake: I accidentally used the testing dataset to train the model. Instead, I should have used the much larger training dataset, and then tested the trained model on the testing dataset.

Not a huge problem: I went to the Udacity Github page and downloaded the larger dataset for training. Or, at least I tried to.

Halfway through the download, my computer completely freaked out.

It turns out that my computer’s local storage / startup disk was completely full. So full that my computer refused to run any programs. Even Finder was unexpectedly crashing.

I plugged in my external hard drive, and started transferring all my Month to Master documentation off of my local machine.

By the way, as an aside, I had to take photos of my computer screen with my phone, since there wasn’t enough space on my computer to take screenshots…

Anyway, the first six months of M2M documentation eclipsed 132GB, 70.8GB of which were on my local machine, so, once the transfer finished, I was able to move 70GB of local stuff to Trash.

Then, upon trying to empty my Trash, my computer froze…

After restarting my computer a couple times, my Trash finally emptied, and, 30 minutes later, I was back in business.

With space now on my computer, I went back to the Udacity Github page to download the training dataset.

The training dataset was distributed as a torrent, so I needed to install BitTorrent to download it.

After the torrent downloaded, I unpacked the file. I expected to see a bunch of JPEG images and a data.txt file as we saw before, but instead, I saw this…

Apparently, Udacity thought it would be a good idea to package the data in .bag files. I’ve actually never heard of .bag files before, but they seem to be the native way that self-driving cars (and other robots) save data.

So, I needed to figure out how to extract the JPEGs and CSVs from the individual .bag files.

There’s a library called ROS (Robot Operating System) that is needed to work with .bag files, so I attempted to install it.

But, here’s what I found on the ROS install page…

In other words, the people who make ROS are basically saying “This isn’t going to work. It’s going to fail. Sorry.”

And they were right, it did fail.

Nevertheless, I spent some more time trying to resolve the error, and eventually it seemed as if I had successfully installed everything I needed. But, then I attempted to run the extract script and that still failed.

At this point, I had to stop for the night.

Tomorrow, hopefully I will be able to make some forward progress.

Yesterday, I seriously struggled: I was trying to convert Udacity’s ROSbag files into JPEGs and CSVs, so I could use the data for training my self-driving car, but I didn’t have much luck.

Ultimately, I discovered that the Robot Operating System is not compatible with Mac, and so, I couldn’t properly extract the files locally on my computer.

Today, after a lot of trial and error, I was able to figure out how to run Ubuntu 14.04 and ROS on a virtual machine using VirtualBox.

After even more trial and error, I figured out how to use the virtual machine to extract the contents of the ROSbag files…
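For anyone retracing this step, the extraction inside the VM can be sketched with standard ROS command-line tools. The bag filename and topic names below are placeholders; check the `rosbag info` output for the real ones.

```shell
# Run inside the Ubuntu 14.04 VM with ROS installed.
# "dataset.bag" and the topic names are hypothetical placeholders:
rosbag info dataset.bag   # lists the topics, message types, and counts

# Dump a numeric topic (e.g. steering) to CSV:
rostopic echo -b dataset.bag -p /vehicle/steering_report > steering.csv

# Extract camera frames: replay the bag while image_view saves each frame:
rosrun image_view extract_images image:=/center_camera/image_color &
rosbag play dataset.bag
```

This is only a sketch of the general recipe, not the exact commands I ran; the point is that everything needed ships with a stock ROS install.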

I was expecting to find a reasonably-sized set of one-camera images, plus a CSV of the corresponding steering angles.

Instead, the Udacity dataset includes ~33,000 frames of driving video from three different camera angles and all the data for steering, braking, throttle, GPS, etc.

Within the steering CSV, for example, the data includes not only timestamp and angle, but also torque (rotational force on the wheel) and speed (turning speed).

Anyway, this dataset is super cool, and much more thorough than I expected. I’m excited to see how I can use this data to make a more functional, end-to-end self-driving car.

Yesterday, I cracked the Udacity dataset…

So, today, the plan was to reformat/prepare the data, and then start training the self-driving car model.

I decided to just model steering angle for now, as a first step. If that goes well, I’ll try to expand the model to include both throttle and braking.

To prepare the data, I needed to create a data.txt file that looks something like this:

Basically, a plaintext file with the name of the frame next to the corresponding steering angle, separated by a space.

This seemed straightforward enough — but, there was a problem:

When I opened the steering.csv file, none of the timestamps in the file matched the timestamps of the JPEG frames. I thought I was perhaps overlooking something…

So, I went through the JPEGs and copied the first couple frame numbers.

Then, I individually searched the CSV for these particular frame numbers, but they didn’t exist…

This was a problem.

If I couldn’t match the images to the driving data, the dataset would be completely useless.

Luckily, I had the not-so-brilliant idea of opening the other CSVs in the Udacity dataset (just to see what else there was), and eventually opened the interpolated.csv, which features all the data in one place AND matches all this data to the timestamps on the images perfectly.

So, I was back in luck.

I extracted the data I needed, created the data.txt file, and started training the model.
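As a sketch of that preparation step, the pairing can be done with a few lines of Python. The column names (`filename`, `angle`) are assumptions; adjust them to match the actual header of interpolated.csv.

```python
import csv

def build_data_file(csv_path, out_path, filename_column="filename",
                    value_column="angle"):
    """Write one 'frame value' line per row of the interpolated CSV.

    Column names are assumptions -- check them against the real header.
    """
    with open(csv_path, newline="") as src, open(out_path, "w") as out:
        for row in csv.DictReader(src):
            out.write(f"{row[filename_column]} {row[value_column]}\n")
```

Calling, say, `build_data_file("interpolated.csv", "data.txt")` then produces the plaintext file in the format described above.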

This will likely take all night, so we will check in on it tomorrow.

Yesterday, I started training the self-driving car model based on Udacity’s large dataset.

The Loss value (a measure inversely related to the model’s accuracy) started at 6.14783.

Many hours later, the model finished training, reducing the Loss to only 0.000377398.

Here’s the plot of Loss over time:

This reduction in Loss is quite striking.

Even more striking, though, is the difference in Loss between using 5,000 datapoints (as I did a few days ago), with a Loss of ~1.00, and using Udacity’s 33,000 datapoints, with a Loss of around 0.000377398.

In other words, by increasing the size of the dataset by a factor of ~7, the loss was reduced by a factor of ~2500. Clearly, this isn’t a linear relationship: With a little bit more data, the model becomes ridiculously better.
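This back-of-the-envelope comparison can be checked directly with the numbers quoted above:

```python
# Rough comparison of the two training runs (values from the posts above):
small_n, small_loss = 5_000, 1.0            # earlier run, ~5,000 datapoints
large_n, large_loss = 33_000, 0.000377398   # Udacity run, ~33,000 datapoints

data_factor = large_n / small_n        # ~6.6x more data
loss_factor = small_loss / large_loss  # ~2,650x lower loss
```

A ~7x increase in data buying a ~2,500x drop in Loss is exactly the non-linear payoff described above.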

And this is why Google can afford to give away all/most of its machine learning algorithms and libraries via TensorFlow: Quantity of data is the differentiator, and Google has the most (from search, email, photos, YouTube videos, etc.)

In fact, it’s in Google’s best interest to open-source its algorithms, allowing a larger community of developers to improve them more quickly. Then, Google can take these improved algorithms, feed them its proprietary data, and have the best machine learning models by a significant margin.

Today really helped me appreciate the value of data as a competitive advantage.


Speaking of data… I’ve reached a milestone in the creation of my own dataset: Today marks the 200th day in a row that I’ve written a blog post as part of my Month to Master project.

My dataset of daily entries now totals around 85,000 words. Maybe, once I’m done with all 365 posts, I’ll figure out something interesting to do with this dataset…

Yesterday, I finished training my self-driving car model on the Udacity dataset. Today’s job was to try to visualize the result:

I expected the results to be good, and they were: the predicted steering is quite natural and not too jittery. I’m very pleased with the outcome.

A few things to note:

  • Originally, I thought there were two Udacity datasets: A training dataset (which I used yesterday) and a testing dataset (which I accidentally used for training a few days ago). Today, I realized that the testing dataset is actually a subset of the training set, so I decided to instead use some of the Nvidia data for testing. The important thing here is that the model was trained on the Udacity dataset and tested on completely new terrain from the Nvidia dataset. In other words, the model works well on roads outside of its training set (this is very important if you want a universally functional self-driving car).
  • In order to properly simulate the output of the Udacity model, I needed to do two things: 1. Map the Udacity data into a range of values usable by the Nvidia simulator (the Nvidia model uses degrees as units, while the Udacity dataset ranges from -1 to 1), and 2. Perform some minor pre-processing to the Nvidia testing set (i.e. cropping) to test the Udacity model.
  • During testing, the script rendered images of the steering wheel rotated based on the predicted steering angle, and then I overlaid these renders on top of the original, uncropped Nvidia footage for a slightly wider view.
  • At around 40 seconds into the video, the car comes to a full stop, and then makes a sharp right turn. It seems like the car starts turning before the visuals indicate that it’s meant to go right (the car could go straight, after all), so I’m not really sure how this happens. The Udacity dataset doesn’t have any knowledge of this particular turn. The only reasonable explanation is that the model recognized it was in a turn lane, or the model is just more predictive than a human. Either way, this was a bit surprising, but pretty cool to see.
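The range-mapping step mentioned in the second bullet can be sketched as a simple linear scaling. The 25-degree maximum here is a hypothetical placeholder; the real factor depends on the steering range the Nvidia simulator expects.

```python
def normalized_to_degrees(angle, max_degrees=25.0):
    """Map a normalized steering value in [-1, 1] to a wheel angle in degrees.

    max_degrees is an assumed placeholder, not the simulator's actual range.
    """
    clamped = max(-1.0, min(1.0, angle))  # guard against out-of-range values
    return clamped * max_degrees
```

With this mapping in place, the Udacity-trained model’s outputs can be fed straight into the degree-based rendering script.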

I’m nearly done with this month’s challenge: I just need to train the model on throttling and braking, which, I suspect, will be virtually identical to the way the model was trained on steering angle (after all, steering angle, throttle, and braking are all just represented by arbitrarily-defined numbers).

For most of this month, I’ve been focused on creating a steering angle predictor for my self-driving car.

However, to finish off my car, it’s important that I understand how to build a system to control the accelerator/throttle and the brake of the car (i.e. “the pedals”).

While this may sound like a completely new challenge, it turns out that I can build my pedal system without writing any new lines of code.

After all, my steering model was constructed by 1. Showing the computer a bunch of images, 2. Assigning a numeric value to each image, 3. Asking the computer to figure out how the pixels of the images relate to the numeric values, 4. Using this relation to predict the numeric value assigned to other images.

In the case of the steering predictor, this numeric value represented steering angle. But, it could have just as easily represented the throttle or braking amount.

Thus, to build the pedal system, all I needed to do was re-assign each image a numeric value that corresponds to the correct throttle amount or braking amount, and then ask the computer (in the same exact way) to determine the relation between the pixels and the numeric values.

So, that’s what I did.

For example, in the case of throttle, I prepared the data.txt with 10,000 images matched to their corresponding throttle amount, and then ran this file through the exact same machine learning model.

And, as expected, the model learned over time, reduced Loss, and determined an effective way to predict throttle amount.

I did the same for the braking system.

In other words, the machine learning model that I’ve been using is completely generic. It doesn’t matter if the inputs are steering angles, throttle values, or braking values. The model doesn’t discriminate.

This is perhaps anticlimactic, in some sense, but in another sense, quite an amazing thing: Machine learning technology has gotten to a point where we don’t need to create highly specialized models for each application. Instead, a single model is fairly extensible across many domains — and this will just keep getting better.

The point is that I don’t really have to do much work to get throttling and braking for free. It’s a pretty good deal…

This month, I challenged myself to build the software part of a self-driving car. In particular, I wanted to build two main things: 1. The car’s steering system and 2. The car’s pedal system (i.e. the throttle and brake).

Two days ago, I finished building the steering system — which can accurately predict steering angle based on a forward-facing video feed of the road.

For the steering system to be considered a success, I set two sub-goals:

  1. I needed to adapt and use the self-driving car model on a dataset that it wasn’t specifically designed for. Since I used a model based on NVIDIA’s research paper applied to a dataset provided by Udacity, I fulfilled the requirements for this sub-goal.
  2. Secondly, I needed to train the model on one dataset (i.e. one set of roads), and then have it perform well on a completely different dataset (i.e. a set of new roads). Since I trained the model on the Udacity dataset, and then successfully tested the model on the NVIDIA dataset, I also fulfilled the requirements for this sub-goal.

Then, yesterday, I used the same model, with modified data inputs, to successfully create the throttle and braking systems, which can accurately predict the throttle amount or braking amount based on a forward-facing video feed of the road.

With all these pieces assembled, this month’s challenge is officially complete!

Yesterday, I completed this month’s challenge, successfully building the software part of a self-driving car.

Now, the question is… What is left to do/build in order to actually get my car out on the roads of California?

The best answer comes from George Hotz — who pioneered the DIY self-driving car movement.

18 months ago, Bloomberg published a video about 26-year-old Hotz, who had built a fully-functioning (more or less) self-driving car in his garage:

In the video, George explains that he only needed to build two things to create the car: 1. A system that could output driving instructions based on input data, and 2. A system that could control the physical actuators of the car (i.e. the steering wheel, throttle, brake) based on digital inputs.

At the beginning of this month, I called #1 “the software part” and #2 “the hardware part”, focusing my energies on the software. So, with #1 completed, to finish developing my car, I would need to address #2.

As Hotz explains in the video, in order to control all the physical actuators of the car, he simply plugged his computer into the car’s debugging port (just as a mechanic would do). Of course, he then needed to figure out how to get his software system to properly send instructions to the car through this port in real-time, but he didn’t actually have to do too much hardware hacking: The car is already designed to be controlled digitally in this way.

Not that I know too much about cars, but this feels very approachable (as long as I have access to the Internet and YouTube). If I had another month, and, more importantly, a car, this would be the natural next step.

Basically, it seems that the end-to-end self-driving car isn’t that mythical after all. George Hotz helped show everyone, including myself, that the self-driving car space isn’t only accessible to companies like Google, but to reasonably casual hobbyists.

Since the release of the Bloomberg video, many small teams are now working on self-driving cars and related services, and this is likely to speed up significantly in the next year or so.

Hopefully, I’ve played a very minor role in this story, continuing to demonstrate the accessibility of this technology…

This month’s challenge is a bit different than the six previous Month to Master challenges. Specifically, it isn’t going to age very well…

In February 2017, I landed a backflip. If it was instead February 2020, it wouldn’t have made a difference.

In December 2016, I learned to draw realistic portraits. If it was instead December 2036, it wouldn’t have made a difference.

This month, May 2017, I built the software part of a self-driving car. If this was May 2020, this challenge would be completely different.

Well, of course, this is a guess, but still… In three years, it’s likely that everything I’ve built this month will be accomplishable with one line of code (with everything else being abstracted away).

Not only that, but my personal computer will be able to process much more data, much more quickly, outputting an exponentially better model for the self-driving car.

Basically, what I’m saying is that this month’s challenge isn’t going to be a challenge in a few years.

Not that that’s such a big deal, but it’s interesting to think about the obsolescence of skills and how the “cutting edge” quickly becomes the “dulling middle”.

Perhaps, I should consider this month’s challenge performance art, where the performance is “casual hobbyist builds self-driving software in a world where people still drive cars”. Then, this month’s set of blog posts might stay more relevant/interesting over time. After all, art typically gains in value over time.

Anyway, just an observation that I thought was worth sharing.

Today, I’m flying to Seattle for a five day vacation, which is why I worked to finish up this month’s challenge a bit early.

While I’m in Seattle, I’ll continue writing my daily posts (since I still have much to reflect on as far as my process and learning techniques/insights are concerned), but I don’t plan to work on any more software.

This is why it’s important to set measurable goals: Since I’ve definitively accomplished what I set out to accomplish, I can feel good about going on vacation.

Many people define their goals in more ambiguous terms — “I want to learn how to speak Spanish”, “I want to learn how to draw”, “I want to learn how to play the piano”. In these cases, it’s never possible to complete the goal, nor is it possible to truly know how you’re progressing.

A well-defined goal is much preferable.

It could be argued that an open-ended goal places more emphasis on the pursuit and not on the destination, which I do agree is a more productive mindset when trying to learn. However, without a well-defined destination, it’s not clear what’s even being pursued.

An ambiguous goal, in my opinion, is a bit of a lazy goal, where the goal’s creator doesn’t have genuine underlying intent. After all, with an ambiguous goal, you can’t fail (which is perhaps attractive), but you also can’t succeed.

And… If you do take the pursuit of an ill-defined goal seriously, it’s hard to ever be satisfied or know when to stop (and focus on the other important parts of your life).

But anyway, since I’ve completed my explicitly-defined goal, I’m fully prepared to enjoy my trip to Seattle guilt-free.

A few days ago, I declared that I completed the self-driving car challenge. The result was basically this video of my computer steering a car…

While the output of this month’s experiment was certainly a system that can autonomously control a car, the question remains… What did I personally accomplish this month?

After all, my self-driving car was primarily based on someone else’s open source code (which I lightly adapted and generalized). This open source code was based on a paper written by NVIDIA’s self-driving car research team. The model from the research was based on mathematical techniques (backpropagation, etc.) that were invented outside of NVIDIA’s lab, mainly in university research centers. And I can keep going…

These mathematical techniques are built on top of fundamental insights from calculus, which were invented hundreds of years ago, and so on.

I can also take this in a different direction: The code that I ran was built on top of a machine learning library built by Google, which was in turn built on top of a high-level programming language built by others. Additionally, in order to run any of this code, I needed to install the necessary libraries onto my computer. Installing these libraries can be complicated, so I used install services that other people have set up to ease the process.

I can still keep going… but won’t.

So, returning to the question: What did I personally accomplish this month? It’s not exactly clear.

On one hand, I can say that I was able to get a self-driving car system to run locally on my computer, and that I was able to adapt the system to effectively process new sources of data.

On the other hand, I can say that I took the work of a lot of other people and combined it to make a video.

Both are true.

So, did I actually build a self-driving car? Can you say that companies like OnePlus or Xiaomi build smartphones, even if the software is built by Google and the hardware components are built by Samsung, Foxconn, and others?

Does “assembly” count as “building”, and does “aggregating” count as “learning”?

I would argue yes, but I don’t think it matters.

The more interesting takeaway is this: Sometimes, things that seem challenging or inaccessible are actually much more novice-friendly than they seem. Thus, the difference between “building a self-driving car” and not was my belief that I could figure it out and my attempt at doing so.

In other words, oftentimes, the exclusivity of mastery only exists because most people never pursue “the thing” (based on the assumption that they can’t).

So, I’d like to reframe what I accomplished this month: I didn’t crack the insanely difficult problem of building a self-driving car. Instead, I proved to myself that building a self-driving car (as of today) isn’t actually an insanely difficult problem. At this point, it’s something that I believe most casual hobbyists could figure out.

I think this is a more interesting outcome anyway.

During this month, as I’ve tried to uncover the mysteries of self-driving cars, I’ve received one question more than any other: Now that you have a better understanding of the technology, do you feel more or less safe entering a fully self-driving car?

I would say safer, but it’s complicated: The simplicity of the self-driving car model gives me confidence that, with more data and processing power, self-driving capabilities will continue to get exponentially better. However, the irrational part of my brain feels uneasy about the fact that the model is so simple and unspecific to the task of driving.

Nevertheless, the simpler the model, the more generalizable and trainable it is, so I feel pretty safe about the prospect of fully autonomous cars (Well, I feel safe writing that I feel safe. Actually put me in the car, and see if I change my tune…).

One common question I hear in response to my general comfort is: “Wouldn’t you rather be in control of your life? Even if it’s statistically safer on a whole to use a fully autonomous car, wouldn’t you want to know that you did everything you personally could to keep yourself alive?”

Sort of. But, truthfully, since I don’t own a car, and already trust strangers (Uber/Lyft drivers) with my life, I’m not sure I currently have that control anyway.


As a short addendum: Two days ago, MIT published its yearly research paper on the current state of consumer interest in self-driving cars, and 48% of the 3,000 participants in the study said that they would never purchase a car that completely drives itself.

Not only that, but, last year, 40% of 25–34 year-olds said that they would be comfortable with a fully driverless car, and yet, only 20% of the same group said the same this year.

It’s interesting to see that, as the general public learns more about self-driving cars, its comfort with the technology is waning.

Honestly, after reading the MIT paper, I might agree with the general public (at least, as of today… But, I’m still pretty optimistic).

I’m currently visiting a friend in Seattle, who’s giving me a hard time about the post I wrote two days ago:

In that post, I explain that I was able to leverage the collective knowledge and work of many others to greatly accelerate my self-driving car efforts. In particular, I didn’t need to build the machine learning model from scratch, but instead, I could integrate an already-built model into my system.

As a result, I’m hearing a lot of “If you didn’t actually need to build the core self-driving car model, how was this month’s challenge actually a challenge?”.

As I wrote in Friday’s post, I do agree that the perceived challenge (of creating a self-driving car mechanism) wasn’t actually the real challenge, but that doesn’t mean there was no challenge at all.

In fact, I’d argue that the perceived challenge is never actually the real challenge.

For example, back in November, when I was trying to become a grandmaster of memory by memorizing the order of a shuffled deck of cards, the hard part wasn’t actually remembering. Instead, it was forgetting what I had memorized during previous attempts (so as not to confuse myself).

Back in February, when I was learning to backflip, the hard part had nothing to do with anything physical or athletic, but instead, had to do with the mental blocks and fear associated with self-preservation.

This month, the challenge wasn’t creating the self-driving car model, but instead, it was integrating this model into my system so it would actually work.

This isn’t surprising — most of software engineering isn’t necessarily building completely new components. Usually, it’s figuring out how to take a bunch of pre-existing components, fit them all together, and have the output function as expected.

In my mind, software engineering is a lot like performing an organ transplant: The trick isn’t finding the replacement organ (although, this can still be a long process). The trick is making the body not reject the new organ.

Similarly, software engineering has two main challenges: 1. Finding the ‘organ’ (i.e. an example of the component you want to build), 2. Integrating this ‘organ’ into the overall body (i.e. ensuring your development environment is properly setup; connecting the component to other components and services so that it properly interacts with the rest of the project; modifying the component so it behaves in the expected/needed way).

And so, while this month’s challenge wasn’t difficult because of the perceived difficulty of getting a car to drive itself, I still did spend a lot of time in the coding pit of despair trying to 1. Find a self-driving car model that I could successfully run, 2. Set up my computer’s environment to support the model, 3. Format data so it could be used by the model, 4. Figure out how to output something interesting from the model, 5. Modify the model to work with different datasets and for modeling different behaviors.

Overall, the perceived challenge wasn’t a challenge, but there was still plenty of friction.

There’s actually an important lesson here: Oftentimes, when people are learning something new, they have a preconceived notion of what’s “going to be hard” or “what they are supposed to be learning”. As a result, when there are other difficulties along the way, people perceive these difficulties as roadblocks, instead of as additional parts of the learning process.

In other words, when people struggle on the thing they expect to struggle on, they usually can justify persevering. However, if they are struggling on something that seems to be a distraction, people tend to give up (since they perceive that they’re wasting their time).

But clearly, these distractions are oftentimes the actual meat of the learning challenge, and, if cracked, unlock the most progress towards the goal.

As Ryan Holiday explains: “The obstacle is the way”

Your obstacles aren’t preventing you from pursuing your goal, but, in fact, define the pursuit itself.

So, I stand by calling this month’s challenge a challenge. Not that the semantics matter anyway… But, it reveals an important lesson about learning and perseverance.

Tomorrow, I plan to go through my practice logs and calculate how long I spent on this month’s challenge.

But, before I do that, it’s important to note how I spent my time this month, and, more particularly, how my time was distributed.

In previous months, my practice time was fairly uniformly distributed — if I spent 30 hours over 30 days, then I was typically spending one hour per day.

However, this month, the lengths of my sessions were often significantly longer and less uniform. To understand this month’s difference, it’s necessary to understand the two phases of learning and the two types of learning…

The two phases of learning:

  1. The Discovery Phase — In this phase, the learner’s job is to explore the field, discover the best path, and design exercises in line with this best path.
  2. The Training Phase — In this phase, the learner already knows what she needs to practice, and spends all of her time deliberately and intensively focusing her training on this best path.

All learning requires these two phases, but these phases may interact differently within different types of learning.

The two types of learning:

  1. Disjointed Learning — In Disjointed Learning, the Discovery Phase and the Training Phase can be pursued completely separately, or in a disjointed manner. For example, when I was learning to memorize a deck of cards, I was able to first discover the proper technique, and then, separately, practice that technique. Disjointed Learning usually allows for short, yet effective training sessions (after the necessary Discovery Phase).
  2. Simultaneous Learning — In Simultaneous Learning, the Discovery Phase and the Training Phase must be pursued simultaneously. Usually, Simultaneous Learning exists when there isn’t yet a clearly-defined path (that can be externally discovered), and the path therefore must be uncovered and shaped during training. This process typically requires significantly longer sessions, given that progress is unpredictable and unstructured.

During the previous six months of challenges, I was able to take a fairly disjointed approach (I first researched the best ways to train, and then I trained), allowing me to evenly distribute my training across short, intense sessions throughout the month.

However, during this month’s challenge, I didn’t have the luxury of a clearly defined path. Other than Udacity’s $2400 online course (which I wasn’t prepared to pay for), there wasn’t an explicit plan I could follow to build a self-driving car. Instead, I needed to explore a lot of dead-ends and roundabouts, before finding my way.

As a result, my sessions this month typically lasted at least a few hours each, but were fewer in number.

If I tried to build a self-driving car through daily 45-minute sessions, I would have never succeeded. I needed to restructure and redistribute my time to fit the type of learning featured this month (i.e. Simultaneous Learning).

Thus, the general takeaway is this: If you are planning to take on your own personal learning challenge, it’s important to identify which type of learning is required, and to structure and schedule your time accordingly.

Since it’s almost the end of May, it’s time to look back and see just how much time I spent on this month’s challenge.

Most of my time was distributed across ten longer sessions, of the following lengths: 1.5 hours, 2 hours, 2.5 hours, 1.5 hours, 1 hour, 3.5 hours, 2.5 hours, 1.5 hours, 1 hour, and 3 hours.

I also spent another hour across four 15ish-minute sessions, and another five hours researching, reading, and writing (which I’m counting in the cases when it was used specifically as a learning tool).

So, in total, I spent 26 hours building my self-driving car.

Before computing this total, I was expecting something around 35–40 hours, so I was definitely a bit surprised with this result. But it makes sense now: There were many days where I felt like I was investing time, but I was actually either letting my computer run on its own or writing a more-detailed-than-usual daily post (i.e. not actually working on the challenge itself).

Interestingly, if everything went perfectly right this month, I likely could have finished the entire challenge in a few hours. Of course, everything didn’t go right, and I didn’t know what perfectly right was from the outset — but still… The majority of my time this month was spent finding my way.

“The way” itself didn’t actually take very long.

Today is the last day of the self-driving car challenge, and I want to address one of the big questions I received throughout the month: “I think you have something here… How are you going to make money from this?”

It’s weird because nobody asked me this when I learned to memorize a deck of cards or land a backflip.

So, this last post is my response to that (although, I’m not actually sure I directly answer the question)…


Most working adults find it difficult to devote time to learning new skills. And those who do find the time are typically focused on acquiring professionally-oriented abilities.

In other words, most adults never pursue mastery just for the sake of it. Instead, their pursuits tend to be commercially-justified or career-focused.

There’s nothing wrong with this motivation. In fact, I’d certainly encourage career-motivated learning over no learning at all. But, there is a bit of a problem with professional skills: They don’t remain relevant or interesting for very long.

Prior to this month’s challenge, I spent my time mastering skills that have very limited commercial value, if any (like Rubik’s Cubing and drawing portraits). The cool thing is… For as long as I live, I’ll be able to enjoy these skills. In other words, my upfront investment yields theoretically infinite returns.

On the other hand, this month, my challenge was very commercial-focused (or, at least, could be). If I wanted to, I could likely package up my newly-found self-driving car skills, and try to become a self-driving car engineer at a major tech company. Or, alternatively, I could attempt to start my own self-driving car startup. Or, I could try to package and sell my curriculum to others who have these career ambitions.

With the skills I acquired this month, I can do a lot of commercially interesting things.

But, here’s the actual thing… In 24 months (if not sooner), all of this commercial value disappears. Not only that, but in 24 months, this challenge will no longer be relevant or intellectually interesting at all: It was only interesting because of the timing, the still-reasonably-meaty technical details, and the current excitement for self-driving cars.

So, since I don’t have any commercial self-driving car aspirations, my fun with self-driving cars will likely end along with the month of May. I definitely had fun conquering the challenge, but I’m not sure that there are any repeat fun possibilities (like there are for the Rubik’s Cube, which I still solve over and over every day).

Perhaps I’m being overdramatic, but, compared with commercially-oriented skills, I tend to much prefer timeless skills — skills that can be enjoyed and constantly worked on forever.

I could write a lot more about how I also think that commercial motivations may not be the most productive foundation for learning in general, but I’ll gloss over this point for now.

The main point I want to make is this… Are you currently investing in yourself? Are you trying to learn something new? And, in particular, are you trying to learn something new just because it’s fun (and not for financial reasons)?

If not, maybe it’s worth considering.

Not because I think that timeless learning is somehow better or “more elite” or anything like that. I just think that it’s fun. And, oftentimes, we don’t give ourselves the opportunity to have fun because we don’t think it’s practical.

But “practical” seems to be short-lived and ever-changing anyway, so I’d encourage you to invest a little bit of time in some impractical fun.

This post is part of my year-long accelerated learning project, Month to Master.
