So here comes week 2.
Course 3 ends here, and courses 1 and 2 will be about halfway through.

Again, if you have some basic knowledge of machine learning and deep learning, you can probably finish the first three courses within the 7-day trial. I entered the Room of Spirit and Time for this one and was able to finish them and organize my notes a bit.

Course 1:
Week 2 of course 1 walks you through the ideas used in deep learning. You will already have seen a fair amount of this if you have taken a machine learning course, no matter who taught it. I will try to explain it in my own words again, so feel free to skip ahead. I won't blame you.
General idea of machine learning: when we have a model, we can always define a loss function to represent how far the model deviates from the real data. Ideally this would be zero, but we have to learn to live with the minimum we can find. Traditionally, we can add something to the recipe when we have strong theoretical support, or we can keep tweaking a parameter and see whether we should tune it up or down. Fortunately, as long as the loss is reduced, the model usually becomes more accurate (conditions apply).
We want to approach the minimum of the loss function given so many parameters. I will split the topic into two parts: gradient descent and vectorization.
Gradient descent is the technique used to update the parameters and minimize the loss function. To understand it, you need to know what a derivative is and what it means. If you don't, just think of it as the slope of the loss function at a given point. We can calculate the partial derivative/slope with respect to a parameter and use it to update that parameter, the reasoning being that the slope at the minimum point must be zero. Simplifying the formula, we update each parameter as follows:
new parameter = old parameter - learning rate * slope
Once we reach a point where the slope is zero, the parameter practically stops updating itself, and this point might be the minimum we are looking for. You may have some doubts here, and that's normal; something called a local minimum can disturb this.
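To make the update rule concrete, here is a minimal Python sketch of gradient descent on a toy one-parameter loss; the quadratic loss, the starting point, and the learning rate of 0.1 are my own illustrative choices, not values from the course.

```python
# A minimal sketch of gradient descent on a toy one-parameter loss.
# The loss L(w) = (w - 3)**2, the starting point, and the learning
# rate are illustrative choices, not values from the course.

def loss(w):
    return (w - 3) ** 2

def slope(w):
    return 2 * (w - 3)  # derivative of (w - 3)**2 with respect to w

w = 0.0                 # old parameter
learning_rate = 0.1

for step in range(50):
    # new parameter = old parameter - learning rate * slope
    w = w - learning_rate * slope(w)

print(w, loss(w))  # w approaches 3, the point where the slope is zero
```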
Vectorization is a useful method when you are handling many features and many examples at a time. The parameters are stored as vectors and the data is stored as a matrix, so we can produce predictions for a whole batch of data at once. This is also convenient for presenting information.
Vectorization lets us update all the parameters at once and find the total loss of the new model, and the derivative lets us identify whether we have possibly reached the minimum.
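As a small illustration of the idea, here is a NumPy sketch of a vectorized prediction, loss, and parameter update for a simple linear model; the shapes, the mean-squared-error loss, and the learning rate are my own assumptions for the example, not something specified in the course notes here.

```python
import numpy as np

# Illustrative shapes: 4 features, 100 examples stored as columns (my own assumption).
X = np.random.randn(4, 100)   # data matrix, one example per column
y = np.random.randn(1, 100)   # targets
w = np.random.randn(4, 1)     # parameter vector
b = 0.0

# Vectorized prediction for the whole batch at once, with no Python loop over examples.
y_hat = w.T @ X + b           # shape (1, 100)

# Total loss of the model over the batch (mean squared error as an example).
loss = np.mean((y_hat - y) ** 2)

# Vectorized gradient with respect to all parameters at once,
# followed by one gradient descent step on every parameter.
dw = (2 / X.shape[1]) * (X @ (y_hat - y).T)   # shape (4, 1)
w = w - 0.01 * dw
```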
In course 1 we learn how to update parameters, but there are some settings of the model we still have to tune by hand, and those are hyperparameters. You might ask: why can't we use a similar method to optimize hyperparameters? Nice catch; if we could automate it, there would be fewer things for humans to do, right? In short, that approach falls into the trap of local minima too easily, and sometimes it doesn't even converge to a feasible result. In practice, we have some good reference values for hyperparameters, and there are not too many of them to tune for now.

Course 2:

Course 2 goes over several hyperparameters and how they help optimization converge faster or avoid overfitting.
Mini-batch: using a small part of the data to update the model instead of iterating through the whole batch, which takes longer to produce a result.
Momentum: adding the previous gradients as an additional factor when updating.
RMSProp: adjusting the learning rate according to the square root of a running average of the squares of previous gradients. Previous gradients are combined with an exponentially weighted average, which means more recent gradients contribute more than older ones. The exponentially weighted average has a small issue at the early stage of learning: because there are no previous gradients, or momentum, it tends to be low and slows down the learning process. A technical trick is used here to make it more accurate: dividing it by 1 - beta**t, so the earliest averages are divided by a smaller value and therefore become larger.
Adam Optimization (Adaptive Moment Estimation): using momentum and RMSProp together, period; see the sketch below.
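To show roughly how these pieces fit together, here is a Python sketch of an Adam-style update applied to mini-batches of toy data; the toy data, batch size, and hyperparameter values (beta1 = 0.9, beta2 = 0.999, and so on) are commonly cited defaults and my own assumptions, not numbers taken from the course notes above.

```python
import numpy as np

# Toy data for a one-parameter linear fit (shapes and values are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
y = 3.0 * X + rng.normal(scale=0.1, size=1000)

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Momentum: exponentially weighted average of past gradients.
    m = beta1 * m + (1 - beta1) * grad
    # RMSProp: exponentially weighted average of squared gradients.
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction: divide by 1 - beta**t so the earliest averages are not too small.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Scale the step by the square root of the averaged squared gradients.
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    # Mini-batch: update on a small random slice instead of the whole data set.
    idx = rng.integers(0, len(X), size=32)
    xb, yb = X[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)   # gradient of mean squared error
    w, m, v = adam_step(w, grad, m, v, t)

print(w)  # approaches 3.0
```

The two exponentially weighted averages inside the function are the momentum and RMSProp terms, and the bias correction is exactly the division by 1 - beta**t described above.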

Course 3:
Course 3 further addresses how to analyze errors and the practical issues that come up when working on a machine learning project. This part is more valuable and less technical, but I would recommend putting a bit more time into it than into the other two courses if you are going to take the specialization.

So, course 3 ends here, but I find these lectures even more useful than the technical courses. One reason is that there are plenty of resources that teach technical skills, various tutorials, and many other lectures, but not many well-organized courses that teach you how to strategically design your projects. This may be helpful not only for machine learning engineers but also for project managers/product owners.
Courses 1 and 2 are also halfway through, and I again recognize that this is a specialization designed for beginners to step in. My hope for more in-depth lectures on CNNs and RNNs remains, although I find it is not likely to come true.
