Hi Shi Yan,
I’m glad you found my blog useful!
The algorithms they use vary from tool to tool so I’d suggest you take a look at a few. I personally found http://rll.berkeley.edu/cgt/ to be well documented, and this page http://www.autodiff.org/?module=Tools&language=python has links to autodiff tools from different languages.
A way to approximate gradients (at a point) would be to get the Taylor series expansion of the function, and take its first derivative at the given point.
The iterative method that I mentioned is explained in detail here: https://en.wikipedia.org/wiki/Numerical_differentiation
It’s basically trying to approximate the function as a line and iteratively improving the approximation till you reach a stopping criterion.
The way you are computing the gradient for ReLU is correct, that’s another approximation strategy called subgradient, where you approximate the gradient at a point by a line passing through that point and always below the convex function in question. You’ll usually first see it when training the hinge loss of SVMs using Gradient Descent.
Hope That Helps!