Hi, I have two separate comments.
First, it seems to be important that the output layer has no bias term. If I add a bias to the output layer, I get much worse accuracy. I observed this with both the sine and the circle curves. Any idea why that is the case?
Second, I seem to be getting very good accuracy even with a 20-node hidden layer. The only difference is that I use mini-batches during training, so the network is probably shown more training samples. My code can be found here: https://gist.github.com/bibhas2/dcd4cb801970f55a69bb098e46ef85b7
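
For anyone who wants to try both points without opening the gist, here is a minimal NumPy sketch of the kind of setup described above. It is an illustration only and is not the code from the gist: the function name `train_sine_mlp`, the tanh activation, the learning rate, and the other defaults are all assumptions. It fits sin(x) with one 20-node hidden layer, trains with mini-batches, and has a `use_output_bias` switch so the two variants can be compared.

```python
import numpy as np


def train_sine_mlp(hidden=20, use_output_bias=False, batch_size=32,
                   epochs=200, lr=0.01, seed=0):
    """Fit sin(x) with a one-hidden-layer MLP; return the MSE per epoch.

    All names and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-np.pi, np.pi, size=(1000, 1))
    y = np.sin(x)

    w1 = rng.normal(0.0, 1.0, size=(1, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)  # stays at zero unless use_output_bias is True

    def forward(inputs):
        h = np.tanh(inputs @ w1 + b1)
        return h, h @ w2 + b2

    def full_loss():
        _, pred = forward(x)
        return float(np.mean((pred - y) ** 2))

    losses = [full_loss()]  # loss at initialization, before any update
    for _ in range(epochs):
        order = rng.permutation(len(x))  # reshuffle every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            h, pred = forward(xb)

            # Gradients of the mean squared error over the mini-batch.
            grad_out = 2.0 * (pred - yb) / len(xb)
            grad_w2 = h.T @ grad_out
            grad_b2 = grad_out.sum(axis=0)
            grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)
            grad_w1 = xb.T @ grad_h
            grad_b1 = grad_h.sum(axis=0)

            w1 -= lr * grad_w1
            b1 -= lr * grad_b1
            w2 -= lr * grad_w2
            if use_output_bias:
                b2 -= lr * grad_b2
        losses.append(full_loss())
    return losses


if __name__ == "__main__":
    for flag in (False, True):
        history = train_sine_mlp(use_output_bias=flag)
        print(f"use_output_bias={flag}: final MSE = {history[-1]:.5f}")
```

Setting `batch_size` to the dataset size (1000 here) gives full-batch training, so the same script can be used to check how much of the difference comes from mini-batching alone.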