Great article, it helped me a lot in getting started.
I want to scale it up to lots of time series, but unfortunately I'm now running into a difficult issue: performance on my GPU (CUDA) is worse than on my CPU, and GPU utilization peaks at around 10-15%.
I want to debug what is causing this (probably some kind of memory overhead). I noticed that with tf.Session you can pass RunOptions with the trace level set to FULL_TRACE.
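For reference, Session-level tracing with FULL_TRACE looks roughly like this. It is a minimal sketch that assumes a TensorFlow install where the `tf.compat.v1` Session API is available. The matmul graph is a hypothetical stand-in for the real model. `tensorflow.python.client.timeline` is an internal, non-public module, so its location may change between versions. The resulting `timeline.json` can be opened at chrome://tracing to see which ops ran on which device and where the gaps are.

```python
import tensorflow as tf
from tensorflow.python.client import timeline  # internal module: renders RunMetadata as a Chrome trace

# Build a small graph in TF1-style graph mode.
# The matmul is a hypothetical stand-in for the real model's training op.
graph = tf.Graph()
with graph.as_default():
    a = tf.random.normal([1000, 1000])
    b = tf.random.normal([1000, 1000])
    product = tf.matmul(a, b)

with tf.compat.v1.Session(graph=graph) as sess:
    # FULL_TRACE asks the runtime to record per-op timing and device placement for this run.
    run_options = tf.compat.v1.RunOptions(trace_level=tf.compat.v1.RunOptions.FULL_TRACE)
    run_metadata = tf.compat.v1.RunMetadata()
    sess.run(product, options=run_options, run_metadata=run_metadata)

# Convert the collected step statistics into Chrome's trace-event JSON format.
chrome_trace = timeline.Timeline(run_metadata.step_stats).generate_chrome_trace_format()
with open("timeline.json", "w") as f:
    f.write(chrome_trace)
```

Tracing every step is expensive, so in practice this would only be enabled for a handful of steps.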
I went through the code, but apparently there is no similar option in the Estimator API?
I've also read up a bit on building the distributed setup "manually" (i.e. without the Estimator API), which would give me more control. But I'd also like to keep it simple.
What would be the best approach here?