Unfortunately, our datasets can’t be open-sourced, but since the methods we tried are unsupervised, it seems that pretty much any text corpus should work for you.
In addition, you could use RSS feeds from news sites to collect content and extract categories and texts from it. I’m not sure whether a corpus of decent size built this way is already available.
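To make the RSS idea a bit more concrete, here is a minimal sketch using only the Python standard library. It assumes a plain RSS 2.0 feed where each item carries title, description and category elements; the sample feed and the function name parse_rss_items are made up for illustration, and real feeds often need extra handling (HTML in descriptions, namespaces, Atom instead of RSS).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded from a news site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Markets rally</title>
      <description>Stocks rose on Monday.</description>
      <category>Business</category>
    </item>
    <item>
      <title>Cup final tonight</title>
      <description>Two teams meet in the final.</description>
      <category>Sport</category>
      <category>Football</category>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with its title, text and category labels."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the element is missing
            "title": (item.findtext("title") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
            # an item may carry zero, one or several <category> elements
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["categories"], entry["title"])
```

For a live feed you would fetch the XML first (for example with urllib.request.urlopen(url).read()) and pass the result to the same function. The categories can then serve as rough labels when you want to sanity-check what an unsupervised model has found.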
Finally, the Gensim library seems to be becoming quite popular for topic modeling, so you might want to check it out as well.