To visualize the test set results I used t-SNE, a manifold learning technique for data visualization. t-SNE minimizes the KL divergence between two joint probability distributions over pairs of points: one describing pairwise similarities in the original high-dimensional data, the other describing them in the low-dimensional embedding. The loss function is non-convex, so different initializations can produce different embeddings. You should definitely read the original paper; it is extremely informative and well written.
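As a rough illustration of the idea, here is a minimal sketch using scikit-learn's `TSNE`. This is not necessarily the setup I used: the synthetic `features` array is a hypothetical stand-in for real test-set features, and the perplexity and seed values are arbitrary assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for test-set features: two Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(30, 10)),
    rng.normal(loc=5.0, scale=1.0, size=(30, 10)),
])

# The loss is non-convex, so the result depends on the initialization;
# fixing random_state makes a run repeatable.
# Perplexity must be smaller than the number of samples (60 here).
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
embedding = tsne.fit_transform(features)

print(embedding.shape)       # (60, 2)
print(tsne.kl_divergence_)   # final value of the KL divergence being minimized
```

Running this with a few different `random_state` values and comparing `kl_divergence_` is a simple way to see the effect of the non-convex loss: the runs can converge to different embeddings with different final loss values.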