AI can be used to make automated decisions based on high-resolution images, but can we understand those decisions? In this article, I discuss how interpretable multiple instance learning can be used to tackle this problem.

Modern computer vision datasets can contain millions of images. However, the individual images are often small. For example, in ImageNet [1], the average image is only 469 x 387 pixels. But what if each image is over 10,000 x 10,000 pixels? Or over 100,000 x 100,000?