I think that the confusion stems from the definition of multi-label classification.
If I understand your question correctly, you believe that multi-label classification works by locating multiple objects in an image, each belonging to a different class, but that is not what it means in this case.
It means that the image as a whole fits into multiple classes. For example, an image of an eagle can be classified as both a bird and an eagle. Or, if you are classifying images for news articles, an image of a crashed car can fit into several categories: car, accident, police, etc. The image is simply understood in different contexts and is therefore assigned several classes at once.
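To make this concrete, here is a minimal sketch of how that often looks in practice. It assumes a model that outputs one score (logit) per class for the whole image and treats each class as an independent yes/no decision via a sigmoid. The class list, the logits, and the `predict_labels` helper are all made up for illustration and are not taken from your setup.

```python
import math

# Hypothetical label set, chosen only for illustration.
CLASSES = ["bird", "eagle", "car", "accident", "police"]


def sigmoid(x: float) -> float:
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return every class whose independent sigmoid score exceeds the threshold.

    Each class is decided on its own, so an image can receive zero, one,
    or several labels. No object locations are involved.
    """
    return [name for name, z in zip(CLASSES, logits) if sigmoid(z) > threshold]


# Made-up logits: one score per class for the image as a whole.
eagle_logits = [3.1, 2.4, -4.0, -3.5, -5.2]
crash_logits = [-3.0, -4.0, 2.8, 1.9, 0.7]

print(predict_labels(eagle_logits))  # ['bird', 'eagle']
print(predict_labels(crash_logits))  # ['car', 'accident', 'police']
```

Note that there are no bounding boxes anywhere: the output is just a set of labels for the whole image. That is the difference from object detection, where each label would be tied to a location.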
I hope I made it clearer :)