IS has its shortcomings, but I do agree with you that their claims need further elaboration. IS is mainly concerned with mode collapse, so your method sounds reasonable. Just be careful when comparing results across different methods: are we holding the conditions the same? I.e., we cannot have one score measured on object types with CIFAR and another measured on genres with WikiArt.
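
To make both points concrete, here is a minimal sketch of how IS is computed from class probabilities, IS = exp(E_x[KL(p(y|x) || p(y))]). The function name `inception_score` and the toy probability rows are made up for illustration; in practice the rows would come from a pretrained classifier run on generated samples.

```python
import math

def inception_score(probs):
    """Inception Score from a list of per-sample class-probability rows p(y|x).

    Hypothetical helper for illustration; the rows are assumed to be
    classifier outputs that each sum to 1.
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), averaged over all samples.
    marginal = [sum(row[j] for row in probs) / n for j in range(k)]
    kl_sum = 0.0
    for row in probs:
        for p, m in zip(row, marginal):
            if p > 0:  # terms with p == 0 contribute nothing to the KL sum
                kl_sum += p * math.log(p / m)
    return math.exp(kl_sum / n)

# Toy data: confident predictions spread over 3 classes vs. all in one class.
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 0.0, 0.0]] * 3

print(inception_score(diverse))    # close to 3, the number of classes
print(inception_score(collapsed))  # 1.0, the minimum
```

The collapsed case drops to the minimum of 1, which is why IS picks up mode collapse. The maximum equals the number of classes in the classifier's label set, so scores obtained with different label sets (object types vs. genres) sit on different scales and should not be compared directly.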