Thank you, Olly, for the quick and detailed response. I suspected that this might be related to the key-frames, which unfortunately I can’t control.
The reason I thought it should be simple is that a friend of mine achieved the desired effect on iOS using just an AVComposition class that is available out of the box there (and which, at a glance, looks quite similar to ExoPlayer’s MediaSource composition).
That’s why I assumed this would be one of the common use cases for ConcatenatingMediaSource: it knows which MediaSource plays next, so it can (probably) prepare that source for instant playback in advance. In the case of ClippingMediaSource that would mean, as you’ve said, processing the frames between the key-frame and the desired starting point.
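
To check that I understand the cost correctly, here is a rough, self-contained Java sketch of the pre-roll involved. It is not ExoPlayer code: the class `ClipPreRoll`, its methods, and the sample timestamps are hypothetical names and values I made up for illustration, and it assumes the key-frame timestamps are known and sorted in ascending order.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of ExoPlayer: estimates the pre-roll for a clipped source. */
public final class ClipPreRoll {

    /** Returns the latest key-frame timestamp at or before clipStartUs, or -1 if there is none. */
    static long precedingKeyFrameUs(long[] sortedKeyFramesUs, long clipStartUs) {
        int idx = Arrays.binarySearch(sortedKeyFramesUs, clipStartUs);
        if (idx >= 0) {
            return sortedKeyFramesUs[idx]; // the clip starts exactly on a key-frame
        }
        int insertionPoint = -idx - 1; // index of the first key-frame after clipStartUs
        return insertionPoint == 0 ? -1 : sortedKeyFramesUs[insertionPoint - 1];
    }

    /** Duration of media that must be decoded and discarded before the first visible frame. */
    static long preRollUs(long[] sortedKeyFramesUs, long clipStartUs) {
        long keyFrameUs = precedingKeyFrameUs(sortedKeyFramesUs, clipStartUs);
        if (keyFrameUs < 0) {
            throw new IllegalArgumentException("No key-frame at or before the clip start");
        }
        return clipStartUs - keyFrameUs;
    }

    public static void main(String[] args) {
        long[] keyFramesUs = {0L, 2_000_000L, 4_000_000L}; // assumed: one key-frame every 2 s
        System.out.println(preRollUs(keyFramesUs, 3_500_000L)); // prints 1500000
    }
}
```

If this model is right, a clip starting at 3.5 s in a stream with key-frames every 2 s needs 1.5 s of frames decoded ahead of time, which is the work I was hoping could happen while the previous source is still playing.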
Unfortunately, my knowledge of lower-level media playback is not that extensive, so I’d be very grateful for any ideas or tips on how this can be achieved. Once again, thank you very much for your help.