One orthogonal approach, which I would think complies with the T&C, and certainly with ethics, would be to first analyze the timing of deletes.
As an initial hypothesis: a tweet deleted within 10 minutes was likely a mistake the tweeter herself discovered. One deleted after more than an hour but less than a few days was likely removed because of adverse reaction to it. One deleted after more than a week was likely removed to clean house.
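To make the hypothesis concrete, here is a minimal sketch that buckets a publish-to-delete delay into those three categories. The function name, the labels, and the reading of "a few days" as three days are my own assumptions; delays that fall in the gaps the hypothesis leaves open (10 minutes to an hour, a few days to a week) are returned as unclassified rather than guessed at.

```python
from datetime import timedelta

# Assumption: "a few days" is taken to mean three days; adjust as needed.
FEW_DAYS = timedelta(days=3)


def classify_delay(delay):
    """Map a publish-to-delete delay (a timedelta) to a hypothesised motive."""
    if delay <= timedelta(minutes=10):
        return "self-correction"    # mistake the tweeter noticed herself
    if timedelta(hours=1) < delay < FEW_DAYS:
        return "adverse reaction"   # removed after pushback
    if delay > timedelta(weeks=1):
        return "housecleaning"      # tidying up old tweets
    return "unclassified"           # gaps the hypothesis does not cover


print(classify_delay(timedelta(minutes=5)))
print(classify_delay(timedelta(hours=30)))
```

The thresholds are only a starting point; the actual distribution of delays should tell whether these cut-offs make sense.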
In any case, for a large collection of tweets, find the number of deleted tweets, the time from publishing to deletion, and other characteristics: “deletion profiles”. Are there “super-deleters”? Are deletions clustered or random in time?
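A rough sketch of what such a deletion profile could look like per user, using only the standard library. The record layout `(user, published_at, deleted_at)`, the function names, and the super-deleter thresholds are all hypothetical. For the clustering question I use the coefficient of variation of the gaps between a user's consecutive deletions: for deletions arriving at random (a Poisson process) it is about 1, values well above 1 suggest bursts, and values near 0 suggest a regular schedule.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, median, pstdev


def gap_cv(times):
    """Coefficient of variation of gaps between consecutive deletion times."""
    if len(times) < 3:
        return None  # fewer than two gaps: nothing to measure
    ordered = sorted(times)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    avg = mean(gaps)
    return pstdev(gaps) / avg if avg > 0 else None


def deletion_profiles(records):
    """records: iterable of (user, published_at, deleted_at or None)."""
    raw = defaultdict(lambda: {"tweets": 0, "delays": [], "deleted_at": []})
    for user, published_at, deleted_at in records:
        entry = raw[user]
        entry["tweets"] += 1
        if deleted_at is not None:
            entry["delays"].append((deleted_at - published_at).total_seconds())
            entry["deleted_at"].append(deleted_at)
    profiles = {}
    for user, entry in raw.items():
        deleted = len(entry["delays"])
        profiles[user] = {
            "tweets": entry["tweets"],
            "deleted": deleted,
            "delete_rate": deleted / entry["tweets"],
            "median_delay_s": median(entry["delays"]) if deleted else None,
            "gap_cv": gap_cv(entry["deleted_at"]),
        }
    return profiles


def super_deleters(profiles, min_tweets=20, min_rate=0.5):
    """Users who delete at least min_rate of at least min_tweets tweets.

    Both thresholds are arbitrary placeholders, not established cut-offs.
    """
    return sorted(
        user for user, p in profiles.items()
        if p["tweets"] >= min_tweets and p["delete_rate"] >= min_rate
    )
```

The same per-user numbers can be pooled over the whole collection to get the overall picture.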
This would give a baseline for all deletions. Now do the same for contentious tweets, for example this article's “false flag” + “shooter”. Is there any difference in pattern? How prevalent are deletes?
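The comparison could start as simply as the sketch below, which computes the share of deleted tweets for the whole collection (the baseline) and for the contentious subset. The `(text, was_deleted)` layout and the function names are hypothetical, and I am assuming that “false flag” + “shooter” means both terms must appear in the tweet.

```python
def is_contentious(text, terms=("false flag", "shooter")):
    """True if every term occurs in the text (case-insensitive).

    Assumption: the "+" in the query means both terms are required.
    """
    lowered = text.lower()
    return all(term in lowered for term in terms)


def delete_rates(tweets):
    """tweets: iterable of (text, was_deleted). Returns delete rate per group.

    "baseline" covers every tweet; "contentious" is the matching subset.
    """
    counts = {"baseline": [0, 0], "contentious": [0, 0]}  # [deleted, total]
    for text, was_deleted in tweets:
        groups = ["baseline"]
        if is_contentious(text):
            groups.append("contentious")
        for group in groups:
            counts[group][0] += int(was_deleted)
            counts[group][1] += 1
    return {
        group: (deleted / total if total else None)
        for group, (deleted, total) in counts.items()
    }
```

A difference in rates alone is not the whole story; the delay buckets and per-user profiles should be compared between the two groups as well.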
That should show whether deletes are used differently for such tweets and, more generally, whether deletes are likely to affect the outcome of studies, and by how much.