I did examine the distribution of Rotten Tomatoes's audience scores, but I found no reason to use it as a reference to control for the tomatometer variable.
This is what the distribution of audience scores looks like:
However, you did indeed raise a serious issue about the randomization process.
I thought, and still think, that each tomatometer rating, especially for recent movies with few reviews, faces a potential randomization problem, with an inherent danger of a skewed, non-representative average.
But I reasoned that this was their problem, not mine. I collected the data a user can see and easily access on each website. If the ratings are averages of non-randomized samples, that should somehow be reflected in the distributions.
Those at Metacritic face the same problem with the metascore. Perhaps that's the main reason they apply a weighting coefficient to each rating derived from a review: to compensate for the lack of randomization.
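Just to illustrate the idea (the scores and weights below are invented, since Metacritic doesn't publish its actual coefficients), a weighted average lets some publications count more than others, which can pull the result away from the plain mean:

```python
# Hypothetical critic reviews: (score out of 100, weight given to the publication).
# Both the scores and the weights here are made-up examples.
reviews = [
    (90, 1.5),
    (70, 1.0),
    (40, 0.5),
]

weighted = sum(score * w for score, w in reviews) / sum(w for _, w in reviews)
unweighted = sum(score for score, _ in reviews) / len(reviews)

print(round(weighted, 1))    # 75.0
print(round(unweighted, 1))  # 66.7
```

With these made-up numbers, the heavily weighted favorable review pushes the weighted score well above the simple average.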
For the tomatometer, they don't do any weighting; in fact, it isn't even a classical rating, as I explained in the article. Perhaps that's why the distribution of the tomatometer took such an unexpected shape.
My only concern was the randomization and representativeness of the sample I worked with. And the distribution of the 4,917-movie sample for IMDb gave me good reason to consider my working sample representative.
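The kind of check I mean can be sketched like this, with synthetic data standing in for the real ratings (I'm not reproducing the actual IMDb sample here): if a sample of roughly that size is drawn without bias, its binned score proportions should stay very close to the population's.

```python
import random

random.seed(0)

# Synthetic stand-in for a population of IMDb-style ratings on a 1-10 scale.
population = [min(10.0, max(1.0, random.gauss(6.5, 1.0))) for _ in range(50_000)]
sample = random.sample(population, 4_917)

def histogram(values, bins=9, lo=1.0, hi=10.0):
    """Return the proportion of values falling into each equal-width bin."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, int((v - lo) / width))
        counts[idx] += 1
    return [c / len(values) for c in counts]

# If the sample is representative, each bin's proportion should be close
# to the population's; a large gap in any bin would suggest skew.
max_diff = max(abs(p - s) for p, s in zip(histogram(population), histogram(sample)))
print(max_diff < 0.05)
```

A random sample of ~5,000 from this population keeps every bin within a couple of percentage points of the population proportions, which is the sort of agreement that made me comfortable with the working sample.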
P.S. Thanks for taking the time to write a response!