Alex Olteanu
May 18, 2017 · 1 min read

I really like the mathematical modelling you did to transform IMDB’s distribution.

However, I don’t agree with applying it to this particular case.

The main reason is that, to compare the variables fairly, you'd have to apply the same transformation to the others as well, and this would introduce skew into distributions that are otherwise normal. For instance, Metacritic's would become left-skewed.

As a side note on your mathematical modelling, the greatest value on the x-axis is 8.246 because the axis was automatically adjusted to the range of the x values. If you set the interval to [0, 10] yourself, you will notice that the distribution becomes slightly skewed to the left. If you don't adjust the x-axis limits yourself, you can easily get what looks like a normal distribution even for the Fandango variable, simply because the software or library you use sets the limits to the variable's extreme values.
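To make the axis point concrete, here is a minimal sketch in Python. The ratings are randomly generated for illustration and are not the data from the article, and `relative_position` and `plot_both` are hypothetical helper names. It only shows where the bulk of the data sits on the axis under each choice of limits; it is not a measure of skewness.

```python
import random
import statistics


def relative_position(value, lo, hi):
    """Return where `value` falls within [lo, hi] as a fraction from 0 to 1."""
    return (value - lo) / (hi - lo)


# Hypothetical ratings on a 0-10 scale (NOT the article's data):
# bell-shaped around 6.4 and clipped to the scale.
rng = random.Random(0)
ratings = [min(10.0, max(0.0, rng.gauss(6.4, 0.9))) for _ in range(1000)]

center = statistics.median(ratings)

# Limits a plotting library typically picks by default: roughly the data's extremes.
auto_pos = relative_position(center, min(ratings), max(ratings))

# Limits fixed to the full rating scale.
full_pos = relative_position(center, 0.0, 10.0)

print(f"median sits at {auto_pos:.2f} of the auto-scaled axis")
print(f"median sits at {full_pos:.2f} of the [0, 10] axis")


def plot_both(values):
    """Draw the same histogram with default and with fixed x-axis limits."""
    # Imported here so the numbers above can be computed without a plotting library.
    import matplotlib.pyplot as plt

    fig, (ax_auto, ax_full) = plt.subplots(1, 2, figsize=(10, 4))
    ax_auto.hist(values, bins=20)
    ax_auto.set_title("default x-axis limits")
    ax_full.hist(values, bins=20)
    ax_full.set_xlim(0, 10)  # show the whole rating scale
    ax_full.set_title("x-axis fixed to [0, 10]")
    plt.show()
```

Calling `plot_both(ratings)` draws the two histograms side by side, assuming matplotlib is installed. With the default limits the peak lands near the middle of the plot; with the limits fixed to [0, 10] the same peak sits to the right of the middle, with empty space on the left.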

Regarding Metacritic's extreme values, the Metacritic team sets a minimum number of reviews. They also weight each review's rating according to factors like the review's quality. So I wouldn't attribute the extreme values to a potentially low number of ratings.

Thanks for letting me know what you think! And sorry for the late reply!
