This is a good overview, but there are a lot of generalizations. I’m currently running a cluster with over 1B documents on only 10 data nodes, and it’s snappy. According to your back-of-napkin calculation, I’d need 200 data nodes.
Also, I tend to double up my master nodes as request (coordinating) nodes. Having 3 master nodes that do nothing but sit around waiting for another master to die seems wasteful, though that depends on your risk tolerance vs. cost sensitivity.
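For anyone curious, here's a minimal sketch of that layout, assuming Elasticsearch 7.9+ where `node.roles` replaced the older `node.master`/`node.data` booleans. Any node will coordinate the requests sent to it regardless of its roles, so pointing clients at the master-eligible nodes makes them serve double duty:

```yaml
# elasticsearch.yml on each of the three master-eligible nodes.
# No "data" role, so they hold no shards, but every node still
# acts as a coordinating node for requests it receives.
node.roles: [ master ]

# elasticsearch.yml on the data nodes: shard storage only,
# not eligible for master election.
node.roles: [ data ]
```

On pre-7.9 clusters the equivalent would be `node.master: true` / `node.data: false` on the masters and the inverse on the data nodes.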