What do AB tests actually measure?
adam kelleher

I think you’re missing one more layer of self-selection that can muddy the waters even more (because obviously we don’t have enough trouble =D).

You kinda touched on it with the active/power user bias, but beyond just timing, there’s also the fact that most users don’t use all the features of a product, so they select themselves into using only some parts. That’s fine if you’re focusing on that group, but it can be an issue if you’re trying to broaden the user base.

If it’s some semi-obscured feature, this quirky subset of its users is also often motivated enough to learn painful UX workarounds, which makes them poor candidates for generalizing to broader populations.
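To make that concrete, here’s a toy simulation. It’s purely a sketch with made-up assumptions, not anything from real data: each user gets a hypothetical `motivation` score, more motivated users are more likely to find the feature, and a UX fix helps motivated users less because they’ve already learned the workarounds. The function name, the adoption rule, and the effect formula are all invented for illustration.

```python
import random


def simulate(n_users=100_000, seed=0):
    """Compare the average effect of a UX fix among self-selected feature
    users against the average effect across the whole user base."""
    rng = random.Random(seed)
    adopter_effects = []
    all_effects = []
    for _ in range(n_users):
        # Hypothetical trait in [0, 1]: how motivated this user is.
        motivation = rng.random()
        # Assumption: motivated users are more likely to discover
        # and use the semi-obscured feature.
        uses_feature = rng.random() < motivation ** 2
        # Assumption: the fix helps motivated users less, since they
        # have already learned the painful workarounds.
        effect = 1.0 - motivation
        all_effects.append(effect)
        if uses_feature:
            adopter_effects.append(effect)
    adopter_avg = sum(adopter_effects) / len(adopter_effects)
    population_avg = sum(all_effects) / len(all_effects)
    return adopter_avg, population_avg


adopter_avg, population_avg = simulate()
print(f"avg effect among current feature users: {adopter_avg:.3f}")
print(f"avg effect across all users:            {population_avg:.3f}")
```

Under these assumptions, a test run only on people already using the feature would understate how much the fix helps everyone else. Flip the assumptions and the bias could just as easily go the other way, which is kind of the point.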

