Data denormalization is broken
Liron Shapira
28621

Your motivating query should have good performance with the usual FK indices, an index on message on (room_id, timestamp), and a correlated EXISTS subquery (instead of computing the max). In SQL:

select count(1) from room_user ru where ru.user_id = <user_id> and exists (select 1 from message m where m.room_id = ru.room_id and m.timestamp > ru.last_seen_timestamp)

Then you just have a handful of index lookups against room_user and message, which any database should be able to handle, even at scale. I don’t know ReQL but I imagine it can do the same thing.

Now, I realize you’re only using this particular query as an example. But it’s not a good example. Rather than motivating the necessity of denormalization, it highlights the importance of understanding query plans and query optimization.

The idea of automatic denormalization is interesting. But in practice, in most ordinary applications, in most cases, you just need a better query. Denormalizations are kind of a big deal, as you rightfully point out, and should be approached with care.