Thanks for the post!
Jose Armesto
1

Hey Jose,

Yes your understanding is correct. The processor does get invoked periodically (or when new events are added to the queue) and then retrieves items and sets them to claimed.

The race condition we used to handle by using a ‘select for update’ statement, which locks those rows and then causes a re-read on other processors if the first processor has updated the row as claimed. This didn’t work well in practice though, since it could leave a lot of rows locked in case the processor crashed, until the DB session timed out.

So we switched to a more optimistic locking approach, where both processors will read unclaimed items from the table. Then when they perform the ‘update’ statement to mark a row as claimed, that update statement includes another check with ‘claimed = false’. If this update returns that 0 rows where updated (meaning the item had already been claimed by another processor), then the current processor will not process it as well. It does mean we end up with some ‘empty’ reads from time to time but in practice this works better than table row locks.

If the processor fails, then we have a background scheduled task, that will cleanup failed items (old claimed items) from the queue from time to time and log errors.

Cheers,
Andreas