How does Candid differentiate between jokes or terms of endearment and genuine threats?

For example, I might call my friend “my main bitch”, but just because the word “bitch” appears in that comment string does not mean it should be blocked.

This sounds like a recipe for disaster, as it requires understanding the context of the conversation, which at best only a human moderator can really understand. There are all sorts of cultural, country-specific, and group-specific terms that are not necessarily hate speech, but playful expressions within the group.
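
To make the concern concrete, here is a rough sketch of the kind of naive keyword filter I'm worried about. I have no idea whether Candid actually works this way; the blocklist, the function name, and the sample comments are all made up for illustration.

```python
# Hypothetical blocklist and filter, for illustration only.
# This is NOT Candid's actual implementation.
BLOCKLIST = {"bitch"}


def naive_is_blocked(comment: str) -> bool:
    """Flag a comment if any blocklisted word appears anywhere in it."""
    lowered = comment.lower()
    return any(word in lowered for word in BLOCKLIST)


# Two made-up comments: one playful, one hostile.
friendly = "You're my main bitch, see you tonight!"
hostile = "Shut up, bitch, or you'll regret it."

# Both comments contain the blocklisted word, so the filter
# treats them identically even though the intent differs.
print(naive_is_blocked(friendly))
print(naive_is_blocked(hostile))
```

A filter like this sees the same word in both comments and has nothing else to go on, which is exactly why the tone and the relationship between the people talking matter.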

