Hi Laurae, thanks for this very interesting post.
Iván Gómez Villafañe

LightGBM's way of dealing with categoricals is not described here because it would require only one split to get maximum accuracy: in this example, each value of the feature maps to either label 0 or label 1, never both.

By accumulating the gradients per category and sorting the categories on them, you can find the exact set of categories that best minimizes the loss, without brute-forcing the combinations (n*(n-1) complexity).
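To make the idea concrete, here is a rough sketch of that sort-then-scan search. It is not LightGBM's actual code: the function name `best_categorical_split`, the `reg_lambda` smoothing term, and the toy data are all assumptions made for illustration. The gradients and hessians are those of log loss at an initial prediction of 0.5.

```python
from collections import defaultdict


def best_categorical_split(categories, gradients, hessians, reg_lambda=1.0):
    """Hypothetical helper: find the best two-way partition of categories.

    Returns (set of categories sent left, gain). With fewer than two
    distinct categories there is nothing to split and (None, -inf) is returned.
    """
    # Accumulate gradient and hessian sums per category.
    grad_sum = defaultdict(float)
    hess_sum = defaultdict(float)
    for c, g, h in zip(categories, gradients, hessians):
        grad_sum[c] += g
        hess_sum[c] += h

    # Sort categories by their smoothed gradient/hessian ratio.
    ordered = sorted(grad_sum, key=lambda c: grad_sum[c] / (hess_sum[c] + reg_lambda))

    total_g = sum(grad_sum.values())
    total_h = sum(hess_sum.values())

    def score(g, h):
        return g * g / (h + reg_lambda)

    # Scan the sorted order once: only n - 1 candidate splits are evaluated.
    best_gain, best_left = float("-inf"), None
    left_g = left_h = 0.0
    for i, c in enumerate(ordered[:-1]):
        left_g += grad_sum[c]
        left_h += hess_sum[c]
        gain = (score(left_g, left_h)
                + score(total_g - left_g, total_h - left_h)
                - score(total_g, total_h))
        if gain > best_gain:
            best_gain, best_left = gain, set(ordered[: i + 1])
    return best_left, best_gain


# Toy data: every category maps to exactly one label (A, C -> 1; B, D -> 0).
cats = ["A", "B", "C", "D", "A", "B", "C", "D"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
grads = [0.5 - y for y in labels]   # log-loss gradient p - y with p = 0.5
hess = [0.25] * len(labels)         # log-loss hessian p * (1 - p)

left, gain = best_categorical_split(cats, grads, hess)
print(left, gain)
```

On this toy data a single split separates the categories with label 1 from those with label 0, which is the "one split to get maximum accuracy" case mentioned above.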

If you rank cardinalities, then you also get maximum accuracy in this example.