Cybersecurity Debasing
In cybersecurity, we often need to detect sequences of characters or bytes. File formats vary, and the data might be encoded as hex, binary, or Base64. There are many bases we can use, and each is defined by the number of characters in its alphabet:
Base2 [01]
Base3 [012]
Base5 [01234]
Base10 [0123456789]
Base26 [A-Z]
Base32 [A-Z2-7=]
Base45 [0-9A-Z $%*+-./:]
Base58 (bitcoin) [1-9A-HJ-NP-Za-km-z]
Base62 [0-9A-Za-z]
Base64 [A-Za-z0-9+/=]
Base67 [A-Za-z0-9-.!~_]
Base85 (Ascii85) [!"#$%&'()*+,-./0-9:;<=>?@A-Z[\]^_`a-u]
Base91 [A-Za-z0-9!#$%&()*+,./:;<=>?@[]^_`{|}~"]
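As a quick sanity check on the list above, here is a minimal Python sketch that writes out a few of the alphabets in full and counts their distinct characters. The `ALPHABETS` dictionary and its layout are just for illustration; the "=" shown for Base32 and Base64 is padding rather than a data symbol, so it is left out of the count. Base67, Base85 and Base91 are omitted to keep the sketch short.

```python
import string

# A few of the alphabets from the list, written out in full.
# The '=' padding character is not a data symbol, so it is excluded.
ALPHABETS = {
    "Base2": "01",
    "Base10": string.digits,
    "Base26": string.ascii_uppercase,
    "Base32": string.ascii_uppercase + "234567",
    "Base45": string.digits + string.ascii_uppercase + " $%*+-./:",
    "Base58": "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz",
    "Base62": string.digits + string.ascii_letters,
    "Base64": string.ascii_letters + string.digits + "+/",
}


def alphabet_size(name):
    """Return the number of distinct characters in the named alphabet."""
    return len(set(ALPHABETS[name]))


if __name__ == "__main__":
    for name in ALPHABETS:
        expected = int(name[len("Base"):])  # the number in the name
        print(name, alphabet_size(name), alphabet_size(name) == expected)
```

In each case, the number in the name should match the number of distinct symbols in the alphabet.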
Base58 is interesting and is used in Bitcoin addresses. Its character set is “[1-9A-HJ-NP-Za-km-z]”, which leaves out characters that can easily be mistaken for one another: there is no “0” (zero), no “I” (capital I), no “O” (capital O), and no “l” (lowercase L).
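To make the Base58 character set concrete, the sketch below derives it by taking the digits and letters and removing the four ambiguous characters. The names `AMBIGUOUS`, `BASE58` and `is_base58` are assumptions made for this example, and the sample strings are made up rather than real addresses.

```python
import string

# The four characters Base58 drops because they are easily confused.
AMBIGUOUS = "0OIl"

# Digits, then upper case, then lower case, minus the ambiguous characters.
BASE58 = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in AMBIGUOUS
)


def is_base58(text):
    """True if text is non-empty and uses only Base58 characters."""
    return bool(text) and all(c in BASE58 for c in text)


if __name__ == "__main__":
    print(len(BASE58))                  # size of the derived alphabet
    print(is_base58("3mJr7AoUXx2Wqd"))  # made-up sample, valid characters only
    print(is_base58("10OIl"))           # contains the excluded characters
```

Passing this check only means a string could be Base58. A string of digits 1-9, for example, would also pass, so the character set alone cannot confirm the encoding.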
So, let’s try to automatically detect a few: