At the company where I currently work, we deal with many external APIs. This short story is about an API integration that sometimes didn’t work because of a single invisible character. I won’t name the company, so let’s just refer to it as the “Company” :)
In this specific case, the Company sends webhooks to our exposed and secured HTTP endpoints.
One day we started to see error logs saying that we could not verify the request signature, and that’s where our small journey starts…
Spoiler: “People still copy text from Microsoft Word with all these hidden characters…”
When we receive a webhook, we are required to verify its SHA-256 signature for security reasons. We must be sure the request was sent by the Company and that nobody changed it along the way (MITM). Pretty simple stuff that is described in the Company’s documentation, with a step-by-step explanation of how to implement it.
So far, so good.
But we started to see that we sometimes failed to verify the signature. There were not many failures, but it was an important flow, so we needed to fix it.
We did our first checks and didn’t find anything that immediately showed us the problem. Our verification code was short and simple. What could be wrong there? Especially when 98% of verifications were successful.
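The original snippet is not reproduced here, so below is only a minimal sketch of what such a check commonly looks like. It assumes an HMAC-SHA256 over the raw body with a shared secret and a hex-encoded signature; the Company’s actual scheme, the secret, and the function names `sign` and `verify` are all hypothetical.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of payload under secret.
// The HMAC construction is an assumption, not the Company's documented scheme.
func sign(secret, payload []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, payload []byte, signature string) bool {
	expected := sign(secret, payload)
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	secret := []byte("hypothetical-shared-secret")
	payload := []byte(`{"name":"John"}`)
	sig := sign(secret, payload)

	fmt.Println(verify(secret, payload, sig))                  // true
	fmt.Println(verify(secret, []byte(`{"name":"Jon"}`), sig)) // false
}
```

The important property is that the check operates on bytes: any difference in the bytes being signed, even an invisible one, produces a completely different signature.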
We decided to ask the Company’s dev support to investigate why they sometimes send us the wrong signature.
The Company’s developers did their checks and replied that they didn’t see any issues on their side, and that they had successfully verified the same data we were failing to verify.
After a few ping-pongs, we asked them to show us how they create and verify the signature for a specific payload.
They sent us a working example in Ruby, where we found a tiny difference in the payload they used. The code itself is not relevant, so I don’t include it here. What is relevant is the HTTP body payload.
And this is how the payload we had sent them looked on our side.
Our first reaction was like…
Why do they have \u2028 inside the string? We don’t have such a thing in our payload. Why did they change our example and claim it works? But maybe it was done automatically…
We decided to see how our payload looks in hex. Sublime to the rescue…
Hmmm… There is an invisible character here. A quick search told us what it was: the Unicode LINE SEPARATOR character (U+2028).
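The same inspection can be done without an editor. The sketch below dumps the UTF-8 bytes of two strings that typically look the same on screen; the sample strings and the helper `hexOf` are made up for illustration.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexOf returns the hex encoding of the UTF-8 bytes of s.
func hexOf(s string) string {
	return hex.EncodeToString([]byte(s))
}

func main() {
	clean := "ab"
	dirty := "a\u2028b" // contains an invisible LINE SEPARATOR

	fmt.Println(hexOf(clean)) // 6162
	fmt.Println(hexOf(dirty)) // 61e280a862

	// %+q prints an ASCII-only quoted form, which exposes hidden characters.
	fmt.Printf("%+q\n", dirty) // "a\u2028b"
}
```

U+2028 takes three bytes in UTF-8 (`e2 80 a8`), which is exactly the kind of thing a hex view reveals.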
But still, at the end of the day, the payload is just a bunch of bytes. If both sides apply the same operation to the same bytes, and by operation I mean generating a signature with the SHA-256 function, the result should be the same.
In our authorization middleware, we encode (stringify) the body object to get a JSON string, which is then used as the payload for the signature generation function.
It turns out that when you decode and re-encode JSON in Go, the resulting string can differ from the source string: Go automatically escapes the problematic character.
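A small reproduction of that round trip, using only the standard `encoding/json` package. The payload and the helper name `roundTrip` are invented for this sketch; our real middleware is not shown.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// roundTrip decodes raw JSON into a generic map and encodes it again,
// roughly what a middleware does when it re-stringifies a parsed body.
func roundTrip(raw []byte) ([]byte, error) {
	var v map[string]interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

func main() {
	// The body as received: the string value holds a raw U+2028 character.
	raw := []byte("{\"text\":\"Hello\u2028world\"}")

	out, err := roundTrip(raw)
	if err != nil {
		panic(err)
	}

	fmt.Printf("%+q\n", raw) // the raw separator, shown in escaped form
	fmt.Printf("%+q\n", out) // the six ASCII characters \u2028 instead
	fmt.Println(bytes.Equal(raw, out)) // false
}
```

Both byte sequences are valid JSON for the same value, but they are not the same bytes, so a signature computed over one will never match the other.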
A quick dive into Go’s sources shows where and why it is done this way. You can also see that there is another character (PARAGRAPH SEPARATOR) that gets escaped too.
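To see this behavior without reading the sources, here is a small sketch against the standard `encoding/json` encoder. It also shows that `SetEscapeHTML(false)` turns off the escaping of `<`, but not of the two separators. The helper name `encode` is made up for the example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// encode JSON-encodes s with HTML escaping switched on or off.
func encode(s string, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(s); err != nil {
		return "", err
	}
	// Encode appends a trailing newline; drop it for readability.
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

func main() {
	in := "<\u2028\u2029" // '<', LINE SEPARATOR, PARAGRAPH SEPARATOR

	for _, escapeHTML := range []bool{true, false} {
		out, err := encode(in, escapeHTML)
		if err != nil {
			panic(err)
		}
		fmt.Println(escapeHTML, out)
	}
	// true "\u003c\u2028\u2029"
	// false "<\u2028\u2029"
}
```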
It is logical to assume that the Company’s software works similarly.
We didn’t think we had found something new, and a quick search turned up an exact description of the issue. We fixed it with a simple replace.
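The exact replace we used is not shown here; one way to do it, as a sketch, is to turn the escape sequences produced by `json.Marshal` back into the raw characters before computing the signature. The helper name `unescapeSeparators` and the payload are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// unescapeSeparators reverts the \u2028 and \u2029 escapes emitted by
// encoding/json, restoring the raw UTF-8 characters.
// Caveat: this is a naive byte replace. It would also rewrite a payload
// that legitimately contains an escaped backslash followed by "u2028".
func unescapeSeparators(b []byte) []byte {
	b = bytes.ReplaceAll(b, []byte(`\u2028`), []byte("\u2028"))
	b = bytes.ReplaceAll(b, []byte(`\u2029`), []byte("\u2029"))
	return b
}

func main() {
	original := []byte("{\"text\":\"Hello\u2028world\"}")

	var v map[string]interface{}
	if err := json.Unmarshal(original, &v); err != nil {
		panic(err)
	}
	encoded, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}

	fmt.Println(bytes.Equal(original, encoded))                     // false
	fmt.Println(bytes.Equal(original, unescapeSeparators(encoded))) // true
}
```

Where the raw request body is still available, signing those bytes directly instead of a re-encoded copy would likely avoid this class of problem altogether.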
People still use Microsoft Word for writing text and then copy it into web forms. In this case, content created by a user caused failures in some flows.
There’s no one to blame in this specific case. Actually, this is a good example of our reality when working with APIs. Sometimes an API will fail, for whatever reason. You just need to be ready to deal with it.
For me, it was a tiny and “cool” bug to fix that I really enjoyed in these annoying COVID-19 lockdown days :)