Best Practices to Mitigate JSON Interoperability Vulnerabilities

Claudio Salazar
Published in The Startup
Mar 1, 2021

“An Exploration of JSON Interoperability Vulnerabilities” by Jake Miller was published last week. It’s interesting research about differences among JSON libraries that can lead to logic bugs, and it puts this kind of vulnerability on the map for threat analysis. In this post I’ll analyze the examples from the original post and try to mitigate the vulnerabilities from a secure development perspective.

We’re going to start by analyzing the first lab, “Validate-Proxy Pattern”, from the original post. The code can be found here:

It’s a somewhat contrived example because it relies on a malformed JSON dictionary with duplicate keys being sent to a Python endpoint: the payload is processed in Python but, strangely, the original payload is also forwarded to a Go endpoint. The malformed JSON dictionary is not processed in the same way by the Python and Go libraries, and the vulnerability lies in that difference. Using the original raw payload is key to making the vulnerability work, and the author states:

From a developer’s perspective, why would you waste computation by re-serializing a JSON object that you just parsed and validated when the string input is readily available? This assumption should be sound.

I disagree with that statement (I’m not the only one), for the following reasons:

  1. The Python endpoint could receive parameters that only make sense for this application and there’s no reason to pass them to secondary services (the Go endpoint).
  2. If I do some sanitization and/or validation on my data, I continue the logic with the validated data and usually discard the initial data as a source of truth.
  3. Passing raw user input to secondary services could lead to new scenarios like the user trying to fuzz parameters that trigger some different behavior in the internal microservices.
  4. In terms of performance, dumping a small dictionary doesn’t seem like an expensive operation, so we can favor the more secure logic. If performance issues appear, an application in production should be under monitoring so we can profile where the expensive logic is. A performance issue can be fixed; a security breach cannot.

To fix the vulnerability, we have to use the parsed data (at line 45) instead of the original request body in the subsequent request, since `data` will contain only one entry for `qty`.
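The idea behind the fix can be sketched with the standard library alone: once the body is parsed, duplicate keys have already collapsed, so forwarding the re-serialized dict (instead of the raw body) removes the ambiguity for every downstream service.

```python
import json

raw = '{"qty": 1, "qty": -1}'
parsed = json.loads(raw)      # Python's json keeps the LAST duplicate key
print(parsed)                 # {'qty': -1}

# Re-serializing the parsed dict yields clean JSON with no duplicate keys,
# so the Go endpoint sees exactly the value that Python validated:
print(json.dumps(parsed))     # {"qty": -1}
```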

As I mentioned above, it’s a somewhat contrived example, but it may happen. We know how to fix it, but we should build stronger foundations to avoid this kind of scenario in the future. Let’s start with the best practices.

Use a more comprehensive starting point

In my opinion, these days communication between a backend and its consumers passes through Swagger (OpenAPI), so a good starting point is a framework with built-in Swagger support. I suggest you take a look at the FastAPI project.

Returning to the target application: in the Flask world, some years ago I used flask-restplus, but for this example I’m going to use flask-smorest since it uses marshmallow.

marshmallow is a library that will help us with our next step: data validation.

Make data validation maintainable

The purpose of the schema definition (at line 12) is to validate the user input. As more attributes need to be defined, it becomes lengthy and difficult to maintain.

Additionally, there are some JSON Schema quirks, like the acceptance by default of attributes not declared in the schema. For instance, the following input would be valid under the current schema definition.

In the current application, the issue with accepting custom attributes is that an attacker would be able to fuzz parameters used by the Go endpoint. In general, the good practice is to accept only what is expected (another good lesson comes from prototype pollution), so let’s take that path.
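For completeness, JSON Schema itself can enforce this: setting `"additionalProperties": false` makes undeclared keys invalid. A sketch of what a tightened schema could look like as a Python dict (the exact field layout is my assumption, not the lab’s verbatim schema):

```python
# Hypothetical reconstruction of the lab's schema, tightened with
# "additionalProperties": False so undeclared keys are rejected.
schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "number"},
        "cart": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "id": {"type": "number"},
                    "qty": {"type": "number"},
                },
                "additionalProperties": False,
            },
        },
    },
    "required": ["order_id", "cart"],
    "additionalProperties": False,
}
```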

We’re going to discard JSON Schema for data validation and instead use marshmallow (there are similar libraries, such as Pydantic). With marshmallow we can define the expected data in a Pythonic way, which is more maintainable than a dictionary. There are a lot of additional features that you can discover in its documentation.

Since I already introduced flask-smorest and marshmallow, we can rewrite the vulnerable application in a more robust way:

At line 18, the previous schema is replaced with a marshmallow schema; it supports nested attributes, so you can reuse schemas in a clear and easy way. At line 28, the schema is used to validate the user input. After validation, the validated data is available in the `data` argument at line 30.

Let’s hit this endpoint with different inputs. The first one will be the one proposed in the original post:

The endpoint replies ok and the variable data has the following value:

```
{'cart': [{'id': 1.0, 'qty': 1.0}], 'order_id': 1.0}
```

`data` is a Python dictionary with validated data, so the developer only has to care about developing the logic of the endpoint itself. It also makes them less prone to reach for other sources of data (like the raw request body), since everything needed for this endpoint is in the `data` variable.

Let’s try adding an additional parameter called `not_declared_param` to our JSON payload. When we send this new payload, the endpoint replies with a 422 code and the following response:

Not bad: the endpoint rejects the request since it contains an undeclared attribute. This behavior is enforced by default by marshmallow 👏

Pass down only what is expected

It wasn’t considered in our rewrite, but it will be addressed here. If you’re sending a request to some endpoint, you know what data that endpoint expects, so based on that information you can also use a schema to define the data to send. The snippet below shows the general idea and how useful it is when you’re working with a lot of fields and want to send just a subset of them (in this case, name and surname).

The next lab, “Validate-Store Pattern”, attacks the handling of special characters in different JSON libraries.

As far as I can see, both character and comment truncation in key names take place before any data validation, since they’re handled by the JSON library at the parsing phase, so there’s nothing to do on our side.

However, a marshmallow schema will be useful to stop passing down dangerous input. All the invalid keys that could cause issues later won’t be accepted.

The main subject of the second lab is that you can introduce a malformed role value that triggers the vulnerability when it’s parsed by a JSON library (ujson) with different parsing logic. The scenario expects that role=superadmin\ud888 will be parsed as role=superadmin by ujson.

To address this situation, we will whitelist the role value to accept only ASCII letters and digits. This schema should be imposed on the role creation endpoint so that only harmless roles can be created.

As seen in the code, it’s easy to reuse the validation logic across schemas. It’s important to know the meaning of our data and be able to narrow down the choices to a safe input.

Regarding “Float and integer representation”, it’s interesting (and disturbing) how differently the libraries can interpret some high values. To address that, you should combine two things:

  1. As in the previous snippet, do some number range validation. If you don’t expect to sell more than 100 items in an order (e.g. because of logistics), you can add a validation that the number of items must be between 1 and 100.
  2. Add a test suite that exercises these high values.

Final words

Even if the examples seem a bit contrived and not so real-world, I can give another example that’s a bit more realistic given the event-driven nature of today’s architectures:

You are using Filebeat to send logs to a Kafka topic, and this topic is subscribed to by two consumers written in two different languages. These consumer applications parse part of the input as JSON, and the vulnerability could trigger.

From my point of view, this research is something to take seriously. In order to mitigate the risk, you should:

  1. Create an inventory of the JSON libraries used in your stack. Test the corner cases and try to choose libraries that share the same behavior.
  2. Use schema validation: even if two JSON libraries have different parsing logic for determining a key, the values should still be valid. Find a way to share this validation definition among microservices if they receive data independently.
  3. Be supported by a comprehensive test suite: not only for the numbers case, but for every case reviewed here. Following that advice, you can ensure that the current implementation works according to your security premises. If in the future a developer wants to use another JSON library, your test suite will warn you about differences in parsing logic that could affect your application’s security (I touched on that point in my presentation about secure software development).

JSON Schema is useful; in the past I used it to draw custom frontend UIs based on backend logic and to share validation logic between both. However, for data validation, use a more maintainable solution closer to the programming language. If you need to support a workflow like the one mentioned at the beginning, you can declare your schemas with marshmallow and export them as JSON Schema using marshmallow-jsonschema.

If I’m missing another important point to take into account, please share it in the comments. Thanks for reading!