JSON vs YAML vs XML — when to use what?

Kasturi Kugathas
Black Book for Data
4 min read · Apr 6, 2022

A practical analysis of the three file formats and where they fit in

Photo by Shahadat Rahman on Unsplash

XML, JSON, YAML: can it get any better? It’s fantastic, isn’t it, this luxury of choosing among the options we have? Hmm... not quite. To be honest, it’s a pain. Before getting to that, let’s take a quick look at what these formats are.

XML

First drafted in 1996 and published as a W3C Recommendation in 1998, XML (Extensible Markup Language) began life as a markup language. According to Wikipedia, it is a format “for storing, transmitting, and reconstructing arbitrary data”. It remains one of the most widely used formats today thanks to its variety of uses and its rich schema definitions. XML made message transmission over the Internet practical, and protocols such as SOAP and XMPP were built on it. In later years its popularity picked up further, with Java providing built-in support and products developed in Java adopting XML as their configuration format.

<Properties>
  <Property>
    <Name>DataInPath</Name>
    <Value>data/input</Value>
  </Property>
  <Property>
    <Name>DataOutPath</Name>
    <Value>data/output</Value>
  </Property>
</Properties>

JSON

JavaScript Object Notation (JSON) came into the picture as a notation for transmitting data between web applications and servers. Although most programming languages support JSON, it’s safe to say that Python played a major role in increasing its adoption. The ease of converting JSON to a native Python dictionary is indeed a gift to cherish.

{
  "properties": [
    {
      "name": "DataInPath",
      "value": "data/input"
    },
    {
      "name": "DataOutPath",
      "value": "data/output"
    }
  ]
}
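To illustrate how direct that conversion is, here is a minimal sketch using Python's standard `json` module. The variable names `raw`, `config`, and `paths` are illustrative assumptions, not part of any particular project.

```python
import json

# The same document as above, held in a string (the name `raw` is illustrative).
raw = """
{
  "properties": [
    {"name": "DataInPath", "value": "data/input"},
    {"name": "DataOutPath", "value": "data/output"}
  ]
}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
config = json.loads(raw)

# Flatten the list of name/value pairs into a plain lookup table.
paths = {prop["name"]: prop["value"] for prop in config["properties"]}
print(paths["DataInPath"])  # data/input
```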

YAML

Having been around for about 20 years, YAML, formerly Yet Another Markup Language and now YAML Ain’t Markup Language, took its own sweet time to become familiar in the tech world. These days, though, it has become a coffee-break favourite. It’s quite easy to whip up a YAML file, and for those familiar with indentation in Python, YAML is a walk in the park.

---
properties:
  - name: DataInPath
    value: data/input
  - name: DataOutPath
    value: data/output

Now, back to the pain I mentioned at the beginning: yes, all three formats serve as config files, as data exchange files, and as serialisation formats. So how do you decide which one to use for what?

As a config file

YAML and XML work best as config files. Both let you add comments, which are crucial in config files. XML also offers schema validation, which ensures that the config always follows the intended structure.

JSON can also be used as a config file, and many projects still use it that way. The main drawback is that standard JSON has no comment syntax. Instead, you have to store each comment as an extra element sitting alongside the actual config.

Many large-scale systems still use XML for configuration in their back end, including AWS, Informatica, Hadoop, and Oracle systems.

Comments in XML

<Properties>
  <!-- This property defines the input path -->
  <Property>
    <Name>DataInPath</Name>
    <Value>data/input</Value>
  </Property>
  <!-- This property defines the output path -->
  <Property>
    <Name>DataOutPath</Name>
    <Value>data/output</Value>
  </Property>
</Properties>
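As a rough sketch of what this means for the program reading the config: Python's standard `xml.etree.ElementTree` parser skips comments by default, so they serve human readers without affecting the loaded values. The variable names `xml_text` and `xml_paths` are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

xml_text = """
<Properties>
  <!-- This property defines the input path -->
  <Property>
    <Name>DataInPath</Name>
    <Value>data/input</Value>
  </Property>
  <!-- This property defines the output path -->
  <Property>
    <Name>DataOutPath</Name>
    <Value>data/output</Value>
  </Property>
</Properties>
"""

# The default parser discards comments, so only real elements remain.
root = ET.fromstring(xml_text.strip())

# Collect each <Property> as a name -> value pair.
xml_paths = {
    prop.findtext("Name"): prop.findtext("Value")
    for prop in root.findall("Property")
}
print(xml_paths)
```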

Comments in YAML

---
properties:
  # This property defines the input path
  - name: DataInPath
    value: data/input
  # This property defines the output path
  - name: DataOutPath
    value: data/output

Comments in JSON

{
  "properties": [
    {
      "name": "DataInPath",
      "value": "data/input",
      "comments": "This property defines the input path"
    },
    {
      "name": "DataOutPath",
      "value": "data/output",
      "comments": "This property defines the output path"
    }
  ]
}

If the above is the way to go, most people would simply skip adding comments to JSON configs. It’s just not right. Although it was picked up late, YAML found its place as a config format by solving the one problem JSON always had: comments.
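To see why the workaround is needed at all, the sketch below feeds Python's standard `json` module a config containing a `//` comment, which strict JSON does not allow. The helper name `is_valid_json` and the sample strings are hypothetical, introduced only for this illustration.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as strict JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A config with a JavaScript-style comment: not valid JSON.
with_comment = """
{
  // This property defines the input path
  "name": "DataInPath",
  "value": "data/input"
}
"""

# The workaround: the comment travels as ordinary data under its own key.
with_comment_key = """
{
  "name": "DataInPath",
  "value": "data/input",
  "comments": "This property defines the input path"
}
"""

print(is_valid_json(with_comment))      # False
print(is_valid_json(with_comment_key))  # True
```

The cost of the workaround is that the `comments` key becomes part of the data, so every consumer of the config has to know to ignore it.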

As a data exchange format

But don’t get me wrong. I love working with JSON. As a Python enthusiast, I treat JSON as my favourite side dish. I dump the native Python dictionaries I work with to a file whenever I want and read them back in as native dictionaries whenever I want. No questions asked. No extra coding.
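That round trip might look like the following minimal sketch. The dictionary contents and the file name `record.json` are assumptions chosen for illustration, and a temporary directory is used so the example leaves nothing behind in the working directory.

```python
import json
import os
import tempfile

# A plain Python dictionary standing in for some working data (illustrative).
record = {
    "properties": [
        {"name": "DataInPath", "value": "data/input"},
        {"name": "DataOutPath", "value": "data/output"},
    ]
}

# Write it to a throwaway file; the file name is an assumption for this sketch.
path = os.path.join(tempfile.mkdtemp(), "record.json")
with open(path, "w", encoding="utf-8") as fh:
    json.dump(record, fh, indent=2)

# Read it back: the result is again a native dictionary.
with open(path, encoding="utf-8") as fh:
    restored = json.load(fh)

print(restored == record)  # True
```

The round trip is lossless here because the dictionary holds only strings, lists, and nested dictionaries. Types that JSON does not have behave differently: tuples come back as lists, and non-string keys come back as strings.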

Using JSON, you can exchange a piece of data, simply a record of information in its raw form, between two systems. With REST winning the race against SOAP, JSON found its purpose and pulled ahead of XML.

YAML, on the other hand, is not very popular for transferring raw data between systems, owing to the lack of native support for it as a machine-readable format. It’s best at being human-readable.

I hope you found this article helpful. Until next time, ciao!

