What is the difference between HTML, XML and JSON?

Omar A
Dec 18, 2018

To understand the differences between these three formats, I think it’s important to first know some of the history behind them. In the early days of document digitization, when processing power was limited, programmers were tasked with finding ways to make document processing faster and more efficient. If we break this problem down, we can see that rendering any kind of document involves three main concerns:

  1. Content: The actual information that needs to be displayed. This refers to the words that make up the document and carry its meaning.
  2. Structure: How the information within a document is organized. Essentially, this is the breaking down of the content into smaller pieces that can be easily read or parsed.
  3. Formatting: How the document should visually appear to someone reading it. For example, the title should be bold and set in a larger font size than the rest of the text.

Each of these concerns had to be handled separately, which required a lot of processing power because the content needed to be loaded again every time one of them had to be adjusted. The first attempt to combine them into one process was GML (Generalized Markup Language), developed in the 1960s by three IBM employees who faced this problem: Charles Goldfarb, Edward Mosher and Raymond Lorie. It is no coincidence that the letters G, M and L are also the initials of its creators’ last names.

GML attempted to solve this problem by wrapping the content in tags that carried instructions defining both the structure and the formatting of the content inside them. Tags could also be nested (a tag within a tag) for even finer control over the content. Instead of three separate processes, each with its own scripts, there was now one document containing all of the information needed to display it in an organized and reader-friendly way. While this solved the issue at hand, it created other issues…
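
To make the idea of nested tags concrete, here is a minimal sketch in Python. It uses modern XML-style angle-bracket tags rather than GML's original syntax, and the tag names (`document`, `title`, `section`, `para`) are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny document: the tags describe the structure,
# the text between them is the content.
# Tag names are hypothetical, chosen only for this illustration.
MARKUP = """
<document>
  <title>Quarterly Report</title>
  <section>
    <para>Sales went up.</para>
    <para>Costs went down.</para>
  </section>
</document>
"""


def outline(element, depth=0):
    """Return one (depth, tag, text) tuple per element, in document order."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


rows = outline(ET.fromstring(MARKUP))
for depth, tag, text in rows:
    # Indentation mirrors how deeply each tag is nested.
    print("  " * depth + f"{tag}: {text}")
```

The nesting depth is recovered from the tags alone, which is the point: the structure travels with the content in a single document.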

As processing power rapidly increased and more people became involved in developing documents and websites, SGML (Standard Generalized Markup Language, the standardized successor to GML) proved difficult to use. While it produced a great result (reader-friendly documents), it had too many rules that needed to be followed, or else the whole process would not work. This led to the development of HTML (HyperText Markup Language). Just like SGML, HTML wraps content in tags, but the rules about which tags can be used when and where are, in practice, much less strict. This does not greatly affect the formatting of a document (e.g., if something needs to be bigger and bold, you put it in a heading tag no matter where it is), but it does affect the structure of the information in the document: the less strict the rules about when and where tags can be used, the less structure you can maintain.
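
As a rough illustration of that leniency, the sketch below feeds deliberately sloppy HTML to Python's standard `html.parser`, which stands in here for a forgiving browser (real browsers do far more error recovery than this). The `TagCollector` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag, end tag, and piece of text the parser sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))


# Sloppy markup: the <p> and <b> tags are never closed.
SLOPPY_HTML = "<h1>Title</h1><p>First paragraph<p>Second <b>paragraph"

collector = TagCollector()
collector.feed(SLOPPY_HTML)
collector.close()  # flush any text still buffered at the end of input
events = collector.events

for kind, value in events:
    print(kind, value)
```

No error is raised even though two tags are left open. The text still gets displayed, but a program reading the markup can no longer be sure where each paragraph ends.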

At this point, HTML’s ease of use was great for developers, but the rigorous structure that SGML had provided was missing. This led to the creation of XML (Extensible Markup Language). XML addresses this issue directly by keeping strict, SGML-style rules (it is defined as a simplified subset of SGML), thereby maintaining the structure of information (just as a database does) without worrying about formatting at all (that is left to HTML). It also addresses another issue: the vastly increasing amount of content available. For example, if you have millions of records that are all structured the same way, they can be stored separately and fetched when the document (or webpage) needs them. XML is the format for sending and receiving those records, and HTML is responsible for formatting and displaying them. Now we have successfully addressed all three issues involved in displaying a user-friendly and readable document or webpage!
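
Here is a small sketch of both halves of that idea, using Python's standard `xml.etree.ElementTree`: uniformly structured records are read into plain data, and markup that breaks the rules is rejected outright. The element names (`users`, `user`, `name`, `age`) and the `parse_users` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; element names are made up for this sketch.
RECORDS_XML = """
<users>
  <user><name>Ada</name><age>36</age></user>
  <user><name>Linus</name><age>28</age></user>
</users>
"""

# Same idea, but the <name> tag is never closed.
BROKEN_XML = "<users><user><name>Ada</user></users>"


def parse_users(xml_text):
    """Turn each <user> element into a plain dict."""
    root = ET.fromstring(xml_text)
    return [
        {"name": user.findtext("name"), "age": int(user.findtext("age"))}
        for user in root.findall("user")
    ]


users = parse_users(RECORDS_XML)
print(users)

try:
    parse_users(BROKEN_XML)
    rejected = False
except ET.ParseError as err:
    # A strict XML parser refuses the whole document on the first violation.
    rejected = True
    print("rejected:", err)
```

Because every record must follow the same well-formed shape, the receiving side can rely on that shape, at the cost of refusing anything that deviates from it.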

This is in fact the same basic model that webpages use today. While XML is still in use and does have some advantages over other information-structuring techniques, the most widely used format on the web today is the analogous JSON (JavaScript Object Notation). It has grown in popularity mainly because of the rise of web application development, that is, websites that do things rather than just display things (think dynamic vs. static websites). Because these web applications are built with programming languages such as JavaScript (as opposed to markup languages, which only deal with describing and displaying content), it made sense to structure information in a way that interacts more easily and fluently with those languages.
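
A short sketch of why JSON feels natural inside a programming language: once parsed, it becomes ordinary lists and dictionaries with no tree to walk. Python's standard `json` module is used here; in JavaScript the equivalent calls are `JSON.parse` and `JSON.stringify`. The sample records are hypothetical.

```python
import json

# Hypothetical user records written as a JSON document.
RECORDS_JSON = """
[
  {"name": "Ada", "age": 36},
  {"name": "Linus", "age": 28}
]
"""

# One call turns the text into native lists and dicts.
users = json.loads(RECORDS_JSON)

# From here on it is ordinary data: index it, loop over it, sum it.
names = [user["name"] for user in users]
total_age = sum(user["age"] for user in users)

# Turning the data back into text for sending is just as direct.
round_trip = json.dumps(users, separators=(",", ":"))

print(names, total_age)
print(round_trip)
```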
