DSL for XML in C++

This is a translation of my Russian post on Habrahabr from late 2011, so please keep that date in mind while reading.

What we have now

I’ll start by describing my current project and how we develop it. Maybe we are not the only ones…

The project is a CRM system that we have been developing for customers in one business segment. It is six years old, and our team has 10 members. Languages: C++ and PL/SQL.

When we started, we chose Plain Old XML, and most of our XML documents still have no XSD. What else can I say, when our unit-testing practice has only been developing for two years (and is still in a lousy state) and the Product Manager constantly barks about the time spent writing tests? Well, let’s move on.

I believe improvements appear when the current state of affairs becomes a real PITA. That was our case.

Many people, I think, do a lot of things suboptimally and under poor conditions, and we are no exception. The point is that we keep going, and our XML code is just one example.


Our practice

Here is an extract of code for building XML in C++:

Standard practice. Why not use a stream? Because of the scope elements. Still, I agree that a stream is the first thing that comes to mind, and in an imperative paradigm we are unlikely to get anything beyond ‘feeding the right-hand object to the left-hand object’.

Okay, we got used to it… I can put up with a comma, but I can’t bear repeating the name of the clearNode or clearCreateNode function for every tag. No matter how short the name is, even a bare ‘T’ sounds like nonsense. It is like a conversation in which you say ‘I answer:’ before every answer, ‘a phrase:’ before every phrase, ‘a word:’ before every word, and so on. Nonsense, isn’t it?
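To make the complaint concrete, here is a minimal sketch of that imperative style. `Node`, `createNode`, `setAttr` and `toXml` are hypothetical stand-ins, not the project’s clearNode/clearCreateNode API; the point is only that every element needs its own explicit call:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hand-rolled DOM node; not the project's real classes.
struct Node {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attrs;
    std::vector<std::unique_ptr<Node>> children;  // unique_ptr keeps returned references stable
};

// Every element has to be announced explicitly: "this is a TAG".
Node& createNode(Node& parent, const std::string& name) {
    assert(!name.empty());
    parent.children.push_back(std::make_unique<Node>());
    parent.children.back()->name = name;
    return *parent.children.back();
}

void setAttr(Node& node, const std::string& key, const std::string& value) {
    node.attrs.emplace_back(key, value);
}

// Compact serialization; attribute values are assumed to need no escaping.
std::string toXml(const Node& node) {
    std::string out = "<" + node.name;
    for (const auto& [key, value] : node.attrs) out += " " + key + "=\"" + value + "\"";
    if (node.children.empty()) return out + "/>";
    out += ">";
    for (const auto& child : node.children) out += toXml(*child);
    return out + "</" + node.name + ">";
}

std::string buildImperative() {
    Node root{"document", {}, {}};
    Node& header = createNode(root, "header");
    setAttr(header, "title", "Report");
    Node& body = createNode(root, "body");
    createNode(body, "row");  // the same call, again and again
    createNode(body, "row");
    return toXml(root);
}
```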

The problem

Somebody will say that XML itself is a redundant language, so be thankful that you don’t have to close every tag in your C++…

Yep, they are right, but that doesn’t mean we should be excessive as well. I prefer chatting with a buddy, not a dumbass: someone with whom we both understand the context and the semantics of the dialog, can use slang and still understand each other, or can set the rules once and then rely on them.

For example, in Object Pascal there is a statement:

It is nice when you can say: ‘Here is an Object; set its property a to 1 and b to foobar’.
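C++ has no statement of this kind. The closest common idiom is method chaining, where each setter returns `*this` so the object is named only once. A sketch, assuming a hypothetical `Object` class with setters `a` and `b`:

```cpp
#include <cassert>
#include <string>

// Hypothetical class; each setter returns *this so calls can be chained.
class Object {
public:
    Object& a(int value) { a_ = value; return *this; }
    Object& b(const std::string& value) { b_ = value; return *this; }
    int a() const { return a_; }
    const std::string& b() const { return b_; }

private:
    int a_ = 0;
    std::string b_;
};

Object configure() {
    // "Here is an Object; set a to 1 and b to foobar": the object is named once.
    return Object().a(1).b("foobar");
}
```

Unlike the Pascal statement, this only works for classes written with chaining in mind.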

‘No fat!’

Going back to XML, here is how it could be written as a DSL, for example in Lisp:

If we ignore the parentheses, we see that it has ‘no fat!’. If we do the same with XML (imagine that <, >, /, = and the closing tags disappear), we get the same message.

Or, pretty-printed:

Can we get the same meaning with C++?

The result

Here is what I’ve got:

Or with the tag list extracted:

…for reuse in other modules:
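A hypothetical sketch of what such a DSL might look like, with the tag list pulled out into its own namespace. `xml::Tag`, `xml::Element`, its two `operator()` overloads (two strings add an attribute, an element adds a child) and `str()` are assumptions for illustration, not the original code, and values are not escaped:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace xml {

// Hypothetical tag: a name declared once and reused.
struct Tag {
    const char* name;
};

// Hypothetical element: operator() adds an attribute (two strings) or a child element.
class Element {
public:
    explicit Element(Tag tag) : name_(tag.name) { assert(!name_.empty()); }

    Element& operator()(std::string key, std::string value) {
        attrs_.emplace_back(std::move(key), std::move(value));
        return *this;
    }
    Element& operator()(Element child) {
        children_.push_back(std::move(child));
        return *this;
    }

    // Compact serialization; attribute values are assumed to need no escaping.
    std::string str() const {
        std::string out = "<" + name_;
        for (const auto& [key, value] : attrs_) out += " " + key + "=\"" + value + "\"";
        if (children_.empty()) return out + "/>";
        out += ">";
        for (const auto& child : children_) out += child.str();
        return out + "</" + name_ + ">";
    }

private:
    std::string name_;
    std::vector<std::pair<std::string, std::string>> attrs_;
    std::vector<Element> children_;  // C++17 allows vector of an incomplete type here
};

}  // namespace xml

// The extracted tag list; in a real project this could live in a shared header.
namespace tags {
const xml::Tag document{"document"}, header{"header"}, body{"body"}, row{"row"};
}

std::string buildWithDsl() {
    using namespace tags;
    return xml::Element(document)
        (xml::Element(header)("title", "Report"))
        (xml::Element(body)
            (xml::Element(row))
            (xml::Element(row)))
        .str();
}
```

In this sketch the constructor is `explicit`, so a bare tag never silently turns into an element; that is where the remaining `xml::Element(...)` at each scope entry comes from.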

I couldn’t make it completely fat-free (you still have to write xml::Element() at each scope entry point), but this is real progress:

  • No more repeating ‘this is a TAG’
  • We can collect statistics for the whole document being compiled (and, for example, pretty-print it)
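If the DSL builds the whole document as a tree before serializing it, statistics and pretty-printing become ordinary tree walks. A sketch over a simplified, hypothetical `Element` struct (attributes omitted); `collect`, `pretty`, `statsOf` and `prettyPrint` are illustrative names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical element tree (attributes left out for brevity).
struct Element {
    std::string name;
    std::vector<Element> children;
};

struct Stats {
    std::size_t elements = 0;
    std::size_t maxDepth = 0;  // the root has depth 0
};

// One pass over the whole document: count elements and track nesting depth.
void collect(const Element& e, std::size_t depth, Stats& stats) {
    ++stats.elements;
    stats.maxDepth = std::max(stats.maxDepth, depth);
    for (const auto& child : e.children) collect(child, depth + 1, stats);
}

// Another pass: pretty-print with two spaces of indentation per level.
void pretty(const Element& e, std::size_t depth, std::string& out) {
    assert(!e.name.empty());
    const std::string indent(depth * 2, ' ');
    if (e.children.empty()) {
        out += indent + "<" + e.name + "/>\n";
        return;
    }
    out += indent + "<" + e.name + ">\n";
    for (const auto& child : e.children) pretty(child, depth + 1, out);
    out += indent + "</" + e.name + ">\n";
}

Stats statsOf(const Element& root) {
    Stats stats;
    collect(root, 0, stats);
    return stats;
}

std::string prettyPrint(const Element& root) {
    std::string out;
    pretty(root, 0, out);
    return out;
}

Element sampleDocument() {
    return {"document", {{"header", {}}, {"body", {{"row", {}}, {"row", {}}}}}};
}
```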

The findings

We can define the interface of such a DSL, which consists of:

  • Semantic operators for class Element:
  • DSL operators for class Element:
  • Interface for class Tag, which is not very interesting to extract because tag names and attributes have no scoping.

Now we can start working out different implementations that compile our DSL to XML.
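One way to read this split, sketched under assumptions: the semantic operators become a small abstract interface (here a hypothetical `xml::Compiler` with `open`, `attribute` and `close`), the DSL operators only record structure, and each implementation decides what compiling means. The counting implementation ties back to collecting statistics; none of these names come from the original:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace xml {

// Semantic operators (hypothetical): what the document means, with no output format.
class Compiler {
public:
    virtual ~Compiler() = default;
    virtual void open(const std::string& tag) = 0;
    virtual void attribute(const std::string& key, const std::string& value) = 0;
    virtual void close(const std::string& tag) = 0;
};

// DSL operators (hypothetical): syntax that only records structure.
class Element {
public:
    explicit Element(std::string tag) : tag_(std::move(tag)) {}

    Element& operator()(std::string key, std::string value) {
        attrs_.emplace_back(std::move(key), std::move(value));
        return *this;
    }
    Element& operator()(Element child) {
        children_.push_back(std::move(child));
        return *this;
    }

    // Replay the recorded structure through any Compiler implementation.
    void compile(Compiler& c) const {
        c.open(tag_);
        for (const auto& [key, value] : attrs_) c.attribute(key, value);
        for (const auto& child : children_) child.compile(c);
        c.close(tag_);
    }

private:
    std::string tag_;
    std::vector<std::pair<std::string, std::string>> attrs_;
    std::vector<Element> children_;
};

}  // namespace xml

// One possible implementation: gather statistics instead of producing text.
struct CountingCompiler : xml::Compiler {
    int elements = 0, attributes = 0, depth = 0, maxDepth = 0;
    void open(const std::string&) override {
        ++elements;
        maxDepth = std::max(maxDepth, ++depth);  // the root counts as depth 1
    }
    void attribute(const std::string&, const std::string&) override { ++attributes; }
    void close(const std::string&) override {
        assert(depth > 0);
        --depth;
    }
};

CountingCompiler countSample() {
    const xml::Element doc = xml::Element("document")
        (xml::Element("header")("title", "Report"))
        (xml::Element("body")(xml::Element("row"))(xml::Element("row")));
    CountingCompiler counter;
    doc.compile(counter);
    return counter;
}
```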

Here is the reference version, for example:
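A possible reference-style implementation, sketched against the same hypothetical `xml::Compiler` interface and not the author’s version: `XmlWriter` emits compact XML, uses the empty-element form when nothing was nested, and escapes `&`, `<` and `"` in attribute values. The driver calls the semantic operators directly so the block stays self-contained:

```cpp
#include <cassert>
#include <string>

namespace xml {

// Same hypothetical semantic interface: open, attribute, close.
class Compiler {
public:
    virtual ~Compiler() = default;
    virtual void open(const std::string& tag) = 0;
    virtual void attribute(const std::string& key, const std::string& value) = 0;
    virtual void close(const std::string& tag) = 0;
};

// Reference-style implementation: emit compact XML text.
class XmlWriter : public Compiler {
public:
    void open(const std::string& tag) override {
        finishStartTag();  // a child means the parent's start tag is complete
        out_ += "<" + tag;
        pending_ = true;
    }
    void attribute(const std::string& key, const std::string& value) override {
        assert(pending_);  // attributes are only valid right after open()
        out_ += " " + key + "=\"" + escape(value) + "\"";
    }
    void close(const std::string& tag) override {
        if (pending_) {
            out_ += "/>";  // no children: use the empty-element form
            pending_ = false;
        } else {
            out_ += "</" + tag + ">";
        }
    }
    const std::string& str() const { return out_; }

private:
    void finishStartTag() {
        if (pending_) {
            out_ += ">";
            pending_ = false;
        }
    }
    // Escape the characters that may not appear raw in a double-quoted attribute.
    static std::string escape(const std::string& s) {
        std::string r;
        for (char ch : s) {
            switch (ch) {
                case '&': r += "&amp;"; break;
                case '<': r += "&lt;"; break;
                case '"': r += "&quot;"; break;
                default: r += ch;
            }
        }
        return r;
    }

    std::string out_;
    bool pending_ = false;  // true while a start tag is still open for attributes
};

}  // namespace xml

std::string writeSample() {
    xml::XmlWriter w;
    w.open("document");
    w.open("header");
    w.attribute("title", "Q&A");
    w.close("header");
    w.open("body");
    w.close("body");
    w.close("document");
    return w.str();
}
```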

To complete the picture, I’ll show the Tag class implementation:
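For comparison, a minimal and hypothetical `Tag`: just a name wrapped in a `constexpr` class, so a tag list can be declared once in a shared header (as C++17 `inline constexpr` variables) and reused across modules. The `open()`/`close()` helpers are illustrative only:

```cpp
#include <cassert>
#include <string>
#include <string_view>

namespace xml {

// Hypothetical Tag: only a name, cheap to copy and usable in constant expressions.
class Tag {
public:
    constexpr explicit Tag(std::string_view name) : name_(name) {}
    constexpr std::string_view name() const { return name_; }

    // Render the opening and closing forms of the tag.
    std::string open() const {
        assert(!name_.empty());
        return "<" + std::string(name_) + ">";
    }
    std::string close() const {
        assert(!name_.empty());
        return "</" + std::string(name_) + ">";
    }

private:
    std::string_view name_;  // must refer to storage that outlives the Tag, e.g. a literal
};

}  // namespace xml

// A shared tag list: in a header, C++17 inline variables give every module the same object.
namespace tags {
inline constexpr xml::Tag document{"document"};
inline constexpr xml::Tag header{"header"};
}  // namespace tags

static_assert(tags::document.name() == "document");
```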

…and a sample XSLT for transforming XML documents into this DSL:

The conclusion

As we can see, a DSL can be written in C++ at the cost of only a little blood.

The next stop will be embedding SQL queries into C++…

Thank you for your time; I hope you liked it.

Like what you read? Give Aleksandr Vinokurov a round of applause.
