Recently, several articles have pointed out that YAML is both cheaper and faster than JSON for an LLM to generate as an output format. This article considers whether YAML also produces more reliable results than JSON. The figures presented here are for generating simple test data with the Anthropic Claude family of LLMs; for other LLMs and use cases you may need to conduct your own testing.
Preparation: Modify your LLM prompts
This is perhaps obvious, but for a fair comparison your LLM prompt needs to ask for either YAML or JSON as appropriate, and any examples included in the prompt must be in the matching format. Otherwise the LLM is given mixed signals and will naturally struggle to produce valid output.
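As a minimal sketch in Python, one way to guarantee that the format named in the prompt and the format of the embedded example always agree is to keep one hand-written example per format and select both from the same key. The `build_prompt` helper, the `EXAMPLES` table and the sample record are all hypothetical, made up for illustration:

```python
# Hypothetical example record, written out once per format so a prompt
# can never name one format while showing an example in the other.
EXAMPLES = {
    "json": '{"name": "Alice Smith", "age": 34, "email": "alice@example.com"}',
    "yaml": "name: Alice Smith\nage: 34\nemail: alice@example.com",
}


def build_prompt(fmt: str, count: int) -> str:
    """Build a test-data prompt that names the output format and shows a matching example."""
    if fmt not in EXAMPLES:
        raise ValueError(f"unsupported format: {fmt}")
    label = fmt.upper()
    # The format name and the example both come from the same key,
    # so they stay consistent.
    return (
        f"Generate {count} test user records as {label}.\n"
        f"Respond with {label} only.\n"
        f"Here is an example record in {label}:\n"
        f"{EXAMPLES[fmt]}\n"
    )
```

For example, `build_prompt("yaml", 3)` yields a prompt that mentions only YAML and contains no JSON braces, while `build_prompt("json", 3)` embeds the JSON version of the same record.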
Preparation: Remove any LLM ‘patch up’ or ‘clean up’ post-processing
To compare YAML and JSON fairly, we should remove any existing code that patches up the LLM response; otherwise we end up measuring how well the clean-up code works rather than how reliable the raw LLM output is.
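A minimal sketch of the strict measurement, assuming Python: the hypothetical `strict_success_rate` helper feeds each raw response straight to a strict parser and counts how many parse, with no fence stripping, lenient parsing or value patching. It is shown here with the standard library's `json.loads`; for YAML you could pass PyYAML's `yaml.safe_load` with `errors=(yaml.YAMLError,)` instead.

```python
import json
from typing import Any, Callable, Iterable, Tuple, Type


def strict_success_rate(
    responses: Iterable[str],
    parse: Callable[[str], Any],
    errors: Tuple[Type[BaseException], ...] = (ValueError,),
) -> float:
    """Return the fraction of raw LLM responses that parse with no clean-up at all.

    The raw text goes straight to the strict parser: no fence stripping,
    no lenient parser and no value patching. json.JSONDecodeError is a
    subclass of ValueError, so the default `errors` suits json.loads.
    """
    total = 0
    ok = 0
    for text in responses:
        total += 1
        try:
            parse(text)
        except errors:
            # A parse error counts as a failure; nothing is retried or repaired.
            continue
        ok += 1
    return ok / total if total else 0.0


# Hypothetical raw responses: a fenced reply and single-quoted keys both fail.
sample = ['{"a": 1}', '```json\n{"a": 1}\n```', "[1, 2]", "{'a': 1}"]
print(strict_success_rate(sample, json.loads))
```

One caveat when applying the same check to YAML: `yaml.safe_load` will accept plain prose such as "Sure, here you go" as a string, so it may also be worth checking that the parsed value has the expected shape (for example, a list of dicts).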
Typical ‘patch up’ post-processing steps for JSON include:
- looking for start and end markers (typically ```json and ```) and stripping out any text outside them
- using a more forgiving JSON parser such as json5
- patching the JSON values, for example removing leading…