How to make TinyMCE output clean HTML

I like TinyMCE; I think it’s the best WYSIWYG editor you can get. There’s just one thing that bothers me quite a lot: by default, TinyMCE outputs really messy HTML. For instance, imagine you want to make an ordinary unordered list.

The output is the following:

<ul>
<li>
<span style="font-size: x-small;">
<span style="font-size:10px; line-height:16px;">first</span>
</span>
</li>
<li>
<span style="font-size: x-small;">
<span style="font-size:10px; line-height:16px;">second</span>
</span>
</li>
<li>
<span style="font-size: x-small;">
<span style="font-size: 10px; line-height: 16px;">third</span>
</span>
</li>
</ul>

Why the hell so many spans and styles?

Fortunately, there is a simple solution (it just took me 3 hours to find it). If you take a look at TinyMCE Configuration, you’ll find two inconspicuous parameters: invalid_elements and extended_valid_elements.

invalid_elements

This parameter allows you to specify which elements you want to exclude from HTML output.

invalid_elements: "span"

This solves our problem, but you can never be sure that you will always want to get rid of every span element. What if you someday need a span with a class attribute? On the other hand, the output is nice, pure HTML:

<ul>
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
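
For context, here is a minimal sketch of where this setting might live in a page. It assumes TinyMCE 4 or later, where the editor is started with tinymce.init and a selector option; the selector value and the variable name stripSpansConfig are illustrative and not taken from any real setup. Only invalid_elements comes from the text above.

```javascript
// Editor settings. Only invalid_elements comes from the article;
// the selector value is an assumption made for this sketch.
const stripSpansConfig = {
  selector: 'textarea',     // assumed: attach the editor to every <textarea>
  invalid_elements: 'span'  // exclude all <span> elements from the output
};

// Start the editor only when the TinyMCE script has been loaded,
// so the snippet does nothing harmful outside a browser page.
if (typeof tinymce !== 'undefined') {
  tinymce.init(stripSpansConfig);
}
```

invalid_elements takes a comma-separated list, so further elements could be excluded the same way if needed.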

extended_valid_elements

I think a better way of doing this is extended_valid_elements. It works roughly as the opposite of invalid_elements: instead of listing elements to exclude, you specify which elements are allowed and which attributes they can have. The default configuration is quite wild, but that doesn't matter.

All we want to do is to say: “remove all spans without a class attribute”.

And here’s the solution:

extended_valid_elements : "span[!class]"

The HTML output is still nice, and if you want to use a span with a class, you can.

<ul>
<li>fdgsdfsfdg</li>
<li class="hello">cvbcxvbxcb</li>
<li>dsfgsdfgsdg</li>
</ul>
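
As a rough sketch, the span[!class] rule could be wired into the editor like this. It again assumes TinyMCE 4 or later (tinymce.init with a selector option); the selector value and the name classedSpansConfig are illustrative only. In the rule syntax, the ! in front of an attribute name marks it as required, so a span that lacks it is removed.

```javascript
// Editor settings. Only extended_valid_elements comes from the article;
// the selector value is an assumption made for this sketch.
const classedSpansConfig = {
  selector: 'textarea',  // assumed: attach the editor to every <textarea>
  // "!class" makes the class attribute required: a <span> without it
  // is dropped from the output, a <span> with it is kept.
  extended_valid_elements: 'span[!class]'
};

// Start the editor only when the TinyMCE script has been loaded.
if (typeof tinymce !== 'undefined') {
  tinymce.init(classedSpansConfig);
}
```

Because the rule lists class as the only attribute, a kept span should carry only class, so inline styles like the ones in the messy output above would not come back with it.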