How to make TinyMCE output clean HTML

I like TinyMCE; I think it’s the best WYSIWYG editor you can get. There’s just one thing that bothers me quite a lot: by default, TinyMCE outputs really messy HTML. For instance, imagine you want to make an ordinary unordered list.

The output is the following:

<ul>
<li>
<span style="font-size: x-small;">
<span style="font-size:10px; line-height:16px;">first</span>
</span>
</li>
<li>
<span style="font-size: x-small;">
<span style="font-size:10px; line-height:16px;">second</span>
</span>
</li>
<li>
<span style="font-size: x-small;">
<span style="font-size: 10px; line-height: 16px;">third</span>
</span>
</li>
</ul>

Why the hell are there so many spans and styles?

Fortunately, there is a simple solution (it just took me 3 hours to find it). If you take a look at the TinyMCE configuration options, you’ll find two inconspicuous parameters: invalid_elements and extended_valid_elements.

invalid_elements

invalid_elements: "span"

This solves our problem, but you can never be sure you will always want to get rid of every span element. What if someday you need a span with a class attribute? On the other hand, the output is nice, pure HTML:

<ul>
<li>first</li>
<li>second</li>
<li>third</li>
</ul>
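
For context, here is roughly where that setting might sit in an editor setup. This is a minimal sketch, not a definitive configuration: it assumes TinyMCE 4 or later (which accepts a selector option; 3.x used mode and elements instead), and the selector "textarea#content" and the variable name stripSpansConfig are made up for illustration.

```javascript
// Sketch only: assumes TinyMCE 4+ is already loaded on the page.
// "textarea#content" is a hypothetical selector; use your own.
var stripSpansConfig = {
  selector: "textarea#content",
  // Drop every <span> tag from the output; the text inside is kept.
  invalid_elements: "span"
};

// Guarded so the snippet does nothing if TinyMCE is not present.
if (typeof tinymce !== "undefined") {
  tinymce.init(stripSpansConfig);
}
```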

extended_valid_elements

All we want to do is to say: “remove all spans without class attribute”.

And here’s the solution:

extended_valid_elements: "span[!class]"
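
Placed in a full init call, the setting might look like the sketch below. The same assumptions apply: TinyMCE 4 or later, and a hypothetical selector and variable name (classedSpansConfig) chosen only for this example. It is meant as an alternative to invalid_elements: "span", not something to combine with it.

```javascript
// Sketch only: assumes TinyMCE 4+ is already loaded on the page.
// "textarea#content" is a hypothetical selector; use your own.
var classedSpansConfig = {
  selector: "textarea#content",
  // "!" marks class as required: a <span> without a class is removed
  // (its text is kept). Only the attributes listed in the brackets
  // are allowed on span, so style is not carried over.
  extended_valid_elements: "span[!class]"
};

// Guarded so the snippet does nothing if TinyMCE is not present.
if (typeof tinymce !== "undefined") {
  tinymce.init(classedSpansConfig);
}
```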

The HTML output is still nice, and if you want to use a span with a class ... you can.

<ul>
<li>fdgsdfsfdg</li>
<li><span class="hello">cvbcxvbxcb</span></li>
<li>dsfgsdfgsdg</li>
</ul>
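
To make the rule concrete, here is a plain-JavaScript sketch of what “remove all spans without class attribute” means. It is only an illustration: the function name unwrapClasslessSpans is made up, this is not how TinyMCE implements its filter, and the regular-expression approach assumes simple markup (no ">" inside attribute values).

```javascript
// Illustration of the "span[!class]" idea; NOT TinyMCE's own code.
function unwrapClasslessSpans(html) {
  // One entry per currently open <span>: true if that span was kept.
  var keepStack = [];
  return html.replace(/<(\/?)span\b([^>]*)>/gi, function (tag, slash, attrs) {
    if (slash) {
      // Closing tag: emit it only if the matching opening tag was kept.
      return keepStack.pop() ? "</span>" : "";
    }
    // Opening tag: keep it only when it has a quoted class attribute,
    // and keep only that attribute (style and others are dropped).
    var classMatch = /\bclass\s*=\s*("[^"]*"|'[^']*')/i.exec(attrs);
    keepStack.push(Boolean(classMatch));
    return classMatch ? "<span class=" + classMatch[1] + ">" : "";
  });
}

var messy = '<li><span style="font-size: x-small;">' +
  '<span style="font-size:10px; line-height:16px;">first</span></span></li>';
console.log(unwrapClasslessSpans(messy)); // <li>first</li>

var classed = '<li><span class="hello" style="color: red;">second</span></li>';
console.log(unwrapClasslessSpans(classed)); // <li><span class="hello">second</span></li>
```

The stack is there because spans nest: each closing tag has to follow the decision made for its own opening tag, not the most recent decision overall.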