Avoiding XSS via Markdown in React

Using Markdown with Sanitization

Websites often use Markdown to allow their users to create content. It provides a lightweight markup language that can be converted to HTML and other formats by many common libraries.

There has been some recent discussion in the news about Markdown-based XSS exploits in major websites like Pastebin.

We also heard about this Markdown XSS issue during a recent presentation at the LocoMoco Security Conference.

The root of the issue is that the Markdown specification actively encourages HTML in Markdown, but that isn’t a good default for sites that are worried about code injection attacks.

Markdown Converts to HTML

// Markdown Input
Heading
=======
// HTML Output
<h1>Heading</h1>

Some developers choose Markdown over HTML because they feel that it will help them avoid code injection attacks like XSS. However, Markdown doesn’t provide any security benefits by default.

In fact, the most popular Markdown parsing library on npm, called marked, doesn’t sanitize or escape HTML content found within Markdown text by default when making a conversion. Passing raw HTML through is also what the standard encourages.

// Markdown Input
Heading
=======
<script>alert(1)</script>
// HTML Output
<h1>Heading</h1>
<script>alert(1)</script>

The JavaScript Markdown conversion library named marked has 1,054,581 downloads in the last 7 days, and the npm-hosted webpage for the project doesn’t mention security or sanitization. If you click some links and visit the project’s home page you will find this message.

Security
The only completely secure system is the one that doesn’t exist in the first place. Having said that, we take the security of Marked very seriously.

Code Injection via Unsanitized Markdown (The Default)

Using the code example from the marked webpage, we can add an XSS payload to the innerHTML of an element in the DOM and perform an XSS attack. This is the default behavior of the library. It forwards any HTML found in the source text directly into the output after converting the Markdown syntax to HTML.

<!doctype html>
<html>
<head>
<meta charset="utf-8"/>
<title>Marked in the browser</title>
</head>
<body>
<div id="content"></div>
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<script>
document
  .getElementById('content')
  .innerHTML = marked('<img src=x onerror=alert(1)></img>');
</script>
</body>
</html>

Opting in to Markdown Sanitization

Marked knows how to safely handle XSS payloads in the Markdown text; you just need to opt in to that behavior. The option is documented on the advanced configuration page, three clicks away from the npm page.

// Create reference instance
const myMarked = require('marked');
// Set options
myMarked.setOptions({
  renderer: new myMarked.Renderer(),
  highlight: function(code) {
    return require('highlight.js').highlightAuto(code).value;
  },
  pedantic: false,
  gfm: true,
  tables: true,
  breaks: false,
  sanitize: false,
  smartLists: true,
  smartypants: false,
  xhtml: false
});
// Compile
console.log(myMarked('I am using __markdown__.'));

If you read through the list of options passed into setOptions you can see that the sanitize property defaults to false. If you would like marked to prevent XSS attacks in your Markdown you just have to set that value to true.

marked.setOptions({ sanitize: true })
marked('<img src=x onerror=alert(1)></img>')
// Returns "<p>&lt;img src=x onerror=alert(1)&gt;&lt;/img&gt;</p>"

The fine author of the marked package got the XSS sanitization right. The sanitization works, provided you opt in to it.
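The output above comes from HTML escaping: the characters that would start a tag are replaced with entities, so the browser shows the payload as text instead of parsing it. Here is a minimal sketch of that idea. The escapeHtml helper is hypothetical and written for illustration; it is not marked's actual implementation.

```javascript
// Hypothetical helper for illustration; not marked's actual code.
// Replaces the characters that are significant in HTML with entities.
function escapeHtml(text) {
  const entities = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

const payload = '<img src=x onerror=alert(1)></img>';
const escaped = escapeHtml(payload);
console.log(escaped);
// prints: &lt;img src=x onerror=alert(1)&gt;&lt;/img&gt;
```

Once the angle brackets are gone the browser never creates an img element, so the onerror handler has nothing to attach to.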

Injection Attacks in React Components with Sanitization Turned On

I found a popular React component module on npm called react-marked-markdown. It allows you to make React components that render Markdown. It uses the marked library we talked about earlier under the hood.

The react-marked-markdown module was downloaded 972 times last week. Its npm page mentions that it uses the sanitize: true option from marked by default.

Unfortunately, if you use the react-marked-markdown module you will be vulnerable to XSS attacks even with the sanitize: true option.

import { MarkdownPreview } from 'react-marked-markdown'
// Example payload an attacker could submit as post.content
const spike = '[XSS](javascript: alert`1`)'
const Post = ({ post }) => (
  <div>
    <h1>{post.title}</h1>
    <MarkdownPreview
      markedOptions={{
        gfm: true,
        tables: true,
        breaks: false,
        pedantic: false,
        sanitize: true,
        smartLists: true,
        smartypants: false
      }}
      value={post.content}
    />
  </div>
)

This is where things get interesting. The marked npm package can contain the correct sanitization logic and still leave you vulnerable when it is used through the react-marked-markdown module. Let me show you how this can happen. This is the code from the marked library that handles sanitization of links.

// https://github.com/markedjs/marked/blob/master/lib/marked.js#L973
Renderer.prototype.link = function(href, title, text) {
  if (this.options.sanitize) {
    try {
      var prot = decodeURIComponent(unescape(href))
        .replace(/[^\w:]/g, '')
        .toLowerCase()
    } catch (e) {
      return text
    }
    if (
      prot.indexOf('javascript:') === 0 ||
      prot.indexOf('vbscript:') === 0 ||
      prot.indexOf('data:') === 0
    ) {
      return text
    }
  }
  if (this.options.baseUrl && !originIndependentUrl.test(href)) {
    href = resolveUrl(this.options.baseUrl, href)
  }
  try {
    href = encodeURI(href).replace(/%25/g, '%')
  } catch (e) {
    return text
  }
  var out = '<a href="' + escape(href) + '"'
  if (title) {
    out += ' title="' + title + '"'
  }
  out += '>' + text + '</a>'
  return out
}
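To see why the payload from the React example is caught by this check, here is a simplified, standalone sketch of the protocol test. The function name hasDangerousProtocol is hypothetical, and the sketch leaves out the unescape step from the original, which I am assuming is an entity-decoding helper inside marked rather than something available everywhere.

```javascript
// Simplified sketch of the protocol check shown above; not marked's code.
// Assumption: the unescape(...) call from the original is omitted here.
function hasDangerousProtocol(href) {
  let prot;
  try {
    prot = decodeURIComponent(href)
      .replace(/[^\w:]/g, '') // drop spaces, newlines, and punctuation used to hide the scheme
      .toLowerCase();
  } catch (e) {
    return true; // malformed percent-encoding: refuse the link
  }
  return ['javascript:', 'vbscript:', 'data:'].some(
    (scheme) => prot.indexOf(scheme) === 0
  );
}

console.log(hasDangerousProtocol('javascript: alert`1`'));   // true
console.log(hasDangerousProtocol('JAVA%0ASCRIPT:alert(1)')); // true
console.log(hasDangerousProtocol('https://example.com/'));   // false
```

Stripping every character that is not a word character or a colon before comparing is what defeats tricks such as embedded whitespace or mixed case in the scheme.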

Here is the code from the react-marked-markdown library. It overrides the renderer.link method with a custom function that no longer performs the sanitization, and it interpolates href and title into the attribute values without escaping them.

const renderer = new marked.Renderer()
renderer.link = (href, title, text) =>
`<a target="_blank" rel="noopener noreferrer" href="${href}" title="${title}">${text}</a>`
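One way such an override could keep the protection is to repeat the protocol check and escape the attribute values before building the tag. The following is a sketch under stated assumptions, not the library's actual fix: safeLink, escapeAttr, and hasDangerousProtocol are hypothetical names, and text is assumed to arrive as already-rendered inline HTML, as it does in the marked code shown earlier.

```javascript
// Hypothetical helpers for illustration; not code from either library.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function hasDangerousProtocol(href) {
  let prot;
  try {
    prot = decodeURIComponent(href).replace(/[^\w:]/g, '').toLowerCase();
  } catch (e) {
    return true; // malformed percent-encoding: refuse the link
  }
  return ['javascript:', 'vbscript:', 'data:'].some(
    (scheme) => prot.indexOf(scheme) === 0
  );
}

// Same signature as the renderer.link override above.
function safeLink(href, title, text) {
  if (hasDangerousProtocol(href)) {
    return text; // drop the link, keep the visible text
  }
  const titleAttr = title ? ` title="${escapeAttr(title)}"` : '';
  return `<a target="_blank" rel="noopener noreferrer" href="${escapeAttr(href)}"${titleAttr}>${text}</a>`;
}

console.log(safeLink('javascript: alert`1`', null, 'XSS'));
// prints: XSS
console.log(safeLink('https://example.com/?a=1&b=2', 'say "hi"', 'ok'));
```

A function like this could be assigned to renderer.link in place of the unsanitized template string. An allowlist of known-good protocols would be stricter than this denylist.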

Securely Adding Markdown Output to a React Component

The react-markdown module on npm takes a different approach to adding the HTML generated from Markdown into React components. Its npm page mentions that the component doesn’t use dangerouslySetInnerHTML. That is a great start.

If you don’t need to render HTML, this component does not use dangerouslySetInnerHTML at all - this is a Good Thing™.

Rather than using string-to-string conversions, they convert the Markdown to an Abstract Syntax Tree (AST). Then they use React library functions, like React.createElement, to build the element tree. Finally, they let ReactDOM worry about rendering the tree and applying the correct contextual escaping.
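The AST approach can be sketched without any library. In the example below, h is a toy stand-in for React.createElement so that the code runs without React installed, and the node shapes are a hypothetical, hand-built tree rather than react-markdown's real data structures.

```javascript
// Toy stand-in for React.createElement (assumption, for illustration only).
function h(type, props, ...children) {
  return { type, props: props || {}, children };
}

// Hypothetical AST for a heading followed by a paragraph whose text
// happens to look like a script tag.
const ast = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Heading' }] },
    { type: 'paragraph', children: [{ type: 'text', value: '<script>alert(1)</script>' }] }
  ]
};

// Walk the AST and build an element tree instead of an HTML string.
function toElement(node) {
  switch (node.type) {
    case 'text':
      return node.value; // stays plain data; it is never parsed as HTML
    case 'heading':
      return h('h' + node.depth, null, ...node.children.map(toElement));
    case 'paragraph':
      return h('p', null, ...node.children.map(toElement));
    case 'root':
      return h('div', null, ...node.children.map(toElement));
    default:
      return null; // unknown node types are dropped, not passed through
  }
}

const tree = toElement(ast);
console.log(JSON.stringify(tree, null, 2));
```

Because the script text ends up as a string child of a p element, a renderer such as ReactDOM would insert it as escaped text rather than as markup.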

They also take extra care in the react-markdown module to handle any dangerous props that could contain XSS exploits. For example, they transform the href attribute of every anchor tag by default.

transformLinkUri - function|null Function that gets called for each encountered link with a single argument - uri. The returned value is used in place of the original. The default link URI transformer acts as an XSS-filter, neutralizing things like javascript:, vbscript: and file: protocols. If you specify a custom function, this default filter won't be called, but you can access it as require('react-markdown').uriTransformer. If you want to disable the default transformer, pass null to this option.
// https://github.com/rexxars/react-markdown/blob/master/src/uriTransformer.js
'use strict'
var protocols = ['http', 'https', 'mailto', 'tel']
module.exports = function uriTransformer(uri) {
  var url = (uri || '').trim()
  var first = url.charAt(0)
  if (first === '#' || first === '/') {
    return url
  }
  var colon = url.indexOf(':')
  if (colon === -1) {
    return url
  }
  var length = protocols.length
  var index = -1
  while (++index < length) {
    var protocol = protocols[index]
    if (
      colon === protocol.length &&
      url.slice(0, protocol.length) === protocol
    ) {
      return url
    }
  }
  index = url.indexOf('?')
  if (index !== -1 && colon > index) {
    return url
  }
  index = url.indexOf('#')
  if (index !== -1 && colon > index) {
    return url
  }
  // eslint-disable-next-line no-script-url
  return 'javascript:void(0)'
}

If you use the react-markdown module and pass in an attacker-controlled href whose protocol is not on the allowed list (http, https, mailto, tel), the value is replaced with the string javascript:void(0).

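import React from 'react'
import ReactDOM from 'react-dom'
import ReactMarkdown from 'react-markdown'
const spike = '[xss](javascript:onerror=alert;throw%20"hi")'
ReactDOM.render(
  <ReactMarkdown source={spike} />,
  document.getElementById('root')
)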

Conclusion

Using Markdown output in the innerHTML of an element will lead to trouble most of the time. If you pass Markdown output to the dangerouslySetInnerHTML prop of a React component you are also asking for trouble. If you want protection, make sure you enable the sanitization options in your Markdown library. If you are using a third-party module, take a peek under the hood and make sure it sets the Markdown library options correctly and sanitizes values before inserting them into the DOM.