Jaroslaw Porzucek
Dec 20, 2018

Content Security Policy (CSP) is an effective way of implementing an additional layer of defense against common injection attacks like Cross Site Scripting (XSS) or clickjacking. First implemented in Firefox 4, it has since evolved into the W3C standard Content Security Policy Level 2, with Level 3 on the way. While very powerful, CSP is also quite complex (just skim through the W3C standard if you don't believe me). It can be really challenging to write a policy that allows only the content we trust without blocking our own resources, especially as the policy grows.

This post is for people who struggle to understand how a CSP host-source expression is matched against the URLs of resources loaded on a website. We'll go through every step of the Matching Source Expression algorithm from the W3C standard and explain each one in an easy-to-understand way. At the end, I'll check how host-source is implemented in the most popular browsers.

NB: It’s assumed that the reader has basic knowledge of CSP and knows how to write simple policies.

Matching Source Expression algorithm

The algorithm is available in the W3C Content Security Policy Level 2 specification, but reading through it may be a bit cumbersome. We'll focus on point 4 of the algorithm, which covers the topic of this post: the host-source expression.

Before we begin, let’s briefly describe what the host-source is:

host-source = [ scheme-part "://" ] host-part [ port-part ] [ path-part ]

Basically, it is a typical URL we know from the address bar. We have a scheme-part, which may have a value of http, https, file etc.; a host-part, which is typically a domain name or IP address, e.g. example.com or 192.168.1.1; a port-part, which is nothing more than a port number; and a path-part, which is everything afterwards, for example /path/to/some/resource or /path/to/image.jpg.
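To make the four parts concrete, here is a minimal Python sketch that splits a host-source expression with a regular expression. The pattern is a simplification of the grammar above (an assumption, not the spec's full ABNF): it does not accept bracketed IPv6 addresses or validate path characters, and the names parse_host_source and HostSource are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern for: [ scheme-part "://" ] host-part [ port-part ] [ path-part ]
# Assumption: an illustration only; not the complete grammar from the spec.
HOST_SOURCE_RE = re.compile(
    r"^(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.\-]*)://)?"               # optional scheme-part
    r"(?P<host>\*|(?:\*\.)?[A-Za-z0-9\-]+(?:\.[A-Za-z0-9\-]+)*)"  # host-part, "*" only as a prefix
    r"(?::(?P<port>\d+|\*))?"                                      # optional port-part
    r"(?P<path>/[^?#]*)?$"                                         # optional path-part
)


class HostSource(NamedTuple):
    scheme: Optional[str]
    host: str
    port: Optional[str]
    path: Optional[str]


def parse_host_source(expr: str) -> Optional[HostSource]:
    """Split a host-source expression into its parts, or return None if it is invalid."""
    m = HOST_SOURCE_RE.match(expr)
    if m is None:
        return None
    return HostSource(m["scheme"], m["host"], m["port"], m["path"])


print(parse_host_source("https://*.example.com:8080/path/to/image.jpg"))
```

Expressions that break the grammar, such as my.*.example.com or file:///path/to/some/resource, come back as None in this sketch.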

Now that we know what host-source is, let's come back to the algorithm. We'll use the following convention when naming the parts of the URL of a resource being loaded on the website: url-scheme, url-host, url-port and url-path-list.

To distinguish them from the parts of the host-source expression used in the policy, we'll use scheme-part, host-part, port-part and path-part respectively.


All the steps to match a host-source expression:

1. If url’s host is null, return does not match.

Explanation:
According to the URL specification, a URL's host may be null (empty) only for the file scheme and for non-special schemes. The special schemes ftp, gopher, http, https, ws and wss always have a host.

Example:
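The host-source file:///path/to/some/resource will be ignored by the CSP policy engine, as its host-part is empty (invalid). Likewise, a resource loaded from a URL that has no host can never match a host-source expression.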
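Step 1 can be sketched in Python as follows. The standard library's urllib.parse.urlsplit is used as a stand-in for the URL specification's parser (an assumption; the two differ in edge cases), and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def url_has_no_host(url: str) -> bool:
    """Step 1: a URL without a host can never match a host-source expression."""
    # urlsplit reports a missing or empty host as hostname None.
    return urlsplit(url).hostname is None


print(url_has_no_host("file:///path/to/some/resource"))
print(url_has_no_host("http://example.com/script.js"))
```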

2. Let url-scheme, url-host, and url-port be the scheme, host, and port of url’s origin, respectively.

Note: If url doesn’t specify a port, then its origin’s port will be the default port for url’s scheme.

3. Let url-path-list be the path of url.

Explanation:
These points describe the naming convention we adopted for resource URLs. They also mean that a URL may omit the default port for its scheme.

Example:
These two URLs are equal in terms of CSP:
http://example.com = http://example.com:80 since HTTP uses port 80 by default.
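Steps 2 and 3 boil down to extracting the scheme, host, port and path list from the resource URL and filling in the default port when it is missing. A minimal sketch, assuming a small hand-written table of default ports and urlsplit as the parser; url_parts and DEFAULT_PORTS are hypothetical names.

```python
from urllib.parse import urlsplit

# Assumption: only a few schemes and their default ports are listed.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443, "ftp": 21}


def url_parts(url: str):
    """Steps 2-3: return (url-scheme, url-host, url-port, url-path-list)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname  # urllib already lowercases the host
    # No explicit port in the URL? Fall back to the scheme's default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    # Path segments without the leading "/", e.g. "/a/b" -> ["a", "b"].
    path_list = parts.path.split("/")[1:] if parts.path else []
    return scheme, host, port, path_list


print(url_parts("http://example.com") == url_parts("http://example.com:80"))
```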

4. If the source expression has a scheme-part that is not a case insensitive match for url-scheme, then return does not match.

Example:
For CSP, the URL http://example.com is the same as HTTP://example.com, but it doesn't match ftp://example.com because the url-scheme differs. The exception is that an insecure scheme in the policy also matches its secure variant. In CSP Level 3 we read:

The URL matching algorithm now treats insecure schemes and ports as matching their secure variants. That is, the source expression http://example.com:80 will match both http://example.com:80 and https://example.com:443.
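A sketch of step 4 that includes the CSP Level 3 relaxation quoted above. It assumes only the http to https upgrade needs modelling; every other pair of schemes must match exactly, ignoring case. The function name is hypothetical.

```python
def scheme_matches(scheme_part: str, url_scheme: str) -> bool:
    """Step 4, plus the CSP Level 3 rule that http in a policy also allows https."""
    a, b = scheme_part.lower(), url_scheme.lower()
    # Assumption: only the http -> https upgrade is modelled here.
    return a == b or (a == "http" and b == "https")


print(scheme_matches("http", "HTTP"), scheme_matches("http", "ftp"))
```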

5. If the source expression does not have a scheme, return does not match if any of the following are true:

- the scheme of the protected resource’s URL is a case insensitive match for HTTP, and url-scheme is not a case insensitive match for either HTTP or HTTPS

- the scheme of the protected resource’s URL is not a case insensitive match for HTTP, and url-scheme is not a case insensitive match for the scheme of the protected resource’s URL.

Explanation:
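In short, if the host-source expression has no scheme, the resource has to use the same scheme as the protected page (the page the policy applies to). The only exception is a page served over HTTP, which may also load resources over HTTPS.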

Example:
On a page served over HTTP, a policy with host-source of example.com applies to both http://example.com and https://example.com.
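Step 5 can be expressed as a small function. Here protected_scheme (a hypothetical parameter name) is the scheme of the page the policy protects, and url_scheme is the scheme of the resource being loaded.

```python
def schemeless_matches(protected_scheme: str, url_scheme: str) -> bool:
    """Step 5: the expression has no scheme-part, so the page's own scheme decides."""
    page, url = protected_scheme.lower(), url_scheme.lower()
    if page == "http":
        # A page served over HTTP may load resources over HTTP or HTTPS.
        return url in ("http", "https")
    # Any other page may only load resources over its own scheme.
    return url == page


print(schemeless_matches("http", "https"), schemeless_matches("https", "http"))
```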

6. If the first character of the source expression’s host-part is an U+002A ASTERISK character (*) and the remaining characters, including the leading U+002E FULL STOP character (.), are not a case insensitive match for the rightmost characters of url-host, then return does not match.

Explanation:
Here we have a typical wildcard expression for a domain name so we can include subdomains in our policy. It’s important to note that asterisk (*) has to be the first character in the host expression.

Example:
A policy with host-source of http://*.example.com matches both http://site.example.com and http://my.site.example.com, but not http://example.com itself. On the other hand, http://my.*.example.com matches nothing, and in particular not http://my.site.example.com.
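A sketch of the wildcard check in step 6: drop the leading asterisk and require the remaining characters (including the dot) to be a suffix of url-host, ignoring case. The function name is hypothetical, and str.lower stands in for the spec's ASCII case-insensitive comparison.

```python
def wildcard_host_matches(host_part: str, url_host: str) -> bool:
    """Step 6: host-part starts with "*", so the rest must end url-host."""
    if not host_part.startswith("*"):
        raise ValueError("step 6 only applies to host-parts starting with '*'")
    suffix = host_part[1:]  # "*.example.com" -> ".example.com"
    return url_host.lower().endswith(suffix.lower())


print(wildcard_host_matches("*.example.com", "my.site.example.com"))
print(wildcard_host_matches("*.example.com", "example.com"))
```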

7. If the first character of the source expression’s host-part is not an U+002A ASTERISK character (*) and url-host is not a case insensitive match for the source expression’s host-part, then return does not match.

Explanation:
If there's no asterisk as the first character of the host-part, then url-host has to match the host-part exactly (ignoring case).

Example:
The host-part example.com matches nothing but example.com. URLs with a url-host of www.example.com, example.com. or site.example.com do not match.
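Step 7 is a plain case-insensitive comparison of the whole host. A minimal sketch with a hypothetical function name (str.lower again approximates the ASCII case-insensitive match):

```python
def exact_host_matches(host_part: str, url_host: str) -> bool:
    """Step 7: without a leading "*", the hosts must be identical, ignoring case."""
    return host_part.lower() == url_host.lower()


for host in ["EXAMPLE.com", "www.example.com", "example.com.", "site.example.com"]:
    print(host, exact_host_matches("example.com", host))
```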

8. If the source expression’s host-part matches the IPv4address production from [RFC3986], and is not 127.0.0.1, or is an IPv6 address, return does not match.

Note: A future version of this specification may allow literal IPv6 and IPv4 addresses, depending on usage and demand. Given the weak security properties of IP addresses in relation to named hosts, however, authors are encouraged to prefer the latter whenever possible.

Explanation:
In practice, most modern browsers ignore this step and allow host-part to be an IP address.
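For completeness, here is step 8 as the spec words it, even though browsers generally don't enforce it. The sketch uses Python's ipaddress module and assumes IPv6 literals are written in square brackets, as in URLs; the function name is hypothetical.

```python
import ipaddress


def step8_rejects_ip(host_part: str) -> bool:
    """Step 8 as specified: reject IP literals, except the IPv4 address 127.0.0.1."""
    try:
        # Assumption: IPv6 literals appear in brackets, e.g. "[::1]".
        ip = ipaddress.ip_address(host_part.strip("[]"))
    except ValueError:
        return False  # a named host, so step 8 does not apply
    if ip.version == 4:
        return str(ip) != "127.0.0.1"
    return True  # any IPv6 address is rejected


print(step8_rejects_ip("192.168.1.1"), step8_rejects_ip("127.0.0.1"))
```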

9. If the source expression does not contain a port-part and url-port is not the default port for url-scheme, then return does not match.

Example:
Policy of http://example.com matches http://example.com:80 but not http://example.com:8080 as 8080 is not the default port for HTTP.
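Step 9 in code, assuming a small hypothetical table of default ports (only a few schemes are listed):

```python
# Assumption: a partial table of default ports.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443, "ftp": 21}


def default_port_ok(url_scheme: str, url_port: int) -> bool:
    """Step 9: with no port-part in the expression, only the default port is allowed."""
    return DEFAULT_PORTS.get(url_scheme.lower()) == url_port


print(default_port_ok("http", 80), default_port_ok("http", 8080))
```

With this rule an http URL on port 443 fails, because 443 is the default port for HTTPS, not HTTP.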

10. If the source expression does contain a port-part, then return does not match if both of the following are true:

- port-part does not contain an U+002A ASTERISK character (*)

- port-part does not represent the same number as url-port

Explanation:
The url-port has to match the port-part in the policy unless port-part is an asterisk, in which case url-port may have any value.

Example:
The host-source of http://example.com:8080 matches only port 8080, while the policy of http://example.com:* matches every valid port, i.e. 0–65535.
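A sketch of step 10, assuming port_part is passed without its leading colon (e.g. "8080" or "*") and url_port already has its default filled in, as described in step 2. The function name is hypothetical.

```python
def port_part_matches(port_part: str, url_port: int) -> bool:
    """Step 10: "*" allows any port; otherwise the numbers must be equal."""
    if port_part == "*":
        return True
    # Compare as numbers, so "08080" represents the same port as 8080.
    return int(port_part) == url_port


print(port_part_matches("8080", 8080), port_part_matches("*", 1234))
```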

11. If the source expression contains a non-empty path-part, and the URL is not the result of a redirect, then:

1. Let exact-match be true if the final character of path-part is not the U+002F SOLIDUS character (/), and false otherwise.
2. Let source-expression-path-list be the result of splitting path-part on the U+002F SOLIDUS character (/).
3. If source-expression-path-list's length is greater than url-path-list's length, return does not match.
4. For each entry in source-expression-path-list:
   4.1 Percent decode entry.
   4.2 Percent decode the first item in url-path-list.
   4.3 If entry is not an ASCII case-insensitive match for the first item in url-path-list, return does not match.
   4.4 Pop the first item in url-path-list off the list.
5. If exact-match is true, and url-path-list is not empty, return does not match.

Explanation:
This step seems to be the most complex but is actually quite simple. All we need to remember is that without a trailing slash (/), the URL's path has to be equal to path-part. If path-part ends with a slash, it works like a directory: every URL path below it, such as /a/b/c/something, matches the policy.

Example:
A policy with host-source of http://example.com/a/b/c matches the URL http://example.com/a/b/c but not http://example.com/a/b/c/. On the other hand, a policy of http://example.com/a/b/c/ does not match http://example.com/a/b/c but matches both http://example.com/a/b/c/ and http://example.com/a/b/c/d.
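The path rules can be sketched in Python as below. This is an interpretation under stated assumptions, not a transcription of the spec: both path-part and the URL path are split the same way on "/", the empty entry left by a trailing slash is treated as "anything below this directory", and the redirect condition is not modelled. With those assumptions it reproduces the examples above; path_matches is a hypothetical name.

```python
from urllib.parse import unquote


def path_matches(path_part: str, url_path: str) -> bool:
    """Step 11: compare a non-empty path-part with the URL's path."""
    # A trailing "/" turns the expression into a directory prefix.
    exact_match = not path_part.endswith("/")
    # Assumption: split both paths identically, e.g. "/a/b/c/" -> ["", "a", "b", "c", ""].
    expr_list = path_part.split("/")
    url_list = url_path.split("/")
    if len(expr_list) > len(url_list):
        return False
    if not exact_match:
        # Drop the empty entry after the trailing "/": anything below it is allowed.
        expr_list = expr_list[:-1]
    for entry in expr_list:
        # Percent decode both sides and compare ignoring case.
        if unquote(entry).lower() != unquote(url_list[0]).lower():
            return False
        url_list.pop(0)
    # An exact match must consume the whole URL path.
    if exact_match and url_list:
        return False
    return True


print(path_matches("/a/b/c/", "/a/b/c/d"), path_matches("/a/b/c/", "/a/b/c"))
```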

12. Otherwise, return does match.


Implementation in popular browsers

I tested the behavior of CSP host-source expressions by checking multiple <meta http-equiv="Content-Security-Policy" content="script-src 'unsafe-inline' ..."> and <script src="..."></script> pairs in the four most popular browsers. In most cases the behavior was correct, but there were three pairs where the browsers' behavior differed.


<meta http-equiv="Content-Security-Policy" content="script-src 'unsafe-inline' http://example.com">
<script src="http://example.com:443"></script>

Behavior to be expected: Block the script, as port 443 is not the default port for HTTP.

Chrome (version 69.0): Block
Firefox (62.0): Allow
Safari (12.0): Block
Edge (42.0): Block

<meta http-equiv="Content-Security-Policy" content="script-src 'unsafe-inline' http://example.com">
<script src="https://example.com:80"></script>

Behavior to be expected: Block the script, as port 80 is not the default port for HTTPS.

Chrome: Block
Firefox: Allow
Safari: Block
Edge: Block

<meta http-equiv="Content-Security-Policy" content="script-src 'unsafe-inline' http://example.com:443">
<script src="https://example.com"></script>

Behavior to be expected: Allow the script, because according to CSP Level 3 a host-source expression with the HTTP scheme also allows the more secure HTTPS. Both ports match as well, because 443 is the default port for HTTPS.

Chrome: Block
Firefox: Allow
Safari: Allow
Edge: Allow


As we can see, only the Safari and Edge (sic!) developers did their homework when it comes to implementing host-source. It turned out that Chrome and Firefox have issues in some places, so bug reports have been opened for them.

Conclusion

Content Security Policy is very powerful but complex at the same time, and deploying it on a website is not a piece of cake. There is a lot of room for mistakes, and engineers should check their configuration thoroughly before applying changes in production. Finally, the mistakes in the two most popular browsers show that interpreting the CSP standard can be troublesome, even for a vendor as large as Google.

intive Developers

At intive we’re building great digital products for our customers. Day by day. We want to share with you our way of doing things, the challenges we face, the tricks and shortcuts we discover. A little peek behind the scenes — welcome to our intive_dev blog!

Thanks to Mateusz Stahl

Written by Jaroslaw Porzucek, Security Engineer at Intive
