Intigriti code challenge — World of Contexts

Laur Telliskivi
Published in Axel Springer Tech
Oct 9, 2023

(Image: Intigriti Tweet)

I came across this interesting code challenge by Intigriti. If you don’t already follow them on X (formerly Twitter), you should! They create great security content. Back to the challenge. Here is the code snippet:

1. <?php
2. $username = $_GET["username"];
3. ?>
4.
5. <html>
6. <head>
7. <meta charset="utf-8">
8. <title>Dashboard</title>
9. </head>
10. <body>
11. <h1 id="welcomeMsg">Welcome back</h1>
12.
13. <script>
14. var username = "<?php echo "" . htmlspecialchars($username) . ""; ?>"
15. var msg = document.getElementById('welcomeMsg');
16. msg.innerHTML = `Welcome back, ${username}!`;
17. </script>
18. </body>
19. </html>

This PHP script retrieves a username from the URL query parameters using the $_GET superglobal and embeds it into an HTML page inside a JavaScript string. Before embedding, the username is escaped with PHP's htmlspecialchars() function to prevent potential security issues. The JavaScript code then updates the content of an HTML <h1> element with a welcome message that includes the username.

On the surface, everything looks fine and secure… or does it?

To understand whether the code is secure, we have to understand the data flow of the application:

  1. User Input: The PHP script running on the server retrieves the “username” from the URL query parameters using $_GET["username"]. This input comes from the user via the URL.
  2. Server-Side Processing: The retrieved “username” is stored in a variable which is then passed through the htmlspecialchars function.
  3. Output: The escaped “username” is reflected back to the user’s browser inside a JavaScript string literal, and the script then inserts it, as part of the welcome message, into the <h1> element via innerHTML.

If you have written PHP in any capacity, you are probably familiar with the htmlspecialchars() function. It converts special characters to their corresponding HTML entities and is commonly used when outputting data to HTML or XML documents, both to prevent cross-site scripting (XSS) attacks and to ensure that the data is displayed correctly without interfering with the structure of the document.

On the surface, this should be solid protection against cross-site scripting attacks. When auditing code for XSS, the key tasks are to identify the context, meaning the location within the server response where user-controllable data appears, and to determine whether the application performs any input validation or other processing on that data.

In the scope of our small application, there are 3 main contexts that we should investigate. I have listed them here with example attacks:

  • HTML body:
<div>UNTRUSTED DATA </div>

<div><script>alert(document.domain)</script></div> // Example Attack
<div><img src=1 onerror=alert(1)></div> // Example Attack
  • HTML tag attributes:
<div attr="UNTRUSTED DATA">

<div attr="" tabindex=0 autofocus onfocus=alert(document.domain) x=""> // Example Attack
<div attr=""><script>alert(document.domain)</script>"> // Example Attack
  • JavaScript:
<script>
...
var input = 'UNTRUSTED DATA';
...
</script>

// Example Attack
<script>
var input = '';alert(document.domain)//'
</script>
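The JavaScript-context attack can be reproduced with a short sketch, runnable in Node.js, that splices the attack string into the var input = '...' template and executes the result. The stubbed alert and document objects are hypothetical stand-ins for the browser globals, and victim.example is a made-up domain; the stub only records which value the injected call received.

```javascript
// Template for the JavaScript context: untrusted data inside a single-quoted string
const scriptTemplate = "var input = 'UNTRUSTED DATA';";
const attackInput = "';alert(document.domain)//";

// Splice the input in verbatim (no escaping). The arrow function stops
// String.prototype.replace from interpreting "$" sequences in the input.
const injectedScript = scriptTemplate.replace("UNTRUSTED DATA", () => attackInput);
console.log(injectedScript); // var input = '';alert(document.domain)//';

// Hypothetical stand-ins for the browser's alert() and document
const alertCalls = [];
const fakeAlert = (value) => alertCalls.push(value);
const fakeDocument = { domain: "victim.example" };

// Execute the generated script with the stubs in scope:
// the quote closes the string, alert(...) runs, and // comments out the rest
new Function("alert", "document", injectedScript)(fakeAlert, fakeDocument);
console.log(alertCalls); // [ 'victim.example' ]
```

The same splice with properly escaped input would leave the quote inside the string, and alertCalls would stay empty.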

Analyzing the data flow, we can see that the untrusted user input first lands in a JavaScript string context (line 14), where it becomes the value of the username variable. That value is then interpolated into a template literal and written into the HTML context through innerHTML (line 16).

2.   $username = $_GET["username"];
14. var username = "<?php echo "" . htmlspecialchars($username) . ""; ?>"
16. msg.innerHTML = `Welcome back, ${username}!`;

Is htmlspecialchars() a safe way to prepare user input for a JavaScript context? The function converts only a fixed set of predefined characters to HTML entities.

The predefined characters are:

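  • & (ampersand) becomes &amp;
  • " (double quote) becomes &quot;
  • ' (single quote) becomes &#039; (only when the ENT_QUOTES flag is set, which is part of the default flags since PHP 8.1)
  • < (less than) becomes &lt;
  • > (greater than) becomes &gt;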
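As a rough illustration, the mapping above can be approximated in JavaScript. This is a hypothetical sketch, not PHP's implementation: it assumes the PHP 8.1+ default flags (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401) and ignores the encoding and double_encode parameters. Every character outside these five passes through unchanged.

```javascript
// Hypothetical approximation of PHP's htmlspecialchars() character mapping,
// assuming the PHP 8.1+ default flags (ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401).
// The encoding and double_encode parameters are not modelled.
const HTML_SPECIAL_CHARS = {
  "&": "&amp;",
  '"': "&quot;",
  "'": "&#039;",
  "<": "&lt;",
  ">": "&gt;",
};

function htmlspecialcharsSketch(input) {
  // Replace each of the five special characters; everything else is kept as-is
  return String(input).replace(/[&"'<>]/g, (c) => HTML_SPECIAL_CHARS[c]);
}

console.log(htmlspecialcharsSketch("<img src=x onerror=alert(1)>"));
// &lt;img src=x onerror=alert(1)&gt;
console.log(htmlspecialcharsSketch(`"Tom" & 'Jerry'`));
// &quot;Tom&quot; &amp; &#039;Jerry&#039;
```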

So if we try a payload such as <img src=x onerror=alert(1)>, it is made safe for the HTML context:

htmlspecialchars("<img src=x onerror=alert(1)>") => "&lt;img src=x onerror=alert(1)&gt;"

Simply put, once < and > are encoded as &lt; and &gt;, the browser renders them as plain text instead of parsing a tag, so the payload does not execute.

Because the double quote is encoded too, it looks as if the protection in place also does a good job of preventing us from breaking out of the JavaScript string context.

Enter character escape sequence

A character escape sequence represents a character that cannot conveniently be written in its literal form. In JavaScript, the \xHH escape sequence represents the character whose Unicode code point is given by HH, a two-digit hexadecimal number, so it covers code points U+0000 to U+00FF[1].

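For example, the \x41 escape sequence represents the uppercase letter 'A', which has the Unicode code point U+0041 (the same value as in ASCII). In our context, the JavaScript parser decodes the escape when it reads the string literal on line 14, so the username variable, and therefore the template literal, will contain the letter “A”.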
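A quick check, runnable in Node.js or a browser console, shows that the decoding happens when the string literal is parsed, so no backslash is left by the time the template literal runs. Using new Function here is only a way to make the JavaScript parser read source text, similar to how the browser reads the script that PHP generated on line 14.

```javascript
// The escape is decoded by the parser: the string holds a single character "A"
const fromEscape = "\x41";
console.log(fromEscape, fromEscape.length); // A 1

// Same thing when the escape arrives as source text, as in the PHP-generated script
const generatedSource = 'return "\\x41";'; // the source text is: return "\x41";
const parsedValue = new Function(generatedSource)();
console.log(`Welcome back, ${parsedValue}!`); // Welcome back, A!
```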

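This looks very promising for bypassing htmlspecialchars(): we can write our special characters as \xHH escape sequences, which consist only of a backslash, the letter x, and hex digits, none of which htmlspecialchars() encodes: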

Conversion:
< => \x3c
> => \x3e

Payload:
\x3cimg src=x onerror=alert(1)\x3e

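Sending this payload in the username parameter successfully triggers the alert box. When the browser parses the string literal on line 14, it decodes \x3c and \x3e into < and >, so username holds <img src=x onerror=alert(1)>. Line 16 then passes that markup to innerHTML, which creates the image element and fires its onerror handler: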

(Image: alert box)
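The whole chain can be simulated outside the browser. This sketch is a model under assumptions, not the challenge server: htmlspecialcharsSketch is a hypothetical stand-in for PHP's function (PHP 8.1+ default flags), and new Function stands in for the browser parsing the generated script. It shows that the payload survives the server-side escaping unchanged and that the value reaching innerHTML contains a real <img> tag.

```javascript
// Hypothetical stand-in for PHP's htmlspecialchars() (PHP 8.1+ default flags)
const ENTITIES = { "&": "&amp;", '"': "&quot;", "'": "&#039;", "<": "&lt;", ">": "&gt;" };
const htmlspecialcharsSketch = (s) => String(s).replace(/[&"'<>]/g, (c) => ENTITIES[c]);

// 1. Attacker-controlled query parameter (the backslashes are literal characters)
const payload = "\\x3cimg src=x onerror=alert(1)\\x3e";

// 2. Server side: escape, then build line 14 of the page
const escaped = htmlspecialcharsSketch(payload);
const line14 = 'var username = "' + escaped + '"';

// 3. Browser side: the JavaScript parser evaluates the string literal
const username = new Function(line14 + "; return username;")();

// 4. Line 16: the markup that would be assigned to innerHTML
const html = `Welcome back, ${username}!`;

console.log(escaped === payload); // true: the escaping changed nothing
console.log(html); // Welcome back, <img src=x onerror=alert(1)>!
```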

Preventing XSS

Encoding should be applied immediately before user-controllable data is written into a webpage, and the choice of encoding depends on the specific context in which the data is written. Values inside a JavaScript string need a different kind of escaping than values inside an HTML context.

In a JavaScript string context, non-alphanumeric values should be Unicode-escaped:

< => \u003c
> => \u003e

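Here is a sample Unicode escaper in PHP that does just that (courtesy of PortSwigger’s Web Security Academy). It escapes the characters that matter inside a JavaScript string literal: quotes, the backslash, line terminators (\n, \r, U+2028, U+2029), &, < and >: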

<?php
// Escapes a string for safe use inside a JavaScript string literal
function jsEscape($str) {
    $output = '';
    $str = str_split($str); // split into single bytes
    for ($i = 0; $i < count($str); $i++) {
        $chrNum = ord($str[$i]);
        $chr = $str[$i];
        // U+2028 and U+2029 (UTF-8 bytes E2 80 A8 / E2 80 A9) are treated
        // as line terminators by JavaScript, so escape them as well
        if ($chrNum === 226) {
            if (isset($str[$i+1]) && ord($str[$i+1]) === 128) {
                if (isset($str[$i+2]) && ord($str[$i+2]) === 168) {
                    $output .= '\u2028';
                    $i += 2;
                    continue;
                }
                if (isset($str[$i+2]) && ord($str[$i+2]) === 169) {
                    $output .= '\u2029';
                    $i += 2;
                    continue;
                }
            }
        }
        switch ($chr) {
            case "'":
            case '"':
            case "\n":
            case "\r":
            case "&":
            case "\\":
            case "<":
            case ">":
                // Emit a \uXXXX escape using the character's value as four hex digits
                $output .= sprintf("\\u%04x", $chrNum);
                break;
            default:
                $output .= $str[$i];
                break;
        }
    }
    return $output;
}
?>
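For experimenting outside PHP, the same escaping rules can be sketched in JavaScript. jsEscapeSketch is a hypothetical port, not PortSwigger's code: because JavaScript strings are sequences of UTF-16 code units rather than bytes, it matches U+2028 and U+2029 directly instead of looking for their UTF-8 byte sequences.

```javascript
// Hypothetical JavaScript port of the jsEscape() rules above.
// Characters that get a \uXXXX escape inside a JavaScript string literal:
const JS_ESCAPED = new Set(["'", '"', "\n", "\r", "&", "\\", "<", ">", "\u2028", "\u2029"]);

function jsEscapeSketch(input) {
  let output = "";
  for (const ch of String(input)) {
    if (JS_ESCAPED.has(ch)) {
      // Same output format as sprintf("\\u%04x", ...) in the PHP version
      output += "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0");
    } else {
      output += ch;
    }
  }
  return output;
}

console.log(jsEscapeSketch("\\x3cimg src=x onerror=alert(1)\\x3e"));
// \u005cx3cimg src=x onerror=alert(1)\u005cx3e
console.log(jsEscapeSketch('"</script>'));
// \u0022\u003c/script\u003e
```

Escaping the backslash itself is what defeats the \x3c trick: the browser decodes \u005c back to a plain backslash, so the string keeps the literal text \x3c instead of a < character.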

With that, we can improve the code:

<?php
$username = isset($_GET["username"]) ? $_GET["username"] : "";
function jsEscape($str) {
// REDACTED FOR READABILITY
...
}
?>

<html>
<head>
<meta charset="utf-8">
<title>Dashboard</title>
</head>
<body>
<h1 id="welcomeMsg">Welcome back</h1>
<script>
// To escape user input in an HTML context
function htmlEncode(str){
return String(str).replace(/[^\w. ]/gi, function(c){
return '&#'+c.charCodeAt(0)+';';
});
}
var username = "<?php echo "" . jsEscape($username) . ""; ?>"
var msg = document.getElementById('welcomeMsg');
msg.innerHTML = htmlEncode(`Welcome back, ${username}!`);
</script>
</body>
</html>

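Note that we need two layers of encoding here, applied in the correct order, because the data passes through both a JavaScript context and an HTML context. On the server, jsEscape() Unicode-escapes the input so it cannot break out of the JavaScript string literal; this also neutralizes the \x3c trick, because the backslash itself is escaped. In the browser, the string literal decodes back to the original characters, so htmlEncode() then HTML-encodes the value just before it reaches innerHTML.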
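To check that the two layers work together, the sketch below replays payloads through a model of the fixed page. It rests on assumptions: jsEscapeSketch is a hypothetical compact JavaScript port of the PHP jsEscape() function, renderWelcome is a made-up helper, new Function stands in for the browser parsing the generated script, and htmlEncode is copied from the page above.

```javascript
// Hypothetical compact port of the PHP jsEscape() rules
const jsEscapeSketch = (s) =>
  String(s).replace(/['"\n\r&\\<>\u2028\u2029]/g,
    (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));

// htmlEncode() as defined in the fixed page
function htmlEncode(str) {
  return String(str).replace(/[^\w. ]/gi, function (c) {
    return "&#" + c.charCodeAt(0) + ";";
  });
}

// Hypothetical helper modelling the fixed page end to end
function renderWelcome(input) {
  // Server: build the script line the way the fixed PHP template does
  const generatedLine = 'var username = "' + jsEscapeSketch(input) + '"';
  // Browser: parse the string literal (stand-in for executing the <script>)
  const username = new Function(generatedLine + "; return username;")();
  // Client: HTML-encode right before the innerHTML sink
  return { username, html: htmlEncode(`Welcome back, ${username}!`) };
}

const viaEscapes = renderWelcome("\\x3cimg src=x onerror=alert(1)\\x3e");
console.log(viaEscapes.username); // \x3cimg src=x onerror=alert(1)\x3e

const viaTag = renderWelcome("<img src=x onerror=alert(1)>");
console.log(viaTag.username); // <img src=x onerror=alert(1)>
console.log(viaTag.html);
// Welcome back&#44; &#60;img src&#61;x onerror&#61;alert&#40;1&#41;&#62;&#33;
```

The second case shows why the HTML layer is still needed: Unicode escaping keeps the data inside the string literal, but the decoded value can still contain <, which only htmlEncode() keeps away from innerHTML.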

The main takeaway for developers here is to always understand the data flow of the application and the context in which untrusted user input is written.

If you enjoyed this content, follow me on X (formerly Twitter) for more security content!


Former requirements engineer and musician. Currently Senior Security Engineer at Axel Springer. Follow me on Twitter: @tell1skivi