XML External Entity (OWASP Top 10)
An XML External Entity (XXE) attack is a vulnerability that abuses features of XML parsers and XML data. It allows an attacker to interact with any backend or external system that the application itself can access, and it can allow the attacker to read files on that system. Attackers can also use XXE to cause a Denial of Service (DoS) or to perform Server-Side Request Forgery (SSRF), inducing the web application to make requests to other applications. XXE may even enable port scanning and, in some configurations, lead to remote code execution.
There are two types of XXE attacks: in-band and out-of-band (OOB-XXE).
1) In an in-band XXE attack, the attacker receives an immediate response to the XXE payload. (A payload, in simple terms, is the script or crafted input an attacker uses to interact with a compromised system; using payloads, attackers can transfer data to a victim system.)
2) In out-of-band XXE attacks (also called blind XXE), there is no immediate response from the web application, and the attacker has to redirect the output of their XXE payload to some other file or to their own server.
XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is used for storing and transporting data.
Why do we use XML?
1. XML is platform-independent and programming-language-independent, so it can be used on any system and keeps working when the underlying technology changes.
2. The data stored and transported using XML can be changed at any point in time without affecting the data presentation.
3. XML allows validation using a DTD or an XML Schema. XML validation is the process of checking that a document is both well-formed (free from syntax errors) and “valid”, meaning it follows a defined structure.
4. XML simplifies data sharing between various systems because of its platform-independent nature. XML data doesn’t require any conversion when transferred between different systems.
An XML document must have exactly one root (parent) element, which can contain child elements. XML is also case-sensitive, so <note> and <Note> are different elements.
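To make the well-formedness rules above concrete, here is a minimal sketch using Python’s standard-library xml.etree.ElementTree (our choice of parser; the article does not name one). The helper is_well_formed is hypothetical. The parser rejects a document with two top-level elements and a tag closed with a different case.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text parses as one well-formed document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<note><to>Alice</to></note>"))     # single root element: accepted
    print(is_well_formed("<to>Alice</to><from>Bob</from>"))  # two root elements: rejected
    print(is_well_formed("<note><to>Alice</To></note>"))     # <to> closed by </To>: rejected
```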
DTD (Document Type Definition)
A DTD defines the structure and the legal elements and attributes of an XML document.
Let us try to understand this with the help of an example. Say we have a file named note.dtd with the following content:
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
Now we can use this DTD to validate an XML document and make sure that it conforms to the rules of that DTD.
Example: below is an XML document that uses note.dtd:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note SYSTEM "note.dtd">
So now let’s understand how that DTD validates the XML. Here’s what each of the terms used in note.dtd means:
- !DOCTYPE note: Defines the root element of the document, named note
- !ELEMENT note: Defines that the note element must contain the elements to, from, heading and body, in that order
- !ELEMENT to: Defines the to element to be of type "#PCDATA"
- !ELEMENT from: Defines the from element to be of type "#PCDATA"
- !ELEMENT heading: Defines the heading element to be of type "#PCDATA"
- !ELEMENT body: Defines the body element to be of type "#PCDATA"
NOTE: #PCDATA means parsed character data.
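Python’s ElementTree does not validate against a DTD, so the following is a hand-rolled sketch (an assumption of ours, not a real validator) that checks the element rules of note.dtd described above: the root is note, its children are exactly to, from, heading and body in that order, and each child holds text only. The function name follows_note_dtd and the sample contents are made up. A real DTD-validating parser would also enforce attribute rules, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the element rules in note.dtd (attributes are not checked).
NOTE_CHILDREN = ("to", "from", "heading", "body")


def _blank(text):
    """True for missing or whitespace-only text between elements."""
    return text is None or not text.strip()


def follows_note_dtd(xml_text: str) -> bool:
    """Hypothetical helper mimicking the note.dtd content model."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid either
    # !DOCTYPE note: the root element must be <note>
    if root.tag != "note":
        return False
    # !ELEMENT note (to,from,heading,body): exactly these children, in this order,
    # with nothing but whitespace around them
    if tuple(child.tag for child in root) != NOTE_CHILDREN:
        return False
    if not _blank(root.text) or not all(_blank(child.tail) for child in root):
        return False
    # !ELEMENT to/from/heading/body (#PCDATA): text only, no nested elements
    return all(len(child) == 0 for child in root)


if __name__ == "__main__":
    good = ("<note><to>Alice</to><from>Bob</from>"
            "<heading>Reminder</heading><body>Meet at 5</body></note>")
    swapped = good.replace("<to>Alice</to><from>Bob</from>", "<from>Bob</from><to>Alice</to>")
    print(follows_note_dtd(good), follows_note_dtd(swapped))
```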
Entity references: an entity is an alternative name for a series of characters, and you refer to it in the &name; format, where name is the name of the entity. XML has some predefined entities, and you can also declare your own entities in a DTD (Document Type Definition).
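XML predefines five entities (&lt;, &gt;, &amp;, &apos; and &quot;) that work without any DTD declaration. A minimal sketch, again assuming Python’s ElementTree, shows the parser replacing each reference with its character; the helper name expand is hypothetical.

```python
import xml.etree.ElementTree as ET

# The five entities predefined by XML; no DTD declaration is needed for them.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def expand(name: str) -> str:
    """Parse a tiny document containing &name; and return the replacement text."""
    return ET.fromstring(f"<e>&{name};</e>").text


if __name__ == "__main__":
    for name in PREDEFINED:
        print(f"&{name}; -> {expand(name)}")
```

Any other entity name, such as &nbsp;, is undefined without a DTD declaration, and the parser reports an error instead of expanding it.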
The first payload we’ll see is very simple.
<!DOCTYPE replace [<!ENTITY name "feast"> ]>
As we can see, we are defining an ENTITY called name and assigning it the value feast. Later, we use that ENTITY in our code.
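The substitution performed by this payload can be observed with Python’s ElementTree, which expands internal entities declared in the document’s own DTD. The DOCTYPE below is taken from the payload; the <userInfo> and <lastName> elements that use the entity are hypothetical, as is the helper field.

```python
import xml.etree.ElementTree as ET

# DOCTYPE copied from the payload; the elements after it are hypothetical.
PAYLOAD = """<!DOCTYPE replace [<!ENTITY name "feast"> ]>
<userInfo>
  <lastName>&name;</lastName>
</userInfo>"""


def field(xml_text: str, tag: str) -> str:
    """Return the text of the first <tag> child of the root after entity expansion."""
    root = ET.fromstring(xml_text)
    return root.find(tag).text


if __name__ == "__main__":
    # The parser replaces &name; with the declared value before we see the text.
    print(field(PAYLOAD, "lastName"))
```

The application never sees the string &name;, only its replacement, which is why attacker-controlled entity declarations can change what the application processes.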
We can also use XXE to read a file from the system by defining an ENTITY that uses the SYSTEM keyword:
<!DOCTYPE root [<!ENTITY read SYSTEM 'file:///etc/passwd'>]>
Here again, we are defining an ENTITY with the name read, but the difference is that we set its value using the `SYSTEM` keyword followed by the path of the file.
If we use this payload and reference &read; inside an element whose value the application reflects back, a website vulnerable to XXE would normally display the contents of /etc/passwd.
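Whether such a payload works depends on how the parser is configured. The sketch below assumes Python’s standard-library SAX parser: since Python 3.7.1, xml.sax does not resolve external general entities by default, and turning on the feature_external_ges feature recreates the vulnerable behavior. To stay harmless, it points the entity at a temporary file the script creates instead of /etc/passwd; the names TextCollector, build_payload and parse_text are hypothetical.

```python
import os
import tempfile
import xml.sax
from io import BytesIO
from pathlib import Path
from xml.sax.handler import feature_external_ges


class TextCollector(xml.sax.ContentHandler):
    """Collects all character data the parser reports."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def characters(self, content):
        self.parts.append(content)


def build_payload(path: str) -> bytes:
    # Same shape as the payload above, but pointing at a file we control.
    uri = Path(path).resolve().as_uri()  # a file:/// URL, like file:///etc/passwd
    return (f"<!DOCTYPE root [<!ENTITY read SYSTEM '{uri}'>]>"
            "<root>&read;</root>").encode("utf-8")


def parse_text(xml_bytes: bytes, resolve_external: bool) -> str:
    parser = xml.sax.make_parser()
    # Off by default since Python 3.7.1; enabling it makes the parser fetch SYSTEM entities.
    parser.setFeature(feature_external_ges, resolve_external)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(BytesIO(xml_bytes))
    return "".join(handler.parts)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("top-secret-value\n")
    payload = build_payload(tmp.name)
    print(repr(parse_text(payload, resolve_external=True)))   # file contents leak into the text
    print(repr(parse_text(payload, resolve_external=False)))  # entity is not fetched
    os.remove(tmp.name)
```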