Injection Attacks
Introduction to PDF Generation Vulnerabilities
Many web applications offer PDF generation, for example for invoices or reports. Most of these PDFs contain dynamic user input. In the next few sections, we will discuss misconfigurations and bugs in PDF generation libraries that result in security vulnerabilities when HTML is injected into their input.
PDF Generation
The Portable Document Format (PDF) is a file format designed to provide platform-independent document presentation. Since PDF files are used in many contexts, PDF generation is a commonly implemented feature in web applications. To enable it, web applications rely on PDF generation libraries or plugins.
However, misconfigurations and outdated versions of these external libraries open the door to vulnerabilities, mainly caused by malicious user input that is not properly sanitized.
PDF generation libraries commonly used in web applications include wkhtmltopdf and dompdf, both of which appear later in this section.
Since web applications need to be able to design the layout of the resulting PDF files, these libraries accept HTML code as input and use it to generate the final PDF file. This allows the web application to control the design of the PDF file via CSS in the HTML code. The libraries work by parsing the HTML code, rendering it, and creating a PDF.
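To make the risk concrete, the following is a minimal sketch of how an application might assemble the HTML that it hands to a PDF generation library. The template, the `customer` field, and both function names are hypothetical and not taken from any particular application. The first function inserts user input verbatim, so any markup in it becomes part of the document the library parses; the second HTML-encodes the input first.

```python
import html
from string import Template

# Hypothetical invoice template; a real one would also carry CSS.
INVOICE_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    "<html>\n<body>\n"
    "<h1>Invoice</h1>\n"
    "<p>Customer: $customer</p>\n"
    "</body>\n</html>\n"
)


def build_invoice_unsafe(customer: str) -> str:
    """Insert user input verbatim; markup in it reaches the PDF library."""
    return INVOICE_TEMPLATE.substitute(customer=customer)


def build_invoice_escaped(customer: str) -> str:
    """HTML-encode user input so it is rendered as plain text."""
    return INVOICE_TEMPLATE.substitute(customer=html.escape(customer))


if __name__ == "__main__":
    payload = "<b>injected</b>"
    print(build_invoice_unsafe(payload))   # contains a real <b> element
    print(build_invoice_escaped(payload))  # contains &lt;b&gt; instead
```

With the unescaped variant, the library cannot distinguish the application's own markup from markup supplied by the user, which is the root of the injection issues covered in this module.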
Example: wkhtmltopdf
As an example of how PDF generators work, we will look at wkhtmltopdf, for which a precompiled binary is available from the project's download page. A bold security notice at the top of that page hints at the vulnerabilities we are about to discuss in the upcoming sections:
Do not use wkhtmltopdf with any untrusted HTML – be sure to sanitize any user-supplied HTML/JS, otherwise it can lead to complete takeover of the server it is running on!
After downloading wkhtmltopdf, we can install it using the following command on Debian-based Linux distributions:
[!bash!]$ sudo dpkg -i wkhtmltox_0.12.6.1-2.bullseye_amd64.deb
Running wkhtmltopdf with the -h option will display the tool's help information:
[!bash!]$ wkhtmltopdf -h
Name:
wkhtmltopdf 0.12.6.1 (with patched qt)
Synopsis:
wkhtmltopdf [GLOBAL OPTION]... [OBJECT]... <output file>
<SNIP>
When providing a URL to wkhtmltopdf, it will automatically fetch the website and convert it to a PDF:
[!bash!]$ wkhtmltopdf https://academy.hackthebox.com/ htb.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
Looking at the resulting PDF, we can recognize the HackTheBox Academy website, although it has been resized to fit on PDF pages:

Furthermore, we can provide the tool with a local HTML file to simulate more closely what a PDF generation library in a web application does. As an example, let us use the following HTML file:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h1>Hello World!</h1>
<p>This is some text.</p>
</body>
</html>
We can now run wkhtmltopdf on the HTML file to produce a PDF equivalent:
[!bash!]$ wkhtmltopdf ./index.html test.pdf
Loading pages (1/6)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
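A web application would typically issue the same command from code rather than from a terminal. The sketch below assumes the application shells out to the `wkhtmltopdf` binary with the two positional arguments used above; the helper names `build_command` and `render_pdf` are hypothetical, and real applications often use a wrapper library instead.

```python
import shutil
import subprocess
from typing import List


def build_command(source: str, output: str) -> List[str]:
    """Return the argument list for `wkhtmltopdf <source> <output>`."""
    return ["wkhtmltopdf", source, output]


def render_pdf(source: str, output: str) -> bool:
    """Run wkhtmltopdf if it is installed; return True on success."""
    if shutil.which("wkhtmltopdf") is None:
        return False  # binary not found on PATH
    # Passing a list (no shell) keeps the arguments from being shell-parsed.
    result = subprocess.run(build_command(source, output), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    print(build_command("./index.html", "test.pdf"))
    print("rendered:", render_pdf("./index.html", "test.pdf"))
```

Note that the `source` argument may be a local file or a URL, as shown in the two runs above, so whatever HTML ends up in that input is processed by the tool.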

To simulate a real-world example of how a web application might use PDF generation, let us consider an online shop that provides PDF invoices to customers after they complete an order. This can easily be achieved using a PDF generation library. For example, we can download an open-source invoice HTML template and run wkhtmltopdf on it to generate a PDF invoice from the HTML code with its custom CSS.

Analysis of PDF Files
We need to determine which PDF generation library a web application utilizes to target specific vulnerabilities and misconfigurations. Fortunately, most of these libraries add information in the metadata of the generated PDF that helps us identify the library. Thus, we simply need to get our hands on a PDF generated by the web application for analysis. To display the metadata of a PDF file, we can use the tool exiftool, which can be installed like so:
[!bash!]$ sudo apt install libimage-exiftool-perl
Running exiftool with the -h option will display the tool's help information:
[!bash!]$ exiftool -h
Syntax: exiftool [OPTIONS] FILE
Consult the exiftool documentation for a full list of options
When we run exiftool on a generated PDF file, the creator and producer metadata fields give us more information about the PDF's generation library and its specific version, in this case wkhtmltopdf 0.12.6.1 and Qt 4.8.7, respectively:
[!bash!]$ exiftool invoice.pdf
ExifTool Version Number : 12.16
File Name : invoice.pdf
Directory : .
File Size : 18 KiB
File Modification Date/Time : 2023:03:13 20:42:24+01:00
File Access Date/Time : 2023:03:13 20:42:24+01:00
File Inode Change Date/Time : 2023:03:13 20:42:24+01:00
File Permissions : rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.4
Linearized : No
Title : A simple, clean, and responsive HTML invoice template
Creator : wkhtmltopdf 0.12.6.1
Producer : Qt 4.8.7
Create Date : 2023:03:13 20:42:24+01:00
Page Count : 1
This allows us to search for vulnerabilities in that specific version of the PDF generation library. Alternatively, the tool pdfinfo provides the same information:
[!bash!]$ pdfinfo invoice.pdf
Title: A simple, clean, and responsive HTML invoice template
Creator: wkhtmltopdf 0.12.6.1
Producer: Qt 4.8.7
CreationDate: Mon Mar 13 20:42:24 2023 CET
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 1
Encrypted: no
Page size: 595 x 842 pts (A4)
Page rot: 0
File size: 18488 bytes
Optimized: no
PDF version: 1.4
Here is another example output from exiftool on a PDF generated by a different library (dompdf):
[!bash!]$ exiftool file.pdf
ExifTool Version Number : 12.16
File Name : file.pdf
Directory : .
File Size : 1071 bytes
File Modification Date/Time : 2023:03:13 20:45:10+01:00
File Access Date/Time : 2023:03:13 20:45:10+01:00
File Inode Change Date/Time : 2023:03:13 20:45:14+01:00
File Permissions : rw-r--r--
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
PDF Version : 1.7
Linearized : No
Page Count : 1
Producer : dompdf 2.0.3 + CPDF
Create Date : 2023:03:13 12:45:05-07:00
Modify Date : 2023:03:13 12:45:05-07:00
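When many PDFs need to be checked, the identification step can be scripted. The following is a minimal sketch that parses the `Key : Value` lines printed by exiftool or pdfinfo and returns the Creator and Producer fields. The function names are hypothetical, and the sample input is copied from the outputs shown above.

```python
from typing import Dict, Optional, Tuple


def parse_metadata(tool_output: str) -> Dict[str, str]:
    """Parse `Key : Value` lines as printed by exiftool or pdfinfo."""
    fields: Dict[str, str] = {}
    for line in tool_output.splitlines():
        # Split on the first colon only, so date values keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def generator_fingerprint(
    fields: Dict[str, str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the (Creator, Producer) pair, with None for missing fields."""
    return fields.get("Creator"), fields.get("Producer")


if __name__ == "__main__":
    sample = (
        "Title : A simple, clean, and responsive HTML invoice template\n"
        "Creator : wkhtmltopdf 0.12.6.1\n"
        "Producer : Qt 4.8.7\n"
        "Create Date : 2023:03:13 20:42:24+01:00\n"
    )
    print(generator_fingerprint(parse_metadata(sample)))
```

As the dompdf output shows, some libraries set only the Producer field, which is why missing fields are returned as `None` instead of raising an error.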