How to filter a zip file when extracting

Handling file uploads in a web application can raise serious security concerns. Uploads can become a gateway for hackers to exploit your application by placing malicious files on your server. However, with the right know-how, these problems can be avoided.

Recently, in the framework I’m developing, I decided to allow zip files to be uploaded as part of the media management process. This raised a serious concern: the contents of a zip file are unknown, which potentially makes it easy for a hacker to upload a script inside a zip, have it extracted on the server and cause all sorts of damage.

After a quick search I couldn’t find anything that would allow me to filter a zip extract either by extension(s) or by a more advanced regular expression. This led me to create an extension of the ZipArchive class in PHP.

Filtering the Zip Extract Process

I’ve created a simple class that extends ZipArchive (available from PHP 5.2) and builds on its existing extractTo() method. By calling filteredExtractTo($directory, $filters) you can specify an array of filters; a file is only extracted if its name matches at least one of them.

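Each filter can be either a file extension or a regular expression. A filter that starts with a letter or digit is treated as an extension; anything else (such as a pattern starting with /) is treated as a regular expression. I’ve optimised the filtering process for extension matches, as I assume these are needed more often than regular expressions, and speed matters because zip files can contain a lot of files.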
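To make the two filter types concrete, here is a minimal sketch of how a mixed filter list could be split into extensions and patterns. The function name splitFilters is hypothetical and not part of the class in this post; it only assumes the convention used here, that regular expressions begin with a delimiter such as / while extensions begin with a letter or digit.

```php
<?php
// Hypothetical helper, for illustration only: splits a mixed filter list
// into lower-cased extensions and regular expressions.
function splitFilters(array $filters)
{
    $extensions = array();
    $patterns = array();
    foreach ($filters as $filter) {
        if ($filter === '') {
            continue; // ignore empty filters
        }
        if (ctype_alnum($filter[0])) {
            // starts with a letter or digit: treat as an extension
            $extensions[] = strtolower($filter);
        } else {
            // starts with a delimiter: treat as a regular expression
            $patterns[] = $filter;
        }
    }
    return array($extensions, $patterns);
}

list($extensions, $patterns) = splitFilters(array('JPG', 'png', '/untitled/i'));
```

With this input, the two extensions end up lower-cased in $extensions and the single pattern is left untouched in $patterns.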

Let me show some code:

class Zip extends ZipArchive
{
    const CHMOD = 0755;

    /**
     * Extract the zip file, only extracting the files that match at least one of the supplied filters
     *
     * @param string $directory where to extract the zip to
     * @param array $filters either extensions, or regular expressions to match against each file name.
     * @return boolean true if every matching file was extracted
     */
    public function filteredExtractTo($directory, array $filters = array())
    {
        // no filters supplied: fall back to a normal, unfiltered extract
        if(count($filters) === 0) {
            return $this->extractTo($directory);
        }

        $this->createDir($directory);

        $success = true;
        $copySource = 'zip://'.$this->filename.'#';
        for($i = 0; $i < $this->numFiles; $i++) {
            $entry = $this->getNameIndex($i);

            // skip directory entries, and any entry whose path tries to climb out of $directory
            $segments = explode('/', str_replace('\\', '/', $entry));
            if(substr($entry, -1) === '/' || in_array('..', $segments, true)) {
                continue;
            }

            $filename = basename($entry);
            if($this->matchFileToFilter($filename, $filters)) {
                $base = dirname($entry);
                $newPath = $directory.DIRECTORY_SEPARATOR.$base.DIRECTORY_SEPARATOR;
                $this->createDir($newPath);

                // extract file
                if(!copy($copySource.$entry, $newPath.$filename)) {
                    $success = false;
                }
            }
        }
        return $success;
    }

    protected function createDir($path)
    {
        if(!is_dir($path)) {
            if(!mkdir($path, self::CHMOD, true)) {
                throw new Exception('unable to create path '.$path);
            }
        }
    }

    /**
     * Match the file name to one of the filters
     * @param string $filename
     * @param array $filters
     * @return boolean true if the file name matches at least one filter
     */
    protected function matchFileToFilter($filename, array $filters)
    {
        $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
        if(in_array($ext, array_map('strtolower', $filters), true)) {
            // one of the filters is an extension, and it matches file extension
            return true;
        }

        foreach($filters as $filter) {
            // a filter that does not start with a letter or digit is treated as a regular expression
            if($filter !== '' && !ctype_alnum($filter[0]) && preg_match($filter, $filename)) {
                return true;
            }
        }
        return false;
    }
}

The first method, filteredExtractTo, is the one you’ll use to extract the zip contents. It iterates through each file in the archive and checks it against matchFileToFilter. If the file matches, it is copied to the file system, keeping the zip’s directory structure intact.
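The copy step relies on PHP’s zip:// stream wrapper, which addresses a single entry as zip://archive-path#entry-name. The following is a small self-contained sketch of that one step. The archive, the entry name images/photo.jpg and the target path are all made up for the demonstration.

```php
<?php
// Build a throwaway archive so the example is self-contained.
$zipPath = tempnam(sys_get_temp_dir(), 'zip');
$zip = new ZipArchive;
$zip->open($zipPath, ZipArchive::CREATE | ZipArchive::OVERWRITE);
$zip->addFromString('images/photo.jpg', 'fake image data');
$zip->close();

// Copy a single entry out of the archive via the zip:// stream wrapper.
$target = sys_get_temp_dir().DIRECTORY_SEPARATOR.'photo-'.uniqid().'.jpg';
$copied = copy('zip://'.$zipPath.'#images/photo.jpg', $target);
$contents = $copied ? file_get_contents($target) : null;

// Clean up the temporary files.
if ($copied) {
    unlink($target);
}
unlink($zipPath);
```

Because each entry is copied individually, the class can decide file by file whether anything is written to disk at all.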

The second method, createDir, is a simple utility for creating a directory structure from the provided path. It really belongs in some sort of directory utility class, but for the purposes of this example I’ve included it as a protected method.
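If you did want to move it out of the class, a standalone version might look something like the sketch below. The name ensureDirectory and the default mode are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical standalone helper; the name ensureDirectory is an assumption.
function ensureDirectory($path, $mode = 0755)
{
    // The second is_dir() check covers the case where another process
    // created the directory between our first check and the mkdir() call.
    if (!is_dir($path) && !mkdir($path, $mode, true) && !is_dir($path)) {
        throw new RuntimeException('unable to create path '.$path);
    }
    return $path;
}

// Create a nested directory under the system temp directory.
$dir = ensureDirectory(
    sys_get_temp_dir().DIRECTORY_SEPARATOR.'zip-demo-'.uniqid().DIRECTORY_SEPARATOR.'nested'
);
```

Calling it a second time with the same path is harmless, since an existing directory is simply returned.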

Finally, matchFileToFilter handles the filtering logic. I’ve written it as a separate method so you can change this process easily. It looks at the array of filters and applies an in_array call for extension filters, then iterates through the regexp filters looking for a match.

It returns a boolean value that determines whether the file should be extracted.
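The filters only look at the file name, so it may also be worth checking the entry’s path inside the archive before extracting, since an entry such as ../evil.jpg would otherwise be written outside the target directory. Here is a minimal sketch of such a check. The function isSafeEntryPath is hypothetical, and the rules it applies (no .. segment, no absolute path, no drive letter) are my assumption of a reasonable baseline rather than a complete defence.

```php
<?php
// Hypothetical helper, for illustration only.
// Returns true when an entry path is relative and has no ".." segment.
function isSafeEntryPath($entry)
{
    // Treat backslashes as separators so Windows-style paths are covered too.
    $normalised = str_replace('\\', '/', $entry);

    // Reject empty names, absolute paths and drive letters such as C:
    if ($normalised === '' || $normalised[0] === '/' || preg_match('/^[a-z]:/i', $normalised)) {
        return false;
    }

    // Reject any path that contains a ".." segment.
    return !in_array('..', explode('/', $normalised), true);
}

$safe = isSafeEntryPath('images/photo.jpg');
$unsafe = isSafeEntryPath('images/../../evil.jpg');
```

A check like this could be run on each entry name before it is passed to the filter.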

Using the Zip class is as simple as using the existing ZipArchive class.

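$unzipPath = dirname(__FILE__).'/extract';
$zip = new Zip;
// open() returns true on success and an integer error code on failure, so compare strictly
if ($zip->open('sample.zip') === true) {
    $zip->filteredExtractTo($unzipPath, array('jpg', 'png', '/untitled/i'));
    $zip->close();
}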

As you can see, I’ve restricted the extract to jpg and png files, plus any file whose name contains the word untitled (matched case-insensitively).
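One thing to watch with regular expression filters is that a loose pattern like /untitled/i also matches names such as untitled.php. If that is not what you want, anchoring the pattern and including the allowed extensions is one option. The stricter pattern below is only a sketch of the idea, and the sample file names are made up.

```php
<?php
$loose  = '/untitled/i';
// Assumed stricter variant: name must start with "untitled" and end in an image extension.
$strict = '/^untitled[\w-]*\.(jpe?g|png)$/i';

$names = array('Untitled-1.png', 'untitled.php', 'untitled.jpg.php');
$results = array();
foreach ($names as $name) {
    $results[$name] = array(
        'loose'  => preg_match($loose, $name) === 1,
        'strict' => preg_match($strict, $name) === 1,
    );
}
```

Here the loose pattern accepts all three names, while the strict one accepts only Untitled-1.png.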
