Analysis Of Unusual ZIP Files

Intrigued by a blog post from SpiderLabs on a special ZIP file they found, I took a closer look myself.

That special ZIP file is a concatenation of 2 ZIP files, the first containing a single PNG file (with extension .jpg) and the second a single EXE file (malware). Various archive managers and security products handle this file differently, some “seeing” only the PNG file, others only the EXE file.

My zipdump.py tool reports the following for this special ZIP file:

zipdump.py is essentially a wrapper for Python’s zipfile module, and this module parses ZIP files “starting from the end of the file”. That’s why it finds the second ZIP file (appended to the first ZIP file), containing the malicious EXE file.
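The "starting from the end" behaviour can be reproduced with a small, self-contained sketch. It does not use the actual sample: the two archives and their file names (decoy.jpg, payload.exe) are hypothetical stand-ins built in memory, and make_zip is a helper introduced only for this illustration.

```python
import io
import zipfile


def make_zip(name: str, content: bytes) -> bytes:
    """Build a small single-entry ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical stand-ins for the decoy image and the payload.
first = make_zip("decoy.jpg", b"not really an image")
second = make_zip("payload.exe", b"not really an executable")

# Concatenate both archives, as in the sample discussed above.
with zipfile.ZipFile(io.BytesIO(first + second)) as archive:
    names = archive.namelist()

print(names)  # only the entry of the appended (second) archive is listed
```

Because zipfile locates the end of central directory record at the end of the data, the first archive is treated as unrelated prepended bytes and its content is never listed.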

To help with the analysis of such special/malformed ZIP files, I added an option (-f, --find) to zipdump. This option scans the content of the provided file looking for ZIP records. ZIP records start with the ASCII string PK followed by 2 bytes that indicate the record type (both byte values are less than 16).
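The scanning idea can be sketched in a few lines of Python. This is only an illustration of the rule described above (PK followed by 2 bytes with values below 16), not zipdump's actual code; the function name find_pk_records and the sample file hello.txt are assumptions made for this example.

```python
import io
import zipfile
from typing import Iterator, Tuple


def find_pk_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (offset, label) for each candidate PK record in data."""
    position = data.find(b"PK")
    while position != -1:
        type_bytes = data[position + 2:position + 4]
        # A candidate record needs 2 type bytes, both with a value below 16.
        if len(type_bytes) == 2 and all(value < 16 for value in type_bytes):
            yield position, "PK%02x%02x" % (type_bytes[0], type_bytes[1])
        position = data.find(b"PK", position + 1)


# Build a sample ZIP with a single stored text file (fixed timestamp).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    info = zipfile.ZipInfo("hello.txt", date_time=(2020, 1, 6, 0, 0, 0))
    archive.writestr(info, b"hello world")

records = list(find_pk_records(buffer.getvalue()))
for offset, label in records:
    print(f"{offset:#06x} {label}")
```

A scan like this is purely signature based: compressed data that happens to contain PK followed by two small byte values would be reported as a record too, so the hits are candidates that still need to be validated.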

Here I use option “-f list” to list all PK records found in a ZIP file containing a single text file:

This is how a normal ZIP file containing a single file looks on the inside.

The file starts with a “local file header”, a PK record that starts with ASCII characters PK followed by bytes 0x03 and 0x04 (that’s 50 4B 03 04 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0304. This header is followed by the contained file (usually compressed).

Then there is a “central directory header”, a PK record that starts with ASCII characters PK followed by bytes 0x01 and 0x02 (that’s 50 4B 01 02 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0102. This header contains an offset pointing to the corresponding PK0304 record.

And at the end of the ZIP file, there is an “end of central directory” record, a PK record that starts with ASCII characters PK followed by bytes 0x05 and 0x06 (that’s 50 4B 05 06 in hexadecimal). In zipdump’s report, such a PK record is identified with PK0506. This record contains an offset pointing to the first PK0102 record.
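The chain of offsets between these three records (PK0506 points to PK0102, which points to PK0304) can be followed by hand with Python's struct module. This is a minimal sketch on a clean sample generated in memory; the field positions (16 bytes into PK0506, 42 bytes into PK0102) come from the ZIP format specification, and the sample file name is an assumption.

```python
import io
import struct
import zipfile

# Build a normal ZIP file containing a single text file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", b"hello world")
data = buffer.getvalue()

# Locate the last PK0506 record (a plain search suffices for this clean sample).
eocd_offset = data.rfind(b"PK\x05\x06")

# PK0506: the 32-bit offset of the central directory sits 16 bytes into the record.
(cd_offset,) = struct.unpack_from("<I", data, eocd_offset + 16)

# PK0102: the 32-bit offset of the local file header sits 42 bytes into the record.
(local_offset,) = struct.unpack_from("<I", data, cd_offset + 42)

print(eocd_offset, cd_offset, local_offset)
print(data[cd_offset:cd_offset + 4], data[local_offset:local_offset + 4])
```

In a well-formed file, each offset lands exactly on the signature of the record it refers to; an offset that lands elsewhere is a sign of a malformed or modified archive.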

A ZIP file containing 2 files looks like this, when scanned with zipdump’s option -f list:

It starts with 2 PK0304 records (one for each contained file), followed by 2 PK0102 records and 1 PK0506 record.

Armed with this knowledge, we take a look at our malicious ZIP file:

We see 2 PK0506 records, and this is unusual.

We see the following sequence of records twice: PK0304, PK0102, PK0506.

From our previous examples, we can now understand that this sample contains 2 ZIP files.

Note that zipdump assigned an index to both PK0506 records: 1 and 2. This index can be used to select one of the 2 ZIP files for further analysis, as in this example, where I select the first ZIP file:

Using option “-f 1” (instead of “-f list”) selects the first ZIP file in the provided sample, and lists its content.

It can then be further analyzed with zipdump as usual, for example by selecting the first file (order.jpg) inside the first ZIP file for a hex/ASCII dump:

Likewise, “-f 2” will select the second ZIP file found inside the sample:
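Selecting one of several concatenated ZIP files by index can be approximated by cutting the data right after each PK0506 record. The sketch below is an assumption about how such a selection could work, not zipdump's implementation; split_at_eocd and make_zip are hypothetical helpers, and the file names are stand-ins for the sample's content.

```python
import io
import struct
import zipfile
from typing import List


def split_at_eocd(data: bytes) -> List[bytes]:
    """Cut data into chunks that each end right after a PK0506 record."""
    parts = []
    start = 0
    eocd = data.find(b"PK\x05\x06", start)
    while eocd != -1 and eocd + 22 <= len(data):
        # The 16-bit comment length is stored 20 bytes into the PK0506 record.
        (comment_length,) = struct.unpack_from("<H", data, eocd + 20)
        end = eocd + 22 + comment_length
        parts.append(data[start:end])
        start = end
        eocd = data.find(b"PK\x05\x06", start)
    return parts


def make_zip(name: str, content: bytes) -> bytes:
    """Build a small single-entry ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


sample = make_zip("decoy.jpg", b"not really an image") + make_zip("payload.exe", b"not really an executable")
parts = split_at_eocd(sample)

listings = []
for index, part in enumerate(parts, start=1):
    with zipfile.ZipFile(io.BytesIO(part)) as archive:
        listings.append(archive.namelist())
    print(index, listings[-1])
```

Like any signature search, this can be fooled by a PK0506 byte sequence that occurs inside file data, so it is best treated as a starting point for manual analysis.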

-f is a new option that I added for special/malformed ZIP files, but this is a work in progress, as there are many ways to malform ZIP files.

For example, I created a PoC malformed ZIP file that contains a single file, with reversed PK record order. Here is the output for the normal and “reversed” ZIP files (malformed, i.e. with the PK record order reversed):

This file can be opened with Windows Explorer, but there are tools and libraries that cannot handle it, like Python’s zipfile module:

I will further develop zipdump to handle malformed ZIP files as well as possible.

The current version (zipdump 0.0.16) is just a start:

  • it parses only 3 PK record types (PK0304, PK0102 and PK0506), other types are ignored
  • it does minimal parsing of these records: for example, there is no parsing/checking of offsets in this version

And finally, I also created a video showing how to use this new feature:

Article Link: https://blog.didierstevens.com/2020/01/06/analysis-of-unusual-zip-files/