What other security products WON’T tell you about malicious archives

Assemblyline Blog Entry #7

Photo by Leonard Laub on Unsplash

⚠️⚠️⚠️ CAUTION ⚠️⚠️⚠️

This document describes malware analysis in Assemblyline. Malware analysis must be performed in an isolated environment.

In the previous blog entry of the Assemblyline series “One last HackTheBox Business 2023 CTF Forensic Challenge”, my colleague @gdesmar discussed how they used Assemblyline to complete the HackTheBox Business 2023 CTF!

In this post, we will be discussing how a malware campaign centred around archives seen in 2022 triggered large improvements to Assemblyline and CAPE Sandbox!

This article requires an understanding of how to interpret Assemblyline results, which can be gained by reading the documentation.


In the summer of 2022, Microsoft announced that Office macros would be blocked by default, which caused malware authors to pivot to a new attack vector: archives.


In the autumn of 2022, suspicious archive files were spotted in Assemblyline, coming from all over the Government of Canada (GC). At this point in time, Assemblyline had difficulty analyzing archives, leaving GC networks vulnerable. Fortunately, the reverse engineering team alerted me to this lack of detection.

Attack Chain

The attack chain consisted of two stages:

Stage 1

  • A user receives an email with a PDF file attachment. This file sometimes contained a password.
  • The user opens the PDF file attachment, which contains a link hidden in its “Annotations” that opens automatically.
  • This link pointed to a web page that contained malicious JavaScript.

Stage 2a

  • The JavaScript writes an archive file to disk, which occasionally was password-protected. This archive file contained at least a Windows shortcut file, an EXE file, and a DLL file.

Stage 2b

  • The user enters the password if required and opens the archive file.
  • Inside the archive, the user sees a Windows shortcut file that uses a fake icon to trick the user into thinking that the file is something else.
  • The user opens the Windows shortcut file.
  • The Windows shortcut file runs a command that starts the legitimate EXE file via a relative path.
  • The legitimate EXE loads a malicious DLL found in the archive file via DLL side-loading.
  • A DLL function is run that beacons out to the command-and-control server for the next set of instructions.
Assemblyline struggled in the attack chain

Assemblyline struggled at multiple points in the attack chain as illustrated by the diagram above, so we added some improvements!


Extract the PDF link

My cyber analyst colleague Andrew Walker researched and identified that there was a link embedded in the “Annotations” section of the PDF. At the time this was a new technique, and we were not extracting the embedded URI when we saw samples that did this. To address this technique, my colleague @cccs-jh improved Assemblyline’s PDF analysis service to utilize the “PikePDF” Python library to extract the URI.
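As a rough illustration of the idea (not the actual code in the Assemblyline PDF service), the sketch below walks the annotations on each page and collects any `/URI` action targets. The function names are hypothetical. The pikepdf import is done lazily so the small helper can be exercised on plain dictionaries.

```python
from typing import Any, Iterable, List


def uris_from_annotations(annotations: Iterable[Any]) -> List[str]:
    """Collect /URI action targets from a sequence of annotation dictionaries."""
    uris = []
    for annot in annotations:
        action = annot.get("/A")  # the annotation's action dictionary, if any
        if action is None:
            continue
        uri = action.get("/URI")  # present when the action opens a link
        if uri is not None:
            uris.append(str(uri))
    return uris


def extract_pdf_uris(path: str) -> List[str]:
    """Walk every page of a PDF and return the URIs found in its annotations."""
    import pikepdf  # third-party library, imported lazily

    found = []
    with pikepdf.open(path) as pdf:
        for page in pdf.pages:
            annots = page.obj.get("/Annots")  # may be absent on a page
            if annots is not None:
                found.extend(uris_from_annotations(annots))
    return found
```

Any URI returned this way can then be tagged in the results so that later stages of the analysis can act on it.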

An example of a sample from this campaign:

PDF improvement achieved!

Now we have a link, what do we do?

Download the web page

Back in the autumn of 2022, Assemblyline was very “file-oriented”, and could not automatically fetch files served at suspicious URI locations. This was a large area of improvement that my colleague Ryan Samaroo addressed with a new Assemblyline service called URLDownloader which tries to download all suspicious links found in a submission.

For this sample, we found the suspicious URI, used the URLDownloader service to download the file served at that location, and the HTML file was resubmitted to Assemblyline for further analysis.

Diagram illustrating how a URI would be analyzed in Assemblyline

Nice! It looks like we were able to pull out a malicious JavaScript file from that web page with the Extract service.

Write the archive file

This JavaScript file will now be analyzed by our JavaScript-oriented service, JsJaws. JsJaws uses a Node environment to emulate JavaScript code in a “virtual machine” via an open-source tool called MalwareJail. Let’s look at the code:

Malicious JavaScript code
  1. The JavaScript contains a large array d that starts with a run of 5s.
  2. In a for-loop later in the code, every value in this array is decremented by 5, so the array appears to contain numbers representing the bytes of a file.
  3. This if statement checks whether the value of window.location.pathname starts with C:, that is, whether the JavaScript file is being run on a Windows machine. I can mock this value in JsJaws to enter the if statement.

4 and 5: I found that the Blob object and the saveAs method were not present in JsJaws’ Node environment at the time of the incident, so I updated the MalwareJail tool environment to extract the file after writing it to disk.

With these changes, JsJaws can emulate the JavaScript successfully to extract the archive to disk in the service, and then send that file through Assemblyline for further analysis!
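The decoding step described above is simple enough to reproduce by hand. Here is a minimal Python sketch, assuming (as the for-loop suggests) that every array element is just the real byte value plus 5. The short array used here is a stand-in, not the real payload.

```python
from typing import Iterable

OFFSET = 5  # each element is assumed to be the real byte value plus 5


def decode_shifted_array(values: Iterable[int], offset: int = OFFSET) -> bytes:
    """Undo the constant shift and return the raw bytes of the embedded file."""
    # bytes() raises ValueError if a value falls outside 0..255 after the shift,
    # which is a useful signal that the assumed offset is wrong.
    return bytes(v - offset for v in values)


# Stand-in for the real array `d`: four shifted zero bytes, then shifted "CD001".
d = [b + OFFSET for b in b"\x00\x00\x00\x00CD001"]
payload = decode_shifted_array(d)
```

A leading run of 5s decodes to zero bytes, which would be consistent with a disk image format whose first sectors are empty.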

I wrote a few signatures that would flag any JavaScript file that wrote an archive to disk using these methods so that Assemblyline would be able to score this behaviour in the future.


These archive files were not the standard ZIP file format but rather ISOs, UDFs, and VHDs.

A search in Assemblyline for specific archive file types

The results of the above search


Occasionally, Assemblyline had issues correctly identifying some of the archives, so my colleague @gdesmar fixed this. His solution made it into another open-source project called SFlock via a pull request as well!

Correct identification for UDF files
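To give a feel for what telling these formats apart involves, here is a small Python sketch based on the on-disk signatures. It is an illustration only, not the identification rules used by Assemblyline or SFlock. It assumes 2048-byte sectors and checks for the ISO 9660 `CD001` identifier, the UDF `NSR02`/`NSR03` identifiers in the volume recognition sequence, and the `conectix` cookie of a VHD footer.

```python
SECTOR_SIZE = 2048
VRS_OFFSET = 16 * SECTOR_SIZE  # volume descriptors start at sector 16


def identify_disk_image(data: bytes) -> str:
    """Return a coarse type label based on well-known signatures."""
    identifiers = set()
    for index in range(16):  # look at the first few volume descriptors
        start = VRS_OFFSET + index * SECTOR_SIZE + 1
        ident = bytes(data[start:start + 5])
        if len(ident) < 5:
            break  # ran off the end of the buffer
        identifiers.add(ident)
    # A UDF image may also carry ISO 9660 descriptors, so check UDF first.
    if identifiers & {b"NSR02", b"NSR03"}:
        return "udf"
    if b"CD001" in identifiers:
        return "iso9660"
    # A VHD keeps a 512-byte footer at the end; dynamic disks also copy it to the start.
    if data[:8] == b"conectix" or data[-512:-504] == b"conectix":
        return "vhd"
    return "unknown"
```

Checking for UDF before ISO 9660 matters because an image can carry both sets of descriptors, in which case a check for `CD001` alone would label it as a plain ISO.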


We are identifying the unusual archives correctly now but a new problem arose, aaargh! The Extract service was unable to extract some of these archives because the tool it uses, 7-Zip, uses the file extension in its determination of the file type and how to extract files from it. Since Assemblyline ignores the submitted file extension, 7-Zip would always think UDF files are ISO (which they sort of are, but we’re not going to get into that) and it wouldn’t extract the right things. So my colleague @gdesmar came to the rescue and made a special case for ISO/UDF that renames the file for 7-Zip.
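The renaming trick can be sketched in a few lines of Python. This is a hypothetical illustration rather than the Extract service's code: the type labels, the extension mapping and the function names are all assumptions. The only 7-Zip options used are the standard `x` command and the `-y`, `-o` and `-p` switches.

```python
from typing import List, Optional

# Assumed mapping from an identified type to the extension 7-Zip should see.
EXTENSION_FOR_TYPE = {"iso9660": ".iso", "udf": ".udf"}


def name_for_7zip(path: str, identified_type: str) -> str:
    """Return the name the file should be copied to before extraction."""
    extension = EXTENSION_FOR_TYPE.get(identified_type)
    if extension is None:
        return path  # no special case: leave the name untouched
    return path + extension


def build_7z_command(archive: str, out_dir: str,
                     password: Optional[str] = None) -> List[str]:
    """Build the argument list for a 7-Zip extraction with full paths."""
    command = ["7z", "x", "-y", f"-o{out_dir}"]
    if password is not None:
        command.append(f"-p{password}")
    command.append(archive)
    return command
```

The resulting list could be passed to `subprocess.run` once the sample has been copied to the new name.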


Occasionally we saw an archive that was password-protected, and the password was found in one of the previous stages such as in the PDF or on the web page hosting the malicious JavaScript.

I worked with @gdesmar to extract useful text that was visible to the user and parse it in a way where potential passwords could be found. This list of potential passwords could then be sent to the appropriate Assemblyline service to be handled accordingly when analyzing a password-protected archive file.
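One simple way to build such a list is sketched below in Python. The heuristics are assumptions made for illustration and are not the parsing logic used in Assemblyline: values that follow a "password" label are ranked first, and every other visible token is kept as a fallback guess.

```python
import re
from typing import List

# Matches "password: X", "Password is X", "pass=X" and similar labels.
LABELLED = re.compile(r"(?i)\bpass(?:word|code)?\b\s*(?:is\b|:|=)?\s*(\S+)")
PUNCTUATION = ".,;:!?\"'()[]"


def candidate_passwords(visible_text: str) -> List[str]:
    """Return de-duplicated password guesses, most likely first."""
    candidates: List[str] = []

    def add(value: str) -> None:
        value = value.strip(PUNCTUATION)
        if value and value not in candidates:
            candidates.append(value)

    for match in LABELLED.finditer(visible_text):
        add(match.group(1))  # explicitly labelled values go first
    for token in visible_text.split():
        add(token)  # then every remaining visible word
    return candidates
```

An extraction service can then try each candidate in order until one of them opens the archive.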

DLL Side-Loading

Inside these archive files, which Assemblyline was now able to extract, were a series of files that consisted of at least a Windows shortcut file, a legitimate EXE file, and a malicious DLL file.

Let’s cherry-pick an archive file for demonstration:

File tree view of ISO file

By looking at the Characterize service analysis for the Windows shortcut file, we can see that there is a suspicious command line argument involved.

Suspicious command line arguments found in the Windows shortcut

The Characterize service was also able to extract a Batch file from this Windows shortcut:

Batch file extracted from Windows shortcut

If we look at the contents of that Batch file, we can see that when the LNK is run, the OneDriveUpdater.exe is started:

Batch file contents

Something interesting to note is that this ISO file contains two legitimate files.

Legitimate and illegitimate files in ISO

Looking at the LNK file’s command to start OneDriveUpdater.exe, it uses a relative path to access this EXE. Then the illegitimate version.dll is loaded via DLL side-loading. The DLL then decrypts the OneDrive.Update file and beacons out for the next payload.

This is a “great” anti-analysis technique for a variety of security products that extract items from archives and analyze each item in a silo. Since these files were dependent on each other, they would not execute correctly in isolation and no suspicious behaviour would be detected.

This is exactly what was happening in Assemblyline, with each file in the archive being dynamically analyzed in a silo. We use CAPE Sandbox for our dynamic analysis.

Extracted files analyzed in silos
  1. The LNK would be extracted and sent to our dynamic analysis service CAPE which would fail to run because it requires the EXE in the same folder (as seen in the Batch file contents).
  2. The EXE is a legitimate file, so nothing of interest here.
  3. The legitimate DLL would do nothing of interest here.
  4. The illegitimate DLL would run in CAPE but would do nothing of interest when we tried to run all its export entries, since it requires the encrypted file OneDrive.Update before it does anything interesting.
  5. We don’t analyze the encrypted file in CAPE since it cannot be accurately identified.


At the time, CAPE only had support for ZIP files, and the ZIP execution module relied on a built-in Python library to perform the ZIP extraction, which was tailored to ZIP files and did not work for all archive files.

To solve this, I bootstrapped some Python code together that could execute 7-Zip in the detonation environment such that it could extract any archive or password-protected archive it received.

I built on the limited logic that the ZIP execution module used to execute certain files. I looked at certain file extensions to determine what an “interesting file” to execute would be. The ZIP execution module would only run the first interesting file it found but given how the files in this campaign relied on each other so heavily, I thought it would be best if every interesting file was run so that all possible outcomes could be observed.

This feature of running every interesting file in an archive provided redundancy in case the file meant to be the entry point to the execution was not deemed interesting by the module.
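The selection logic can be pictured with the short Python sketch below. It is not the CAPE module itself: the extension list is an assumption, and `Document.lnk` is a made-up name standing in for the shortcut file.

```python
from pathlib import PurePath
from typing import Iterable, List

# Assumed set of extensions worth executing; the real module's list may differ.
INTERESTING_EXTENSIONS = {".lnk", ".exe", ".dll", ".js", ".bat", ".ps1", ".vbs", ".msi"}


def interesting_files(extracted: Iterable[str]) -> List[str]:
    """Keep every extracted file whose extension marks it as worth running."""
    return [
        name for name in extracted
        if PurePath(name).suffix.lower() in INTERESTING_EXTENSIONS
    ]


# File names modelled on the campaign; "Document.lnk" is hypothetical.
extracted = ["OneDriveUpdater.exe", "version.dll", "OneDrive.Update", "Document.lnk"]
to_run = interesting_files(extracted)  # every match is run, not just the first
```

The encrypted OneDrive.Update file is filtered out, but it stays on disk next to the other files, which is what lets the rest of the chain find it at run time.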

Improved CAPE execution

With this new analysis module, we run the DLLs and EXE, which do nothing interesting on their own, and then we run the LNK, which starts the EXE and then loads the DLL export. This export then decrypts the encrypted file in the extracted folder, which allows it to beacon. This is good!

Since sharing is caring, I contributed the general archive support feature back to CAPE so that anyone who uses Assemblyline OR CAPE can benefit!

Initial pull request for generic archive support

With the use of this module, Assemblyline can send any archive file to CAPE for analysis. CAPE will extract the archive file to the disk of the detonation environment and execute all interesting files. For this campaign, the result is that CAPE can execute the Windows shortcut file in the archive, which is the starting point for running the rest of the attack chain successfully.

As the campaign has evolved, new requirements popped up to improve detection, and you can see the contributions I’ve made here.


Improvements to Assemblyline for the attack chain

This campaign provided a bunch of opportunities for the Canadian Centre for Cyber Security to react defensively by improving our tools such as:

  • Extracting links embedded in PDF Annotations
  • Creating a new service to download content hosted at suspicious URLs
  • Enhancing our Node.js emulation environment to extract archives written to disk
  • Adding YARA rules to our identification engine for specific archive types
  • Pulling out possible passwords seen in HTML files to be used when extracting files from a password-protected archive
  • Adding support for DLL side-loading in CAPE Sandbox via a generic archive module

Shout out to ❤️ our reverse-engineering team and cyber analysts ❤️ for keeping the Assemblyline team up to date on the latest campaign techniques and samples! This allows us to keep Assemblyline at the forefront of automated malware defence.

Stay tuned for our next article!

All images unless otherwise noted are by the author.

Article Link: What other security products WON’T tell you about malicious archives | by Kevin Hardy-Cooper, P.Eng | Nov, 2023 | Medium