One of the most common threats that a threat hunter needs to monitor is malicious files. As we know, thousands of files are either directly infected or created with harmful intent, designed to trigger malicious activity. These objects can be shared, downloaded, and — worst of all — executed by our users. That’s why it’s critical to have proper monitoring and response mechanisms in place to react as quickly as possible when a threat is detected.
However, a significant risk lies in compressed files, which may contain malicious content that can bypass initial detection. These files are often extracted and executed later, posing a hidden threat.
Another common mistake occurs when one of the extracted files is flagged as malicious: the focus is often solely on removing that specific file. We may overlook the need to trace where it came from, whether it was downloaded from a known location, and whether the original source (such as a ZIP archive) still exists within the organization.
Likewise, when multiple files are extracted from a single archive and only one is flagged as malicious, it’s easy to overlook the rest. The queries below group all related files so you can identify, and remove, every file associated with the detected threat and its source.
In this article, I’ll walk through different scenarios to help track the entire file chain — from the moment a user downloads an object, to how it ends up in our systems.
Correlating File Events to Identify Downloaded or Extracted Files
The key field for grouping events to determine whether a file was downloaded from a website or extracted from another file is InitiatingProcessUniqueId. By grouping events using this field, you can correlate related activity across tables like DeviceFileEvents, DeviceEvents, and others.
To start, here is a general-purpose KQL query that shows what I mean. It groups files by their origin, such as a ZIP archive or a website, and lists every file that was extracted or downloaded from it.
DeviceFileEvents
// Keep only the last path segment of the origin (e.g. the ZIP file name)
| extend FileOriginReferrerUrl_ext = extract(@"[^\\]+$", 0, FileOriginReferrerUrl)
| where isnotempty(FileOriginReferrerUrl)
| join kind=inner (DeviceEvents) on $left.InitiatingProcessUniqueId == $right.InitiatingProcessUniqueId
| extend FileExtension = extract(@"\.([a-zA-Z0-9]+)$", 1, FileName)
| extend Source_Type = case(FileOriginReferrerUrl startswith "https://", "Web", "File")
// SHA2561 is the SHA256 column from the right side of the join (DeviceEvents)
| summarize total_Files = dcount(FileName), Files_after_execution = make_set(FileName), make_set(FileExtension), make_set(ActionType), make_set(FolderPath), SHA256_Group = make_set(SHA2561)
    by InitiatingProcessUniqueId, AccountUpn = InitiatingProcessAccountUpn, Device = DeviceName, FileOriginReferrerUrl, Source_Type, OriginalFile = FileOriginReferrerUrl_ext
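To make the grouping idea concrete without a Defender tenant, here is a minimal Python sketch of the same logic on a handful of in-memory rows. The rows, IDs, paths, and the helper name group_by_origin are all made up for illustration. Only the field names come from the query above, and the sketch leaves out the join with DeviceEvents.

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for DeviceFileEvents rows (made-up values).
file_events = [
    {"InitiatingProcessUniqueId": "1001", "FileName": "invoice.pdf",
     "FileOriginReferrerUrl": r"C:\Users\ana\Downloads\bundle.zip"},
    {"InitiatingProcessUniqueId": "1001", "FileName": "setup.exe",
     "FileOriginReferrerUrl": r"C:\Users\ana\Downloads\bundle.zip"},
    {"InitiatingProcessUniqueId": "2002", "FileName": "report.docx",
     "FileOriginReferrerUrl": "https://example.com/files/"},
    {"InitiatingProcessUniqueId": "3003", "FileName": "notes.txt",
     "FileOriginReferrerUrl": ""},
]


def group_by_origin(events):
    """Group file names by (process id, origin), skipping rows with no origin."""
    groups = defaultdict(set)
    for event in events:
        origin = event["FileOriginReferrerUrl"]
        if not origin:  # mirrors `where isnotempty(FileOriginReferrerUrl)`
            continue
        key = (event["InitiatingProcessUniqueId"], origin)
        groups[key].add(event["FileName"])  # mirrors make_set(FileName)
    return {key: sorted(names) for key, names in groups.items()}


for (process_id, origin), names in group_by_origin(file_events).items():
    # len(names) plays the role of dcount(FileName)
    print(process_id, origin, len(names), names)
```

Both files written by process 1001 end up in one group under bundle.zip, which is the point of the correlation: a verdict on setup.exe immediately surfaces invoice.pdf and the archive they came from.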
Correlating file events by InitiatingProcessUniqueId brings several benefits:
- Provides full visibility into all files related to a single action (e.g., ZIP extraction).
- Helps detect hidden or secondary malicious files that may not trigger alerts.
- Traces the origin of files — whether downloaded or extracted from another source.
- Strengthens root cause analysis and incident investigation.
- Enhances detection of multi-stage payloads or complex delivery methods.
- Reduces the risk of overlooking related threats during response.
- Builds context around suspicious activity for better decision-making.
- Improves threat hunting efficiency by revealing attack chains clearly.
We can also enhance this detection with additional checks, for example flagging cases where one of the extracted or downloaded files has a high-risk file extension. The extra filter is a single line:
| where FileExtension in ("exe","cmd","bat","ps1","js","vbs")
Added before the summarize step, it gives the following query:
DeviceFileEvents
| extend FileOriginReferrerUrl_ext = extract(@"[^\\]+$", 0, FileOriginReferrerUrl)
| where isnotempty(FileOriginReferrerUrl)
| join kind=inner (DeviceEvents) on $left.InitiatingProcessUniqueId == $right.InitiatingProcessUniqueId
| extend FileExtension = extract(@"\.([a-zA-Z0-9]+)$", 1, FileName)
| extend Source_Type = case(FileOriginReferrerUrl startswith "https://", "Web", "File")
// Keep only files with a high-risk extension
| where FileExtension in ("exe","cmd","bat","ps1","js","vbs")
| summarize total_Files = dcount(FileName), Files_after_execution = make_set(FileName), make_set(FileExtension), make_set(ActionType), make_set(FolderPath), SHA256_Group = make_set(SHA2561)
    by InitiatingProcessUniqueId, AccountUpn = InitiatingProcessAccountUpn, Device = DeviceName, FileOriginReferrerUrl, Source_Type, OriginalFile = FileOriginReferrerUrl_ext
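The two extract() patterns and the extension filter are easy to try out in isolation. The Python sketch below applies what I assume are the intended regular expressions (text after the last backslash, and the characters after the final dot) to made-up inputs. The function names are hypothetical and exist only for this illustration.

```python
import re

HIGH_RISK_EXTENSIONS = {"exe", "cmd", "bat", "ps1", "js", "vbs"}


def origin_name(referrer):
    # Text after the last backslash; the whole string if it has no backslash.
    match = re.search(r"[^\\]+$", referrer)
    return match.group(0) if match else ""


def file_extension(file_name):
    # Characters after the final dot, or "" when there is no extension.
    match = re.search(r"\.([a-zA-Z0-9]+)$", file_name)
    return match.group(1) if match else ""


def is_high_risk(file_name):
    # Case-sensitive membership test, like KQL's `in` operator.
    return file_extension(file_name) in HIGH_RISK_EXTENSIONS


print(origin_name(r"C:\Users\ana\Downloads\bundle.zip"))
print(file_extension("archive.tar.gz"))
print(is_high_risk("run.ps1"), is_high_risk("invoice.pdf"))
```

Because the comparison is case-sensitive, a file named SETUP.EXE would not match in this sketch. In KQL, the `in~` operator is the case-insensitive variant if that matters in your environment.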
Updating FileHash Detection Rules
Nowadays, you can find several KQL queries and Detection Rules that rely on matching file hash values located on endpoint devices against threat intelligence sources such as MalwareBazaar:
let MalwareBazaar = externaldata(MD5: string) ["https://bazaar.abuse.ch/export/txt/md5/recent"] with (format="txt", ignoreFirstRecord=True);
let MaliciousMD5 = MalwareBazaar | where MD5 !startswith "#";
DeviceFileEvents
| join kind=inner (MaliciousMD5) on $left.MD5 == $right.MD5
This KQL query works well and can be configured as a Detection rule to detect or automatically react — such as isolating a device or blocking a user — when a match is found.
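As a rough illustration of what that hash match does, the Python sketch below parses a small text feed, drops comment lines starting with "#", and keeps the file events whose MD5 appears in the feed. The feed content, the placeholder hash strings, and the function names are invented for the example. Nothing is downloaded from MalwareBazaar, and lowercasing the hashes is my own defensive choice, not something the query does.

```python
def parse_feed(text):
    """Return the set of hashes in a text feed, ignoring blanks and '#' comments."""
    hashes = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # mirrors `MD5 !startswith "#"`
            hashes.add(line.lower())
    return hashes


def match_events(events, bad_hashes):
    """Keep events whose MD5 is in the feed, like the inner join on MD5."""
    return [event for event in events if event["MD5"].lower() in bad_hashes]


# Placeholder values, not real hashes.
BAD_MD5 = "a" * 32
CLEAN_MD5 = "b" * 32

sample_feed = "# hypothetical header comment\n" + BAD_MD5 + "\n"

file_events = [
    {"FileName": "setup.exe", "MD5": BAD_MD5},
    {"FileName": "invoice.pdf", "MD5": CLEAN_MD5},
]

for event in match_events(file_events, parse_feed(sample_feed)):
    print("match:", event["FileName"])
```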
However, one drawback is that once an alert is triggered, it can be difficult to tell immediately whether we are dealing with an isolated file and where it came from, since the query returns many fields and the key information isn’t easy to spot at first glance.
To improve this, the following KQL query (which can also be configured as a detection rule) enriches the existing data so that these details can be pinpointed quickly. That speeds up the investigation, and as we know, in cybersecurity response time is critical.
By enhancing this detection to include full file paths, we can:
- Determine whether the file was downloaded from a website or extracted from another file, and on which device.
- If the file was extracted, trace it back to the original source of the infected file.
- Contact the device owner who downloaded the file and identify where the file — such as a ZIP archive — is hosted, in order to facilitate its removal.
let MalwareBazaar = externaldata(MD5: string) ["https://bazaar.abuse.ch/export/txt/md5/recent"] with (format="txt", ignoreFirstRecord=True);
let MaliciousMD5 = MalwareBazaar | where MD5 !startswith "#";
DeviceFileEvents
| extend FileOriginReferrerUrl_ext = extract(@"[^\\]+$", 0, FileOriginReferrerUrl)
| where isnotempty(FileOriginReferrerUrl)
| join kind=inner (DeviceEvents) on $left.InitiatingProcessUniqueId == $right.InitiatingProcessUniqueId
| extend FileExtension = extract(@"\.([a-zA-Z0-9]+)$", 1, FileName)
| extend Source_Type = case(FileOriginReferrerUrl startswith "https://", "Web", "File")
// Keep only files whose MD5 appears in the MalwareBazaar feed
| join kind=inner (MaliciousMD5) on $left.MD5 == $right.MD5
// ActionType1 and MD51 are the DeviceEvents columns renamed by the first join
| summarize total_Files = dcount(FileName), make_set(FileExtension), Actions_File_Source = make_set(ActionType), Actions_with_the_files = make_set(ActionType1), make_set(FolderPath), MD5_Group = make_set(MD51)
    by ReportId, Timestamp, AccountUpn = InitiatingProcessAccountUpn, DeviceName, DeviceId, Malicious_File = FileName, FileOriginReferrerUrl, Source_Type, OriginalFile = FileOriginReferrerUrl_ext
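The enrichment itself comes down to a few derived fields per hit. Here is a minimal Python sketch of that step, assuming a hit is a dictionary carrying the same field names as the query. The sample values and the helper names source_type and triage_record are hypothetical.

```python
def source_type(referrer):
    # Mirrors case(FileOriginReferrerUrl startswith "https://", "Web", "File").
    # KQL's `startswith` is case-insensitive, hence the lower().
    return "Web" if referrer.lower().startswith("https://") else "File"


def triage_record(hit):
    """Build the fields an analyst needs first: what, who, where, and origin."""
    referrer = hit["FileOriginReferrerUrl"]
    return {
        "Malicious_File": hit["FileName"],
        "AccountUpn": hit["InitiatingProcessAccountUpn"],
        "DeviceName": hit["DeviceName"],
        "Source_Type": source_type(referrer),
        "FileOriginReferrerUrl": referrer,
        # Last path segment, e.g. the ZIP archive the file was extracted from.
        "OriginalFile": referrer.rsplit("\\", 1)[-1],
    }


# Made-up hit for illustration.
hit = {
    "FileName": "setup.exe",
    "InitiatingProcessAccountUpn": "ana@example.com",
    "DeviceName": "laptop-01",
    "FileOriginReferrerUrl": r"C:\Users\ana\Downloads\bundle.zip",
}

print(triage_record(hit))
```

Like the query, this labels anything that does not start with https:// as File, so a plain http:// origin would also be reported as File. Adjust the check if unencrypted downloads are relevant in your environment.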
Summary
While we often focus deeply on developing new detections to address evolving threats, it’s equally important to invest time in enhancing and enriching our existing ones. Leveraging the insights gained through experience helps improve their accuracy and effectiveness.
By enriching the results of our queries and detection rules, we gain greater visibility and traceability into the lifecycle of a file from the moment it first appears on any of our endpoints. This, in turn, enables faster response times and strengthens the protection of our users and systems.
Detection Response by tracing File Lineage with KQL Queries was originally published in Detect FYI on Medium.
Article Link: Detection Response by tracing File Lineage with KQL Queries | by Sergio Albea | May, 2025 | Detect FYI