Thoughts on Detection

Playing Detection with a Full Deck

The importance of Identification in Detection Engineering

Introduction

As a consultant, I have the opportunity to work with clients to help mature their Detection and Response (D&R) process. Sometimes this includes helping with the creation, improvement, or evaluation of detection rules. After helping many clients with numerous detection rules, I observed one consistent theme: many of the rules were written in a way that seemed to miss a large portion of the potential detection opportunities. Since this issue seemed to be happening at every client and in popular open source rule sets, I wanted to look into this phenomenon. Eventually, I concluded that the detection engineering process likely includes at least two phases: identification and classification. The identification phase, which is the part of the process that we all seemed to be missing, is focused on identifying all events that should be subject to the detection. If your detection’s goal is to identify malicious scheduled task creation, then you must first be able to identify ALL scheduled task creation. The classification phase is responsible for distinguishing between “benign” scheduled tasks and “malicious” scheduled tasks. This is typically accomplished using a classification rule (in the industry we call this a detection, an analytic, a signature, or an IOC). This post introduces the identification phase and demonstrates its importance through a simple test, described at the end of the post, that you can implement yourself.

Why Identification?

Imagine you have a deck of playing cards and you’ve been asked to separate them into piles, one for each suit. You iterate through the cards, identify each card’s suit (represented by an icon, typically in the corner), and place it in the pile that corresponds to that suit. At the end you have four piles (spades, clubs, diamonds, and hearts) that each have 13 cards. You just practiced a statistical process called classification. In statistics, the total set of objects to be classified is known as the population, and it is critical to ensure that you accurately identify the objects that should be in the population before you begin classification. What would the results be if we started with only red cards? Instead of ending with four piles, we would end with two, because we started with half the deck. This could give us an incorrect perspective of a deck of cards, thinking there are only two suits when there are in fact four. It is not an issue to look at only the red cards in a deck deliberately. It is an issue, however, if we think that “a deck of cards” consists of ONLY red cards.
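
The card analogy can be sketched in a few lines of Python. This is only an illustration of the idea, and every name in it (build_deck, classify_by_suit, and so on) is made up for the example. The same classification step is applied to two different populations, and only the population changes the outcome.

```python
from collections import Counter
from itertools import product

SUITS = ["spades", "clubs", "diamonds", "hearts"]
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
RED_SUITS = {"diamonds", "hearts"}


def build_deck():
    """Return the full population: one (rank, suit) tuple per card."""
    return [(rank, suit) for suit, rank in product(SUITS, RANKS)]


def classify_by_suit(cards):
    """Classification step: count how many cards land in each suit pile."""
    return Counter(suit for _rank, suit in cards)


full_deck = build_deck()
# An incomplete population: the black cards were never identified.
red_only = [card for card in full_deck if card[1] in RED_SUITS]

full_piles = classify_by_suit(full_deck)
red_piles = classify_by_suit(red_only)

print("Piles from the full deck:", dict(full_piles))
print("Piles from red cards only:", dict(red_piles))
```

The classification function is identical in both runs. The second run simply never sees spades or clubs, so it can never report them.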

What I noticed during my interactions with detection creation/review is that we don’t put enough focus on making sure we start with a full deck before we begin classifying. While it might be obvious if you are missing all of the spades and clubs, it isn’t quite as obvious when you are only missing two or three cards. When we create detections, it is similarly imperative that we apply our classification rule to the appropriate population of events. If we accidentally exclude half of the relevant events from the population, then we are creating opportunities for false negatives (evasion) before even starting. Luckily, in most cases, establishing the correct population is a solvable problem!

Defining the Problem

To help make this problem more concrete, let’s look at a detection rule that I encountered with a client. In this case, the client had inherited a number of old rules and wanted to evaluate whether they were worth keeping. The rule in question was created to “detect malicious service creation” and looked something like the rule shown below:

DeviceProcessEvents
| where FileName =~ "sc.exe"
| where ProcessCommandLine has "create"

In plain terms, this rule looks for any process whose name is sc.exe (the =~ operator makes the comparison case insensitive) and whose command line includes the string create. When we began to analyze this query, we wanted to understand the thought process that was used to create it. The reasoning is easy to reconstruct: sc.exe is a program that is included by default on Windows for the purpose of interacting with the Service Control Manager, and one of its features is the ability to create services using the create command line parameter.

Identification

With that understanding our next step was to understand how this rule implemented identification. Our goal was to determine what the population should be for this detection rule. Remember that this rule’s stated goal is to “detect malicious service creation”, so what should the population include? To answer this question, I like to say, “in order to detect malicious service creation, you must be able to identify ALL service creation.” The great thing about that saying is that you can replace “service creation” with almost any other topic that you are trying to detect. Now let’s think about whether the rule accomplishes the goal of “identifying ALL service creation”. Is sc.exe the only way to create services? No! At the very least, Windows has an API function called CreateService that a programmer can use to create an application that creates services. This means that our detection rule likely does not identify every service that is created in our network. In other words, we are probably missing a few cards from our deck. At this point we’ve found that we have identification issues, but how do we find the correct answer?

Finding the Base Condition

The answer to the question posed above is that we must discover an event that occurs every time a service is created, regardless of the tool that was used to create it. I call this event the base condition. We know that our current detection rule, which looks for process creation events where the process’s name is sc.exe, is not the base condition, but what event satisfies the base condition? To answer this question, we did some research and eventually realized that every service, no matter how it was created, must be registered in the registry at HKLM\System\CurrentControlSet\Services\<ServiceName>. With this information, we can reimplement the identification portion of the detection rule to make sure that the classification logic is being applied to ALL services instead of only the subset of services created by sc.exe. While we were fairly confident that using the base condition would find more services, we had no evidence that this was the case and no idea how large the difference might be. How many services might we be missing? The next portion of this article walks through some simple analysis to demonstrate the importance of a good identification phase.
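
To make the difference between the two identification strategies concrete, here is a minimal Python sketch. The events are hypothetical and invented purely for illustration, the field names are my own, and the word matching is only a rough stand-in for the has operator. The registry pattern is a Python adaptation of the one used in the base condition query later in this post.

```python
import re

# Python adaptation of the registry key pattern from the base condition query.
SERVICE_KEY = re.compile(
    r"^HKEY_LOCAL_MACHINE\\SYSTEM\\(CurrentControlSet|ControlSet\d{3})\\Services\\[^\\/]+$",
    re.IGNORECASE,
)

# Hypothetical telemetry, invented for illustration only.
process_events = [
    {"file_name": "sc.exe", "command_line": "sc.exe create DemoSvc binPath= C:\\demo.exe"},
    {"file_name": "SC.EXE", "command_line": "sc.exe query DemoSvc"},
    {"file_name": "installer.exe", "command_line": "installer.exe /quiet"},
]
registry_events = [
    {"action": "RegistryKeyCreated", "key": r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\DemoSvc"},
    {"action": "RegistryKeyCreated", "key": r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\InstallerSvc"},
    {"action": "RegistryKeyCreated", "key": r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\InstallerSvc\Parameters"},
    {"action": "RegistryValueSet", "key": r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\DemoSvc"},
]


def identified_by_tool(events):
    """Original rule: a process named sc.exe whose command line has the word create."""
    hits = []
    for event in events:
        # Rough approximation of a whole-word match on the command line.
        words = re.findall(r"[a-z0-9]+", event["command_line"].lower())
        if event["file_name"].lower() == "sc.exe" and "create" in words:
            hits.append(event)
    return hits


def identified_by_base_condition(events):
    """Base condition: a key created directly under the Services key."""
    names = []
    for event in events:
        if event["action"] == "RegistryKeyCreated" and SERVICE_KEY.match(event["key"]):
            names.append(event["key"].rsplit("\\", 1)[1])
    return names


tool_population = identified_by_tool(process_events)
base_population = identified_by_base_condition(registry_events)

print(len(tool_population), "event(s) identified by the sc.exe rule")
print(len(base_population), "service(s) identified by the base condition:", base_population)
```

In this toy data set the service registered by installer.exe never passes through sc.exe, so only the registry-based identification includes it in the population.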

If you are wondering how we arrived at the base condition, check out my blog post about Capability Abstraction, which describes the process we use. We also offer a new training class called Adversary Tactics: Tradecraft Analysis that teaches this process from a technical perspective.

Confirming my Hypothesis

At this point, we’ve established our skepticism for the rule’s approach to identifying the population that is relevant to detecting malicious service creation. We asserted that it is possible for an attacker to create a malicious service in a way that does not rely on sc.exe to accomplish the task. The question is, how much of the figurative card deck are we missing? Are we missing just a couple of cards (say the four 2s) or are we missing an entire suit (say the hearts)? After all, sc.exe is the tool that is built into the Operating System for just this purpose, so it must be used fairly frequently, right? At this point, my hypothesis is that services created via sc.exe make up approximately 10% of all services, but I want to run a small test in a real world environment to validate that hypothesis.

Approach

For the test, I wanted to establish a consistent protocol and apply it to both the original search (sc.exe with create in the command line) and a search for the base condition (creation of a registry key at HKLM\System\CurrentControlSet\Services\<ServiceName>). I decided to use a real world Microsoft Defender for Endpoint deployment in a network with ~5000 endpoints and developed a search query for each approach. Next, I eliminated per-user services, which are a set of services that are created for every user logon session (the approach I used to exclude these is fairly basic and could be bypassed by an intelligent attacker, but for the purposes of this experiment that is an acceptable risk). I found that per-user services are not created by sc.exe, and I felt the delta would more accurately describe what I’m interested in if they were excluded. Lastly, both searches were performed over the same timeframe (January 1st, 2021 through February 28th, 2021). Let’s look at the results.

I’ve included the search queries below in case anyone wants to replicate this test in their environment.

Original (sc.exe):

DeviceProcessEvents
| where FileName =~ "sc.exe"
| where ProcessCommandLine has "create"
| summarize count(), dcount(DeviceName), dcount(datetime_part("dayOfYear", TimeGenerated))

Base Condition:

let peruserservices = dynamic(["AarSvc","BcastDVRUserService","BluetoothUserService","CaptureService","cbdhsvc","CDPUserSvc","ConsentUxUserSvc","CredentialEnrollmentManagerUserSvc","DeviceAssociationBrokerSvc","DevicePickerUserSvc","DevicesFlowUserSvc","LxssManagerUser","MessagingService","OneSyncSvc","PimIndexMaintenanceSvc","PrintWorkflowUserSvc","UdkUserSvc","UnistoreSvc","UserDataSvc","WpnUserService"]);
DeviceRegistryEvents
| extend Days = datetime_part("dayOfYear", TimeGenerated)
| where ActionType == "RegistryKeyCreated"
| where RegistryKey matches regex "(?i)^HKEY_LOCAL_MACHINE\\\\SYSTEM\\\\(CurrentControlSet|ControlSet\\d{3})\\\\Services\\\\[^\\/\\\\]*$"
| extend ServiceName = tostring(split(RegistryKey, @"\")[4])
| extend SubService = tostring(split(ServiceName, @"_")[0])
| where SubService !in (peruserservices)
| summarize dcount(Days), dcount(DeviceName), count()
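
The per-user exclusion in the base condition query can be paraphrased in Python as follows. This is a sketch for illustration, not part of the original test. The list of names is copied from the query above, the function names and example keys are hypothetical, and the last example shows one way I would assume the filter could be abused, since the query only compares the text before the first underscore.

```python
# Per-user service template names, copied from the base condition query.
PER_USER_SERVICES = {
    "AarSvc", "BcastDVRUserService", "BluetoothUserService", "CaptureService",
    "cbdhsvc", "CDPUserSvc", "ConsentUxUserSvc", "CredentialEnrollmentManagerUserSvc",
    "DeviceAssociationBrokerSvc", "DevicePickerUserSvc", "DevicesFlowUserSvc",
    "LxssManagerUser", "MessagingService", "OneSyncSvc", "PimIndexMaintenanceSvc",
    "PrintWorkflowUserSvc", "UdkUserSvc", "UnistoreSvc", "UserDataSvc", "WpnUserService",
}


def service_name(registry_key):
    """Fifth path component, mirroring split(RegistryKey, "\\")[4] in the query."""
    return registry_key.split("\\")[4]


def is_per_user_service(registry_key):
    """True when the text before the first underscore is a known per-user template."""
    template = service_name(registry_key).split("_")[0]
    return template in PER_USER_SERVICES


# Hypothetical keys for illustration; the suffixes are invented.
per_user_key = r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\cbdhsvc_3f2a1"
ordinary_key = r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\DemoSvc"
lookalike_key = r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\cbdhsvc_evil"

for key in (per_user_key, ordinary_key, lookalike_key):
    print(service_name(key), "excluded" if is_per_user_service(key) else "kept")
```

Because the check is purely name based, a service deliberately named to look like a per-user instance would be excluded as well. That limitation is acceptable for a counting experiment like this one, but it is worth remembering before reusing the filter in a production rule.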

Result

So far this article has introduced the idea of identification as part of your detection engineering process. We hypothesized that services created by sc.exe represent only a small portion of the total services (~10%). While we are fairly confident that this is true, we designed this test to validate our hypothesis. In this section, we will evaluate the results of these tests. First we will run the original query:

Between January 1st and February 28th, sc.exe was called with the create parameter a total of 29 times. This means that if you used the original detection rule, you’d be aware of 29 service creation events.

Let’s run the second test using the base condition to see how it affects the results:

Using the base condition, we found 104,683 services (not including per-user services) created between January 1st and February 28th. Remember that our hypothesis was that sc.exe was responsible for ~10% of the total services, but the results show that the original detection rule accounted for only about 0.03% (29/104,683) of the total services created during the time period! I can honestly say that I thought there would be a discrepancy, but I didn’t expect a discrepancy of this magnitude.
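
For anyone who wants to check the arithmetic, the comparison can be reproduced with a few lines of Python. The two counts come from the results above and the 10% figure is the hypothesis stated earlier. The variable names are just for illustration.

```python
sc_exe_events = 29                # service creations seen by the original rule
base_condition_events = 104_683   # service creations seen via the base condition
hypothesized_share = 0.10         # the ~10% hypothesis from earlier in the post

# Share of the full population that the original rule could ever examine.
observed_share = sc_exe_events / base_condition_events

# Events that never reached the classification step at all.
missed_events = base_condition_events - sc_exe_events

# How far off the hypothesis was, as a multiplicative factor.
overestimate = hypothesized_share / observed_share

print(f"Observed share: {observed_share:.4%}")
print(f"Events outside the original population: {missed_events:,}")
print(f"Hypothesis overestimated the share by roughly {overestimate:.0f}x")
```

The observed share works out to a little under 0.028%, which rounds to the 0.03% quoted above.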

Conclusion

In this article we explored a critical and, in my opinion, often overlooked aspect of detection engineering that I call identification. We found that it is possible to reduce the scope of your detection rule to almost zero if you aren’t careful to ensure that you are applying your rule to the correct population of events. In my experience this is an extremely common mistake for organizations and vendors alike. While not all rules will be affected to the same magnitude that we saw with our malicious service creation example, the risk cannot be quantified without looking into it explicitly. Whether you build all of your detection rules internally, outsource your rule creation to vendors, or fall somewhere in between, it is imperative that you reevaluate their efficacy with identification in mind.

At this point you might be wondering how your team can start doing this. I’ve found that the best approach is to answer three questions that will give you the context and perspective necessary to arrive at the correct solution. The questions are shared below:

  1. What type of activity is this rule trying to detect? (Detection Goal)
  2. How does this rule establish the set of events that are relevant to the goal? (Identification)
  3. How does the rule differentiate between “good” and “bad” events? (Classification)

Many of the production rules that I see do not have an answer to one or more of these questions. In fact, some rules don’t have an answer for any of these questions, yet organizations spend resources tracking down alerts that are generated from these ill-defined rules.
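
One lightweight way to keep track of these three questions is to record the answers next to each rule and flag whatever is missing. The Python sketch below is only a suggestion. The RuleReview class and its field names are hypothetical, and the example answers reflect my own reading of the original sc.exe rule, which does not appear to contain any logic that separates good service creation from bad.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RuleReview:
    """Answers to the three review questions for a single detection rule."""
    name: str
    detection_goal: Optional[str] = None
    identification: Optional[str] = None
    classification: Optional[str] = None

    def unanswered(self) -> List[str]:
        """Return the labels of the questions that still have no answer."""
        answers = {
            "Detection Goal": self.detection_goal,
            "Identification": self.identification,
            "Classification": self.classification,
        }
        return [label for label, answer in answers.items()
                if not (answer and answer.strip())]


# Example review of the original rule discussed in this post.
review = RuleReview(
    name="sc.exe service creation",
    detection_goal="Detect malicious service creation",
    identification="Process events where FileName is sc.exe and the command line has create",
    classification=None,  # nothing in the rule separates benign from malicious
)

print(review.name, "is missing:", review.unanswered())
```

A review like this does not fix a rule, but it makes it obvious which of the three questions nobody has answered yet.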

As with many things, practice makes perfect (well, perfect practice makes perfect anyway). If you are interested in becoming more proficient at evaluating how rules implement identification, check out an open source rule repository like the Sigma Project, pick a random rule, and ask yourself how that rule addresses identification. Once you deduce the answer, ask yourself if you think the rule’s solution represents the base condition. Maybe even try to follow the Capability Abstraction process to find the base condition and compare your result to their solution. As with anything, this is a skill that will become easier with repetition and experience, but it is critical that we start paying more attention to it now!

Thoughts on Detection was originally published in Posts By SpecterOps Team Members on Medium.

Article Link: Playing Detection with a Full Deck | by Jared Atkinson | Posts By SpecterOps Team Members