This post will cover hunt methodology for using standard deviation to detect outliers in PowerShell command line activity. The analytics presented are for Microsoft Defender for Endpoint (MDE) using KQL.
Introduction
PowerShell continues to be one of the most leveraged tools in an attacker’s repertoire given its flexibility and deep integration with Windows systems. However, detecting advanced adversaries often requires going beyond monitoring for base64-encoded commands, parameters, or cmdlets. In fact, PowerShell commands can be constructed in a variety of unusual ways via custom functions, variables, and other applicable syntax. Detection becomes increasingly difficult when LOLBins are introduced to further chain commands and obfuscate parent-child process trees. Collectively, these techniques make bypassing static detection signatures trivial and exemplify the need for behavioral heuristics. In the absence of such comprehensive detection, we can still opt to apply statistical methods to uncover advanced adversaries and emerging techniques. Therefore, this post will discuss methodology for applying standard deviation to PowerShell command line data in an effort to hunt for adversarial activity.
PowerShell Obfuscation in the Wild
Before we cover hunt methodology, we will glance at some of the latest PowerShell samples in VirusTotal (VT) to gain exposure to more advanced obfuscation techniques. Then, we will analyze the individual elements of the obfuscation and build analytics for our threat hunt. To begin, we will query VT for samples exhibiting obfuscated PowerShell command lines using the following VT intelligence query:
( behaviour_created_processes:powershell.exe OR behaviour_created_processes:pwsh.exe) attack_technique:T1059.001 attack_technique:T1027 tag:long-command-line-arguments fs:30d+
Of course, it does not take long to find samples employing an array of command line obfuscation techniques. The following three samples were first uploaded to VT in the past month and had relatively low detection scores at the time of writing:



Overall, some of the techniques observed within these samples include:
- Character substitution.
- String substitution and concatenation.
- Encoding, decoding, and decryption functions.
- Command execution via additional LOLBins.
Understanding PowerShell Obfuscation
Although the samples presented may initially seem complex, they all adhere to traditional PowerShell syntax rules to execute successfully. From a threat hunting perspective, we will focus on a command’s syntax rather than its logic, because syntax maintains some level of consistency even across highly obfuscated commands. In other words, we will concentrate on identifying syntax character patterns that may indicate obfuscation attempts, such as excessive variable declarations, functions, concatenations, object references, etc.
In the context of PowerShell obfuscation, the following syntax characters are of particular interest for calculating standard deviation thresholds and identifying outliers later in this post:
- Dollar sign ( $ ): denotes variable usage.
- Equals ( = ): assignment operator.
- Pipe ( | ): used to send the output of one command to another.
- Dot ( . ): used to access properties and methods.
- Comma ( , ): used to separate elements.
- Colon ( : ): used in property and method access via object references and as a namespace separator.
- Semi-colon ( ; ): used to separate multiple statements.
- Quotation marks ( " or ' ): denote literal strings.
- Backtick ( ` ): used as an escape character.
- Parentheses ( () ): used to group expressions, encapsulate commands, and control the order of evaluation.
- Braces ( {} ): denote script blocks, including function bodies, comprised of one or more PowerShell commands.
- Square brackets ( [] ): used in array and hash table declarations, type casting, and indexing.
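To make the idea of counting syntax characters concrete, here is a minimal Python sketch that tallies the characters from the list above for a single command line. The function name, the `SYNTAX_CHARS` constant, and the sample command line are all made up for illustration, and only ASCII quotation marks are counted.

```python
from collections import Counter

# Syntax characters of interest, taken from the list above (ASCII quotes only).
SYNTAX_CHARS = "$=|.,:;\"'`(){}[]"


def count_syntax_chars(command_line: str) -> dict[str, int]:
    """Return how often each tracked syntax character occurs in a command line."""
    seen = Counter(ch for ch in command_line if ch in SYNTAX_CHARS)
    # Report every tracked character, including those that never appear.
    return {ch: seen[ch] for ch in SYNTAX_CHARS}


# Hypothetical command line, invented for this example.
sample = "powershell.exe -c \"$a='I'+'EX';$b=$a.ToLower();& $b\""
counts = count_syntax_chars(sample)

for char, total in counts.items():
    if total:
        print(f"{char!r}: {total}")
```

A single count means little on its own; it becomes useful once it is compared against the counts seen across the rest of the environment.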
Basics of Standard Deviation
Standard deviation is a key metric for measuring the spread of data. When a dataset roughly follows a bell curve (a normal distribution), thresholds based on one, two, or three standard deviations from the mean (average) can flag outliers. Put simply, the more deviations a value sits from the mean, the more anomalous it is. A z-score expresses that same distance for each individual value, but in large datasets per-event z-scores can be too granular to provide meaningful insights. Bucketing values into whole deviation thresholds allows for rapid, more explicit outlier detection, offering a clearer view of the data.

In this approach, we will explore a practical application against PowerShell syntax characters by calculating the average and standard deviation for each character of interest across all PowerShell command line data in our environment. The deviation threshold will tell us how far a given value deviates from the average occurrence of a particular syntax character.
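As a rough illustration of the calculation, the Python sketch below derives the mean, the standard deviation, and the three upper thresholds from a list of per-command-line character counts. The sample counts and the function name are hypothetical. It uses the sample standard deviation (dividing by n - 1), which to my understanding is also what KQL's stdev() computes.

```python
from statistics import mean, stdev


def deviation_thresholds(counts: list[int]) -> dict[str, float]:
    """Compute the mean, sample standard deviation, and upper thresholds."""
    avg = mean(counts)
    sd = stdev(counts)  # sample standard deviation (n - 1 in the denominator)
    return {
        "mean": avg,
        "stdev": sd,
        "STDEV_1": avg + sd,
        "STDEV_2": avg + 2 * sd,
        "STDEV_3": avg + 3 * sd,
    }


# Hypothetical counts of one syntax character across eight command lines.
dollar_counts = [2, 4, 4, 4, 5, 5, 7, 9]
thresholds = deviation_thresholds(dollar_counts)

for name, value in thresholds.items():
    print(f"{name}: {value:.2f}")
```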
Note: Standard deviation thresholds assume the dataset is roughly normal. In practice, treat them as a convenient way to surface outliers rather than as exact probabilities. For datasets that are highly skewed, an interquartile range (IQR) methodology would be more robust.
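For readers curious about the interquartile range alternative mentioned in the note, here is a small Python sketch. It is not part of the analytics in this post. The 1.5 multiplier is the conventional fence for IQR outliers, and the sample lengths and function name are invented for illustration.

```python
from statistics import quantiles


def iqr_upper_fence(values: list[int], k: float = 1.5) -> float:
    """Return the upper outlier fence Q3 + k * IQR."""
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    return q3 + k * (q3 - q1)


# Hypothetical command line lengths with one extreme value.
lengths = [20, 22, 25, 27, 30, 31, 33, 35, 40, 400]
fence = iqr_upper_fence(lengths)
outliers = [value for value in lengths if value > fence]

print(f"upper fence: {fence}, outliers: {outliers}")
```

Because quartiles are based on rank rather than magnitude, a single extreme value does not inflate the fence the way it inflates a standard deviation.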
Hunt Methodology
Now that we have discussed PowerShell obfuscation and standard deviation, we can finally begin defining our hunt methodology. I will provide a general outline of the methodology to apply within your environment and corresponding MDE queries in the next section. A high-level overview of our hunt methodology is as follows:
- Establish a baseline of PowerShell command line data by filtering noisy activity. KQL’s count() aggregation function can quickly identify noisy scripts and commands through stack counting. Verify any activity filtered is benign.
- Exclude command lines that are empty or contain no arguments as they can drastically skew the mean and standard deviation of your dataset. That is, empty command lines could shift the mean towards zero and lead to a misrepresentation of what baseline behavior looks like in your environment.
- Calculate the average and standard deviation for syntax characters of interest across all PowerShell command line data available in your environment using KQL’s avg() and stdev() functions. Results will likely be decimals. Thus, rounding to the nearest whole number when establishing deviation thresholds is recommended as you cannot have partial characters in the command line.
- Identify suspicious activity by assessing the number of deviations from the mean. The thresholds you deem suspicious will be unique to your environment depending on PowerShell use and how well you create a baseline.
Note: The methodology is highly adaptable for hunting anomalous activity on any number of LOLBins such as cmd.exe, wscript.exe, etc., provided you identify syntax characters that can be used for obfuscation in the context of the process.
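The four steps above can be sketched end to end in a few lines of Python. This is only an illustration of the logic under stated assumptions, not the MDE analytic itself: the event list, the benign filter, the label strings for values below the mean, and the function name are all hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def label_by_deviation(command_lines, char, benign=frozenset()):
    """Label each retained command line by the deviation threshold it reaches."""
    # Steps 1 and 2: drop verified-benign noise, empty command lines, and
    # command lines that do not contain the character at all.
    kept = [c for c in command_lines if c not in benign and c.count(char) > 0]
    values = [c.count(char) for c in kept]
    # Step 3: mean and sample standard deviation, thresholds rounded to whole characters.
    avg, sd = mean(values), stdev(values)
    t1, t2, t3 = (round(avg + k * sd) for k in (1, 2, 3))
    # Step 4: assess how many deviations above the mean each command line sits.
    labels = {}
    for cmd, n in zip(kept, values):
        if n >= t3:
            labels[cmd] = "3 standard deviations above mean"
        elif n >= t2:
            labels[cmd] = "2 standard deviations above mean"
        elif n >= t1:
            labels[cmd] = "1 standard deviation above mean"
        elif n >= avg:
            labels[cmd] = "between mean and one standard deviation"
        else:
            labels[cmd] = "below mean"
    return labels


# Hypothetical events: routine commands, one obfuscated command, a noisy script, empties.
normal = [f"powershell.exe -c Get-Item $env:TEMP/file{i}.txt" for i in range(14)]
obfuscated = "powershell.exe -c $a='IE';$b='X';$c=$a+$b;&$c"
noisy = "powershell.exe -File backup.ps1"
events = normal + [obfuscated] + [noisy] * 5 + ["", ""]

# Stack counting surfaces the noisiest command line so it can be reviewed and filtered.
print(Counter(events).most_common(1))

labels = label_by_deviation(events, "$", benign={noisy})
print(labels[obfuscated])
```

Note that a very small dataset cannot produce a value three sample standard deviations above its own mean, which is one reason the baseline should cover as much activity as is practical.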
MDE Analytics
Our first MDE analytic will be used to calculate the average and standard deviation of the specified PowerShell syntax character. The calculated values are then used dynamically to identify PowerShell command lines whose syntax character count is greater than or equal to the mean, whilst also displaying the deviation thresholds. In general, the higher the deviation from the mean, the more suspicious the command line might be. Please note, the effectiveness of the query will depend on how well you filter out benign environment-specific activity prior to calculating the average and standard deviation.
Suspicious PowerShell Syntax Character Count
// This analytic finds the average/standard deviation of a syntax character of interest in PowerShell.exe/PowerShell_ise.exe/pwsh.exe command lines.
// It then labels command lines at or above the mean with the deviation threshold they reach, for outlier identification.
let window = 30d; // adjust window depending on resource consumption. 1-day queries give the best snapshot of the environment baseline
let syntax_character = ""; // enter obfuscation syntax character of interest, e.g. "$"
// define let statements for environment-specific filters here, such as service accounts, benign processes, scripts, etc.
let calc_avg_stdev = (
    DeviceProcessEvents
    | where Timestamp > ago(window)
    | where InitiatingProcessFileName has "powershell" or InitiatingProcessFileName has "pwsh"
    // add environment-specific filters here, prior to calculating avg/stdev
    | extend cmd_length = strlen(InitiatingProcessCommandLine),
             syntax_count = countof(InitiatingProcessCommandLine, syntax_character)
    | where cmd_length > 0 and syntax_count > 0 // filter empty values to prevent skewing
    | summarize avg_syntax_count = avg(syntax_count), stdev_syntax_count = stdev(syntax_count) by InitiatingProcessFileName
    | extend STDEV_1 = avg_syntax_count + stdev_syntax_count,
             STDEV_2 = avg_syntax_count + (stdev_syntax_count * 2),
             STDEV_3 = avg_syntax_count + (stdev_syntax_count * 3)
);
DeviceProcessEvents
| where Timestamp > ago(window)
| where InitiatingProcessFileName has "powershell" or InitiatingProcessFileName has "pwsh"
// apply the same environment-specific filters used for the baseline here
| extend cmd_length = strlen(InitiatingProcessCommandLine),
         syntax_count = countof(InitiatingProcessCommandLine, syntax_character)
// attach each process name's own statistics; toscalar() would only return the value from the first row
| lookup kind=inner (calc_avg_stdev) on InitiatingProcessFileName
| where syntax_count >= avg_syntax_count
| extend STDEV_Calc = case(
    syntax_count >= STDEV_3, "3 standard deviations above mean",
    syntax_count >= STDEV_2, "2 standard deviations above mean",
    syntax_count >= STDEV_1, "1 standard deviation above mean",
    "Value between mean and one standard deviation")
| summarize count() by InitiatingProcessFileName, InitiatingProcessCommandLine, syntax_count, cmd_length, STDEV_Calc, avg_syntax_count, stdev_syntax_count, STDEV_1, STDEV_2, STDEV_3
| order by InitiatingProcessFileName asc, syntax_count asc // order by process name from least suspicious to most suspicious
Sample query output:

The second analytic uses the same methodology, but it will be applied to PowerShell command line lengths. Essentially, we will identify PowerShell obfuscation by using the average length of PowerShell command line activity and identifying lengths exceeding calculated deviation thresholds. Once more, the fidelity of the analytic will depend on filtering benign environment-specific activity.
Suspicious PowerShell Command Line Length
let window = 30d; // adjust window depending on resource consumption. 1-day queries give the best snapshot of the environment baseline
// define let statements for environment-specific filters here, such as service accounts, benign processes, scripts, etc.
let calc_cmd_avg_stdev = (
    DeviceProcessEvents
    | where Timestamp > ago(window)
    | where InitiatingProcessFileName has "powershell" or InitiatingProcessFileName has "pwsh"
    // add environment-specific filters here, prior to calculating avg/stdev
    | extend cmd_length = strlen(InitiatingProcessCommandLine)
    | where cmd_length > 0 // filter empty values to prevent skewing
    | summarize avg_cmd_length = avg(cmd_length), stdev_cmd_length = stdev(cmd_length) by InitiatingProcessFileName
    | extend STDEV_1 = avg_cmd_length + stdev_cmd_length,
             STDEV_2 = avg_cmd_length + (stdev_cmd_length * 2),
             STDEV_3 = avg_cmd_length + (stdev_cmd_length * 3)
);
DeviceProcessEvents
| where Timestamp > ago(window)
| where InitiatingProcessFileName has "powershell" or InitiatingProcessFileName has "pwsh"
// apply the same environment-specific filters used for the baseline here
| extend cmd_length = strlen(InitiatingProcessCommandLine)
// attach each process name's own statistics; toscalar() would only return the value from the first row
| lookup kind=inner (calc_cmd_avg_stdev) on InitiatingProcessFileName
| where cmd_length >= avg_cmd_length
| extend STDEV_Calc = case(
    cmd_length >= STDEV_3, "3 standard deviations above mean",
    cmd_length >= STDEV_2, "2 standard deviations above mean",
    cmd_length >= STDEV_1, "1 standard deviation above mean",
    "Value between mean and one standard deviation")
| summarize count() by InitiatingProcessFileName, InitiatingProcessCommandLine, cmd_length, STDEV_Calc, avg_cmd_length, stdev_cmd_length, STDEV_1, STDEV_2, STDEV_3
| order by InitiatingProcessFileName asc, cmd_length asc // order by process name from least suspicious to most suspicious
Sample query output:

Hunt Limitations
- The analytics query process command lines in MDE's DeviceProcessEvents table, which does not record the contents of executed scripts. Therefore, only the command and associated parameters used to run the script will be captured by these analytics.
- The analytics presented rely on calculations made during a predefined window, which results in dynamic values. As such, values can change drastically with the introduction of new tools, automation, etc., into your environment. Consequently, the analytics are not suitable as static signatures but can be used for recurring proactive hunts.
- Likewise, analytics may not be suitable for all environments, especially those with less complex usage patterns of PowerShell or low usage across the enterprise.
Summary
In conclusion, I demonstrated how to apply statistical methods to PowerShell command line activity in an effort to hunt for obfuscation. I presented methodology that moved beyond simple rule-based detection by leveraging a more dynamic, data-driven approach. Finally, I presented analytics which can be tailored to varying environments.
PowerShell Threat Hunting: Identifying Obfuscation Using Standard Deviation was originally published in Detect FYI on Medium.
Article Link: PowerShell Threat Hunting: Identifying Obfuscation Using Standard Deviation | by Manuel Arrieta | Feb, 2025 | Detect FYI