Deobfuscating PowerShell: Putting the Toothpaste Back in the Tube

One lesson that security professionals learn early on is that attackers don’t like to make your job easy. They have a range of techniques to obfuscate location, network traffic, or raw code. This in turn makes it harder for defenders to detect and block what they can’t find, or to understand something that is illegible. In the realm of coding, obfuscation means using the functions and quirks of a language to create a command that is easily machine readable but much harder for human eyes to recognize.

Obfuscation techniques continue to advance, but fortunately defenders are becoming increasingly aware and developing complementary deobfuscation techniques. As I presented earlier this year at BSides Charm, there are some exciting ways to apply machine learning (ML) to combat PowerShell obfuscation. But before getting into some solutions, let’s take a look at a few common obfuscation techniques, specifically focusing on PowerShell use by attackers.  

 

PowerShell Obfuscation by Attackers

PowerShell is powerful. It was designed to automate tasks from the command line and handle configuration management, and many important tools were created for those purposes. The aspects that make PowerShell so effective - such as easy-to-import modules, access to core APIs, and remote commands - also make it one of the go-to tools for attackers to execute file-less attacks. Living off the land (using native or pre-installed tools to carry out a mission) has grown in popularity, at least partially due to advances in file-based AV systems, such as ML engines that detect never-before-seen attacks.

Fortunately for analysts and defenders, PowerShell commands can be logged and script files can be captured for analysis. This gives us a chance to perform after-action forensics to see what the attacker was up to and if they were successful. Unfortunately, attackers don’t like making this easy, so they will often obfuscate and encode commands to deter and slow down analysts.

Each language will have its own methods, many of which are shared, but for PowerShell some of the most common are:

Some PS variables/names are case insensitive.

The opposite of concatenate. This may or may not be a real word. Let’s say it is.

Typically used for inserting variables into command statements, in this case it is used to jumble string components.

Backticks can be used as a line continuation character and sometimes to signify a special character. But, if you use a backtick in the middle of a variable it “continues” the line to the next characters in the same line.

Converts a string into a command operation.

Whitespace is irrelevant in some operations, so adding it just makes reading harder.

This replaces characters with their ASCII codes.

There are also more complicated obfuscations like Variable Creation and Replacement. This is where an obfuscator defines a random variable as all or part of a string and inserts it in that string's place throughout the file. There are many ways to implement the replacement. Below are a couple of examples:

 

Format Operator: https://ss64.com/ps/syntax-f-operator.html

Input: "{1}PSScriptRoot{0}..{0}PSVersionCompare.psd1" -F '\','$'

Output: $PSScriptRoot\..\PSVersionCompare.psd1

 

Replace Function: https://ss64.com/ps/replace.html

Input: ('pZyPSScriptRoot\Add-LTUser.ps1').replace('pZy','$')

Output: $PSScriptRoot\Add-LTUser.ps1

 

There are more examples that we won’t get into, but it’s a finite list so all these are solvable.

All of these methods and more are made easily accessible with Daniel Bohannon’s Invoke-Obfuscation module. It was our go-to source for all obfuscation and encoding work in our research.

Using Invoke-Obfuscation, we can employ multiple obfuscations at once, for example:

Before:

$packageName = 'kvrt'
$url = 'http://devbuilds.kaspersky-labs.com/devbuilds/KVRT/latest/full/KVRT.exe'
$checksum = '8f1de79beb31f1dbb8b83d14951d71d41bc10668d875531684143b04e271c362'
$checksumType = 'sha256'
$toolsPath = "$(Split-Path -parent $MyInvocation.MyCommand.Definition)"
$installFile = Join-Path $toolsPath "kvrt.exe"
try {
  Get-ChocolateyWebFile -PackageName "$packageName" `
                        -FileFullPath "$installFile" `
                        -Url "$url" `
                        -Checksum "$checksum" `
                        -ChecksumType "$checksumType"

  # create empty sidecars so shimgen only creates one shim
  Set-Content -Path ("$installFile.ignore") `
              -Value $null

  # create batch to start executable
  $batchStart = Join-Path $toolsPath "kvrt.bat"
  'start %~dp0\kvrt.exe -accepteula' | Out-File -FilePath $batchStart -Encoding ASCII
  Install-BinFile "kvrt" "$batchStart"
} catch {
  throw $_.Exception
}
Figure 1: Original PowerShell Script Example

 

After:

${P`ACka`Ge`NAMe} = ("{0}{1}" -f 'kv','rt')
${U`RL} = ("{4}{11}{0}{6}{10}{3}{7}{2}{13}{15}{1}{16}{5}{8}{9}{14}{12}"-f'persky-','il','com/dev','bs','http:/','/','l','.','KVR','T/late','a','/devbuilds.kas','T.exe','b','st/full/KVR','u','ds')
${Check`s`UM} = ("{15}{16}{10}{3}{11}{6}{14}{9}{4}{5}{13}{1}{8}{7}{12}{2}{0}"-f 'c362','84143b0','71','79beb31f1d','1','bc','495','e','4','d71d4','e','bb8b83d1','2','10668d8755316','1','8f','1d')
${C`HE`cksu`m`TYpe} = ("{1}{0}" -f'56','sha2')
${T`Ool`s`PATH} = "$(Split-Path -parent $MyInvocation.MyCommand.Definition) "
${instALL`F`i`Le} = .("{0}{2}{1}{3}" -f'J','-Pa','oin','th') ${tOO`lSP`ATh} ("{0}{1}{2}" -f 'k','vrt.e','xe')
try {
  &("{2}{5}{0}{4}{3}{1}" -f'colateyWe','e','Ge','Fil','b','t-Cho') -PackageName "$packageName" `
                        -FileFullPath "$installFile" `
                        -Url "$url" `
                        -Checksum "$checksum" `
                        -ChecksumType "$checksumType"

  
  &("{2}{3}{0}{1}"-f '-','Content','Se','t') -Path ("$installFile.ignore") `
              -Value ${nu`Ll}

  
  ${B`At`C`HSTart} = &("{0}{2}{1}{3}"-f 'J','i','o','n-Path') ${TOol`s`patH} ("{0}{2}{1}" -f 'k','.bat','vrt')
  ((("{1}{2}{3}{4}{5}{7}{0}{6}"-f'ce','start ','%','~dp0{0','}k','vrt','pteula','.exe -ac'))-f [CHar]92) | .("{0}{1}{2}"-f 'Out-','Fi','le') -FilePath ${BA`T`c`hstARt} -Encoding ("{0}{1}"-f 'AS','CII')
  &("{1}{0}{3}{2}"-f'l-','Instal','nFile','Bi') ("{0}{1}"-f 'k','vrt') "$batchStart"
} catch {
  throw ${_}."E`X`CEPtiOn"
}

Figure 2: PowerShell Script Example after Obfuscation via Invoke-Obfuscation

 

Encoding Techniques

Text can also be converted into other character mapping schemes to further obfuscate it. In this analysis we’ve only concerned ourselves with two schemes: ASCII to hex and ASCII to decimal. For example, ‘A’ would map to ‘41’ in hex and ‘65’ in decimal, and ‘[’ would map to ‘5B’ in hex and ‘91’ in decimal.
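To make the two mappings concrete, here is a minimal Python sketch that reproduces the values quoted above. The function names are made up for illustration; this is not how Invoke-Obfuscation itself is implemented.

```python
def encode_decimal(text):
    # Each character becomes its decimal code point, e.g. 'A' -> 65.
    return [ord(c) for c in text]


def encode_hex(text):
    # Each character becomes its uppercase hex code point, e.g. 'A' -> '41'.
    return [format(ord(c), 'X') for c in text]


print(encode_decimal('A['))  # [65, 91]
print(encode_hex('A['))      # ['41', '5B']
```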

Fully encoding a script in PowerShell requires some additional logic the interpreter can use to decode the text. An example script encoded to a decimal representation is shown below.

You might notice that even the logic used to decode the sequences is obfuscated in this example. Invoke-Obfuscation can really do a number on scripts.

.((gET-varIAble '*MDR*').nAME[3,11,2]-JoiN'')([chAR[]] ( 36,112, 97, 99,107, 97 ,103, 101 , 78 , 97 ,109, 101 ,32 ,61 ,32 , 39 , 107 , 118,114 , 116 ,39 , 10 , 36 ,117 ,114 ,108 , 32 ,61 , 32,39,104 , 116, 116, 112,58,47 , 47 , 100 , 101 ,118, 98, 117 , 105, 108, 100 , 115,46, 107 , 97 , 115,112, 101,114,115, 107, 121,45,108,97, 98 , 115, 46 , 99 , 111 , 109 , 47, 100 ,101, 118 , 98 ,117,105, 108 , 100,115,47 ,75 , 86 ,82, 84,47 ,108 , 97, 116 ,101, 115 ,116,47 ,102 ,117, 108,108,47, 75 , 86,82 , 84 ,46, 101 ,120 ,101, 39, 10 , 36,99 , 104,101 ,99 , 107 ,115,117 , 109, 32 ,61,32, 39 , 56, 102, 49,100 ,101, 55,57 , 98, 101 , 98,51, 49 , 102,49, 100, 98 ,98,56,98 , 56,51,100, 49, 52, 57, 53 ,49,100,55, 49,100,52 , 49 , 98,99,49 ,48 , 54, 54, 56 , 100 , 56, 55, 53 ,53 , 51,49 , 54 , 56 , 52,49, 52 ,51 ,98, 48 , 52 , 101 ,50 , 55 , 49, 99 , 51 ,54, 50, 39 , 10 ,36, 99,104 , 101, 99 ,107, 115 ,117 , 109,84,121, 112, 101,32, 61,32 ,39, 115 , 104 , 97 , 50, 53,54 ,39 , 10 ,36 , 116 ,111 ,111 , 108 , 115, 80 ,97,116 ,104, 32, 61 , 32 ,34, 36,40 , 83 , 112 ,108, 105 , 116 , 45 , 80,97, 116 ,104, 32,45 ,112, 97 , 114 , 101, 110 ,116 ,32 , 36, 77 ,121 ,73 , 110, 118 ,111, 99 ,97, 116 , 105 ,111,110, 46, 77, 121 ,67 , 111,109, 109, 97, 110 , 100 ,46 , 68, 101 ,102,105 ,110, 105 , 116 , 105, 111, 110,41,34 ,10 , 36,105,110 , 115 , 116 , 97 ,108 ,108, 70 , 105, 108 ,101,32, 61 ,32, 74,111 , 105 ,110 ,45 , 80 ,97, 116, 104, 32 ,36, 116 , 111,111,108, 115 , 80 , 97,116,104,32 ,34,107, 118,114, 116 , 46,101,120,101, 34, 10, 116 , 114 , 121, 32,123 ,10 , 32,32, 71 ,101 , 116, 45, 67,104, 111 , 99 , 111 , 108,97, 116,101 , 121, 87, 101, 98, 70,105, 108,101 , 32 ,45 ,80, 97, 99,107,97, 103 , 101,78,97, 109, 101 , 32,34, 36, 112,97 ,99, 107, 97,103 , 101 , 78 , 97 , 109 ,101,34 , 32 , 96,10,32, 32,32 ,32 ,32 ,32,32,32 , 32, 32,32, 32 ,32,32, 32,32 , 32 ,32, 32 ,32 ,32, 32,32,32, 45,70,105 ,108 , 101, 70,117 , 108 ,108 ,80,97,116 ,104, 32, 34, 36,105 ,110 , 115 , 116 , 97,108, 108 , 70 
,105 ,108, 101 ,34, 32,96 ,10 , 32, 32,32 , 32 ,32,32,32 ,32 , 32,32, 32 ,32,32 , 32 , 32,32, 32,32 , 32, 32 ,32 ,32 , 32 ,32 ,45,85 , 114, 108, 32 , 34,36, 117, 114 , 108,34 ,32 ,96,10 , 32,32,32,32,32 , 32 , 32 ,32 ,32 ,32 ,32 ,32,32 , 32, 32,32 , 32,32, 32 ,32 ,32, 32 ,32 , 32,45,67 ,104 ,101, 99 , 107 , 115 , 117, 109 ,32 , 34 , 36 ,99,104,101, 99 , 107,115, 117 , 109 ,34,32, 96,10,32,32, 32 ,32, 32,32 ,32 , 32 , 32 ,32 , 32 , 32,32 , 32, 32, 32 ,32 ,32 , 32, 32 ,32, 32,32, 32,45,67,104 , 101 , 99 , 107,115 ,117 , 109 , 84 , 121 , 112, 101, 32 ,34,36, 99, 104,101, 99,107,115 , 117 , 109, 84, 121,112, 101, 34 , 10, 10, 32 ,32 , 35 , 32, 99 ,114 ,101 , 97 ,116 ,101 , 32 ,101,109, 112,116 , 121,32, 115 ,105 ,100 , 101 ,99 ,97, 114 ,115 ,32 ,115 ,111 ,32, 115 ,104 ,105 ,109,103 , 101 , 110 ,32,111 ,110,108 , 121,32,99 ,114 ,101 ,97 , 116 , 101, 115 , 32, 111 ,110 , 101 , 32,115 , 104 ,105 ,109, 10,32 , 32,83 , 101 , 116 , 45 ,67,111, 110 ,116 ,101,110 , 116, 32 ,45 ,80 , 97, 116 ,104, 32, 40 ,34,36,105,110, 115,116,97, 108,108, 70 ,105, 108,101 , 46 , 105 ,103, 110,111 , 114 , 101 , 34 ,41, 32, 96 , 10 , 32,32 , 32, 32,32 , 32,32 , 32, 32 ,32, 32,32,32 , 32 , 45,86,97, 108 ,117 ,101 , 32 , 36 , 110 , 117 , 108, 108 , 10, 10 ,32, 32, 35,32 , 99 , 114,101 , 97 ,116 ,101 , 32 , 98 ,97 ,116 , 99,104 , 32 ,116 , 111 ,32,115, 116, 97,114, 116, 32 , 101, 120 , 101 ,99 , 117 , 116, 97 ,98 , 108 ,101, 10,32 , 32 , 36 ,98 ,97, 116 , 99 , 104, 83,116, 97 ,114 , 116 ,32,61, 32 ,74,111,105, 110, 45, 80, 97, 116 , 104,32, 36 , 116 , 111,111,108,115, 80,97 , 116,104,32 , 34 ,107 , 118, 114,116 , 46 , 98 ,97 , 116 , 34 ,10,32, 32 , 39,115 ,116,97 , 114, 116,32,37, 126,100, 112, 48 ,92, 107,118 ,114, 116, 46 , 101, 120,101 , 32 , 45 ,97, 99, 99 ,101 , 112 , 116 ,101 ,117 ,108, 97 ,39 , 32,124, 32 ,79 , 117, 116, 45,70, 105 , 108, 101 , 32 , 45, 70 ,105 ,108 ,101 ,80, 97 ,116,104, 32 ,36,98, 97, 116,99 ,104, 83 ,116, 97 ,114 , 116 ,32, 45 , 69, 110 , 99 ,111 , 100 , 105,110, 103 ,32 
,65 , 83,67, 73,73, 10 ,32, 32 ,73,110, 115, 116 ,97 , 108,108 , 45 , 66,105 , 110,70,105, 108 , 101 , 32,34 , 107,118 ,114, 116 , 34 , 32 ,34,36, 98 , 97 , 116, 99 , 104, 83 , 116 ,97, 114,116 ,34,10, 125 ,32 ,99,97 , 116,99,104 ,32, 123,10 , 32 , 32 ,116 , 104, 114 , 111 , 119 ,32 , 36 , 95, 46, 69 ,120 , 99, 101 ,112 ,116 ,105 ,111 , 110 , 10,125 )-jOIN'') 
Figure 3: PowerShell Script Example after Encoding via Invoke-Obfuscation

 

How Do We Deobfuscate?

To solve this problem, we created a series of operations to tackle each of the issues presented.

First, we gathered data and built a classifier to determine if a sample is encoded, obfuscated, or plain text. Samples can be both obfuscated and encoded, so we’ll need to reuse this classifier to make sure our final product is complete. Then we iteratively applied decoding and deobfuscation logic, while checking the output of each application to see if more work is required. Finally, we implemented a cleanup neural network, a new approach to deobfuscation, to fix some of the odd bits of obfuscation that can’t be handled by simple logic alone.

Figure 4: Deobfuscation Logic Flow
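The iterative part of that flow could be sketched roughly as follows. This is only an outline under stated assumptions: the function name, the status labels, and the toy stand-in helpers are all hypothetical, and the real classifier is the trained model described in the next sections.

```python
def run_pipeline(text, classify, decode, deobfuscate, max_rounds=10):
    """Repeat classify -> fix until the sample looks like plain text."""
    for _ in range(max_rounds):
        status = classify(text)
        if status == "plain":
            break
        fixed = decode(text) if status == "encoded" else deobfuscate(text)
        if fixed == text:  # no progress, so stop rather than loop forever
            break
        text = fixed
    return text


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_classify(text):
    if text.startswith("ENC:"):
        return "encoded"
    return "obfuscated" if "`" in text else "plain"


def toy_decode(text):
    return text[len("ENC:"):]


def toy_deobfuscate(text):
    return text.replace("`", "")


print(run_pipeline("ENC:${U`RL}", toy_classify, toy_decode, toy_deobfuscate))
```

Re-running the classifier after every step is what lets a sample that is both encoded and obfuscated be peeled back one layer at a time.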

 

Find What We're Working With

Our first task is building something that can determine if a sample is encoded, obfuscated, or plain text. To that end, we built a machine learning classifier to automatically make that determination.

 

Building a Status Classifier

The typical machine learning approach for building and training classifiers is to:

  1. Gather lots of samples with labels (e.g. hex encoded, obfuscated, plain text)
  2. Generate numerical features on those samples
  3. Train using your selected algorithm

Figure 5: Classifier Steps

 

Often the hardest part of building a classifier is getting the samples and labels. For this problem, some solutions for finding samples include downloading from file sharing services or scraping GitHub. Luckily, after we have a corpus of PowerShell script samples, we can generate the obfuscated and encoded samples on demand with Invoke-Obfuscation!

Next up is generating features for our samples. Text can be a little tricky for a classifier. The classical machine learning approach (e.g., train a logistic regression model) is to hand define and generate summary statistics and other relevant features of your sample, such as:

  • # of characters
  • # of vowels
  • Entropy
  • # of ` marks
  • # of numbers
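As a rough illustration, the hand-defined features listed above might be computed like this. The function name and dictionary keys are invented for the example, entropy is taken to mean character-level Shannon entropy, and "# of numbers" is interpreted here as a count of digit characters; both are assumptions.

```python
import math
from collections import Counter


def summary_features(text):
    total = len(text)
    counts = Counter(text)
    # Shannon entropy in bits per character; defined as 0.0 for empty input.
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return {
        "num_chars": total,
        "num_vowels": sum(1 for c in text.lower() if c in "aeiou"),
        "entropy": entropy,
        "num_backticks": text.count("`"),
        "num_digits": sum(1 for c in text if c.isdigit()),
    }


print(summary_features("${U`RL} = 'kvrt1'"))
```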

However, these sorts of features often do not express the relationship between characters well.

Instead, we're going to use a type of neural network called an LSTM.

An LSTM (long short-term memory) network is a specialized RNN (recurrent neural network). These networks are useful because they retain a memory of previous states and use that in combination with the current input to determine the next state. Here is a good explanatory blog on what LSTMs are and how they operate, or some of our own previous research into building an LSTM to detect domain generation algorithms.

Figure 6: LSTM Diagram

 

Getting started with neural networks can seem a little bit intimidating; however, high-level frameworks make the initial application very easy.

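from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense

# Hold out 20% of the labeled samples for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
model = Sequential()
# Map each character index to a dense vector, then summarize the sequence with an LSTM.
model.add(Embedding(num_encoder_tokens, embedding_vector_length, input_length=sample_len))
model.add(LSTM(100))
model.add(Dropout(0.2))
# One output unit per class label.
model.add(Dense(len(classes), activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=epochs, batch_size=64)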

In under ten lines we can take our input data, create a simple network, and train it.
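The snippet above assumes the samples have already been turned into fixed-length integer sequences. One possible way to prepare them is sketched below. The helper names are hypothetical, and reserving index 0 for both padding and unseen characters is a simplifying assumption rather than a description of our actual preprocessing.

```python
def build_vocab(samples):
    # Assign each distinct character an index starting at 1; 0 is kept for padding.
    chars = sorted({c for sample in samples for c in sample})
    return {c: i + 1 for i, c in enumerate(chars)}


def encode_sample(text, vocab, sample_len):
    # Truncate to sample_len, map characters to indices, then right-pad with zeros.
    ids = [vocab.get(c, 0) for c in text[:sample_len]]
    return ids + [0] * (sample_len - len(ids))


samples = ["ab", "bc"]
vocab = build_vocab(samples)
num_encoder_tokens = len(vocab) + 1  # +1 for the padding index
print(encode_sample("cab", vocab, 5))  # [3, 1, 2, 0, 0]
```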

 

Decoding

Figure 7: PowerShell Script Example after Encoding via Invoke-Obfuscation

 

Decoding can be a relatively straightforward process if you have the encoding mappings and know when to apply the logic. This is exactly what the PowerShell interpreter does, and reimplementing that is a valid approach for the sample above.

However, I'm first and foremost a data scientist, and I see a pattern here. You know what's good for patterns? Regex!

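import re

# One to three digits followed by a comma, space, or closing parenthesis.
ascii_char_reg = r'([0-9]{1,3})[, \)]+'
ascii_chars = re.findall(ascii_char_reg, file_text)
# Convert each decimal code back to its character.
chars = [chr(int(ac)) for ac in ascii_chars]
file_text = ''.join(chars)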

With a Regex-based solution we create a decoder in just a few lines of code. It is also robust to the encoder logic obfuscation at the beginning and end of the sample, and it can work outside of a PowerShell script, so it is generalizable.
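As a quick usage sketch, the same regex can be wrapped in a function and run on a short, hand-made sample in the style of the encoded script. The function name and the sample string are illustrative only.

```python
import re

ASCII_CHAR_REG = r'([0-9]{1,3})[, \)]+'


def decode_decimal(file_text):
    # Pull out every short digit run and turn it back into a character.
    codes = re.findall(ASCII_CHAR_REG, file_text)
    return ''.join(chr(int(code)) for code in codes)


sample = "([chAR[]] ( 36,117 ,114 , 108 )-jOIN'')"
print(decode_decimal(sample))  # $url
```

One thing to keep in mind is that any other short digit run followed by a comma, space, or closing parenthesis will match as well, so a stricter pattern may be worth considering for samples whose wrapper logic contains such numbers.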

 

Deobfuscation

The bulk of the deobfuscation can be handled by simple logic: concatenating strings, removing backticks, replacing variables, etc.

Some of these transformations are easy:

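import re

def remove_ticks(line):
    # Strip backticks everywhere except the final character, so a trailing
    # line-continuation backtick is preserved. Empty lines pass through.
    if not line:
        return line
    return line[:-1].replace('`', '') + line[-1]

def splatting(line):
    # Replace &("command") or &('command') with the bare command text.
    splat_reg = r"""(&\( *['"]{1}(.+)?['"]{1} *?\))"""
    matches = re.findall(splat_reg, line)
    for match in matches:
        line = line.replace(match[0], match[1])
    return line

def string_by_assign(line):
    # Turn a [string]value cast into a quoted string literal.
    match_reg = r'(?:(\[[sS][tT][rR][iI][nN][gG]\])([\[\]A-Za-z0-9]+)[\)\,\.]+)'
    matches = re.findall(match_reg, line)
    for match in matches:
        replace_str = match[0] + match[1]
        line = line.replace(replace_str, "'" + match[1] + "'")
    return line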

 

Some get a little more complicated. For reordering of strings based on '-f', the format operator, we:

  1. Do char by char string processing to find either '-f' or '-F'
  2. Find all the {[0-9]+} type placeholders before the '-f'
  3. Find all the strings and valid non-string values after it
  4. Replace the placeholders with the values
  5. Iterate, since you can do this multiple times in the same line
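A simplified sketch of those steps is shown below. It deliberately swaps the char-by-char scan for a regex, and it only handles single-quoted string arguments (not non-string values such as [CHar]92), so treat it as an illustration of the idea rather than our actual implementation. The names are hypothetical.

```python
import re

# A double-quoted template, the -f operator, then comma-separated single-quoted strings.
FORMAT_RE = re.compile(
    r'''"((?:[^"{}]|\{\d+\})*)"\s*-f\s*((?:'[^']*'\s*,\s*)*'[^']*')''',
    re.IGNORECASE,
)


def resolve_format_operator(line):
    def substitute(match):
        template, raw_args = match.group(1), match.group(2)
        args = re.findall(r"'([^']*)'", raw_args)
        try:
            # Swap each {n} placeholder for the n-th argument.
            joined = re.sub(r"\{(\d+)\}", lambda m: args[int(m.group(1))], template)
        except IndexError:
            return match.group(0)  # placeholder without a value: leave untouched
        return "'" + joined + "'"

    # Iterate until nothing changes, since a line can contain several -f expressions.
    previous = None
    while previous != line:
        previous = line
        line = FORMAT_RE.sub(substitute, line)
    return line


print(resolve_format_operator('''("{2}{1}{0}{3}" -f 'a','rtr','ai','n.com')'''))
# ('airtran.com')
```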

It’s a little tedious, and there are multiple ways to do the same thing. But the list of techniques is finite, so it is definitely a solvable problem even if we didn't enumerate every solution in our implementation.

After integrating all these deobfuscation techniques and applying them sequentially, we can see how well our code works.

Before:

param
(
    [Parameter(MANdAtORy=${FA`L`SE})] ${dO`m`AiN} = ("{2}{1}{0}{3}" -f 'a','rtr','ai','n.com'),
    [Parameter(MandatOrY=${tr`UE})]  ${Sr`NUM`BER},
    [Parameter(mAnDATORY=${F`AL`SE})] ${targET`p`Ath} = ("{10}{11}{1}{2}{9}{14}{3}{12}{5}{7}{4}{0}{8}{13}{6}" -f'=a','=Airtr','a','ir',',DC','a','C=com','n','i','n','OU=Disab','led,OU','tr','rtran,D',' Users,OU=A'),
    [Parameter(ManDAtOrY=${T`RUe})]  ${us`er}
)

if (&("{2}{1}{0}"-f'Path','est-','T') ${US`eR})
{
    ${USER`li`sT} = &("{0}{2}{3}{1}" -f'Ge','nt','t-','Conte') -Path ${u`SEr}
}
else
{
    ${usER`L`ISt} = ${Us`Er}
}

${c`oNT`AIneR} = ("{3}{11}{4}{8}{5}{0}{7}{10}{6}{2}{1}{9}" -f'ir','irtran,',',DC=a','OU','a',',OU=A','an','tran Users,OU=Air','bled','DC=com','tr','=Dis')
${D`eS`CrIP`TIon} = ('Term'+'ina'+'ted '+'per'+' '+"$SrNumber")

foreach (${uS`eR} in ${U`S`E`RList})
{
    .("{2}{0}{1}" -f'et','-ADUser','S') -Identity ${Us`ER} -Server ${D`OM`AIN} -Enabled ${FA`LsE} -Description ${D`eSCrI`P`TION}
    ${UsE`RHan`dlE} = &("{2}{0}{1}"-f'U','ser','Get-AD') -Identity ${us`eR} -Server ${Do`M`AiN}
    &("{3}{1}{2}{0}" -f't','je','c','Move-ADOb') -Identity ${uSe`Rh`AnD`Le} -Server ${doM`A`In} -TargetPath ${C`O`Nt`Ainer}
}
Figure 8: Obfuscated Sample

 

After:

param
(
    [Parameter(MANdAtORy=${FALSE})] ${dOmAiN} = "airtran.com",
    [Parameter(MandatOrY=${trUE})]  ${SrNUMBER},
    [Parameter(mAnDATORY=${FALSE})] ${targETpAth} = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com",
    [Parameter(ManDAtOrY=${TRUe})]  ${user}
)
if ("Test-Path" ${USeR})
{
    ${USERlisT} = "Get-Content" -Path ${uSEr}
}
else
{
    ${usERLISt} = ${UsEr}
}
${coNTAIneR} = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com"
${DeSCrIPTIon} = ('Terminated per $SrNumber")
foreach (${uSeR} in ${USERList})
{
    "Set-ADUser" -Identity ${UsER} -Server ${DOMAIN} -Enabled ${FALsE} -Description ${DeSCrIPTION}
    ${UsERHandlE} = "Get-ADUser" -Identity ${useR} -Server ${DoMAiN}
    "Move-ADObject" -Identity ${uSeRhAnDLe} -Server ${doMAIn} -TargetPath ${CONtAiner}
}
Figure 9: Partially Deobfuscated Sample

 

That’s not too bad! But a few errors remain, and most share a pattern and look like: (MAndatoRy=${fAlSe})] ${dOMAiN}

This randomized casing is a different type of problem than we saw before. It makes the text harder to read, but not by applying PowerShell functions as obfuscation. While all of the previous techniques we’ve discussed could be run backwards to get the original input, randomized casing cannot. For this, we need a different technique.

 

Reversing an Irreversible Function

This is where things get interesting. To take this one step further we’re going to use a neural network to learn, and sometimes memorize, what variables are supposed to look like.

If you were presented with the example:

MOdULEDiRectORy

based on your knowledge of English and programming, you could probably figure out a configuration of casing that makes sense. Perhaps one of these:

ModuleDirectory

moduleDirectory

moduledirectory

To mimic this cognition, we are going to train a Seq2Seq network. Seq2Seq stands for sequence to sequence. It’s a type of network (or networks) that is often used in machine translation.

Seq2Seq uses LSTMs (see our text classifier earlier) to create an encoder network to transform the starting text, and a decoder network to use the encoder output and the decoder memory. Combining these we are able to feed the input character by character and predict the output. Keras has a nice blog explaining how to create and train one of these networks. Our code generally follows their example.

We initially tried to use this network to translate entire lines. Since a Seq2Seq network builds the output character by character based on input characters and the last predicted output characters we can see how the test progresses along with the input. It started out well:

Input:

Becomes (looking good):

Then (starting to error):

Then finally (way off the rails):

Once bad predictions start, they can get out of hand.

To deal with bad predictions, we constrained the problem and picked "words" in each line to consider.

  1. Find the corresponding words in the obfuscated and non-obfuscated files
  2. Grab most variables and keywords that can be random case obfuscated
  3. Use the obfuscated word as input and the non-obfuscated word as desired output
  4. Predict the next char using the previous predictions and new input data
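To give a feel for what the word-level training pairs look like, here is a small sketch that builds (input, target) pairs by randomizing the case of known-good words. This is a simplification: it generates the random casing directly instead of aligning words between obfuscated and original files, and the function names are hypothetical.

```python
import random


def random_case(word, rng):
    # Flip each character to upper or lower case with equal probability.
    return ''.join(c.upper() if rng.random() < 0.5 else c.lower() for c in word)


def make_training_pairs(words, copies=3, seed=0):
    # Each pair is (randomly cased input, original word as the desired output).
    rng = random.Random(seed)
    return [(random_case(w, rng), w) for w in words for _ in range(copies)]


for obf, clean in make_training_pairs(["ModuleDirectory", "Get-Content"]):
    print(obf, "->", clean)
```

Because many different casings collapse onto the same target, the network has to learn which casing is plausible instead of inverting a function.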

The retrained network had some fun quirks:

But in general performed quite nicely:

 

Putting It All Together

Now that we have a File Status Classifier, a Decoder, a Deobfuscator, and a Cleanup Network, we’re ready to package it all together into one function and test it out.

Again, our steps are as follows:

Figure 10: Deobfuscation Logic Flow

 

Let’s start with a non obfuscated file:

param
(
    [Parameter(Mandatory=$false)] $Domain = 'airtran.com',
    [Parameter(Mandatory=$true)]  $SrNumber,
    [Parameter(Mandatory=$false)] $TargetPath = 'OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com',
    [Parameter(Mandatory=$true)]  $User
)

if (Test-Path $User)
{
    $UserList = Get-Content -Path $User
}
else
{
    $UserList = $User
}

$Container = 'OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com'
$Description = "Terminated per $SrNumber"

foreach ($User in $UserList)
{
    Set-ADUser -Identity $User -Server $Domain -Enabled $false -Description $Description
    $UserHandle = Get-ADUser -Identity $User -Server $Domain
    Move-ADObject -Identity $UserHandle -Server $Domain -TargetPath $Container
}
Figure 11: Original Sample

 

We obfuscate it using a random set of techniques:

param
(
    [Parameter(MANdAtORy=${FA`L`SE})] ${dO`m`AiN} = ("{2}{1}{0}{3}" -f 'a','rtr','ai','n.com'),
    [Parameter(MandatOrY=${tr`UE})]  ${Sr`NUM`BER},
    [Parameter(mAnDATORY=${F`AL`SE})] ${targET`p`Ath} = ("{10}{11}{1}{2}{9}{14}{3}{12}{5}{7}{4}{0}{8}{13}{6}" -f'=a','=Airtr','a','ir',',DC','a','C=com','n','i','n','OU=Disab','led,OU','tr','rtran,D',' Users,OU=A'),
    [Parameter(ManDAtOrY=${T`RUe})]  ${us`er}
)

if (&("{2}{1}{0}"-f'Path','est-','T') ${US`eR})
{
    ${USER`li`sT} = &("{0}{2}{3}{1}" -f'Ge','nt','t-','Conte') -Path ${u`SEr}
}
else
{
    ${usER`L`ISt} = ${Us`Er}
}

${c`oNT`AIneR} = ("{3}{11}{4}{8}{5}{0}{7}{10}{6}{2}{1}{9}" -f'ir','irtran,',',DC=a','OU','a',',OU=A','an','tran Users,OU=Air','bled','DC=com','tr','=Dis')
${D`eS`CrIP`TIon} = ('Term'+'ina'+'ted '+'per'+' '+"$SrNumber")

foreach (${uS`eR} in ${U`S`E`RList})
{
    .("{2}{0}{1}" -f'et','-ADUser','S') -Identity ${Us`ER} -Server ${D`OM`AIN} -Enabled ${FA`LsE} -Description ${D`eSCrI`P`TION}
    ${UsE`RHan`dlE} = &("{2}{0}{1}"-f'U','ser','Get-AD') -Identity ${us`eR} -Server ${Do`M`AiN}
    &("{3}{1}{2}{0}" -f't','je','c','Move-ADOb') -Identity ${uSe`Rh`AnD`Le} -Server ${doM`A`In} -TargetPath ${C`O`Nt`Ainer}
}
Figure 12: Obfuscated Sample

 

And then encode it:

InVoKe-eXPreSsION ( [STRInG]::join('' , (( 13,10,112 , 97 , 114 ,97 , 109 , 13,10 ,40,13,10,32, 32 ,32 ,32,91 ,80 ,97 ,114 ,97 , 109 , 101 ,116 ,101 ,114,40 , 77, 65, 110 , 100, 97 ,116,111,82, 121,61, 36,123 ,102 , 65 ,96, 108 ,96,83 , 101 , 125,41,93,32, 36 ,123, 100,79,77,96,65 ,96,105,78,125 , 32 ,61 ,32 , 40 ,34 ,123,51, 125,123 ,50,125 , 123,49, 125 , 123, 48,125 ,34 ,45,102,39 ,109, 39, 44 ,39, 97,110 ,46 ,99,111, 39,44 ,39 , 114,39,44, 39 ,97 , 105,114, 116 ,39,41, 44 ,13 , 10, 32, 32, 32, 32,91 , 80 ,97, 114 ,97,109 ,101,116 , 101, 114,40 ,77, 97,78,68 ,65, 84, 111 , 114, 89 , 61 , 36 ,123 ,116 ,82 ,96 , 85, 101 , 125 , 41 ,93 , 32 ,32, 36,123, 83 , 96 , 82 , 96 ,78 , 85 , 96 ,109, 66 ,101,114,125 ,44 , 13,10, 32,32 ,32,32 , 91,80 ,97 ,114,97 ,109 , 101,116, 101 , 114, 40 , 109, 97 ,110 , 100,97,84,79 , 114 ,121,61, 36 , 123 ,70, 65 ,108 , 96,115 ,69,125, 41 , 93 ,32 ,36,123 , 116,65 ,114, 96 ,71 ,69 , 96,84,96,112, 65 , 116 ,72, 125 ,32 , 61 ,32 , 40 , 34 , 123 , 48, 125, 123, 56 ,125 , 123 , 50 ,125 ,123 ,55 ,125,123 ,49 , 51 , 125 , 123,57 ,125,123, 52, 125, 123,51 ,125 ,123 , 49 ,50 ,125, 123 ,49, 49,125,123,49 , 48 ,125, 123 , 54 ,125 , 123 ,53, 125, 123 ,49 ,125,34, 32,45 , 102 , 32,39 ,79 , 85 ,39 , 44 ,39 ,68,67, 61, 99, 111, 109,39, 44 ,39,100,44, 79, 85, 39 , 44, 39 ,85 ,39, 44,39 , 115, 101,114 , 115 ,44 ,79 , 39,44, 39, 116 ,114 , 97, 110 , 44 , 39 , 44 ,39,105,114 , 39 , 44,39,61, 65 ,39, 44, 39, 61 , 68,105,115 , 97 ,98, 108 ,101, 39, 44 , 39,114 , 97,110 ,32, 85,39, 44 ,39 ,61, 97 ,39 , 44 , 39,65 , 105, 114 , 116 ,114,97, 110, 44 , 68 , 67 ,39 , 44 , 39, 61, 39 , 44,39, 105 , 114 ,116 ,39 ,41,44,13 ,10 ,32, 32,32 ,32 , 91 , 80 , 97 ,114,97,109,101 , 116, 101,114, 40,109,97, 78,100 , 97 , 84 , 111 , 82, 121 , 61 ,36, 123 ,84 ,96, 82,117 , 101 ,125,41, 93 , 32 , 32, 36 , 123,85,96, 115 ,69 ,82, 125 ,13, 10 ,41 ,13 ,10, 13 , 10,105, 102, 32 ,40 ,38 , 40,34 ,123 , 50, 125 ,123,49 , 125, 123, 48 , 125 ,34,32, 45 , 102 ,39,116 , 104, 39,44 ,39 
,80 ,97,39,44 ,39 ,84, 101,115, 116 ,45 , 39 , 41, 32 , 36 , 123, 85 ,83 ,96 ,69,114 , 125 ,41,13 ,10 , 123,13 , 10 ,32 , 32,32,32,36 , 123 ,85,96, 115,69 , 114 , 108, 96 ,105 , 83 , 116 ,125, 32 ,61 ,32 , 38, 40 , 34 ,123,50 , 125,123, 48 ,125, 123 , 49 ,125, 34 , 45 ,102 ,32, 39 , 101, 116, 45,67 ,111, 110 , 39 ,44,39 , 116 , 101, 110 ,116,39 , 44,39 , 71,39,41, 32 , 45,80 ,97, 116,104 , 32, 36 , 123,117,96,83 , 69 ,82, 125 , 13,10 , 125,13, 10 ,101 , 108 ,115 , 101 , 13 ,10 , 123,13 ,10 ,32,32 , 32 , 32, 36 , 123 , 117,96 ,115, 96 , 101 , 82 , 76 , 96, 105,115, 84 ,125,32 ,61,32, 36, 123,85 ,96 ,83, 101 ,114,125 , 13,10 , 125 , 13,10,13, 10, 36 , 123 , 99 ,79,96, 78 ,116,97 , 73,110 , 96 , 69 ,82, 125 , 32,61 , 32 , 40 ,34 ,123 ,51,125 , 123, 52 , 125, 123, 48 , 125 , 123 , 54 ,125, 123 ,49,48 , 125,123,57 , 125,123 ,56 ,125,123,49 , 125 , 123,55, 125 , 123,49 , 50 ,125 , 123 ,49 ,49, 125,123 ,50 ,125,123 ,53,125 , 34 ,45 ,102, 39,101, 100 ,39 ,44 ,39 , 115 ,44, 79, 85, 61,65 ,105 , 114, 39 , 44 ,39 , 110 , 44, 39,44 , 39 ,79 ,85, 61 , 68 , 105,115 , 39 , 44,39, 97 ,98, 108 ,39 ,44, 39 ,68 , 67 ,61, 99 , 111,109 ,39 , 44 , 39,44 , 79,85, 39, 44 ,39 ,116,114 , 39 ,44, 39, 114 , 39,44,39,97 ,110 , 32 ,85, 115,101, 39 ,44,39,61, 65 , 105 , 114 , 116, 114 ,39 ,44 , 39, 97, 39 , 44 ,39,97, 110, 44, 68 ,67, 61,97,105 ,114 ,116 ,114 ,39, 41 ,13, 10 , 36 , 123,100 ,69,115,96,67 ,114 , 73, 96, 112,84 , 105 ,96 ,111,78, 125 , 32, 61 ,32 , 40,39 ,84,101,114 ,109 , 105,110 ,97, 39,43 , 39 , 116 , 39 , 43 , 39 ,101, 39 ,43, 39, 100 , 32,39, 43 , 39,112,39,43 ,39 ,101, 114 ,32 ,39 , 43 , 34 ,36, 83, 114,78,117 , 109 , 98 ,101, 114 , 34,41,13 ,10 , 13,10,102 , 111 , 114 ,101 , 97,99 , 104 , 32, 40 ,36 , 123 ,117, 83 ,96,101,114 ,125,32,105 , 110,32,36, 123 , 117,96 , 83 ,69, 114 ,96, 76, 96 , 105, 115 ,84 ,125,41, 13 , 10 ,123 ,13 , 10, 32, 32, 32 , 32,46 ,40, 34 ,123 , 49 ,125, 123 , 51, 125,123 ,48,125,123, 50, 125,34, 45 ,102, 32 , 39 , 101,39,44,39,83 ,39,44,39 ,114 ,39 
,44,39 , 101 ,116 , 45, 65,68, 85 , 115,39,41, 32 , 45,73,100, 101 , 110 ,116, 105 , 116 , 121 , 32 ,36 ,123, 117,96, 83 ,69 , 114, 125 ,32 ,45 , 83 , 101 ,114, 118 , 101 ,114 ,32 ,36, 123 ,68 , 79, 77, 96 ,65, 105 ,110 , 125 ,32, 45, 69 ,110 , 97 ,98,108 , 101,100,32 ,36, 123 ,70 , 65 ,96,76,96 , 115, 69 , 125, 32 ,45 , 68 ,101 ,115 , 99, 114, 105, 112, 116, 105 , 111,110 , 32,36, 123 ,68 , 101,96,115 , 99,82 ,73,96 , 112 , 116 ,96, 73 , 111, 110 , 125, 13,10,32 ,32 , 32 , 32 , 36,123,85,83,96,69 , 114,96, 72, 65 , 78, 68, 96 , 108 ,101,125,32,61, 32 ,46,40 , 34,123,49,125 , 123,50 , 125 , 123, 48,125 , 34 , 45 , 102, 39, 101,114, 39, 44, 39 ,71 ,101, 116,45, 65, 68 ,85, 39,44 ,39,115 , 39,41,32, 45 , 73 , 100 ,101 , 110 ,116 , 105 , 116,121 ,32,36 ,123, 117 , 96 , 115, 101 ,82 ,125, 32 , 45 ,83, 101,114 , 118,101, 114 , 32,36, 123,68,79 ,109 , 96,65,96 ,73,110 ,125 ,13 , 10, 32,32 , 32 , 32 ,38 , 40,34,123,51,125 ,123, 49,125,123,48 ,125 ,123 , 50 , 125,34 ,32 ,45 ,102, 32 ,39, 106, 101 ,39 , 44 ,39 ,68, 79 ,98 ,39 ,44 , 39 , 99,116 ,39 ,44, 39,77,111 , 118, 101 , 45 , 65,39 , 41,32 , 45, 73 , 100,101, 110,116,105,116 ,121, 32,36 ,123,117 ,115, 96 , 101 ,96,82 , 104,97 ,96 ,78 , 100 ,76 , 69 ,125 , 32 ,45 ,83 , 101 ,114 , 118, 101 , 114 ,32, 36 ,123 , 100, 111 , 96,77 ,65,96, 73,78 ,125,32,45 ,84 ,97,114, 103 , 101, 116,80 ,97,116 , 104, 32 ,36,123 ,99 , 79, 96,78,116 , 97,73 , 96 ,78,96,69,82,125 ,13 ,10 ,125 , 13,13 ,10 )| foreACH{ ( [CHAr][iNt]$_) }) ))
Figure 13: Obfuscated and Encoded Sample

 

Now we can run it through our system. This returns two outputs: 1) A partially fixed version that does everything but the cleanup network; 2) A fully fixed version that includes the cleanup network. This is because the cleanup network is still very much experimental and might produce unintended output.

 

Partially Fixed:

param
(
    [Parameter(MAndatoRy=${fAlSe})] ${dOMAiN} = "airtran.com",
    [Parameter(MaNDATorY=${tRUe})]  ${SRNUmBer},
    [Parameter(mandaTOry=${FAlsE})] ${tArGETpAtH} = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com",
    [Parameter(maNdaToRy=${TRue})]  ${UsER}
)
if ("Test-Path" ${USEr})
{
    ${UsErliSt} = "Get-Content" -Path ${uSER}
}
else
{
    ${useRLisT} = ${USer}
}
${cONtaInER} = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com"
${dEsCrIpTioN} = ('Terminated per $SrNumber")
foreach (${uSer} in ${uSErLisT})
{
    "Set-ADUser" -Identity ${uSEr} -Server ${DOMAin} -Enabled ${FALsE} -Description ${DescRIptIon}
    ${USErHANDle} = "Get-ADUser" -Identity ${useR} -Server ${DOmAIn}
    "Move-ADObject" -Identity ${useRhaNdLE} -Server ${doMAIN} -TargetPath ${cONtaINER}
}
Figure 14: Partially Deobfuscated Sample

 

Fully Fixed:

param
(
    [Parameter(Mandatory=$false)] $domain = "airtran.com",
    [Parameter(Mandatory=$true)]  $srnUmber,
    [Parameter(Mandatory=$false)] $targetPath = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com",
    [Parameter(Mandatory=$true)]  $User
)
if (Test-Path $user)
{
    $userList = Get-Content -Path $User
}
else
{
    $userList = $user
}
$container = "OU=Disabled,OU=Airtran Users,OU=Airtran,DC=airtran,DC=com"
$Description = "Terminated per $SRNumber"
foreach ($user in $UserList)
{
    Set-ADUser -Identity $user -Server $domain -Enabled $false -Description $Description
    ${USErHANDle} = Get-ADUser -Identity $User -Server $domain
    Move-ADObject -Identity ${useRhaNdLE} -Server $domain -TargetPath $container
}
Figure 15: Fully Deobfuscated Sample

 

Conclusion

Not too shabby. We were able to obfuscate, encode, and then fix a PowerShell script file. This final output is not yet executable, but with a little work we can get there. Deobfuscation is a hard but not insurmountable challenge. Following the basic steps of collecting data, intelligently cleaning it, and applying ML techniques where appropriate allows us to reliably solve burdensome tasks to improve our workflow. With a little perseverance and helpful math, we can put the toothpaste back in the tube.

Article Link: https://www.endgame.com/blog/technical-blog/deobfuscating-powershell-putting-toothpaste-back-tube