A deeper look at Office documents flat style

Over the last few years I have seen some samples that use the flat XML format of Word documents with base64-encoded ActiveMime data.

What started this was a recent Twitter post by HunterMaor @bit_dam Here, where he was not able to get the final payload to download.

Let’s take a closer look at this one.

Xml-1

At the top we can see it is an xml format.

Xml-2

The next thing we notice is the base64 string with a name of “editdata.mso”. Not all samples I’ve looked at use that name, but nearly all of them do.
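If you would rather pull that blob out with a script than scroll through the file, a minimal Python sketch could look like the one below. It assumes the data sits in a `<w:binData>` element carrying a `w:name` attribute, as in Word 2003 XML; the function name `extract_bin_data` and the sample input are made up for illustration, and this is not the tool used for the screenshots.

```python
import base64
import re

# Assumption: the blob is stored as <w:binData w:name="...">base64</w:binData>.
BIN_DATA_RE = re.compile(
    r'<w:binData\b[^>]*\bw:name="([^"]+)"[^>]*>(.*?)</w:binData>',
    re.DOTALL,
)


def extract_bin_data(xml_text: str) -> dict[str, bytes]:
    """Return {name: decoded bytes} for every named binData element."""
    blobs = {}
    for name, b64_text in BIN_DATA_RE.findall(xml_text):
        # The base64 text is usually line-wrapped, so drop all whitespace first.
        compact = "".join(b64_text.split())
        blobs[name] = base64.b64decode(compact)
    return blobs


if __name__ == "__main__":
    # Tiny synthetic document, not taken from the sample.
    sample = (
        '<w:docSuppData><w:binData w:name="editdata.mso">'
        "QWN0aXZl\nTWltZQ==</w:binData></w:docSuppData>"
    )
    for name, blob in extract_bin_data(sample).items():
        print(name, len(blob), "bytes")
```

A regular expression is used instead of a full XML parser only because these files can be very large and are not always well formed.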

Finally we scroll all the way to the end just past the base64 encoded picture file that gets displayed.

Xml-3

If we look closely, we can see what appears to be an XML-encoded HTML/HTA page here.

Using a couple of tools I wrote to help get information out of large xml files and to handle Word xml/html encoding, we can get a cleaned up version.
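My own tools are not shown here, but the entity-decoding part of that step can be approximated with the Python standard library. This is only a sketch under the assumption that the script is stored with ordinary XML/HTML entities such as `&lt;` and `&quot;`; `decode_entities` is a hypothetical helper name.

```python
import html


def decode_entities(encoded: str) -> str:
    """Turn XML/HTML entities back into the characters they stand for."""
    return html.unescape(encoded)


if __name__ == "__main__":
    # Synthetic input, not copied from the sample.
    print(decode_entities("&lt;script language=&quot;javascript&quot;&gt;"))
```

If a sample has been entity-encoded more than once, the function can simply be applied again until the text stops changing.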

ExtractedScript

html-1

If we clean the output up, we can see we do have an HTML page.

html-2

If we scroll down a bit we see that the base64 string at the top will be split on the string “aGkh”.

html-3

If we split on that string, it gives us 2 base64 strings and then the script control string.
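The split itself is a one-liner in most languages. Here is a small Python sketch of it; `split_blob` is a hypothetical helper, and the input in the demo is synthetic rather than the real string from the sample.

```python
MARKER = "aGkh"  # separator string seen in the script


def split_blob(blob: str, marker: str = MARKER) -> list[str]:
    """Split the combined string on the marker, dropping empty pieces."""
    return [piece for piece in blob.split(marker) if piece]


if __name__ == "__main__":
    # Synthetic stand-in for the long string at the top of the page.
    demo = "QUJD" + MARKER + "REVG" + MARKER + "tail"
    for index, piece in enumerate(split_blob(demo)):
        print(index, piece)
```

As a side note, `aGkh` is itself valid base64 and decodes to `hi!`.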

Here we will concentrate on the larger base64 string.

html-4

If we base64 decode it, we can see that the output is reversed. When we flip it around, we can see we have a downloader script.
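Decoding and flipping can be done in one small step. The sketch below shows the idea in Python; `decode_reversed` is a hypothetical name, and the demo round-trips a harmless made-up string rather than the real script.

```python
import base64


def decode_reversed(b64_text: str) -> str:
    """Base64 decode, then reverse the bytes to recover the script text."""
    raw = base64.b64decode(b64_text)
    # errors="replace" keeps going even if a byte is not valid UTF-8.
    return raw[::-1].decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Build a synthetic example the same way: reverse first, then encode.
    script = "var x = 1;"
    encoded = base64.b64encode(script[::-1].encode("utf-8")).decode("ascii")
    print(decode_reversed(encoded))
```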

html-5

If we “JS” format this, we can see it will call out to a long url, and if it gets a response of 200 it will write a file with a jpg extension to the path.

html-6

If we look at the smaller base64 decoded string, we find that the file is not really a jpg but a file being loaded by “regsvr32”.

ActiveMime-1

Going back to the “editdata.mso” base64 string: once we base64 decode it to hex (bytes), we see the header tells us this is an ActiveMime file.

ActiveMime-2

If we look at the bytes at offsets 0x32 and 0x33, those are the 2 bytes of a Zlib header.

You can extract from here down and then use a Zlib library to decompress this part, or skip those 2 bytes and use a .NET decompress function to decompress the rest to a new byte array.
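Both routes can be sketched with Python's `zlib` module. The offset 0x32 is what this sample shows, so treat it as an assumption for other files; the function names are hypothetical, and the raw-deflate variant simply mirrors the "skip the 2 bytes" approach rather than reproducing any .NET code.

```python
import zlib

ACTIVEMIME_MAGIC = b"ActiveMime"
ZLIB_OFFSET = 0x32  # where the 2-byte Zlib header sits in this sample


def decompress_activemime(data: bytes, offset: int = ZLIB_OFFSET) -> bytes:
    """Decompress the Zlib stream embedded in an ActiveMime blob."""
    if not data.startswith(ACTIVEMIME_MAGIC):
        raise ValueError("not an ActiveMime blob")
    # decompressobj() tolerates trailing bytes after the compressed stream.
    return zlib.decompressobj().decompress(data[offset:])


def decompress_activemime_raw(data: bytes, offset: int = ZLIB_OFFSET) -> bytes:
    """Skip the 2-byte Zlib header and inflate the raw deflate data."""
    if not data.startswith(ACTIVEMIME_MAGIC):
        raise ValueError("not an ActiveMime blob")
    # Negative wbits tells zlib to expect raw deflate with no header.
    raw_inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    return raw_inflater.decompress(data[offset + 2:])


if __name__ == "__main__":
    # Synthetic blob: magic, zero padding up to 0x32, then a Zlib stream.
    payload = b"example payload " * 8
    padding = b"\x00" * (ZLIB_OFFSET - len(ACTIVEMIME_MAGIC))
    blob = ACTIVEMIME_MAGIC + padding + zlib.compress(payload)
    print(decompress_activemime(blob) == decompress_activemime_raw(blob))
```

If the offset differs in another sample, it can be passed in through the `offset` parameter after checking the bytes in a hex view.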

ActiveMime-3

ActiveMime-4

After we decompress that, we have a “DOCFILE” file that we can open with 7-Zip to get a folder view.
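A quick sanity check on the decompressed bytes is to look at the first 8 bytes. The sketch below assumes the “DOCFILE” here is an OLE compound file, whose signature `D0 CF 11 E0 A1 B1 1A E1` reads a bit like “DOCFILE0” in hex; `is_ole_compound_file` is a hypothetical helper name.

```python
# Signature at the start of an OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(data: bytes) -> bool:
    """Return True if the data starts with the OLE compound file signature."""
    return data[:8] == OLE_MAGIC


if __name__ == "__main__":
    # Synthetic header only, not a complete file.
    print(is_ole_compound_file(OLE_MAGIC + b"\x00" * 16))
```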

ActiveMime-5

So in this case our ActiveMime data is a VBA project.

ActiveMime-6

And we have 3 scripts in it.

ActiveMime-7

Here are the 3 combined scripts extracted. We can see they will build and run an HTA.

That brings us to the next section of finding more samples.

Looking for more samples in my repository requires me to use yara to parse the files, due to the size of my repository now.

Cleaning up my original test rule leaves me with this to find all files with the base64 encoded string “ActiveMime”.

Yara-1

This rule found 28 files in my repository. The bulk of them were older Emotet samples that did not use the script at the end like this one did. They just used highly obfuscated VBA.

Note: This rule is very broad and will catch anything that uses the base64 encoded “MimeType”, not just what we are looking for with these samples.
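For anyone without yara at hand, the same broad search for the base64 encoded “ActiveMime” string can be roughed out in Python. This is a stand-in for illustration, not the rule in the screenshot; the helper names are hypothetical. Only the first 9 bytes are encoded so that the search string does not depend on whatever byte follows “ActiveMime” in the blob.

```python
import base64
from pathlib import Path

# base64 works on 3-byte groups, so "ActiveMim" (9 bytes) gives a prefix
# that stays the same no matter which byte follows the final "e".
NEEDLE = base64.b64encode(b"ActiveMim")


def contains_activemime_b64(data: bytes) -> bool:
    """Return True if the base64 form of the ActiveMime header is present."""
    return NEEDLE in data


def find_activemime_files(root: Path) -> list[Path]:
    """Return files under root whose contents include the base64 prefix."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Each file is read fully into memory, fine for a sketch only.
        if path.is_file() and contains_activemime_b64(path.read_bytes()):
            hits.append(path)
    return hits


if __name__ == "__main__":
    for hit in find_activemime_files(Path(".")):
        print(hit)
```

Like the rule, this is very broad: it only says the string is there, not what kind of document uses it.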

I also ran the rule against Hybrid Analysis. At last check it had 3 pages of found files. All of the files I checked were of this new type, though.

The newer versions began being logged on HA in January 2021.

Let’s take a look at a different version that is using obfuscation to hide the script better.

This sample can be found Here on app.any.run.

This starts out the same as the other sample.

Doc-1

Doc-2

Doc-3

But when we get down to where the xml script was in the first sample we see something different.

Doc-4

We can still extract the data from the xml the same way. It is still encoded, but how?

Doc-5PNG

After several years of experience, we can make an educated guess at how this is encoded without even looking at the VBA code.

If we scroll all of the way to the bottom, we can see the string “tqdkj” right after the “>”, and if we highlight every instance of it we can see a pattern beginning to emerge in between.

This is just a builder artifact that will exist no matter which characters are used.

So we now know that it is just a simple string replacement (removal).
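In code, that removal is a single replace call. Here is a minimal Python sketch, with `strip_filler` as a hypothetical helper name and a made-up input string; the filler “tqdkj” is specific to this sample, so other samples will need their own value.

```python
def strip_filler(text: str, filler: str = "tqdkj") -> str:
    """Remove every occurrence of the filler string from the text."""
    return text.replace(filler, "")


if __name__ == "__main__":
    # Synthetic input built the same way the builder appears to do it.
    print(strip_filler("vtqdkjatqdkjr tqdkjx"))
```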

Doc-6

Doc-7

Now we have the script.

Doc-8

Base64 decode the first string to get the url it is calling out to.

Doc-9

Now we can see where it is calling out to.

Now that we did this the “Hard Way”, let’s take a look at what it looks like in Word.

HiddenText

Here we can see something is there that won’t show up unless you are able to highlight it.

font

Here we see the font color is white and the size is 1.

Font-2

Once we increase the font size and change the color, we can better see what is happening.

Just copy and paste the text and then do the deobfuscation to get the script.

While trying to find any sample in a sandbox that was able to download the picture file (dll), I found some other information on the url.

(Note: none of the sandboxes were able to download the dll. I checked @HybridAnalysis, @anyrun_app, and @hatching_io tria.ge.)

We find this is TA551, as tagged in this report Here.

Report-1

Multiple samples also mapped to the urls tagged as TA551, which leads me to believe that this format/builder belongs to them.

That is pretty much it for this one.

Further Reading:

Link to a 2015 SANS ISC Diary
Link to a 2015 Trustwave Post
Link to an “Insecure” archive page with links to other articles, including the Trustwave one.

Links From Post:

Link to first document
Link to second document
Link to IOC database.

Article Link: A deeper look at Office documents flat style | PC's Xcetra Support