Ebfuscation: Abusing system errors for binary obfuscation

MalBot · April 12, 2020, 7:30pm

Introduction

In this post I'm going to try to explain a new obfuscation technique I've come up with (at least I have not seen it before; if there is documentation about it, I would be grateful to receive it :D). First of all, I should clarify that I am not an expert in obfuscation techniques, so some of the terms I use may not be used correctly.

Software obfuscation is related to both computer security and cryptography. It is also closely related to steganography, a field that studies how to transfer secrets stealthily.

Unlike encryption, where a key is needed to discover the secret, obfuscation does not guarantee that your secret stays protected forever. What obfuscation gives you is a way to make the secret harder to reach, buying the time you need to benefit from it.

There are many applications of obfuscation, both for people who are dedicated to doing good and for people who are dedicated to doing evil.

In the case of video games, it helps keep the game from being cracked during the first weeks after its release (when video game companies make most of their earnings).

In the case of malware, it increases the time it takes a reverse engineer to understand the behavior of the malware, which means that it can infect more computers before protection measures are developed for it.

Basically, an obfuscator receives a program as input, applies a series of transformations to it, and returns another program that has the same functionality as the input program.

Here are some of the transformations that can be applied:
- Virtualization
- Control Flow Flattening
- Encode literals
- Opaque predicates
- Encode Arithmetic
...

This technique doesn't claim to be the best of all obfuscation techniques. It's just a fun way I've come up with to obfuscate data.

What is Ebfuscation?

Ebfuscation is a technique that can be used to implement different transformations, such as literal encoding, control flow flattening and virtualization. The technique is based on system errors. To understand what "based on system errors" means, let's look at an example.
The following example is based on the Encode literals transformation. (At the end of this post there is a proof of concept: an obfuscator for C programs that applies the Ebfuscation technique to strings.)

Given the following C program:

#include <stdio.h>
#include <string.h>

int
check_input(void)
{
    char input[101] = { 0 };
    char * passwd = "password123";

    printf("Enter a password: ");
    fgets(input, 50, stdin);

    if (strncmp(input, passwd, strnlen(passwd, 11)) == 0)
    {
        return 1;
    }
    return 0;
}

int
main(int argc, char *argv[])
{
    if (check_input() == 1)
    {
        char * valid_pass = "Well done!";
        printf("%s\n", valid_pass);
    } else
    {
        char * invalid_pass = "Try again!";
        printf("%s\n", invalid_pass);
    }

    return 0;
}

We want to protect the literal stored in the variable passwd ("password123"). To do this, we treat each character as a byte and generate the code needed to make the system raise the error whose code equals that byte. Let's see an example for the character "p" from "password123".

"p" -> 112 -> generate_error_112()

The function generate_error_112() is an abstraction: its implementation differs depending on the system on which you want to generate system error 112.

For example, here is how to generate system error code 3.

On Linux

#include <signal.h>

/* 0x03 == ESRCH == No such process */
void generate_error_3(void) {
    kill(-9999, 0);   /* assumes that no process group with ID 9999 exists */
}

On Windows

/* 0x03 == ERROR_PATH_NOT_FOUND */ 
void generate_error_3(void) {                         
    CreateFile(              
        "C:\\Non\\Existent\\Directory\\By\\D00RT",
        GENERIC_READ | GENERIC_WRITE, 0, NULL,
        OPEN_EXISTING, 0, NULL
    );                     
}

After applying the transformation, the previous program is equivalent to the following one:

int
check_input(void)
{
    char input[101] = { 0 };
    char passwd[12];              /* 11 characters plus the terminating 0 */

    generate_error_112();         /*p*/
    passwd[0] = get_last_error();
    generate_error_97();          /*a*/
    passwd[1] = get_last_error();
    generate_error_115();         /*s*/
    passwd[2] = get_last_error();
    generate_error_115();         /*s*/
    passwd[3] = get_last_error();
    generate_error_119();         /*w*/
    passwd[4] = get_last_error();
    generate_error_111();         /*o*/
    passwd[5] = get_last_error();
    generate_error_114();         /*r*/
    passwd[6] = get_last_error();
    generate_error_100();         /*d*/
    passwd[7] = get_last_error();
    generate_error_49();          /*1*/
    passwd[8] = get_last_error();
    generate_error_50();          /*2*/
    passwd[9] = get_last_error();
    generate_error_51();          /*3*/
    passwd[10] = get_last_error();
    passwd[11] = 0;

    printf("Enter a password: ");
    fgets(input, 50, stdin);

    if (strncmp(input, passwd, strnlen(passwd, 11)) == 0)
    {
        return 1;
    }
    return 0;
}

int
main(int argc, char *argv[])
{
    if (check_input() == 1)
    {
        char * valid_pass = "Well done!";
        printf("%s\n", valid_pass);
    } else
    {
        char * invalid_pass = "Try again!";
        printf("%s\n", invalid_pass);
    }

    return 0;
}

NOTE: The get_last_error function also depends on the system. On Windows, the function GetLastError() retrieves the last error that occurred; on Linux you can read the global variable errno instead.
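A minimal sketch of how get_last_error and one generator could be wrapped for both platforms. The helper names follow the ones used above; the preprocessor split, the asserts and the process-group ID are assumptions made for illustration, not the code that the tool emits.

```c
#include <assert.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#endif

/* Return the last error code reported by the operating system. */
static int get_last_error(void)
{
#ifdef _WIN32
    return (int)GetLastError();
#else
    return errno;
#endif
}

/* Make the system report error 3 and check that the call really failed. */
static void generate_error_3(void)
{
#ifdef _WIN32
    /* 3 == ERROR_PATH_NOT_FOUND: the directory is assumed not to exist. */
    HANDLE h = CreateFileA("C:\\Non\\Existent\\Directory\\By\\D00RT",
                           GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
    assert(h == INVALID_HANDLE_VALUE);
#else
    /* 3 == ESRCH: assumes that no process group has this (huge) ID. */
    int rc = kill(-2000000000, 0);
    assert(rc == -1);
#endif
}
```

The error should be read right after the failing call: errno is only meaningful when the call reported a failure, and any later library call may overwrite it.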

The following sections cover some deeper aspects of this technique, such as its pros and cons and the basic engine that produces the transformation.

History

This idea came up 2-3 years ago, when I started programming in C for Windows systems. As with all beginnings, I kept getting errors when calling Windows API functions. At that time I was also analyzing malware families that obscured their strings, so I thought it could be a good idea to hide my own strings using the system errors that my shitty code was producing.

The idea of being able to create something meaningful out of a series of mistakes fascinated me. This meant something profound to me, as it is a metaphor for my life, in which after many mistakes I finally get things to work the way I want them to.

And it somehow represents all those people who have been rejected for not having the right knowledge, at the right time, but who struggle every day to improve and achieve their dreams.

The art of turning mistakes into success.

Although the idea came up years ago, I didn't start implementing it until recently, since I didn't really know where to start. Finally, after some months of thinking about how to do it, informing myself, learning and reading, I think I've found a way to make it as clear as possible.

I have a draft of the obfuscator written in Python, but taking advantage of the quarantine I have decided to rewrite it in Rust, to reinforce the little knowledge I have of this language.

How does it work?

Here is a brief explanation of what a minimal ebfuscator engine looks like.

Analyzer: The analyzer goes through the code of the provided program in order to find the parts you want to obfuscate. In the case of literals (the one I have implemented; the tool is available on my GitHub), the analyzer looks for string literals in the source code so that each byte can be converted into errors.
Error tokenizer: This part is the key. It receives a byte as a parameter. Its task is simple: it needs to transform that byte into its corresponding system error, based on the system errors available for the target platform.
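For the analyzer step, a minimal sketch of locating string literals in C source could look like this. The function name is hypothetical and the scanner is deliberately simplified (it does not skip comments or character literals), so it should be read as an illustration rather than as the tool's actual analyzer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical scanner: find the next double-quoted string literal in `src`,
 * starting at offset `from` (which must not be past the end of `src`).
 * On success, store the offset of the first character inside the quotes in
 * `*start`, the offset of the closing quote in `*end`, and return 1.
 * Return 0 when no complete literal is left.
 */
static int next_string_literal(const char *src, size_t from,
                               size_t *start, size_t *end)
{
    assert(src != NULL && start != NULL && end != NULL);

    size_t i = from;
    while (src[i] != '\0' && src[i] != '"')
        i++;                        /* look for the opening quote */
    if (src[i] == '\0')
        return 0;

    size_t first = ++i;
    while (src[i] != '\0' && src[i] != '"') {
        if (src[i] == '\\' && src[i + 1] != '\0')
            i++;                    /* skip the escaped character, e.g. \" */
        i++;
    }
    if (src[i] == '\0')
        return 0;                   /* unterminated literal */

    *start = first;
    *end = i;
    return 1;
}
```

Each byte between the two offsets would then be handed to the error tokenizer.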

In the best scenario, we would be able to map every possible value of a byte to a system error.

byte 0 -> generate_error_0()
byte 1 -> generate_error_1()
byte 2 -> generate_error_2()
..
byte 253 -> generate_error_253()
byte 254 -> generate_error_254()

But this is not always possible. For example, on Linux there are only about 130 system errors, numbered from 1 upward. Even if we were able to generate every one of them (which is not possible due to limitations such as the hardware), how would we get the value 200 when no such error exists? Easy: by combining errors. For example, you can add two values. If you know how to generate error 100, you can do the following:

generate_error_100();
int aux_1 = get_last_error();
generate_error_100();
int aux_2 = get_last_error();

int value_200 = aux_1 + aux_2;

or you can also do the following:

generate_error_100();
int aux = get_last_error();
int value_200 = aux * 2;
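As a concrete Linux sketch of the same idea, the byte 65 ('A') can be built from two errors that are easy to trigger: EISDIR (21) three times plus ENOENT (2). The helper names and the paths are assumptions made for illustration, and the numeric values are the Linux ones.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/* 21 == EISDIR on Linux: a directory cannot be opened for writing. */
static int error_eisdir(void)
{
    int fd = open("/", O_WRONLY);
    assert(fd == -1);
    return errno;
}

/* 2 == ENOENT on Linux: the path is assumed not to exist. */
static int error_enoent(void)
{
    int fd = open("/ebfuscation/no/such/file", O_RDONLY);
    assert(fd == -1);
    return errno;
}

/* 65 == 21 + 21 + 21 + 2, so no 'A' literal appears in the code. */
static char build_byte_65(void)
{
    int value = 0;
    for (int i = 0; i < 3; i++)
        value += error_eisdir();
    value += error_enoent();
    return (char)value;
}
```

Calling build_byte_65() on a Linux system yields the character 'A' without that value being stored anywhere in the program.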

So the result depends on the strategy you implement, based on the errors available for the target platform. For me, at least, this was the most challenging part.
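One possible strategy, sketched here, is to treat the available error codes like coin denominations and look for the shortest sum that yields the target byte. The name plan_byte and the sample set of codes are hypothetical, and this is not necessarily how the author's own tokenizer works.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TERMS 256

/*
 * Hypothetical planner: express `target` (0..255) as a sum of values taken
 * from `codes` (repetition allowed), using as few terms as possible.
 * Writes the chosen codes to `out` and returns how many were used,
 * or -1 if the target cannot be built by addition alone.
 */
static int plan_byte(unsigned target, const int *codes, size_t n_codes,
                     int out[MAX_TERMS])
{
    int best[256];   /* best[v] = fewest terms summing to v, -1 if none */
    int last[256];   /* last[v] = code used as the final term for v */

    assert(target <= 255);
    best[0] = 0;
    last[0] = 0;
    for (unsigned v = 1; v <= target; v++) {
        best[v] = -1;
        last[v] = 0;
        for (size_t i = 0; i < n_codes; i++) {
            int c = codes[i];
            if (c <= 0 || (unsigned)c > v || best[v - (unsigned)c] < 0)
                continue;
            if (best[v] < 0 || best[v - (unsigned)c] + 1 < best[v]) {
                best[v] = best[v - (unsigned)c] + 1;
                last[v] = c;
            }
        }
    }
    if (best[target] < 0)
        return -1;

    /* Walk back from the target, collecting one code per step. */
    int count = 0;
    for (unsigned v = target; v > 0; v -= (unsigned)last[v])
        out[count++] = last[v];
    return count;
}
```

With only addition, some small values may be unreachable (for instance 1 when the smallest available code is 2), so a fuller strategy would also need subtraction, multiplication or similar operations.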

There are other limitations too: some errors are too difficult to obtain, some would put the system running the obfuscated program at risk, and some do not depend on our program at all.

For example, ERROR_NO_SPOOL_SPACE on Windows, which is error number 62 and means "Space to store the file waiting to be printed is not available on the server." I can't imagine how to generate this error. So yeah... generating an error is now an art.

In future posts I'll explain how I implemented my error tokenizer, which lets you build an ebfuscator with only a few error codes implemented. Of course there are many ways to implement it, and probably all of them are better than mine.

Pros & Cons

This obfuscation was created for fun. Here are some of the pros and cons I have found; there are probably more on both sides.

Pros

- The encoded secret is never stored in the binary, since the value is masked by the operating system.
- It's almost impossible to deobfuscate statically, since the errors depend on the system and on the computer where the obfuscated program is executed. This means that you can tie the errors to specific characteristics of a system, such as the username, the number of processors, folder names, installed programs...
- It creates a lot of code that is tedious for an analyst to go through and can slow down the reverse engineering process considerably.
- It can break the graph view of some tools such as IDA Pro, which shows the message "Sorry, this node is too big to display" (this is not caused by the obfuscation itself; it is related to how the error tokenizer is implemented, which adds enough overhead to trigger this kind of error/warning).
- It can break the decompiler feature of some tools, such as the decompiler used by IDA, which shows the message "Decompilation failure: call analysis failed" (again, this is not caused by the obfuscation itself; it is related to how the error tokenizer is implemented, which adds enough overhead to trigger this kind of error/warning).
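To illustrate the environment-dependent point in the list above, here is a hedged Linux sketch in which the recovered value depends on whether a given directory exists on the machine. The function name and the idea of keying on a directory are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical environment-keyed generator (Linux errno values).
 * Opening `dir` for writing fails with EISDIR (21) when it is an existing
 * directory and with ENOENT (2) when nothing exists at that path, so the
 * recovered number is only the intended one on a machine that has `dir`.
 */
static int error_keyed_on_directory(const char *dir)
{
    assert(dir != NULL);

    int fd = open(dir, O_WRONLY);
    if (fd != -1) {      /* a writable non-directory: no error was raised */
        close(fd);
        return 0;
    }
    return errno;
}
```

An analyst who only has the binary cannot tell which of the two values the author intended without knowing whether the directory exists on the target machine.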

Cons

- It produces a lot of overhead. For each byte to obfuscate, it adds at least one function, which can contain many instructions and system API calls.
- You have to implement the error-generating code for each platform you want to support.
- It depends on the system's errors. If the definition of these errors ever changes on the system, you will need to update your code.
- The technique is easy to detect using heuristics.

Proof of Concept - String literals ebfuscator

I have implemented a first proof of concept that uses this technique, and it is available on my GitHub. I called it Ebfuscator. It is written in Rust, and for the moment I have only published the compiled binary of Ebfuscator, which lets you obfuscate string literals in a given C program. It supports both Windows and Linux.
Not many errors are implemented yet, but you can easily add more. You only need to define the function in {ebfuscator_folder}/errors/{platform}/errors.c and declare it in {ebfuscator_folder}/errors/{platform}/errors.h. Ebfuscator will then automatically use that error as well to obfuscate the bytes.

For more information about the tool please read the README file in the repository.

In this example I will obfuscate the program for Linux, so the command line is the following:

./ebfuscator --platform linux --source ./examples/crackme_test.c -V passwd

This command takes the program crackme_test.c and obfuscates the variable passwd using this technique. The output for this command is the following:

An ./output directory is created, where you can find the obfuscated crackme_test.c together with the errors.c and errors.h files needed to compile the program. Compiling the program:

gcc -o target.bin ./output/ebfuscated.c ./output/errors.c -lm

Now you can see how the program runs as expected

The following images show the before and after: the original code and the obfuscated code, both compiled and opened in IDA Pro.

In the above image you can see how much the basic block grows in size.

From 18 lines to 381. Now the password doesn't appear in the binary and is masked behind the system errors.

Please feel free to open a Pull Request with new errors :)

Finally, I would like to encourage people to implement other transformations, such as control flow flattening, using this technique.

Article Link: https://www.d00rt.eus/2020/04/ebfuscation-abusing-system-errors-for.html