The increasing popularity of Go as a language for malware development is forcing more reverse engineers to come to terms with the perceived difficulties of analyzing these gargantuan binaries. The language offers great benefits for malware developers: portability of statically-linked dependencies, speed of simple concurrency, and ease of cross-compilation. On the other hand, for analysts, it’s meant learning the inadequacies of our tooling and contending with a foreign programming paradigm. While our tooling has generally improved, the perception that Go binaries are difficult to analyze remains. In an attempt to further dispel that myth, we’ve set out to share a series of scripts that simplify the task of analyzing Go binaries using IDA Pro with a friendly methodology. Our hope is that members of the community will feel inspired to share additional resources to bolster our collective analysis powers.
A Quick Intro to the Woes of Go Binary Analysis
Go binaries present multiple peculiarities that make our lives a little harder. The most obvious is their size. Due to the approach of statically-linking dependencies, the simplest Go binary is multiple megabytes in size and one with proper functionality can figure in the 15-20mb range. The binaries are then easily stripped of debug symbols and can be UPX packed to mask their size quite effectively. That bulky size entails a maze of standard code to confuse reverse engineers down long unproductive rabbit holes, steering them away from the sparse user-generated code that implements the actual functionality.
Hello World source code Mach-o binary == 2.0mbUPX compressed == 1.1mb
To make things worse, Go doesn’t null-terminate strings. The linker places strings in incremental order of length and functions load these strings with a reference to a fixed length. That’s a much safer implementation but it means that even a cursory glance at a Go binary means dealing with giant blobs of unrelated strings.
“Hello World!” string lost in a sea of unrelated strings.Even the better disassemblers and decompilers tend to display these unparsed string blobs confusing their intended purpose.
Autoanalysis output | Manually fixed disassembly |
That’s without getting into the complexities of recovering structures, accurately portraying function types, interfaces, and channels, or properly tracing argument references.
The issue of arguments should prove disturbing for our dynamic analyst friends who were hoping that a debugger would spare them the trouble of manual analysis. While a debugger certainly helps determine arguments at runtime, it’s important to understand how indirect that runtime is. Go is peculiar in allocating a runtime stack owned by the caller function that will in turn handle arguments and allow for multiple return values. For us it translates into a mess of runtime function prologues before any meaningful functionality. Good luck navigating that without symbols.
Improving the Go Reversing Experience
With all those inadvertent obstacles in mind, it’s perfectly understandable that reversers dreaded analyzing Go binaries. However, the situation has changed in the past few years and we should reassess our collective abhorrence of Go malware. Different disassemblers and decompilers have stepped up their Go support. BinaryNinja and Cerbero have improved their native support for Go and there are now standalone frameworks like GoRE that offer good functionality depending on the Go compiler version and can even help support other projects like radare2.
We’ll be focusing on IDA Pro. With the advent of version 7.6, IDA’s native support of Go binaries is drastically better and we’ll be building features on top of this. For folks stuck on v7.5 or lower, we’ll also provide some help by way of scripts but the full functionality of AlphaGolang is unlocked with 7.6 due to some missing APIs (specifically the ida_dirtree API for the new folder tree view). This isn’t the first time brave souls in the community have attempted to improve the Go reversing experience for IDA. However, those projects were largely monolithic labors of love by their original authors and given the fickle nature of IDA’s APIs, once those authors got busy, the scripts fell into disrepair. We hope to improve that as well.
AlphaGolang
With AlphaGolang, we wanted to tackle two disparate problems simultaneously– the brittleness of IDApython scripts for Go reversing and the need for clear steps to analyze these binaries.
We can’t expect everyone to be an expert in Go in order to understand their way around a binary. Less so to have to fix the tooling they’re attempting to rely on. While engineers might be tempted to address the former by elaborating a complex framework, we figured we’d swing in the opposite direction and break up the requisite functionality into smaller scripts. Those smaller digestible scripts allow us to part out a relatable methodology in steps so that analysts can pick and choose what they need as they advance in their reversing journey.
Additionally, we hope the simplicity of the project and a forthrightness about its current limitations will inspire others to contribute new steps, fixes, and additional functionality.
At this time, the first five steps are as follows–
Step 0: Identifying Go Binaries
Go Build ID RegexBy popular request, we are including a simple YARA rule to help identify Go binaries. This is a quick and scrappy solution that checks for PE, ELF, and Mach-O file headers along with a regex for a single Go Build ID string.
Step 1: Recreate pcln Table
Recreating the Go pcln tableDealing with stripped Golang binaries is particularly awful. Reversers unfamiliar with Go are disheartened to see that despite having all of the original function names within the binary, their disassembler is unable to connect those symbols as labels for their functions. While that might seem like a grand coup for malware developers attempting to confuse and frustrate analysts, it isn’t more than a temporary inconvenience. Despite being stripped, we have enough information available to reconstruct the Go pcln table and provide the disassembler with the information it needs.
Two noteworthy points before continuing:
- The pcln table is documented as early as Go v1.12 and has been modified as of v1.16.
- IDA Pro v7.6+ handles Go binaries very well and is unlikely to need this step. This script will prove particularly valuable for folks using IDA v7.5 and under when dealing with stripped Go binaries.
Depending on the file’s endianness, the script will locate the pcln table’s magic header, walk the table converting the data to DWORDs (or QWORDs depending on the bitness), and create a new segment with the appropriate ‘.gopclntab’ header effectively undoing the stripping process.
Step 2: Discover Missing Functions and Restore Function Names
Function discovery and renamingImmediately after recreating the pcln table (or in unfortunate cases where automatic disassembly fails), function names are not automatically assigned and many functions may have eluded discovery. We are going to fix both of these issues in one simple go.
We know that the pcln table is pointing us at every function in the binary, even if IDA hasn’t recognized them. This script will walk the pcln table, check the offsets therein for an existing function, and instruct IDA to define a function wherever one is missing. The resulting number of functions can be drastically greater even in cases where disassembly seems perfect.
Additionally, we’ll borrow Tim Strazzere’s magic from the original Golang Loader Assist in order to label all of our functions. We ported the plugin to Python 3, made it compatible with the new IDA APIs, and refactored it. The functionality is now part of two separate steps (2 and 4) and easier to maintain. In this step, Strazzere’s magic will help us associate all of our functions with their original names in an IDA friendly format.
Step 3: Surface User-Generated Functions
Automatically categorizing functionsHaving recovered and labeled all of our functions, we are now faced with the daunting proposition of sifting through thousands of functions. Most of these functions are part of Go’s standard packages or perhaps functionality imported from GitHub repositories. To belabor a metaphor, we now have street signs but no map. How do we fix this?
IDA v7.5 introduced the concept of Folder Views, an easily missed Godsend for the anal-retentive reverser. This feature has to be explicitly turned on via a right-click on the desired subview –whether functions, structures, imports, etc. IDA v7.6 takes this a step further by introducing a thus-far undocumented API to interact with these folder views (A heartfelt thank you to Milan Bohacek for his help in effectively wielding this API). That should enable us to automate some extraordinary time-saving functionality.
While our malware may have 5,000 functions, the majority of those functions were not written by the malware developers, they’re publicly documented, and we need nothing more than a nominal overview to know what they do.
NOBELIUM GoldMax (a.k.a. SunShuttle)
94c58c7fb43153658eaa9409fc78d8741d3c388d3b8d4296361867fe45d5fa45
By being clever about categorizing function packages, we can actually whittle down to a fraction of the overall functions that merit analyst time. Functions will be categorized by their package prefixes and further grouped as ‘Standard Go Packages’, unlabeled (‘sub_’), uncategorized (no package prefix), and Github imports. What remains are the ‘main’ package and any custom packages added by the malware developers. For a notable example, the NOBELIUM GoldMax (a.k.a. SunShuttle) malware can be reduced from a hulking 4,771 functions to a mere 22. This is the simplest and perhaps the most valuable step towards our goal of understanding the malware’s functionality.
Step 4: Fix String References
Accurately recasting strings by referenceFinally, there’s the issue of strings in Go. Unlike all C-derivative languages, strings in Go are not null terminated. Neither are they grouped together based on their source or functionality. Rather, the linker places all of the strings in order of incremental length, with no obvious demarcation as to where one string ends and the next begins. This works because whenever a function references a string, it does so by loading the string address and a hardcoded length. While this is a safer paradigm for handling strings, it makes for an unpleasant reversing experience.
So how do we overcome this hurdle? Here’s where another piece of Strazzere’s Golang Loader Assist can help us. His original plugin would check functions for certain string loading patterns and use these as a guide to fix the string assignments. We have once again (partially) refactored this functionality, made it compatible with Python 3, and IDA’s new APIs. We have also improved some of the logic for the string blobs in place (either by suspicious length or because a reference points to the middle of a blob) and added some sanity checks.
While this step is already a marked improvement, we are seeing new string loading patterns introduced with Go v1.17 that need adding and there’s definitely room for improved refactoring. We hope some of you will feel inclined to contribute.
Where Do We Go From Here?
Let’s take a step back and look at where we are after all of these steps. We have an IDB with all functions discovered, labeled, and categorized, and hopefully all of their string references are correctly annotated. This is an ideal we could seldom dream of with malware of a comparable size written in C or C++ without extensive analysis time and prior expertise.
Now that we have a clear view of the user-generated functionality, what more can we do? How else can we improve our hunting and analysis efforts?
The following are a series of ideas we’d recommend implementing in further steps–
- How about auto-generating YARA rules for user-generated functions and their referred strings?
- Want a better understanding of arguments as they’re passed between functions? How about automagically setting breakpoints at the runtime stack prologues and annotating the arguments back to our IDB?
- For reversers that follow the Kwiatkowski school of rewriting the code to understand the program’s functionality, how about selectively exporting the Hex-Rays pseudocode solely for the user-generated functions?
- How about reconstructing interfaces and type structs from runtime objects?
Got more ideas? Want to help? Head to our SentinelLabs GitHub repo for all of the scripts and contribute your own shortcuts and superpowers for analyzing Go malware!
We’d like to thank the following for their direct and indirect contributions–
- Tim Strazzere for his original Golang Loader Assist script, which we refactored and made compatible with Python3 and the new IDA APIs.
- Milan Bohacek (Avast Software s.r.o.) for his invaluable help figuring out the idatree API.
- Joakim Kennedy (Intezer)
- Ivan Kwiatkowski (Kaspersky GReAT) for making his Go reversing course available.
- Igor Kuznetsov (Kaspersky GReAT) for his help understanding newer pcln tab versions.
References
Guides and Documentation
- Go v1.2 PCLN Table documentation
- Dorka Palotav’s guide to reversing Go with Ghidra
- Osiris Lab @ NYU Tandon – Deepdive into Reverse Engineering Go (Pt1, Pt2)
- George Zaytsev, ZeroNights 2016 (PDF)
Videos
- Tim Strazzere’s talk on the original Golang Loader Assist
- Ivan Kwiatkowski’s Reversing in Action (Pt1, Pt2)
- Alex Useche’s Anatomy of a Gopher
Tools
- Golang Loader Assist (python v2.7, defunct)
- IDA Golang Helper
- GoRE and redress
Article Link: AlphaGolang | A Step-by-Step Go Malware Reversing Methodology for IDA Pro - SentinelOne