This entry is about how to make the best use of IDA and Hex-Rays with regard to a common scenario in malware analysis, namely, dynamic lookup of APIs via
GetProcAddress (and/or import resolution via hash). I have been tempted to write this blog entry several times; in fact, I uploaded the original code for this entry exactly one year ago today. The problem that the script solves is simple: given the name of an API function, retrieve the proper type signature from IDA's type libraries. This makes it easier for the analyst to apply the proper types to the decompilation, which massively aids readability and presentability. No more manually looking up and copying/pasting API type definitions, or ignoring the problem due to its tedious solution; just get the information directly from the IDA SDK. Here is a link to the script.
Hex-Rays v7.4 introduced special handling for
GetProcAddress. We can see the difference -- several of them, actually -- in the following two screenshots. The first comes from Hex-Rays 7.1:
<img alt="HR71.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622591294913-ECKXWT79SLSJJ3Y4P68W/ke17ZwdGBToddI8pDm48kElwKFYL0YQT9O99vXDNcLRZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIpv12JV5_HqNSUeAh1x1m4KZIHAvXLcE2eqIiFRuUsBo/HR71.png?format=1000w" />
The second comes from Hex-Rays 7.6:
<img alt="HR76.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622591347166-XGTP0AWV9QHDOWFUNB69/ke17ZwdGBToddI8pDm48kOXsg44jnerft-oRije-mOYUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKczr6NSiBIXwOiU-eXJscGJ7ftFtp6al0rWkTdEgrq8MQCqZELMhHrGC-qZv4LjCcE/HR76.png?format=1000w" />
Several new features are evident in the screenshots -- more aggressive variable mapping eliminating the first two lines, and automatic variable renaming changing the names of variables -- but the one this entry focuses on has to do with the type assigned to the return value of
GetProcAddress. Hex-Rays v7.4+ draws upon IDA's type libraries to automatically resolve the name of the procedure to its proper function pointer type signature, and sets the return type of
GetProcAddress to that type.
This change is evident in the screenshots above: for 7.1, the variable is named
v7, its type is the generic
FARPROC, and the final line shows a nasty cast on the function call. For 7.6, the variable is named
IsWow64Process, its type is
BOOL (__stdcall *)(HANDLE, PBOOL) (the proper type signature for the
IsWow64Process API), and the final line shows no cast. Beyond the casts, we can see that applying the type signature also changes the types of other local variables:
v5 has only a generic type in the first screenshot, whereas
v5 has the proper type
BOOL in the second.
These screenshots clearly demonstrate that IDA is capable of resolving an API name to its proper type signature, the desirable effects of applying the proper type signature on readability, and the secondary effects of setting the types of other variables involved in calling those APIs.
Relevance to Malware Analysis
Hex-Rays' built-in functionality won't work directly when malware looks up API names by hash, or uses encrypted strings for the API names: the decompiler must see a fixed string being passed to
GetProcAddress to do its magic. Although the malware analysis community seems very comfortable in dealing with imports via hash and encrypted strings, they seem less comfortable with applying proper type signatures to the resultant variables and structure members. Only one publication I'm aware of bothers to tackle this, and it relies upon manual effort to retrieve the type definitions and create
typedefs for them. This is unfortunate, as applying said types dramatically cleans up the decompilation output, but understandable, as the manual effort involved is rather cumbersome.
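To make the hash scenario concrete, here is a minimal sketch of how an analyst might recover API names from hashes before looking up their types. Everything in it is an assumption for illustration: the ROR-13 additive hash stands in for whatever routine a given sample actually uses, and the names `api_hash`, `build_hash_table`, `resolve`, and the candidate list are hypothetical.

```python
# Hypothetical sketch: the ROR-13 additive hash is assumed purely for
# illustration; substitute the routine recovered from the actual sample.

def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    value &= 0xFFFFFFFF
    count %= 32
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def api_hash(name):
    """Assumed hash: rotate the accumulator right by 13, then add each byte."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 13) + byte) & 0xFFFFFFFF
    return h

def build_hash_table(api_names):
    """Map hash -> API name, raising instead of silently hiding collisions."""
    table = {}
    for name in api_names:
        h = api_hash(name)
        if h in table and table[h] != name:
            raise ValueError("hash collision: %s vs %s" % (table[h], name))
        table[h] = name
    return table

# Hypothetical candidate list; in practice this would come from DLL exports.
CANDIDATES = ["IsWow64Process", "GetProcAddress", "LoadLibraryA", "VirtualAlloc"]
HASH_TO_NAME = build_hash_table(CANDIDATES)

def resolve(hash_value):
    """Return the API name for a hash constant seen in the sample, or None."""
    return HASH_TO_NAME.get(hash_value)

print(resolve(api_hash("IsWow64Process")))
```

Once a hash constant has been mapped back to a name this way, the name is all that is needed to look up the type signature.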
As a result, most publications that encounter this problem feature screenshots like this one (note all of the casts on the function pointer invocations, and the so-called "partial types"):
<img alt="AnalysisWithCasts.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622591586565-RRF2GE1J5AXZ9BGA3GUR/ke17ZwdGBToddI8pDm48kBsWDvr_Uz3epamsEEeOdRVZw-zPPgdn4jUwVcJE1ZvWQUxwkmyExglNqGp0IvTJZamWLI2zvYWH8K3-s_4yszcp2ryTI0HqTOaaUohrI8PIPzmcVhhcLxh2SeL86AMeSixJ66nLfGfpiMxmLWaV-kw/AnalysisWithCasts.png?format=1000w" />
(I chose not to link the analysis from which the above screenshot was lifted, because my goal here is positive assistance to the malware analysis community, and not to draw negative attention to anyone's work in particular. This pattern is extremely frequent throughout presentations of malware analysis; it is immaterial who authored the screenshot above.)
I did not know how to resolve an API name to its type signature, so I simply reverse engineered how Hex-Rays implements the functionality mentioned at the top of this entry. The result is a function
PrintTypeSignature(apiName) you can use in your scripts and day-to-day work that does what its name implies: retrieves and prints the type signature for an API specified by name.
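For readers who want a feel for what such a lookup involves, the following is a rough IDAPython sketch, not a copy of the script. It assumes that `ida_typeinf.get_named_type` returns a tuple laid out as `(code, type_str, fields_str, cmt, field_cmts, sclass, value)` and that `NTF_SYMU` selects unmangled symbol names; the function names `get_type_signature` and `print_type_signature` are hypothetical. Outside IDA the import fails and the functions simply report that nothing was found.

```python
try:
    import ida_typeinf  # only importable inside IDA
except ImportError:
    ida_typeinf = None

def get_type_signature(api_name):
    """Return a tinfo_t for a pointer to api_name's prototype, or None."""
    if ida_typeinf is None:
        return None  # not running inside IDA
    til = ida_typeinf.get_idati()
    # Look the symbol up by its unmangled name in the loaded type libraries.
    found = ida_typeinf.get_named_type(til, api_name, ida_typeinf.NTF_SYMU)
    if found is None:
        return None
    # Assumed layout: (code, type_str, fields_str, cmt, field_cmts, sclass, value)
    type_str, fields_str, field_cmts = found[1], found[2], found[4]
    proto = ida_typeinf.tinfo_t()
    if not proto.deserialize(til, type_str, fields_str, field_cmts):
        return None
    # Turn the function prototype into a function pointer type.
    ptr = ida_typeinf.tinfo_t()
    if not ptr.create_ptr(proto):
        return None
    return ptr

def print_type_signature(api_name):
    """Print (and return) the signature text, or a placeholder on failure."""
    tif = get_type_signature(api_name)
    text = "<no type found for %s>" % api_name if tif is None else tif.dstr()
    print(text)
    return text

print_type_signature("IsWow64Process")
```

Wrapping the prototype in a pointer matters because the variables and structure members being typed hold function pointers, not functions.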
The script includes a demo function
Demo() that resolves a number of API type signatures and prints them to the console. It begins by declaring a list of strings:
<img alt="PyDemo.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622592018876-M54KYI7EVS4Z5YDGX5FY/ke17ZwdGBToddI8pDm48kHhrDwgnJkVEHiVRRC_ulUFZw-zPPgdn4jUwVcJE1ZvWhcwhEtWJXoshNdA9f1qD7dss-b_TPjOgQSEDNYV-zhc4GkYiOf1HdZD7ACnxAzg8aC6KKP7qxWcbxDIAQQwqBQ/PyDemo.png?format=1000w" />
The script outputs the type signatures, ready to be copied and pasted into the variable type window and/or a structure declaration.
<img alt="ScriptOutput.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622592053867-QVSG8K7LETOQZETV0YD0/ke17ZwdGBToddI8pDm48kPto-g4eqOCfYg4Pr4EWCFoUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYy7Mythp_T-mtop-vrsUOmeInPi9iDjx9w8K4ZfjXt2dueTpMPBRJPUATJhQuyAl2ebZHERRIcVKxAYIdX4TuQiCjLISwBs8eEdxAxTptZAUg/ScriptOutput.png?format=1000w" />
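When many resolved APIs live in one structure, pasting signatures member by member gets tedious. As a small convenience, here is a sketch that turns signatures into a C structure declaration. It assumes each signature has the unnamed function pointer form seen earlier, `BOOL (__stdcall *)(HANDLE, PBOOL)`; the helper names and the structure name `ResolvedApis` are hypothetical.

```python
def name_pointer(signature, member_name):
    """Insert member_name after the `*` of an unnamed function pointer type.

    Assumes the first "*)(" in the text belongs to the outer pointer, which
    holds unless the return type is itself a function pointer.
    """
    idx = signature.find("*)(")
    if idx < 0:
        raise ValueError("not an unnamed function pointer type: %r" % signature)
    return signature[:idx + 1] + member_name + signature[idx + 1:]

def build_struct(struct_name, members):
    """Build a C struct declaration from (member_name, signature) pairs."""
    lines = ["struct %s" % struct_name, "{"]
    for member_name, signature in members:
        lines.append("  %s;" % name_pointer(signature, member_name))
    lines.append("};")
    return "\n".join(lines)

decl = build_struct(
    "ResolvedApis",  # hypothetical structure name
    [("IsWow64Process", "BOOL (__stdcall *)(HANDLE, PBOOL)")],
)
print(decl)
# struct ResolvedApis
# {
#   BOOL (__stdcall *IsWow64Process)(HANDLE, PBOOL);
# };
```

The resulting text is intended to be pasted as a new local type; the member order must of course match the order in which the malware stores its resolved pointers.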
A Final Note
One further note: architecturally, there is a discrepancy between how the Hex-Rays microcode machinery handles type information for direct calls versus indirect ones. In practice, this means you may still see casts in the output after applying the proper type signature to a variable or structure member. If this happens, right-click on the indirect call and select
Force call type to force the proper type information to be applied at the call site. However, only do this once you have set the proper type information for the function pointer variable or structure member.
<img alt="ForceCallType.png" src="https://images.squarespace-cdn.com/content/v1/53a64cc2e4b0c63fc41a3320/1622592099614-6EBKFS88I6XNX1HTXXN2/ke17ZwdGBToddI8pDm48kDljSOBAFgFXYy4Sr909zcIUqsxRUqqbr1mOJYKfIPR7LoDQ9mXPOjoJoqy81S2I8N_N4V1vUb5AoIIIbLZhVYxCRW4BPu10St3TBAUQYVKcWK7zI6JbkNXV-E_uyDOTFhwe2p9ikPGfeGFQuM_x6UdTpEKoyjOxtqZC4ERUnxeg/ForceCallType.png?format=1000w" />
Mostly I published this because I want to see more types applied, and fewer casts, in the malware analysis publications that I read. Please, use my script!
Article Link: Hex-Rays, GetProcAddress, and Malware Analysis — Möbius Strip Reverse Engineering