I was recently probing into the Visual Basic 6 format trying to see how the
IDispatch implementation resolved vtable offsets for functions.
While digging through the code, I came across several structures that I had not yet explored. These structures revealed a trove of information valuable to reverse engineers.
As we will show in this post, type information from internal structures will allow us to recover function prototypes for public object methods in standard VB6 executables.
This information is related to
IDispatch internals and does not require a type library like those found with ActiveX components.
For our task of malware analysis, a static parsing approach will be of primary interest.
Most VB6 code units are COM objects which support the
IDispatch interface. This allows them to be easily passed around as generic types and used with scripting clients.
IDispatch allows functions to be called dynamically by name. The following VB6 code will operate through
This simple code demonstrates several things:
- VB6 can call the proper method from a generic object type
- named arguments in the improper order are correctly handled
- values are automatically coerced to the correct type if possible (here the string “5” is converted to long)
If we add a DebugBreak statement, we can also see that the string “myFunc” is passed in as an argument to the underlying
__vbaLateMemNamedCallLd function. This allows
IDispatch::Invoke to call the proper function by string name.
From this we can intuit that the
IDispatch implementation must know the complete function prototype along with argument names and types for correct invocation to occur.
We can further test this by changing an argument name, adding unexpected arguments, or setting an argument type to something that can not be coerced too long.
Each of these conditions raises an error before the function is called.
Note that there is no formal type library when this code is compiled as a standard executable. This type information is instead embedded somewhere within the binary itself.
So where is this type information stored, and how can we retrieve it?
If we start probing the runtime at
BASIC_CLASS_Invoke we will encounter internal functions such as
CoerceArg, and so on. As we debug the code, we can catch access to known VB6 structures and watch where the code goes from there.
Before we get deeper, I will give a quick high-level overview of the Visual Basic 6 file format.
VB6 binaries have a series of nested structures that layout all code units, external references, forms etc. They start with the
VBHeader and then branch out, covering the various aspects of the file.
In the same way that the Windows loader reads the PE file to set up proper memory layout, the VB runtime (msvbvm60.dll) reads in the VB6 file structures to ready it for execution within its environment.
Good references for the VB6 file format include:
- Visual Basic Image Internal Structure Format – Alex Ionescu
- Semi-VBDecompiler source code – VBGamer45
- vb-decompiler.theautomaters.com message board
The structure and fields names I use in this document primarily come from the Semi-VBDecompiler source. A useful structure browser is the free vbdec disassembler:
For our work, we will start with the
ObjectTable found through the
ObjectTable->ObjectArray holds the number of Object structures as defined by its
ObjectCount field. An example is shown below:
This is the top-level structure for each individual code object. Here we will find the
ProcCount entries. This array reveals the public method names defined for the object.
Object->ObjInfo structure will also lead us to further information of interest.
ObjInfo->PrivateObject will lead us to the main structure we are interested in for this blog post.
I could not find public documentation on this structure or those below it. What follows is what I have put together through analysis.
Development of this information had me working across four different windows simultaneously.
- disassembly of the vb runtime
- disassembly of the target executable
- debugger stepping through the code and syncing the disasm views
- specialized tool to view and search vb6 structures
The following is my current definition of the
In the debugger, I saw vb runtime code accessing members within this structure to get to the vtable offsets for an Invoke call. I then compiled a number of source code variations while observing the effects on the structure. Some members are still unknown, but the primary fields of interest have been identified.
PrivateObj structure leads us to arrays describing each code object’s public function, variable, and event type information.
Counts for the event and variable arrays are held directly within the
PrivateObj itself. To get the count for the public functions, we have to reference the top-level
Note that there is a null pointer in the
FuncType array above. This corresponds to a private function in the source code. If you look back, you will also see that same null entry was found in the
ProcNames array listed earlier.
Each one of these type information structures is slightly different. First, we will look at what I am calling the
I currently have it defined as follows:
The structure displayed above is for the following prototype:
This structure is where we start to get into the real meat of this article. The number of types defined for the method is held within the
argSize field. The first three bits are only set for property types.
- 111 is a property Set
- 010 is a property Let
- 001 is a property Get (bFlags bit one will additionally be set)
The type count will be the remaining bits divided by four.
bFlags bit one is set, the last entry represents the method’s return value. In all other scenarios, the types represent the arguments from one to arg count. VB6 supports a maximum of 59 user arguments to a method.
argSize of 0 is possible for a sub routine that takes no arguments and has no return value.
vOff field is the vtable offset for the member. The lowest bit is used as a flag in the runtime and cleared before use. Final adjusted values will all be 32-bit aligned.
optionalVals field will be set if the method has optional parameters which include default values.
LpAryArgNames is a pointer to a string array. It is not null- terminated, so you should calculate the argument count before walking it.
After the structure, we see several extra bytes. These represent the type information for the prototype. This buffer size is dynamic.
A mapping of these values was determined by compiling variations and comparing output for changes. The following values have been identified.
The above code, shows that
0x80 bits have been set to represent
Optional specifiers. These are combined with the base types identified in the enumeration. These do not align with the standard
VARIANT VARENUM type values.
The comobj and internal flags are special cases that embed an additional 32-bit value after the type specifier byte.
This will link to the targets
ObjInfo structure for internal objects.
comobj type is encountered, an offset to another structure will be found that specifies the objects library guid, clsid, library name, and dll path. We will show this structure later when we cover public variables.
It is important to note that these offsets appear to always be 32-bit aligned. This can introduce null padding between the type byte and the data offset. Null padding has also been observed between individual type bytes as well. Parsers will have to be tolerant of these variations.
Below is the type definition for the following prototype:
Next up comes a quick look at the
PubVarDesc structure. Here is a structure for the following prototype:
This example shows flag
0x1D to declare an external COM object type. An offset then follows this to a structure which defines the details of the COM object itself.
0x3c in the
varOffset field is where the data for this variable is stored. For VB6 COM objects the
ObjPtr() returns a structure where the first member is a pointer to the objects
Vtable. The areas below that contain class instance data. Here
myPublicVar would be found at
Finally, we will look at an
EventDesc structure for the following prototype:
This structure closely follows the layout of the
PubFuncDesc type. One thing to note, however, is that I have not currently been able to locate a link to the Event name strings embedded within the compiled binaries.
In practice, they are embedded after the strings of the
Note that the class can raise these events to call back to the consumer. There is no implementation of an event routine within the code object that defines it.
In this post, we have detailed several structures which will allow analysts to parse the VB object structures and extract function prototypes for public object members.
This type information is included as part of the standard IDispatch plumbing for every user-generated VB6 form, class, user control, etc. This does not apply to functions in BAS code modules as they are not COM objects internally.
It is also interesting to note that VB6 embeds this type data right along with the other internal structures in the .text section of the PE file. No type library is required, and this feature can not be disabled.
While the structures portion of the .text section can contain various compiler-generated native code stubs, all user code falls into the memory range as defined by the
The framework required to analyze a VB6 binary and resolve the information laid out in this posting can be fairly complex. Luckily open-source implementations already exist to help ease the burden.
Structure and prototype extraction routines have already been included in the free vbdec disassembler. This tool can also generate IDC scripts to apply the appropriate structure definitions to a disassembly.
VB6 binaries are typically considered hard to analyze. Extracting internal function prototypes will be a very welcome addition for reverse engineers.
The post Recovery of function prototypes in Visual Basic 6 executables appeared first on Avast Threat Labs.
Article Link: Recovery of function prototypes in Visual Basic 6 executables - Avast Threat Labs