Rebuilding a PE File From Memory

MalBot · June 16, 2024, 12:41am

Malware often extracts an embedded PE (Portable Executable) file from within itself, and either overwrites its original process image, or starts and overwrites a new process (process hollowing), with the embedded image. What if you want to save a copy of this extracted PE file so that you can analyse it using something other than the debugger that you were running the sample in?

While looking at Tofsee I noticed that it extracted an embedded PE file and overwrote its original process image in memory (at 0x400000) with the extracted PE file. It would be good to save a copy of that so that I can analyse it in Ghidra, or in a different debugger, or to just save the extracted PE as a malware sample without all the packing around it. You never know, we may then find different malware samples that end up unpacking the same embedded PE file.

Load the malware sample into x64dbg, go to the memory map, and select (hold the ‘Shift’ key and left-click) all of the memory blocks that belong to the process image: the block named after the process (typically the memory block at 0x400000) and the blocks corresponding to the .text, .data, .rsrc, and .reloc sections. Then right-click and select ‘Dump Memory to File’:

Address   Size      Party  Info        Content           Type  Protection  Initial
00400000  00001000  User   sample.exe                    IMG   ER---       ERWC-
00401000  0002E000  User   ".text"     Executable code   IMG   ER---       ERWC-
0042F000  01FEF000  User   ".data"     Initialized data  IMG   -RWC-       ERWC-
0241E000  00028000  User   ".rsrc"     Resources         IMG   -R---       ERWC-
02446000  0000B000  User   ".reloc"    Base relocations  IMG   -R---       ERWC-

Memory blocks corresponding to the loaded PE file image

You’ll notice that this creates a huge file (33,886,208 bytes) when the original PE file was only 398,848 bytes! That is because x64dbg has saved the sections to the file with the same spacing as they have in memory. The address of the first memory block (0x400000 in this case) becomes offset 0 in the file, and every other block is written at its distance from 0x400000. For example, the second memory block (.text) is at 0x401000, so we’ll find it at offset 0x1000 (0x401000 – 0x400000) in the file. The .data section is at 0x42f000, so we’ll find it at offset 0x2f000 (0x42f000 – 0x400000). Hence we end up with a large file containing a lot of zero bytes (padding between the sections).
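The arithmetic is easy to check. As a minimal sketch (the addresses and sizes are copied from the memory map above, while the variable names are only illustrative), each block's offset in the combined dump is its address minus the image base, and the dump ends where the last block ends:

```shell
# Image base and memory blocks (address:size), copied from the memory map above.
base=0x400000
blocks="0x400000:0x1000 0x401000:0x2E000 0x42F000:0x1FEF000 0x241E000:0x28000 0x2446000:0xB000"

dump_size=0
for block in $blocks; do
    addr=${block%%:*}   # text before the colon
    size=${block##*:}   # text after the colon
    # Offset in the dump file = memory address - image base.
    offset=$(( $addr - $base ))
    printf 'block at %s -> dump offset 0x%x\n' "$addr" "$offset"
    # The dump ends where the most recently seen block ends.
    dump_size=$(( offset + $size ))
done
printf 'combined dump size: %d bytes\n' "$dump_size"
```

The final line reports 33886208 bytes, which matches the size of the file that x64dbg wrote.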

If you remember back to the PE file header, you’ll remember that each entry in the section table records both the memory load address (VirtualAddress) and the file offset (PointerToRawData) of its section. We can use this information to rebuild the PE file from the various memory sections.

If, instead of selecting the multiple memory blocks at once and dumping to a file, we select each one in turn and dump it to a file, then we get the memory of each section in a separate file. We can then put them back together, according to the section table in the file header, using some UNIX™ jiggery pokery¹. Let’s first look at the section table in the PE file header, which is in the dump of the first block of memory (the one at 0x400000):

$ objdump -h sample_00400000.bin
BFD: error: sample_00400000.bin(.text) is too large (0x2d06c bytes)

sample_00400000.bin:     file format pei-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0002d06c  00401000  00401000  00000400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .data         00001e00  0042f000  0042f000  0002d600  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .rsrc         00027978  0241e000  0241e000  0002f400  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .reloc        0000a6bc  02446000  02446000  00056e00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

Ignoring that error, which I’m guessing is objdump telling us that the .text section is located past the end of the file (because the file only contains the PE file header(s) and none of the sections), we can see the memory addresses and the file offsets for each of the PE file sections. x64dbg has saved the memory address in the file name when it dumped each memory section, so we can now start putting the separate sections together, using the dd(1) command, to build a PE file.

I’ll explain the command line arguments for the dd(1) command:

if	input file name
of	output file name
bs	block size (default 512 bytes)
seek	number of blocks to skip in the output file. dd(1) will lseek(2)/fseek(3) to this location in the output file before writing to it. This argument is used to start writing the various sections at their correct location in the output (PE) file. Without this argument, dd(1) will start writing at the start of the output file and clobber any existing content

dd(1) command line arguments

I’ll explain the whole block size thing. dd(1) reads and writes blocks (512 byte blocks by default), so we need to convert the offset used in the seek argument to the number of blocks rather than the number of bytes. Well, technically we don’t, because we can just specify a bs of ‘1’ (byte) and then specify the number of bytes in the seek argument.

Using a block size (bs) of ‘1’ (byte) is inefficient though, because dd(1) reads and writes a block at a time and hence a bs argument of ‘1’ will cause dd(1) to read(fd, buf, 1) and write(fd, buf, 1) which means more system calls (511 extra read() calls and 511 extra write() calls for every 512 bytes of data!). This PE file is reasonably small so there probably won’t be any really noticeable difference, but when dealing with large files (USB thumb drive/CD/DVD images for instance), the larger block size you can use, the better. In this case we are restricted by the offset in the PE file where we need to start writing, because we need to give dd(1) an integer number of blocks (rather than a number of bytes) to skip. That is why I change bs from ‘1024’ to ‘512’ in some of the commands below — the offset of 0x2d600 (185856) is not divisible by 1024, but is divisible by 512.
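The choice of block size can be automated. The following is a minimal sketch, not part of the original workflow: `pick_bs` is a hypothetical helper that finds the largest power-of-two block size that divides a given byte offset exactly, and prints it with the matching seek value. The 1 MiB cap is an arbitrary choice so that an offset of 0 terminates.

```shell
# pick_bs OFFSET: print the largest power-of-two bs (capped at 1 MiB) that
# divides OFFSET exactly, together with the corresponding seek count.
pick_bs() {
    offset=$1
    bs=1
    # Keep doubling bs while the doubled value still divides the offset.
    while [ "$bs" -lt 1048576 ] && [ $(( offset % (bs * 2) )) -eq 0 ]; do
        bs=$(( bs * 2 ))
    done
    echo "bs=$bs seek=$(( offset / bs ))"
}

pick_bs 1024      # .text  at 0x400   -> bs=1024 seek=1
pick_bs 185856    # .data  at 0x2d600 -> bs=512 seek=363
pick_bs 193536    # .rsrc  at 0x2f400 -> bs=1024 seek=189
pick_bs 355840    # .reloc at 0x56e00 -> bs=512 seek=695
```

For these four offsets the helper arrives at the same bs and seek values that are worked out by hand in the commands below.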

The next obstacle is that objdump(1) gives us the file offsets in hexadecimal (hex), but dd(1) wants decimal values, so we need to convert. You can (if you have to) use bash(1) to do this. I’ve been using UNIX™ for some time so prefer to use something that is closer to being POSIXLY_CORRECT², like bc(1).
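As a sketch of the bc(1) route, the wrapper below (`hex2dec` is a name made up for this example) strips an optional 0x prefix and upper-cases the digits, because bc(1) only accepts upper-case hex digits when ibase is 16:

```shell
# hex2dec HEX: convert a hex number (with or without a 0x prefix) to decimal.
hex2dec() {
    # bc(1) wants upper-case hex digits and no 0x prefix.
    digits=$(printf '%s' "${1#0x}" | tr 'abcdef' 'ABCDEF')
    echo "ibase=16; $digits" | bc
}

# The four file offsets from the section table above.
for off in 0x400 0x2d600 0x2f400 0x56e00; do
    printf '%s == %s\n' "$off" "$(hex2dec "$off")"
done
```

The shell-only alternative is arithmetic expansion, for example `echo $((0x2d600))`, which prints the same decimal value.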

Here, then, are the commands to reconstruct a valid PE file from the memory sections. I’ve done the hex to decimal conversion and shown my working (an ode to maths teachers) in the comments preceding each dd(1) command:

# start with the PE file header(s)
$ cp -a sample_00400000.bin pefile.exe

# add the .text section from 0x401000 to 0x400
# 0x400 == 1024
# 1024 / 1024 (bs) == 1
$ dd if=sample_00401000.bin of=pefile.exe bs=1024 seek=1
184+0 records in
184+0 records out
188416 bytes (188 kB, 184 KiB) copied, 0.00115347 s, 163 MB/s

# add the .data section from 0x42f000 to 0x2d600
# 0x2d600 == 185856
# 185856 / 512 (bs) == 363
$ dd if=sample_0042F000.bin of=pefile.exe bs=512 seek=363
65400+0 records in
65400+0 records out
33484800 bytes (33 MB, 32 MiB) copied, 0.141891 s, 236 MB/s

# add the .rsrc section from 0x241e000 to 0x2f400
# 0x2f400 == 193536
# 193536 / 1024 (bs) == 189
$ dd if=sample_0241E000.bin of=pefile.exe bs=1024 seek=189
160+0 records in
160+0 records out
163840 bytes (164 kB, 160 KiB) copied, 0.00103226 s, 159 MB/s

# add the .reloc section from 0x2446000 to 0x56e00
# 0x56e00 == 355840
# 355840 / 512 (bs) == 695
$ dd if=sample_02446000.bin of=pefile.exe bs=512 seek=695
88+0 records in
88+0 records out
45056 bytes (45 kB, 44 KiB) copied, 0.000521471 s, 86.4 MB/s

If we compare the output file (of), pefile.exe, with the original PE file, sample.exe, loaded into x64dbg, we can see an obvious difference:

-r-------- 1 user group 398848 Oct 7 2023 sample.exe
-rw------- 1 user group 400896 Jun 10 10:25 pefile.exe

Should they not be the same size?! Let’s think about what’s going on here. We dumped the memory sections from x64dbg and pasted them back together at the correct file offsets in pefile.exe. However, memory is allocated in pages (of typically 4096 bytes on 80x86 processors), and x64dbg is dumping the whole block of memory which is hence going to be an integer number of pages (that is, an integer multiple of 4096 bytes).
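The page rounding can be checked against the numbers already shown. In this sketch, `round_up` is a hypothetical helper, and the sizes are the .text, .rsrc and .reloc sizes from the section table; each one rounds up to the block size that the memory map reports for that section.

```shell
page=4096

# round_up VALUE ALIGN: smallest multiple of ALIGN that is >= VALUE.
round_up() {
    echo $(( ( $1 + $2 - 1 ) / $2 * $2 ))
}

# Section sizes from the section table: .text, .rsrc, .reloc
for vsize in 0x2d06c 0x27978 0xa6bc; do
    printf '%s -> 0x%x\n' "$vsize" "$(round_up "$vsize" "$page")"
done
```

The results are 0x2e000, 0x28000 and 0xb000, the same sizes as in the memory map. The .data section is left out because its block in memory (0x1FEF000 bytes) is far larger than the size objdump(1) reports for it, presumably because most of it only exists in memory and takes no space in the file.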

We are, however, specifying the offset where we want each section to be written in the output file, so the extra padding at the end of the section dumps shouldn’t make any difference: when dd(1) is given a seek argument (and not conv=notrunc) it truncates the output file at that offset before writing, so each section is cut back to its size in the original PE file when we place the next section at its correct offset. The exception is the last section, which isn’t truncated because we’re not writing another section after it.
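This truncation is standard dd(1) behaviour and can be seen on a pair of throwaway files. The sketch below uses stand-in files of zero bytes in a temporary directory (none of these file names come from the analysis): a 4096-byte file plays the part of an output file that ends in a page-padded section, and a 512-byte file plays the part of the next section.

```shell
tmp=$(mktemp -d)

# A 4096-byte output file, a copy of it, and a 512-byte "next section".
dd if=/dev/zero of="$tmp/out.bin" bs=4096 count=1 2>/dev/null
cp "$tmp/out.bin" "$tmp/out_notrunc.bin"
dd if=/dev/zero of="$tmp/next.bin" bs=512 count=1 2>/dev/null

# Default: dd truncates the output at the seek offset (512), then writes 512 bytes.
dd if="$tmp/next.bin" of="$tmp/out.bin" bs=512 seek=1 2>/dev/null
# With conv=notrunc everything after the written range is kept.
dd if="$tmp/next.bin" of="$tmp/out_notrunc.bin" bs=512 seek=1 conv=notrunc 2>/dev/null

size_default=$(( $(wc -c < "$tmp/out.bin") ))
size_notrunc=$(( $(wc -c < "$tmp/out_notrunc.bin") ))
echo "default: $size_default bytes, conv=notrunc: $size_notrunc bytes"

rm -rf "$tmp"
```

The default run leaves a 1024-byte file and the conv=notrunc run leaves all 4096 bytes in place, which is why the page padding disappears from every section except the last one.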

The section table is telling us that the last section (.reloc) is 0xa6bc (42,684) bytes, but the file that x64dbg dumped for the last section is 45,056 (11 x 4096) bytes. So, taking that difference (2,372 bytes) into consideration and subtracting it from the size of the rebuilt PE file (400,896 bytes) we get 398,524 bytes. The original PE file (at 398,848 bytes) is larger by 324 bytes, which looks like it could be padding (something that I could have done with when I came off the front of a 36″ unicycle doing around 20 km/h a few weeks ago). 398,848 / 512 is exactly 779, whereas 398,524 / 512 is 778.3671875, so the original PE file looks to have been padded up to the next multiple of 512 bytes. That is consistent with the FileAlignment field of the PE header, which rounds the raw data of each section up to a multiple of the alignment (commonly 512 bytes); every file offset in the section table above is a multiple of 0x200. Padding is handy in a crash.
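If an exact size match matters, the last section can be trimmed with dd(1)'s count argument. The sketch below only does the arithmetic and prints the resulting command. It assumes a file alignment of 512 bytes, which fits the offsets in the section table but was not read from the header itself, and the variable names are illustrative.

```shell
file_align=512              # assumed FileAlignment (0x200)
reloc_off=$(( 0x56e00 ))    # file offset of .reloc, from the section table
reloc_size=$(( 0xa6bc ))    # size of .reloc, from the section table

# Round the section size up to the file alignment to get its size on disk.
reloc_raw=$(( (reloc_size + file_align - 1) / file_align * file_align ))
expected_size=$(( reloc_off + reloc_raw ))
count=$(( reloc_raw / file_align ))

echo "raw size: $reloc_raw bytes, expected file size: $expected_size bytes"
echo "dd if=sample_02446000.bin of=pefile.exe bs=$file_align seek=$(( reloc_off / file_align )) count=$count"
```

The expected file size comes out at 398848 bytes, the size of sample.exe, and the printed command is the earlier .reloc command with count=84 added so that only the first 43,008 bytes of the dump are copied.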

So there we have it: we’ve reconstructed a PE file from memory. We can now analyse this with Ghidra, and other tools of choice. Let’s do some sanity checking first. If we compare the objdump(1) output for the original sample file with that for the reconstructed PE file, we’ll see that the only difference is the file name:

$ objdump -x sample.exe > original
$ objdump -x pefile.exe > reconstructed
$ diff original reconstructed 
2,3c2,3
< sample.exe:     file format pei-i386
< sample.exe
---
> pefile.exe:     file format pei-i386
> pefile.exe

To demonstrate why we need to go to all that trouble to reconstruct the PE file, let’s run objdump(1) on the large (33,886,208 byte) memory dump file that we started off with (the memory dump file containing all of the sections):

There is an import table in .text at 0x42d550

The Import Tables (interpreted .text section contents)
 vma:            Hint    Time      Forward  DLL       First
                 Table   Stamp     Chain    Name      Thunk
 0002d550       004086c3 004086d7 fffffffe 00000000 ffffffd4


PE File Base Relocations (interpreted .reloc section contents)

There is a debug directory in .text at 0x401230

Type                Size     Rva      Offset
  0         Unknown 00000000 00000000 00000000

The .rsrc Resource Directory section:
000  Type Table: Char: 0, Time: 00000000, Ver: 0/0, Num Names: 0, IDs: 0

WARNING: Extra data in .rsrc section - it will be ignored by Windows:
218  Type Table: Char: -622912640, Time: 2520e47f, Ver: 5/0, Num Names: 2, IDs: 0
228   Entry: <corrupt string offset: 0x401cd0>
Corrupt .rsrc section detected!

Notice how objdump(1) fails to get most of the information, like imports, that is contained in one of the sections rather than in the PE header(s). This is because the sections are not at the correct offsets, as specified in the section table, in the PE file. Compare this with objdump(1) output on the reconstructed PE file:

There is an import table in .text at 0x42d550

The Import Tables (interpreted .text section contents)
 vma:            Hint    Time      Forward  DLL       First
                 Table   Stamp     Chain    Name      Thunk
 0002d550       0002d5dc 00000000 00000000 0002dac4 00001014

        DLL Name: KERNEL32.dll
        vma:  Hint/Ord Member-Name Bound-To
        2d806     973  SetEndOfFile
        2d816     313  FindResourceW
        2d826     700  InterlockedDecrement
        2d83e     698  InterlockedCompareExchange
        2d85c       6  AddConsoleAliasW
...

Now my main reason for doing that was to test that I could rebuild a PE file from an unpacked PE file in memory, and hence save the embedded PE file that the Tofsee sample unpacks. I could then load it into Ghidra to see what Ghidra makes of it. Although it is worth noting that you can load the raw memory dumps into Ghidra — you just need to help it out by telling it what the bytes are that you’re giving it (and maintaining valid PE file structure if the memory dump is of a PE file header and sections). So, now let’s return from this little side-track back to our Tofsee analysis.

¹ I grew up thinking ‘jiggery pokery’ was just some kind of clever/fancy tricks, however when I looked it up to verify that, I found out that it is actually used to mean deceptive/sleight-of-hand trickery. I’m going to leave it in though because I like the phrase, just not to mean what its formal/common definition seems to mean.
² POSIX™: Portable Operating System Interface defines system APIs and commands for portability. Consequently, shell scripts written on one POSIX™ system should run without modification on another POSIX™ system.

Article Link: Rebuilding a PE File From Memory | Malware Musings