tl;dr: In this blog post we describe “Selfie”, a tool we developed that automates finding the Original Entry Point (OEP) for a majority of malware samples packed with self-modifying code. The tool is now open source, is compiled as a 32-bit binary, and can be found here: https://github.com/BreakingMalware.

Introduction

As malware researchers, we mainly come across packed, encrypted or obfuscated code. Malware typically comes this way in order to evade security detectors. However, to investigate malware we need to get rid of the packers and find the point from where the malware begins to run – what’s called the Original Entry Point (OEP). Unfortunately, unpacking is the most time-consuming and complex part of the process for us researchers.

This is particularly true for self-modifying code. Self-modifying code is a technique commonly used by packers which adds another layer of complexity. With self-modifying code, the packed malware overwrites itself. The packer does this by unpacking into dynamically allocated memory, resetting its original code and image, and then copying the unpacked code back. Most frustrating for researchers is that retrieving the OEP of malware packed this way is done manually.

For this reason, we have developed “Selfie”. Selfie automatically finds the malware’s OEP given malware that’s packed with self-modifying code. The Selfie tool itself is based on DynamoRIO – a Dynamic Binary Instrumentation (DBI) framework.

Note that Selfie does not handle several less common ways of implementing self-modifying code. Examples include packers that use a return instruction to get to the OEP, and self-modifying code which does not change the size of the Import Address Table (IAT).

Manually Unpacking Self-Modifying Code: Step-by-Step

To recall what unpacking self-modifying code looks like, we provide a step-by-step walkthrough here. For demonstration purposes, we use the Trojan Shylock.

Step 1: Taking a Look at the Portable Executable (PE)

This is where we get the initial info to start our unpacking process.

Image 1: Shylock’s characteristics

Image 2: Take note of the highlighted fields
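For readers without a PE viewer at hand, here is a minimal sketch of how the two fields we care about (the image base and the entry point) can be read from a 32-bit PE header. The offsets follow the standard PE32 layout. The helper names (read_entry_point, build_demo_header) and the tiny synthetic header are ours, for illustration only, and the values 0x00400000 and 0x4920 are chosen so that the resulting address matches the entry point discussed in Step 2, on the assumption that it equals image base plus entry point RVA.

```python
import struct


def read_entry_point(pe_bytes):
    """Return (image_base, entry_rva, entry_va) for a 32-bit PE image."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # Skip the 4-byte signature and the 20-byte COFF file header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", pe_bytes, opt)
    if magic != 0x10B:
        raise ValueError("not a PE32 optional header")
    (entry_rva,) = struct.unpack_from("<I", pe_bytes, opt + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", pe_bytes, opt + 28)  # ImageBase
    return image_base, entry_rva, image_base + entry_rva


def build_demo_header(image_base, entry_rva):
    """Build a synthetic, mostly empty PE32 header (illustration only)."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = b"PE\0\0"
    opt = 0x80 + 4 + 20
    struct.pack_into("<H", buf, opt, 0x10B)
    struct.pack_into("<I", buf, opt + 16, entry_rva)
    struct.pack_into("<I", buf, opt + 28, image_base)
    return bytes(buf)


demo = build_demo_header(0x00400000, 0x4920)
image_base, entry_rva, entry_va = read_entry_point(demo)
print(hex(image_base), hex(entry_rva), hex(entry_va))
```

On a real sample you would pass the file contents instead of the synthetic header; a dedicated PE parsing library is the more robust choice for anything beyond a quick look.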

Step 2: Analyzing the first dynamic memory allocation

We start analyzing from the Entry Point as shown in Step 1 (i.e. address 0x00404920).

The analysis brings us to address 0x4040F8. There, we can see a call to the VirtualAlloc function. The call dynamically allocates 0x5E200 bytes of memory with PAGE_EXECUTE_READWRITE permissions.

Image 3: Virtually allocating the memory

Image 4: The VirtualAlloc parameters

Once the memory has been successfully allocated, the code (i.e. the loader stub) copies the packed process image to the dynamically allocated memory.

Image 5: EAX points to the base address of the allocated memory and ECX is the counter.

Step 3: Analyzing the second dynamic memory allocation

Once copied, the loader transfers execution to the code in the allocated memory.
We can see that another memory allocation occurs, this time with PAGE_READWRITE permissions only. Note that this VirtualAlloc is called from the dynamic memory allocated in Step 2.

Image 6: A second VirtualAlloc

Once again, we see a copy loop (0x00574210). The original packed malware (recall that it is currently mapped in the dynamic memory) includes an embedded encrypted PE, and the following code copies that encrypted PE into the dynamic memory allocated above (VirtualAlloc #2):

Image 7: The copy loop which copies the embedded encrypted PE

Image 8: The original image base (left) includes the encrypted embedded PE (right).

Image 9: The encrypted PE that was copied to the dynamic memory

Image 10: The copied PE from above after decryption

Step 4: Analyzing the third dynamic memory allocation

A third memory allocation occurs. This time the loader stub copies the decrypted PE to the allocated memory:

Image 11: A third VirtualAlloc

Image 12: The third copy loop copies the decrypted PE

After the copy routine ends successfully, we land on the following code. In a nutshell, it resets the memory at the original image base. This is the point where we recognize that we are dealing with self-modifying code.

Image 13: The original process image space (0x00400000) is filled with 0’s

Image 14: Finally, a zeroed process image space

Step 5: Analyzing the Penultimate Infection Step

Image 15: The loader copies the decrypted EXE to the zeroed process image space.

The unpacked malware (Shylock):

Image 16: Recall that 0x00400000 is the same address we saw as the Address of Image Base in the malware’s metadata (Image #1)
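To summarize Steps 2 through 5, the whole overwrite sequence can be modelled in a few lines. This is only a toy sketch: byte arrays stand in for the process image and the VirtualAlloc'd buffers, and a single-byte XOR stands in for the packer's decryption routine, which we do not reproduce here. All names are ours.

```python
def simulate_self_overwrite(packed_image, unpack):
    """Model the sequence: copy out, unpack, zero the original image, copy back.

    Returns the image contents right after zeroing and after the copy-back.
    """
    image = bytearray(packed_image)   # the process image at its image base
    scratch = bytes(image)            # stand-in for a dynamically allocated copy
    unpacked = unpack(scratch)        # decrypt/unpack inside the scratch memory
    image[:] = bytes(len(image))      # reset the original image space to zeros
    zeroed = bytes(image)
    image[:len(unpacked)] = unpacked  # copy the unpacked code back over it
    return zeroed, bytes(image)


KEY = 0x5A  # hypothetical key; XOR is only a placeholder for the real cipher


def xor_decrypt(data):
    return bytes(b ^ KEY for b in data)


payload = b"MZ unpacked code"
packed = xor_decrypt(payload)  # XOR is symmetric, so this also "encrypts"
zeroed, final = simulate_self_overwrite(packed, xor_decrypt)
print(zeroed == bytes(len(packed)), final == payload)
```

The point of the model is the ordering: by the time the unpacked code sits at the original image base, nothing of the packed image is left there, which is why a dump taken at the original entry point is of little use.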

Next, the loader rebuilds the Import Address Table (IAT) using the LoadLibrary/GetProcAddress combination.

Image 17: LoadLibrary

Image 18: GetProcAddress
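The IAT rebuild boils down to a nested loop: load each imported DLL, then resolve each imported function by name and store its address. The sketch below models that loop with plain dictionaries instead of the real Windows APIs. The export table and its addresses are made up for illustration, and load_library / get_proc_address are stand-ins for LoadLibrary / GetProcAddress.

```python
# Hypothetical export tables; the addresses are invented for the example.
FAKE_EXPORTS = {
    "kernel32.dll": {"VirtualAlloc": 0x7C809AF1, "LoadLibraryA": 0x7C801D7B},
    "user32.dll": {"MessageBoxA": 0x7E4507EA},
}


def load_library(dll_name):
    """Stand-in for LoadLibrary: return a handle (here, the export dict)."""
    return FAKE_EXPORTS[dll_name]


def get_proc_address(module, function_name):
    """Stand-in for GetProcAddress: look a function up by name."""
    return module[function_name]


def rebuild_iat(imports):
    """Resolve every imported function and return the rebuilt table."""
    iat = []
    for dll_name, function_names in imports.items():
        module = load_library(dll_name)
        for name in function_names:
            iat.append((dll_name, name, get_proc_address(module, name)))
    return iat


imports = {
    "kernel32.dll": ["VirtualAlloc", "LoadLibraryA"],
    "user32.dll": ["MessageBoxA"],
}
iat = rebuild_iat(imports)
for dll_name, name, address in iat:
    print(dll_name, name, hex(address))
```

Note that the rebuilt table can differ in size from the one the packed file started with. That difference is exactly what Selfie looks for later on.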

Step 6: The Ultimate Goal: Calling the OEP

Image 19: The indirect call to the OEP (0x00403780)

Selfie: Automating the Unpacking of Self-Modifying Code

Selfie uses dynamic instrumentation, which allows the author to add or modify an application’s code at runtime. This really empowers us researchers since it allows us to monitor and deeply analyze the malware while it runs. We can think of the instrumentation as leeching onto the malware so that our code gains full control over it.

Before we dive into it, let’s start with some instrumentation background.

A brief background on DBI and DynamoRIO

Selfie uses DynamoRIO, an open-source framework with roots in research at HP and MIT, which provides Dynamic Binary Instrumentation (DBI) capabilities.

For a brief background on DBI and DynamoRIO, we quote http://uninformed.org/index.cgi?v=7&a=1&p=3 which explains it best:

“Dynamic Binary Instrumentation (DBI) is a method of analyzing the behavior of a binary application at runtime through the injection of instrumentation code. This instrumentation code executes as part of the normal instruction stream after being injected. In most cases, the instrumentation code will be entirely transparent to the application that it’s been injected to. Analyzing an application at runtime makes it possible to gain insight into the behavior and state of an application at various points in execution. This highlights one of the key differences between static binary analysis and dynamic binary analysis. Rather than considering what may occur, dynamic binary analysis has the benefit of operating on what actually does occur. This is by no means exhaustive in terms of exercising all code paths in the application, but it makes up for this by providing detailed insight into an application’s concrete execution state.

…

DynamoRIO is an example of a DBI framework that allows custom instrumentation code to be integrated in the form of dynamic libraries. The tool itself is a combination of Dynamo, a dynamic optimization engine developed by researchers at HP, and RIO, a runtime introspection and optimization engine developed by MIT. The fine-grained details of the implementation of DynamoRIO are outside of the scope of this paper, but it’s important to understand the basic concepts.

…
In concrete terms, Dynamo works by processing an instruction stream as it executes. To accomplish this, Dynamo assumes responsibility for the execution of the instruction stream. It uses a disassembler to identify the point of the next branch instruction in the code that is about to be executed. The set of instructions disassembled is referred to as a fragment (although, it’s more commonly known as a basic block). If the target of the branch instruction is in Dynamo’s fragment cache, it executes the (potentially optimized) code in the fragment cache. When this code completes, it returns control to Dynamo to disassemble the next fragment. If at some point Dynamo encounters a branch target that is not in its fragment cache, it will add it to the fragment cache and potentially optimize it. This is the perfect opportunity for instrumentation code to be injected into the optimized fragment that is generated for a branch target. Injecting instrumentation code at this level is entirely transparent to the application. While this is an oversimplification of the process used by DynamoRIO, it should at least give some insight into how it functions.

One of the best features of DynamoRIO from an analysis standpoint is that it provides a framework for inserting instrumentation code during the time that a fragment is being inserted into the fragment cache. This is especially useful for the purposes of intercepting memory accesses within an application. When a fragment is being created, DynamoRIO provides analysis libraries with the instructions that are to be included in the fragment that is generated. To optimize for performance, DynamoRIO provides multiple levels of disassembly information. At the most optimized level, only very basic information about the instructions is provided. At the least optimized level, very detailed information about the instructions and their operands can be obtained. Analysis libraries are free to control the level of information that they retrieve.”
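The fragment-cache idea from the quote above can be illustrated with a toy model. The sketch below is not how DynamoRIO is implemented. It only shows the one property that matters for us: a hook runs once, when a block is first placed into the cache, no matter how often that block executes afterwards. The program, the addresses and the function names are hypothetical, and each block simply has a fixed successor instead of a real branch.

```python
def run_with_fragment_cache(program, entry, on_new_fragment, max_steps):
    """Execute a toy program block by block, caching each block on first use.

    program maps a block's start address to (instructions, next_address).
    on_new_fragment(address, instructions) is called once per block, when the
    block is first placed into the cache: the instrumentation opportunity.
    """
    fragment_cache = {}
    executed = []
    pc = entry
    for _ in range(max_steps):
        if pc is None:
            break
        if pc not in fragment_cache:
            instructions, next_address = program[pc]
            on_new_fragment(pc, instructions)
            fragment_cache[pc] = (tuple(instructions), next_address)
        _, next_address = fragment_cache[pc]
        executed.append(pc)
        pc = next_address
    return executed


# Hypothetical two-block program: a prologue that falls into a loop body.
program = {
    0x1000: (["push ebp", "mov ebp, esp"], 0x2000),
    0x2000: (["dec ecx", "jnz 0x2000"], 0x2000),
}
instrumented = []
executed = run_with_fragment_cache(
    program, 0x1000, lambda addr, instrs: instrumented.append(addr), max_steps=5
)
print([hex(a) for a in executed])
print([hex(a) for a in instrumented])
```

The loop body at 0x2000 executes four times in this run but is handed to the hook only once.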

Deep-Diving into Selfie

Using DynamoRIO’s API we were able to write our client – essentially, the client is the DLL which is our Selfie tool. DynamoRIO provides hooks that fire each time code is placed into the code cache. Through these hooks the client has the ability to inspect and transform any piece of code that is emitted into one of the code caches. That way, we achieve full control, allowing us to perform any action we like.

The Selfie Tutorial

Our Selfie algorithm works as follows:

Step 1: On Selfie’s entry point, dr_init(), we retrieve module data for the main executable (say, malware.exe).

Step 2: We save the main module’s start address, end address and IAT size.

Image 20: Selfie’s entry point
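As a rough model of Steps 1 and 2, the sketch below records a module's start address, end address and IAT size. It is an assumption on our part that the IAT size can be taken from the IAT entry of the PE data directory (index 12, IMAGE_DIRECTORY_ENTRY_IAT); the post does not spell out how GetImportAddressTableSize() works internally. The ModuleInfo type, the helper names and the synthetic header values are hypothetical.

```python
import struct
from collections import namedtuple

ModuleInfo = namedtuple("ModuleInfo", ["start", "end", "iat_size"])

IAT_DIRECTORY_INDEX = 12    # IMAGE_DIRECTORY_ENTRY_IAT
DATA_DIRECTORY_OFFSET = 96  # data directories within a PE32 optional header


def read_iat_size(image):
    """Return the Size field of the IAT data directory of a PE32 image."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    opt = e_lfanew + 4 + 20  # skip the PE signature and the COFF header
    (count,) = struct.unpack_from("<I", image, opt + 92)  # NumberOfRvaAndSizes
    if count <= IAT_DIRECTORY_INDEX:
        return 0
    entry = opt + DATA_DIRECTORY_OFFSET + 8 * IAT_DIRECTORY_INDEX
    _rva, size = struct.unpack_from("<II", image, entry)
    return size


def snapshot_module(image, load_address):
    """Record what Steps 1 and 2 save: start, end and IAT size."""
    return ModuleInfo(load_address, load_address + len(image), read_iat_size(image))


# Synthetic header for illustration: 16 data directories, IAT of 0x40 bytes.
demo = bytearray(0x400)
struct.pack_into("<I", demo, 0x3C, 0x80)
opt = 0x80 + 4 + 20
struct.pack_into("<I", demo, opt + 92, 16)
struct.pack_into("<II", demo, opt + 96 + 8 * 12, 0x1000, 0x40)

info = snapshot_module(bytes(demo), 0x00400000)
print(hex(info.start), hex(info.end), hex(info.iat_size))
```

In the real client, the module bounds come from DynamoRIO's module data rather than from the length of a buffer.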

Step 3: Register a callback for basic block event using the dr_register_bb_event() API.

Image 21: registering the callback

Step 4: The on_event_basic_block callback function

Through the basic block creation event, registered via dr_register_bb_event(), the client has the ability to inspect and transform any piece of code prior to its execution. For every new block, we iterate over the block instructions using the instrlist_first() / instr_get_next() routines.

Image 22: basic block creation event

Step 5: Iterate over the block instructions.

If the instruction’s opcode is OP_call or OP_call_far (instr_is_call_direct()), or OP_call_ind or OP_call_far_ind (instr_is_call_indirect()), we call our callback function for call instructions (on_call_event):

Image 23: Call instrumentation
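Stripped of the DynamoRIO specifics, Step 5 is a filter over the instructions of a block. The toy model below mirrors that filter in Python. The Instr type, the opcode strings and the addresses are ours and purely illustrative; in the real client the checks are done with instr_is_call_direct() and instr_is_call_indirect().

```python
from collections import namedtuple

# A minimal stand-in for a decoded instruction.
Instr = namedtuple("Instr", ["address", "opcode", "target"])

DIRECT_CALLS = {"call", "call_far"}
INDIRECT_CALLS = {"call_ind", "call_far_ind"}


def find_call_sites(block):
    """Return the instructions in a basic block that are calls."""
    call_sites = []
    for instr in block:
        if instr.opcode in DIRECT_CALLS or instr.opcode in INDIRECT_CALLS:
            call_sites.append(instr)
    return call_sites


# Hypothetical block. The indirect call has no statically known target,
# so its target is only available once the call actually executes.
block = [
    Instr(0x1000, "push", None),
    Instr(0x1001, "mov", None),
    Instr(0x1003, "call", 0x2000),
    Instr(0x1008, "call_ind", None),
]
for site in find_call_sites(block):
    print(hex(site.address), site.opcode)
```

Indirect calls are the reason the target has to be examined at runtime rather than when the block is built: the final jump to the OEP in our Shylock example is exactly such an indirect call.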

Step 6: Determine the OEP

Check whether the target address of the call (the callee) lies between the main executable’s start address and end address.
If it does, we call GetImportAddressTableSize() to get the IAT size at this point.
If the current IAT size differs from the original IAT size (as recorded in dr_init()), the callee address is the suspected OEP.
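The decision logic of Step 6 can be sketched as a small function. The OEP value 0x00403780 and the image base 0x00400000 come from the Shylock walkthrough above. The module end address, the IAT sizes and the other call targets are invented for the example, and treating the end address as exclusive is our choice.

```python
def make_call_checker(module_start, module_end, original_iat_size, current_iat_size):
    """Build an on-call check that returns a suspected OEP or None.

    current_iat_size is a function returning the IAT size at the time of the
    call, playing the role of GetImportAddressTableSize().
    """
    def on_call(target):
        # Only calls that land inside the main executable are interesting.
        if not (module_start <= target < module_end):
            return None
        # A changed IAT size suggests the image has been rewritten.
        if current_iat_size() != original_iat_size:
            return target
        return None
    return on_call


# Hypothetical state: the IAT is 0x40 bytes in the packed image.
state = {"iat_size": 0x40}
on_call = make_call_checker(0x00400000, 0x00460000, 0x40, lambda: state["iat_size"])

results = []
results.append(on_call(0x00404100))  # inside the module, IAT unchanged
results.append(on_call(0x00574210))  # outside the module (dynamic memory)
state["iat_size"] = 0x1A0            # the loader has rebuilt the IAT
results.append(on_call(0x00403780))  # inside the module, IAT size changed
print(results)
```

This also makes the limitations mentioned in the introduction concrete: a packer that reaches the OEP with a return instruction never triggers on_call, and one that leaves the IAT size unchanged never passes the second check.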

Taking a Selfie

To inject Selfie into the malware.exe process, we use drrun.exe with the following parameters:

Selfie in Action

How well does Selfie do?

First, let’s run Selfie against the malware (Shylock) we manually unpacked above:

Image 24: Selfie on Shylock

Now let’s take a look at a Win32/Xswkit (alias Gootkit) sample.

For those unfamiliar with Xswkit, it is a clone of Win32/Poweliks with some extra features: a UAC bypass using Microsoft’s shim engine and a different startup method (rundll + mshta.exe).

For more details on how this sample works, please refer to EP_X0FF’s full analysis on the kernelmode.info forum.

For packed and unpacked samples, please also refer to the kernelmode.info forum. Credit goes to Tigzy and R136a1 for the samples and to EP_X0FF for the unpacked sample.

Packed:

https://www.virustotal.com/en/file/ed3d622c54b474c6caef540a3147731a1b2c7d4a7563b97731880bb15305d47d/analysis/1420287664/

Prior to running Selfie, this is what we see:

Image 25: Packed sample of Xswkit malware. Note the Image Base Address and the AddressOfEntryPoint (EP) fields.

Unpacked:

https://www.virustotal.com/en/file/c3885c394a3ad75bc53e7ef2b2d8c8e9e5a12a7f3a52c7399d63814f05c52c96/analysis/1420287667/

Image 26: Unpacked sample of Xswkit malware. Note the Image Base Address and the AddressOfEntryPoint (EP) fields. The EP in the unpacked sample is the actual OEP.

Now let’s see if we get similar results with the “Selfie” tool: