Threat Research Blog
FLARE IDA Pro Script Series: Automating Function Argument Extraction
This blog post is the next episode in the FLARE team IDA Pro Script series. All scripts and plug-ins are available from our GitHub repo at https://github.com/fireeye/flare-ida.
Automating the Repetitive
I am a big believer in automating repetitive tasks to improve and simplify reverse engineering. The task described in this post comes up frequently in malware analysis: identifying all of the arguments passed to a function within a program. This situation may come up when trying to:
- Identify the size, location, and possible key used to decrypt encoded strings used by the malware.
- Identify each function pointer used to start a new thread (e.g., the start-address arguments to CreateThread or _beginthreadex).
- Identify static strings passed to API functions that may serve as indicators (e.g., the names of mutexes and events).
- Identify all of the functions resolved at runtime (i.e., all of the arguments to GetProcAddress).
To assist reverse engineers who face similar obstacles, we are releasing a new IDAPython utility as part of our FLARE IDA script release: argtracker. This tool is not meant to be a stand-alone plug-in, but rather an aid for quickly writing and developing your own custom analysis scripts within IDA.
With argtracker, you can request all arguments to a given function and then analyze, decode, or emulate them in any way you see fit.
Starting with a simple example, let's consider a Gh0st variant. This malware family traditionally uses its own wrapper function around the C runtime function _beginthreadex to launch new threads. This wrapper typically has the prototype shown in Figure 1.
Suppose we’d like to quickly find all values of the lpStartAddress parameter so that we can perform some automated analysis of each thread function. We can use argtracker for exactly this purpose using the code in Figure 2. The full source code for this example can be found in examples/argtracker_example1.py in our flare-ida GitHub repo.
In this sample code, we start by creating a new Vivisect workspace. Vivisect (see the installation instructions at the end) is a separate binary analysis framework written in Python, and argtracker uses it heavily. The workspace contains analysis information similar to an IDB file for IDA. Using this workspace, a new ArgTracker instance is created. For each code cross-reference to MyCreateThread, we call tracker.getPushArgs(xref, 7). This function takes as parameters:
- The location of the call instruction being analyzed: xref
- The number of stack arguments to extract: 7
- An optional list of register names that contain arguments to extract, which defaults to being empty
This malware sample uses the __cdecl calling convention, where all arguments are pushed on the stack, so argtracker should be able to recover all seven arguments. The third, optional argument is explained in the next example.
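The per-call-site loop described above can be sketched as a small helper that receives the tracker and the call-site addresses as parameters, which keeps it reusable for any target function and testable outside IDA. This is a sketch, not the code in argtracker_example1.py: the helper name collect_push_args is hypothetical, and the only argtracker call it relies on is getPushArgs(va, num_stack_args[, regs]) as described above. The commented IDA usage at the bottom assumes you already have an ArgTracker built from your Vivisect workspace and know the address of the wrapper function.

```python
def collect_push_args(tracker, call_sites, num_stack_args, regs=None):
    """Call tracker.getPushArgs() once per call site.

    tracker: object providing getPushArgs(va, num_stack_args[, regs]),
             e.g. an argtracker ArgTracker instance
    call_sites: iterable of call-instruction addresses (code xrefs)
    num_stack_args: number of stack arguments to recover
    regs: optional list of register names that hold arguments

    Returns {call_site_va: list of result dicts from getPushArgs()}.
    """
    results = {}
    for va in call_sites:
        if regs:
            results[va] = tracker.getPushArgs(va, num_stack_args, regs)
        else:
            # Rely on getPushArgs()'s default of no register arguments.
            results[va] = tracker.getPushArgs(va, num_stack_args)
    return results


# Assumed usage inside IDA (tracker comes from your own script; wrapper_va is
# a hypothetical name for the address of MyCreateThread):
#   import idautils
#   results = collect_push_args(tracker, idautils.CodeRefsTo(wrapper_va, 0), 7)
```

Passing the tracker in, rather than constructing it inside the helper, means the same function works unchanged for the register-argument case later in this post.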
The return value of getPushArgs() is a list of result dictionaries. Each dictionary contains numbered keys 1, 2, ..., n, where n is the number of stack arguments requested. In the sample code in Figure 2, we obtain the lpStartAddress values by retrieving the tuple with key 3, since lpStartAddress is the third parameter to the function being analyzed. Each tuple has the form (va, value), where va is the effective address at which argtracker observed the value being passed as a parameter. The sample script merely prints this information, but real scripts would begin actual analysis with these results.
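Pulling one argument out of that result list takes only a few lines. The sketch below assumes the structure described above (a list of dicts keyed by 1, 2, ..., n, each value a (va, value) tuple); the helper name arg_values and the sample addresses are made up for illustration. Skipping dicts that lack the requested key is a defensive choice on our part, not documented argtracker behavior.

```python
def arg_values(push_args_results, key):
    """Return the (va, value) tuples recorded for one argument.

    push_args_results: list of result dicts as returned by getPushArgs()
    key: stack-argument position (1..n) or a register name
    """
    # Defensive: skip any result dict that does not contain the key.
    return [result[key] for result in push_args_results if key in result]


# Made-up data shaped like one call site's getPushArgs(xref, 7) output
# (only keys 1 to 3 shown); key 3 holds lpStartAddress.
sample = [{1: (0x401000, 0), 2: (0x401002, 0), 3: (0x401004, 0x403210)}]
for va, start_address in arg_values(sample, 3):
    print("0x%x: lpStartAddress = 0x%x" % (va, start_address))
# prints: 0x401004: lpStartAddress = 0x403210
```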
getPushArgs() returns a list to handle situations like the one in Figure 3, where separate code paths set up the arguments to a single call instruction. Each entry in the result list contains the complete set of arguments for one of those code paths.
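Because one call site can yield several result dicts, it can be useful to see where the code paths disagree. The following sketch, with hypothetical helper names, takes a dict mapping each call-site address to its getPushArgs() result list and collects the distinct values of one argument per call site, assuming the recovered values are integers. Sites with more than one distinct value are the ones worth inspecting by hand.

```python
def distinct_values(results_by_site, key):
    """Map each call site to the sorted distinct values of one argument.

    results_by_site: {call_site_va: [result dict, ...]} from getPushArgs()
    key: stack-argument position (1..n) or a register name
    """
    distinct = {}
    for site, path_results in results_by_site.items():
        # Each result dict covers one code path; keep only the value part
        # of its (va, value) tuple.
        distinct[site] = sorted({r[key][1] for r in path_results if key in r})
    return distinct


def ambiguous_sites(results_by_site, key):
    """Return call sites whose code paths pass different values for key."""
    per_site = distinct_values(results_by_site, key)
    return sorted(site for site, values in per_site.items() if len(values) > 1)
```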
For a more complex example, suppose you have an annoying piece of malware that decodes all of its strings at runtime. In this example, the function that decodes the strings has the prototype shown in Figure 4. The function uses a non-standard calling convention, so we use IDA's __usercall annotation to specify that it takes two parameters on the stack (inptr and tempPtr) and three in registers (outPtr in ecx, strLen in edi, and key in eax).
Note: setting a function prototype in this way is not necessary for argtracker; it is done here merely for illustrative purposes.
The file examples/argtracker_example2.py in our flare-ida GitHub repo shows how argtracker can be used in this situation. In this example, we only care about three of these parameters: inptr, strLen, and key. Figure 5 shows basic initialization of the Vivisect workspace using a helper function in the jayutils file (also in our GitHub repo). A tracker object is created from the Vivisect workspace, and then we get all cross-references to the string decoder function (decStringFunc).
Figure 6 shows the call to getPushArgs() for every reference to the decoder function. Only two arguments are passed on the stack, so the second argument is 2. Because registers are also used to pass parameters, we now need the third argument to getPushArgs(): a list of the register names that contain arguments to extract, ['eax', 'ecx', 'edi']. The return value is still a list of dictionaries whose keys identify the recovered arguments. Stack arguments are accessed by their position (1, 2, ..., n), and register arguments are retrieved by using the register name as the dictionary key. As in the first example, each entry is a tuple containing the effective address where the register was modified prior to the call instruction, and the value.
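Once the stack and register arguments sit in the same dict, turning each code path into the inputs a decoder needs is straightforward. This sketch assumes the call getPushArgs(xref, 2, ['eax', 'ecx', 'edi']) described above and that inptr is stack argument 1 (it is listed first among the stack parameters); decode_inputs is a hypothetical name, and actually decoding the strings would still require reimplementing decStringFunc's algorithm yourself.

```python
def decode_inputs(push_args_results):
    """Extract (inptr, strLen, key) triples from getPushArgs() results.

    Assumed layout, following the __usercall prototype described in the text:
      stack argument 1 -> inptr
      register 'edi'   -> strLen
      register 'eax'   -> key
    Each entry in a result dict is a (va, value) tuple; only the value is kept.
    Code paths on which any of the three was not recovered are skipped.
    """
    inputs = []
    for result in push_args_results:
        try:
            inptr = result[1][1]
            str_len = result["edi"][1]
            key = result["eax"][1]
        except KeyError:
            continue
        inputs.append((inptr, str_len, key))
    return inputs
```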
How it Works
argtracker relies heavily on Vivisect to perform additional analysis of the malware. A separate Vivisect workspace (.viv file) is created to store this analysis, so you may be asked to specify the path to the original malware file if the script cannot locate it from information stored in the IDB. Each function containing a call whose arguments are requested is emulated by Vivisect, and all memory reads, memory writes, and register modifications are tracked. The script then traces backwards from the call instruction under observation, queuing separate branch sources as they are encountered, until either all of the requested arguments have been found or the function start is reached.
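The backward search described above can be illustrated with a deliberately simplified model. This is not argtracker's implementation: instead of Vivisect emulation results, it uses a hypothetical hand-built graph of basic blocks in which each instruction lists the items (stack-argument positions or register names) it writes. It does show why forking at each branch source naturally yields one result dict per code path.

```python
from collections import deque


def trace_back(blocks, call_block, call_index, wanted):
    """Toy backward trace from a call instruction to its argument setup.

    blocks: {block_id: {"insns": [(va, {item: value}), ...],
                        "preds": [predecessor block ids]}}
            Each instruction lists the items it writes (a hypothetical
            stand-in for what emulation would report).
    call_block: id of the block that contains the call
    call_index: index of the call instruction within that block
    wanted: items to recover, e.g. [1, 2, "eax"]

    Returns a list with one {item: (va, value)} dict per explored path.
    """
    wanted = set(wanted)
    results = []
    # Queue entries: (block id, last index to scan, items found so far,
    #                 blocks already visited on this path).
    queue = deque([(call_block, call_index - 1, {}, frozenset([call_block]))])
    while queue:
        block_id, last, found, visited = queue.popleft()
        # Walk this block backwards, from `last` to its first instruction.
        for va, writes in reversed(blocks[block_id]["insns"][: last + 1]):
            for item, value in writes.items():
                # The write closest to the call wins for each item.
                if item in wanted and item not in found:
                    found[item] = (va, value)
            if wanted.issubset(found):
                break
        preds = [p for p in blocks[block_id]["preds"] if p not in visited]
        if wanted.issubset(found) or not preds:
            # Done: everything recovered, or the function start (or an
            # already-visited loop header) was reached.
            results.append(found)
            continue
        # Queue each branch source with its own copy of the partial result.
        for p in preds:
            queue.append((p, len(blocks[p]["insns"]) - 1, dict(found), visited | {p}))
    return results
```

Forking at every predecessor means the number of paths can grow quickly in branch-heavy functions; the per-path visited set is what keeps the model from following a loop forever.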
The Python script has been used successfully on both 32-bit x86 and 64-bit x64 disassembly, but the 32-bit analysis has been tested much more extensively. Other processor types have not been tried.
One important caveat is that the Vivisect emulation used by argtracker runs only at the function level. For data that is not constant within the function, such as uninitialized global data or function parameters, Vivisect returns fake stub values so that the analysis can complete. You should sanity-check argtracker's results to make sure these stub values aren't affecting them.
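One cheap sanity check for pointer-valued arguments such as lpStartAddress is to confirm that each recovered value actually points into the binary. The sketch below is a generic heuristic of our own, not an argtracker feature, and it does not know what Vivisect's stub values look like; it simply separates values that fall inside a given list of address ranges from those that do not, so the latter can be reviewed by hand. The helper name split_by_range is hypothetical.

```python
def split_by_range(values, ranges):
    """Split (va, value) tuples by whether value lies in a known address range.

    values: iterable of (va, value) tuples taken from getPushArgs() results
    ranges: list of (start, end) address pairs, end exclusive
    Returns (inside, suspicious) lists of (va, value) tuples.
    """
    inside, suspicious = [], []
    for va, value in values:
        if isinstance(value, int) and any(start <= value < end for start, end in ranges):
            inside.append((va, value))
        else:
            suspicious.append((va, value))
    return inside, suspicious


# Assumed way to build `ranges` inside IDA 7.x from the database's segments:
#   import idautils, idc
#   ranges = [(s, idc.get_segm_end(s)) for s in idautils.Segments()]
```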
Installation
As with our other IDA Pro plug-ins, clone the git repository at https://github.com/fireeye/flare-ida. The python directory can either be copied to the %IDADIR%\python directory or placed in any directory listed in your PYTHONPATH environment variable.
Clone the Vivisect repository from https://github.com/vivisect/vivisect and add the package to your PYTHONPATH environment variable if you don’t already have it installed.
Test the installation by running the following Python commands within IDA Pro and ensure no error messages are produced:
We hope you find argtracker as useful as we do and that it speeds up your analysis. Stay tuned for more helpful reverse engineering code and blog posts from the FLARE Team.