Mandiant's Nick Bennett and Jake Valletta discussed data stacking at MIRcon™ last month. If you were unable to attend the talk, we will discuss this data analysis technique here on the M-Unition blog.
What is Data Stacking?
Data stacking is the application of frequency analysis to large volumes of similar data in an effort to isolate and identify anomalies. In short, data stacking is an investigative technique that can be used to find a needle in a digital haystack. It involves an iterative process of reducing large amounts of data into manageable chunks that can be consumed and investigated.
At Mandiant we frequently investigate large organizations with hundreds of thousands of hosts. Looking at this data with no pre-processing and reduction, such as data stacking, is infeasible. We commonly use data stacking to discover unknown malware and new Indicators of Compromise (IOCs) during our investigations.
When investigating targeted attacks, malicious data should be much less common throughout the enterprise than benign data, allowing investigators to concentrate their efforts on audit rows that appear different. (We will discuss the "different" in different rows later in the post.)
Given a large set of data, the investigator determines what attributes make the unusual data rows stand out and may indicate they are malicious. These attributes then become the grouping criteria used to create the frequency analysis calculations. The frequency, or count, is then used to determine investigative starting points for possible anomalies within the enterprise.
Data Stacking in Four Steps
There are four basic steps involved in the data stacking process. Most of the time we perform steps three and four several times - grouping, filtering and reducing the data on different sets of attribute as discussed below. The four basic steps are as follows:
First, we acquire data from across the environment to find anomalies across all hosts. It is imperative to get data from as many hosts in the environment as possible. At Mandiant we typically use Mandiant Intelligent Response® (MIR), but the data can come from any source. As long as you have a sufficiently large amount of similar data from multiple systems, the data can be aggregated and analyzed. In the examples given below we discuss stacking of data collected with MIR -- specifically the services audits.
We usually skip step 2 listed above. However, we mention it to show the iterative approach to data reduction. This step is stacking the data without reducing the data set to interesting attributes, which is explained in step 3 below.
At step 3, we are trying to determine a good subset of attributes for stacking. The list of attributes in a standard MIR Services Audit is shown below:
The investigator's goal is to determine a set of attributes that can be used to stack the data. If you pick something that is unique to every system you will end up with a grouping that means nothing. For example, using the process ID of a running service is meaningless for stacking. If you group and count services based on that attribute, you will end up with an even distribution, with approximately the same counts for every row. Also, the fact that something is running with a particular process ID, with no other context, doesn't help a great deal in determining the nature of the process.
Likewise, if you pick an attribute set that is not sufficiently unique, you'll run into the opposite problem: not being able to identify the "evilness" because you don't have enough data. For instance, if you only stack (or group) on the service name, you may be able to identify malware that is installed with random names, but chances are you will not find malware that uses more sophisticated techniques to disguise itself. So, only looking at the names, will not find a malicious service that is called "Disk Driver" (a common service name), but is installed in C: EMP. If you were to group the data on service name and service path you would be able to easily identify such instances.
Determining a useful set of attributes to group the data by, thus reducing the data set, is a balancing act. There are a number of standard attribute sets that investigators at Mandiant use, but even then, a lot depends on the environment, the type of malware, the type of data available to the investigator and so on.
That is why you would typically repeat the grouping/data reduction/analysis process more than once, employing different attribute sets and different ways to slice the data.
A set of attributes commonly used for Service Audits by Mandiant investigators is:
Using this smaller grouping of attributes, we are able to obtain better and more manageable frequency distribution counts. This distribution makes flagging rows for more detailed analysis much easier. This makes analysis easier because the stacked data now has rows at each end of the spectrum. Rows that have been seen on a large number of systems (typically ignored) and rows that have only been seen on a few systems. As we learned earlier, these rows that we have only seen on a few systems are probably where targeted attackers are hiding. With data stacking, false positives are not uncommon, and reducing the number of rows that require investigation makes discovery of unknown threats more feasible.
Other investigators may choose different attribute groupings for their stacking purposes. Most, however, have the same goal in mind: data reduction in an effort to make manual inspection of results possible and manageable in a relatively short time frame.
Last is step 4, looking for anomalies, which can be very tedious. Some rows (frequently mass malware) will immediately jump out of the frequency distribution, making identification easy. Sophisticated attackers are typically more subtle and therefore, harder to discover. We hope to give you a few pointers in discovering these camouflaged outliers.
As an example, let's consider a small set of services data that we obtained and stacked on service name, service path and service DLL.
In the example above, we have two rows representing the "Seclogon" service. One has nearly 5,600 systems with the same attributes. We know the count roughly represents systems because the service names must be unique, so a single system cannot have two services with the same name allowing that system to increment the count more than once. The other row has a much lower count of two. From the table, we can see the difference lies in the naming of the Service DLL, seclogon.dll vs. selogon.dll (note the missing 'c'). In this situation the investigator would acquire the "selogon.dll" file to determine if it's malicious.
The iprip service exhibits the same distribution: one row has a high incidence and two rows that are uncommon. The uncommon rows are worth investigating further for different reasons. The first has an odd spelling of the Service DLL name and the other has the DLL in a temp directory -- a common location where attackers place malicious service DLLs. It is easier for malware authors to drop files into these directories than the more protected directories, such as system32.
It is important to note here that just because something is not like the others doesn't automatically mean that it is evil. For example, it is common to find a few services that are located in C:winnt. Windows NT 4 systems are not commonly found these days but are still legitimate installs. Therefore, services on NT systems commonly show up in the unique rows because they are rare in 2012, but that doesn't make them malicious. Uniqueness provides a starting point for more thorough investigation and that is one of the main purposes of data stacking.
Consider the following data stack:
The psexec service listed above, in and of itself, is not malicious, but is frequently used by attackers to execute tools on systems throughout an enterprise. However, depending on the network administration practices, the presence of this service may or may not be a red flag. It might be that the red flag in this example is the presence of three different versions. Or perhaps that is expected due to other administrators using different versions. Determining the concern level at this point requires further investigation.
Another services audit where we are concentrating on the Service DLLs:
Here we see a listing of services named "bits". Looking at the table above we see that each row has a unique MD5. Using a feature in MIR, the agent tries to verify the signature of the file associated with the service. Signatures are used to ensure that a file is the same as what the original author created. If a single bit in the file changes so will the signature, and therefore it would not match a previously known good signature. Shown above, we see one service where the agent could not verify the signature. This could mean several things: that the file was modified, that the file does not have a signature, or that the MIR agent was not able to properly verify it. Either way, this is worth further investigation.
In the situation where the MIR agent is unable to properly verify a signature (for whatever reason: an error, lack of internet connectivity and so on) you may find other rows that have the same MD5 and a verified signature. That means you can ignore the fact that a signature didn't verify correctly for one row because another row has the exact same file that did verify.
We have shown you just a few examples of the usefulness and data reduction capabilities of performing data stacking. At Mandiant we will stack and slice and analyze the services data set several times using different sets of attributes: analyzing services as a whole; and then, service paths separately; then, service DLLs and associated MD5s separately. The process is more art than science and there is no universal procedure. Generally, it depends a lot on the environment and the attacker, but practice makes perfect.
At Mandiant, we do not limit our data stacking analysis to service audits. We stack many different types of audits in an effort to discover all aspects of a compromise. For example, we have stacked particular registry keys (think runkeys and shim cache entries), file data from file listing audits (careful! too much data alert!), user accounts and even Master Boot Records (MBR) from physical disks to identify rootkits.
In all instances, the goal is the same: group and reduce the data set to identify the outliers and make manual analysis of an entire environment feasible.