Relational Learning Tutorial

At FireEye, we apply machine learning techniques to a variety of security problems. Malware detection and categorization is a great use of the technology, and we believe that it can also play a role in security challenges that extend beyond malware.

In one such R&D effort, the Innovation & Custom Engineering (ICE) team is utilizing machine learning to build statistical models of relationships between entities. These models can then be used to “connect the dots” by making predictions about relationships we haven’t observed, or to spot anomalous relationships that don’t fit expected patterns.
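
To make the idea concrete, here is a minimal sketch of one common relational model, logistic matrix factorization, written in plain NumPy rather than TensorFlow. It is not the tutorial's code: the toy user/application data, the function names `fit_factorization` and `predict`, and all hyperparameters are assumptions chosen for illustration. Each entity gets a small learned vector, and the dot product of two vectors scores how likely a relationship between them is.

```python
import numpy as np

def fit_factorization(triples, n_rows, n_cols, rank=2, epochs=500, lr=0.1, seed=0):
    """Fit logistic matrix factorization on (row, col, label) observations."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n_rows, rank))  # one vector per row entity
    V = 0.1 * rng.standard_normal((n_cols, rank))  # one vector per column entity
    rows = np.array([t[0] for t in triples])
    cols = np.array([t[1] for t in triples])
    y = np.array([t[2] for t in triples], dtype=float)
    for _ in range(epochs):
        logits = np.sum(U[rows] * V[cols], axis=1)
        err = 1.0 / (1.0 + np.exp(-logits)) - y  # gradient of log loss w.r.t. logits
        grad_U = np.zeros_like(U)
        grad_V = np.zeros_like(V)
        np.add.at(grad_U, rows, err[:, None] * V[cols])
        np.add.at(grad_V, cols, err[:, None] * U[rows])
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def predict(U, V, row, col):
    """Predicted probability that a relationship (row, col) exists."""
    return 1.0 / (1.0 + np.exp(-(U[row] @ V[col])))

# Hypothetical toy data: users 0-1 use apps 0-1, users 2-3 use apps 2-3.
held_out = {(1, 1), (3, 0)}  # relationships the model never observes
triples = [
    (u, a, 1.0 if (u < 2) == (a < 2) else 0.0)
    for u in range(4) for a in range(4)
    if (u, a) not in held_out
]

U, V = fit_factorization(triples, n_rows=4, n_cols=4)
p_same_group = predict(U, V, 1, 1)   # unobserved, but fits the pattern
p_other_group = predict(U, V, 3, 0)  # unobserved, and does not fit the pattern
print(p_same_group, p_other_group)
```

A high score for an unobserved pair is a "connect the dots" prediction, while a low score for a pair that does show up in the data would flag that relationship as anomalous.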

The most popular application of these algorithms is probably item recommendation, where they are used to personalize our consumer experiences in today’s online marketplace. There are also many important security applications, such as analyzing relationships between threat actors, TTPs, and their targets. Another is detecting attacker activity by modeling relationships between users, machines, applications, and network connections.

Figure 1 shows an example of how these algorithms can also be used for clustering and visualization. This particular model has automatically clustered several thousand machines into groups of similar function, just by observing internal network connection behavior.

Figure 1. Visualization of model-based clustering when trained on network connection relationships
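
The clustering in Figure 1 comes from the tutorial's own model, which is not reproduced here. As a rough stand-in, the sketch below embeds machines with a truncated SVD of a connection matrix and groups them with a simple k-means. The connection matrix, the helper names `embed` and `kmeans`, and the choice of SVD plus k-means are all assumptions for illustration.

```python
import numpy as np

# Hypothetical toy data: rows are machines, columns are internal services,
# and a 1 means the machine was observed connecting to that service.
connections = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)

def embed(matrix, rank):
    """Low-dimensional machine embeddings from a truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank] * s[:rank]

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

labels = kmeans(embed(connections, rank=2), k=2)
print(labels)
```

Machines that talk to the same services end up close together in the embedding space, so they fall into the same cluster without any labels describing their function.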

We have created a tutorial that steps through building several types of relational learning models using Python and Google’s new machine learning framework, TensorFlow. The target audience is machine learning researchers and practitioners, but security professionals who like to channel their inner “data nerd” may also find it interesting!

The tutorial and accompanying code are available as a Jupyter Notebook here.