Deep Learning In Cybersecurity

In previous blogs in our Deep Learning Series, we addressed deep learning, how it differs from machine learning, and how it actually works. In this blog we discuss the application of deep learning in cybersecurity.

As a reminder, deep learning is “a family of methods within machine learning that uses available data to learn a hierarchy of representations useful for certain tasks.” Traditional machine learning needs a lot of human expert effort to define the set of features that represent the data. Deep learning largely removes this manual feature engineering: the system learns the best representation of the data by itself to produce the most accurate results.

Artificial intelligence has been touted for years as a feature in security products. However, the reality is that most security products that claim AI are either (1) using statistical analysis to prioritize data or (2) using traditional machine learning to optimize tasks.

For example, some network traffic analytics vendors use machine learning to baseline “normal traffic” and then flag anomalies that need attention. The problem with this approach is that anomalies are common in network traffic, and not every anomaly is a threat indicator. This can leave organizations overwhelmed by false positives, which can run security operations teams ragged.
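To make the false positive problem concrete, here is a minimal sketch of baseline-and-flag anomaly detection. It uses a simple z-score threshold in place of whatever model a given vendor actually uses, and the traffic numbers, the `flag_anomalies` helper, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, threshold=3.0):
    """Return (index, value) pairs from `observed` that lie more than
    `threshold` standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")
    flagged = []
    for i, value in enumerate(observed):
        z = (value - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, value))
    return flagged


# Hypothetical outbound traffic for one host, in KB per minute, on "normal" days.
baseline_kb = [100, 110, 95, 105, 98, 102, 107, 99, 101, 103]

# Hypothetical new observations. Suppose index 1 is a scheduled backup (benign)
# and index 3 is data exfiltration (malicious). The baseline cannot tell them apart.
observed_kb = [104, 900, 97, 850, 101]

anomalies = flag_anomalies(baseline_kb, observed_kb)
print(anomalies)  # [(1, 900), (3, 850)]
```

Both spikes are flagged, but only one of them is a threat. The detector knows that traffic is unusual, not why, so the analyst still has to triage every alert.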

Deep learning, by contrast, is an ideal technology to address the cybersecurity challenges we’re facing today because of:

  • Complex decision boundaries – protocols and payloads are complex in their variety and structure. Deep learning can make sense of this complexity and, if trained correctly, identify a broad range of threat types.
  • Large training sets – Enormous threat data sets with hundreds of millions of samples are already available. This is comparable to or larger than the training sets used in many popular computer vision applications.
  • GPUs – Recent advances in processing, together with the falling cost of the underlying hardware, have made it possible to train and validate deep learning models in hours or even minutes, where it used to take weeks.

But where’s the best place to apply deep learning? Let’s compare the places in the network where we can apply it, as shown in the graphic below:

  • Deep Learning on endpoint traffic – Deep learning applied on endpoint traffic has access not only to the payload but also to the runtime behavior of the specific malware. However, the endpoint has very limited processing and memory resources with which to meet the computational requirements of a deep learning system. In addition, the visibility of threats is limited to one specific endpoint, which may be influenced by the actions and responsibilities of the user on that endpoint.
  • Deep Learning on SIEM traffic – Security Information and Event Management (SIEM) systems collate logs and alerts from a variety of network and security devices in the network. Applying deep learning to SIEM traffic therefore has the benefit of lots of interesting data from across the enterprise, but detection is significantly delayed because the data must first reach the SIEM. Additionally, because so much data is available, scattered data points may lead to unclear threat verdicts.
  • Deep Learning on network traffic – Network traffic is immensely rich and interesting. Applying deep learning to network payloads and headers brings complexity because of the variety in their structure, but it also offers multiple ways to identify malicious intent. When the network security appliance is architected correctly, it can provide the resources that advanced AI models need, and models can be easily updated when needed. Additionally, applying deep learning to network traffic at the perimeter of the enterprise brings the benefit of stopping the threat closest to its point of entry, before it has the opportunity to move laterally through the enterprise.

Applying deep learning on network traffic would therefore seem to be the most attractive application for network security. There are important considerations such as careful curation of threat data to enable proper training of the deep learning models. In addition, threat verdicts must include appropriate security details such as threat family and indicators of compromise to ensure threat analysts have information for further analysis. Threat alerts provided must be accurate and actionable as every false positive costs time and money.
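As an illustration of what an actionable verdict might carry, the sketch below models one as a small record. The `ThreatVerdict` class, its field names, the 0.9 confidence cutoff, and the sample values are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ThreatVerdict:
    """Hypothetical shape of a verdict handed to a threat analyst."""
    malicious: bool
    confidence: float                      # model score in [0, 1]
    threat_family: str = "unknown"
    indicators_of_compromise: list[str] = field(default_factory=list)

    def is_actionable(self, min_confidence: float = 0.9) -> bool:
        # Assumption: an alert is only worth an analyst's time if it is
        # confident AND carries enough context to investigate further.
        return (
            self.malicious
            and self.confidence >= min_confidence
            and self.threat_family != "unknown"
            and bool(self.indicators_of_compromise)
        )


# Bare verdict: the model says "malicious" but gives the analyst nothing to act on.
bare = ThreatVerdict(malicious=True, confidence=0.97)

# Enriched verdict, using documentation-reserved example addresses and domains.
enriched = ThreatVerdict(
    malicious=True,
    confidence=0.97,
    threat_family="ransomware",
    indicators_of_compromise=["203.0.113.7", "c2.example.com"],
)

print(bare.is_actionable(), enriched.is_actionable())  # False True
```

The design choice here is that context is part of the verdict itself, so an alert without a threat family or indicators of compromise is treated as incomplete rather than passed to the analyst as-is.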

Here are the key metrics that a network threat protection solution (harnessing deep learning or otherwise) must deliver on:

  • Detection speed – threat detection must be performed quickly, ideally in less than a second, in order to keep up with the speed at which attackers launch attacks.
  • Accuracy and reliability – a network threat protection solution must detect threats with extremely high accuracy: a detection rate greater than 99.5% and a false positive rate of less than 0.5%.
  • Known and unknown threats – the threat landscape continues to evolve, so deep learning models must be trained correctly to address not only existing threats and variants of current threats but also new threats as cyberattackers evolve.
  • Orchestrate prevention – Once a threat is detected, the platform must orchestrate prevention to stop further lateral movement and communications by the attacker (such as C2 communications).
  • Performance – the network threat protection solution must not add noticeable latency on a 10G network.
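The accuracy targets above can be checked against a labeled evaluation set. The following is a minimal sketch of that arithmetic, assuming the standard definitions of detection rate (true positives over all actual threats) and false positive rate (false positives over all benign samples). The counts are invented for illustration.

```python
def detection_rate(true_positives, false_negatives):
    """Fraction of actual threats that were detected."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Fraction of benign samples that were wrongly flagged."""
    return false_positives / (false_positives + true_negatives)


# Hypothetical evaluation counts: 10,000 malicious and 1,000,000 benign samples.
tp, fn = 9_960, 40
fp, tn = 3_000, 997_000

dr = detection_rate(tp, fn)
fpr = false_positive_rate(fp, tn)

# Targets from the list above: detection rate > 99.5%, false positive rate < 0.5%.
meets_targets = dr > 0.995 and fpr < 0.005

# Expected number of false alerts per million benign samples at this rate.
false_alerts_per_million = round(fpr * 1_000_000)

print(f"detection rate: {dr:.3%}")
print(f"false positive rate: {fpr:.3%}")
print(f"meets targets: {meets_targets}")
print(f"false alerts per million benign samples: {false_alerts_per_million}")
```

With these made-up counts the solution meets both targets, yet it still produces 3,000 false alerts for every million benign samples. At network traffic volumes, a false positive rate that looks small as a percentage can still mean a large absolute number of alerts to triage.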