Blue Hexagon’s deep learning tools succeed quickly, whereas others fail even after long periods of malware activity. Indeed, one piece of ELF malware was undetected for more than six months in the VirusTotal database despite repeated scans, but Blue Hexagon’s deep learning methods picked it up instantly.
– Dr. Saumitra Das, CTO, Blue Hexagon
Introduction
Hackers and cybercriminals are constantly finding new ways to infiltrate corporate networks and cloud environments. The massive amounts of data that flow through even the simplest enterprise deployment effectively allow attackers to hide in plain sight while they exfiltrate sensitive company information and spread laterally, deep and wide, throughout the company. Cybersecurity professionals and tools must evolve quickly to counteract these increasingly sophisticated attacks.
Machine learning, a subset of artificial intelligence focused on training algorithms to perform better, has emerged as a powerful tool in the fight against cybercrime. Advanced cybersecurity tools apply machine learning to more quickly and accurately identify anomalous behavior that indicates a potential attack.
Within the discipline of machine learning, deep learning shows tremendous promise for helping build highly accurate threat identification and remediation tools. This blog provides an overview of deep learning and its use in cybersecurity.
AI vs Machine Learning vs Deep Learning
It is important to understand that deep learning is not distinct from machine learning any more than machine learning is distinct from AI. Instead, the three form a hierarchy, with AI as the broadest principle and deep learning as more specific.
Artificial intelligence involves replicating processes associated with human intelligence using computers and computational methods. AI and its various specific sub-disciplines are useful in pattern analysis, language processing, decision-making, and more. Essentially, if a human brain can do it, AI should also be able to do it.
Machine learning is a subset of AI focused on developing techniques that learn from available data, represented in mathematical form, and harness available computational power to make decisions, improve efficiency, and generalize beyond the data they were trained on. Machine learning methods are particularly useful for identifying patterns and anomalies in large data sets.
Deep learning takes machine learning and AI to the next level. It is loosely modeled on how the human brain itself processes information. Its architecture is primarily based on neural networks, which enable the technique to thrive on the enormous amounts of data available, make full use of the available processing power, and approximate very complex functions to perform complicated tasks, some of which are beyond even trained individuals.
Figure 1: The hierarchy of artificial intelligence
Of course, this is a simplified generalization of deep learning that calls for further definition. To quote Yoshua Bengio, one of the leading scholars in deep learning theory:
Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features.
What does this mean in practice? It means that deep learning algorithms can build increasingly sophisticated layers of understanding by starting with simple concepts and refining them. The layers provide a sense of depth, which is why this methodology is known as deep learning.
Consider, for example, a common application for machine learning methods: image processing. Given a picture of a face, the lower layers recognize a general shape outline within the image. Higher layers generate ever greater comprehension of the image, identifying contours within the shape. Even higher layers resolve the contours into eyes, a nose, a mouth, ears, etc.
But there is another sense in which this subset of machine learning is deep, and that is in its use of multi-layered artificial neural networks (ANNs).
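To make the idea of stacked layers more concrete, here is a minimal sketch of a forward pass through a small multi-layer network, written in plain Python. The layer sizes, the random (untrained) weights, and the helper names `dense`, `relu`, `init_layer` and `forward` are all assumptions chosen for illustration. A real deep learning model would learn its weights from data and would typically be built with a dedicated framework.

```python
import random


def relu(values):
    # Rectified linear unit: negative activations are clipped to zero.
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # One fully connected layer: each output unit is a weighted sum of all inputs.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Random, untrained weights (illustrative only; training would adjust these).
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    # Feed the input through every layer and keep each intermediate result,
    # so the "lower-level" and "higher-level" representations can be inspected.
    activations = [inputs]
    current = inputs
    for weights, biases in layers:
        current = relu(dense(current, weights, biases))
        activations.append(current)
    return activations


rng = random.Random(0)
layer_sizes = [8, 6, 4, 2]  # hypothetical: 8 raw inputs reduced to 2 outputs
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
pixels = [rng.random() for _ in range(layer_sizes[0])]  # stand-in for image data

activations = forward(pixels, layers)
for depth, values in enumerate(activations):
    print(f"layer {depth}: {len(values)} values")
```

Each entry in `activations` is the output of one layer and the input to the next, which is the sense in which higher-level features are defined in terms of lower-level ones.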
Understanding artificial neural networks
Neural networks are far more complex than traditional machine learning algorithms like decision trees or linear regression. While the mathematics underlying neural networks is quite complex, several simple key features make neural networks interesting for cybersecurity applications. One of the most significant is that the more data they have, the better they will perform, which is a crucial benefit when analyzing large volumes of data and network traffic. In contrast, more traditional machine learning methods have an initial surge in performance as they process input data, but performance plateaus as the amount of input data builds.
Just as importantly, deep learning methods help reduce the amount of human intervention needed, making them faster and more efficient. Let’s return to the image processing example. A human component is part of the workflow in a traditional machine learning application. For example, a software engineer or data scientist may select specific features and create classifiers that help the machine learning algorithm identify the various contours in the image.
Deep learning methods, on the other hand, eliminate this manual step. The neural network layers allow the algorithm to perform feature extraction on its own and use it to classify features in the image.
Figure 2: Deep learning minimizes human intervention
(https://www.softwaretestinghelp.com/data-mining-vs-machine-learning-vs-ai/)
Implications of deep learning for cybersecurity
The deep learning model’s ability to learn from large data sets, coupled with its ability to operate with very little human assistance, makes it, in principle, uniquely suited for cybersecurity applications. Indeed, deep learning methods can improve performance in several key metrics that are crucial to effective protection against threats, including:
- Speed of threat detection – effective threat detection requires identification in a fraction of a second, based on processing massive amounts of data. Both latency and throughput are important, especially in the cloud, given the number of transactions that need to be analyzed.
- Accuracy of threat detection – crucial to maximizing overall detection and minimizing the number of false positives and false negatives. False positives undermine trust in the system and lead to the system not being used.
- Scope of coverage – the constant evolution of threats requires the ability to identify currently unknown attacks and generalize to as many future variations as possible without needing retraining.
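The accuracy terms in the list above can be pinned down with a short calculation. The sketch below counts true and false positives and negatives for a handful of made-up verdicts and derives the detection rate, false positive rate and precision from them. The function name `detection_metrics` and the sample labels are assumptions for illustration only and are not results from any real product or data set.

```python
def detection_metrics(true_labels, predicted_labels):
    # Labels are 1 for "malicious" and 0 for "benign".
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth and predicted:
            tp += 1  # threat correctly flagged
        elif not truth and predicted:
            fp += 1  # benign sample wrongly flagged (false positive)
        elif truth and not predicted:
            fn += 1  # threat missed (false negative)
        else:
            tn += 1  # benign sample correctly ignored
    return {
        "detection_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Made-up ground truth and detector verdicts for eight samples.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
verdicts = [1, 1, 0, 0, 1, 0, 0, 1]

metrics = detection_metrics(truth, verdicts)
print(metrics)
```

In this toy example one of four threats is missed and one of four benign samples is flagged, so the detection rate is 0.75 and the false positive rate is 0.25. At cloud scale, even a small false positive rate can translate into a large number of alerts, which is why both numbers matter.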
Applying deep learning techniques to cybersecurity should be part of the business strategy of all organizations in the future. And it is easier to do so now than ever before. Advances in computer hardware, specifically the graphics processing units (GPUs) typically used for running deep learning models, are also making deep learning methods faster and more economical. Training that used to require weeks now takes only hours or even minutes.
Empirical field data prove the superiority of deep learning over more traditional machine learning methods. For example, Blue Hexagon analyzed data traffic for a specific financial institution client and compared the results of its deep learning threat detection methods to results from other vendors analyzing the same data set.
As shown in the graphic below, in the supplied traffic data, 68% of the samples were not present on VirusTotal or other existing threat intelligence databases at the time of the detection test. Vendor identification of threats was sporadic, with 64% of samples being detected by fewer than 20 vendors.
In contrast, Blue Hexagon’s deep learning threat identification processes detected every single threat in the data set, and identification took only milliseconds. And given that this example is representative of data that many institutions see daily, you can easily see the benefits that using deep learning will bring to your cybersecurity efforts.
Specific applications of deep learning to threat identification
Blue Hexagon has observed the unique efficacy of deep learning for several different categories of threats.
Protocol threats
In December 2021, attackers launched millions of attempts to exploit the Log4Shell vulnerability in the Log4j 2 Java library. First observed in attacks on Minecraft servers through the game’s chat function, the vulnerability can affect a wide range of applications and devices.
Log4Shell exploitation is protocol-agnostic, and Blue Hexagon has determined that attackers will attempt to run Log4j exploits over multiple protocols. Existing security tools have been mostly ineffective at identifying Log4j exploits. For instance, only 3 out of 59 vendors successfully identified one piece of ELF malware inserted after exploiting the Log4j vulnerability. Blue Hexagon’s deep learning tools caught not only this exploit but also many others associated with Log4j.
Blue Hexagon’s deep learning models can help with both detecting Log4j exploits and identifying post-exploit behaviors (e.g. reconnaissance, brute forcing and lateral movement).
Command-and-Control and Encrypted traffic
Command-and-control traffic is present in almost every attack, regardless of how the initial access happened. Such traffic can travel over common North-South protocols such as HTTP, DNS and HTTPS. Blue Hexagon’s deep learning models are trained on the network activity behavior of millions of attacks and can identify such command-and-control traffic without having to rely on known threat intelligence like IPs, domains and certificate hashes.
Encrypted network traffic poses a challenge for many organizations. Decryption is often impossible without diminishing network performance, or is complicated to deploy because trusted certificates must be installed for every entity, and existing methods are frequently ineffective in picking up encrypted threats. Nonetheless, threats are frequently present in encrypted traffic, and it is crucial to identify them before they cause damage. Blue Hexagon’s deep learning methods allow companies to inspect encrypted traffic in real time, identifying threats without impacting network performance.
Beaconing detection is another key component of detecting infections that have become resident due to a Log4j exploit. Using deep learning and signal processing techniques helps find the “low and slow” beacons used for command and control by experienced attackers.
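To give a feel for what a “low and slow” beacon looks like in data, here is a minimal sketch of one simple regularity heuristic: the coefficient of variation of the gaps between connections from one host to one destination. This is a basic statistical check, not a deep learning model and not a description of Blue Hexagon’s method. The function names, the `min_events` parameter, and the synthetic timestamps are assumptions for illustration.

```python
import random
import statistics


def interarrival_times(timestamps):
    # Gaps, in seconds, between consecutive connections.
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def beacon_score(timestamps, min_events=6):
    # Coefficient of variation of the gaps: values near 0 mean very regular
    # timing (beacon-like), values near 1 or above mean irregular timing.
    gaps = interarrival_times(timestamps)
    if len(gaps) < min_events - 1:
        return None  # too few connections to judge
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


rng = random.Random(7)

# Synthetic beacon: one callback about every hour with up to 60 s of jitter.
beacon = [i * 3600 + rng.uniform(-60, 60) for i in range(24)]

# Synthetic human-driven traffic: irregular gaps averaging one hour.
browsing = []
clock = 0.0
for _ in range(24):
    clock += rng.expovariate(1 / 3600)
    browsing.append(clock)

beacon_cv = beacon_score(beacon)
browsing_cv = beacon_score(browsing)
print(f"beacon CV:   {beacon_cv:.3f}")
print(f"browsing CV: {browsing_cv:.3f}")
```

The hourly beacon produces a score close to zero even though it generates only 24 connections a day, while the irregular traffic scores far higher. Attackers who randomize their timing more aggressively defeat a heuristic this simple, which is part of the motivation for learned models.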
Cloud Security
With more organizations expanding their cloud footprint daily, cloud security has become crucial. Unfortunately, it is also an area where many organizations are failing, due to the new frameworks in the public cloud and the skills shortage in cloud operations and DevOps. Deep learning can be applied at different stages of the “Build-Ship-Run” cloud application lifecycle to reduce the risk of code vulnerabilities and exploits as well as to defend against active attacks at the runtime stage.
CI/CD and DevOps: Deep learning can be used to defend against supply chain attacks on code being deployed in cloud environments by detecting the presence of malware such as backdoors and remote access trojans in containers prior to pushing into production.
Runtime: Deep learning methods can effectively analyze metadata and files from deep packet inspection in the cloud, from cloud storage, and from cloud activity logs to pinpoint attempts to infiltrate cloud services from the outside that other methods will likely miss.
Deep learning is particularly useful for picking up attempts to inject malicious code into cloud systems. Malicious Linux ELF code, which is increasingly used for cloud attacks, is incredibly difficult to identify. Traditional signature and sandbox-based methods, as well as shallow machine learning methods, have been unsuccessful in picking up ELF malware. In addition, there is a lack of effective sandboxes for addressing ELF malware, and a lack of the scale needed to deal with the millions of files that must be inspected each day in the cloud across the VMs and containers being deployed.
Again, Blue Hexagon’s deep learning tools succeed quickly, whereas others fail even after long periods of malware activity. Indeed, one piece of ELF malware was undetected for more than six months in the VirusTotal database despite repeated scans, but Blue Hexagon’s deep learning methods picked it up instantly.
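For readers unfamiliar with the ELF format mentioned above, the sketch below shows how a file can be recognized as ELF from its first bytes, which is a typical first triage step before any deeper inspection. It only identifies the file type and says nothing about whether a file is malicious. The helper names `is_elf`, `elf_class` and `triage`, and the sample byte strings, are assumptions for illustration.

```python
ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def is_elf(data: bytes) -> bool:
    # True when the content starts with the ELF magic number.
    return data[:4] == ELF_MAGIC


def elf_class(data: bytes):
    # The fifth header byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.
    if not is_elf(data) or len(data) < 5:
        return None
    return {1: "ELF32", 2: "ELF64"}.get(data[4])


def triage(files):
    # Given a mapping of name -> content, return the names that are ELF files
    # and would be candidates for deeper inspection.
    return [name for name, content in files.items() if is_elf(content)]


# Hypothetical samples: a 16-byte ELF identification header and a shell script.
sample_elf = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)
sample_script = b"#!/bin/sh\necho hello\n"

candidates = triage({"payload.bin": sample_elf, "run.sh": sample_script})
print(candidates, elf_class(sample_elf))
```

A check like this is cheap enough to run on every file in a container image, but the hard part the article describes, deciding which ELF files are malicious, is left entirely to the detection model.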
Conclusion
Blue Hexagon has proven that deep learning is incredibly effective in rapid, reliable cyber threat identification, both for known and unknown exploits. If you are not already using deep learning as part of your cybersecurity program, you should make the shift as soon as possible. Let me know how I can help or contact us.
Blue Hexagon also offers a courtesy trial or security assessment custom-designed to your cloud deployments.