Blue Hexagon Blog

Deep Learning Analysis of Iranian “Dustman” Malware Targeting Bahrain Oil Company Bapco

Additional Blue Hexagon Labs authors: Joseph Nicolls, Rohit Kashibatla

(This is Part I of a series of blog posts on threats targeting or originating in the Middle East.)

As I first alerted on LinkedIn on January 7th, the Saudi National Cyber Security Centre (NCSC) (الهيئة الوطنية للأمن السيبراني, literally "National Cybersecurity Authority") reported a new variant of “ZeroCleare” called “Dustman.” On January 8th, we followed up with the news that the targeted victim was the Bahrain oil company Bapco.

Overview

Here’s a summary of what occurred based on the NCSC report along with our analysis: 

  • The initial infiltration occurred in July 2019 through the exploitation of a vulnerability in a VPN server.
  • Once the attacker gained a foothold on the VPN server, they were able to access administrative and service accounts, which gave them access to the anti-virus management console.
  • Using the anti-virus management console, the attacker then distributed the “Dustman” malware across the organization’s network. The detailed VPN logs were then deleted to cover their tracks.
  • The malware did not execute right away. It was executed remotely months later, on December 29, 2019. Why that date? On December 29, the US retaliated with airstrikes in Syria and Iraq after an attack by Kataib Hezbollah on an American military base. (An American contractor was killed in this attack on the K1 military base near the Iraqi city of Kirkuk. Kataib Hezbollah has ties to Iran.)

Interestingly, however, the NCSC report did not say where the attack occurred. Early speculation was that Saudi Arabia had been attacked again. In fact, the Dustman attack targeted Bapco, the national oil company of the neighboring Kingdom of Bahrain, a sovereign state in the Persian Gulf.

It is no surprise, however, that the Saudi NCSC investigated the attack, since:

  • The Saudis have experience dealing with previous attacks from this family.
  • Blue Hexagon Labs found anti-Saudi sentiment in the code (“Down with Bin Salman”), as seen in the figure below. Mohammad bin Salman bin Abdulaziz Al Saud, colloquially known as MbS, is the Crown Prince of Saudi Arabia.

Dustman malware anti Saudi sentiment

Deep Learning Analysis of Dustman

In this blog post, we take a different approach to examining the attack: applying data science to the problem. As always with attacks like these, every minute counts when it comes to identifying the attack.

Based on our deep learning analysis, the “Dustman” malware is similar to the Shamoon (also known as Disttrack) malware used in the Saudi Aramco attack, and to the malware used in the ZeroCleare attack discovered in September 2019.

But first, a brief explanation of how we can use deep learning and mathematical representation techniques such as principal component analysis (PCA) to investigate the similarity among the Dustman, Shamoon and ZeroCleare samples. Using our trained deep learning models, we show that these malware samples essentially belong to the same family, which indicates that the attacks are related and likely carried out by the same actors.

Principal Component Analysis Representation

Deep learning allows both detection and categorization of malware samples. When trained on millions of samples, a deep learning model develops decision boundaries and weights that cluster seemingly disparate files. The model’s processed view of a sample is called an embedding. In other words, the trained model projects each input sample into a new multi-dimensional space that is better suited both to separating malicious samples from benign ones and to grouping similar malware together. Using principal component analysis (PCA), we can visualize the embeddings of different malware variants by collapsing the model’s multi-dimensional view into three dimensions. PCA keeps the directions along which the embeddings vary the most, so highly correlated dimensions end up sharing an axis. This technique gives us both a view into how the deep learning model perceives a malware sample and a way to infer relationships among different samples.
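
To make the projection step concrete, here is a minimal sketch of PCA in Python with NumPy. It assumes the model’s embeddings are available as a matrix with one row per sample. The function name pca_project and the random stand-in embeddings are hypothetical placeholders, not Blue Hexagon’s pipeline. In practice a library implementation such as scikit-learn’s PCA class does the same job.

```python
import numpy as np

def pca_project(embeddings: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project row-wise embeddings onto their top principal components (hypothetical helper)."""
    # Center each embedding dimension so PCA captures variance around the mean.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each sample along the top n_components directions.
    return centered @ vt[:n_components].T

# Hypothetical stand-in for model embeddings: 200 samples x 128 dimensions.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 128))
coords = pca_project(embeddings, n_components=3)
print(coords.shape)  # (200, 3)
```

Each row of coords is one sample’s position in a 3-D plot like Figure 1. The first axis always carries the most variance, the second the next most, and so on.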

Figure 1: PCA Graph of Dustman, Shamoon and ZeroCleare Samples vs. Other Malware

Distance in a PCA graph reflects dissimilarity. The farther apart two samples are, the more distinct the model perceives them to be. Conversely, the closer two samples are, the more similar they are expected to be, according to the representation the model learned from its training samples.
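
As an illustration of “closer means more similar,” the sketch below ranks samples by Euclidean distance to a query sample in PCA space. The coordinates are made-up toy values and nearest_neighbors is a hypothetical helper. A 3-D PCA view discards some of the embedding’s variance, so distances in it only approximate distances in the model’s full embedding space.

```python
import numpy as np

def nearest_neighbors(coords: np.ndarray, query_index: int, k: int = 3) -> list:
    """Return indices of the k samples closest to the query sample (hypothetical helper)."""
    # Euclidean distance from the query sample to every sample, including itself.
    distances = np.linalg.norm(coords - coords[query_index], axis=1)
    # Sort by distance, then drop the query itself before taking the top k.
    order = np.argsort(distances)
    return [int(i) for i in order if i != query_index][:k]

# Hypothetical 3-D PCA coordinates: indices 0-2 form a tight group,
# indices 3-5 form a second group far away.
coords = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0],
    [0.0, 0.2, 0.0],
    [5.0, 5.0, 5.0],
    [5.1, 5.0, 4.9],
    [4.8, 5.2, 5.0],
])
print(nearest_neighbors(coords, query_index=0, k=2))  # [1, 2]
```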

Figure 1 shows the PCA representation of the Dustman, Shamoon and ZeroCleare samples (the blue dots) compared with samples from other malware families (the orange dots). The PCA view shows that the three malware variants cluster closely together and are clearly separated from the samples of the other malware families in the graph. This suggests that these malware variants are related and likely designed and used by the same actor.

This analysis illustrates how a deep learning model can learn a representation of malware samples that not only separates malicious samples from benign ones but also associates different variants of the same family, even when the new variants were not seen in the training data.
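
One simple way to associate an unseen variant with a known family is a nearest-centroid rule over embeddings. The sketch below shows that idea only; it is our own illustration, not a description of how Blue Hexagon’s models categorize threat families. The helper names, the 2-D toy embeddings, the family labels and the max_distance threshold are all hypothetical.

```python
from typing import Dict, List, Optional
import numpy as np

def family_centroids(embeddings: np.ndarray, labels: List[str]) -> Dict[str, np.ndarray]:
    """Average the embeddings of each known family into one centroid (hypothetical helper)."""
    label_array = np.array(labels)
    return {
        family: embeddings[label_array == family].mean(axis=0)
        for family in sorted(set(labels))
    }

def assign_family(embedding: np.ndarray, centroids: Dict[str, np.ndarray],
                  max_distance: float) -> Optional[str]:
    """Return the closest family, or None if no centroid is within max_distance."""
    best_family, best_distance = None, float("inf")
    for family, centroid in centroids.items():
        distance = float(np.linalg.norm(embedding - centroid))
        if distance < best_distance:
            best_family, best_distance = family, distance
    # Treat the sample as unrelated if even the closest family is too far away.
    return best_family if best_distance <= max_distance else None

# Hypothetical 2-D embeddings for two labelled families.
known = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
labels = ["wiper_family", "wiper_family", "other_family", "other_family"]
centroids = family_centroids(known, labels)

new_variant = np.array([0.3, -0.1])  # not part of the labelled set
print(assign_family(new_variant, centroids, max_distance=1.0))  # wiper_family
```

The distance threshold matters: without it, every sample would be forced into some known family, even a genuinely new one.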

Why is this important? 

Note that the PCA graph in Figure 1 is simply our way of showing you how the new malware samples are similar to or different from others. The Blue Hexagon deep learning models actually make this determination in less than a second. Instead of waiting for the results of sandbox analysis, or for a human threat analyst to execute and analyze the malware, the threat detection verdict and the categorization of the threat family are produced in real time. This lets security teams focus their efforts on the appropriate response and remediation, as well as on answering the question of “why” their organization is being targeted.

Given that the industry sees more than 350,000 new malware variants every single day, the ability to automate the detection and categorization of malware is immensely useful to security teams.

This is the first in a series of blog posts on threats in the Middle East.