Defined as the “ability for (computers) to learn without being explicitly programmed,” machine learning is huge news for the information security industry. It’s a technology that potentially can help security analysts with everything from malware and log analysis to possibly identifying and closing vulnerabilities earlier. Perhaps too, it could improve endpoint security, automate repetitive tasks, and even reduce the likelihood of attacks resulting in data exfiltration.
“With the fast-moving explosion of data and apps, there is really no other way to do security than through the use of automated systems built on AI to analyze the network traffic and user interactions.”
The problem is, hackers know this and are expected to build their own AI and machine learning tools to launch attacks.
How are cyber-criminals using machine learning?
Criminals - increasing organized and offering wide-ranging services on the dark web - are ultimately innovating faster than security defenses can keep up. This is concerning given the untapped potential of technologies like machine and deep learning.
“We must recognize that although technologies such as machine learning, deep learning, and AI will be cornerstones of tomorrow’s cyber defenses, our adversaries are working just as furiously to implement and innovate around them,” said Steve Grobman, chief technology officer at McAfee, in recent comments to the media. “As is so often the case in cybersecurity, human intelligence amplified by technology will be the winning factor in the arms race between attackers and defenders.”
This has naturally led to fears that this is AI vs AI, Terminator style. Nick Savvides, CTO at Symantec, says this is “the first year where we will see AI versus AI in a cybersecurity context,” with attackers more able to effectively explore compromised networks, and this clearly puts the onus on security vendors to build more automated and intelligent solutions.
“Autonomous response is the future of cybersecurity,” stressed Darktrace’s director of technology Dave Palmer in conversation with this writer late last year. “Algorithms that can take intelligent and targeted remedial action, slowing down or even stopping in-progress attacks, while still allowing normal business activity to continue as usual.”
Machine learning-based attacks in the wild may remain largely unheard of at this time, but some techniques are already being leveraged by criminal groups.
1. Increasingly evasive malware
Malware creation is largely a manual process for cyber criminals. They write scripts to make up computer viruses and trojans, and leverage rootkits, password scrapers and other tools to aid distribution and execution.
But what if they could speed up this process? Is there a way machine learning could be help create malware?
The first known example of using machine learning for malware creation was presented in 2017 in a paper entitled “Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN.” In the report, the authors revealed how they built a generative adversarial network (GAN) based algorithm to generate adversarial malware samples that, critically, were able to bypass machine-learning-based detection systems.
In another example, at the 2017 DEFCON conference, security company Endgame revealed how it created customized malware using Elon Musk's OpenAI framework to create malware that security engines were unable to detect. Endgame's research was based on taking binaries that appeared to be malicious, and by changing a few parts, that code would appear benign and trustworthy to the antivirus engines.
Other researchers, meanwhile, have predicted machine learning could ultimately be used to “modify code on the fly based on how and what has been detected in the lab,” an extension on polymorphic malware.
2. Smart botnets for scalable attacks
Fortinet believes that 2018 will be the year of self-learning ‘hivenets’ and ‘swarmbots’, in essence marking the belief that ‘intelligent’ IoT devices can be commanded to attack vulnerable systems at scale. “They will be capable of talking to each other and taking action based off of local intelligence that is shared,” said Derek Manky, global security strategist, Fortinet. “In addition, zombies will become smart, acting on commands without the botnet herder instructing them to do so. As a result, hivenets will be able to grow exponentially as swarms, widening their ability to simultaneously attack multiple victims and significantly impede mitigation and response.”
Interestingly, Manky says these attacks are not yet using swarm technology, which could enable these hivenets to self-learn from their past behavior. A subfield of AI, swarm technology is defined as the “collective behavior of decentralized, self-organized systems, natural or artificial” and is today already used in drones and fledgling robotics devices. (Editor’s note: Though futuristic fiction, some can draw conclusions from the criminal possibilities of swarm technology from Black Mirror’s Hated in The Nation, where thousands of automated bees are compromised for surveillance and physical attacks.)
3. Advanced spear phishing emails get smarter
One of the more obvious applications of adversarial machine learning is using algorithms like text-to-speech, speech recognition, and natural language processing (NLP) for smarter social engineering. After all, through recurring neural networks, you can already teach such software writing styles, so in theory phishing emails could become more sophisticated and believable.
In particular, machine learning could facilitate advanced spear phishing emails to be targeted at high-profile figures, while automating the process as a whole. Systems could be trained on genuine emails and learn to make something that looks and read convincing.
In McAfee Labs’ predictions for 2017, the firm said that criminals would increasingly look to use machine learning to analyze massive quantities of stolen records to identify potential victims and build contextually detailed emails that would very effectively target these individuals.
Furthermore, at Black Hat USA 2016, John Seymour and Philip Tully presented a paper titled “Weaponizing data science for social engineering: Automated E2E spear phishing on Twitter,” which presented a recurrent neural network learning to tweet phishing posts to target certain users. In the paper, the pair presented that the SNAP_R neural network, which was trained on spear phishing pentesting data, was dynamically seeded with topics taken from the timeline posts of target users (as well as the users they tweet or follow) to make the click-through more likely.
Subsequently, the system was remarkably effective. In tests involving 90 users, the framework delivered a success rate varying between 30 and 60 percent, a considerable improvement on manual spear phishing and bulk phishing results.
4. Threat intelligence goes haywire
Threat intelligence is arguably a mixed blessing when it comes to machine learning. On the one hand, it is universally accepted that, in an age of false positives, machine learning systems will help analysts to identify the real threats coming from multiple systems. “Applying machine learning delivers two significant gains in the domain of threat intelligence,” said Recorded Future CTO and co-founder Staffan Truvé in a recent whitepaper.
“First, the processing and structuring of such huge volumes of data, including analysis of the complex relationships within it, is a problem almost impossible to address with manpower alone. Augmenting the machine with a reasonably capable human, means you’re more effectively armed than ever to reveal and respond to emerging threats,” Truvé wrote. “The second is automation — taking all these tasks, which we as humans can perform without a problem, and using the technology to scale up to a much larger volume we could ever handle.”
However, there’s the belief, too, that criminals will adapt to simply overload those alerts once more. McAfee’s Grobman previously pointed to a technique known as “raising the noise floor.” A hacker will use this technique to bombard an environment in a way to generate a lot of false positives to common machine learning models. Once a target recalibrates its system to filter out the false alarms, the attacker can launch a real attack that can get by the machine learning system.
5. Unauthorized access
An early example of machine learning for security attacks was published back in 2012, by researchers Claudia Cruz, Fernando Uceda, and Leobardo Reyes. They used support vector machines (SVM) to break a system running on reCAPTCHA images with an accuracy of 82 percent. All captcha mechanisms were subsequently improved, only for the researchers to use deep learning to break the CAPTCHA once more. In 2016, an article was published that detailed how to break simple-captcha with 92 percent accuracy using deep learning.
Separately, the “I am Robot” research at last year’s BlackHat revealed how researchers broke the latest semantic image CAPTCHA and compared various machine learning algorithms. The paper promised a 98 percent accuracy on breaking Google’s reCAPTCHA.
6. Poisoning the machine learning engine
A far simpler, yet effective, technique is that the machine learning engine used to detect malware could be poisoned, rendering it ineffective, much like criminals have done with antivirus engines in the past. It sounds simple enough; the machine learning model learns from input data, if that data pool is poisoned, then the output is also poisoned. Researchers from New York University demonstrated how convolutional neural networks (CNNs) could be backdoored to produce these false (but controlled) results through CNNs like Google, Microsoft, and AWS.