Machine learning: A new cyber security weapon, for good and ill

Machine learning may be able to boost information security, but it can also be employed by cyber criminals

There’s a new weapon in the never-ending battle against cyber crime: Machine learning. It’s generating a great deal of interest, getting substantial backing from venture capitalists and even being billed as a must-have addition to the cyber-security arsenal.

The idea of machine learning has been around for many years, but today’s implementations differ radically from earlier technologies. In simple terms, machine learning software has no programmed-in knowledge about the domain to which it is to be applied but gains that knowledge by being taught.

For example, if you want to develop a machine learning-based optical character recognition system, you feed it many images of letters of the alphabet and tell it “this is an ‘A’, this is a ‘B’” and so on. Given sufficient correctly identified data, it will get very good at optical character recognition.
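The teaching process described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the 3x3 "images", the three letters and the nearest-average method are hypothetical and far simpler than any real optical character recognition system. The point is only that the program contains no built-in description of any letter; it works from labelled examples.

```python
from collections import defaultdict

# Hypothetical 3x3 black-and-white "images", flattened row by row (1 = ink).
# Each one is paired with the label a human teacher has given it.
TRAINING = [
    ((1, 1, 1, 1, 0, 1, 1, 1, 1), "O"),
    ((1, 1, 1, 1, 0, 1, 1, 1, 0), "O"),  # slightly damaged O
    ((1, 0, 0, 1, 0, 0, 1, 1, 1), "L"),
    ((1, 0, 0, 1, 0, 0, 1, 1, 0), "L"),  # slightly damaged L
    ((1, 1, 1, 0, 1, 0, 0, 1, 0), "T"),
    ((1, 1, 1, 0, 1, 0, 0, 0, 0), "T"),  # slightly damaged T
]


def train(examples):
    """Average the pixels of all examples that share a label."""
    sums = {}
    counts = defaultdict(int)
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
        for i, pixel in enumerate(pixels):
            sums[label][i] += pixel
        counts[label] += 1
    return {
        label: [value / counts[label] for value in total]
        for label, total in sums.items()
    }


def classify(model, pixels):
    """Return the label whose averaged image is closest to `pixels`."""
    def distance(average):
        return sum((a - p) ** 2 for a, p in zip(average, pixels))

    return min(model, key=lambda label: distance(model[label]))


model = train(TRAINING)

# An O with a missing corner that never appeared in the training data.
unseen = (0, 1, 1, 1, 0, 1, 1, 1, 1)
print(classify(model, unseen))
```

With more labelled examples the averages become more representative, which is the sense in which such a system improves as it is fed more correctly identified data.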

This is the concept underpinning recent cyber security startup Cylance, as Asia Pacific regional director, Andy Solterbeck, explained to Computerworld in April. He said Cylance had developed its malware detection algorithm by examining every possible file type it could get its hands on and applying machine learning techniques.

“Each file has a few thousand useful attributes and each of those attributes has a bunch of different settings: that adds up to millions of different combinations. What we have done is develop an algorithm that can look at those attributes and say if a file is good or bad.”
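Cylance has not published its algorithm, so the following is only a generic sketch of the idea of judging a file from its attributes. The four attribute names, the six labelled files and the choice of a naive Bayes classifier are all hypothetical; a real product would work with thousands of attributes and vastly more training data.

```python
import math

# Hypothetical yes/no attributes that could be extracted from a file.
ATTRIBUTES = ["packed", "unsigned", "writes_autorun", "has_version_info"]

# Hypothetical training set: the attributes present in each file, plus a
# label supplied by an analyst.
LABELLED_FILES = [
    ({"packed", "unsigned", "writes_autorun"}, "bad"),
    ({"packed", "unsigned"}, "bad"),
    ({"unsigned", "writes_autorun"}, "bad"),
    ({"has_version_info"}, "good"),
    ({"has_version_info", "unsigned"}, "good"),
    ({"has_version_info", "packed"}, "good"),
]


def train(files):
    """Estimate how often each attribute appears in good and in bad files."""
    model = {}
    for label in {lab for _, lab in files}:
        members = [attrs for attrs, lab in files if lab == label]
        model[label] = {
            "prior": len(members) / len(files),
            # Add-one smoothing so an attribute never seen in a class
            # does not get a probability of exactly zero.
            "p": {
                a: (sum(a in m for m in members) + 1) / (len(members) + 2)
                for a in ATTRIBUTES
            },
        }
    return model


def log_score(model, label, attrs):
    """Log-probability of the label given which attributes are present."""
    entry = model[label]
    total = math.log(entry["prior"])
    for a in ATTRIBUTES:
        p = entry["p"][a]
        total += math.log(p if a in attrs else 1 - p)
    return total


def verdict(model, attrs):
    """Return the more probable label for a file with these attributes."""
    return max(model, key=lambda label: log_score(model, label, attrs))


model = train(LABELLED_FILES)

# Two attribute combinations that do not appear in the training set.
print(verdict(model, {"packed", "writes_autorun"}))
print(verdict(model, {"has_version_info", "unsigned", "packed"}))
```

The verdict is a probability-based guess rather than a lookup against known samples, which is why this style of detection can rate files it has never seen and also why it can sometimes be wrong.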

This algorithm is now the core of Cylance’s endpoint security product. The company claims a 99 per cent success rate for malware detection with very few false positives, and investors love it. Cylance was founded in 2012 and in June this year closed its fourth round of funding, a US$100m round that valued the company at just under US$1 billion.

Darktrace: inspired by immunology

UK-based Darktrace also claims to have been founded to exploit machine learning. It’s a year younger than Cylance and has achieved about half the value: It closed a US$65m funding round in July 2016 that reportedly valued the company at about US$400m.

Darktrace claims its self-learning approach has been “inspired by the biological principles of the human immune system, identifying never-seen-before anomalies in real time, including insider threats and sophisticated attackers - without using rules, signatures or assumptions.”
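Darktrace does not detail its methods here, so the following is only a generic sketch of anomaly detection without rules or signatures: learn what normal looks like, then flag large departures from it. The upload figures, the single measured quantity and the three-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(samples):
    """Summarise normal behaviour as a mean and standard deviation."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from normal."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical megabytes uploaded per hour by one workstation in normal use.
normal_uploads_mb = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
baseline = learn_baseline(normal_uploads_mb)

print(is_anomalous(14, baseline))   # a typical hour
print(is_anomalous(480, baseline))  # a sudden large upload
```

Nothing in the sketch describes what an attack looks like; it only knows what is usual for this one device, so anything sufficiently unusual is flagged, whether it turns out to be malicious or not.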

Modesty is not the company’s strong point. It claims to be “the only technology capable of detecting and responding to emerging cyber-threats, from within the network,” and that its self-learning software has been “recognised as the de facto standard for defending organisations of all sizes from constantly-evolving threats.”

Darktrace announced Telstra as a customer in February, saying that the telco had decided to deploy the Darktrace Enterprise Immune System across its enterprise network “because of its unique capability to spot emerging abnormal behaviours in real time within the organisation.”

Established security vendor Symantec has also embraced machine learning to boost the power of its endpoint protection offering.

The latest version (14) of Symantec Endpoint Protection was announced in November and billed as “the industry’s first solution to fuse essential endpoint technologies with advanced machine learning and memory exploit mitigation in a single agent.”

The company says its machine learning technology is underpinned by its established Global Intelligence Network and “Provides a depth of knowledge no one else can claim.”

According to Nick Savvides, manager of Symantec cyber security strategy for Asia Pacific and Japan, the company has taken technology that was already being used in its cloud services and incorporated it into the latest version of the endpoint.

Symantec ML technology learns on the job

“We have built machine learning that continuously relearns and retrains itself on your host,” Savvides said. “It runs pre-execution to examine a file and say ‘that does not smell right. I have seen this type of stuff before. This file has these IPs that are bad. It has some certificates that I know were compromised’.”

Savvides claims Symantec has the edge on the competition because of the amount of data available to train its algorithms: “We have the largest amount of telemetry and security information and we have trained the machine learning algorithms based on those and put them on the endpoints,” he said. “What this means is that we can go signatureless. Our software runs the files through its own algorithms and goes ‘I think this is probably bad’ and blocks it.”

Machine learning is only one of the protection techniques deployed in the Symantec offering: The company has not ditched any legacy functions in the long-established product.

Its endpoint machine learning software continues to learn ‘on the job’. “It learns on the run but what it learns it sends back to us and we benefit from that and incorporate it into future versions,” Savvides said.

However he made no claims that machine learning was a new magic bullet for security. “Machine learning is about predicting that something will be bad, but not all predictions are correct. I am not going to tell you that it will pick up everything. That would be fantasy.”

Buzzword, not magic bullet

Another cyber security startup exploiting machine learning is CrowdStrike. Machine learning is one of several technologies employed in its Falcon Host endpoint protection offering. The company claims it uses “Sophisticated machine learning and behavioural analytics [that] go beyond signatures to identify anomalies and distinguish malicious activity from legitimate actions with unprecedented scale and precision.”

Michael Sentonas, vice president of technology strategy for CrowdStrike Asia Pacific, told Computerworld that machine learning was only one component of the company’s security offering and he believes its usefulness has been exaggerated.

“Machine learning has become something of a buzzword, heading down the path of being commoditised,” he said. “Everybody is claiming it as a differentiator in their marketing blurb. The people who focus on it tend to overhype it and those without it play it down.

“For us it is not the be-all-and-end-all. It is extremely powerful and offers significant advantages but it is not a silver bullet that will solve every security problem. We use it as part of our detection and protection techniques, but we also like to look at the effects of what an attacker is trying to accomplish.”

He claimed CrowdStrike to be unique, and well suited to using machine learning. “We combine the next generation endpoint/AV component with managed hunting to deliver as a cloud based service. We are indexing over 26 billion events every 24 hours and that is growing by two billion every four weeks. For machine learning that data set is critical and with it becoming so large it gives us a significant advantage.”

Jack Chan, an NZ-based network and security strategist with Fortinet, agrees that the use of machine learning in cyber security is somewhat overblown, saying: “It’s a bit of a buzzword.”

Fortinet does not employ machine learning directly in customer products, but Chan says it is now essential to support the long-established practice of identifying new viruses and writing signatures. He said machine learning was used in Fortinet’s patented Content Pattern Recognition Language, which identifies new threats.

“On a good day we might see 200k variants of viruses. There is no way we can hire enough analysts to write signatures so we are using machine learning to generate signatures,” he told Computerworld.

Technology, however, is just as easily deployed for ill as for good, and Patrick Hubbard, director of technical product marketing at SolarWinds (who prefers to be known as ‘Head Geek’), sees it being called into service by cyber criminals.

“The economy of scale that is offered by cloud, the economy of scale offered by automation means that sooner or later it will be easier to have a really talented machine attack a network rather than a large team of people,” he told Computerworld.

Machine learning making malware

“If you have a team of humans they can only try a certain number of attacks in any given time, and they can only be engaged for a limited amount of time,” Hubbard said. “If it is a machine it does not get tired and it tries things that would be drudgery for humans.

“This means it is far more likely to find something. It is much more patient and it gets smarter the more it learns. It will become more valuable the more attacks it makes.”

He suggested that the workload for such a project could run undetected on public cloud services, or be distributed across multiple compromised devices.

He was doubtful that any such attack had yet been launched but said it would be very different from a manually driven attack. “It will look novel, unlike anything a human would try. It will be a combination of attacks, a lot of things simultaneously that will be difficult to correlate. That will become the signature of a machine attack, a lot of different things that don’t make any sense.”

While the type of attack envisaged by Hubbard might be yet to appear, Fortinet’s Chan says cyber criminals are already using machine learning to generate multiple versions of malware.

“They can use machine learning to generate unique malware that a signature-based detection system cannot detect,” he said. “In some of our daily stats we can see a couple of hundred examples of malware that traditional antivirus would not pick up, but that could be picked up by sandboxing, which looks at the behaviour of the malware.

“We also see a lot of the phishing and drive-by attacks with very long URLs that you have never seen before.