Using AI to spot malware patterns

Traditional antivirus can't keep pace with today's threats. Here's how one start-up is using machine learning to fight a better fight.

Protecting an organization's data, people and applications has never been more critical or more difficult. The number of entry points and connected endpoints has skyrocketed, and the bad guys keep getting smarter. Clearly, something has to change.

One of the more innovative security start-ups is Cylance, which is using artificial intelligence (AI) to change the security game. CSO recently interviewed Cylance founder and chief scientist Ryan Permeh to better understand how AI can impact security.

You came from McAfee, and Cylance is disrupting the legacy antivirus (AV) model. What were the challenges you saw at your previous company, and what motivated you to start Cylance?

Ryan Permeh, chief scientist at Cylance

AV was a great technology decades ago, but the threat actors have evolved and now it doesn’t catch enough. Today there are more threats coming in from more places, and AV solutions miss many things. The technology is built to be reactive in nature, as it can only catch things that are known. The way to break AV is to up the number and variety of viruses, and the systems fall exponentially further behind because the bad guys have a much lower cost of operations than the good guys. Also, the timelines for delivering new AV updates are typically days, far too long for today’s businesses. It became apparent that the game needs to change, which is why we started Cylance.

How does Cylance solve the problem differently?

We use machine learning to fight a better fight.  Humans are inherently inefficient, as they have no ability to connect seemingly unrelated dots.  For example, insurance companies can predict the future correctly by asking 20 questions and analyzing the data using machine learning.  It’s a similar concept just with much more data.  Historically, an AV researcher might see 10,000 viruses in a career.  Today there are over 700,000 per day.  Machine learning is the only way to combat today’s attackers.

Why hasn’t this been done before? 

We actually couldn’t have done this without some of the recent advancements in the cloud, machine learning and GPU technology.  The models we created would have taken months to run on traditional servers.  We took the millions of data points and used them to create our models.  This process took two weeks in Amazon using 40,000 cores.  There is no way this could have been done pre-cloud but it can be done now.

How does machine learning help find threats?  

The dirty little secret of security is that the bad guys have a hard time being original.  Most just take existing malware and shift something a little bit to evade the current solutions.  Each type of malware leaves behind a signature, or fingerprint, so if one can collect the data and analyze it, the causal good and causal bad patterns can be found.  Our models run on commodity hardware and can identify malicious programs with an accuracy of 99.7 percent.
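To make the point about small modifications concrete, here is a minimal sketch in Python. It is an illustration only, not Cylance's method: the sample bytes, the one-byte change and the byte-histogram feature are all hypothetical stand-ins. It shows that an exact-hash lookup misses a slightly altered file, while a simple statistical feature of the file barely moves.

```python
import hashlib
import math
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Exact fingerprint: changes completely if any byte changes."""
    return hashlib.sha256(data).hexdigest()


def byte_histogram(data: bytes) -> list:
    """Relative frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(value, 0) / total for value in range(256)]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "known bad" sample and a variant with one byte appended.
original = b"MZ" + b"hypothetical payload bytes " * 50
variant = original + b"\x00"

# A hash blocklist only knows the original sample.
known_bad_hashes = {sha256_hex(original)}
exact_match = sha256_hex(variant) in known_bad_hashes

# The statistical feature is almost unchanged by the small edit.
similarity = cosine_similarity(byte_histogram(original), byte_histogram(variant))

print("exact hash match:", exact_match)
print("byte-histogram similarity:", round(similarity, 4))
```

A real model would combine many such features rather than rely on a single similarity score; the sketch only shows why features can survive edits that defeat exact matching.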

Can you give some examples of “causal bad” patterns?

Our model is actually a neural network of over 7 million features that has identified several characteristics of the causal bad. Because of the popularity of Microsoft, bad guys tend to like to make things look like Office documents, so 60 percent of the causal bad can be found by looking at file sizes, what is in the file, etc. Also, generally speaking, files over 2.5 MB in size are good. This may seem basic, but it’s the majority of malware. It’s important to note that there are many factors that need to be analyzed and there isn’t a simple “this is bad” calculation that can be done, hence the need for machine learning.

Thirty percent of bad programs can be found through measures of other types of entropy, such as the distribution of letters in a string. These are different but very predictable. The last 10 percent are third-order functions that can be found by understanding the fingerprints left behind by other types of malware. This is where humans cannot keep up, so let the machines do what they do best. To a machine, the patterns that are left behind look obvious even though to a human they are impossible to find. It’s the final 10 percent that almost all AV systems miss but where machine learning shines.
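One common way to measure the entropy of the letters in a string is Shannon entropy. The Python sketch below is an assumption about what such a feature could look like, not a description of Cylance's model; the function name and the sample strings are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the characters in text, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    entropy = 0.0
    for count in Counter(text).values():
        p = count / total
        # Each distinct character contributes p * log2(1 / p) bits.
        entropy += p * math.log2(1 / p)
    return entropy


# Hypothetical strings: a readable name versus a random-looking one.
readable = "InstallUpdateService"
random_looking = "xQ7zP2kV9mR4tB8wL1cN"

print(round(shannon_entropy(readable), 3))
print(round(shannon_entropy(random_looking), 3))
```

A string made of one repeated character scores 0 bits, and a string where every character is different scores the maximum for its length. On its own the score proves nothing about a file; it would be one input among many to a model.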

Can you give an example of where the machine learning aspect has made a difference?

The most recent WannaCry ransomware was a great example.  It was new and almost all the traditional AV solutions missed it initially.  However, the code that loaded was predictably bad so a machine learning algorithm caught it right away.  The AV vendors eventually had a solution for it, but for many organizations, it was too late. 

The best way to think about the value of machine learning is that it works really well on zero-day threats. Now that these are coming faster and faster, it’s time for businesses to completely rethink their approach to security. Instead of catching 60-70 percent of the threats and only about 10-20 percent of the new ones, we can now catch about 99 percent of them.

Final note

It’s my personal belief that we are on the verge of machine learning changing the world and there is some tremendous upside in security.  If the hackers are using automated systems and AI, then how can humans work fast enough to secure their organizations?  The fact is, they can’t.  Cylance is an excellent example of a company that is leveraging the advancements in technology to change an industry. 
