The security function needs SMART metrics

Our manager has learned to love metrics and finds them useful as the new security operations center gets off the ground.

I've become a big fan of metrics. I wasn't always, but throughout my career in information security, I've had bosses who have challenged me on metrics, and I have honed my skills so that now I feel the metrics I collect meet the "SMART" test: specific, meaningful, actionable, repeatable and time-dependent.

For example, once a quarter I report on the patch and antivirus compliance of our DMZ and production infrastructure. I also report on the number of unmanaged resources that are discovered. These metrics have never been challenged: they are very specific, they measure risk (meaningful), there are clear actions to take (actionable), the method for collecting and reporting them each quarter is well defined (repeatable), and they measure risk over time (time-dependent).

Some metrics I report only once or twice a year. For example, I like to report on the security budget spent per employee, security head count as a percentage of IT head count, and the security budget as a percentage of the IT budget. I then compare those numbers against my peers and against industry analyst benchmarks (such as Gartner's). I don't have many metrics, but the ones I have tell a good story and represent the security health and risk of the enterprise.
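Those annual benchmarks boil down to three simple ratios. The sketch below is purely illustrative; every figure is a hypothetical placeholder, used only to show the arithmetic.

```python
# Hypothetical figures for illustration only -- not the actual numbers from this column.
security_budget = 1_200_000   # annual security spend (USD), assumed
it_budget = 18_000_000        # annual IT budget (USD), assumed
employees = 4_000             # total employee count, assumed
security_staff = 6            # security head count, assumed
it_staff = 150                # IT head count, assumed

print(f"Security spend per employee: ${security_budget / employees:,.0f}")
print(f"Security head count as % of IT: {100 * security_staff / it_staff:.1f}%")
print(f"Security budget as % of IT budget: {100 * security_budget / it_budget:.1f}%")
```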

Now that our security operations center is up and running, I wanted to create some additional, and of course meaningful, metrics that measure the effectiveness of that function. As I mentioned in my previous installment, one of the problems we're currently having with our outsourced operations center is the high level of false positives related to malware incidents. To avoid wasting response effort on false positives, and to keep friction from building between the security department and the help desk, I have directed the Level 1 analysts to forward malware incidents to a Level 3 analyst for verification. If the Level 3 analyst determines that a malware event is a false positive, he annotates it and trains the Level 1 analyst accordingly. Until the number of false positives is driven down to a manageable level, only Level 3 analysts can open a malware ticket to have an incident acted on. I mention all of this because it's one of the items that I am now measuring (more on that below).
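That interim triage rule is simple enough to write down. Here is a minimal Python sketch of it; the function and tier names are my own shorthand, not anything taken from our ticketing system.

```python
from enum import Enum

class Tier(Enum):
    L1 = 1
    L3 = 3

def route_malware_event(analyst_tier: Tier, confirmed_malware: bool) -> str:
    """Interim rule: only a Level 3 analyst may open a malware ticket."""
    if analyst_tier is Tier.L1:
        # Level 1 never opens malware tickets directly for now.
        return "forward to Level 3 for verification"
    if not confirmed_malware:
        # False positive: annotate it and coach the Level 1 analyst.
        return "mark as false positive and train the Level 1 analyst"
    return "open a malware ticket with the help desk"
```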

Our company's help desk ticketing system isn't sophisticated enough to track the details I want to collect about security incidents. Therefore, I use a Microsoft SharePoint list to capture the various elements of a security incident. For example, I have our analysts note how the incident was detected, whether by our SIEM, our endpoint protection agent, an employee, a customer, law enforcement or another third-party agency. If the incident is related to malware (as most of our incidents are), I capture the department the user belongs to. If the incident turns out to be a false positive, I have the analysts mark it as such. I capture whether the incident was the result of a phishing attack or the installation of an application. I capture whether the malware is categorized as a rootkit, Trojan, info-stealer, browser helper object (BHO) or some other potentially unwanted program (PUP). I also track when the incident was detected and when the analyst actually started working on it. I capture these, as well as many other aspects of an incident, so that over time I can track and trend things like which departments tend to be the source of malware infections, and concentrate awareness training there.
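To give a feel for what each record looks like, here is a rough Python sketch of the same fields. The field names and category values are illustrative shorthand for what's described above, not the actual SharePoint column names.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class DetectionSource(Enum):
    SIEM = "SIEM"
    ENDPOINT_AGENT = "Endpoint protection agent"
    EMPLOYEE = "Employee"
    CUSTOMER = "Customer"
    LAW_ENFORCEMENT = "Law enforcement"
    THIRD_PARTY = "Other third-party agency"

class MalwareCategory(Enum):
    ROOTKIT = "Rootkit"
    TROJAN = "Trojan"
    INFO_STEALER = "Info-stealer"
    BHO = "Browser helper object"
    PUP = "Other PUP"

@dataclass
class SecurityIncident:
    detected_at: datetime                    # when the incident was detected
    work_started_at: Optional[datetime]      # when an analyst began working on it
    detection_source: DetectionSource
    department: Optional[str] = None         # captured for malware incidents
    false_positive: bool = False
    vector: Optional[str] = None             # e.g. "phishing" or "application install"
    malware_category: Optional[MalwareCategory] = None
```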

I track the sources of incidents so that if we see an increase in malware incidents resulting from phishing attacks, we can revisit our phishing defense strategy. I also track the time between when an incident actually occurs and when an analyst starts working on it. This is key to knowing how long it takes us to respond to an incident; if it's taking too long, that poses a serious risk to the organization.
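Using the incident record sketched above, that response-time metric is just the average gap between detection and the start of analyst work. A minimal, hypothetical helper:

```python
from datetime import datetime, timedelta

def mean_time_to_respond(incidents):
    """Average gap between detection and the start of analyst work."""
    gaps = [
        inc.work_started_at - inc.detected_at
        for inc in incidents
        if inc.work_started_at is not None and not inc.false_positive
    ]
    return sum(gaps, timedelta()) / len(gaps) if gaps else None

# Example: a single incident worked 45 minutes after detection.
inc = SecurityIncident(
    detected_at=datetime(2015, 6, 1, 9, 0),
    work_started_at=datetime(2015, 6, 1, 9, 45),
    detection_source=DetectionSource.SIEM,
)
print(mean_time_to_respond([inc]))  # 0:45:00
```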

I created the SharePoint list so that data entry is simple and doesn't take more than a few minutes per incident. I use lots of pull-down menus and checkboxes, and I try to minimize the number of free-form text boxes. What's nice about SharePoint lists is that I can easily export them to an Excel spreadsheet and create pivot charts that represent the various metrics I use to measure both the effectiveness of the security operations function and various trends within the enterprise. Once I've created the pivot charts, I simply update the table to pull in the most current data. And once I have the pivot charts looking pretty, I copy and paste them into a PowerPoint presentation or make them available on our CIO dashboard.
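The same pivoting could be reproduced outside Excel. The sketch below assumes the SharePoint list has been exported to a CSV file (the file name and column names are illustrative, loosely matching the fields described above) and uses pandas to count malware incidents by department and month, the kind of trend that drives where awareness training gets concentrated.

```python
import pandas as pd

# Assumed export of the SharePoint list; file and column names are illustrative.
incidents = pd.read_csv("incident_export.csv", parse_dates=["detected_at"])

# Count confirmed (non-false-positive) incidents by department and month.
pivot = (
    incidents[~incidents["false_positive"]]
    .assign(month=lambda df: df["detected_at"].dt.to_period("M"))
    .pivot_table(index="department", columns="month",
                 values="incident_id", aggfunc="count", fill_value=0)
)
print(pivot)
```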

This week's journal is written by a real security manager, "Mathias Thurman," whose name and employer have been disguised for obvious reasons. Contact him at mathias_thurman@yahoo.com.
