How machine data helps Shazam remain 'app store royalty'

Despite a billion downloads, the company has struggled to gain insight into its data. Until now.

Shazam is one of the most popular mobile apps in the world. The company earlier this month announced it had reached a billion downloads – half of which came in the last two years – and, for the first time, turned a profit.

Having achieved the holy grail of its brand name being used as a verb, in recent years Shazam has expanded what's Shazamable beyond recorded music. In Australia, consumers can scan KFC buckets and have the app recognise television adverts and live performances to receive targeted marketing and more. The app even has its own music chart, launched in August, that runs every Sunday afternoon on Nova.

Shazam is, as the company's senior infrastructure engineer Chris Kammermann puts it, "app store royalty", but it has to work hard to maintain its reign.

"People throw away apps all the time," the Australian told Computerworld at Splunk .conf 16 in Orlando in September. "If it's not in your top ten, it's gone."

"We have that app real estate on your phone," Kammermann added. "Now we've got to leverage that so we can get way beyond music."

Dollars in the data

A billion downloads generate a lot of data, and the company had been struggling to gain a timely view of it.

Every tap made within the Shazam app generates a beacon log file which is sent to cloud servers. In an effort to unlock the insight in this data, and drive better updates, the company turned to machine data search and analysis platform Splunk.

"The world moves so quickly. If we change something on the app we want to know the effect it's having now, not two days from now," Kammermann says. "If you're trying to run a full table scan on a traditional SQL database it's going to take forever.

"Now you're able to get what users are clicking on, how long they're spending on pages, if they're clicking on YouTube links, what the top ten songs are," Kammermann adds.

"For 10 per cent of users we'd change a feature here, for 90 per cent we'd change a feature there and compare the results. You'd think that's what Shazam would have been doing immediately, but it was just too hard to do it on the old system."
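The article does not describe how Shazam assigns users to each side of such a test. As a minimal sketch of one common approach, assuming each user has a stable identifier, a hash of that identifier can place roughly 10 per cent of users in a treatment group and the rest in a control group. The function names, the experiment label and the 10 per cent default are illustrative assumptions, not Shazam's implementation.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.10) -> str:
    """Deterministically place a user in 'treatment' or 'control'.

    Hypothetical helper: hashing the experiment name together with the
    user id means the same user always lands in the same group for a
    given experiment, without storing any assignment table.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


def split_counts(user_ids, experiment: str, treatment_share: float = 0.10) -> dict:
    """Count how many of the given users fall into each group."""
    counts = {"treatment": 0, "control": 0}
    for uid in user_ids:
        counts[assign_variant(uid, experiment, treatment_share)] += 1
    return counts
```

Because the assignment depends only on the identifier and the experiment label, beacon events arriving later can be attributed to the right group by recomputing the hash.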

And as the company focuses its efforts on advertising revenue, and its offering to brands, data insight has become more important than ever. The company had struggled to analyse customer behaviour and put together reports for advertisers to show demographic breakdowns of the users Shazaming their products.

"We wanted to sell that," Kammermann says, "and we just couldn't do it. It just took too long to do anything."

Chris Kammermann, senior infrastructure engineer at Shazam

Using Splunk to analyse the hundreds of gigabytes of log files generated daily, Shazam was able to produce accurate campaign reports, reduce app faults and make ad hoc queries such as 'the most popular song in Sydney today'.
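An ad hoc query like "the most popular song in Sydney today" amounts to filtering events by place and date, then counting by track. The sketch below shows that shape in plain Python rather than in Splunk's own search language. The event fields (`city`, `day`, `track`) are assumed for illustration; the article does not describe the format of Shazam's beacon logs.

```python
from collections import Counter


def top_songs(events, city: str, day: str, n: int = 10):
    """Return the n most-tagged tracks for one city on one day.

    `events` is an iterable of dicts with hypothetical keys
    'city', 'day' (an ISO date string) and 'track'.
    """
    counts = Counter(
        e["track"] for e in events if e["city"] == city and e["day"] == day
    )
    return counts.most_common(n)
```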

"We know what songs are selling fast, which band is trending in which location," says Kammermann. "Then we engage with the record label and say: 'Your band is doing well in outback Australia, you should send them there'."

Splunk and the data stored in it run on 600 out-of-warranty servers from "a previous incarnation of Shazam", with historic data stored on Amazon Redshift. "Old servers break more," says Kammermann, "but in theory if a node fails I can just click a button to reprovision and reconfigure it."

Hack the charts, and predict them

Shazam was also able to catch artificially inflated tag counts – a good indicator someone was trying to rig the charts.

"If you are featured in the Shazam charts, you can enhance your career," says Kammermann. "People do try to hack the charts. We find some script kiddie has got the app running. They play a song over and over at home and continuously press the tag button. We can detect that now."
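Kammermann does not say how the detection works. One simple way to spot the pattern he describes, a single device tagging the same song over and over, is to count tags per device and track inside a sliding time window. The sketch below assumes each event is a `(device_id, track_id, unix_timestamp)` tuple; the limit of 20 tags per hour is an arbitrary illustrative threshold, not a figure from Shazam.

```python
from collections import defaultdict, deque


def find_suspicious_taggers(events, max_tags: int = 20, window_seconds: int = 3600):
    """Flag (device_id, track_id) pairs that exceed max_tags within any window.

    `events` is an iterable of (device_id, track_id, unix_timestamp) tuples
    (a hypothetical event shape). Returns a set of flagged pairs.
    """
    recent = defaultdict(deque)  # (device, track) -> timestamps still in the window
    flagged = set()
    for device_id, track_id, ts in sorted(events, key=lambda e: e[2]):
        key = (device_id, track_id)
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_tags:
            flagged.add(key)
    return flagged
```

A real system would also need to cope with scripts spread across many devices, which a per-device count alone would miss.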

Kammermann, who grew up on a farm in outback South Australia, joined Shazam two and a half years ago. He is now expanding the use of machine data as a DevOps aid, adding Git, Jira, Jenkins, Puppet, virtualisation and container logs into Splunk.

His team are starting to explore the potential of machine learning, trying to predict if an app feature release or advertising campaign will cause the tagging rate to increase and by how much. Anomaly detection will be a useful tool when realised, says Kammermann.

"We've had events like, for a small period of time, a country of 30,000 people was in our top ten Shazam list because the app incorrectly recognised the country. But we don't have alarms and thresholds for that, we don't have anything that can predict when things are going to break or that something weird has happened. That's the next focus."
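The kind of alarm Kammermann describes could start as a basic threshold check: compare the latest count for a country against its own recent history and raise a flag when it sits far outside the usual range. The following is a minimal sketch using a z-score, with an assumed threshold of three standard deviations. It illustrates the general idea only and is not the approach Shazam is building.

```python
from statistics import mean, stdev


def is_anomalous(history, latest: float, z_threshold: float = 3.0) -> bool:
    """Return True if `latest` is far from the mean of `history`.

    `history` is a list of past counts for one series, for example daily
    tags from one country (a hypothetical input). With fewer than two
    past values there is no baseline, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold
```

For a small country that normally produces about a hundred tags a day, a sudden jump into the thousands would trip this check immediately.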

There's also the question of whether machine learning can predict the next number one chart hit. The company believes it can already determine, 33 days in advance, what song will top the US Billboard chart with a Hadoop-based model. Now Kammermann is hoping to improve on that with machine data and Splunk.

"Currently I've got a prototype," he says. "And I think mine is better."

The author travelled to Splunk .conf 16 as a guest of Splunk.
