OpenStack helps deliver data insights at CBA

Open source pays off for bank’s Analytics and Information team

OpenStack, together with a substantial array of other open source projects, is aiding the efforts of the Commonwealth Bank’s data scientists to build a culture of experimentation at CBA.

Quinton Anderson, head of systems engineering for the bank’s Analytics and Information team, yesterday told the OpenStack Summit in Sydney that open source is helping crunch vast amounts of data, with insights fed into the bank’s product systems to personalise engagement with CBA’s customers.

He said that data analysis is also helping deliver insights directly to the bank’s employees, “allowing our internal staff to know what’s going on, not only with our business but with our customers’ business and within our customers’ lives, and making decisions in business operations.”

“When you think about what it takes to not only personalise but fully target and really understand what your customers need, it’s very much a data and experimentation exercise,” Anderson said.

“We started our data journey four to five years ago, really trying to mature the way that we approach both management of data and machine learning for that purpose.”

The bank has some “fairly large Hadoop clusters” as well as “some fairly traditional data warehousing technology”. Initially the Hadoop clusters were running on hardware in CBA’s data centres, with Cloudera installed directly onto physical servers.

“That worked in the beginning but as you can imagine comes with a bunch of inherent restraints which we needed to work through,” Anderson added. “The primary ones are obviously unit cost, cost of change every time we have to do large upgrades etc, but also the developer experience was not amazing.”

The Analytics and Information team embarked on a journey to “cloud-enable” some of the technologies it was employing, speeding up the process of experimentation as well as making it “a lot cheaper and lot more deterministic.”

“The technology stack that we landed on is not all that unique,” Anderson told the summit.

At the bottom is OpenStack’s Ironic service for provisioning bare metal servers. “The reason for Ironic for the initial set of use cases is big data processing [with] very stateful services — you want to be close to disk,” Anderson said. “Also increasingly as we move into microservices, the VM abstraction is in a lot of ways uncomfortable if you want to drive immutability.”

On top of Ironic, CBA relies on a range of other open source software, including Ubuntu, Docker for containers, Kubernetes for container orchestration, Calico for networking, Vault for secrets management, and Mesos and Marathon for resource orchestration. CBA data scientists’ toolkits include R, Apache Spark and Hadoop, Anderson said.

Anderson said it’s a “fairly typical” stack but added: “What we feel like we’ve done fairly uniquely is we focused on continuous delivery at all layers of the stack. We also focused on versioning through codification and immutability at all layers of the stack.”

“So for us, an upgrade of an operating system or even changing the permissions on a folder inside an operating system in an underlying host is not a matter of SSHing into a box but rather updating a declarative form of the environment and issuing a pull request, and allowing the tooling to make all the changes on your behalf,” Anderson said.

“This is true for operating system, it’s true for containers, it’s true for container clusters, container orchestrators, monitoring, alerting – absolutely everything. We use the same basic interaction pattern, which is versioning through codification, automation through codification and continuous delivery applying the changes off the back of that.”
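
The pattern Anderson describes, where a change is made by editing a declared description of the environment and letting tooling apply the difference, can be illustrated with a minimal Python sketch. It is not CBA's tooling: the folder paths, permission values and function names below are hypothetical, and the "host" is just a dictionary, so that only the interaction pattern is shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    path: str
    current_mode: Optional[str]  # None means the folder is not present yet
    desired_mode: str


def plan_changes(desired: Dict[str, str], actual: Dict[str, str]) -> List[Change]:
    """Compare the declared (versioned) state with the observed state."""
    changes = []
    for path, mode in sorted(desired.items()):
        if actual.get(path) != mode:
            changes.append(Change(path, actual.get(path), mode))
    return changes


def apply_changes(actual: Dict[str, str], changes: List[Change]) -> Dict[str, str]:
    """Return the new state; a real tool would act on the host at this point."""
    updated = dict(actual)
    for change in changes:
        updated[change.path] = change.desired_mode
    return updated


# Declared state as it might look after a pull request is merged (hypothetical values).
desired_state = {"/srv/data": "0750", "/srv/logs": "0755"}
# Observed state of a host (hypothetical values).
observed_state = {"/srv/data": "0777", "/srv/logs": "0755"}

plan = plan_changes(desired_state, observed_state)
new_state = apply_changes(observed_state, plan)
```

Because the plan is computed from the difference between declared and observed state, running the same steps a second time produces an empty plan, which is what makes the change repeatable and reviewable rather than a one-off manual edit.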

“What we’ve done in the overall stack is we’ve composed a rather large number of open source projects together in order to generate our business outcomes,” Anderson told the summit.

The approach relies on “different levels” of composition, from build-time composition, through to deploy-time composition and run-time composition, he said.

“We believe strongly that versioning through codification gives us an absolute base case to be able to change all of those different levels of compositions and therefore services and systems in a sustainable manner over time because it allows us to recreate environments,” Anderson said.

“It allows us to build automation in a scalable manner and it allows us to recreate environments and therefore have automated testing, which is the basis on which safe change can take place.”

Although the team began with an OpenStack-based environment, over time the same approach is being moved over into public cloud environments as part of a hybrid cloud strategy, Anderson told the summit.