In 2005, when Google was a $6.1 billion business, the database behind the company’s primary cash cow (its AdWords online advertising platform, which accounted for more than 95% of its revenue) was not keeping up with the company’s growth.
When a traditional database needs to scale, the usual approach is sharding: splitting the data across multiple smaller databases to distribute the load. More than a decade ago, the database powering AdWords had grown so large that a single reshard took multiple years. A new database was needed. So Google built one.
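To make sharding concrete, here is a minimal Python sketch of hash-based sharding. It assumes a simple "hash the key, take the remainder" scheme over hypothetical customer IDs; it is an illustration of the general technique, not how the AdWords database actually partitioned its data. It also hints at why resharding is so painful: with this scheme, changing the number of shards changes where most rows live, so most of the data has to be moved.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash and a modulo (one simple scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an unsigned integer, then take the remainder.
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    # Hypothetical keys standing in for advertiser or customer IDs.
    keys = [f"customer-{i}" for i in range(10_000)]
    print("customer-42 lives on shard", shard_for("customer-42", 4))
    # Under plain modulo hashing, going from 4 to 5 shards relocates roughly 80% of keys.
    print(f"{moved_fraction(keys, 4, 5):.0%} of rows must move when adding a fifth shard")
```

Schemes such as consistent hashing reduce how much data moves, but any reshard of a very large, heavily used database still means copying data while keeping the application running.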
This week Google made the database it built to handle AdWords available to the general public as a product named Spanner. The release comes early in a wave of new databases that resemble traditional relational SQL databases but are much better at scaling to massive sizes. This new class has been appropriately dubbed NewSQL. And experts who track the database market believe these databases could one day give the giants of the database world, such as Oracle, IBM and Microsoft, a run for their money.
What Spanner is
Google built Spanner to satisfy a number of criteria. It needed to scale horizontally to massive sizes and be distributed across data centers around the world. Google also wanted a relational database that uses SQL, the standard language for querying relational databases, and it needed to be low-latency and highly reliable. In 2012, after almost a decade of development, Google released a research paper describing Spanner and its use cases within Google.
Over the next few years, the company turned Spanner into a database service on Google Cloud Platform. Google released an initial beta of Spanner earlier this year.
Spanner is a distributed database hosted in Google’s cloud that is both globally consistent and scalable. That means copies of a Spanner database can live in data centers around the world, close to the end users who need the data, yet every copy holds the same data. That is much easier said than done.
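Google points to two unique qualities of its cloud that Spanner relies on. The first is a time-stamping system named TrueTime, which uses atomic clocks (among the most accurate timekeeping devices available) along with GPS receivers to keep clocks in Google’s data centers tightly synchronized, so that updates made anywhere in the world can be placed in one consistent order.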
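The 2012 Spanner paper describes TrueTime as returning a time interval rather than a single instant: the true time is guaranteed to fall somewhere inside it. The toy Python simulation below is modeled loosely on the paper’s TT.now(), TT.after() and TT.before() calls and on its “commit wait” rule: a transaction takes a timestamp at the top of the interval, then waits until that timestamp has definitely passed before making its writes visible. The class and function names, the use of the machine’s local clock, and the fixed 5 ms uncertainty are all assumptions for illustration, not Google’s implementation.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # seconds since the epoch
    latest: float


class SimulatedTrueTime:
    """Toy stand-in for TrueTime: the local clock plus an assumed error bound.

    Real TrueTime derives its uncertainty from GPS and atomic-clock time
    masters; here epsilon_s is just a fixed, hypothetical constant.
    """

    def __init__(self, epsilon_s: float = 0.005):
        self.epsilon_s = epsilon_s

    def now(self) -> TTInterval:
        t = time.time()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed, given the assumed bound.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t


def commit_with_wait(tt: SimulatedTrueTime) -> float:
    """Pick a commit timestamp, then wait out the clock uncertainty ("commit wait")."""
    s = tt.now().latest        # no earlier than the true current time
    while not tt.after(s):     # block until s is guaranteed to be in the past
        time.sleep(tt.epsilon_s / 10)
    return s                   # now safe to make the commit visible


if __name__ == "__main__":
    tt = SimulatedTrueTime(epsilon_s=0.005)
    start = time.time()
    ts = commit_with_wait(tt)
    print(f"commit timestamp {ts:.6f}, waited {(time.time() - start) * 1000:.1f} ms")
```

The wait lasts about twice the uncertainty bound, which is why keeping that bound small with precise clocks matters: once a commit is visible, its timestamp is already in the past everywhere, so timestamps order transactions consistently across data centers.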
Spanner also relies on Google’s private fiber network, which connects Google’s data centers around the world. Spanner’s internal database traffic does not run over the public Internet; instead, it travels through pipes that Google built and controls, carrying only Google traffic. That effectively gives Spanner’s internal traffic its own high-speed highway to anywhere in the world.
The NewSQL market
Spanner is considered one of the first widely available cloud-hosted NewSQL databases. NewSQL “represents the next chapter in the continuous development of database technologies,” according to a paper by 451 Research Director Matt Aslett and Carnegie Mellon University’s Andy Pavlo.
None of the characteristics of NewSQL databases is new on its own, but until now they have been available only separately, in different types of databases. Traditional relational databases support SQL and offer strong consistency, but they do not scale well. NoSQL databases scale easily but lack support for SQL.
“(NewSQL databases) are by-products of a new era where distributed computing resources are plentiful and affordable, but at the same time the demands of applications is much greater,” Aslett and Pavlo note.
The market for these new flavors of databases is still emerging. Perhaps the most notable NewSQL database is SAP HANA, SAP’s in-memory relational database. A handful of newer companies and projects offer NewSQL databases, including NuoDB, H-Store, Clustrix, VoltDB and MemSQL. Amazon Web Services offers Amazon Aurora, which is compatible with MySQL and PostgreSQL and which some consider NewSQL.
One advantage of NewSQL databases is that they support applications that run on traditional SQL databases, such as Oracle’s line of databases. Aslett and Pavlo point out, however, that workloads running on those traditional databases are typically core applications that enterprises may be reluctant to move to new databases unless there is a strong need to do so. NoSQL databases, on the other hand, excel at scalability and are typically used for new applications built around social, mobile and Internet of Things workloads.
Analysts who track the NewSQL market nonetheless expect healthy growth in the coming years. Market Analysis, a research outfit in California, predicts a 26% compound annual growth rate for NewSQL databases, with the market reaching $1 billion by 2020. That is dwarfed by the traditional relational database management market, which IDC pegs at more than $30 billion annually. Still, customers with pain points from traditional databases are willing to invest in NewSQL for new workloads.
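The forecast is easier to read with the compounding made explicit. The figures above give only the growth rate and the 2020 endpoint, not a starting year, so the five-year window in this Python sketch (2015 to 2020) is purely an assumption for illustration.

```python
def grow(start_value: float, rate: float, years: int) -> float:
    """Value after compounding at a fixed annual rate."""
    return start_value * (1 + rate) ** years


def implied_start(end_value: float, rate: float, years: int) -> float:
    """Back out the starting value that compounds to end_value."""
    return end_value / (1 + rate) ** years


if __name__ == "__main__":
    rate = 0.26        # 26% CAGR, from the forecast
    years = 5          # ASSUMPTION: a 2015-2020 window; not stated in the forecast
    end_2020 = 1.0e9   # $1 billion by 2020
    start = implied_start(end_2020, rate, years)
    print(f"growth multiple over {years} years: {(1 + rate) ** years:.2f}x")
    print(f"implied starting market: ${start / 1e6:.0f} million")
    # Compare with the more-than-$30 billion relational market cited from IDC.
    print(f"$1B as a share of $30B: {end_2020 / 30e9:.1%}")
```

Under that assumed window, 26% annual growth roughly triples the market (about 3.2x), implying a starting size of around $315 million. Even at its forecast 2020 size, NewSQL would be only about 3% of the $30 billion-plus relational market.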
Spanner in practice
JDA, a supply-chain logistics firm, was one of Google’s alpha testers for the public version of Spanner. The company helps customers plan when to make and ship products, and tracks products’ life cycles around the world. “In our world, consistency is very important,” says John Sarvari, JDA’s Group VP of Technology. “Our customers are making very important decisions based on their view of the data.” JDA has no plans to phase out its existing relational databases, but Sarvari expects that new applications and workloads could be built on Spanner going forward. The biggest benefit of Spanner, he says, is that it frees his staff from managing databases that can become unwieldy as they scale. “We just get this as a service from Google,” he says. That leaves staff free to focus on the core services JDA provides to its customers.
Spanner costs $0.90 per node per hour, plus $0.30 per GB per month for storage. It carries a 99.99% uptime Service Level Agreement (SLA) when deployed in a single region, and a 99.999% (“five nines”) availability SLA when spread across multiple regions.
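Because pricing has two components, a rough monthly estimate is node-hours times $0.90 plus stored gigabytes times $0.30. This Python sketch assumes a 730-hour month (a common averaging convention, not a stated billing rule) and a hypothetical deployment of 3 nodes holding 500 GB. It also converts the two SLA levels into the downtime each one permits over a 30-day month.

```python
HOURS_PER_MONTH = 730               # ASSUMPTION: average month length used for estimates
NODE_PRICE_PER_HOUR = 0.90          # USD per node per hour, as quoted
STORAGE_PRICE_PER_GB_MONTH = 0.30   # USD per GB per month, as quoted


def monthly_cost(nodes: int, storage_gb: float) -> float:
    """Rough monthly bill: node-hours plus stored gigabytes."""
    compute = nodes * HOURS_PER_MONTH * NODE_PRICE_PER_HOUR
    storage = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return compute + storage


def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime an availability target permits over `days` days."""
    return (1 - availability) * days * 24 * 60


if __name__ == "__main__":
    # Hypothetical deployment: 3 nodes holding 500 GB.
    print(f"3 nodes, 500 GB: ${monthly_cost(3, 500):,.2f}/month")
    print(f"99.99% (single region): {allowed_downtime_minutes(0.9999):.2f} min per 30 days")
    print(f"99.999% (multi-region): {allowed_downtime_minutes(0.99999):.3f} min per 30 days")
```

Under these assumptions the example comes to about $2,121 a month ($1,971 for compute, $150 for storage). The SLA gap works out to roughly 4.3 minutes versus less than half a minute of permitted downtime per 30 days.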