Only a few years ago, database work was among the most boring of tasks in IT -- in a good way. Data went into one of the major SQL databases and it came out later, all in one piece, exactly as it went in. The database creators had succeeded in delivering rock-solid performance, and everyone started taking it for granted.
Then the nature of what we wanted to store changed. Databases had to move beyond bank accounts and airline tickets because everyone had begun sharing data on social networks. Suddenly there was much more data to store, and most of this new data didn’t fit into the old tables. The work of database admins and creators was transformed, and what has emerged is a wide array of intriguing solutions that make databases some of the most interesting technology around today.
Cassandra, MongoDB, CouchDB, Riak, Neo4j -- the innovations of the past several years are by now well-established at many organizations. But a new generation is fast rising. Here we provide an overview of 11 cutting-edge databases tuned to store more data in more flexible formats on more machines, and to answer a wider variety of queries.
The database world has never been as varied and interesting as it is right now.
When a few refugees from Twitter wanted to build something new with the experience they gained processing billions of tweets, they decided that a distributed database was the right challenge. Enter FaunaDB. In goes the JSON, and out come answers from a distributed collection of nodes. FaunaDB’s query language offers the ability to ask complex questions that join together data from different nodes while searching through social networks and other graph structures in your databases.
If you’re simply interested in experimenting or you don’t want the hassle of rolling your own, FaunaDB comes in a cloud database-as-a-service version. When and if you want to take more control, you can install the enterprise version on your own iron.
You wouldn’t be the first architect to throw up your hands and say, "If only we could mix the flexibility of document-style databases with the special power of graph databases and still get the flexibility of tabular data. Then we would have it made."
ArangoDB isn't the only tool in town trying to mix the power of graph and document databases. OrientDB does something similar, but packages itself as a "second-generation graph database." In other words, the nodes in the graphs are documents waiting for arbitrary key-value pairs.
This makes OrientDB feel like a graph database first, but there's no reason you can't use the key-value store alone. It also includes a RESTful API waiting for your queries.
How many times have you found yourself wishing for the power of a search engine like Lucene but with the structure and querying ease of SQL? If the answer is more than zero, Crate.io may be the answer.
While Lucene began as a search engine for finding keywords in large, unstructured blocks of text, it has always been able to store keys and matching values in each document, which leads some to consider it part of the NoSQL revolution. Crate.io started with Lucene and its larger, scalable, and distributed cousin Elasticsearch but added a query language with SQL syntax. The folks behind Crate.io are also working on adding JOINs, which will make Crate.io very powerful -- assuming you need to use JOINs.
People who love the old-fashioned SQL way of thinking will enjoy the fact that Crate.io bundles newer, scalable technology in a manner that's easier for SQL-based systems to use.
The name might not be appealing, but the sentiment is. CockroachDB’s developers embraced the idea that no organism is as long-lasting or as resilient as the cockroach, bragging, "CockroachDB allows you to deploy applications with the strongest disaster recovery story in the industry."
While time will tell whether they've truly achieved that goal, it won't be for lack of engineering. The team’s plan is to make CockroachDB simple to scale. If you add a new node, CockroachDB will rebalance itself to use the new space. If you kill a node, the cluster will shrink and replicate the lost data from the backup sources. For extra safety, CockroachDB promises fully serializable transactions that span the entire cluster. You don't need to worry about the data, which incidentally is stored as a "single, monolithic map from key to value where both keys and values are byte strings (not unicode)."
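To make the "monolithic map from key to value" idea concrete, here is a minimal single-process sketch in Python. It is not CockroachDB code; the class `SortedByteMap` and the key layout are hypothetical, and the sketch assumes the map is kept in key order so that a whole range of related keys can be read in one pass.

```python
import bisect


class SortedByteMap:
    """Toy ordered map from byte-string keys to byte-string values.

    Hypothetical illustration only; a real distributed store would split
    the key space into ranges and replicate them across nodes.
    """

    def __init__(self):
        self._keys = []    # all keys, kept in sorted order
        self._values = {}  # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        # Only insert into the sorted list the first time a key is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, end: bytes):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedByteMap()
store.put(b"user/2", b"bob")
store.put(b"user/1", b"alice")
store.put(b"video/9", b"cats")

# b"0" is the byte right after b"/", so this range covers every "user/" key.
users = store.scan(b"user/", b"user0")
```

Because keys are plain bytes in sorted order, higher-level structures such as tables and indexes can be layered on top by choosing key prefixes, which is the general appeal of this kind of design.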
In a traditional database, you send a query and the database sends an answer. If you don't send a query, the database doesn't send you anything. It's simple and perfect for some apps, but not for others.
RethinkDB inverts the old model and pushes data to clients. If the query answer changes, RethinkDB sends the new data to the client. It's ideal for the new interactive apps that let multiple people edit documents or work on presentations at the same time. Changes from one user are saved to RethinkDB, which promptly sends them off to the other users. The data is stored in JSON documents, which is ideal for Web apps.
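The push model is easier to see in miniature. The following Python sketch is an in-memory simulation, not RethinkDB's client API: the class `LiveTable`, its methods, and the shape of the change record are all assumptions made for illustration. A client registers a query once, and every later save that affects that query triggers a callback.

```python
class LiveTable:
    """Toy document table that pushes matching changes to subscribers."""

    def __init__(self):
        self._docs = {}         # doc_id -> JSON-like dict
        self._subscribers = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # Register a standing query; nothing is sent until data changes.
        self._subscribers.append((predicate, callback))

    def save(self, doc_id, doc):
        old = self._docs.get(doc_id)
        self._docs[doc_id] = doc
        for predicate, callback in self._subscribers:
            # Notify if the new or the previous version matched the query.
            if predicate(doc) or (old is not None and predicate(old)):
                callback({"old_val": old, "new_val": doc})


received = []
slides = LiveTable()
slides.subscribe(lambda d: d["deck"] == "q3-review", received.append)

slides.save("s1", {"deck": "q3-review", "title": "Draft"})
slides.save("s2", {"deck": "other", "title": "Ignore me"})
slides.save("s1", {"deck": "q3-review", "title": "Final"})
```

After these three saves, `received` holds two change records, one for each edit to the subscribed deck; the save to the other deck is never delivered. In a real deployment the callback would be a network push to each connected user rather than a list append.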
Some databases want to store all of the information in the world. InfluxDB merely wants the time-series data, the numbers that come in an endless stream. They might be log files from a website or sensor readings from an experiment, but they keep coming and want to be analyzed.
InfluxDB offers a basic HTTP(S) API for adding data. For querying, it has an SQL-like syntax that includes some basic statistical operators like MEAN. Thus, you can ask for the average of a particular value over time and it will compute the answer inside the database without sending all of the data back to you. This makes building time-series websites easy and efficient.
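As a rough picture of what a MEAN-over-time query saves you from doing on the client, here is a small Python sketch of the aggregation itself. It is not InfluxDB code; the function `mean_by_window` and the sample readings are hypothetical, and timestamps are assumed to be plain integer seconds.

```python
from collections import defaultdict
from statistics import mean


def mean_by_window(points, window_seconds):
    """Average (timestamp, value) points within fixed-size time windows.

    Returns a dict mapping each window's start time to the mean of the
    values that fall inside it.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Round the timestamp down to the start of its window.
        bucket_start = timestamp - (timestamp % window_seconds)
        buckets[bucket_start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Five sensor readings, summarized into one-minute averages.
readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 50.0), (120, 5.0)]
averages = mean_by_window(readings, 60)
```

Here five raw points shrink to three averages. When the database runs this step itself, only the summary crosses the network, which matters far more when the input is millions of log lines or sensor samples.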
Clustrix may not be a new product anymore -- it's up to Version 6.0 -- but it still calls itself part of the NewSQL revolution because it offers automatic replication and clustering with much of the speed of an in-memory database. The folks behind Clustrix have added plenty of management tools to ensure the cluster can manage itself without too much attention from a database administrator.
Perhaps it makes more sense to see the version number as a sign of maturity and experience. You get all of the fun of new ideas with the assurance that comes only from years of testing.
If you have data to spread around the world in a distributed network of databases, NuoDB is ready to store it for you with all the concurrency control and transaction durability you need. The core is a "durable distributed cache" that absorbs your queries and eventually pushes the data to persistent disk. All interactions with the cache can be done with ACID transaction semantics -- if you desire. The commit protocol can be adjusted to trade off speed for durability.
The software package includes a wide variety of management tools for tracking the nodes in the system. All queries use an SQL-like syntax.
Some databases store information. VoltDB is designed to analyze it at the same time, offering "streaming analytics" that "deliver decisions in milliseconds." The data arrives in JSON or SQL and is then stored and analyzed in the same process, which incidentally is integrated with Hadoop to simplify elaborate computation. It also offers ACID transactional guarantees for the stored data.
RAM has never been cheaper -- or faster -- and MemSQL is ready to make it easy to keep all of your data in RAM so that queries can be answered faster than ever. It's like a smart cache, but can also replicate itself across a cluster. Once the data is in RAM, it's also easy to analyze with built-in analytics.
The latest version also supports geospatial data for both storage and analysis. It's easy to create geo-aware mobile apps that produce analytical results as the apps move around the world.