Glossary

What is a key-value store?

A key-value store, or key-value database, is a simple database that uses an associative array (think of a map or dictionary) as its fundamental data model, where each key is associated with one and only one value in a collection. This relationship is referred to as a key-value pair.

In each key-value pair, the key is represented by an arbitrary string such as a filename, URI or hash. The value can be any kind of data, such as an image, user preference file or document. The value is stored as a blob, requiring no upfront data modeling or schema definition.

Because the value is stored as a blob and is always looked up by its key, there is no need to index its contents to improve performance. However, you cannot filter or control what’s returned from a request based on the value because the value is opaque.

In general, key-value stores have no query language. They provide a way to store, retrieve and update data using simple get, put and delete commands; the path to retrieve data is a direct request to the object in memory or on disk. The simplicity of this model makes a key-value store fast, easy to use, scalable, portable and flexible.
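The get, put and delete interface described above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular product is implemented; the class name `KeyValueStore` and the key `user:42:prefs` are hypothetical, chosen only for the example.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Store or overwrite: each key maps to exactly one value.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Direct lookup by key; the store never inspects the value.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("user:42:prefs", b'{"theme": "dark"}')
print(store.get("user:42:prefs"))
store.put("user:42:prefs", b'{"theme": "light"}')  # an update is just another put
store.delete("user:42:prefs")
print(store.get("user:42:prefs"))  # None: the key no longer exists
```

Note that nothing in the interface lets a caller ask for "all users with a dark theme"; because the value is opaque, that kind of filtering has to happen in the application.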

Scalability and reliability

Key-value stores scale out by implementing partitioning (storing data on more than one node), replication and auto recovery. They can scale up by maintaining the database in RAM, and they minimize the overhead of ACID guarantees (such as durability, the guarantee that committed transactions persist somewhere) by avoiding locks and latches and by using low-overhead server calls.
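To make partitioning and replication concrete, here is a minimal sketch that maps a key to a primary node and one replica by hashing the key. The node names, the replication factor and the function `owners` are assumptions made for illustration; hash-modulo placement is the simplest scheme, not the one any specific product is confirmed to use.

```python
import hashlib
from typing import List

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names
REPLICAS = 2  # assumed replication factor (primary + 1 copy)


def owners(key: str, nodes: List[str] = NODES, replicas: int = REPLICAS) -> List[str]:
    """Return the nodes that should hold `key`: the primary first, then replicas."""
    # Use a stable hash; Python's built-in hash() is salted per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(nodes)
    # Place replicas on the following nodes, wrapping around the list.
    count = min(replicas, len(nodes))
    return [nodes[(primary + i) % len(nodes)] for i in range(count)]


print(owners("session:abc"))
```

Because every client computes the same owners for the same key, a request can go straight to the right node. One known limitation of plain modulo placement is that adding or removing a node remaps most keys, which is why production systems typically use consistent hashing or a fixed partition table instead.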

Use cases and implementations for key-value storage

Key-value stores handle large volumes of data well and are good at processing a constant stream of read/write operations with low latency, making them well suited for:

  • Session management at high scale

  • User preference and profile stores

  • Product recommendations; latest items viewed on a retailer website drive future customer product recommendations

  • Ad serving; customer shopping habits result in customized ads, coupons, etc. for each customer in real time

  • Caching of heavily accessed but rarely updated data

Key-value stores differ in their implementation. Some, like Berkeley DB, FoundationDB and MemcacheDB, support ordering of keys; some, like Redis, maintain data in memory (RAM); and some, like Aerospike, are built natively to support both RAM and solid state drives (SSDs). Others, like Couchbase Server, store data in RAM but also support rotating disks. Some popular key-value stores are:

  • Aerospike

  • Apache Cassandra

  • Berkeley DB

  • Couchbase Server

  • Redis

  • Riak

Key-value store vs cache

A cache is sometimes likened to a key-value store because of its ability to return a value given a specific key. A cache transparently stores a pool of previously read data so that future requests for that data can be served quickly, improving performance.

Data stored in a cache can be precomputed values or a copy of data stored on disk. When an application receives a request for data and it resides in the cache (called a hit), the request can be served by reading the cache, which is fast. If, on the other hand, the requested information does not reside in the cache (called a miss), the requested data must be recomputed or retrieved from its original source, which results in a delay.
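The hit and miss paths can be illustrated with a small read-through cache. This is a sketch under assumptions: `ReadCache` and `slow_source` are hypothetical names, and `slow_source` stands in for whatever database read or computation the cache sits in front of. Eviction and expiry are left out to keep the example short.

```python
from typing import Callable, Dict


class ReadCache:
    """Toy read cache that counts hits and misses."""

    def __init__(self, load: Callable[[str], bytes]) -> None:
        self._load = load  # called only on a miss
        self._entries: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Hit: serve directly from memory.
            self.hits += 1
            return self._entries[key]
        # Miss: fetch from the original source, then keep a copy.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = value
        return value


def slow_source(key: str) -> bytes:
    # Hypothetical stand-in for a database read or an expensive computation.
    return key.upper().encode("utf-8")


cache = ReadCache(slow_source)
cache.get("a")  # miss: loaded from slow_source
cache.get("a")  # hit: served from the cache
print(cache.hits, cache.misses)  # 1 1
```

The cache only ever holds copies; if the process stops, the entries are gone and the original source remains the record of truth.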

Caches and key-value stores do have differences. Where key-value stores can be used as a database to persist data, caches are used in conjunction with a database when there is a need to increase read performance. Caches are not used to enhance write or update performance, whereas key-value stores are very effective at both. Where key-value stores can be resilient to server failure, caches are stored in RAM and cannot provide you with transactional guarantees if the server crashes.

Summary

For most key-value stores, the secret to their speed lies in their simplicity. The path to retrieve data is a direct request to the object in memory or on disk. The relationship between data does not have to be calculated by a query language; there is no optimization performed. They can exist on distributed systems and don’t need to worry about where to store indexes, how much data exists on each system or the speed of a network within a distributed system. They just work.

Some key-value stores, like Aerospike, use additional techniques to extend performance, such as using SSDs or flash storage and implementing secondary indexes, to keep pushing the limits of today’s technology.