What is a real-time database?
A real-time database is a database that can ingest, process, and analyze data in real-time rather than on a delay or in batches. In practice, this means data is often loaded into the database immediately after creation through an automated process using direct data inserts, streaming ingestion, or similar approaches. As real-time applications become a larger part of the digital landscape, more databases are moving to a real-time format rather than a legacy batch approach.
Virtually any kind of database can be configured as a real-time database, though it tends to be easier with more modern database classes like NoSQL and NewSQL. Some setups, like hard real-time SQL databases, are difficult enough to configure and maintain that they rarely exist in production settings.
Besides processing data in real-time, there are three important points to know about real-time databases:
1. There are two slightly different sub-definitions of 'real-time.'
Not all real-time is the same, and while the differences can be somewhat nuanced, they can also significantly change the way the database operates.
Hard real-time
In a hard real-time database, transactions are guaranteed to complete within a specific time, and a missed deadline is considered a system failure. Hard real-time databases are used where delayed data can have serious consequences, such as avionics, air traffic control, and real-time financial transaction clearing.
Soft/streaming real-time
In a soft real-time database, transaction and processing deadlines are a bit more flexible, and some data staleness is acceptable in favor of higher throughput and graceful degradation. Soft real-time database systems are often used in situations where speed is important but not critical, such as updating consumer behavior data in fraud detection systems (compared to evaluating specific financial transactions for fraud, where speed is everything), real-time analytics, or IoT systems.
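The difference between the two policies can be sketched in a few lines of Python. This is only an illustration of what happens when a deadline is missed: it measures elapsed time after the fact and does not provide the scheduling guarantees a true hard real-time system needs. The names `run_with_deadline`, `DeadlineMissed`, `Result`, and `slow_write` are hypothetical and not taken from any particular database.

```python
import time
from dataclasses import dataclass


class DeadlineMissed(Exception):
    """Raised in hard mode when a transaction overruns its deadline."""


@dataclass
class Result:
    value: object
    late: bool  # True when a soft deadline was missed but the result was kept


def run_with_deadline(operation, deadline_s, hard):
    """Run `operation` and apply the chosen deadline policy to its elapsed time."""
    start = time.monotonic()
    value = operation()
    elapsed = time.monotonic() - start
    if elapsed > deadline_s:
        if hard:
            # Hard real-time: a late result is treated as a failure.
            raise DeadlineMissed(f"took {elapsed:.3f}s, limit {deadline_s:.3f}s")
        # Soft real-time: keep the result but flag it as late.
        return Result(value, late=True)
    return Result(value, late=False)


def slow_write():
    time.sleep(0.05)  # simulated transaction that takes about 50 ms
    return "committed"
```

Calling `run_with_deadline(slow_write, 0.001, hard=False)` returns a result flagged as late, while the same call with `hard=True` raises `DeadlineMissed`.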
2. Every class of database can be real-time, but some are rarer than others.
As mentioned earlier, there are examples of real-time databases using just about every possible combination of database class, type, and configuration. However, legacy SQL databases tend to struggle with hard real-time functions, which makes hard real-time SQL databases extremely rare and only useful in certain niche applications. Instead, most real-time databases are NoSQL or NewSQL databases.
3. There's no 'correct' way to get data into (or out of) a real-time database.
Traditional legacy approaches to data ingestion primarily involve three parts: a data source, like a software or hardware system; a destination database system; and a dedicated ETL application in the middle that extracts data from the source, formats and transforms it, and then loads it into the database periodically.
Real-time databases don't have to rely on Extract, Transform, and Load (ETL) systems. Instead, they can ingest data in a variety of ways, depending on the application. Some common options include:
Streaming ETLs
Like traditional ETLs, these are dedicated applications for extracting data from a source and loading it into a database. Unlike traditional ETLs, streaming ETLs constantly pull data from the source and push it to the database without waiting for a batch to be ready or a time period to pass.
Common use case: IoT sensors
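As a rough sketch of the streaming ETL idea, the Python below extracts, transforms, and loads each record as soon as it arrives instead of collecting a batch. The sensor source, the `transform` step, and the list standing in for the database are all hypothetical placeholders.

```python
from typing import Callable, Iterable, Iterator


def sensor_readings() -> Iterator[dict]:
    """Hypothetical source: yields raw readings one at a time."""
    for i, celsius in enumerate([21.5, 22.0, 22.4]):
        yield {"sensor": "s1", "seq": i, "celsius": celsius}


def transform(reading: dict) -> dict:
    """Transform step: add a Fahrenheit field to each record."""
    return {**reading, "fahrenheit": reading["celsius"] * 9 / 5 + 32}


def streaming_etl(source: Iterable[dict], load: Callable[[dict], None]) -> None:
    """Extract, transform, and load each record as soon as it arrives."""
    for reading in source:
        load(transform(reading))  # no batch buffer, no schedule


database = []  # stand-in for the destination database
streaming_etl(sensor_readings(), database.append)
```

A batch ETL would instead accumulate readings and call `load` once per scheduled interval.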
Direct transaction inserts
Applications skip the middle layer and write transactions directly to the database. With direct transaction inserts, data is usually delivered as it's generated with minimal processing or transformation. These are often used by microservices and APIs, where additional layers would represent an unnecessary lift or excessive overhead.
Common use case: E-commerce orders
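A minimal sketch of a direct transaction insert might look like the following, using Python's built-in `sqlite3` module as a stand-in for the real database. The `orders` table and the `place_order` function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY, sku TEXT, quantity INTEGER)"
)


def place_order(order_id, sku, quantity):
    """Write the order straight to the database as it is created."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (order_id, sku, quantity) VALUES (?, ?, ?)",
            (order_id, sku, quantity),
        )


place_order("o-1001", "sku-42", 2)
```

There is no queue or ETL job between the application and the database; the order is queryable as soon as `place_order` returns.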
Streaming ingestion
Events from a source are funneled to the database through a streaming platform like Kafka, Pulsar, or Kinesis. Like streaming ETLs, this approach uses a middle service; like direct transaction inserts, it delivers data to the database with minimal transformation. In that sense, it occupies a space between the two.
Common use case: User interactions for clickstream analytics
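The shape of streaming ingestion can be sketched with Python's standard `queue` and `threading` modules. Here an in-process queue stands in for a topic on a streaming platform; it is not a Kafka, Pulsar, or Kinesis client, and the event fields are invented for the example.

```python
import queue
import threading

events = queue.Queue()   # stand-in for a topic on a streaming platform
clickstream = []         # stand-in for the destination table
STOP = object()          # sentinel that tells the consumer to exit


def ingest():
    """Consumer: move each event into the database with no transformation."""
    while True:
        event = events.get()
        if event is STOP:
            break
        clickstream.append(event)


consumer = threading.Thread(target=ingest)
consumer.start()

# Producer side: the application only publishes events and never touches the database.
for page in ["/home", "/pricing", "/signup"]:
    events.put({"user": "u1", "page": page})

events.put(STOP)
consumer.join()
```

The producer and the database are decoupled by the queue, which is the main difference from a direct insert.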
Change data capture
Change data capture (CDC) records when and how data changes at a source and pushes just those changes to a destination database. This approach is often used for replication and other situations that keep a current copy of the data elsewhere.
Common use case: Database replication and recovery, like Aerospike XDR
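A simplified model of change data capture is sketched below, with plain dictionaries standing in for the source and destination databases. It illustrates the general idea only and is not a description of how Aerospike XDR or any other product implements it; `write`, `delete`, and `ship_changes` are hypothetical names.

```python
source = {}       # stand-in for the source database
replica = {}      # stand-in for the destination database
change_log = []   # ordered record of what changed at the source


def write(key, value):
    """Apply a write at the source and record the change."""
    source[key] = value
    change_log.append(("upsert", key, value))


def delete(key):
    """Apply a delete at the source and record the change."""
    source.pop(key, None)
    change_log.append(("delete", key, None))


def ship_changes(cursor):
    """Replay changes recorded since `cursor` on the replica; return the new cursor."""
    for op, key, value in change_log[cursor:]:
        if op == "upsert":
            replica[key] = value
        else:
            replica.pop(key, None)
    return len(change_log)


write("user:1", {"name": "Ada"})
write("user:2", {"name": "Lin"})
cursor = ship_changes(0)

delete("user:2")
cursor = ship_changes(cursor)  # only the delete is sent this time
```

Because only the entries after the cursor are replayed, the second call moves one change rather than the whole dataset.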
Bulk load
Bulk load utilities, another approach often used for recovery and replication, clone a point-in-time database state into another database with minimal or no transformation.
Common use case: Initiating a new cluster
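As a small illustration of a point-in-time bulk copy, the sketch below uses the `Connection.backup` method from Python's built-in `sqlite3` module to seed an empty database from an existing one. SQLite is only a stand-in here, and the `users` table is made up for the example.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO users (name) VALUES (?)", [("Ada",), ("Lin",)])
source.commit()

target = sqlite3.connect(":memory:")  # the empty database being seeded
source.backup(target)                 # copy the full current state in one operation

# Writes made after the copy are not carried over.
source.execute("INSERT INTO users (name) VALUES (?)", ("Sam",))
source.commit()
```

After this runs, `target` holds the two rows that existed at copy time, while `source` holds three. Keeping the two in sync from that point on is a job for an approach like change data capture.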
Types of real-time databases
Just like traditional databases, real-time databases come in a variety of types and structures, ranging from simple to complex.
Non-relational/NoSQL real-time databases
The most common real-time databases are typically NoSQL-based. Many NoSQL databases are cloud-native and were designed to function in real-time in a highly distributed environment, which makes real-time applications easier to develop using built-in functionality.
Some examples include Aerospike 8, Apache Cassandra, Amazon DynamoDB (with DynamoDB Streams), and Google Firebase Realtime Database.
Relational/NewSQL real-time databases
NewSQL was developed as an effort to bring legacy SQL systems into a cloud-native real-time environment. These databases prioritize the benefits of SQL, like strong consistency and highly structured queries and data, but still offer more modern approaches to data processing, like real-time transactions.
Some examples of NewSQL real-time databases include VoltDB, Google Spanner, and TiDB.
Other types of real-time databases
Besides the traditional split between relational and non-relational databases, real-time databases come in a variety of types and subtypes, making it easy to match the perfect database to an application. Some common additional real-time database types include:
In-memory real-time databases, which store data in DRAM for exceptionally high speeds
In-memory real-time data grids, a type of in-memory database that stores data across DRAM on multiple servers
Time-series real-time databases, which store time-stamped data for tracking changes over time
Streaming real-time databases, which are set to continuously ingest and process data
BaaS real-time databases, which are built to be feature-rich backends requiring minimal configuration and management
Benefits of real-time databases
Real-time databases are becoming increasingly common because the types of services and applications making up the digital landscape are becoming more biased toward real-time. For these real-time applications, real-time databases offer significant advantages compared to legacy batch approaches.
Low latencies: Real-time databases read and write data quickly, often taking just tens of milliseconds to complete a transaction, and sometimes even achieving sub-millisecond transaction times.
High throughput: Real-time databases can handle significantly higher sustained volumes than traditional databases, which might have large batch sizes but are limited to periodic updates.
Real-time syncing: Real-time databases can ensure that data is synced and identical across millions of clients at the same time… within a reasonable definition of "at the same time."
Event-driven architecture ready: Real-time databases are perfect when paired with event-driven software architectures and integrate into modern development patterns and best practices.
Data durability/recovery: Real-time databases are a cornerstone of modern replication capabilities. They limit data loss in the event of an outage or corruption and allow real-time replication across distributed regions to minimize regional disruptions.
Real-time database challenges and considerations
While real-time databases offer many advantages, they also present some specific challenges and considerations.
Deadline handling: Real-time database systems have to decide how to handle deadlines, e.g., whether to implement hard real-time or soft real-time. This can have significant implications for how applications are developed and the capabilities they can offer.
Concurrency and conflict resolution: Data is continuously updated, but still has some latency between client and database. This can lead to conflicting transactions, as some clients may base decisions on data that changes while their requests are en route. Real-time database management software has to be able to handle these conflicts in a reliable and predictable way.
Consistency: As with concurrency, many real-time databases are eventually consistent, meaning there are occasionally states where different clusters or clients see different data. Real-time databases need to be able to handle these consistency issues, and doing so effectively can be difficult.
Scalability bottlenecks: Real-time databases can suffer from several difficult-to-correct scalability bottlenecks that need to be accounted for. This can include potential problems like hotspotting, CPU and memory overutilization for complex queries, latency from concurrency locking, network saturation, and write amplification. All of these need to be accounted for in the design of the database and the application.
Security: Traditional databases limit when and how writing occurs, allowing them to handle security at a single point with traditional approaches. Real-time databases can be more difficult to lock down because many clients hold write permissions.
Cost: Storage and processing fees can be higher for real-time databases due to their large volume of transactions. This can be especially difficult to overcome for in-memory databases and data structures that rely on extensive, compute-heavy transformations.
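One common way to handle the concurrency and conflict resolution challenge described above is optimistic concurrency control, where a write succeeds only if the record is still at the version the client originally read. The sketch below is a single-threaded illustration with a hypothetical `Store` class; a real database would also need to make the version check and the write atomic.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class Store:
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if the record is still at the version the client read."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)


store = Store()
store.write("balance", 100, expected_version=0)

# Two clients read the same version, then both try to write.
version_a, balance_a = store.read("balance")
version_b, balance_b = store.read("balance")
store.write("balance", balance_a - 30, expected_version=version_a)  # succeeds

try:
    store.write("balance", balance_b - 50, expected_version=version_b)
    conflict = False
except VersionConflict:
    conflict = True  # client B must re-read and retry
```

Client B's write is rejected because it was based on data that changed while the request was in flight, so the outcome stays predictable instead of silently overwriting client A's update.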
Real-time vs. legacy databases
While real-time databases offer many advantages over legacy databases, they aren't always the best replacement. Which one is best will depend heavily on the specific priorities and characteristics of the application.
Feature | Real-time databases | Legacy (traditional) databases |
---|---|---|
Data ingestion | Immediately as it's created | Batched ingestion at scheduled periods |
Data processing | Can be immediate, batched, or based on triggers | Data processed in batches |
Latency | Milliseconds to microseconds | Seconds to hours, depending on batch scheduling |
Synchronization | Data is pushed to clients as it changes | Data is pulled by clients on request |
Consistency | Eventual consistency is typical, though some offer strong consistency | Strong consistency and ACID compliance |
Scalability | Easy horizontal scaling for handling high-volume, high-velocity data | Mostly vertical scaling; horizontal scaling is often difficult |
Data structure | Can be structured, semi-structured, or unstructured | Often highly structured |
Cost | Higher costs, more complex architectures | Lower costs, simpler architectures |
Security | More complex; must be handled in real-time from distributed clients | Batch processing and limited write sources make security easier and more straightforward |
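The Synchronization row in the table can be illustrated with a short Python sketch that contrasts pushing changes to subscribers with pulling values on request. `PushStore` is a hypothetical in-process class, not the API of any real database, and a real system would deliver notifications over a network connection.

```python
class PushStore:
    """Key-value store that notifies subscribers whenever a value changes."""

    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def get(self, key):
        """Pull model: the client asks and gets the value as of this moment."""
        return self._data.get(key)

    def set(self, key, value):
        """Push model: every subscriber hears about the change immediately."""
        self._data[key] = value
        for callback in self._subscribers:
            callback(key, value)


store = PushStore()
seen_by_client = []
store.subscribe(lambda key, value: seen_by_client.append((key, value)))

store.set("score", 1)
store.set("score", 2)
```

The subscribed client sees both updates in order without asking, whereas a client that only calls `store.get("score")` sees just the latest value at the time of the call.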
When to consider a real-time database
Not every application needs to be able to update data as soon as it's generated. As an example, a business analytics dashboard is unlikely to have enough meaningful change minute-to-minute or even hour-to-hour to justify the effort of implementing real-time databases. However, there are some situations where they not only make sense but are absolutely required, for example, in a real-time messaging platform.
It could be time to consider a real-time database when:
There's high data velocity and frequent changes: Applications like financial reconciliation and clearing services benefit from real-time data, for example, by blocking transactions as soon as a user runs out of money.
Decisions might be influenced by more recent data: IoT applications that monitor temperature in agricultural automation, for example, would adjust their behavior based on rising temperatures over the course of a day.
Users rely on concurrent and consistent data: For example, multiplayer games need to know the state of each player at all times and communicate that state to other players.
There's little data-loss tolerance: Real-time databases are less likely to lose significant amounts of data due to disasters, regional outages, or similar problems because of better and more granular replication support.
The time for real-time databases
As more applications move to a real-time, event-driven architecture, real-time databases will become more and more common and important. This is even more true as the world moves toward AI, real-time automation, and programmatic decision-making, which rely on real-time signals to identify the best course forward. Choosing the right real-time database to support modern applications is as important a decision as selecting the language to develop in and the algorithms to use.
Aerospike's real-time data platform supports best-in-class applications by providing a stable, cost-effective solution with the lowest latencies and the highest throughput of any real-time database, making it an ideal choice for any developer.