Top 5 Things to Know About Aerospike Document Database

You can view it at https://vimeo.com/844327984

Vikranth Dharamshi:

Good morning, good afternoon, good evening to all the participants from wherever you've joined across the globe. I welcome you to this webinar session of the Aerospike Tech Tips series of sessions.

Before I start off, a few quick housekeeping points. This webinar is being recorded, and once it is complete, the recording will be shared with you. If any questions come to mind as I'm going through the session, please post them in the questions box that you can see on the console through which you have joined.

All right, so quickly about myself. My name is Vikranth Dharamshi. I play the role of a solutions architect within Aerospike, supporting our customers in the APAC region. I come with some 18 years of industry experience. And today's webinar session is going to be focused on the top five things about Aerospike as a document store.

So here's the agenda that I'm going to run through today. I'm going to start off in terms of how Aerospike is really good at supporting extreme throughputs at low latencies and a few other attributes about Aerospike as a platform. And then we start delving into document store specific aspects. We look at what is the fundamental way the documents are stored in Aerospike. Then we look at how we can query these documents within Aerospike using JSONPath. Next, we look at the aspect of how we can index various attributes inside your JSON document through nested indexes and Aerospike expressions. And finally, we sum it up and see how Aerospike actually helps you build a real-time document store, which is extremely low in terms of total cost of ownership out there.

So let's look at Aerospike as a document store for real-time applications. Aerospike fundamentally supports the four key tenets that we are looking at over here. One is the lowest latency at whatever scale you would like to operate the database on. We call it unlimited scale. And if I'd like to quantify that in terms of database operations, irrespective of whether you're running a few hundred operations per second or millions of operations per second, Aerospike can give you the lowest latencies in the sub-millisecond space, with predictable performance across the 99th percentile and above of your requests. And we make sure that as a data store we are always on. If you look at our Aerospike database, we very boldly mention that we can support five nines (99.999%) and upwards in terms of availability. So this is extremely critical to support your real-time workloads, your tier zero mission-critical applications.
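To put the availability figure in perspective, here is a small arithmetic sketch. It reads the figure quoted above as five nines (99.999%) and assumes a 365-day year; the function name is made up for illustration and is not part of any Aerospike tooling.

```python
def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of N nines."""
    minutes_per_year = 365 * 24 * 60      # 525,600 minutes in a non-leap year
    unavailability = 10.0 ** -nines       # five nines -> 0.00001
    return minutes_per_year * unavailability


# Compare three, four, and five nines side by side.
for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions, five nines leaves a budget of a little over five minutes of downtime per year.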

So using these four tenets, Aerospike is really geared up to support what we call the right-now economy, or your real-time applications. And with any feature that we build in Aerospike, we make sure that we do not compromise on any of these four key tenets.

So let's look at Aerospike in a little more detail as a platform in totality, just to give a quick overview for the benefit of users who have not looked at Aerospike earlier. So I spoke about the four key tenets of the Aerospike platform, which is nothing but the Aerospike data store that you see right in the beginning. Over a period of time, we have built a very rich connector ecosystem around Aerospike. So if you look at the ones on the left, those are connectors that actually help you get data in and out of Aerospike. So we have enterprise-supported connectors for Kafka, Pulsar, and JMS. And ESP is our outbound connector, where you can capture changes on the Aerospike records and push them out to HTTP-compliant endpoints.

If you look at the right-hand side, these are predominantly connectors that focus on accessing data that is already inside Aerospike. So we have connectors with the Spark ecosystem, where your Spark-based applications can read and write data in a highly parallelized manner, which in turn helps you access data in real time or write data in real time into Aerospike, and gives you the benefit of keeping your Spark cluster sizes fairly small, because you have real-time, sub-millisecond access to data that is stored inside Aerospike.

Next comes our Aerospike for SQL connector. This is a connector with Presto and Trino, the open source versions, and the Starburst world. This connector basically helps you access data within Aerospike through SQL-based systems. We have our SDKs for popular programming languages, like you see right on top, and we support various storage mediums in terms of where you can store your data.

So that's, at a high level, the four key tenets, which Aerospike is really good at supporting, and a quick overview of what the Aerospike platform is. Let's not spend more time on this and quickly move into the document-specific aspects of Aerospike.

So let's start looking at the fundamental aspects that have actually made Aerospike evolve from being a key-value store to a document store, to be a true multi-model data store out there to support your next-gen real-time applications. So let's look at the first part of it, in terms of how we actually store the JSON documents within Aerospike. Aerospike supports what are called collection data types. And if we specifically look at the kind of data structures we support under this, those are nothing but lists and maps. Now, these are essential to ingest, store, and serve your document data. And we'll look in a moment at how I can support that claim that I'm making. These collection data types enable aggregation of related objects in a single record. And this also gives you transactional semantics across multiple object updates that are present within a collection data type inside a record. So let's delve a little deeper into this.

So what are these collection data types? Like I mentioned, we support two kinds of data structures out here. One is the list and the other is the map. So you'll see two representations of that. The one in blue is a list, and an example of a map is seen out here. The top-level structure has to be either a list or a map in a collection data type. The map keys can be data types like string, integer, double, or bytes. And the good part is, you can actually nest to any level you would like below. Okay, this is optional, it's not mandatory. Depending on how you are representing your data, you can have a map within a map, or you can have a map at the top level and a list within that. And the same holds true when you have a list at the top level. Inside a list, you can have a map. So depending on how you represent your data, you can decide on the collection data type nesting hierarchy. So this is one of the data types we support within Aerospike, and this is the fundamental aspect of how we actually store JSON documents within Aerospike.

So if you look at the next slide, this is a typical JSON document, JavaScript Object Notation, if I expand the short form out there for you. Now if you look at the similarity between what I showed you in the previous slide and how data within a JSON document is organized, you will see striking similarities with the collection data types. So you begin with the map, which is those two curly brackets that you see on the outer side of a JSON document. And within that, it can be a composition of multiple maps, or if you're encountering arrays, those are actually translated and stored as lists inside a map within Aerospike.

Okay, so when you're storing a JSON document within Aerospike, the top-level structure is always going to be a map. Map keys in the case of a JSON document are going to be of the string data type. So this is the fundamental way in which JSON documents are stored within Aerospike.
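The mapping described above can be sketched with nothing but Python's standard `json` module. This is only an illustration of the shape of the data, not Aerospike client code: the sample document is made up, and `shape` is a hypothetical helper that labels each JSON object as a map and each JSON array as a list.

```python
import json

# Hypothetical sample document; not the one shown on the webinar slides.
doc_text = """
{"store": {"book": [{"title": "Book One", "author": "Author One"},
                    {"title": "Book Two", "author": "Author Two"}]}}
"""

doc = json.loads(doc_text)


def shape(value):
    """Describe a parsed JSON value in terms of maps, lists, and scalars."""
    if isinstance(value, dict):      # JSON object -> map
        return {"map": {key: shape(item) for key, item in value.items()}}
    if isinstance(value, list):      # JSON array -> list
        return {"list": [shape(item) for item in value]}
    return type(value).__name__      # scalar leaf, e.g. "str" or "int"


print(json.dumps(shape(doc), indent=2))
```

The top level comes out as a map with string keys, and the `book` array comes out as a list nested inside a map, which is the nesting pattern the talk describes.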

Now, let's dig a little further into this. So we've looked at the storage aspect of how JSON documents are stored within Aerospike. Now, recently, what we have done is we have introduced the Aerospike document API, predominantly with our Java-based client or the SDK or the driver, different terminologies to represent that. Very soon, we are going to have this capability rolled out to our other programming language clients out there.

So through the document API capability, we give you extensive querying ability into your JSON documents that are stored within an Aerospike record. So we leveraged the feature of JSONPath, which is very, very similar to what was out there as XPath for XML, you remember? So you had these large XML documents, and XPath was an easy way to navigate and traverse through your XML documents and land on a specific data element of interest which you would like to read and write.

On the same principles, JSONPath has evolved. And this is something which we have adopted in our document API. This uses the JSON syntax. It basically enables you to traverse through your JSON document through JSONPath queries. It enables not only individual documents to be queried: inside a document, you can do sub-document operations and just read or update specific aspects of the document. So your document could be a large one. Let's assume you're storing a 10-kilobyte document in a record of Aerospike. But as part of your [inaudible 00:11:43] operations that you perform on this document, because of the access pattern that is demanded by your application, you don't need to access that entire 10 kilobytes at one point. So using JSONPath, the document API gives you the ability to traverse to specific aspects within the JSON document and then either read that specific aspect or update that specific element within the JSON document. So with that, you're not moving large documents across the network, just those specific aspects.

So let's look at some examples of how JSONPath queries can be used to traverse and navigate across your JSON documents. If you look at the first example, which is basically a read-based example, on the right-hand side what you see is a sample JSON document. It represents books inside a bookstore. And if you look at the first three lines of code that you see over here, you are defining the JSONPath. There you're saying dollar dot store ($.store). So you start off with the root element, store, on top, then navigate down to the book array that you see out there. And then you specifically want to read the authors within the book maps that you see over here. So you see, our collection data types, the maps and lists, really gel well with the JSON document structure.

And then using the document API, the method that you see below, document client dot get, is very similar to your native get and put calls of a key-value store. You pass in the primary key of the record. You pass in the name of the bin, since you can have multiple bins in a record of Aerospike. And you pass in the JSONPath. And this will help you read just those specific authors for every book that is stored in the JSON document.
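To make the traversal concrete, here is a minimal, self-contained sketch of how a path such as `$.store.book[*].author` walks a nested map-and-list structure. It is not the Aerospike document API and covers only a tiny subset of JSONPath (dotted keys, numeric indexes, and `[*]`); `parse_path`, `read_path`, and the sample document are all made up for illustration.

```python
import re

# One path step: ".name", "[3]", or "[*]".
TOKEN = re.compile(r"\.([A-Za-z_]\w*)|\[(\d+)\]|\[\*\]")


def parse_path(path):
    """Split a simplified JSONPath string into (kind, argument) steps."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at position {pos}: {path!r}")
        name, index = match.groups()
        if name is not None:
            steps.append(("key", name))
        elif index is not None:
            steps.append(("index", int(index)))
        else:
            steps.append(("wildcard", None))
        pos = match.end()
    return steps


def read_path(document, path):
    """Return a list of every value the path selects."""
    current = [document]
    for kind, arg in parse_path(path):
        selected = []
        for node in current:
            if kind == "wildcard":
                # Fan out over list items (or map values).
                selected.extend(node.values() if isinstance(node, dict) else node)
            else:
                selected.append(node[arg])
        current = selected
    return current


# Hypothetical bookstore document.
doc = {"store": {"book": [
    {"title": "Book One", "author": "Author One"},
    {"title": "Book Two", "author": "Author Two"},
]}}

authors = read_path(doc, "$.store.book[*].author")
print(authors)
```

Only the selected author values come back, not the whole document, which mirrors the sub-document read described above.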

Similarly, if you look at the second code example out there, this is to update a specific part of a JSON document. And in this example, it is the author. So you would like to update the author to J.K. Rowling for all the books that are out there in the store, a hypothetical example. So you leverage the document API's put method, again passing the standard parameters like the primary key, the name of the bin where the JSON document is stored, the JSONPath where your specific attribute inside the JSON document resides, and the value with which you want to update it. So it's very simple, and it is very intuitive to navigate and do sub-document operations through the document API.
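The update case can be sketched the same way: follow the path to the parent of each target, then overwrite only that one field. Again, this is an in-memory illustration under stated assumptions, not the document API's put method; `path_steps`, `write_path`, and the sample data are hypothetical, and a wildcard is not supported as the final step.

```python
import re

STEP = re.compile(r"\.(\w+)|\[(\d+|\*)\]")


def path_steps(path):
    """Parse a simplified JSONPath into keys (str), indexes (int), or '*'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    steps, pos = [], 1
    while pos < len(path):
        match = STEP.match(path, pos)
        if match is None:
            raise ValueError(f"unsupported syntax in {path!r}")
        key, index = match.groups()
        if key is not None:
            steps.append(key)
        else:
            steps.append("*" if index == "*" else int(index))
        pos = match.end()
    return steps


def write_path(document, path, value):
    """Set the field named by the last step on every matching parent.

    Returns the number of parents that were updated.
    """
    *parents, last = path_steps(path)
    if last == "*":
        raise ValueError("wildcard as the final step is not supported here")
    targets = [document]
    for step in parents:
        selected = []
        for node in targets:
            if step == "*":
                selected.extend(node.values() if isinstance(node, dict) else node)
            else:
                selected.append(node[step])
        targets = selected
    for node in targets:
        node[last] = value
    return len(targets)


# Hypothetical bookstore document.
doc = {"store": {"book": [
    {"title": "Book One", "author": "Author One"},
    {"title": "Book Two", "author": "Author Two"},
]}}

updated = write_path(doc, "$.store.book[*].author", "J.K. Rowling")
print(updated, [book["author"] for book in doc["store"]["book"]])
```

Only the `author` fields change; the titles and the rest of the document are left untouched.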

So we've looked at aspects of storage, how collection data types which are already present in Aerospike form the fundamental basis to store JSON documents. Then we have looked at how our document API leverages the concept of JSONPath that helps you leverage the JSON-like syntax to navigate across your JSON documents to do sub-document read and write operations.

Now, let's look at how you can actually build nested indexes and leverage our expression filters to filter and read specific aspects of your JSON-based documents across multiple documents, at scale, as we call it in Aerospike terminology.

So let's look at a few examples out here. The one on the right that you see is a JSON document. It represents all the Nobel laureates who have won Nobel Prizes for the subject of chemistry. And this is for the year 2021.

So what you see over here, the first part is you are creating an index on a specific attribute inside the JSON document. So in this case, you're trying to create an index on the category field. If you look at the category, it is nothing but the subject for which they have received the Nobel Prize. So you're creating an index on that, and it is a good candidate for an index. So if you want to read any data where you say, "Get me all the Nobel laureates for the subject chemistry," then this index comes into play and helps you retrieve that specific document. So the indexes are built across multiple JSON documents that are stored across multiple records of Aerospike. So that gives you a fast way to access the document using these nested indexes.

So right now we have built this index at the category level. Similarly, you can go down and, let's say, build the index at the first name level or the surname level. So for any attribute inside your JSON document, however nested it might be, the nested indexes feature of Aerospike helps you build an index on it and basically run filter-based queries across multiple JSON documents, which will help you fetch documents of interest based on this secondary index.
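As a conceptual sketch of what an index on a nested attribute does, the code below builds an in-memory lookup from a value found at some path inside each document to the keys of the records that contain it. This is not how Aerospike implements secondary indexes and it does not use the Aerospike client; the record keys, names, and helper functions are placeholders invented for illustration.

```python
from collections import defaultdict


def values_at(node, path):
    """Yield every value reached by following path; '*' fans out over a list."""
    if not path:
        yield node
        return
    head, rest = path[0], path[1:]
    if head == "*":
        if isinstance(node, list):
            for item in node:
                yield from values_at(item, rest)
    elif isinstance(node, dict) and head in node:
        yield from values_at(node[head], rest)


def build_index(records, path):
    """Map each value found at path to the set of record keys holding it."""
    index = defaultdict(set)
    for primary_key, document in records.items():
        for value in values_at(document, path):
            index[value].add(primary_key)
    return dict(index)


# Placeholder records: one JSON document per record, keyed by primary key.
records = {
    "prize:2021:chemistry": {"year": 2021, "category": "chemistry", "laureates": [
        {"firstname": "Ada", "surname": "Example"},
        {"firstname": "Ben", "surname": "Sample"}]},
    "prize:2021:physics": {"year": 2021, "category": "physics", "laureates": [
        {"firstname": "Cy", "surname": "Placeholder"}]},
    "prize:2020:chemistry": {"year": 2020, "category": "chemistry", "laureates": [
        {"firstname": "Di", "surname": "Example"}]},
}

category_index = build_index(records, ["category"])
surname_index = build_index(records, ["laureates", "*", "surname"])

print(sorted(category_index["chemistry"]))
print(sorted(surname_index["Example"]))
```

A lookup such as `category_index["chemistry"]` goes straight to the matching record keys without scanning every document, and the same helper works for a more deeply nested attribute like the surname.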

So in the next example, we have created another index. This time it is on the year field. So you're saying, "Get me all the Nobel laureates for a specific year," and then you can use our filter expressions to further filter down, let's say, on some of the other criteria inside the JSON document.

So the fundamental aspect over here is the context part that you see in the second last line. Context basically helps you navigate across the collection data types within Aerospike and then reach a specific aspect and then update it, read it, and do things like that.
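The two-step pattern described above, an index lookup on the year followed by a filter on other criteria inside each candidate document, can be sketched in plain Python. This only illustrates the idea; it is not Aerospike's filter expression or context API, and the records, the `query_by_year` helper, and the predicate are assumptions made for the example.

```python
# Placeholder records: one JSON document per record, keyed by primary key.
records = {
    "prize:2021:chemistry": {"year": 2021, "category": "chemistry", "laureates": [
        {"firstname": "Ada", "surname": "Example"},
        {"firstname": "Ben", "surname": "Sample"}]},
    "prize:2021:physics": {"year": 2021, "category": "physics", "laureates": [
        {"firstname": "Cy", "surname": "Placeholder"}]},
    "prize:2020:chemistry": {"year": 2020, "category": "chemistry", "laureates": [
        {"firstname": "Di", "surname": "Example"}]},
}

# Step 1: an index on the "year" attribute (value -> set of record keys).
year_index = {}
for primary_key, document in records.items():
    year_index.setdefault(document["year"], set()).add(primary_key)


def query_by_year(year, predicate=lambda document: True):
    """Use the year index to find candidates, then keep those passing the filter."""
    candidates = year_index.get(year, set())
    return sorted(pk for pk in candidates if predicate(records[pk]))


# Step 2: narrow the 2021 candidates to prizes shared by more than one laureate.
shared_2021 = query_by_year(2021, lambda document: len(document["laureates"]) > 1)
print(shared_2021)
```

The index keeps the candidate set small, and the filter is evaluated only against those candidates rather than against every record.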

So we've looked at three aspects out there. The first one is how JSON documents are stored within Aerospike, how do you access them through the document API with the help of JSONPath-based queries. And next we looked at how you can build nested indexes within these collection data types, which internally translate into indexes into specific JSON documents, which can help you query the specific documents of interest that match your access pattern within the application out there.

So when we marry this with the original capabilities of Aerospike that you saw, the four key tenets, you get a document store that supports workloads requiring low latency in the sub-millisecond range by persisting your data on flash drives, giving you that predictable performance at whatever scale you're looking at, from a few hundred operations to a few million operations, storing a few gigabytes of data to petabytes of data within Aerospike. We have customers across this whole spectrum. And we make sure that your database is always on, with five nines (99.999%) and above in uptime.

And we do this at a very, very attractive proposition in terms of TCO. When we compare it with other players out there in the market, it can require as much as 80% less infrastructure to support such workloads. So with these capabilities, you can ideally use Aerospike as a document store for your real-time applications, to support your real-time, right-now problems where you want to access data that is stored as JSON documents.

Let's delve a little further into how Aerospike actually supports this. We make the claim that you need a server footprint which is 80% less than what others in the market require.

The fundamental reason for that is that Aerospike is a flash- or SSD-optimized data store; I'll use these terms interchangeably. We have around 15 patents in the space of how we actually store your records on the flash or SSD drives. Only the primary indexes are stored in memory when you use Aerospike in our hybrid memory architecture. So the majority of your data actually resides on the SSD or flash drives, and with the kind of capabilities we have built around it, we are able to give you that sub-millisecond performance across the 99th percentile of your requests when your data is stored on SSD drives.

And as you're aware, you can have much denser nodes. You can pack much more storage in the form of SSDs or flash drives per server in comparison to what you can do with RAM on that server. With RAM, you can start off as low as four gigabytes, and up to 512 gigabytes is a very common server configuration that you come across.

What if your requirement is more than that? Anything more than 512 gigabytes means those very specialized instances, and they come at significantly higher costs. But if you flip it and look at the amount of flash storage that you can have on servers, it can range from 1.2-terabyte SSD drives at the minimum to flash drives as high as 15 terabytes, and we can support multiple drives across a node. So you can have much denser nodes, with more data packed on the same server, giving you all these four aspects of the performance. And that's how we are able to drive a significantly lower TCO for running your Aerospike clusters.
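The density argument can be illustrated with back-of-the-envelope arithmetic. The numbers below are assumptions chosen only for illustration: a hypothetical 60 TB dataset in decimal units, 512 GB of RAM per node for an all-in-memory store, and two 15 TB flash drives per node. The sketch ignores replication, the RAM needed for primary indexes, headroom, and throughput, so it does not reproduce the 80% figure quoted in the talk.

```python
import math


def nodes_needed(dataset_gb: float, capacity_per_node_gb: float) -> int:
    """Smallest number of nodes whose combined capacity holds the dataset."""
    return math.ceil(dataset_gb / capacity_per_node_gb)


DATASET_GB = 60_000               # hypothetical 60 TB dataset
RAM_PER_NODE_GB = 512             # the common RAM ceiling mentioned in the talk
FLASH_PER_NODE_GB = 2 * 15_000    # assumed: two 15 TB drives per node

ram_nodes = nodes_needed(DATASET_GB, RAM_PER_NODE_GB)
flash_nodes = nodes_needed(DATASET_GB, FLASH_PER_NODE_GB)

print(f"RAM-only nodes: {ram_nodes}, flash-backed nodes: {flash_nodes}")
```

Real sizing depends on many more factors, but the sketch shows why per-node capacity dominates the node count when the data itself lives on flash rather than in RAM.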

So quickly to summarize the entire presentation, the webinar. We first looked at these four key tenets of Aerospike. We looked at how Aerospike is really good at supporting this. We did a quick overview of Aerospike as a platform in totality. We looked at storage, how JSON documents are stored within Aerospike, how do you query Aerospike JSON documents through the document API leveraging JSONPath, how you can build nested indexes inside our collection types, which basically help you build indexes inside specific aspects of JSON documents. And then finally, we looked at you can achieve all of this with a significantly lower TCO out there in terms of the hardware footprint that is required to run the Aerospike cluster. And the fundamental reason for that is we are a flash optimized data store, which is persistent by nature, but gives you in-memory or RAM-like performance out there.

So I ask all the webinar participants, if you have any questions, I see a few of them already that have popped out there on the questions box. So we have another five minutes left in the webinar. So I'm going to switch over to do a Q&A very soon. Here's my email ID. So if you have any questions even after the webinar, once you have the recording available and something comes up in your mind, feel free to reach out and post your questions out there and we will be more than happy to help you out there.

Stay tuned for more such short [inaudible 00:23:47] webinars that are going to come out from Aerospike, month on month basis out there. So post this, I'm going to start looking at the Q&A.

About this webinar

Learn how to future-proof your infrastructure for JSON Document Data at Scale.

In this session, Aerospike Solutions Architect, Vikranth Dharamshi, will talk through the top 5 things to know about Aerospike Document Database:

1. Document Database for Extreme Throughputs & Low Latency
2. Document Database Storage
3. Document Database Query using JSONPath
4. Document Indexing and Aerospike Expressions
5. Document Database at Lowest TCO

Watch this session of Aerospike Tech Tips to learn the top 5 things you should know about the Aerospike Document Database to future-proof your infrastructure, such as querying using JSONPath, document indexing, and lowering your TCO.

Speaker

Vikranth Dharamshi
Solutions Architect