The 5 best things about the Aerospike client you may not know
Please accept marketing cookies to view this content.
Tim Faulkes:
Thank you everyone for joining today. So I've been working with Aerospike and the Aerospike client in particular for probably close to eight years now, and it's got a lot of power. Like everything at the Aerospike, there's a lot of power in the client and many people aren't necessarily aware of them. So, I've come together and put a list of things that I think are probably some of the hidden gems inside the client. Some people know about them, other people don't, and some of these are things that people don't necessarily do right.
So what I want to do is take you through some of the good things that are in the client that you may not know. There's a lot of good things in the client that I'm not going to cover because they're very common, but these are some things that I either find people just don't know about or they don't necessarily use in the right manner. So, let's go through and talk about them.
Now, in this presentation, I'm going to be using Java as my examples, but the concepts will apply to almost all of the clients. So, the syntax might be slightly different, but if you like something and you think, "Oh wow, that compression stuff looks really interesting, I really want to know how to use it, but I'm using Go," for example, or C#, the same facilities will exist inside those clients as well.
So, the first one I want to talk about is compression. What I very commonly see is that people want to minimize the amount of network traffic between the client and the database and the amount of data stored in the database. Now, this is a very common pattern I see where they say, "I've got my object," and it might be a Java object, it might be a C# object or a Python object or whatever. And the first thing they need to do is they want to serialize it. They want to turn it into a binary representation. So if they're using Java, they might use a binary format or binary serializer to turn it into a byte array. They might use something like Protobuf, and Protobuf is a bit more language agnostic. Once they've got it as a byte array, they're going to go and compress it using Gzip or some other form of compression algorithm, and then they're going to store it as an object in Aerospike.
So Aerospike sees it as a binary data. Now they've solved their goal, they've compressed the data, they've got a smaller network packet from the client to the server, and they've got a smaller amount of data being stored on the server. So, really from that perspective, it's very successful. There are some big drawbacks of this, and the more features we release on the server, the more that we have these sort of issues. So, let's look at some drawbacks of this pattern.
For a start. If you're not using Protobuf, you're using a inbuilt language serializer like Java, it's really easy to take an object and serialize it into a binary representation. The problem is that it is then language specific. The information in your database can only ever be read by the thing that you serialized it with, Java in this case. Now, if you're using native Aerospike types inside the database, we store them in a language agnostic format so that you can write them in Java and read them in Python and so on.
You're not tied to that one language. The data in the database is able to be read by any of the client languages. If you're doing the serialization, particularly with something that isn't through a Protobuf or something like that, then you're very, very tied to that serialization. But the representation of a large object just stored as bytes means that Aerospike has no knowledge of the contents of that. So you can't do things like secondary indexes. You can't say, "Hey, there's a field in this bin that I really want to know about," and be able to index across the application. People sometimes work around that and the information they want to store in the bin, or sorry, they want the secondary index on, they put in a separate bin. So, they end up with a set of indexing bins and then one big data bin, and that's sort of a work round of the problem, but it is just a work round.
Another thing that you can't do is atomic server operations. You've got this huge binary data. Sometimes they're big, I've seen them in many megabytes. You've got this binary representation of your data. If you want to change one single item in it, you have to suck it back down to the client, de-serialize it, work out which part you need to change, change that, re-serialize it, and then send the entire thing back. If you're doing it properly, you probably have to do a check and set pattern. You've read it, you've modified it and you've written it back, but you want to make sure no one else has modified it since you last read it, so you lose that atomicity of those transactions. That's a big drawback.
It also means that one of the whole points of us doing this was to minimize that network traffic. You've actually counteracted that because you're sending these big objects forwards and back. Now for something like a simple cache that might work, but for many of the other use cases, you're actually making things worse for your infrastructure.
We also have things like container data types. I want to manipulate an object. I want to say, "Hey, I've got a list of addresses and I want to update the ZIP code on the third address." The container data types or CDTs inside Aerospike can do that very easily. It's an atomic operation. You just send a very small network packet to the server, it changes the data and just returns the result. It's minimal traffic and it's very, very fast. You avoid all of those, you can't use any of those if you're using this binary serialization.
Similarly, expressions. We have had some form of filtering expressions for a while. They used to be called predicate expressions and we came out with a much more powerful version called expressions. They don't work with these large binary objects. Yes, you can manipulate bits, but typically you don't know where the information that you want to manipulate in these binary-formatted objects are, so they don't work. And XDR, there's a couple of features in XDR which are very useful. You can say, "I don't want this data to be transmitted to remote destination." For example, if your privacy rules GDPR, CCPA, and so on, you could put a destination or a source in there and say, "If this information originated in the UK, it must stay in the UK. If it didn't, then I can transmit it to the rest of the world. Even if I've changed in the UK." If I don't have that origination field in there, XDR can't work that out. So, you're actually handicapping some of the main capabilities of the product by doing this.
Another drawback of this approach using XDR is if I've got two data centers and I'm modifying the same record at the same time on both ends, if I use it as an object now that Aerospike understands, so it's bins, I can say to Aerospike, "I want you to do bin-level shipping. I want you to not only ship the things that have changed on a per column or per bin basis." If it's one big object, then I lose the capability. So, if I've got two data centers and one data center changes my name and another changes my address, for example, they're non-conflicting changes. XDR and bin-level convergence can make sure they work very nicely with no conflicts. Using this serialized approach can't.
So what's a better approach? Well, Aerospike natively supports compression. Now, we have server-side compression and this is fairly well known. There's different algorithms you can put on with a server-side compression. What I typically find is that many people don't know that we have client-side compression too. On a per-core basis, I can specify I want the data to be compressed on the way to the server and the way back from the server and it's somewhat intelligent. We use a Zlib compression and if the data is greater than 128 bytes, it will automatically be compressed.
Now, I was working with one very large customer recently and they were bringing back large batches of information. These batches could be up to a megabyte in size and they handset compress. So once we turned that on, they found that yes, it cost them a bit of CPU, but the amount of data they were transmitting over the network dropped dramatically and it greatly improved performance and it's just a simple one-line change to them.
So, it's a really useful feature if you have large objects and you're either reading or writing large objects or you're doing large batches. Simply turn on compression and the clients all support this, it's a one-liner to turn it on. If it is only a small amount of data, it's less than 128 bytes, there's no cost to it. So, it's one of those things you might actually want to think about turning on if you've got a large amount of data on your client and server almost in your default policies, and that will work nicely.
The next one I want to talk about is info calls, and this is a little subjective, but I'm sure if you're an Aerospike developer, you've used tools like ASADM and AQL, things that allow you to inspect data. I can say, "Show me all the data in a set," or, "Show me not so much the data, but how many records are there in the set? When was it last truncated?" Information about metadata stored in your repository. Now, these tools are actually just using the standard client library, but what they're doing is they're using what we call info calls.
Info calls allow you to not just get and set data in the database like the standard library calls do, but they allow you to look at the metrics of the system, manipulate the configuration if you've got access permissions to do so, and retrieve a lot of different sorts of information. If you go to our reference docs, there's a lot of information on them, both on the information side or the info calls. These are the commands that you can go and invoke that will show you information or set information and commonly these are used through the command line with something called asinfo. You can do an asinfo minus V. The other thing that it will show you is some of the metrics you can get, metrics from the server and say, "Look at almost any of the metrics we've got," and there's a lot of them, there's several hundred of those metrics, easy to be read inside your program.
So let's have a look at example. This is from the info link. So, I went onto the reference site and I said, "I want to look at the info commands." One of those is sets and you can see that it comes out with a good description of it and it says that I can do sets by itself, I can use sets and pass a namespace, or I can pass it a namespace and a set. Now, if I want to test this from the command line, then I can do something like asinfo minus V, single quote sets slash test slash demo. So my namespace is test, my set is demo. And this will give me a bunch of information about that set. It will tell me when was it last truncated? In this case I haven't truncated the set so it's got a last truncate time of zero. It tells me whether there's secondary indexes. It says how much memory and data is consuming, how many tombstones, a whole bunch of information.
So, I did this through the command line. What if I want to write a program to use this? It's very simple. So this is a function that takes my Aerospike client, my namespace and my set name and all I want to do is go through and find out its last truncate time. Now, when you do this, you have a cluster, typically. You're not just going against a single node. If you have data that will change between the nodes, you'll get different values for each of these nodes. In my case, because the truncate time of a set is stored in what we call the system metadata, it's shared between all the nodes in the cluster, so they should all agree that they all have the same time.
So, the first thing I do is when I want to go and get my info, I go and get all my nodes. So I call client.getnodes and this gives me an array of all the nodes currently visible in the cluster. I parse my info call, so it's very similar to what we saw on the previous screen. I have my sets and then my namespace name and then my set name. And then for each one of those nodes, I'm going to invoke a request. So, I've got an info.request, and this is one of the reasons that many people don't necessarily know about the info calls. You don't invoke it on the client, so you might notice normally you do a client.get, client.put and so on. info commands have their own object. They're invoking those requests on. So I'll do my info request, parsing the node that I got out of my array and the info parameter and this will give me something like this. It's very similar to what we saw on the command line earlier.
I'm then going to try and manipulate that. I want to split it up so my colon becomes my delimiter and I get a set of strings. So that's what my metrics are. I then iterate through these metrics and look for my last update time. So, at this point, it's just basically string parsing. I've got my information from a node in the cluster and I'm parsing it and when I find my last update time, then I'll get that as a date.
I do want to make sure that now ... because this is slightly overkill, that I want to make sure that all the nodes in the cluster agree on the last update time. So, I will go through each call and do this. If I didn't want quite that level of rigor to it, I could actually have just found this on one node, know that it's a system metadata call and just had it doing one node. Got the result from one node and returned it. Some of those things you can do it on one node, some you actually need to poll every node in the cluster to make sure that they all agree, or if you're doing cumulative metrics or something like that, they may all have different values. So, you can see it's quite easy for me to code something that goes and extracts information, not data, but information about the cluster and how it's working.
The next one I want to mention is client-side logging. Now, this is something that is fairly useful but many people don't necessarily know about. Aerospike's client is designed to be very, very lightweight. So, for that reason, we don't integrate with standard logging frameworks in Java. Things like Log4J or the SLF4J, the standard logging protocols. Your application will enforce that. Aerospike's client shouldn't. We want to integrate with your framework.
So, the client itself has inbuilt calls. It's got the different log levels of things like debug and info and error, as you'd expect. If I want to use this, the first thing I need to do is tell it where to log the data to. So you can see that my first call is log.setcallbackstandard. And this will effectively, when something interesting happens to the client, it will log it and it will log it to stand it out. It will put it in this format where I have the date and time of that and the log level ahead of it, and then what happens?
So, these messages came out when I seeded my cluster. So I've got my Aerospike client that's pointing to my Docker container at this address and when I set my log enable I got these messages on the console. So you can see it's given me more information about what's happening, how is my cluster responding? I have a node, I saw that I've only got my one server node. If I had multiple nodes in the cluster, I can see more information. It allows you that level of control. I can also put my own debugging messages in.
Now, some people I see will have standard application level logging facilities like Log4J. So, you'll probably use your Log4J for that, but if I want, I can use this standard facility so I can do all the normal sort of things you do with logging protocols. I can check to see what my log level is set to, test for flags and then write information. And that will appear in the standard format that I've specified here.
If I want to integrate with another standard framework, instead of calling the log set callback standard that I did originally, what I'm going to do is change this. I'm going to say log.setcallback and give it ... this is a callback that happens and it takes a function which has the level as an argument, the debug level, or info level or whatever, as well as the message, the log message and I can call whatever I like. So, my standard is Log4J2. I could call my logger inside this callback and have the Aerospike log integrated with your standard application log.
So it's a very powerful feature, easy to use, but again, because it's not built into the client object itself, it's a separate object. Many people don't necessarily know that you can do this and if you're trying to debug service ... or sorry, client side problems in particular, having this level of extra detail is very useful.
The next one I want to talk about is what happens on failure? What happens if a nod times out, or something like that? And some people are quite surprised about how it behaves.
So, let's say I've got my application, my application's running on a client node and I have three nodes in my cluster. Now, let's say I'm doing a read. Reads are easy, they can be retried. If something happens to the node I'm reading from, then I want to try again. So, let's say I've got my three node cluster, A, B and C. My master is on node A, my replica is on node C. By default, the reads have retries. There's a number of retries per read. In fact, the number of retries by default is two, and bear in mind that's the number of retries, it doesn't count the original message. So all in all, I will have three attempts going to my server.
Now, what used to happen a long time ago was when you sent a message to the server, it said ... Let's say that message timed out, I had an aggressive timeout policy and ... or the server was running slow, or something like that and I didn't get a response back inside my timeout. We would then retry on the same node and try over and over again until we got a response. That's honestly, not very smart. And so, a number of years ago we changed this so the retries occurred on sequence.
What sequence means is I'm going to try my master on my first attempt. So, first attempt goes to the master and if I time out on that, I'm going to go and go to the replica and read the data from the replica. Why? Why might this be a good idea?
Well, one of the most common causes of the master not responding is the master has fallen over. So, let's say I've got a read-heavy use case, my reads are super critical. So, maybe I've got 50 milliseconds to say, "Yes, I want to read this piece of data in that time." That's my SLA.
If I go to the master, if it doesn't respond because it's fallen over and ... the cluster will adjust, it takes about a second and a half of that cluster to adjust. In that second and a half, the cluster isn't necessarily aware this master has fallen over. It might just have blown a power supply or something and the cluster hasn't worked that out yet.
So, what I can do is I can set an aggressive timeout on my client. I can say, "This request, I want you to time out in 20 milliseconds," for example. What will happen is we've put the message on the socket, there's nothing listening there because the server blew its power supply. 20 milliseconds letter will get something back saying, "Timeout."
There's no point going back to the server. We haven't fixed the power supply at that time, so we're going to go to the replica. So automatically Aerospike will set the sequence mode, so the replica mode will be sequence, which will say go to the replica. This replica hasn't blown the power supply and so it has a copy of the data and we can read that in a very aggressive manner. So, within those 50 milliseconds we can do multiple round trip network calls to different servers and get the data.
This works really well in almost all cases. There are some cases, particularly in strong consistency mode, where you want to guarantee linearized reads. You want to make sure that under no circumstances can you get a stale piece of data. And in that case, we can't use sequence. The problem with that is it might just be the master is busy, it might've taken a write, got the latest copy of the data and linearized guarantees that we don't ever have a stale copy of the data, so we have to go and read it from the master. So, that's the one drawback on this sequence, is you can't guarantee linearized reads.
But what about writes? So, reads it makes sense. I can retry those reads over and over. What about writes? Is this useful on writes and if so, should we go to the master or we go to the replica? Well, writes, you have to be very careful of. If you're doing a right and it's not an idempotent transaction, you can't just retry it.
What does idempotent mean? Well, idempotent means I can do the same operation multiple times with no side effects. So, let's say I'm going to set Tim's name, I'm going to change my middle name and I'm setting it to Bob. If my first transaction failed, I can retry it. I'm setting it to the same value so I can just retry that safely. It's entirely safe for me to retry that. If the first one succeeded but I didn't get a notification as they had succeeded, I can retry it. There's no side effects.
Compare that, for example, with changing someone's account balance. I'm taking $100 out of someone's account. If my first transaction succeeded but I retry it, then I've taken $200 out of their account. That's not acceptable. That's an example of a non-idempotent transaction. So, never retry a non-idempotent transaction and there's good strategies for working out how to address these problems, but if you've got an idempotent transaction, something that we can retry, then what we can do is we can set sequence mode on that.
Now, with Aerospike to do a write, you must go to the master. So, why would it make sense for the retry to come to the replica? Why won't we just get back to the master on the retry? Well, again, the most common cause of the master timing out is the master has fallen over, a power supply blew, or something catastrophic happened.
In that case, the cluster is going to start adjusting. It's going to take that second half or the heartbeats we're sending back and forth between the nodes, will take about a second half to say that master has died. Once the cluster has worked that out, the replica will get promoted to be the master. So, if I'm retrying against the master, it's probably going to fail, but if I'm retrying against replica, there's a possibility that replica has been promoted to be the master and therefore that retry went to the right spot. Even if it hasn't, even if the master was just slow and it's still there, when I go right to the replica, it's going to try and talk to the master and say, "Hey, I'm the wrong node for this write. You have to handle this write." And so, it will proxy that request onto the master and that means that the cluster will behave correctly but gives you this high probability of getting a faster write on the probability of that replica being promoted to be a master when the cluster readjusts.
So, that's what happens in a standard cluster. But what about if we've got a distributed cluster? So, let's say I've got three zones. I've got nine nodes in my zones, and let's say I've got three copies of my data. So, I've got a full copy on each zone. Is sequence still useful here?
Well, sequence is, but typically your application will run inside a single rack. In this case my application is running in zone two. I've got a full copy of the data in zone two, my master's on zone one. If I use sequence, it's going to go and head to the master on zone one. That means it's going to be slower. There's typically a performance penalty if I'm going between zones in one of the cloud providers and sometimes there's even a cost penalty. They charge money for inter-AZ traffic and so there's a cost penalty of going between zones.
If we're in a networking situation where we have this sort of topology, what we'll typically do is we'll set up what we call rack-aware reads. I have a full copy of the data on my local cluster. I can go and read for my local cluster. So, instead of using sequence, we're going to use prefer rack and that's [inaudible 00:25:28]. When I start up my application, I'm going to set up my client policy and I'm going to say, "I'm in zone two, I'm running on rack two." And I'm going to say, "I am rack aware." I want to know that the cluster has been set up into multiple racks and when I go and read, then I tell it that I'm allowing you to read from the replica, it'll go and read from zone two first.
Now, what happens if the read fails? What happens if node F has fallen over? Well, if prefer rack fails for some reason, it falls back to sequence mode and so Aerospike on the retry will try rack one. And if that fails, it will then go and try rack three. So, it's giving you the best of both worlds sequence makes sense in many cases. If you're doing this rack awareness, then sometimes prefer rack makes a bit more sense.
All right, the last topic I want to cover, and it's a fairly big one, is about policies. And almost everyone who uses Aerospike has heard of policies. Every API call that the client takes has a policy as the first member, but the number of times I've spoken to people and they don't understand them, means that I want want to spend some time covering them. I want to talk about how they work and how you should be using them.
Now, inside the Aerospike policy at the moment, there's a few different parts of information. Some of these are used for data modeling, things like the generation counter, so that check and set I mentioned earlier where you're doing a read, modify, write cycle back to the database, you have to use the generation counter. Doing things like durable deletes, I'm doing strong consistency. I want to make sure that records I've deleted never come back. I should use durable deletes.
This first set, I find are pretty well understood. They have very well-defined situations when you would use them and when you'd not use them and people tend to understand them. The other major part of a policy is the timeout. What happens when bad things happen? And these are the parts that I find are very, very badly understood and so, I want to talk about them into context of these timeouts and look at the settings that make sense on those timeouts.
So, policies are quite complex. There's a lot of information. Some of these are only used once. The client policy, once you instantiate the Aerospike client, you never need it again. It's keep that client policy for the rest of the time.
These gray boxes over here are the ones that you tend to do on a per-core basis. And the primary ones you have are scan policies, batch policies, query policies and write policies. Now, scans have been deprecated in terms of favor of queries, so this is more historical. These are all subclass a policy. And the particular things that I'm talking about are some of these timeouts on the policy class. So, connect timeout of max retries, sleep between retries and so on.
Let's talk in detail what happens at the client level. Now, we have a client library. It sits inside your applications, it's linked in as a jar, or a DLL, or whatever your appropriate technology is and it talks to the cluster. The cluster has multiple nodes. When the client attaches to the cluster, it'll go and talk to each node in the cluster and it'll create a connection pool to each node in the cluster. It used to be that these connection pools started empty.
So, the first request that came in had to go and establish a connection to that node in the cluster. It had a connection pool, but those connection pools were empty at that point. Now, if you are using TLS, this socket, this connection establishment, can take a decent amount of time. If you have an application that is very, very heavy initially as you fire up the application, the first requests that come in need to be very, very fast.
This was a bad model. If I'm using TLS, that connection establishment can be hundreds of milliseconds or longer. So, this was a bad connection model. So, we've now got a ability to specify the minimum number of connections per node. So, I could specify say two connections per node. That means that when Aerospike starts up the client, it'll create these connection pools and it will go and establish these connections to each one of the nodes in the cluster. Take some time, but then it means that when traffic starts hitting your application, you're already got those connections to the cluster.
You can also specify the maximum number of connections per node. Now, this used to be 300 connections per node. It's dropped recently, it's down to about 100 connections per node and that's quite a lot when you consider each one of these transactions typically takes say a millisecond. That's a lot of information you can exchange and 100 connections is a pretty good level of connections you can grow.
We can also specify a number of connection pools per node. So, I've only got one connection pool per node, but if I had a node that had a really high core count, so it's got say 128 cores, I could specify multiple connection pools per ... on the Aerospike client and that would actually improve some of the parallelization across that client. Most applications don't benefit from increasing that.
All right, we have a number of timeouts. So, once we've got that established connection, what happens? What do we need to do? We've got these parameters, the connection timeout, the max retries, the replica, sleep between retries, socket timeout, timeout delays and total timeout. If I could take a poll of the people on this call and say who guarantees they understand every one of these and what they mean? I'm willing to bet that almost no one would raise their hand and say, "I've got a rough idea of how they work, but I don't really know. I just use what we've used and it works historically, so it must work going forward." I hear that over and over again and the problem I have with that is different use cases have different requirements.
Let's say I've got a use case where my writes are idempotent. Reads aren't terribly important in terms of they can take a hundred milliseconds, we don't care. Writes, I'm ingesting data from an IoT sensor. So, the write time is very critical. I might set up one way, but then if I've got the converse, I've got a very read-heavy use case, where I need to get data back to the application very, very quickly. I can retry those retry ... Sorry, I can retry those reads. But I want to do that sequence mode we talked about earlier. I want to do that quickly. That's a very different configuration than what we just mentioned about the writes. So, we need to understand this because each use case may have different values they need to set.
So, what I want to do is look at a timeline. So, I've got my time, I've got my timeline here and I submit an API request. Let's say it's a get. I'm doing a simple get of a key. The first thing the Aerospike client will do is it will test to see, do I have a spare connection? So, I just mentioned those connection ports. If I've got my minimum number of connections, so let's say I've got my two connections, and both of them are busy, I don't have anyone ... any other free connection in the port, I must establish that connection. Now, if I'm not using TLS, this can be moderately quick. If I'm using TLS, this can take a reasonable amount of time.
So, there is a separate timeout called the connect timeout, this top one. And this says, "How long should we wait for this connection to be established?" If I can't establish a connection in that time, throw an exception. So, that's why you connect timeout. It's only used if you have to create a new connection. If you've already got spare connections in that connection pool and you're just borrowing one, that connection timeout does not apply to your transaction at all.
So, I've now got my connection, be it from the connection pool or because I established a new one. What I want to do now is put my API call on. So, I put my API call on the wire, I wait for a reply and the time I wait for that reply is your socket timeout. Each one of these things that I put on the wire, I'm going to wait socket timeout, and if I get a response in that time, great, I'm done. I've finished that API call. If I don't, so my socket timeout. Let's say we set the socket timeout to be 50 milliseconds. My 50 milliseconds has elapsed. Well, what's going to happen is Aerospike will then look at the retries and say, "How many retries do I have remaining?" By default on a write, this is zero. It is two by default on a read.
So I'm doing a read, I've got two timeouts remaining, so two retries remain. At this point, the sleep between retries comes into play. So, I get an extra time which says, "I'm going to wait 30 milliseconds between each one of my reads." So I've got my timeout after my 50 milliseconds, wait 30 milliseconds and then retry. At that retry, this becomes my first retry. Remember, we've set two retries, that means a total of three attempts.
Retry one, the same thing happens. I put my message on the socket, I wait for a response. Whether it's the same socket or it goes to a different node depends on this replica. We've just discussed the sequence mode replica, which says I might start on my master. So this one might've gone to my master and my second one might go to a replica. It might be different, or it might be the same, depends what you set that replica policy to.
Again, we will wait for that socket timeout to elapse. If that socket timeout elapsed, we decrement the max retries, it goes down to one remaining and we wait. We wait for that sleep between retries. Our second retry comes in and we put the message on the socket. Now, after this one would've finished, we would've said, "We've got no more retries. We'll throw a timeout exception." But there's one other parameter which is this total timeout, and that is from the time we have established a connection, this is the maximum amount of time the entire transaction including retries will take.
So, I've got 50 milliseconds here. I've got say 30, milliseconds here. So I've got 50, 80, 130, 160. It was going to take 210 milliseconds for me to time out, but I might've set my total timeout to be 200 milliseconds. It is shorter than the aggregate of all those individual parts.
So, when I do that, the timeout will fall down to this total timeout. So, instead of waiting those 50 milliseconds, you'll say, "My timeout will elapse 40 milliseconds in, therefore I'm going to only wait 40 milliseconds and then throw a timeout exception." So, your total timeout is end to end. Socket timeout is a timeout of each one of these messages and then you slip between retries, between each one of those messages to the server.
There's this one other thing that I haven't touched. It's this timeout delay. By default, what will happen is as soon as this socket times out, the client will reap that connection. It'll say, "Hey, I didn't get a response to my timeout. I'm going to throw away that socket so no one puts information on that I don't know about." But if I'm using TLS, that can be expensive. That means that I have to reestablish a new connection. That can take time.
It might be that I've set a very, very aggressive timeout and the server's just running a bit slowly. So, let's now set my timeout to 10 milliseconds instead of 50. So, if it's running a bit slowly or I've had a network kick out or something, the server still gets a request and will process it and send me a response back. Once I've got that response, the connection is still good. So, I could in theory not need to reap that connection, I could just reuse it.
So, timeout delay says don't reap the connection at this time. Wait for a period of time. If that period of time elapses, then go ahead and reap the time. Now, typically we recommend this timeout delay is reasonably big. Something like say, three seconds. That gives time for the cluster to readjust and you should be able to get a response back in that time. If not, something has gone wrong and you need to reap the connection. There is work on the client side to know what connections we're waiting for and monitor them and make sure that we have consumed any information on those sockets. So, there's more work on the client side, but it can mean faster application response because you're not reaping those connections if you get timeout. So, if you're in a flaky network, sometimes this timeout delay in particular can make a difference to your application.
All right, so there is an interrelationship between socket timeout, remember that's timeout ... each one of those messages takes on the socket, and the total timeout. So remember, total timeout is this big overarching total timeout which says, "This is the maximum time we can take," and the socket timeout is for each message.
So, the rules go something like this. If I have set a socket timeout so it's greater than zero and the socket timeout is less than the total timeout, well total timeout trumps socket timeout. We want the total time to be the total timeout. So, server timeout is going to be set to that total timeout.
If we have a no socket timeout and our total timeout is greater than zero, then we're going to set the server timeout to be the total timeout. That means that we can set it longer than the server is expecting. We can set it to be 10 seconds if we want and the server will wait those 10 seconds. If we have set neither the socket timeout nor the total timeout, then we're going to set the server timeout to be the transaction max milliseconds. And so, by default, that's a second. So, if you don't set either of these, your default timeout will be a second.
Sleep between retries applies only for server timeouts and not client timeouts. If you get a client side timeout, then you can automatically retry without that sleep between the reads. And we mentioned about the sequence mode, so we can say ... This is an example of doing sequence. I'm setting my sequence, I'm setting my timeout aggressively to 20 milliseconds. Then I want my timeout delay to be say three seconds to make sure the socket will drain fast.
All right, so policies are good. We've got these different options within the policies. We need to set them. Now, I see a lot of people who do this, they do something like client.put, my policy is null, takes a key and I'm putting some information. So, a simple write to the database. What's the problem with this? Well, null says I'm going to go and get the default off the server.
Now, that's not bad. If I have set ... when I created my client connection, I can set up a meaningful write policy. So I can do something like this, I can create a write policy and I can set my socket timeout to be two seconds. When I go and create my policy, and you must do this before you connect to the cluster. You create your client policy and you set your right policy default, you parse that in and whenever you parse null, you'll come and get this default policy which includes that socket timeout. So, that works really well, but it means that if you parse null, you've got to make sure that the defaults really do work for you. Don't just be lazy and parse null because it's convenient. Make sure you've thought about do my defaults work in this case? There's no difference between these two lines.
If I'm parsing null, I get in the default write policy. So, these two are identical. In fact, the client itself will say, "If policy equals null, go and get me the write policy default." But make sure you think about your policies and what a meaningful value is.
So, we've got these policies and as I mentioned there's two different parts. There's the application-specific part, like generation counts, and then there's the timeouts and things like that. They do have interrelationships, and I see people doing this wrong all the time, of setting the policies.
So, let's say I want to do a check and set. So I've read my record, I got a record back and that includes a generation count. How many times has the client changed? Or sorry, how many times has that record changed? I want to write it back. So, I'm going to go and get my write policy default and then I'm going to say, "Set the generation to be the generation I read for my client." Okay, that's great. I want to make sure that matches on the server and set my generation policy to be gen expect equal. That's going to tell the server that when I do this write, only write the record if the current generation count, the number of times that record has changed on the server, is this value. So, that'll do check and set, but this is actually really bad code.
What's the problem? Well, this write policy default has a single instance. You're not getting a copy of it. You're getting the one instance that everyone shares. Remember the Aerospike client is multi-threaded. You should instantiate it once and everyone should use that one copy. So, I might have 100 threads and they're all doing writes at the same time. If they all get the write policy default, as soon as I change this, I've changed that one policy that every one of those 100 threads is using. It worked just great in my use case, my thread will work nicely, but those other 99 copies will almost certainly fail their write and get an exception saying that generation wasn't what you expected because they're probably writing a different record. So, this is a very dangerous and bad approach.
So, that's the wrong way of doing. Let's look at another way of doing, and this is probably the most common pattern I see. I get my write policy and I create a new write policy. So getting all the defaults from the compiler. I set my generation, I set my generation policy, I do the write. What solves this problem? We have our own write policy. No one else has a copy of it. It's still not great code though, and this is probably the common mistake I see people make.
We set information in our write policy default, remember? We set a socket timeout to be two seconds, but by instantiating a new write policy, I've gone and got one that the compiler has defaults in and I just got those compiler defaults. I didn't get the values of that default policy that I set. So, I've lost that socket timeout that I so carefully set.
So, the right way of changing a policy is something like this. Get your write policy, default and instantiate a new write policy based on that policy. It'll copy all those settings over. Change it and use it. Now, this works really well. It's a bit more verbose, but it is guaranteed that you'll get all your defaults and you can modify the object and it's your own copy of it. So, this is a pattern you really should be using and most people don't. They tend to do this far more than anything else.
The last thing I want to touch on it's related to policies is what we call circuit breakers. It's a bonus, but this is something we've come out ... It's not something that is unique to Aerospike. It is a distributed problems where it is called metastable failures. And effectively what happens is you have a system.
So, this is a closed loop system, for example. I exist in a stable state. I'm processing my request, they're done at a certain transaction rate. I might be in a stable state. There's some form of increasing load. And what this is comes down to the system and the hardware and what you're doing. It's very difficult to predict, but that increasing load can take you out of a stable state and put you into a vulnerable state. That vulnerable state, you don't know about. You can't predict it, you don't necessarily know it's in a vulnerable state at that point, but a trigger at that point, be it something like I'm writing a lot of big records, I'm accessing keys that are particularly hot, I've got a network delay or something, it's some black swan event that causes a trigger. And that forces you out of a vulnerable state and into a metastable failure state.
Now, the problem with these metastable failure states is they are self-preserving. As you try and fix it actually worsens that. To give you an example, let's say we have set a timeout and we're just going to the master, we're retrying. We're trying to say, "I'm working nicely, but if the timeout happens on the server, just retry that." That works well at a low state. But as I get into a heavy load state, what might happen is a server gets overwhelmed and might start running slowly. Instead of backing off, which lets the server recover, what you're doing is you're retrying those client transactions. You're pushing it harder and harder and harder, which is worsening the situation.
In fact, these metastable failures can be so difficult to get out of, they can do ... they can require things like node or even cluster shutdowns to get them out of these metastable failures. There's a link here which gives a good amount of information about them. As I say, it is not a concept we created. It's something we've seen at our customers and it's a well-known distributed computing problem. Luckily there is a solution.
In more recent clients, there is a max error rate and an error rate window. And the error rate window is typically measured in 10 intervals. And it says basically, if I get this number of errors, this max error rate in this number of ... in this time period, I want you to fail the transactions I'm sending to that server. Don't even bother sending them to the server. I want you to fail them.
Now, this sounds really bad, my application is suddenly going to fail. I'm not even going to the server to test to see if it's going to fail? What you're actually doing is you're backing off. You're going to ease the load on the server. You're going to move these so that the load out of this metastable failure state backs off. You shouldn't get these max error rates unless something bad has happened to your servers anyway. And what you're trying to do is give your cluster time to attach. Now, when we originally released these circuit breakers, they were disabled. They had zero values.
If you're running an older client, please check what version you're running on and see if these circuit breakers have non-zero values. More recent invocations of the client have them set to be meaningful values. I think it's 100 errors within a five-tenth interval or five-second period. It will do that back off. But you should be using these.
This is something our performance and system reliability engineering team did as they're examining this effect. So, let's look at this. I've got my stable state, I'm at a particular transactions per second and I push it harder. I push it into that next state. If we look at this, we've gone from stable but pushing it into a vulnerable state.
We then apply a trigger. So trigger one, I put some heavy load on, that's fine. It actually survived that trigger just nicely.
Trigger two however, I put it on for a longer period of time. When we removed that trigger, you can see it recovered, but it recovered to less than it was in that stable state and it never got out of that until the test ended. So, it pushed it into that metastable failure state where it was trying to work and failing.
When we implemented these circuit breakers, the same test was repeated. So, you can see we did things, we got into stable state, we pushed it into what we believe is the next state up saying, "Hey, I'm in a vulnerable state." And then we applied a series of triggers to it. But after each one of these triggers was applied, you can see that recovery happened. So, we stayed at that increased rate.
There was a little blip whilst it was trying to get into that metastable failure state. But because the clients backed off because of these error rate windows, we never actually got into that metastable failure and the cluster survived. So, this is a good effect. This is something that you want. You want to make sure that your cluster remains healthy, even if bad things are going on. So, it sounds counter-intuitive, but I strongly encourage you to use these circuit breakers.
All right, I hope this has been useful. We've got a number of resources for getting more help, as always, at the end of these webinars. You can go on developer.aerospike.com. There's lots of good developer-style information. We've got good documentation at docs.aerospike.com. A lot of examples on things on aerospike.exam, sorry, Aerospike examples on GitHub. And Aerolab is a tool to fire up clusters and work easily.
What we'd really like however, if you're a developer and you're not on our Discord community, come and join us. It's a place for the developers to hang out and talk about all things Aerospike, ask Aerospike questions, talk with experts and engineers, see what other people are doing. So, there's a link here. I strongly encourage you to come and join us. And on that I'm going to see if there's any questions or comments.
Stacey:
Awesome. Thanks, Tim. This is Stacey again. Thanks everybody for joining us. I'm not seeing any questions right now and I know I've posted there. So, if nothing comes up in the next few minutes here, we're going to let everybody have a little bit of their day back. But we're here to answer your questions and obviously, the Discord community's up here and that's one of the best ways to reach us. We're on there all the time. So, if you have anything that you want to bring up, please let us know.
And again, we have this being recorded, so I will be providing this for everyone that has registered and I'll let you know when that link is up and available. So, thanks all for joining us. Tim, any closing words of encouragement?
Tim Faulkes:
No, just thank you for you time and hopefully this is helpful. Any questions, come and join us on Discord. We would love to chat.
Stacey:
Yeah, we would love it. And everybody have a very safe and happy holiday season and a very happy new year and we're really looking forward to seeing you in the new year. Thanks so much.
Tim Faulkes:
Thanks, everyone.
Stacey:
Bye-bye.
About this webinar
In this Dev Chat, Tim demonstrates how the Aerospike client allows developers to easily interact with the Aerospike database and supports multiple programming languages, including Java, C#, and Python, making it accessible to a wide range of developers. He goes into further detail on the following topics:
🚀 Asynchronous Operations: Dive into the power of Aerospike’s asynchronous operations, enhancing performance and responsiveness.
🧲 Real-time Metrics with Prometheus: Explore how the Aerospike Client seamlessly integrates with Prometheus for real-time monitoring and metrics.
🔄 Automatic Reconnect Feature: Discover the convenience of the automatic reconnect feature, ensuring robust connections even in challenging network scenarios.
📈 Efficient Query Aggregation: Learn about the efficient query aggregation capabilities of the Aerospike Client, optimizing data retrieval processes.
🔄 Cross-Datacenter Replication: Unearth the benefits of cross-datacenter replication, a crucial feature for distributed systems and data redundancy.
🌐 Global Secondary Indexes: Delve into the advantages of using global secondary indexes, facilitating complex query patterns and enhancing search capabilities.
🧬 Extensibility with User-Defined Functions: Understand how user-defined functions extend the functionality of the Aerospike Client, allowing customization for specific use cases.