DevChat: Optimizing query performance with Aerospike expressions
Stacey:
Thanks so much for joining us, everybody.
Tim Faulkes:
Perfect. Thank you very much, Stacey. Appreciate that. And welcome, everyone, to the webinar. Really excited to have you here and be part of this. As Stacey said, I really want to keep this interactive. If you have questions or comments, please post them. I'm monitoring them in real time and Stacey will make sure I don't miss anything at the end. So let's go and talk about Aerospike. Again, this is part of our DevChat series. It is developer to developer, so we're all technical people here. Let's find out what Aerospike expressions are and how to use them, and keep this a very interactive conversation if we can.
All right, so what are expressions? I'm going to talk a lot about expressions today. Expressions are a very, very powerful part of Aerospike, and probably one of the most underused. People have started using them, I've noticed, but only the very basics. So I'm going to take you through what expressions are, why we want to use them and how to use them, and take you from simple to complex. There's a lot of stuff to cover, so I'll be moving fairly fast today, and apologies for that. So what are expressions? Interestingly enough, this very morning I received an email from a customer and they said, "Look, we want to do something like this. We've got a lock, we want to put a lock on a record and we want to expire it after five seconds. If the record is locked and the expiry is in the future, I want to return failure. If the lock isn't there or it's in the past, then I want to be able to overwrite the lock." Now this is somewhat complex. They want to do this atomically so that nothing can interrupt it.
So you can't really just retrieve the record and then write it back, because you'd have to worry about check-and-set and things happening concurrently. So they want to do this atomically. Now there are two ways of doing this atomically in Aerospike. The first is using a user-defined function, and that would work well, but it's really an ideal situation for using an expression. Expressions are designed to be atomic, and they're designed to be executed very, very efficiently on the server. So we're going to look at expressions, and this is just a teaser: at the end, I'll tell you how we could actually do this.
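As a rough illustration of the rule the customer described, here is a minimal plain-Java model. It is not Aerospike client code and not the solution from the talk: the class name, the method name and the use of `synchronized` to stand in for the server's per-record atomicity are all assumptions made for this sketch.

```java
/** Plain-Java model of the lock rule; not Aerospike client code. */
class RecordLockSketch {
    // Expiry time of the current lock in epoch milliseconds; 0 means "no lock".
    private long lockExpiryMs = 0L;

    /**
     * Takes the lock unless an unexpired lock is already present.
     * synchronized stands in for the per-record atomicity the server provides.
     */
    synchronized boolean tryAcquire(long nowMs, long holdMs) {
        if (lockExpiryMs > nowMs) {
            return false;               // locked and the expiry is in the future
        }
        lockExpiryMs = nowMs + holdMs;  // absent or expired: overwrite the lock
        return true;
    }

    public static void main(String[] args) {
        RecordLockSketch record = new RecordLockSketch();
        System.out.println(record.tryAcquire(1_000L, 5_000L)); // first caller takes the lock
        System.out.println(record.tryAcquire(3_000L, 5_000L)); // still held until t=6000
        System.out.println(record.tryAcquire(7_000L, 5_000L)); // expired, so it is overwritten
    }
}
```

The point of the model is that the check and the overwrite happen as one step, which is what a retrieve-then-write sequence on the client cannot guarantee.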
So what is an expression? An expression is a strongly-typed domain-specific language for manipulating and comparing bins and metadata. That's a lot of words just to say: I want to be able to take a record and do things to it on the server atomically and efficiently. The expressions are built-in code. You'll see a whole bunch of examples as I go through, and you'll see that we can do all sorts of complex manipulations. You define these on a per-call basis. So you'll see that we have different sorts of expressions. We've got filter expressions, which allow you to say whether the operation should apply to this record or not, and operation expressions, which are more complex things I'll talk about in a bit. These expressions run on the server, not on the client, but I pass them per call from the client to the server.
So I've got a lot of flexibility. Different calls can have different expressions and they can do different things. The other sort of expression we're going to touch on is an XDR expression, and this is a way of filtering data, so that you can transmit it between different data centers based on criteria. We'll cover that in a bit. All right, so that's a very high level overview, but how do we use them? So the first thing I want to do is talk about the expression language.
Now the expression language is done in prefix notation. So you define the operation first and then the arguments. So instead of a > b, which would be infix notation, you'd do something like gt(a, b). They do the same thing: compare a and b and return a Boolean. The data types that are supported in the expressions are all the data types that Aerospike supports. So it's things like the scalars: the strings, the Booleans, the integers, the floats and things like that. But it's also got the container data types, so I can look at lists and maps. I can look at GeoJSON information and even HyperLogLog and bitmaps, things like that.
Each one of the major client languages we support has expressions. The syntax is slightly different depending on your language. We're going to go through and use Java, and you'll notice that we use Java-style syntax. So method names start with a lowercase letter and things like that. If I was using C#, they'd be very similar, but the method names would start with a capital letter. So even though these examples are specific to Java, the same examples will apply, just with slightly different syntax, across all the major languages we support.
All right, so let's look at accessing a bin. Obviously an expression manipulates a record, or can work with the information in a record. A record contains bins, like columns in a relational database, and we want to do something to them. The first thing I need to do is access a bin. So if I have a Boolean bin, I can say Exp.boolBin and give it the name of the bin. That tells the expression language what type is in that bin. After that, in Java, I need to build that expression, and that gives me an expression that I can use. I can use it as a filter expression or do other things with it. If I want to use a constant, then I use Exp.val. So say I want to compare my integer bin of count to the number 10, and that will return a Boolean. My filter expression is Exp.build, then greater than, integer bin of count, value of 10.
So that prefix notation is slightly less readable than infix notation, but it's still very readable, and you can see that I'm obviously comparing the integer bin of count to 10. Do note that in the expression language, you have to do conversions explicitly. Say I've got a float bin and an integer bin, and I'm trying to compare the bin of count and the bin of average. Average is a float, count is an integer. If I just compare them, I will get an exception thrown saying the bins aren't compatible types. So we've got things like typecasts, which are explicit here. I'm going to have to put an Exp.toFloat on that integer bin to turn everything into a float, and then it will work just fine.
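To make the prefix notation and the explicit-conversion rule concrete, here is a toy model in plain Java. It deliberately does not use the Aerospike client library; the names `intBin`, `floatBin`, `val`, `toFloat` and `gt` only mirror the spoken names from the talk, and the type check is an assumption about how such a check could look, not how the client or server implements it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Toy model of typed, prefix-style expressions; not the Aerospike client API. */
class TypedExprSketch {
    enum Type { INT, FLOAT }

    /** A value-producing node with a fixed type. */
    interface Node {
        Type type();
        double eval(Map<String, Number> record);
    }

    /** A node that produces a Boolean, such as a comparison. */
    interface BoolNode {
        boolean test(Map<String, Number> record);
    }

    private static Node node(final Type type, final ToDoubleFunction<Map<String, Number>> reader) {
        return new Node() {
            public Type type() { return type; }
            public double eval(Map<String, Number> record) { return reader.applyAsDouble(record); }
        };
    }

    static Node intBin(String name)   { return node(Type.INT,   r -> r.get(name).longValue()); }
    static Node floatBin(String name) { return node(Type.FLOAT, r -> r.get(name).doubleValue()); }
    static Node val(long constant)    { return node(Type.INT,   r -> constant); }
    static Node toFloat(Node inner)   { return node(Type.FLOAT, inner::eval); }

    /** Prefix form of "left > right"; mixed types are rejected when the expression is built. */
    static BoolNode gt(Node left, Node right) {
        if (left.type() != right.type()) {
            throw new IllegalArgumentException(
                "incompatible types: " + left.type() + " vs " + right.type());
        }
        return record -> left.eval(record) > right.eval(record);
    }

    public static void main(String[] args) {
        Map<String, Number> record = new HashMap<>();
        record.put("count", 12L);
        record.put("average", 7.5);
        // gt(intBin("count"), val(10)) is the prefix form of count > 10.
        System.out.println(gt(intBin("count"), val(10)).test(record));
        // Comparing an int bin with a float bin needs the explicit conversion.
        System.out.println(gt(toFloat(intBin("count")), floatBin("average")).test(record));
    }
}
```

In this model, `gt(intBin("count"), floatBin("average"))` throws, while wrapping the integer side in `toFloat` makes the comparison legal.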
So I wanted to look very quickly at the features of the language, or the sort of operations and power that you have in the language. A lot of this will be reference, and I'm not going to cover it in a lot of detail. There's a lot of information to go through in this webinar, so I'm moving fairly quickly. But we've got all the major comparison operators. So =, ≠, >, < and so on. They all exist. You've just got to know the prefix names, and these are fairly standard in the industry. We also have other comparison operators. For example, on a string, I can do a regular expression comparison. So I could do something like: I want to know if the string bin of name starts with A to M.
So I've given it a regular expression. ^ is the start of the string, and the first character has to be between A and M. I've passed a flag which says ignore case, so it doesn't matter if it's uppercase or lowercase, and this will return true if the bin matches. So in my case, the name Tim would fail, but if my name was Bob, then obviously that would pass. There are other comparisons to do with GeoJSON. For example, I can do point in region, I can compare points and do comparisons within the GeoJSON expressions as well.
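The matching rule described here can be checked with Java's own regex engine. This is only a sketch of the pattern's behavior: the exact pattern on the slide is not in the transcript, so `^[a-m]` is an assumption, and the server's regex dialect may differ from `java.util.regex` in other respects.

```java
import java.util.regex.Pattern;

/** Checks the "name starts with A to M, ignoring case" rule using java.util.regex. */
class NameRegexSketch {
    // "^" anchors at the start of the string; [a-m] accepts one character in that range.
    private static final Pattern A_TO_M =
        Pattern.compile("^[a-m]", Pattern.CASE_INSENSITIVE);

    static boolean startsWithAToM(String name) {
        return A_TO_M.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithAToM("Tim")); // T is after M, so no match
        System.out.println(startsWithAToM("Bob")); // B is within A..M, so it matches
    }
}
```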
There are a lot of arithmetic operations we can do too. I can do things like adding and subtracting, dividing, multiplying, anything to manipulate those bins. And you can see a set of those arithmetic operations there. So it's not like I can only compare two bins. I can say: is this bin minus this bin times that bin greater than 17? I can put a very flexible expression there, and do things like mins and maxes and so on. There are also bitwise operators. These are probably less common than the comparison operators and the arithmetic operations we saw, but there's a very full suite for manipulating bits, so I can do bitwise ANDs, bitwise ORs and things like that.
Obviously without any logical operators these would be very limited in flexibility. So I want to do a logical and so I take the Boolean bin of valid and the Boolean bin of active, and I and them together. I can take two arguments, or I can take more arguments, it doesn't really matter. And the and will return true if all of those operations are true. Same thing with or. So, I can say if any of these are true, return true. I can negate a Boolean bin and say return true if the value is false and so on.
Or I can use an exclusive, which is probably a very specialized thing. It's a bit like an exclusive or, if you're used to using NAND gates like I am. It really says if exactly one of these arguments is true, then return true, otherwise return false. So in this case, if both pending and valid were true, then it would return false. But if pending was true and valid was false, it would return true.
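The "exactly one argument is true" rule is easy to confuse with a chained XOR once there are more than two arguments, so here is a small plain-Java model of it. The method name `exclusive` follows the talk; everything else is an illustrative assumption rather than client code.

```java
/** Plain-Java model of the "exclusive" operator: true only if exactly one argument is true. */
class ExclusiveSketch {
    static boolean exclusive(boolean... args) {
        int trueCount = 0;
        for (boolean arg : args) {
            if (arg) {
                trueCount++;
            }
        }
        return trueCount == 1;
    }

    public static void main(String[] args) {
        boolean pending = true;
        boolean valid = true;
        System.out.println(exclusive(pending, valid)); // both true, so false
        System.out.println(exclusive(true, false));    // exactly one true, so true
        // With three true arguments a chained XOR gives true, but exclusive gives false.
        System.out.println(true ^ true ^ true);
        System.out.println(exclusive(true, true, true));
    }
}
```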
So they're the fairly obvious operators and they're the ones I see people using most. It is the basics: I've got a bin and I want to compare it to a number. That's a very standard use case, but people miss out on a lot of power. And so we do have these control operators that allow us to manipulate the flow of the expression. Now the first one is basically a complex IF / ELSE IF / ELSE style operation. It starts with Exp.cond, and then it takes a condition and an action. If this condition evaluates to true, it will execute this action. Otherwise, if the second condition evaluates to true, it'll evaluate the second action, and so on. And if none of them evaluate to true, then it will use this default action.
So maybe I've got a grade bin. It has a numeric value in the range of zero to 100, and I want to return a letter grade A, B, C, D or F depending on the value in that bin. I could do something like this. The first condition is greater than or equal, for the integer bin of grade, to 90. In other words, is grade greater than or equal to 90? If so, return A. Otherwise, if the grade is greater than or equal to 80, return B. If the grade is greater than or equal to 70, return C, and so on. And if none of those were true, then I'm going to return F. So I don't actually need to store that letter grade, I can determine it on the fly.
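The condition/action chain with a trailing default can be modeled in a few lines of plain Java. This is a sketch of the control flow only, not the client API. The cutoff of 60 for a D is an assumption, since the talk only says "and so on" after the C branch.

```java
import java.util.function.IntPredicate;

/** Plain-Java model of a cond-style chain: the first true condition wins, else the default. */
class GradeCondSketch {
    static String cond(int value, IntPredicate[] conditions, String[] actions, String defaultAction) {
        for (int i = 0; i < conditions.length; i++) {
            if (conditions[i].test(value)) {
                return actions[i];
            }
        }
        return defaultAction;
    }

    static String letterGrade(int grade) {
        // Conditions are tried in order, so "g >= 80" is only reached when "g >= 90" failed.
        // The 60 cutoff for D is an assumption; the talk does not state it.
        IntPredicate[] conditions = { g -> g >= 90, g -> g >= 80, g -> g >= 70, g -> g >= 60 };
        String[] actions = { "A", "B", "C", "D" };
        return cond(grade, conditions, actions, "F");
    }

    public static void main(String[] args) {
        for (int grade : new int[] { 95, 85, 72, 65, 10 }) {
            System.out.println(grade + " -> " + letterGrade(grade));
        }
    }
}
```

Because the branches are ordered, each condition only needs a lower bound, which keeps the chain short.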
The other one, which is sadly underutilized, is the let / def / var. Now imagine I've got a slightly complex expression, something like this, where I've got a list of items. So my list bin is items, and I want to work out the average size of the items in the list given a running total. Now that works well, but to do that I need to do a division. I need to divide the running total by the count. But if count is zero, I can't do the division. So what I'm going to have to do is something a bit more complex, which says: if the count is zero, return zero, otherwise return the total divided by the count. But I need to know how many items are in the list. That's going to be my count. So what I'm going to do is set up a let, which says I'm going to create a local variable.
It's not going to be persistent, it's just going to be stored in memory. I'm going to define that as count. So the name of that variable will be count, and the value of that will be the size of the list in this list bin. Now I could evaluate this each time, and you can see here that I'm using that expression, Exp.var("count"), twice. I could copy the list-size expression and put it in both places, but first, it becomes more unwieldy, and secondly, it's less efficient: Aerospike would evaluate it twice when I already know the answer. So using this syntax allows count to be a local variable, which is the number of items in the list, and then my expression becomes quite readable.
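The shape of the let / def / var pattern, compute the list size once, bind it to a name and reuse it, looks like this when modeled in plain Java. It is only an analogy for the expression described in the talk: the method and parameter names are assumptions, and the running total is taken as a `double` for simplicity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Plain-Java analogy for let/def/var: bind the list size once, then reuse it safely. */
class SafeAverageSketch {
    static double average(List<Integer> items, double runningTotal) {
        // The "def": evaluated once and not persisted anywhere.
        final int count = items.size();
        // The "var" is used twice: once in the guard and once in the division.
        return count > 0 ? runningTotal / count : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(4, 1, 3, 2), 10.0));
        System.out.println(average(Collections.<Integer>emptyList(), 10.0)); // guarded, no division
    }
}
```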
So remember our cond up here is an if. If this count is greater than zero, then what we want to do is return the running total divided by the number of items in the list. Otherwise, we're going to return zero. So it makes it safe. We're never dividing by zero. Nice simple syntax. And using that let and def allows you to have very powerful expressions in there. We can also access record-level metadata. If you're used to getting information back on the client, where you go and get the record and bring it back, you'll be used to seeing things like the time to live and the generation counter. But using expressions, we can do a lot more. Not only can we get these values back to the client, but we can look at them on the server and evaluate them.
So I can look at say... I want to take action if the key was stored against this record. I want to take action if the size of this record on the device or in memory is bigger than a megabyte. So it's very powerful for doing things on the fly, where you conditionally say, "I want to take action if this criteria on the metadata is true." Now some of these are a bit funky, so you might notice for example that the last update returns the last update time in nanoseconds.
The since update returns the time in milliseconds. That's not a big issue. So long as we know what's returned, we know how to manipulate it. We'll see an example of that in a moment. We can also do list and map expressions. So if you're used to Aerospike's lists and maps, you'll know they can be nested hierarchically, and I can have maps within maps and lists within maps and so on. I can do these at any level. There's a set of operations. I've got my map operations and my list operations to manipulate them. Those operations are mirrored in the expressions. So my expressions can do things like: in the map bin, I want to get any values between 10 and this integer bin of end, and I want to just pass that back to the client. So it gives you this very flexible level of manipulating lists and maps, as well as using operations which can manipulate the lists and maps in a slightly less detailed way.
So to give you a complex example, let's assume that I've got credit card transactions, hypothetically. My primary key is made up of the credit card number and a day. So I'm storing all the transactions for a particular credit card on a particular day in a single record. And I've chosen to store them in a particular format. So I've got a bin which is a map. The key of the map is the timestamp of that transaction. The values in the map are lists, and the first item in each list is going to be the amount of that transaction. So at this time I spent, let's say, $100, stored as the amount in cents, on a new car tire. What I want to do is, let's say, get the top five transactions by amount for a given time period. Sorry, a time period within a given day, so maybe between 3:00 PM and 5:00 PM.
I've sorted by timestamp. So the first thing I need to do is say get me the time period. So I'm going to do a map get by key range. This is my map, it's sorted by the key, and the key is a timestamp. And I'm going to say: between the start time and the end time, give me all the transactions in that map. That's great, and it gives me a set of these transactions. In fact, it'll give me a list of these transactions. But now I want to say I only want the top ones. And you might notice that in my map return type, I didn't say give me the key and the value. I just want the value. And so really I get a list of lists, and each one of these sub-lists tells me, as its first element, what the transaction amount is.
Now I want to get the top five of those, so I can do a list expression, get by rank range. The sub-lists are ranked by their first value, in other words by the transaction amount. Get me the value, from zero to five, and because it's the rank range, that will return me the top five transactions within the time period I've specified. So it gives me a very, very flexible mechanism to be able to return just the information I want back to the client.
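The two-step selection described above, a key range over a timestamp-ordered map followed by a rank selection over the resulting list of lists, can be modeled with a `TreeMap` in plain Java. This is a sketch of the data flow only. The timestamps, amounts and helper names are invented for illustration, and the range is treated as start-inclusive and end-exclusive. One caveat worth checking against the Aerospike documentation: as far as I know, rank 0 there refers to the lowest value, so selecting the largest amounts would normally use a negative starting rank. The model sidesteps this by sorting in descending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Plain-Java model of "key range, then top N by first list element"; not Aerospike client code. */
class TopTransactionsSketch {
    static List<List<Object>> topByAmount(NavigableMap<Long, List<Object>> txns,
                                          long start, long end, int n) {
        // Step 1: keep only values whose timestamp key lies in [start, end).
        List<List<Object>> inRange =
            new ArrayList<>(txns.subMap(start, true, end, false).values());
        // Step 2: order by the first element (the amount), largest first, and keep n.
        inRange.sort((a, b) -> Long.compare((Long) b.get(0), (Long) a.get(0)));
        return new ArrayList<>(inRange.subList(0, Math.min(n, inRange.size())));
    }

    /** Invented sample data: key = time of day as HHMM, value = [amount in cents, description]. */
    static NavigableMap<Long, List<Object>> sample() {
        NavigableMap<Long, List<Object>> txns = new TreeMap<>();
        txns.put(1430L, Arrays.<Object>asList(2_500L, "coffee"));
        txns.put(1510L, Arrays.<Object>asList(10_000L, "car tire"));
        txns.put(1545L, Arrays.<Object>asList(1_200L, "lunch"));
        txns.put(1630L, Arrays.<Object>asList(45_000L, "sofa"));
        txns.put(1700L, Arrays.<Object>asList(999L, "parking"));
        return txns;
    }

    public static void main(String[] args) {
        // Largest two transactions between 15:00 (inclusive) and 17:00 (exclusive).
        System.out.println(topByAmount(sample(), 1500L, 1700L, 2));
    }
}
```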
All right, so we know what expressions are, let's have a look at how they work. And this is kind of clever. I was very happy when I saw how they did this. So expressions are evaluated only if the record exists. If the record doesn't exist, the expression isn't evaluated. So I've got an Aerospike server, and this part up here does expression evaluation. The first thing it's going to do is find the record and load its primary index entry. Now remember that we said some of this data was metadata. It's things like the time to live, the size of the record on disk and stuff like that. It's information that is stored in that primary index. Sometimes we want to do something like: I want to get the value of the bin Fred if the value of Fred is greater than 12 and the record was last updated in the last day, for example.
Some of that we could evaluate from the metadata. So I've got the metadata, and all I've done so far is traverse memory, typically. I haven't gone to disk to get the record. But take the metadata part: was the record last updated in the last day? If that is false, there's no point evaluating the rest. There's no point reading that record from the drive. And so what Aerospike will do is use trinary logic. After I've loaded the primary index, I can go and evaluate my expression. If it returns false or true, I know that I can return false or true back to the client. If it returns unknown because it needs the record off the drive, then it will go and load the record in the next phase and re-evaluate the expression.
So in my example, return Fred if Fred is greater than 17 and the record was updated in the last day. If the record was updated in the last day, it would evaluate the metadata first, return unknown because it doesn't have all the data, load the record off the drive, re-evaluate and then return the right result. If, however, the record was updated a week ago, it would only need to load the metadata, the information in memory in that primary index, and then it would be able to go back to the client and say, "No, this record didn't match."
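The two-phase evaluation can be sketched with a three-valued AND in plain Java. This models only the idea described in the talk (evaluate on metadata first, read storage only when the answer is unknown). It says nothing about how the server is actually implemented, and all names here are assumptions.

```java
/** Plain-Java model of two-phase, three-valued filter evaluation; not Aerospike internals. */
class TrinaryFilterSketch {
    enum Tri { TRUE, FALSE, UNKNOWN }

    static final long DAY_MS = 24L * 60 * 60 * 1000;

    /** Outcome of the filter plus whether the record had to be read from storage. */
    static final class Result {
        final boolean matched;
        final boolean loadedRecord;

        Result(boolean matched, boolean loadedRecord) {
            this.matched = matched;
            this.loadedRecord = loadedRecord;
        }
    }

    /** Three-valued AND: any FALSE decides the result, otherwise UNKNOWN is contagious. */
    static Tri and(Tri a, Tri b) {
        if (a == Tri.FALSE || b == Tri.FALSE) return Tri.FALSE;
        if (a == Tri.UNKNOWN || b == Tri.UNKNOWN) return Tri.UNKNOWN;
        return Tri.TRUE;
    }

    /** Filter: stored bin value > threshold AND the record was updated within the last day. */
    static Result evaluate(long sinceUpdateMs, long storedValue, long threshold) {
        Tri recent = sinceUpdateMs < DAY_MS ? Tri.TRUE : Tri.FALSE;
        // Phase 1: only metadata is available, so the bin comparison is UNKNOWN.
        Tri phase1 = and(Tri.UNKNOWN, recent);
        if (phase1 != Tri.UNKNOWN) {
            return new Result(phase1 == Tri.TRUE, false); // decided without a storage read
        }
        // Phase 2: "load" the record and re-evaluate with the real bin value.
        Tri binCheck = storedValue > threshold ? Tri.TRUE : Tri.FALSE;
        return new Result(and(binCheck, recent) == Tri.TRUE, true);
    }

    public static void main(String[] args) {
        Result fresh = evaluate(60L * 60 * 1000, 20L, 17L); // updated an hour ago
        Result stale = evaluate(7L * DAY_MS, 20L, 17L);     // updated a week ago
        System.out.println(fresh.matched + " loaded=" + fresh.loadedRecord);
        System.out.println(stale.matched + " loaded=" + stale.loadedRecord);
    }
}
```

In the week-old case the metadata alone makes the AND false, so the model never touches the stored value.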
So there are three sorts of expressions we're going to go through: filter expressions, operation expressions and XDR expressions. They will all use the stuff we've spoken about. Now, filter expressions are very useful if I'm doing any operation in Aerospike. It doesn't matter whether it's a point read, a batch read or a query; they will all take filter expressions. The filter expressions say yes or no: return this, or don't return it. So they must return a Boolean. Again, if the record doesn't exist, the expression isn't run, so if I'm doing a batch get and I've given it a key that doesn't exist, Aerospike won't bother running the filter expression. If the record is filtered out, null is returned, unless we explicitly set a policy flag, failOnFilteredOut. If we set that to true, then an exception will be thrown.
Now, let's say I've got a record. I want to update the status of that record to be completed, but only if the current status is active. If the current status is not active, something's gone horribly wrong with my state management, so I want to throw an exception. In this case, it's a fairly simple example. I just need an equality, and I'm going to say: is my string bin of status equal to active? I'm going to put that on my policy. So I'm going to create my write policy. I'm going to set failOnFilteredOut, because remember, in this case I do want an exception thrown if the filter is false. I'm going to set my filter expression to be this expression. And then I'm going to go and put that record. Aerospike will, atomically on the server, lock the record, apply the filter expression and say, "Yes, this is right," or, "No, I need to throw an exception."
And then if it is true, and so this condition that we've set is true, it will set the status to be completed and it will work fine. So it's a combination thing in this case. Again, if I'm doing a batch get or a query, I can put a filter expression on and say, "Just return me the information that matches this." So to give you an example of that, let's say we have a customer and a customer has accounts. Accounts obviously have a balance. Let's assume they're not aggregated, so I haven't gone and stored the accounts within the customer. They're separate level objects and normally accounts and customers would both be top-level objects. So I've loaded my customer, I know the keys of the accounts. So what I want to do is go and do a batch get and the batch get says, "For each one of my accounts, go and get me that record."
But let's say I only want the accounts whose balances are greater than $1,000. So first thing, I've got my account list associated with my customer. I'm going to iterate through and create a list of keys for all those accounts. I don't know which of those have account balances greater than $1,000, but they're the only ones I care about. I don't necessarily want to bring all the accounts back to the client, because maybe out of 100 accounts, only five of them have account balances greater than 1,000. I'd do a lot of work that I don't need to do, retrieving that information over the network and bringing it back to the client. So what I'm going to do is set a filter expression on the batch policy. My filter expression is: the integer bin of balance is greater than 1,000.
If it is, this will return true, and I've attached this onto that batch policy. Now what that means is that the Aerospike client will go through and send these keys to the appropriate servers. The server will evaluate whether each of these matches and return the record if it does. If it doesn't, I'll get a null back. And so as I iterate through my records, I'll still get one entry back for every key I sent. But of those, only the records that are not null will have an account balance greater than 1,000. If my account balance was $500, then the server would return null back to my client and I wouldn't retrieve that record. So it gives you that level of filtering that is very useful.
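The shape of the batch result, one slot per requested key with nulls where the record is missing or filtered out, can be modeled like this in plain Java. It is not the client's batch API: the in-memory map standing in for the server, and the method and key names, are all assumptions for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongPredicate;

/** Plain-Java model of a filtered batch read; the map stands in for the server. */
class BatchFilterSketch {
    /** Returns one slot per key; a slot is null if the key is missing or filtered out. */
    static Long[] batchGet(String[] keys, Map<String, Long> balances, LongPredicate filter) {
        Long[] results = new Long[keys.length];
        for (int i = 0; i < keys.length; i++) {
            Long balance = balances.get(keys[i]);
            // The filter only runs for records that exist.
            if (balance != null && filter.test(balance)) {
                results[i] = balance;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Long> balances = new HashMap<>();
        balances.put("acct1", 2_500L);
        balances.put("acct2", 500L);
        String[] keys = { "acct1", "acct2", "missing" };
        // Same length as keys; only acct1 passes the "balance > 1000" filter.
        System.out.println(Arrays.toString(batchGet(keys, balances, b -> b > 1_000L)));
    }
}
```

The caller therefore has to null-check each slot, and cannot tell a filtered-out record from a missing one by the null alone.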
Now some of you are used to secondary indexes in Aerospike. You'll know that they have the ability to filter on a single predicate. Once you add filter expressions, you can filter on multiple predicates. So I can say, "Show me everyone who lives in Colorado whose name is Tim," and combine multiple predicates into a single expression. All right, operation expressions. These I find are not well used, and they're very, very powerful. So this is the important stuff. There are two different types of operation expressions: read expressions and write expressions. A read expression effectively allows you to create pseudo bins. If you're used to SQL, you're used to doing something like select A + B as result. Read expressions effectively give you that power. They allow you to combine information from the bins in a record and create bins in the result set that aren't stored in the database.
And so I don't have to bring all that information back to the client just to do that derivation. Write expressions are similar, but they are useful for writing new information to bins. So we'll see some examples of these, but they're both very powerful. They're both used with the operate command and they can be used on single record transactions and background scan operations. Only write expressions can be used on those background scan operations. Background scans don't return any information, so there's no point to using a read expression on those.
So let's have a look at this. Let's say I've got a key, and I don't actually want the bin data back for that key. I want to know the memory size, device size and last update time for that record. So just metadata. Well, what I can do is call client operate, passing in my key, and give it three expression operations. These are all going to be reads. The first one creates a new fake bin called memory size, and the value of that is simply going to be the metadata memory size. Bear in mind that in version seven, memory size and device size have gone away and it's record size. So this code would work pre-version seven.
I can do the same thing for all these other operations. The last update time, however, if you remember, was somewhat unusual: it's measured in nanoseconds. I might want it back in milliseconds. So I'm going to take my nanosecond time, my last update, and divide it by a million, and this will give me my last update time in milliseconds. When I go and look at this record, what I can see is I've got the memory size, the device size and the last update time. The nice thing about this is Aerospike didn't even have to read the record off the drive to give me this information. So it's very fast, and it allowed me to combine parts of the metadata, as well as bin data if I'd wanted, into information that came back to my client in new bins that don't exist on the server. They're pseudo bins, similar to a select ... as in SQL.
So let's look at another example. It's a very contrived example: I've got a record, and it's got three bins, A, B and C, with these three values. Instead of getting the record back and doing my computations on the client, I might want the minimum, the maximum and the average, and I want the server to work them out for me. So again, I'm going to do an expression read, passing the name of the bin I want returned, so the minimum, and in this case I'm going to use Exp.min of the integer bins A, B and C. Same thing for the max. And then for the average, I'm going to add them up and divide that sum by the value three, and that'll give me the average. So I'm using the server's power and the fast evaluation of these expressions to derive these quantities for me.
So that's read expressions. Really useful for deriving information from server quantities. The other part I can use is write expressions, the same thing applies. So instead of reading them back and creating these pseudo bins that are brought back to the client, the difference is a write expression will go and update the record. So I've got my bin values and now the only difference I've done is instead of using the read, I'm now using write. You might notice also that I've got these flags at the end. They're all default, but these have changed from read to write flags. The rest of the expression is identical. So the actual thing that computes the value is identical. But now because I'm doing a write, what I'm doing is I'm going and updating the record, I'm affecting that record on the database. So now if I go and read the record at the end of my update, I've got my bins A, B and C.
But the same record now actually has the min, the max and the average persisted on the server. And so this is very useful if you're doing something like: I want a running average and I want to know if the average is greater than this value. Maybe you're doing a shopping cart application and you want to know if the sum of the values in the cart is greater than a certain threshold. Maybe you're offering a special discount if they're spending over $1,000 in the cart. Something like that. You could do this on the server side and just return a flag which says yes, it's true. But we could even do a scan for: show me all the records on the server that have a value in the cart of greater than $1,000. If that sum isn't persisted, you could still do a query on it by scanning through the records and applying a filter on each one of those. If you want it stored, you'd have to use the write expression to update it as you went.
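The difference between read and write expressions, same computation but a different destination for the result, can be shown with a plain-Java model in which a record is just a map of bins. The bin values and method names are invented for illustration, and the average uses integer division on the assumption that the bins hold integers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java model of read vs write expressions over a record of integer bins. */
class DerivedBinsSketch {
    /** The shared computation: min, max and average of bins A, B and C. */
    static Map<String, Long> derive(Map<String, Long> record) {
        long a = record.get("A");
        long b = record.get("B");
        long c = record.get("C");
        Map<String, Long> derived = new LinkedHashMap<>();
        derived.put("min", Math.min(a, Math.min(b, c)));
        derived.put("max", Math.max(a, Math.max(b, c)));
        derived.put("avg", (a + b + c) / 3); // integer division
        return derived;
    }

    /** "Read": returns pseudo bins to the caller and leaves the record untouched. */
    static Map<String, Long> read(Map<String, Long> record) {
        return derive(record);
    }

    /** "Write": stores the same derived values as real bins on the record. */
    static void write(Map<String, Long> record) {
        record.putAll(derive(record));
    }

    public static void main(String[] args) {
        Map<String, Long> record = new LinkedHashMap<>();
        record.put("A", 4L);
        record.put("B", 9L);
        record.put("C", 2L);
        System.out.println(read(record)); // derived values only
        System.out.println(record);       // still just A, B, C
        write(record);
        System.out.println(record);       // now also holds min, max, avg
    }
}
```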
The last sort of filter we're going to talk about is the XDR filter. Now I'm not going to go into a lot of detail on this. They're very similar to all the other filters we've looked at. But what happens is, as XDR goes to ship a record from a source to a destination, if you've put a filter on it, it will evaluate that filter before it sends the record. So it's a final go/no-go decision: should I ship this record to the destination or not? Now this can give very fine-grained control over what data you ship between data centers. This can reduce your storage and your network bandwidth, but it's also very useful for things like compliance. So many people are familiar with, say, GDPR, which is a privacy rule which, put simply, says that if data originated in the UK, it can't leave the UK.
But if the data originated outside of the UK and I modified it in the UK, I can send it out of the UK. So imagine I've got two data centers, one in the UK, one in the US. A record originates in the US and we've put a flag in a bin called origin, so it might say US. The same record, if it was created in the UK, would have a flag saying UK. Well, when we go and update that record or write that record, we could set a rule which says: if that origin bin is UK, don't ship. If it's not, then ship. And so this would allow us to comply with GDPR very simply, without having to write any application code. It's just one expression you create and attach to the shipping as an XDR filter. Very powerful, and there's a lot of information on the link that I've provided there.
All right, so let's look at some examples. Imagine I've got a shopping cart. The shopping cart keeps items. So I've got my item ID. In this case, the item ID is a blue sofa and its price that I paid was $340 and I've only got one of those in my cart. Another item ID might be cough drops and they're $5.95 and I've got three of those in my cart. I want to write a method. Now the method should have this sort of signature, and if the item doesn't exist in this map, then I want to add it to the map in the same format as I see here. But if the item already exists in the map, I just want to increase the quantity. So if I've already got the cough drops in, they try and add another cough drops. I'm not going to put another entry in the map, I'm just going to increase this quantity.
But if they're putting in, I don't know, Coke product or something like that, it'll be a new entry and a new entry in the map and we'll go from there. So how are we going to do this? Well, the first thing to think about when you're designing an expression is I want the pseudocode. What does the pseudocode look like? How should I use it? So my pseudocode is if the item exists in the map, then add the quantity we're passing in to the existing quantity. If that was false, then I need to insert an item into the map. So I've got an IF/ELSE/ENDIF. So, if you remember those conditionals we spoke about earlier, then that's what we want to use. That's what gives us that IF/ELSE/ENDIF sort of syntax. So my solution would look like this. I'm going to create a map. The map contains the name, the price, the quantity, and I'm going to use this if that item doesn't already exist in the map.
So my expression says... the first thing I'm doing here, and this is a little contrived for this example, is I want to make my map key-ordered. Normally when I go and create these shopping carts, I would set that order to be key-ordered anyway. And if you don't do that, it would still work. It just won't be quite as efficient. But then I'm going to do a write. Remember, I'm changing something in the bin based on a condition, so I need to tell it that I'm using an expression which is doing a write. So I'm going to change the items map, and what my expression is, I'm going to use that condition. So if the map key, or the key exists, so does it already exist in the map? And I'm looking to see does this item ID already exist in the map? If it doesn't, then all I want to do is put the item in.
So I'm going to change this and say insert this item into the map bin. I'm going to put this item ID. Sorry, I've lost my train of thought. So this map that I created with the data, I'm going to go and put that into the items. If the item is already there, then I'm going to go and increase the quantity by the amount. And so you can see that IF/ELSE/ENDIF has worked really nicely. It's a few lines of code, but it means that my client code doesn't have to try and retrieve the items from the server, see if it's updated and send it back. And so it prevents read/modify/write. I did it without using a user-defined function, and it's very, very efficient. These are evaluated in hand-tuned C code. And so the efficiency of this is very high.
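The IF/ELSE/ENDIF decision for the cart can be modelled in plain Python. This is only a sketch of the branch the expression takes, not Aerospike client code: the function name `add_to_cart`, the item IDs, the Coke price, and the field names `name`, `price`, and `quantity` are all assumptions made for illustration. On the server the branch runs inside a single atomic operation; here it is just a local dict, so no atomicity is implied.

```python
def add_to_cart(items, item_id, name, price, quantity):
    """Model of the conditional write: bump the quantity if the item exists, else insert it."""
    if item_id in items:
        # Item already in the map: only the quantity changes.
        items[item_id]["quantity"] += quantity
    else:
        # Item not present: insert a new entry in the same shape as the others.
        items[item_id] = {"name": name, "price": price, "quantity": quantity}
    return items


# Hypothetical cart contents, loosely following the example in the talk.
cart = {
    "sofa-1": {"name": "Blue sofa", "price": 340.00, "quantity": 1},
    "cough-1": {"name": "Cough drops", "price": 5.95, "quantity": 3},
}
add_to_cart(cart, "cough-1", "Cough drops", 5.95, 1)  # existing item: quantity goes up
add_to_cart(cart, "coke-1", "Coke", 1.50, 2)          # new item: new map entry
print(cart["cough-1"]["quantity"], len(cart))
```

The point of doing this as an expression rather than in client code like the above is that the check and the update cannot be interleaved with another client's write.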
All right, expressions are great, but how do we debug them? How do we solve problems with them? We've seen some fairly simple examples. I've seen some very complex examples. Maybe half a page of expressions, and they can get a bit ugly. So I'm going to be very honest here. When I was making this slide, I was going through and saying, "Well, I'm going to do something a bit contrived. I'm going to calculate the hypotenuse." So I've got, let's say, a right-angled triangle with sides A and B and I want the hypotenuse. Okay, that should be fairly easy. I'll make an example and then I'll make a mistake in it and then I'll show people how to solve it from there. And I put this expression in and then I realized that even though I hadn't deliberately made a mistake, it was definitely giving me the wrong answer.
So it worked really nicely as an example that I want to debug. Now you might notice we don't have a square root function. So I've had to use a bit of maths and define a square root function with this horrible expression. But then what I'm trying to do is, well, I've got A², B², so I'm creating a read expression which is going to give me the hypotenuse, and then I'm going to square A and square B and take a square root of it. And just looking at this, it's hard to know exactly what it's doing. If you didn't have this comment up here, it'd be hard to work out exactly what it's doing. So this is wrong for a start. If I evaluate this and print out the result as I'm doing here, you can see my A is 13, B is 16, and my hypotenuse is 1. That is very, very wrong.
So I screwed up my expression. This expression I would list as unmaintainable. It's really hard to read. You can't look at it and understand what it's doing. How am I going to look at that and work out why my hypotenuse turned out wrong? So if I'm using a complex expression, what I will typically do is break it into smaller expressions. So you can see that I've created an expression called F2, a float value of two, which is just the number 2. I've got bin A, which is the integer bin A going to a float, same with bin B. I'm squaring it. So I've got square of A is take bin A and raise it to the power of F2, and so on. I can keep doing that. So each one of these is another more complex part of the expression. And my hypotenuse is the exponent, the power of F2, the multiply.
So that's that big horrible expression which gives me the square root, and that works well. So this has simplified it. It is now much more readable. It still didn't solve my problem. So instead of doing this in one operation, what I can do is, I've still got my answer, but I can now print out or read each one of these variables as I go through. So I can look at what is the value of A and B. Fairly self-explanatory. What is square A? So I just build it and then say, "Go and show it to me." What is square B? Same thing. The sum of the squares. Same thing. And this gives me each one of these steps as I go through. And I can go through and say, "Well, A is 13, B is 16, obviously. Square of A is right; it's 13². Square of B is 16²."
The sum of those squares, if I add those two together, this is right. My hypotenuse is horribly wrong. So it went wrong in this part here. And then as I inspected it, I worked out that log takes the number first, not the base. I'd passed it the base, base two, and then the number. And so I'd screwed up the order. But on my previous syntax using this, that would've taken me forever to try and work out. I hadn't broken it down into little pieces that helped me maintain it. Okay, so this is a very powerful technique. Yes, it's a bit old school, it's like doing a print line on variables instead of having a debugger. But typically your expressions aren't that complex. And once you've got them working, they work really well from there on.
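The same break-it-into-named-pieces approach can be sketched in plain Python. This is not the expression API. It assumes the square root was built as 2 raised to half of the base-2 log of the input, which is one way to get a root out of log and power; the exact form on the slide may differ. Python's `math.log(x, base)` also takes the number first, so swapping its arguments reproduces the same kind of mistake.

```python
import math

a, b = 13.0, 16.0          # the bin values from the example
f2 = 2.0                   # the constant 2 as a float

sq_a = math.pow(a, f2)     # 169.0
sq_b = math.pow(b, f2)     # 256.0
sum_sq = sq_a + sq_b       # 425.0

# Square root via log and power: sqrt(x) = 2 ** (log2(x) / 2)
good = math.pow(f2, math.log(sum_sq, f2) / f2)  # number first, then base
bad = math.pow(f2, math.log(f2, sum_sq) / f2)   # arguments swapped: base first

# Reading back each intermediate value is what pinpoints the faulty step.
for label, value in [("a", a), ("b", b), ("sq_a", sq_a), ("sq_b", sq_b),
                     ("sum_sq", sum_sq), ("good", good), ("bad", bad)]:
    print(f"{label:>6} = {value:.4f}")
```

With these inputs every value up to `sum_sq` is correct, the correct root is about 20.6, and the swapped version comes out close to 1, so only the last step is at fault.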
Okay, so my other debugging tips. In general, Aerospike sends back error codes and not error messages from the server. So there's a server-side error code which says it's 37, and it sends that back to the client. So there's less traffic on the network and the client says, "Oh, 37 is this. I'll give you a nice error message that associates with this." The problem comes in if you screw up the expression. Aerospike will send a code back saying parameter error or something like that. Or not applicable error. These aren't terribly helpful. They don't tell you what's gone wrong. And so if you are debugging, what you can do, particularly in development, is set your logs to warning, or... sorry, warning or info or something higher than warning. And the details of those errors will appear inside the server logs.
And so you can see when I go and look at my server logs, I'm seeing things like error for invalid type list at condition one. And it's quite useful. It gives me something that says I know what's wrong with that expression. The drawback with this is it works really nicely in development. If you're using your own development server, you know which server to go to. And that's typically where I find almost all my problems. If I'm running in a cluster, however, you don't know which node to go to. The error will be on the node that evaluated the expression. Now if it's a scan or a batch or something like that, or a query, then it should apply on all nodes. But if it's a point expression, so I've gone and looked up one record and got the error, I need to know which node that error appears on.
And so that's where I can use AQL's feature of explain. So I can explain, select record ID from this namespace and set where primary key equals whatever my primary key is, and it will tell me which node it evaluated on and then I can go and look in the log on that server. All right, so we've covered a lot of information. I gave you a bit of a teaser at the start about how I actually had someone reach out with a real problem this morning and say, "How would I do this?" And so here's the solution. Remember, here are the requirements. I want a lock and I want to make sure it's atomic. I want to make sure if the lock has expired or didn't exist, then I can go and put a new lock. If the lock hasn't expired, then I want to return failure. So this is a bit tricky, because it's two parts.
One is updating the lock, and the other is did I update the lock, and returning that back to the client: was it successful or was it a failure? So what I'm going to have to do is I'm going to have to have two operations. The first will be a read to say, "Yes, it was successful, or no it wasn't." And then I'm going to have a write which is going to update the lock if it's true. Now the first thing, if you look closely at this, is the same predicate applies. I want to say a lock is held if the lock bin exists, and that lock bin is greater than now. So the time stored against this lock will be set in the future. So if the lock has expired, that time will be in the past, and that's what this lock held is evaluating. Now I'm going to use that in both my read and write expression.
So my read expression, I'm going to set a new pseudo variable called success, and it will be was the lock held, negated. Because lock held will return true if the lock is still valid. So I'm just going to NOT that, and that will return false if the lock is not... sorry, it'll return true if the lock is not held and false if the lock is held. I'm also going to update that lock. I'm going to do a write. Now, the first thing I'm going to do is I only want to update it if the lock is not held. So I'm going to say first, is the lock held? If it is, I don't want to do anything, and I can pass unknown to it. And that will say, "Don't do anything if the lock is held." Otherwise, the lock is not held, I'm going to go and put in a new time of now plus however many seconds, and that will go and update the timestamp.
I do need to pass one write flag here. I have to say no fail. By default, this unknown, if you pass unknown, it will throw an exception. I don't want an exception. I just want it to return this success indicator. So I'm going to say no fail, or eval no fail. And that will suppress that exception. This returns a record, it will have the success flag set to be true or false, and the lock will be updated. So in one operation, I've atomically done the requirements. That's only a few lines of code.
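The shape of that solution can be modelled in plain Python. This is a sketch of the logic only, not Aerospike client code: the function name `try_acquire`, the dict standing in for the record, and the explicit `now` parameter are assumptions for illustration. In Aerospike the read and the write are sent as one server-side operation and so run atomically; this local model has no such guarantee.

```python
import time


def try_acquire(record, lock_seconds, now=None):
    """Model of the two operations: read 'success', then conditionally write the lock."""
    now = time.time() if now is None else now
    # Shared predicate: the lock bin exists and its expiry is still in the future.
    lock_held = "lock" in record and record["lock"] > now
    # Read step: success is the negation of lock_held.
    success = not lock_held
    # Write step: when the lock is held, do nothing (the "unknown" branch);
    # otherwise store a new expiry of now + lock_seconds.
    if not lock_held:
        record["lock"] = now + lock_seconds
    return success


rec = {}
print(try_acquire(rec, 5, now=100.0))  # True: no lock yet, expiry set to 105
print(try_acquire(rec, 5, now=102.0))  # False: lock still valid until 105
print(try_acquire(rec, 5, now=106.0))  # True: lock expired, expiry set to 111
```

Both steps reuse the same `lock_held` predicate, which mirrors how the one predicate feeds both the read and the write expression in the talk.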
All right, that's all I want to talk about. Are there any questions that I can answer? Anything in chat or is there anyone wants to ask?
Stacey:
I'm not seeing anything as of yet. I would also... yeah, good. Thanks for popping up that slide, Tim. This gives you a couple of different options of places you can go. Somebody's raised their hands.
Tim Faulkes:
I do see a hand up. Are we able to give people ability to speak, or is it-
Stacey:
Yeah, let me do that.
Tim Faulkes:
Perfect. Thank you.
Stacey:
Hang on here. Yes, go ahead.
Speaker 3:
Hey. Hi, Tim, can you hear me?
Tim Faulkes:
Hi. Yes.
Speaker 3:
Yeah. It was our team that asked this question about the locking. So I'm very thankful for these expressions and all. I think it's something that we'll do in the future, but the question that we actually had is, can we achieve this with the operate keyword and predicate expressions on a 3.9.16 client version in C#? Because we are right now on that version, and we have some tight deadlines to get this feature going, and we really can't upgrade at this point. So I did try some conditional writes with predicate expressions and operate. I think we can achieve something very similar with the same thing, right? We can say write with the lock only if the timestamp is this, or if the timestamp is empty or it's in the past. I mean, it'll be an operate keyword with predicate expressions in the bin. So would that still work out and is it atomic, was my question?
Tim Faulkes:
Okay, great question. And just to set a bit of background context, because not everyone knows the history, Aerospike started out without any form of expressions. The first sort of expressions we got was something called predicate expressions, which is what this person's using. And the predicate expressions were good, but they had some limitations. And so we threw away predicate expressions and created these expressions and the expressions have a bit more power. So you're testing my memory here on predicate expressions.
My understanding, and just thinking about this, I think predicate expressions didn't have that ability to do the write. I'll have to check. So let me take that offline because it's been more years than I want to admit since I've dealt with predicate expressions. I've done the expressions a lot. But yeah, if you could come and join our Discord community or send me a personal message, I'm very happy to look at that and say, "How would I do this with predicate expressions?" But just because it's so old, I wasn't anticipating the question. I hadn't gone and refreshed my memory on them. So I apologize about that. I like being able to answer all questions off the top of my head, but yeah, that's going to need some research.
Speaker 3:
Sure, thanks, Tim.
Stacey:
That comes up. Yeah, so go ahead and certainly in Discord you can get the whole team chiming in too, so there might be some viewpoints that would be helpful to you.
Speaker 3:
Sure.
Tim Faulkes:
Perfect. Are there any other questions from anyone at this point? All right, well, thank you very much for your time. I really appreciate you staying on and listening to this. I hope it was helpful. As Stacey said, we would love feedback.
Stacey:
Always open conversation on our Discord server. That's the easiest and quickest way to get ahold of us. Okay, and with that, thanks Tim, so much for this. This is really, really helpful and definitely informative. So we have this out here for people that can reference it in the future. So appreciate everybody's attendance today. Definitely give us feedback. We always like to hear other topics that you want, or that you need to learn or other solutions that you're looking for. So please let us know. With that, have a great day.
Tim Faulkes:
Thanks everyone.
About this webinar
Aerospike Expressions allow server-side filtering and altering of records, providing an effective, easy alternative to check-and-set patterns or user-defined functions. Many Aerospike developers are aware of the basic functionalities that expressions provide, but this just scratches the surface in terms of capabilities. In this talk we will show the benefits of expressions and how to maximize their power. Watch this DevChat team to learn:
What Expressions are
How to use them
Real-world examples of their power