Database clustering: Use cases
Which database technology should you use for your specific use case?
When building a global cluster, an architect will choose between two options – both of which are available within the Aerospike database:
Strong consistency, or synchronous cluster technology. The data is always consistent throughout the cluster. However, synchronous clusters typically run much slower than asynchronous ones. In addition, some data may be temporarily unavailable.
High availability, or asynchronous cluster technology. The data is always available, yet it may be stale and inconsistent, or even non-existent, and Writes may get lost; still, it is good for many use cases. This technology is often selected because it is faster than synchronous, though less accurate.
Wait, what? Is it OK to use stale or incorrect data? Or even lose some Writes? How is that possible?
This is a great question, and a legitimate one, so let’s answer it first.
High availability use cases
What all High Availability use cases have in common is an emphasis on a quick user experience. For that to happen, it’s OK if the application gets stale or incomplete data, or even loses some Writes. The application will still function and produce good business results overall.
There are many use cases for high availability. We’ll look at three of these: recommendation engines, AdTech, and fraud detection.
Recommendation engine
You enter your favorite ecommerce site, and it recommends items it thinks you’ll enjoy. You log in to a different shopping or movie website, and it immediately shows you products or movies it thinks you’d be interested in.
First, the website identifies you. Then, it instantaneously provides you with recommendations – which items to buy, which movies to watch, etc. You may be anywhere around the world, so the website needs very quick Reads of your user profile, to provide quick recommendations for you to act upon. To achieve that, they’ll want to perform Reads from a local server, hence a distributed database system.
Let’s suppose you are on a home furnishings website, for example. You recently bought a new computer chair, yet that Write got lost, and it wasn’t recorded to your user profile. Did any catastrophe happen? From previous purchases you’ve made, the site knows your preferred furniture style (modern), that you have a home office, and probably live in a house or a spacious apartment (you bought a few large pieces of furniture). That missed Write will not drastically change your profile; they still have plenty of statistical information on you, and know what to recommend. In the worst case scenario, you won’t buy the recommended items.
To conclude this section, we see that High Availability use cases place the emphasis on a quick user experience; even if the data is stale or missing, they can still provide a great user experience.
AdTech
You surf the web, and are shown ads on various pages. Many ads are tailored exactly for you, offering things you’ve recently searched for or researched.
Welcome to AdTech. Within a split second of entering a website/page, you’re presented with an ad, often relevant to you. It literally happens in real-time (we’ll address ‘real-time’ in a separate blog post). What happens behind the scenes? And how is that related to our discussion?
AdTech companies keep “user profiles” for each of us. They identify us via the hardware’s MAC address (unique identifier for each piece of hardware connected to the Internet). AdTech used to rely solely on cookies, but cookie usage is crumbling for various reasons. As a result, third-party data sources are now used to guesstimate the user and their behavior.
The user is anonymized, so AdTech uses categories to define each user. There are many categories for each of us – things they learn about us over time, as we surf the web.
Take Alex, for example. AdTech knows Alex is a man, bald, probably single, an engineer, 35-49 years old, lives in the San Francisco Bay Area, loves gadgets, action movies, an avid NBA fan, and likes to grill. That’s a narrow profile; AdTech probably has many more categories associated with Alex.
Alex now enters a website. Behind the scenes, multiple AdTech companies compete to serve an ad to Alex on the page he just opened. The entire process takes 100 milliseconds, shorter than the blink of an eye:
Identify the user, get his user profile, and see if there’s a relevant campaign for him (all that within 5-10 milliseconds.)
Submit a bid.
Win the bid.
Serve the ad on Alex’s screen.
Speed is crucial. The database must be ultrafast. Moreover, all user profiles must be stored in many locations around the world. They must be close to the bidding servers, which are also spread around the globe. This way, the Read is local, to minimize latency. AdTech databases are therefore highly distributed, across many geographies.
Which ad will be served? The common scenario is the AdTech company found Alex’s user profile, saw where he’s been surfing lately and what he’s interested in (perhaps researching for a new grill), and served a relevant ad to Alex’s profile.
But there’s another possible scenario. The database near San Francisco lost connection, and the AdTech company doesn’t know anything about Alex. They can still serve him an ad – maybe a cruise to the Caribbean, or for a new SUV. These ads may be less relevant, and Alex may not click them – yet they might still be served in a lost connection scenario.
For that reason, High Availability and speed are key in AdTech: we need very fast Reads, and we’re OK if the data is not fully updated, if some of it gets lost, or even if it doesn’t exist.
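The lost-connection scenario above can be sketched in a few lines. This is a minimal illustration, not how any real AdTech stack or the Aerospike client works: `RegionalProfileStore`, `choose_ad`, and `GENERIC_AD` are hypothetical names, and the in-memory dictionary stands in for a regional profile database.

```python
# Hypothetical fallback campaign served when no profile can be read.
GENERIC_AD = "caribbean-cruise"


class RegionalProfileStore:
    """Toy in-memory stand-in for a regional user-profile database."""

    def __init__(self, profiles: dict[str, list[str]], online: bool = True):
        self.profiles = profiles
        self.online = online

    def read(self, user_id: str) -> list[str]:
        if not self.online:
            raise ConnectionError("regional store unreachable")
        return self.profiles[user_id]  # raises KeyError for an unknown user


def choose_ad(store: RegionalProfileStore, user_id: str) -> str:
    """Return a targeted ad when the profile is readable, else a generic one."""
    try:
        interests = store.read(user_id)
    except (ConnectionError, KeyError):
        # Stay available: serve a less relevant ad rather than no ad at all.
        return GENERIC_AD
    if not interests:
        return GENERIC_AD
    # Target the most recently recorded interest.
    return f"ad-for-{interests[-1]}"
```

With a profile of `["nba", "grill"]` for Alex, `choose_ad` returns `ad-for-grill`; if the store is marked offline, the same call returns the generic ad, and the bid still goes out.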
AdTech systems perform many Reads and far fewer Writes. There will be a Write when the user profile is updated with new data on Alex (he searches for something new). There will also be a Write when the AdTech company wins a bid, as that information needs to be communicated to the entire network. However, it’s acceptable if New York didn’t see that Alex just got an ad served out of London: the systems will sync eventually, and even if some Writes are lost, a relatively accurate ad still got placed.
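The London and New York example can be illustrated with a toy model of asynchronous replication. This is a sketch under simplifying assumptions (two regions, an in-memory outbound queue, no conflict handling); `Region` and `sync` are hypothetical names and do not represent Aerospike's replication mechanism.

```python
class Region:
    """Toy regional replica: a local key-value map plus an outbound queue."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.outbox: list[tuple[str, str]] = []  # writes not yet shipped

    def write(self, key: str, value: str) -> None:
        # Acknowledge as soon as the local copy is updated; the peer learns later.
        self.data[key] = value
        self.outbox.append((key, value))

    def read(self, key: str):
        # Always answers locally, so the result may be stale or missing.
        return self.data.get(key)


def sync(source: Region, target: Region) -> None:
    """Ship queued writes from source to target (the 'eventual' step)."""
    for key, value in source.outbox:
        target.data[key] = value
    source.outbox.clear()  # two-region toy: only one peer to ship to


if __name__ == "__main__":
    london, new_york = Region("london"), Region("new_york")
    london.write("alex:last_ad", "grill-promo")
    print(new_york.read("alex:last_ad"))  # None: New York has not synced yet
    sync(london, new_york)
    print(new_york.read("alex:last_ad"))  # grill-promo
```

The write returns immediately in London, which is what keeps latency low; the price is the window before `sync` runs, during which New York answers from older data.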
Fraud detection
At a conceptual level, it’s pretty similar to AdTech. There are many users, and each has a profile with lots of data regarding the user’s past behavior (purchases, amounts, locations, etc.)
The user now tries to perform a certain transaction – buy an item (online or in-person), ask for a loan, etc. The financial provider (PayPal, credit card company, etc.) operates a fraud detection system. That system must decide, within a split second, whether that transaction is legitimate and hence approve it; or whether it’s a fraudulent transaction that should be rejected. This happens every time you use your credit card – both online and in person.
Similar to AdTech, the fraud detection system must be globally distributed. If I’m a user located in Boston, I may buy something from Amazon (US servers), AliExpress (Chinese servers), or try to book a hotel in Paris (French servers). I may also be traveling on business or for vacation in a different geography. For the fraud detection system to provide quick, instantaneous feedback, the user profiles must be located in many geographies to minimize Read latency.
As mentioned, the fraud detection system reads the user profile from the nearest profile database server. If it finds the data, it bases its decision (legitimate or fraudulent transaction) on that user data. But what if, for some reason, the fraud detection system cannot get all the data it needs? In that case, it takes whatever data it has and makes assumptions based on sophisticated algorithms. Even very partial data can be enough for the model to decide whether the transaction is fraudulent.
Strong consistency use cases
Strong Consistency use cases solve a different challenge. Here, the data provided to the application/user must always be current and accurate, even if there’s a delay in providing the data. Missing a Write, or providing stale or inaccurate data, is never an option.
In technical terms, Strong Consistency means:
Low latency Reads
Reads always return the correct, most recent version, consistent across geographies (never stale or inaccurate)
High latency Writes are OK
Never lose a Write
Let’s examine three Strong Consistency use cases: money-related actions, inventory management, and social graphs.
Money-related actions
The record for anything money-related must always be accurate and up to date. It can never be stale or incorrect.
Here’s an imaginary scenario. You live in Denver, Colorado, and your spouse is in Tokyo on business. Your spouse just withdrew $200 worth of Japanese Yen from your joint bank account, via an ATM in Tokyo. You both immediately and simultaneously check your bank account. One of you sees $200 more in the same account, compared to what the other sees.
If that happens with a fraudster, they can poach that money before the two systems sync. Such a scenario is catastrophic and must never happen (and it doesn’t, because money-related systems use Strong Consistency).
Money-related actions go well beyond money transfers or ATM withdrawals. They include payment systems, brokerage accounts, anything around cryptocurrency markets, and more. For all of these applications, the database (and data) is divided across multiple regions. It uses a synchronous active-active database topology, where we can write from each and every region, so nobody Reads stale data, and nobody Writes to data that’s not current. This gives money-related businesses a highly robust solution because the data is always current.
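As a rough illustration of what "synchronous" means here, the sketch below acknowledges a Write only once every replica can apply it, and rejects it otherwise. It is a deliberately simplified model: `Replica`, `WriteUnavailable`, and `sync_write` are hypothetical names, and production systems (Aerospike included) rely on far more elaborate protocols than an all-or-nothing check.

```python
class Replica:
    """Toy regional copy of the account data."""

    def __init__(self, name: str, reachable: bool = True):
        self.name = name
        self.reachable = reachable
        self.data: dict[str, int] = {}


class WriteUnavailable(Exception):
    """Raised when a write cannot reach every replica."""


def sync_write(replicas: list[Replica], key: str, value: int) -> None:
    """Acknowledge a write only if every replica can apply it."""
    # Check first, so a rejected write leaves no replica partially updated.
    down = [r.name for r in replicas if not r.reachable]
    if down:
        raise WriteUnavailable(f"unreachable replicas: {down}")
    for replica in replicas:
        replica.data[key] = value


if __name__ == "__main__":
    denver, tokyo = Replica("denver"), Replica("tokyo")
    sync_write([denver, tokyo], "joint_balance", 1000)
    sync_write([tokyo, denver], "joint_balance", 800)  # the $200 withdrawal
    print(denver.data["joint_balance"], tokyo.data["joint_balance"])  # 800 800
```

Both regions see the same balance as soon as the Write is acknowledged. The cost is the trade-off described at the start of this post: the Write waits on every region, and when a region is unreachable the Write is refused rather than risking divergence.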
Inventory management
Your favorite retailer just announced a special sale. Excited, you go to their website and order a new coat. A few days later you get an email from the retailer. They apologize for a mistake that took place in their inventory system – they actually ran out of stock, and hence cannot deliver your new coat (yes, they’ll refund you, but that doesn’t compensate for your frustration). This is a hypothetical example that should never happen in real life.
Inventory must always be accurate, at each and every split second. Inventory, by the way, also includes ticketing systems – the inventory in that case is seats/services. If the inventory is inaccurate, then the retailer may sell a product they don’t really have; an airline may sell a seat that was already sold; and the same for the movie/concert hall, selling the same seat to more than one customer.
Imagine a special sale that attracts many thousands, perhaps millions of shoppers; or a new concert tour announced by a highly popular artist, which attracts a huge number of fans. They all compete for products/seats at the exact same time. To prevent selling something that doesn’t exist (i.e. it was already sold), inventory systems always use Strong Consistency, so the data shown is always accurate and up to date.
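The core requirement, that checking stock and decrementing it happen as one indivisible step, can be shown with a single-process analogue. This sketch uses a thread lock in place of a distributed database's consistency guarantees; `Inventory` and `run_sale` are hypothetical names chosen for illustration.

```python
import threading


class Inventory:
    """Toy single-node stock counter with an atomic reserve operation."""

    def __init__(self, stock: int):
        self._stock = stock
        self._lock = threading.Lock()

    def reserve(self) -> bool:
        # Check and decrement under one lock, so two buyers can never
        # both claim the last unit.
        with self._lock:
            if self._stock == 0:
                return False
            self._stock -= 1
            return True


def run_sale(stock: int, shoppers: int) -> int:
    """Let `shoppers` threads compete for `stock` units; return units sold."""
    inventory = Inventory(stock)
    results: list[bool] = []
    results_lock = threading.Lock()

    def shop() -> None:
        ok = inventory.reserve()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=shop) for _ in range(shoppers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print(run_sale(stock=5, shoppers=50))  # 5: never more than the stock on hand
```

However many shoppers compete, the number of successful reservations never exceeds the available stock. A strongly consistent database has to provide the same guarantee across servers and regions, not just across threads.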
Social graphs
Are you using Facebook, LinkedIn, Snapchat, or any other app that is built around your interconnections with other people, groups, and organizations? A social graph is the depiction of these connections. Using Facebook as an example, the social graph includes all of your friends, the people and organizations you follow, etc. Social graphs also exist in gaming platforms, where you’re connected to your friends, and also in other modern apps.
Why is it critical for social graphs to never miss a Write, and always show current data? The reason is that determining whether you are friends with someone, or not friends with that person, makes a huge difference. Let’s look at a simple example.
You decide to hold a party. You want to invite all your Snapchat friends who live in your area, say Toronto. You review the list of your friends and realize that Mike is still there. However, you had a major fight with Mike a couple of weeks ago and no longer view him as a friend. Obviously, you don’t want to invite him to your party. You unfriend Mike on Snapchat, and send an invitation to all your friends, announcing the big party you’ll hold next weekend.
Imagine that Write to the database, unfriending Mike, gets lost or is stale. Mike may show up at your party because your social graph was not updated and Mike received your invitation. This is one small example; there are many other scenarios, including potentially dangerous ones. For that reason, Writes related to social graphs cannot be lost or stale.
In conclusion
Above were just a handful of use cases, to demonstrate the difference between High Availability and Strong Consistency database settings. The Aerospike database is one of a few that supports both of these settings; the specific implementation will depend on your specific use case.
What is your use case? Is it one of those mentioned above, or a different one? Contact us to discuss your specific use case, and to learn how Aerospike can help you solve your business challenges related to managing your data.
More about this topic
Aerospike’s Multi-site Clustering for Synchronous Active-Active Replication
Aerospike’s Cross-Data Center Replication for Asynchronous Active-Active Replication