Turning significant, complex inputs into fast and simple decisions
Some data workloads don’t require real-time results. And not all applications pull data from dozens of diverse sources or require the flexibility of different data models. But a rapidly increasing share of organizations do need this performance and flexibility. In this blog, we examine one use case that illustrates the need for a real-time, multi-model database: fraud detection. As consumers, we see evidence of fraud and fraud detection daily and can appreciate the complexity and need for speed with each retail and financial interaction.
Fraudsters innovate: financial fraud models must adapt, databases must be flexible
Fraud and fraud detection are complex and constantly evolving because fraudsters use sophisticated tactics and technology advances rapidly. Typically, detection must happen in under a second, and accessing and processing data can only consume a few hundred milliseconds of that. The longer it takes to load the data, the less time there is to apply the fraud detection algorithms.
Detection requires analyzing large amounts of data that can come in varied forms – transaction data, user profile data, device data, geolocation data, and behavioral data, to name just a few types. This data can be structured or unstructured and must be analyzed over a period ranging from a few days to months to identify anomalies uncharacteristic of genuine customer transactions. This data has to come together in a machine learning model that assesses the validity of a transaction in real time.
The flexibility of the underlying database is critical when it comes to incorporating the diverse data types required for fraud models. Aerospike can handle the different data types for fraud models, including document data like JSON, Protobuf, or Avro. It uses Collection Data Types (CDTs) and stores each row as a document, allowing the flexibility to model lists, maps, and sets containing any number of nested columns and fields. Relational models and columnar databases are less suited to handle this.
To illustrate the value of a document model and CDTs, consider a fraud detection model whose inputs include the customer’s average spending over the past 24 hours and the past 30 days. Because these windows slide with the instant of fraud assessment, the averages cannot be pre-computed. The application would need to read all of that customer’s transactions for the past 30 days: upwards of 100 reads for some customers, and potentially thousands for the outliers with the highest fraud risk. Even at 1 millisecond per read, 100 reads issued one after another take 100+ milliseconds, which alone could be prohibitively slow for fraud detection.
CDTs, however, let you build real-time aggregations to minimize the reads to a predictable number. For example, daily or more granular transaction amounts and counts can be stored within each record as lists. In this example, the number of reads would be in the single digits, even for the most active accounts, significantly improving the performance of fraud detection models. Watch this to learn how this works.
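The bucketing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aerospike client code: `InMemoryStore` is a hypothetical in-memory stand-in for the database, and the record layout (one record per customer per 10-day span, holding a map of hourly sum and count buckets) is an assumed design chosen for the example. Hourly buckets also mean the window edge is approximated to the nearest hour.

```python
HOUR = 3600
HOURS_PER_RECORD = 240  # assumption: each record holds 10 days of hourly buckets


class InMemoryStore:
    """Hypothetical stand-in for a record store; counts reads to show the effect."""

    def __init__(self):
        self.records = {}
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.records.get(key, {})

    def add_transaction(self, customer_id, ts, amount):
        # Fold the transaction into its hourly bucket at write time.
        hour = ts // HOUR
        key = (customer_id, hour // HOURS_PER_RECORD)
        buckets = self.records.setdefault(key, {})
        total, count = buckets.get(hour, (0.0, 0))
        buckets[hour] = (total + amount, count + 1)


def average_spend(store, customer_id, now, window_hours):
    """Average transaction amount over the last `window_hours` hourly buckets."""
    last_hour = now // HOUR
    first_hour = last_hour - window_hours + 1
    total, count = 0.0, 0
    first_record = first_hour // HOURS_PER_RECORD
    last_record = last_hour // HOURS_PER_RECORD
    # One read per record that overlaps the window, regardless of
    # how many transactions the customer made.
    for record_id in range(first_record, last_record + 1):
        buckets = store.get((customer_id, record_id))
        for hour, (amount, n) in buckets.items():
            if first_hour <= hour <= last_hour:
                total += amount
                count += n
    return total / count if count else 0.0
```

With this layout, a 30-day (720-hour) window overlaps at most four records, so the read count stays fixed whether the customer made ten transactions or ten thousand. In a real deployment the buckets would live in a list or map inside the record, as described above, and would be updated in place when each transaction is written.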
Here are a few examples of how Aerospike’s flexibility and performance help its customers detect fraud.
PayPal: reducing financial fraud exposure by 30x
PayPal is a global online payments company that enables its users to send, receive, and hold funds. Aerospike has helped PayPal reduce fraud exposure by 30x.
Aerospike’s scalability and reliability play a vital role in PayPal’s fraud detection solution. PayPal processes billions of transactions each year, and it needs a database that can handle this volume of traffic. Aerospike’s ability to scale efficiently with its patented Hybrid Memory Architecture (HMA™), combining memory and SSDs, means that PayPal can seamlessly manage increased traffic.
PayPal initially deployed Aerospike as a key-value model, increasing the amount of data its machine learning models analyzed to 100 terabytes, a 10x increase. More recently, it layered on a graph capability to supplement its fraud model, pushing the total amount of data to petabyte scale. The graph approach helps determine linkages across data points to indicate influence, strength, and probability levels. This has effectively uncovered hidden ultimate beneficiaries or dishonest nodes in the graph. Learn more about PayPal’s graph approach here.
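To make the idea of linkages concrete, here is a small Python sketch of one generic technique: a breadth-first search that finds every node within a few hops of a known fraudulent account. It is an illustration only and does not describe PayPal’s implementation or any Aerospike graph API. The node names, the edge list, and the use of hop count as a rough proxy for linkage strength are all assumptions made for the example.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def hops_from_flagged(graph, flagged, max_hops):
    """Return {node: hop distance} for nodes within max_hops of any flagged node."""
    dist = {node: 0 for node in flagged}
    queue = deque(flagged)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical data: accounts linked through shared devices and cards.
edges = [
    ("acct:A", "device:1"),
    ("acct:B", "device:1"),
    ("acct:B", "card:9"),
    ("acct:C", "card:9"),
    ("acct:D", "device:2"),
]
linked = hops_from_flagged(build_graph(edges), ["acct:A"], max_hops=2)
```

Here `acct:B` turns up two hops from the flagged `acct:A` because both used `device:1`, while `acct:D` shares nothing with it and never appears in the result.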
Barclays: efficient architecture for machine learning at scale
Barclays provides businesses and consumers with a wide range of financial products and services worldwide. Barclays uses Aerospike to power its card fraud detection solution, which incorporates floating window calculations as described above.
Barclays processes millions of transactions daily, and its fraud detection solution takes advantage of the Aerospike database’s scalability and reliability. It uses Aerospike HMA to handle this volume in real time, storing indexes in RAM and data on disk. As a result, Aerospike has simplified Barclays’ architecture, reducing the number of platforms needed while seamlessly handling 6x data growth.
Aerospike has helped Barclays reduce latency in its fraud detection while delivering the strong consistency and security requirements essential to Barclays. Learn more about how Aerospike is helping Barclays fight financial fraud.
TransUnion: 80% latency reduction to improve banks’ financial fraud models
TransUnion is a global information and insights company that provides a wide range of services to businesses and consumers, including credit reporting, identity theft protection, and financial fraud prevention.
TransUnion uses Aerospike to power its fraud detection solution. Aerospike’s scalability, reliability, and security make it a good fit for this demanding application.
TransUnion’s fraud detection solution uses a variety of data sources, including transactions, user profiles, devices, IP address intelligence, geolocation, telecommunications, and behavioral data. Aerospike stores all this data in a single database, making it easy to analyze different data points and identify fraudulent activity.
As a result of using Aerospike, TransUnion has been able to securely incorporate all of this data into its customers’ real-time fraud detection models. Aerospike’s flexible architecture supports this at scale, cutting the latency by 80%.
Importance of real-time and multi-model
Aerospike has a strong reputation for real-time performance at scale. Our roots are as a key-value database, but we continue to expand the data models we support, driven by our real-time world and customer input. We offer strong support for document-based models and the ability to access Aerospike data with SQL, and we continue to work on graph and time series.
This adds up to a powerful, multi-model database that these and other customers use to detect financial fraud in real time. When you can load 10x the data in 1/10th the time, you can increase machine learning algorithm accuracy to reduce both fraud and customer delays.
Hear from TransUnion
Gaurav Bairaria, TransUnion’s Senior Director of Product Engineering, has been instrumental in delivering its fraud detection solution, TruValidate. Please check out Gaurav and Aerospike CTO and Founder Srini Srinivasan on demand to learn more about TransUnion’s journey and impact. Register here.
Additional Resources
Analyst Report:
Webinar On Demand:
How a Better Fraud Data Layer can Serve as a Competitive Advantage
White Paper:
The Rise of Payments Fraud: How to Supercharge Fraud Engines to Beat it!