
Seven case studies where Cassandra performance fell short

Explore seven case studies where companies switched from Cassandra to Aerospike. Discover how these transitions transformed business operations and enhanced customer satisfaction.

George Demarest
Director of Product Marketing
September 10, 2024 | 10 min read

As companies continue to seek more cost-effective technical solutions, Cassandra’s open source licensing makes it an attractive option. However, despite the absence of licensing fees, many organizations find that running Cassandra clusters drives up operational expenses because of performance issues, especially around latency, throughput, and consistency. In these seven case studies, we explore how companies struggled with Cassandra’s challenges, including monitoring inefficiencies and scalability concerns, which ultimately led them to migrate to Aerospike for better performance and cost efficiency.

1. TransUnion TruAudience: Superior TCO and operations

The challenge: Credit reporting company TransUnion used its marketing platform TruAudience for identity resolution, among other capabilities. Its existing data store ran on a Cassandra database and was becoming increasingly expensive, unreliable, and underperforming, hurting TransUnion’s profitability. The company faced large and unpredictable data latencies, frequent database incidents, and downtime that challenged many of its business processes.

While the Cassandra open-source software was free, operating it was not. Maintaining Cassandra clusters was difficult and time-consuming, diverting resources from higher-value projects. Performance was unacceptably inconsistent, with p99 latency of nearly four seconds, and low reliability led to a growing number of incidents and outages. Together, these problems caused TransUnion to miss SLAs with its customers.

“Before Aerospike, we were spending more and more of our time on the care and feeding of Cassandra, and less and less time building new product offerings,” says Jason Yanowitz, Executive Vice President and Chief Technology Officer for TransUnion TruAudience. “With Aerospike, we’ve now cleared the roadmap and we’re just focused on adding new functionality to our platform for our customers.”

The solution: TransUnion replaced Cassandra with Aerospike, reducing its server count from 450 to 60. This shrank TransUnion’s operational footprint and infrastructure expenses, supported real-time data replication across regions, improved its p99 SLAs to under 10 milliseconds (ms) for reads and under one second for writes, and improved overall reliability and uptime.

The outcome:

  • 68% TCO reduction over three years

  • Performance improved 100x at the 99th percentile

  • Business processes executed 90%+ faster
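The p99 figures in this case study refer to 99th-percentile latency: the smallest value that at least 99% of requests complete at or under. As an illustration only, here is a minimal Python sketch using the nearest-rank method; the sample data is hypothetical and not taken from TransUnion. It shows how a handful of slow outliers can leave the median untouched while dominating p99.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Return the p-th percentile of the samples using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 slow outliers.
samples = [5.0] * 98 + [3800.0, 3900.0]
print(percentile(samples, 50))  # 5.0
print(percentile(samples, 99))  # 3800.0
```

This is why tail latency, rather than an average, is the usual basis for latency SLAs.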

Read the user story “Debunking the free open source myth” to find out more about TransUnion’s experience. 

2. LexisNexis Risk Solutions: Achieving scale while saving costs

The challenge: LexisNexis Risk Solutions, formerly ThreatMetrix, performs fraud detection for its customers, typically in e-commerce and financial services. Its existing Cassandra cluster struggled to meet low-latency requirements, often leaving customers dissatisfied. With performance falling short, LexisNexis sought a database solution with higher throughput and more reliable performance.

With Cassandra, LexisNexis Risk Solutions faced performance challenges, particularly around latency and the consistency of response times. Low-latency responses matter because customers need to make the right fraud decisions quickly: the longer an operation takes on a website, the more shoppers abandon their carts, which costs customers money.

“We were basically running against the wall with our previous database,” says Matthias Baumhof, Vice President of Worldwide Engineering, LexisNexis Risk Solutions. “The request duration was hurting our customers because they couldn’t make a risk decision within the allocated time limit.”

The solution: Migrating LexisNexis Risk Solutions from Cassandra to Aerospike reduced the number of servers from 96 to 28. It also cut average latency from 120 ms to 30 ms.

The outcome:

  • 3.4x reduction in server footprint

  • 4x reduction in average request latency

  • $3.3M cost savings
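The first two multipliers in this outcome list follow directly from the before-and-after figures quoted in the solution. A quick Python check, using only those published numbers (the helper name `reduction_factor` is just for illustration):

```python
def reduction_factor(before: float, after: float) -> float:
    """How many times smaller `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Figures quoted in the LexisNexis Risk Solutions case study.
servers = reduction_factor(96, 28)   # server count before and after
latency = reduction_factor(120, 30)  # average latency in ms before and after
print(f"servers: {servers:.1f}x, latency: {latency:.1f}x")  # servers: 3.4x, latency: 4.0x
```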

Watch the video "Replacing Cassandra: Digital transformation at the world’s largest digital identity network" to learn more about LexisNexis’ migration story.

3. The Trade Desk: Running hot and cold data

The challenge: The Trade Desk is an advertising technology company providing a self-service omni-channel platform through which media buyers purchase digital advertising. 

The Trade Desk keeps a vast pool of data in long-term storage. It used Cassandra as its cold store, but the data structures that gave Cassandra the high write throughput it needed were less effective for some of its read use cases. To reach the required write rate, the company had to rely on compression and tombstoning, which consumed a lot of CPU relative to the size of the data.

“To get the throughput that we needed [with Cassandra], we needed to scale the number of machines to a high number of machines with a lot of CPU compared to the disk they had,” says Matt Cochran, Director of Engineering, The Trade Desk. “Aerospike gave us another alternative.”

The solution: Moving to Aerospike meant The Trade Desk changed its architecture to combine a hot cache with a cold store, providing it with more bid opportunities and a more efficient infrastructure.
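The article doesn’t describe The Trade Desk’s implementation in detail. As a rough illustration of the general hot-cache-over-cold-store pattern, here is a minimal Python sketch, assuming an in-memory LRU hot tier and a plain dict standing in for the cold store. `TieredStore` and its methods are hypothetical names, not Aerospike’s client API.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredStore:
    """Hypothetical hot/cold tiering: a bounded LRU hot cache in front of a
    larger cold store. Reads that miss the hot tier "thaw" the record from
    the cold tier and promote it."""

    def __init__(self, hot_capacity: int) -> None:
        if hot_capacity < 1:
            raise ValueError("hot_capacity must be >= 1")
        self.hot_capacity = hot_capacity
        self.hot: OrderedDict[str, bytes] = OrderedDict()
        self.cold: dict[str, bytes] = {}  # stand-in for a disk-backed store

    def put(self, key: str, value: bytes) -> None:
        # Writes land in the cold store, which acts as the system of record.
        self.cold[key] = value
        # Keep any cached copy consistent with the new value.
        if key in self.hot:
            self.hot[key] = value
            self.hot.move_to_end(key)

    def get(self, key: str) -> bytes | None:
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        value = self.cold.get(key)
        if value is not None:
            self._promote(key, value)  # "thaw" the record into the hot tier
        return value

    def _promote(self, key: str, value: bytes) -> None:
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)  # evict the least recently used entry


store = TieredStore(hot_capacity=2)
store.put("user:1", b"profile-1")
print(store.get("user:1"))  # b'profile-1'
```

In a real deployment the cold tier would be disk- or SSD-backed and the hot tier sized to the working set; LRU is only one common eviction choice.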

The outcome:

  • Reduced number of servers from 300+ to 60, a 9X reduction in node count

  • Takes less than 8 ms to thaw data from cold store

  • Writes 30 million key-value tuples/sec into 1 PB cold cache

For more details, read The Trade Desk user story.

4. Adform: Quadrupling scale while halving nodes

The challenge: After stabilizing its Cassandra environment, Adform’s IT team faced the daunting task of scaling the cluster fourfold. But as Cassandra’s latency and throughput degraded, it became clear that the system would require extensive monitoring and tuning, overhead that was diverting resources from more strategic projects.

Consultants told engineers at Adform, a European ad-serving software company, that they must be doing something wrong if they were having problems with Cassandra. But the team was already spending all its time tuning the Cassandra system to keep it stable, just as the company wanted to launch a major expansion.

“With Aerospike, we have been able to drastically cut down on the number of Cassandra servers, which provided a great cost reduction,” says CTO Jakob Bak. “Even more important is the super fast key-value store and extraordinary predictability we get with Aerospike, providing the responsiveness our clients require to compete in the crowded Internet and mobile markets.”

The solution: Rather than scale its Cassandra environment fourfold, Adform moved from a 32-node Cassandra cluster to a 3-node Aerospike cluster in each of its two data centers.

The outcome:

  • Managed 1 TB of data on Intel SSDs

  • Processed 120k reads per second and 8k writes per second

  • Handled 200k TPS peaks

Read the Adform customer story, "Adform divorces Cassandra: Scales performance by 4x with 2x fewer servers," to learn more.

5. Wayfair: A technology company that sells furniture

The challenge: Wayfair’s AdTech group required a database with low latency, scalability, and high reliability. Its Cassandra cluster required extensive tuning of the Java virtual machine (JVM), making operations complex and resource-intensive, and the company struggled to maintain consistent read performance, which led it to seek a high-performance alternative.

JVM tuning is a specialized skill, and Wayfair had trouble finding employees who had it. In addition, because each Cassandra node could handle only a limited amount of data, Cassandra’s reliance on horizontal scaling meant Wayfair needed more servers.

“With Cassandra, there’s a lot more configuration and tuning needed,” says Ken Bakunas, Wayfair’s NoSQL Data Architect. “Out-of-the-box, Aerospike pretty much just required a few changes and you’re good to go.”

The solution: When Wayfair switched to Aerospike, the result was faster read and write operations at a lower cost. Over three years, its combined licensing, hardware, operational, and data center real estate costs were lower.

The outcome:

  • A server reduction from 60 to 7

  • An individual on-premises cluster averages about 100k reads and 20k writes per second, occasionally bursting to 1 million per second

  • Almost all latencies are under one millisecond

Watch Wayfair’s Aerospike Summit video "Moving sofas in millisecond time." 

6. InMobi: Powering a mobile advertising platform

The challenge: InMobi is a global mobile advertising network that serves user requests from more than 100 countries. Speed and low latency are vital considerations. 

The project had three basic requirements: handle a heavy write load of up to 500k requests per second, handle a heavy read load, and provide redundancy and other failure handling out of the box without much operator intervention. All of this had to happen in real time. With Cassandra, the company found it had to make tradeoffs it didn’t want to make.

“We evaluated other technologies like HBase and Cassandra but found that they required significant performance tuning,” says InMobi’s Senior Vice President of Technology, Sachin Kanodia. “In contrast, Aerospike had nearly zero operational overhead and worked out of the box.”

The solution: Aerospike’s low latency means InMobi responds to ad requests faster and makes better decisions quickly, resulting in a better user experience and higher conversion rates for its advertisers. Cross-data center synchronization through Aerospike’s Cross-Datacenter Replication (XDR) feature, along with Aerospike’s customer support, was also crucial.

The outcome:

  • Ingests 10 to 12 billion events daily

  • Database delivers user data in less than 5 ms

  • Responds to most ad requests within 30–50 ms

Interested in learning more? Read "How InMobi took the lead in mobile AdTech" for details.

7. Medical device manufacturer: Keeping its data pumping

The challenge: A multinational medical device company and its partner produce a continuous blood glucose-monitoring solution featuring wearable sensors that read glucose levels and send the data directly to a reader and a mobile tracking app on a smartphone. New glucose readings are delivered wirelessly to a cloud system so that conditions such as diabetes can be monitored in real time. As a result, database performance and availability are critical success factors.

The company’s previous database, based on Cassandra, suffered from performance bottlenecks, particularly in terms of replication factor and read latency. It also ran into problems when nodes went down, delaying health messages until the nodes could be restored from a backup. Developers therefore had to invest significant time writing code around the database to ensure that data would be replicated.

“There’s no such thing as a quiet environment in our business,” says the company’s general manager of cloud solutions. “In the old days, when volume wasn’t very high, we could count on late-night maintenance to fix any problems. But today, we can’t afford to have any part of the system down at all.”

The solution: Aerospike’s high availability architecture sold this multinational medical device company on the overall solution. In addition, Aerospike “killed Cassandra” in terms of performance.

The outcome:

  • 40,000 objects at any moment

  • 10 terabytes of unique data

  • Costs reduced by “at least a factor of 2”

Check out the customer story, "Multinational medical device company now can process and deliver health measurements quickly, accurately, and globally," to learn how Aerospike kept it up and running.

Ready for a replacement?

These seven case studies show where Cassandra’s performance fell short and suggest that alternative data solutions may deliver higher performance, greater efficiency, and substantial cost savings. By thoughtfully selecting the right data management tools, businesses can drive transformation and maintain a competitive edge in the market.

Five signs you have outgrown Cassandra

Does your organization offer real-time, mission-critical services? Do you require predictable performance, high uptime and availability, and low TCO?

If you answered yes to one or both of these questions, it is likely that your Cassandra database solution isn’t cutting it. Check out our white paper and learn how Aerospike can help you.