Rethinking the role of MongoDB and AWS in a real-time architecture

August 22, 2023 | 7 min read

This guest blog is written by Denis Anoykin and Alexey Emelyanov of Aerospike implementation partner Asteriosoft, a systems integrator based in Montenegro, based on their experience with a European Ad Technology customer.

In the early stages of developing information systems, the focus is often on speed and supporting multiple use cases rather than on the quality of the architecture. Investors expect a minimum viable product (MVP) quickly, followed by meeting target metrics.

Development is usually outsourced, and communication between the owners and the development team is not optimal. The team tries to cut costs by hiring less experienced professionals, while the owner aims to create as much functionality as possible without setting clear priorities, as they respond to market changes.

Over time, this approach accumulates technical debt, leading to increased expenses for infrastructure maintenance and software development, and slowing down the entire process. Eventually, the system reaches a point where it cannot progress further without significant increases in equipment and development costs.

When this point is reached, the business owner and the development team often disagree on how to resolve it. The owner believes that the system can evolve within the existing architecture, although the dev team may lack the experience needed to handle heavy traffic and large volumes of data. The dev team, on the other hand, believes that the existing architecture no longer fits the requirements after the rapid expansion of system functionality, and insists on a complete rebuild of the system. This is where the assistance of a third party becomes necessary to conduct an architectural analysis and propose an evolutionary solution that does not require rewriting the entire system.

In this article, we will discuss a real example where architectural analysis helped solve accumulated issues in a large AdTech project.

Initial design and function

Initially, the project used PHP and ran on the AWS cloud. MongoDB served as the primary storage for business data and statistics. The system operated on multiple on-demand servers, with MongoDB and the PHP-based user interface hosted on them (see figure 1 below). Spot instances were used to handle incoming requests, and their quantity scaled automatically based on workload. Each spot instance contained PHP code for processing queries.

Figure 1: Initial architecture on AWS for the AdTech ad serving and bidding use case

The system had two main types of requests: displaying text ad banners and processing clicks on those banners. Each ad banner impression had a unique ID. When a user clicked on a banner, the system looked up the corresponding impression in the database by that ID, executed the business logic, and redirected the click to the advertiser’s website. MongoDB stored all details about impressions and clicks for generating analytical and financial reports, so it was also involved in every click, which required searching for impression details by ID.
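The click path described here is essentially a key-value access pattern: an impression is written once under its unique ID and read back by that ID when a click arrives. The following Python sketch illustrates the pattern only. The in-memory dictionary stands in for the database, and every name in it (`ImpressionStore`, `record_impression`, `handle_click`, the `landing_url` field) is hypothetical rather than taken from the actual system.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ImpressionStore:
    """Stand-in for the database: maps impression ID -> impression details."""
    _records: Dict[str, dict] = field(default_factory=dict)

    def put(self, impression_id: str, details: dict) -> None:
        self._records[impression_id] = details

    def get(self, impression_id: str) -> Optional[dict]:
        return self._records.get(impression_id)


def record_impression(store: ImpressionStore, campaign_id: str, landing_url: str) -> str:
    """Store the details of one banner impression and return its unique ID."""
    impression_id = uuid.uuid4().hex
    store.put(impression_id, {"campaign_id": campaign_id, "landing_url": landing_url})
    return impression_id


def handle_click(store: ImpressionStore, impression_id: str) -> Optional[str]:
    """Look up the impression by ID and return the advertiser URL to redirect to."""
    details = store.get(impression_id)
    if details is None:
        return None  # unknown or expired impression: nothing to redirect to
    # Business logic (billing, statistics, and so on) would run here.
    return details["landing_url"]


store = ImpressionStore()
imp_id = record_impression(store, "campaign-42", "https://advertiser.example/landing")
redirect_url = handle_click(store, imp_id)
```

Because each click needs exactly one lookup by primary key, the latency of that single read largely determines how quickly the click can be processed.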

Cost problems at scale with MongoDB

Initially, the system performed well under low traffic, and the main focus was on developing the user interface and adding new parameters for advertising campaigns. The total infrastructure cost was about $10K per month, the UI responded quickly, reports were generated in seconds, and clicks were processed in less than a second. It was assumed that MongoDB could handle all requests by scaling. However, as traffic increased to 700 million requests per day, the infrastructure costs for the MongoDB cluster exceeded $30K per month, which was far beyond the allocated budget. Additionally, under high load, database searches for impression details by ID slowed dramatically. As a result, report generation took several minutes, and clicks were no longer processed in real time, which degraded the user experience.
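To put the daily figure in perspective, it can be converted into an average request rate. The short Python calculation below does only that. It assumes traffic is spread evenly over the day, which real ad traffic is not, so peak rates would be higher than the average shown.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: int) -> float:
    """Average requests per second, assuming traffic is spread evenly over the day."""
    return requests_per_day / SECONDS_PER_DAY


avg = average_rps(700_000_000)
print(f"{avg:,.0f} requests per second on average")  # prints "8,102 requests per second on average"
```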

The indexes in MongoDB grew to several terabytes due to the large number of records (around 30 billion). System crashes occurred when the number and volume of queries exceeded the servers’ capabilities. To cope with the traffic, developers had to severely limit report generation and focus only on searching impression details by ID. Horizontal and vertical scaling of MongoDB provided temporary relief, but the indexes had to be rebuilt repeatedly because new documents were constantly being added and old ones deleted.

Initial attempt with Redis proved even more expensive

The developers attempted to use Redis for indexing document searches, but it required a massive amount of RAM, leading to increased infrastructure costs. Since this attempt failed, the developers suggested scaling MongoDB further to handle all requests and reports in real time. However, calculations showed that approximately $50K per month was needed for the system to function properly, an amount the business owner was reluctant to invest without a solid solution.

New alternative: Migrating MongoDB workload to Aerospike

The system owner started looking for alternatives and approached Asteriosoft for possible optimizations in the code and MongoDB queries. After investigating the situation, Asteriosoft experts suggested conducting an in-depth analysis to find ways to improve the existing architecture. This analysis would help resolve the conflict between the owner and the development team by proposing a solution that satisfied both parties without a costly rebuild of the whole system.

Asteriosoft conducted research and proposed a solution to relieve MongoDB from the overload of analytical reports and document searches by ID. While MongoDB is suitable for storing business data, it is not the best option for key-value storage or statistical data storage. Specialized solutions like Redis, Aerospike, or Memcached offer faster response times for key-value storage. Similarly, appropriate tools should be used for storing and processing statistical data.

As the system developers correctly noted, Redis and Memcached were not viable options in this case due to the high RAM requirements for storage. Aerospike is a better choice here because it stores data on SSDs instead of in RAM without a significant loss in query processing speed. Rather than accessing storage devices through the operating system’s file system, Aerospike goes directly to the NVMe driver for the device and treats it as a raw block device. This allows it to use SSD capacity while achieving near in-memory access speeds.

The impression and click storage was successfully migrated to Aerospike, and searches for records by ID were performed through this database. Click processing time returned to tens of milliseconds, and only a few Aerospike servers with large SSD capacities were needed. Aerospike provides client libraries for multiple programming languages, making it easy to convert the PHP code to use Aerospike instead of MongoDB. The PHP code in the handlers was later replaced with Java code, reducing the number of servers required for request processing. For report generation and analytics, Amazon Athena was suggested, which processed data stored in Amazon S3.

Figure 2: Revised Architecture

As a result, all impression and click records were removed from MongoDB, which now operated on a single server managing only business data. Several new Aerospike servers were introduced, requiring significantly fewer resources than before the architectural revision. Analytical reports were successfully transitioned to Amazon Athena. After the architecture restructuring (see figure 2), infrastructure costs decreased to $5K per month, query processing throughput increased, and analytical reports could be generated at any time. Furthermore, the client was able to increase traffic to 1 billion requests per day without any additional infrastructure investments.
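One way to compare the two states is cost per million requests. The Python sketch below is a rough calculation using only the figures quoted in this article. It assumes a 30-day month, and it treats the $30K figure as the "before" cost even though that number covered the MongoDB cluster alone, so the real difference may be larger.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month


def cost_per_million(monthly_cost_usd: float, requests_per_day: int) -> float:
    """Infrastructure cost in USD per one million requests."""
    monthly_requests = requests_per_day * DAYS_PER_MONTH
    return monthly_cost_usd / (monthly_requests / 1_000_000)


# Before: more than $30K per month (MongoDB cluster only) at 700 million requests per day.
before = cost_per_million(30_000, 700_000_000)
# After: $5K per month at 1 billion requests per day.
after = cost_per_million(5_000, 1_000_000_000)

print(f"before: ${before:.2f} per million requests")  # prints "before: $1.43 per million requests"
print(f"after:  ${after:.2f} per million requests")   # prints "after:  $0.17 per million requests"
```

Under these assumptions, the cost per million requests dropped by more than a factor of eight.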

This example demonstrates how architectural analysis can address accumulated problems and reduce infrastructure costs. At larger and larger scales (in terms of data volumes and transactions per second), anything that degrades performance, latency, or throughput is multiplied to the point where the system no longer makes technical or financial sense. These problems are difficult to resolve without the kind of deep architecture review that Asteriosoft performs.

The example also emphasizes the importance of using specialized tools instead of relying on generic solutions. The developers originally believed that MongoDB could efficiently serve as a store for business data, a key-value store, and a repository for statistical data all at once. However, this led to increased infrastructure costs and degraded system performance.

Solutions like Aerospike help solve accumulated problems by employing specialized and unique algorithms. Proper utilization of these tools is best left to specialists who can conduct architectural analysis and assist business owners in making their software systems more efficient while reducing costs.