Customer Story

Trading predictions with ML and large-scale analysis

About the company

Leading quantitative research firm

This leader in algorithmic trading strategies applies the latest methods in scientific data analysis to making new discoveries, predicting movements in global financial markets, conducting arbitrage, and identifying investable trends and global macro strategies. The company’s researchers perform deep analysis of large and often noisy historical and current market time series data, enriching it with third-party sources and calculating ‘implied’ pricing for less liquid or rarely traded securities. Using rigorous scientific methodology, including artificial intelligence, neural networks, natural language processing (NLP), deep learning, robust statistical analysis, and pattern recognition, they analyze an extensive and varied global financial data ecosystem to extract deep insights from truly massive datasets.


Moving and working on massive datasets at millisecond latencies

In the world of finance, few organizations move faster than quantitative research firms. With their SQL server and caching layer, this research firm struggled to obtain reliable data in a timely fashion and keep up with increasing data volumes. Data came in from multiple sources and often needed to be cleaned or otherwise reconciled. The existing system could not meet demand or deliver the efficiencies needed to keep costs manageable. For algorithmic traders like this firm, milliseconds matter, and a machine learning model is only as good as the amount and accuracy of the data it is fed.

The firm knew it was time to make a major change to accommodate the scale and speed needed to move more data, more quickly, to where it was needed to do the work. They needed a modern, optimized, high performance, scalable system of record that prioritized efficient storage and retrieval of data.

Requirements included:


Monitor 200,000 products, each with 200 unique attributes, trading on different global exchanges, with data points refreshed every five minutes

Ability to handle nearly 13.9PB of data

Massive data volumes, with 10 versions of each data point and 20 years of data kept online

Zero downtime, high performance

Low latency, high throughput, and compute on data in place, with the resiliency to keep pace with global markets

Efficient hardware

Highly efficient use of hardware to lower costs and shrink the server footprint
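The scale figures above can be roughly reconciled with a back-of-envelope calculation. This is a minimal sketch, not a published sizing: the product, attribute, refresh, and retention numbers come from the requirements above, while the average record size is a hypothetical assumption chosen to show how the totals could add up.

```python
# Back-of-envelope estimate of the data volume described in the requirements.
# BYTES_PER_RECORD is an assumed average on-disk size, not a published figure.
PRODUCTS = 200_000
ATTRIBUTES = 200
REFRESHES_PER_DAY = 24 * 60 // 5   # one data point refresh every 5 minutes
YEARS_ONLINE = 20
BYTES_PER_RECORD = 165             # assumption: average stored size per data point

records = PRODUCTS * ATTRIBUTES * REFRESHES_PER_DAY * 365 * YEARS_ONLINE
petabytes = records * BYTES_PER_RECORD / 1e15
print(f"{records:.2e} data points, ~{petabytes:.1f} PB")
```

Treating every five-minute refresh over 20 years as a stored record yields roughly 8.4 x 10^13 data points; at the assumed ~165 bytes each, that lands near the 13.9PB figure quoted above.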

Superior, predictable active-active performance and scale

The firm chose Aerospike over other NoSQL options because of the advantages provided by Aerospike’s Hybrid Memory Architecture (HMA). By keeping only indexes in memory and persisting data to SSD, they could count on Aerospike for predictability, superior performance, and TCO savings, all with a significantly smaller server footprint.

Aerospike provided:

1 million transactions per second

Predictable performance and high write throughput of 1M TPS at low latency, with 95% of reads and writes completing in under 1ms on nine cloud-based VMs.

Easy scalability

Initial testing proved linear scalability beyond 1PB while maintaining performance, with the potential to handle 13.9PB of data, maintain 10 versions of each data point, and keep 20 years of data online.

Enterprise-grade security

Analysis by InfoSec experts determined there were no viable attack vectors within Aerospike, allowing the firm to stay protected without sacrificing performance.

Dramatically smaller footprint

With the hardware efficiency of Aerospike’s hybrid memory architecture and data compression opportunities, the firm could compress data by more than 90%, shrinking one data set from a footprint of 260MB to 17.5MB.
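The compression figures quoted here can be sanity-checked directly; a 260MB-to-17.5MB shrink works out to roughly a 93% reduction:

```python
# Verify the compression ratio implied by the quoted data set sizes.
original_mb = 260.0    # data set footprint before compression (MB)
compressed_mb = 17.5   # footprint after compression (MB)

reduction = 1 - compressed_mb / original_mb
print(f"Reduction: {reduction:.1%}")  # roughly 93.3%
```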

Global active-active deployment

With Aerospike’s node awareness features, each node knows what data all the other nodes contain, regardless of datacenter geographic location.

Lower costs

The firm can now keep 20 years of data online, update it every five minutes, and count on predictable expenses as data storage needs grow.

A system of record and engagement today and in the future

A year after implementation, the research firm is pleased with how well and reliably Aerospike performs the work they need to do. Used as both a system of record and a system of engagement, Aerospike has enabled them to handle double the time series data they originally started with, gain instant access to data at both data center locations, and interact with the data in new ways.

The firm appreciates being able to perform rolling upgrades during office hours with zero interruptions and that Aerospike fits their security model requirements. Looking to the future with Aerospike, they plan to expand their use of the database to other teams.

10-20% increased accuracy based on refreshed data

Able to keep up to 20 years of data online and update every five minutes

Predictable expenses as data storage needs grow

Able to plan expenditures due to Aerospike’s efficient storage and predictable performance

Scalable beyond 1PB while maintaining performance

Projected to handle 13.9PB of data by keeping indexes in memory and persisting data on SSD

Over 1 million transactions per second

Able to complete 95% of reads and writes in under 1ms

Surpass limits with Aerospike