Blog

Go from zero to insight rapidly with Aerospike SQL Powered by Starburst

Aerospike square logo
Product Management
June 15, 2022|6 min read

Sound business decisions lean heavily on insights, which are harnessed from the various sources of data in your organization. Generating insights is not just about creating premier dashboards or fetching targeted records but also addressing the speed of analysis. Ad-hoc queries and interactive dashboarding demand near real-time response times to be actionable, not to mention impactful. Analytics workloads are read-heavy and require low latency and high throughput. Large datasets are generally persistent in a database and are queried on a need basis. No matter how optimized your underlying SQL engine is, all performance gains are wiped out by slower reads from an underlying database.

Aerospike is a fast NoSQL database, which is extensively used for low latency and high throughput use cases in Ad Tech, Financial Services, Telco, and several other domains. With its hybrid memory architecture, it serves as an ideal database for data analysis. SQL is the lingua-franca for data analysis. Running ANSI SQL queries against Aerospike data offers a rapid and cost-effective way to generate valuable insights in your dashboards, ad-hoc queries, or Python applications.

We are excited to announce Aerospike SQL powered by Starburst. Starburst Enterprise is a fully supported, production-tested, and enterprise-grade distribution of open-source Trino (formerly Presto SQL). It enables you to run Trino on a single machine, a cluster of machines, on-prem, or in the cloud. Aerospike SQL uses the Aerospike Presto (Trino) connector to reconcile the data model differences that manifest themselves while accessing a NoSQL database such as Aerospike using SQL via a SQL engine like Trino. The Presto connector seamlessly bridges the Aerospike Enterprise Edition (EE) with Starburst Enterprise. It allows you to get answers to your business questions rapidly with the convenience of ANSI SQL while leveraging the scale of Aerospike and Starburst platforms and best-in-class support. It’s based on the principle of separation of computing and storage to enable you to right-size your compute and storage clusters independently to achieve maximum performance with lower TCO. The time spent to deal with infrastructure issues is the time not spent on issues that matter most to your business. Aerospike SQL comes with best-in-class support from both Aerospike and Starburst so that you can focus on generating valuable insights at scale from the data stored in Aerospike using ANSI SQL to drive your critical business decisions.

Here’s how Aerospike SQL works:

Figure 1 presents the high-level architecture.

  1. A user can submit a SQL query using the Starburst Enterprise UI (see Figure 2) or one of the other

    clients

    to the Trino coordinator in the Aerospike SQL cluster.

  2. The coordinator constructs a query plan and distributes portions of the plan among workers.

  3. The connectors, which run in the workers, send parallel partition scan requests or push down predicates wherever possible to your Aerospike database cluster as a part of the data load stage of query execution. The Trino connector loads the scanned data from the 4,096 Aerospike partitions into the configured number of Presto splits. Higher performance can be achieved by configuring the numbers of splits in the connector to match up the compute power in the Trino cluster.

  4. The Trino workers process the splits and execute the remaining stages to generate the result set.

  5. The coordinator fetches results from the workers and returns them to the client application.

1diagram-Aerospike-SQL-powered-by-Starburst-1300w-rev

Figure 1: Aerospike SQL – High Level Architecture

4blog-screenshot-Starburst-Enterprise-UI

Figure 2: Starburst Enterprise UI

Why you should care:

Aerospike SQL enables you to:

  1. Run ANSI SQL queries to perform in-place on massive amounts of data in the Aerospike database.

  2. Browse namespaces/sets easily in Aerospike using the Cluster Explorer to discover data in your cluster.

  3. Federate queries clusters to create a single point of access across multiple Aerospike clusters.

  4. Create dashboards using data stored in Aerospike and familiar business-intelligence (BI) tools, such as Tableau, Power BI, etc.

  5. Boost your query performance by configuring the Trino connector for massive parallelism, predicate pushdown, and secondary indexes. Queries with Aerospike secondary indexes run roughly 80x faster than queries without them.

  6. Leverage Presto’s cost-based optimization (CBO) via row count for query optimization. Aerospike connector is one of the two Presto connectors that support Presto CBO.

  7. Leverage Schema inference if you do not know a priori the schema of the data stored in ASDB.

  8. Secure your data with TLS between clients all the way to Aerospike clusters, LDAP and PKI authentication of Presto users with ASDB, and support for server quotas to guarantee fair usage.

  9. Deploy anywhere, on-prem or cloud (AWS and GCP).

  10. Operationalize your use cases quickly with best in-class support offered by Aerospike and Starburst.

How Aerospike SQL can help you:

  • Data analysis and dashboarding

  • Run ad-hoc, interactive SQL queries on massive datasets, e.g. “Count the number of users that have clicked the new banner ad?”, “What are some categories of ads they’ve seen?”, etc., or create insightful dashboards using BI tools such as Tableau, Power BI, etc. to speed up decision making.

  • Audit and Compliance

  • Run SQL queries to discover PII in your datasets stored in Aerospike to ensure compliance and address potential compliance issues proactively. You can also leverage the audit trail capability to understand who accessed sensitive data, when they accessed it, and what they accessed i.e. the SQL query that they ran.

  • Data Science

  • Explore and analyze your dataset in Aerospike to understand the distribution, anomaly, correlation, and other data characteristics before applying machine learning models.

  • Data Engineering

  • You can run complex ETL queries programmatically using Python and Jupyter notebook and develop complex data models using Aerospike Collection data types (CDT) i.e. maps and lists and query them performantly using Trino JSON Functions.

Learn More

Aerospike SQL Powered by Starburst

What’s next?

With Aerospike SQL powered by Starburst and the best in class support that it offers, you can go from zero to insight rapidly thereby accelerating your decision-making process. The time that you would otherwise spend dealing with infrastructure issues can now be spent on tasks that matter most to your business.

Please contact us to learn more about how Aerospike SQL can help you with your use case.

Aerospike is hiring! If you’re interested in technical challenges within a growing team, take a look at our available jobs.