Aerospike Telemetry: A First Peek into the Data

Psi Mankoski
Senior Engineering Manager and Head of Quality Engineering at Aerospike
August 1, 2016

Aerospike CTO and Co-founder Brian Bulkowski:

“On a particular warm and beautiful fall day in 2014, while sitting in a windowless conference room in Palo Alto at The Hive listening to Alistair Croll talk about quantified business and engagement metrics in consumer mobile applications, I recall thinking that enterprise software is not at all the same as consumer apps and games.

When Alistair reached the topic of cohort analysis, however, I was hit by a sudden realization of the contrary. A developer’s experience when using a new database – first trying one spiffy feature, then switching to optimizing performance, and then perhaps trying to remedy a failure case – is precisely analogous to a gamer leveling up. That day, I decided to propose instrumenting our Aerospike database as fully as possible so we could spend more quality time on the features causing roadblocks for our developers. In meeting after meeting, and conference after conference, I was told users wouldn’t stand for a database that reports on use. Database software couldn’t ‘phone home’ – even if done with openness, verifiability, and anonymity; even if done to improve the product, our database would be rejected, and we’d become a laughingstock. This, in spite of multiple open source codebases – from Android to Eclipse – all reporting statistics. Despite these fears, I convinced Aerospike to implement this functionality in our open source Community Edition.”

We are happy to present the first results of Aerospike’s ongoing effort, through our Telemetry feature, to understand and serve those who need high-performance NoSQL databases. The Telemetry feature is opt-out (i.e., anonymized reporting back to an Aerospike cloud service is enabled by default). Since turning on the feature, we’ve seen steady growth in the number of nodes reporting, and so, as we promised when we released Telemetry, we’re ready to show some early results of our investigation, which may be of interest to the software industry as a whole.

The Aerospike forum has been a major back channel between Aerospike’s community and Aerospike the company. While the forum permits direct feedback regarding the use and proposed direction of the Aerospike database software, opening a broader, quantified channel has great potential for going beyond the anecdotes taken from forum interactions. Thus on April 15, in Aerospike Community Edition 3.8.1, we shipped the first release of Telemetry, our anonymized performance statistics data collection service. And now, as promised, we are publishing the initial results of our research into the Telemetry data.

What’s All This (Data) Then?

In our first blog post on Telemetry, titled “Open Source Databases: The Unknown Community,” we observed that when building an open source software project, a great deal about how the software is actually being used, by whom, and for what purposes is largely unknown. While for compelling reasons of privacy, some of this information may (and probably should) never be known, there is still a lot to be gained by looking – anonymously – into numerous non-sensitive areas related to the use of the software.

Our Initial Findings

Since releasing Telemetry in Aerospike Community Edition 3.8.1, we have received a steady stream of interesting data back from the field. In this first blog entry, we look at a few areas of interest:

  • Which Linux distros are most popular for running Aerospike?

  • What types of computational platforms (both physical and virtualized hardware) are most popular?

In order to answer these questions, we compiled the first 90 days’ worth of Telemetry data in graphical form, below.

Operating System Distro Popularity

Aerospike runs on the GNU/Linux Operating System. Linux comes in many flavors or “distros”. Our initial peek has shown that distros enjoying widespread popularity are also popular as the OS used to run Aerospike Community Edition. We see that Ubuntu Linux currently holds the lead as the most popular distro, with CentOS coming up close behind. Amazon Linux AMI, the standard RPM-based distro on Amazon AWS instances, is in third place. A number of other distros are used as well, including Debian, Red Hat Enterprise Linux (RHEL), Linux Mint, Fedora, and Scientific Linux.

Linux Distro Popularity

Currently, the most popular versions of these popular distros are CentOS 6.7, CentOS 7.2, Ubuntu 14.04, Debian 8.4, and RHEL 7.2. Looking just at the Ubuntu family, you see the following breakdown by version:

Ubuntu Version Breakdown (donut chart)

Over time, we will be able to see how new releases of a distribution like Ubuntu are taken up. But from what we can see, unsurprisingly, the majority are on Ubuntu 14.04, followed by a large percentage on the much older 12.04 release. In contrast, a small number of early adopters have already taken up 15.04 and 16.
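A per-version breakdown like the one described above comes down to counting reports per version within one distro family. The following is a minimal sketch of that tally, not Aerospike's actual analysis pipeline: the record layout, the `SAMPLE_REPORTS` values, and the `version_shares` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical anonymized reports: one (distro, version) pair per node.
# These values are illustrative only, not actual Telemetry data.
SAMPLE_REPORTS = [
    ("Ubuntu", "14.04"),
    ("Ubuntu", "14.04"),
    ("Ubuntu", "12.04"),
    ("CentOS", "6.7"),
    ("CentOS", "7.2"),
    ("Debian", "8.4"),
]


def version_shares(reports, distro):
    """Return {version: fraction of that distro's nodes}, most common first."""
    counts = Counter(version for name, version in reports if name == distro)
    total = sum(counts.values())
    if total == 0:
        return {}  # no nodes reported for this distro
    return {version: n / total for version, n in counts.most_common()}


if __name__ == "__main__":
    for version, share in version_shares(SAMPLE_REPORTS, "Ubuntu").items():
        print(f"Ubuntu {version}: {share:.0%}")
```

Tracking the same shares over successive reporting windows would show how quickly a new release is taken up.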

Popular Computational Platforms

Telemetry allows some visibility into the computational platform used to run Aerospike’s Community Edition. This allows us to answer with some degree of confidence whether the platform is “bare metal” hardware, or whether some sort of virtualization is involved.
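This post does not describe how Telemetry makes that distinction. As an illustration only, one common heuristic on Linux is to inspect the DMI vendor and product strings the kernel exposes under `/sys/class/dmi/id`. In the sketch below, the `VIRTUAL_HINTS` list is an assumed, non-exhaustive set of substrings, and `classify_platform` and `read_dmi` are hypothetical helpers rather than part of Aerospike.

```python
from pathlib import Path

# Assumed, non-exhaustive substrings that often appear in the DMI strings
# of virtual machines. A real classifier would need a much longer list.
VIRTUAL_HINTS = (
    "vmware", "qemu", "kvm", "xen", "virtualbox",
    "amazon ec2", "google compute engine", "virtual machine",
)


def classify_platform(sys_vendor, product_name):
    """Guess 'virtual', 'physical', or 'unknown' from DMI strings."""
    text = f"{sys_vendor} {product_name}".strip().lower()
    if not text:
        return "unknown"  # no DMI information was available
    if any(hint in text for hint in VIRTUAL_HINTS):
        return "virtual"
    return "physical"


def read_dmi(field):
    """Read one DMI field from sysfs; return '' if it cannot be read."""
    try:
        return (Path("/sys/class/dmi/id") / field).read_text().strip()
    except OSError:
        return ""


if __name__ == "__main__":
    print(classify_platform(read_dmi("sys_vendor"), read_dmi("product_name")))
```

A heuristic like this only gives "some degree of confidence": any vendor string missing from the hint list falls through to "physical".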

Machine Types: Bare Metal vs. Virtual Machine

Initial Telemetry data shows that Virtual Machines (VMs) outnumber physical hardware nodes by better than 3 to 1. This is significant: although Aerospike performs well in both cloud and datacenter environments, Community Edition users are clearly comfortable spinning up virtualized clusters.
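To put the "3 to 1" figure in terms of share: a ratio better than 3 to 1 means more than 75% of reporting nodes are virtual. The short sketch below shows the arithmetic; the node counts are hypothetical and are not actual Telemetry numbers.

```python
def virtual_share(virtual_nodes, physical_nodes):
    """Return the fraction of reporting nodes that are virtual."""
    total = virtual_nodes + physical_nodes
    if total == 0:
        raise ValueError("no nodes reported")
    return virtual_nodes / total


if __name__ == "__main__":
    # Hypothetical counts at exactly 3 to 1.
    print(f"{virtual_share(300, 100):.0%}")  # prints 75%
```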

Physical vs. Virtual Machines

Virtual Machine Type and Virtualization / Private Cloud Providers

Within the virtualized Aerospike nodes, we see that Amazon AWS is in the majority, with Google GCE in second place. Oracle VM, VMware, OpenStack, and Microsoft Azure are also represented, along with a number of others.

Virtual Machine Provider Popularity

Finally, while we can see from the data that Docker containers are popular, it will take some more effort to tease out Docker-related numbers, since container reporting looks quite similar to standard Linux distros running on physical hardware.

Physical Hardware Providers

Next, as we look into the hardware deployments, we see some major brands (Dell, HP, Supermicro, Asus, and Lenovo) being used, as well as a variety of systems from disparate motherboard manufacturers; the latter is not unexpected, as Aerospike runs well on common off-the-shelf Linux boxes.

Physical Hardware Brand Popularity

There were some surprises on the hardware front – for example, the existence of a number of Aerospike nodes hosted on Apple Macintosh PowerBooks running Linux natively (e.g., Debian 8.5, not under Mac OS X).

Thanks for Your (Continued) Support

As emphasized in our Unknown Community blog article, we value your right to privacy. While we have taken steps to make Telemetry transparent, both in how it works and how to disable it if/when desired, it is only through voluntary community participation that there actually *is* any data to be peeked into. Therefore, we thank you for making our Aerospike community a little less unknown through your continued participation in this effort.

So, stay opted in and keep sending in that good data. And if you’re running a release prior to Community Edition 3.8.1, consider upgrading to the current release to get the latest features and to become part of the data collection torrent. We’ll report back to you with more juicy details in a few months!