# Aggregation

Use Aerospike Stream UDFs to aggregate query results in a distributed fashion. The Aerospike aggregation framework allows fast and flexible query operations. This programmatic framework is similar to a MapReduce system, in that an initial map function runs over a collection and emits results in a highly parallel fashion. Results traverse a pipeline of either subsequent map steps or reduction steps and aggregation steps.

Unlike Hadoop or other frameworks using Java, Aerospike aggregation is implemented using Lua. Each client sends an aggregation request to all servers in the cluster, which independently count the results and return individual results to the requesting client.

The Aerospike aggregation framework differs from other systems because Aerospike recommends running aggregation against an index; this is essentially a `where` clause. Filtering against an index maintains high performance. Aerospike supports aggregation for tables and the entire namespace. The client then runs a final reduce phase, also in Lua, to sum results.

## Use cases

-   Implementing aggregate functions such as SUM, COUNT, MIN, and MAX as user defined stream UDFs.

-   Real-time dashboarding: Using secondary indexes on a bin with an update time, aggregations quickly gather statistics on recently changed records. Aerospike aggregation touches fewer records compared to standard MapReduce systems, which act on the entire unindexed dataset.

## Performing Aggregation

![Aggregation flow](https://aerospike.com/docs/_astro/query_stream_mapaggregatereduce.e8obgkGx_Z1NoJ75.png)

Figure 1: Aggregation flow

To implement aggregation in your application:

1.  Create a [query application](https://aerospike.com/docs/develop/learn/queries.md).
2.  Create indexes on a bin.
3.  Insert a record in an indexed bin.
4.  Create an [aggregation module](https://aerospike.com/docs/database/advanced/udf/modules/stream/develop.md) in Lua.
5.  Set the aggregation module path.
6.  Register the module with the Aerospike cluster.
7.  Construct an aggregate query with a predicate (`where` clause).

### AQL

This aggregation process uses the [aql](https://aerospike.com/docs/database/tools/aql.md) command line tool.

1.  Create an index.

    The following example script creates a string index on the `user_profile` namespace, `west` set, and `location` bin. These examples use the in-memory storage engine. Queries can also run in on-disk namespaces.

    ``` bash
    Admin+> manage sindex create string ix2 ns user_profile set west bin location
    Use 'show sindex' to confirm ix2 was created successfully.
    ```

    ::: note
    Before running this query, verify that the `aerospike.conf` file contains the `user_profile` namespace. If `user_profile` is not present, use the following script to add it.
    :::

    ``` bash
    namespace user_profile {
        replication-factor 2
        storage-engine memory
    }
    ```

2.  Insert data.

    ::: note
    Although `aql` is not intended to be used by applications, you can use a Java code snippet to insert data (see the [Java Client—Synchronous](https://aerospike.com/docs/develop/client/java/usage/atomic/create.md) database write example).
    :::

    ``` bash
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_100','MA',342)
    OK, 1 record affected.
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_101','AZ',345)
    OK, 1 record affected.
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_102','CA',345)
    OK, 1 record affected.
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_103','AL',340)
    OK, 1 record affected.
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_104','TX',347)
    OK, 1 record affected.
    aql> INSERT INTO user_profile.west (PK,location,last_activity) VALUES ('cookie_105','MA',323)
    OK, 1 record affected.
    ```

3.  Create an aggregation module.

    The following Lua code example counts the number of users in a location, using map-reduce instead of an aggregation step. The `aggregate` function is more efficient, but map-reduce is more flexible.

    ``` lua
    file: aggregate.lua


    function count(s)
        function mapper(rec)
                return 1
        end
        local function reducer(v1, v2)
            return v1 + v2
        end
        return s : map(mapper) : reduce(reducer)
    end
    ```

    Where the stream operations are:

    -   `mapper`: Returns an integer (1) for each profile record.
    -   `reducer`: Adds all 1s for a total count.

4.  Set the aggregation module path.

    You must set the aggregation module path on the client side because the final reduce phases run on the client side after a response is returned from all cluster nodes. Use `aql` to set the relative path to the Lua aggregation module.

    The following example `aql` script sets the path. *aggregate.lua* is in `/home/user/lua_code`. `aql` started in `/home/user`.

    ``` bash
    aql> set LUA_USERPATH  'lua_code'
    ```

5.  Register the module with the cluster.

    ``` bash
    aql> register module 'lua_code/aggregate.lua'
    OK, 1 module added.
    ```

    ::: note
    After the module registers with one cluster node, Aerospike replicates it on all cluster nodes.
    :::

6.  Construct the aggregate query.

    ``` bash
    aql> AGGREGATE aggregate.count() ON user_profile.west WHERE location='MA'
    +-------+
    | count |
    +-------+
    | 2     |
    +-------+
    1 row in set (0.007 secs)
    ```

::: tip
For more information about aggregation with AQL, see [AGGREGATE](https://aerospike.com/docs/database/tools/aql/records#aggregate.md).
:::