# Geospatial index and query

Use Aerospike geospatial storage and indexing to run fast queries for points within a region, regions containing a point, and points within a radius.

## Underlying technologies

The Aerospike geospatial feature relies on these technologies:

-   **GeoJSON format** to specify and store [GeoJSON geometry objects](https://www.rfc-editor.org/rfc/rfc7946#section-3), and achieve data standardization and interoperability. See the [GeoJSON Format Specification](https://www.rfc-editor.org/rfc/rfc7946) and [GeoJSON IETF WG](https://github.com/geojson/draft-geojson).
-   The Aerospike [**AeroCircle**](#aerospike-geojson-extension) data type extends the GeoJSON format to store circles as well as polygons.
-   **S2 Spherical Geometry Library** to map points and regions to a single-dimension, 64-bit CellID representation.

::: note
The [S2 Geometry Library](https://code.google.com/archive/p/s2-geometry-library/) is a spherical geometry library, very useful for manipulating regions on a sphere (commonly on Earth) and indexing geographic data. The [S2 blog](http://blog.christianperone.com/2015/08/googles-s2-geometry-on-the-sphere-cells-and-hilbert-curve/) provides more details about S2, Hilbert Curve, and geospatial mapping.
:::

-   Aerospike [secondary indexes](https://aerospike.com/docs/database/learn/architecture/data-storage/secondary-index) and [queries](https://aerospike.com/docs/develop/learn/queries) to achieve performance and scale for index inserts/updates and queries.

## Use cases

-   Vehicle tracking systems that require high-throughput updates of vehicle location and frequently query vehicles within a region.
-   Mapping applications that find amenities within a certain distance of a given location.
-   Location-targeted bidding transactions to discover persons or devices within the location with an active ad campaign.

See these [Aerospike examples](https://github.com/aerospike/geospatial-samples).

## Geospatial data

Aerospike supports the GeoJSON geospatial data type. All geospatial functionality (indexing and querying) executes only on GeoJSON data types.

GeoJSON data incurs this additional processing on data writes:

-   GeoJSON text is parsed for validity and support (see [GeoJSON Parsing](#geojson-parsing)).
-   GeoJSON text is converted into S2 CellID coverings.
-   Aerospike saves both the covering CellIDs and the original GeoJSON in the database.
-   Only GeoJSON data is accessible to the application through the client APIs and the UDF subsystem.
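As a minimal sketch of what the application side of a write might prepare, the following builds a GeoJSON `Point` string using only the Python standard library. The helper name `make_point` and its range checks are illustrative assumptions, not part of any Aerospike API; with the Aerospike Python client, the resulting data would typically be wrapped in `aerospike.GeoJSON` before being written to a bin.

```python
import json


def make_point(longitude, latitude):
    """Return a GeoJSON Point string (hypothetical helper for illustration).

    GeoJSON orders coordinates as [longitude, latitude].
    """
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    return json.dumps({"type": "Point", "coordinates": [longitude, latitude]})


point = make_point(-122.250629, 37.871022)
print(point)
```

Validating coordinate ranges on the client is optional; the server still parses the GeoJSON text on every write.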

## Geospatial index

In addition to integers and strings, Aerospike supports [Geo2DSphere](https://www.mongodb.com/docs/manual/core/2dsphere/) data types for indexes.

Use [asadm](https://aerospike.com/docs/database/tools/asadm) to create and manage secondary indexes in an Aerospike cluster. For instructions, see [Secondary Index (SI) Query](https://aerospike.com/docs/develop/learn/queries).

The following command creates a secondary index called `geo-index` using `geo2dsphere` data on the namespace `user_profile` using the set name `geo-set` and the bin `geo-bin`.

```bash
Admin+> manage sindex create geo2dsphere geo-index ns user_profile set geo-set bin geo-bin
```

Like other index types, Geo2DSphere indexes:

-   Scan existing records to inspect the indexed bin (`geo-bin` in the example above) to build an in-memory geospatial index.
-   Create an independent index for data on each node.
-   Update the index on all subsequent data inserts and updates.
-   Rebuild the index when a node restarts.
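The index can also be created from application code. The sketch below assumes `client` is a connected Aerospike Python client and calls its `index_geo2dsphere_create` method; the wrapper name `ensure_geo_index` is a hypothetical helper introduced here for illustration.

```python
def ensure_geo_index(client, namespace, set_name, bin_name, index_name):
    """Create a Geo2DSphere secondary index through a client object.

    Assumption: `client` is a connected aerospike.Client. The only method
    used is client.index_geo2dsphere_create(ns, set, bin, index_name).
    """
    client.index_geo2dsphere_create(namespace, set_name, bin_name, index_name)
    return index_name
```

For the example above, the call would be `ensure_geo_index(client, "user_profile", "geo-set", "geo-bin", "geo-index")`. Error handling for an index that already exists is left out of this sketch.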

## Geospatial query

Aerospike supports two geospatial queries:

-   Points exist within a region (including circle)
-   Region contains point

### Points-within-region query (circle)

This example Python function runs a points-within-region query.

```python
import aerospike
from aerospike import predicates

LOCBIN = "loc"  # example bin name; use the bin covered by your geo index


def query_pwr(args, client):
    """Return the records whose LOCBIN point lies inside a polygon."""
    # Construct a GeoJSON region.
    region = aerospike.GeoJSON({
        'type': 'Polygon',
        'coordinates': [[[-122.500000, 37.000000],
                         [-121.000000, 37.000000],
                         [-121.000000, 38.080000],
                         [-122.500000, 38.080000],
                         [-122.500000, 37.000000]]]})

    # Construct the query predicate.
    query = client.query(args.nspace, args.set)
    predicate = predicates.geo_within_geojson_region(LOCBIN, region.dumps())
    query.where(predicate)

    records = []

    # Define a callback to process each query result.
    def callback(result):
        key, metadata, record = result
        records.append(record)

    # Run the query.
    query.foreach(callback)
    return records
```

#### Example using AQL

-   Insert data representing a point into Aerospike.

```bash
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (2, GEOJSON('{"type": "Point", "coordinates": [1,1]}'))
```

-   Query a region to see if it contains that point.

```bash
aql> SELECT * FROM test.testset WHERE geo_query_bin CONTAINS GeoJSON('{"type":"Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}')
+----------------------------------------------------+
| geo_query_bin                                      |
+----------------------------------------------------+
| GeoJSON('{"type": "Point", "coordinates": [1,1]}') |
+----------------------------------------------------+
1 row in set (0.004 secs)
```

### Region-contains-points query

This example C++ function runs a region-contains-points query.

```cpp
// The globals (g_namespace, g_set, g_rgnbin, g_valbin, g_numrecs) and the
// fatal()/throwstream() helpers are not shown here and are defined elsewhere.

// Callback function to process each record response
bool
query_cb(const as_val * valp, void * udata)
{
  if (!valp)
    return true;  // query complete

  as_record * recp = as_record_fromval(valp);
  if (!recp)
    fatal("query callback returned non-as_record object");

  char const * valstr = as_record_get_str(recp, g_valbin);
  __sync_fetch_and_add(&g_numrecs, 1);
  if (valstr)  // the bin may be missing or may not hold a string
    cout << valstr << endl;
  return true;
}

// Main query function
void
query_prcp(aerospike * asp, double lat, double lng)
{
  char point[1024];

  // Construct a GeoJSON point (longitude first, then latitude).
  snprintf(point, sizeof(point),
           "{ \"type\": \"Point\", \"coordinates\": [%0.8f, %0.8f] }",
           lng, lat);

  // Construct the query object.
  as_query query;
  as_query_init(&query, g_namespace.c_str(), g_set.c_str());
  as_query_where_inita(&query, 1);
  as_query_where(&query, g_rgnbin, as_geo_contains(point));

  // Make the actual query.
  as_error err;
  if (aerospike_query_foreach(asp, &err, NULL,
                              &query, query_cb, NULL) != AEROSPIKE_OK)
    throwstream(runtime_error,
                "aerospike_query_foreach() returned "
                << err.code << '-' << err.message);

  as_query_destroy(&query);
}
```

#### Example using AQL

-   Insert data representing a region into Aerospike.

```bash
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (1, GEOJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}'))
```

-   Query for regions containing a certain point.

```bash
aql> SELECT * FROM test.testset WHERE geo_query_bin CONTAINS GeoJSON('{"type":"Point", "coordinates": [1, 1]}')
+---------------------------------------------------------------------------------------------+
| geo_query_bin                                                                               |
+---------------------------------------------------------------------------------------------+
| GeoJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}') |
+---------------------------------------------------------------------------------------------+
1 row in set (0.017 secs)
```

## Query filters

To extend the capabilities of both queries, use User-Defined Functions (UDFs) to filter the result set.

This example Python script demonstrates using a filter UDF.

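```python
from aerospike import predicates

LOCBIN = "loc"  # example bin name; use the bin covered by your geo index


def print_value(value):
    """Print each value returned by the stream UDF."""
    print(value)


def query_circle(args, client):
    """Query for records inside a circle, filtered by amenity."""
    query = client.query(args.nspace, args.set)
    predicate = predicates.geo_within_radius(LOCBIN,
                                             args.longitude,
                                             args.latitude,
                                             args.radius)
    query.where(predicate)

    # Apply the stream UDF apply_filter from the filter_by_amenity module.
    query.apply('filter_by_amenity', 'apply_filter', [args.amenity])
    query.foreach(print_value)
```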

The `apply_filter` Lua function is defined as follows:

```lua
local function select_value(rec)
  return rec.val
end

function apply_filter(stream, amen)
  local function match_amenity(rec)
    return rec.map.amenity and rec.map.amenity == amen
  end
  return stream : filter(match_amenity) : map(select_value)
end
```

## Index on list/map

You can also index and query list or map elements that hold GeoJSON data:

```python
import aerospike
from aerospike import predicates

# `client` is a connected Aerospike client and `args` holds the parsed options.

# Create a geo2dsphere secondary index on test.demo records whose 'points' bin
# is a list of GeoJSON points.
client.index_list_create('test', 'demo', 'points', aerospike.INDEX_GEO2DSPHERE, 'demo_point_nidx')

# The last argument is the index type; INDEX_TYPE_LIST selects the list index.
predicate = predicates.geo_within_radius('points',
                                         args.longitude,
                                         args.latitude,
                                         args.radius,
                                         aerospike.INDEX_TYPE_LIST)
query = client.query('test', 'demo')
query.where(predicate)
```

The above creates an index on GeoJSON list elements and constructs the query predicate using the index.

## Aerospike GeoJSON extension

Use the Aerospike `AeroCircle` geometry object to store circles along with regular polygons.

This example object specifies a circle with a radius of 300 meters centered at longitude/latitude `-122.250629, 37.871022`.

```json
{"type": "AeroCircle", "coordinates": [[-122.250629, 37.871022], 300]}
```
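An `AeroCircle` string like the one above can be built programmatically. This is a minimal sketch using only the Python standard library; the helper name `make_aerocircle` is hypothetical and introduced here for illustration.

```python
import json


def make_aerocircle(longitude, latitude, radius_meters):
    """Return an AeroCircle string: [[longitude, latitude], radius in meters]."""
    if radius_meters <= 0:
        raise ValueError("radius_meters must be positive")
    return json.dumps({
        "type": "AeroCircle",
        "coordinates": [[longitude, latitude], radius_meters],
    })


circle = make_aerocircle(-122.250629, 37.871022, 300)
print(circle)
# prints {"type": "AeroCircle", "coordinates": [[-122.250629, 37.871022], 300]}
```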

## GeoJSON parsing

On data insert/update, Aerospike only recognizes `Point`, `Polygon`, `MultiPolygon`, and `AeroCircle` [GeoJSON geometry objects](http://geojson.org/geojson-spec#geometry-objects), which are indexable objects. Unsupported GeoJSON objects, such as `LineString` or `MultiLineString`, fail on insert with result code 160 (`AEROSPIKE_ERR_GEO_INVALID_GEOJSON`). `Polygon` objects can contain holes, per the [GeoJSON Format Specification](http://geojson.org/geojson-spec#polygon).

::: note
`Polygon` loop definitions must wind counter-clockwise.
:::
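One way to check winding order before inserting a polygon is the shoelace formula. The sketch below treats longitude and latitude as planar x and y, which is an assumption that is reasonable only for small rings that do not cross the antimeridian or a pole. The function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula over a closed ring of [longitude, latitude] pairs.

    A positive result means the ring winds counter-clockwise.
    """
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def ensure_counter_clockwise(ring):
    """Return the ring as-is if counter-clockwise, otherwise reversed."""
    return list(ring) if signed_area(ring) > 0 else list(reversed(ring))


ring = [[-122.5, 37.0], [-121.0, 37.0], [-121.0, 38.08],
        [-122.5, 38.08], [-122.5, 37.0]]
print(signed_area(ring) > 0)  # True: this ring already winds counter-clockwise
```

Because the ring is closed (the first and last positions are equal), reversing it keeps it closed.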

Aerospike supports the `Feature` object, which allows groups of geometry objects and user-specified properties; however, `FeatureCollection` is not supported.

Invalid GeoJSON objects are caught on insert/update. For example, an object defined as `point` instead of `Point` fails.
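A client-side pre-check can reject unsupported or misspelled types before a write is attempted. The following is a rough sketch, not the server's parser: the function name `check_geojson` is hypothetical, and unwrapping a `Feature` through its `geometry` member follows the GeoJSON specification rather than any documented Aerospike behavior.

```python
import json

# Geometry types this page lists as recognized; names are case-sensitive.
SUPPORTED_TYPES = {"Point", "Polygon", "MultiPolygon", "AeroCircle"}


def check_geojson(text):
    """Return the geometry type if it is in SUPPORTED_TYPES, else raise ValueError."""
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("GeoJSON text must be a JSON object")
    if obj.get("type") == "Feature":
        # A Feature carries its geometry in the "geometry" member.
        obj = obj.get("geometry") or {}
    geometry_type = obj.get("type")
    if geometry_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported GeoJSON type: {geometry_type!r}")
    return geometry_type


print(check_geojson('{"type": "Point", "coordinates": [1, 1]}'))  # prints Point
```

The server remains the authority on validity; a check like this only avoids a round trip for obvious mistakes such as `point` or `LineString`.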

Per the GeoJSON IETF recommendation, the Coordinate System is WGS84. Explicit specification of a coordinate reference system (CRS) is ignored.

## Configuration parameters

| Parameter | Datatype | Default | Description |
| --- | --- | --- | --- |
| [`max-cells`](https://aerospike.com/docs/database/reference/config#namespace__max-cells) | Integer | 8 | Defines the maximum number of cells used in the approximation. Increasing this value improves accuracy but may affect query performance. |
| [`max-level`](https://aerospike.com/docs/database/reference/config#namespace__max-level) | Integer | 1 | Defines the minimum size of the cell to be used in the approximation. Tuning this can make query results more accurate. |
| [`min-level`](https://aerospike.com/docs/database/reference/config#namespace__min-level) | Integer | 1 | Defines the maximum size of the cell to be used in the approximation. Should generally be set to `1`; increasing it too much may cause queries to fail. |
| [`earth-radius-meters`](https://aerospike.com/docs/database/reference/config#namespace__earth-radius-meters) | Integer | 6371000 | Specifies Earth’s radius in meters. Used for geographical calculations. |
| [`level-mod`](https://aerospike.com/docs/database/reference/config#namespace__level-mod) | Integer | 1 | Specifies the multiple for levels to be used, effectively increasing the branching factor of the S2 CellID hierarchy. |
| [`strict`](https://aerospike.com/docs/database/reference/config#namespace__strict) | Boolean | true | When `true`, performs additional validation on results to ensure they fall within the query region. When `false`, returns results as-is, which may include points outside the query region. |
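To illustrate why `strict` matters: a cell covering only approximates the query region, so candidate points near the boundary can fall outside it. The sketch below shows the general idea of re-validating candidates for a radius query with the haversine formula. It is a conceptual illustration under that assumption, not the server's implementation, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_METERS = 6371000  # same value as the earth-radius-meters default


def haversine_meters(lng1, lat1, lng2, lat2, radius=EARTH_RADIUS_METERS):
    """Great-circle distance in meters between two longitude/latitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def filter_within_radius(points, center, radius_meters):
    """Keep only [lng, lat] points within radius_meters of center."""
    return [p for p in points
            if haversine_meters(p[0], p[1], center[0], center[1]) <= radius_meters]


center = [-122.250629, 37.871022]
candidates = [center, [-122.250629, 37.881022]]  # second point is about 1.1 km north
print(filter_within_radius(candidates, center, 300))
```

With a 300 meter radius, only the first candidate survives the check; the second would be a false positive of the kind that `strict` set to `false` may return.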

### `max-cells` visualization

Here’s an example that shows how RegionCoverer covers a specified region with `max-cells` set to different values. With a higher value of `max-cells`, the approximation becomes more accurate.

With `max-cells` = 10:

 ![max-cells = 10](https://aerospike.com/docs/_astro/Geo10maxcells.NicZFRmL_Z1hvpcl.png)

With `max-cells` = 30:

 ![max-cells = 30](https://aerospike.com/docs/_astro/Geo30maxcells.BiGIPquK_Z9uLso.png)

With `max-cells` = 100:

 ![max-cells = 100](https://aerospike.com/docs/_astro/Geo100maxcells.CcapkB9U_ZT4kry.png)

### `max-level` visualization

This example shows how RegionCoverer covers a specified region at different `max-level` values, and how tuning `max-level` can make query results more accurate. For this example, `min-level` is set to 1 and `max-cells` is set to 10.

With `max-level` = 12:

 ![max-level = 12](https://aerospike.com/docs/_astro/Geo12maxlevel.CxsVHHD8_20uIBK.png)

With `max-level` = 30:

 ![max-level = 30](https://aerospike.com/docs/_astro/Geo30maxlevel.DhHgFqWT_ZCELoH.png)

## Create a geospatial application

To develop a geospatial application:

1.  [Install](https://aerospike.com/docs/database/install) and configure the Aerospike server.
2.  Create a Geo2DSphere index on a namespace-set-bin combination.
3.  Construct and insert GeoJSON `Point` data.
4.  Construct a Points-within-Region predicate (`where` clause), make a query request, and process the records returned.
5.  (alternate) Construct and insert GeoJSON `Polygon`/`MultiPolygon` data.
6.  (alternate) Construct a Region-contains-Point predicate, make a query request, and process the records returned.

## Known limitations

-   Using UDFs to insert or update GeoJSON data types is not supported.
-   Duplicate records can be returned.
-   For namespaces with `data-in-memory true`, GeoJSON particles allocate up to 2KB more than the reported particle size, which can lead to high memory consumption in some cases. This problem is fixed in Aerospike Database 4.9.0 and later.
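Because duplicate records can be returned, an application may want to de-duplicate results itself. The sketch below assumes the Python client's `foreach` callback receives a `(key, metadata, bins)` tuple whose key is `(namespace, set, user key, digest)`, and keeps one record per digest. The function name `collect_unique` is hypothetical.

```python
def collect_unique(query):
    """Run query.foreach() and keep one record per key digest."""
    seen = set()
    unique = []

    def callback(result):
        key, metadata, bins = result
        digest = bytes(key[3])  # digest may be a bytearray, which is not hashable
        if digest not in seen:
            seen.add(digest)
            unique.append(bins)

    query.foreach(callback)
    return unique
```

This holds every digest in memory, so it suits result sets of modest size.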