Geospatial index and query
Use Aerospike geospatial storage and indexing to enable fast queries on points within a region, on a region containing points, and points within a radius.
Underlying technologies
The Aerospike geospatial feature relies on these technologies:
- GeoJSON format, to specify and store geometry objects in a standardized, interoperable form. See the GeoJSON Format Specification and GeoJSON IETF WG.
- The Aerospike AeroCircle data type extends the GeoJSON format to store circles as well as polygons.
- S2 Spherical Geometry Library, to map points and regions to a one-dimensional, 64-bit CellID representation.
The S2 Geometry Library is a spherical geometry library that is well suited to manipulating regions on a sphere (typically Earth) and indexing geographic data. The S2 blog provides more details about S2, the Hilbert curve, and geospatial mapping.
- Aerospike secondary indexes and queries to achieve performance and scale for index inserts/updates and queries.
Use cases
- Vehicle tracking systems that require high-throughput updates of vehicle locations and frequent queries for vehicles within a region.
- Mapping applications that find amenities within a certain distance of a given location.
- Location-targeted bidding transactions that discover people or devices within a location that has an active ad campaign.
See these Aerospike examples.
Geospatial data
Aerospike supports the GeoJSON geospatial data type. All geospatial functionality (indexing and querying) executes only on GeoJSON data types.
GeoJSON data incurs this additional processing on data writes:
- GeoJSON text is parsed for validity and support (see GeoJSON Parsing).
- GeoJSON text is converted into S2 CellID coverings.
- Aerospike saves both the covering CellIDs and the original GeoJSON in the database.
- Only GeoJSON data is accessible to the application through the client APIs and the UDF subsystem.
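As a minimal sketch of the GeoJSON text an application hands to Aerospike, the following standard-library Python builds Point and Polygon objects. Note that GeoJSON orders coordinates as [longitude, latitude]. The helper names `geojson_point` and `geojson_polygon` are hypothetical; with the Aerospike Python client you would typically wrap such a dictionary in `aerospike.GeoJSON` instead of building raw text.

```python
import json

def geojson_point(lng, lat):
    """Return GeoJSON text for a Point; GeoJSON puts longitude first."""
    return json.dumps({"type": "Point", "coordinates": [lng, lat]})

def geojson_polygon(ring):
    """Return GeoJSON text for a Polygon with a single outer ring.

    `ring` is a list of (lng, lat) pairs; the ring is closed by repeating
    the first vertex if the caller did not already do so.
    """
    coords = [list(p) for p in ring]
    if coords[0] != coords[-1]:
        coords.append(list(coords[0]))
    return json.dumps({"type": "Polygon", "coordinates": [coords]})

print(geojson_point(-122.250629, 37.871022))
print(geojson_polygon([(0, 0), (10, 0), (10, 10), (0, 10)]))
```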
Geospatial index
In addition to integer and string indexes, Aerospike supports Geo2DSphere indexes.
Use asadm to create and manage secondary indexes in an Aerospike cluster. For instructions, see Secondary Index (SI) Query.
The following command creates a secondary index called geo-index of type geo2dsphere on the namespace user_profile, set geo-set, and bin geo-bin.
Admin+> manage sindex create geo2dsphere geo-index ns user_profile set geo-set bin geo-bin
Geo2DSphere indexes behave like other index types:
- They scan existing records, inspecting the indexed bin (geo-bin in the example above), to build an in-memory geospatial index.
- Each node builds an independent index for its own data.
- The index is updated on all subsequent data inserts and updates.
- The index is rebuilt when a node restarts.
Geospatial query
Aerospike supports two geospatial queries:
- Points within a region (including a circle)
- Region contains point
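To make the difference between the two query types concrete, here is an in-memory Python sketch that uses a planar ray-casting point-in-polygon test. It only illustrates the query semantics under a flat-coordinate assumption; Aerospike evaluates these predicates on the sphere using S2 cell coverings, so results can differ near region boundaries and for large regions. All function and variable names are hypothetical.

```python
def point_in_ring(lng, lat, ring):
    """Ray casting: count edge crossings of a ray cast from the point toward +longitude."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through `lat` can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside

def points_within_region(points, ring):
    """Points-within-region: keep stored points that fall inside the query polygon."""
    return [p for p in points if point_in_ring(p[0], p[1], ring)]

def regions_containing_point(regions, point):
    """Region-contains-point: keep stored polygons that contain the query point."""
    return [r for r in regions if point_in_ring(point[0], point[1], r)]

square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
print(points_within_region([(1, 1), (20, 5)], square))  # [(1, 1)]
print(regions_containing_point([square], (1, 1)))       # [square]
```

The same containment test serves both directions; what changes is which side (the stored records or the query argument) holds the points and which holds the regions.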
Points-within-region query
This example Python function runs a points-within-region query. LOCBIN is a placeholder for the name of the bin that holds GeoJSON points.
import aerospike
from aerospike import predicates

LOCBIN = 'loc'  # hypothetical: name of the bin holding GeoJSON points

def query_pwr(args, client):
    # Construct a GeoJSON region (a polygon in [longitude, latitude] order).
    region = aerospike.GeoJSON({
        'type': 'Polygon',
        'coordinates': [[[-122.500000, 37.000000],
                         [-121.000000, 37.000000],
                         [-121.000000, 38.080000],
                         [-122.500000, 38.080000],
                         [-122.500000, 37.000000]]]})

    # Construct the query predicate.
    query = client.query(args.nspace, args.set)
    predicate = predicates.geo_within_geojson_region(LOCBIN, region.dumps())
    query.where(predicate)

    # Collect the bins of each matching record.
    records = []

    def callback(record_tuple):
        key, metadata, bins = record_tuple
        records.append(bins)

    # Make the actual query.
    query.foreach(callback)
    return records
Example using AQL
- Insert data representing a point into Aerospike.
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (2, GEOJSON('{"type": "Point", "coordinates": [1,1]}'))
- Query for points within a region.
aql> SELECT * FROM test.testset WHERE geo_query_bin WITHIN GeoJSON('{"type":"Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}')
+----------------------------------------------------+
| geo_query_bin |
+----------------------------------------------------+
| GeoJSON('{"type": "Point", "coordinates": [1,1]}') |
+----------------------------------------------------+
1 row in set (0.004 secs)
Region-contains-points query
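This example C++ program uses the Aerospike C client to run a region-contains-point query. The g_namespace, g_set, g_rgnbin, g_valbin, and g_numrecs globals and the fatal() and throwstream() helpers are assumed to be defined elsewhere in the example program.
#include <cstdio>
#include <iostream>
#include <stdexcept>

#include <aerospike/aerospike.h>
#include <aerospike/aerospike_query.h>
#include <aerospike/as_error.h>
#include <aerospike/as_query.h>
#include <aerospike/as_record.h>

using namespace std;

// Callback invoked once per record in the result stream.
bool
query_cb(const as_val * valp, void * udata)
{
    if (!valp)
        return true;    // a NULL value signals that the query is complete

    as_record * recp = as_record_fromval(valp);
    if (!recp)
        fatal("query callback returned non-as_record object");

    char const * valstr = as_record_get_str(recp, g_valbin);
    __sync_fetch_and_add(&g_numrecs, 1);
    cout << (valstr ? valstr : "(no value)") << endl;
    return true;
}

// Find every stored region that contains the point (lat, lng).
void
query_prcp(aerospike * asp, double lat, double lng)
{
    // Construct a GeoJSON point; GeoJSON orders coordinates [longitude, latitude].
    char point[1024];
    snprintf(point, sizeof(point),
             "{ \"type\": \"Point\", \"coordinates\": [%0.8f, %0.8f] }",
             lng, lat);

    // Construct the query object with a single geo "contains" predicate.
    as_query query;
    as_query_init(&query, g_namespace.c_str(), g_set.c_str());
    as_query_where_inita(&query, 1);
    as_query_where(&query, g_rgnbin, as_geo_contains(point));

    // Run the query, and release the query object before reporting any error.
    as_error err;
    as_status rc = aerospike_query_foreach(asp, &err, NULL,
                                           &query, query_cb, NULL);
    as_query_destroy(&query);
    if (rc != AEROSPIKE_OK)
        throwstream(runtime_error,
                    "aerospike_query_foreach() returned "
                    << err.code << '-' << err.message);
}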
Example using AQL
- Insert data representing a region into Aerospike.
aql> INSERT INTO test.testset (PK, geo_query_bin) VALUES (1, GEOJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}'))
- Query for regions containing a certain point.
aql> SELECT * FROM test.testset WHERE geo_query_bin CONTAINS GeoJSON('{"type":"Point", "coordinates": [1, 1]}')
+---------------------------------------------------------------------------------------------+
| geo_query_bin |
+---------------------------------------------------------------------------------------------+
| GeoJSON('{"type": "Polygon", "coordinates": [[[0,0], [0, 10], [10, 10], [10, 0], [0,0]]]}') |
+---------------------------------------------------------------------------------------------+
1 row in set (0.017 secs)
Query filters
To extend the capabilities of both queries, use User-Defined Functions (UDFs) to filter the result set.
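This example Python script demonstrates using a filter UDF. LOCBIN is a placeholder for the name of the bin that holds GeoJSON points.
import aerospike
from aerospike import predicates

LOCBIN = 'loc'  # hypothetical: name of the bin holding GeoJSON points

def print_value(value):
    # Print each result returned by the query.
    print(value)

def query_circle(args, client):
    """Query for records inside a circle, filtered by amenity."""
    query = client.query(args.nspace, args.set)
    predicate = predicates.geo_within_radius(LOCBIN,
                                             args.longitude,
                                             args.latitude,
                                             args.radius)  # radius in meters
    query.where(predicate)
    # Filter results with apply_filter from the filter_by_amenity UDF module,
    # which must already be registered on the server.
    query.apply('filter_by_amenity', 'apply_filter', [args.amenity])
    query.foreach(print_value)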
The apply_filter Lua function is the following:
local function select_value(rec)
return rec.val
end
function apply_filter(stream, amen)
local function match_amenity(rec)
return rec.map.amenity and rec.map.amenity == amen
end
return stream : filter(match_amenity) : map(select_value)
end
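The stream UDF applies `filter` and then `map` to each record that the geospatial predicate returns. The following Python sketch mirrors that pipeline on plain dictionaries so the filter logic can be checked locally. The record layout (a `map` bin with an `amenity` key and a `val` bin) is taken from the Lua example; the sample records are hypothetical.

```python
def apply_filter(stream, amen):
    """Python analog of the Lua apply_filter: keep matching records, emit their 'val' bin."""
    def match_amenity(rec):
        # Same test as the Lua predicate; a missing 'map' bin is treated as no match.
        amenity = (rec.get("map") or {}).get("amenity")
        return amenity is not None and amenity == amen

    def select_value(rec):
        return rec.get("val")

    return [select_value(rec) for rec in stream if match_amenity(rec)]

records = [
    {"map": {"amenity": "cafe"}, "val": "Cafe A"},
    {"map": {"amenity": "bank"}, "val": "Bank B"},
    {"map": {}, "val": "Unlabeled C"},
]
print(apply_filter(records, "cafe"))  # ['Cafe A']
```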
Index on list/map
It is also possible to index and query GeoJSON values stored in list or map elements:
import aerospike
from aerospike import predicates

# client is a connected aerospike client.
# Create a secondary index on the GeoJSON elements of the 'points' list bin
# in test.demo records.
client.index_list_create('test', 'demo', 'points', aerospike.INDEX_GEO2DSPHERE, 'demo_point_nidx')

# Match records with a list element within args.radius meters of the point.
predicate = predicates.geo_within_radius('points',
                                         args.longitude,
                                         args.latitude,
                                         args.radius,
                                         aerospike.INDEX_TYPE_LIST)
query = client.query('test', 'demo')
query.where(predicate)
The above creates an index on GeoJSON list elements and constructs the query predicate using the index.
Aerospike GeoJSON extension
Use the Aerospike AeroCircle geometry object to store circles along with regular polygons.
This example specifies a circle with a radius of 300 meters centered at longitude -122.250629, latitude 37.871022.
{"type": "AeroCircle", "coordinates": [[-122.250629, 37.871022], 300]}
GeoJSON parsing
On data insert/update, Aerospike recognizes only the Point, Polygon, MultiPolygon, and AeroCircle GeoJSON geometry objects, which are the indexable objects. Unsupported GeoJSON objects return result code 160, AEROSPIKE_ERR_GEO_INVALID_GEOJSON (for example, LineString or MultiLineString objects fail on insert). Polygon objects can include holes, per the GeoJSON Format Specification.
Polygon loop definitions must wind counter-clockwise.
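One way to check a ring's winding before inserting it is the sign of its shoelace (signed) area computed on raw [longitude, latitude] pairs: a positive value means counter-clockwise. This planar check is a sketch that assumes small polygons that do not cross the antimeridian; the helper names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula; positive for counter-clockwise rings in (lng, lat) order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def ensure_ccw(ring):
    """Return the ring wound counter-clockwise, reversing it if necessary."""
    return ring if signed_area(ring) > 0 else list(reversed(ring))

cw = [[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]
print(signed_area(cw))  # -100.0, so this ring is clockwise
print(ensure_ccw(cw))   # [[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]]
```

Reversing a closed ring keeps it closed, so the result can be placed directly in a Polygon's coordinates.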
Aerospike supports the Feature object, which allows geometry objects to be grouped with user-specified properties; however, FeatureCollection is not supported.
Invalid GeoJSON objects are caught on insert/update. For example, an object whose type is written as point instead of Point fails.
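Because unsupported or misspelled types are rejected only when the write reaches the server, an application may want to pre-check GeoJSON text before sending it. The sketch below mirrors the rules in this section: type names are case-sensitive, only Point, Polygon, MultiPolygon, AeroCircle, and Feature are accepted, and FeatureCollection is rejected. Checking a Feature's nested geometry against the same list is an assumption, and this is not Aerospike's parser.

```python
import json

# Geometry types this section lists as recognized on insert/update.
INDEXABLE_TYPES = {"Point", "Polygon", "MultiPolygon", "AeroCircle"}

def _supported(kind):
    # Exact, case-sensitive match: "point" is rejected, "Point" is accepted.
    return isinstance(kind, str) and kind in INDEXABLE_TYPES

def precheck_geojson(text):
    """Client-side pre-check; returns False for text the server is expected to reject."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    kind = obj.get("type")
    if kind == "Feature":
        # Assumption: a Feature is accepted only if its geometry is a supported type.
        geometry = obj.get("geometry")
        return isinstance(geometry, dict) and _supported(geometry.get("type"))
    # FeatureCollection, LineString, MultiLineString, and others fall through to False.
    return _supported(kind)

print(precheck_geojson('{"type": "Point", "coordinates": [1, 1]}'))
print(precheck_geojson('{"type": "point", "coordinates": [1, 1]}'))
print(precheck_geojson('{"type": "LineString", "coordinates": [[0, 0], [1, 1]]}'))
```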
Per the GeoJSON IETF recommendation, the coordinate system is WGS84. An explicitly specified coordinate reference system (CRS) is ignored.
Configuration parameters
Parameter | Datatype | Default | Description |
---|---|---|---|
max-cells | Integer | 8 | Defines the maximum number of cells used in the approximation. Increasing this value improves accuracy but may affect query performance. |
max-level | Integer | 1 | Defines the finest (smallest) cell level used in the approximation. Raising it allows smaller cells, which can make query results more accurate. |
min-level | Integer | 1 | Defines the coarsest (largest) cell level used in the approximation. Should generally be set to 1; increasing it too much may cause queries to fail. |
earth-radius-meters | Integer | 6371000 | Specifies Earth's radius in meters. Used for geographical calculations. |
level-mod | Integer | 1 | Specifies the multiple for levels to be used, effectively increasing the branching factor of the S2 Cell Id hierarchy. |
strict | Boolean | true | When true, performs additional validation on results to ensure they fall within the query region. When false, returns results as-is, which may include points outside the query region. |
max-cells visualization
Here's an example that shows how RegionCoverer covers a specified region with max-cells set to different values. With a higher value of max-cells, the approximation becomes more accurate.
Figures: RegionCoverer coverings of the same region with max-cells = 10, 30, and 100.
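To see why a larger max-cells budget tightens the approximation, the following Python sketch imitates the idea behind a region coverer on a flat unit square: it repeatedly splits the largest boundary cell of a quadtree into four children, drops children that lie outside the region, and stops when another split would exceed max_cells. This is a simplified planar analog written for illustration, not the S2 RegionCoverer algorithm itself; the function names and the disk-shaped region are hypothetical.

```python
import heapq
import math

def classify(x0, y0, size, cx, cy, r):
    """Classify a square cell against a disk: 'outside', 'inside', or 'partial'."""
    # Closest point of the cell to the disk center.
    nx = min(max(cx, x0), x0 + size)
    ny = min(max(cy, y0), y0 + size)
    if math.hypot(nx - cx, ny - cy) >= r:
        return "outside"
    # Farthest corner of the cell from the disk center.
    fx = max(abs(cx - x0), abs(cx - (x0 + size)))
    fy = max(abs(cy - y0), abs(cy - (y0 + size)))
    return "inside" if math.hypot(fx, fy) <= r else "partial"

def cover_disk(cx, cy, r, max_cells, max_level=30):
    """Cover a disk inside the unit square with at most max_cells quadtree cells."""
    interior = []                        # cells fully inside the disk
    frontier = [(-1.0, 0, 0.0, 0.0, 0)]  # (-size, tiebreak, x0, y0, level), largest first
    tiebreak = 1
    while frontier:
        neg_size, _, x0, y0, level = frontier[0]
        if level >= max_level:
            break
        half = -neg_size / 2
        children = []
        for dx in (0.0, half):
            for dy in (0.0, half):
                kind = classify(x0 + dx, y0 + dy, half, cx, cy, r)
                if kind != "outside":
                    children.append((kind, x0 + dx, y0 + dy))
        # Splitting replaces one cell with len(children) cells.
        if len(interior) + len(frontier) - 1 + len(children) > max_cells:
            break
        heapq.heappop(frontier)
        for kind, x, y in children:
            if kind == "inside":
                interior.append((x, y, half))
            else:
                heapq.heappush(frontier, (-half, tiebreak, x, y, level + 1))
                tiebreak += 1
    return interior + [(x, y, -neg) for neg, _, x, y, _ in frontier]

for budget in (10, 30, 100):
    cells = cover_disk(0.5, 0.5, 0.3, budget)
    area = sum(size * size for _, _, size in cells)
    print(f"max_cells={budget}: {len(cells)} cells, covered area {area:.4f}")
print(f"exact disk area: {math.pi * 0.3 ** 2:.4f}")
```

Every covering contains the whole disk, so the covered area never drops below the exact area; a larger budget lets the coverer spend more, smaller cells along the boundary, which shrinks the excess.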
max-level visualization
Here's an example of RegionCoverer covering a specified region, showing how tuning max-level can make query results more accurate.
For this example, min-level is set to 1 and max-cells is set to 10.
Figures: RegionCoverer coverings with max-level = 12 and max-level = 30.
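A quick way to reason about max-level is the average size of an S2 cell at each level. The sphere is split into 6 face cells, and each level divides every cell into 4 children, so the average cell area at level L is 4πR² / (6 · 4^L). The Python sketch below evaluates that formula with R = 6,371,000 m; real S2 cells vary in size around this average, and the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000

def avg_cell_area_m2(level, radius=EARTH_RADIUS_M):
    """Average S2 cell area at a level: sphere area / (6 faces * 4**level cells per face)."""
    return 4 * math.pi * radius ** 2 / (6 * 4 ** level)

for level in (1, 12, 20, 30):
    area = avg_cell_area_m2(level)
    print(f"level {level:2d}: ~{area:.3g} m^2, edge ~{math.sqrt(area):.3g} m")
```

Level 12 averages about 5 km² per cell, while level 30 is under a square centimeter, which is why raising max-level lets a covering follow region boundaries more tightly.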
Create a geospatial application
To develop a geospatial application:
- Install and configure the Aerospike server.
- Create a Geo2DSphere index on a namespace-set-bin combination.
- Construct and insert GeoJSON Point data.
- Construct a Points-within-Region predicate (where clause), make a query request, and process the records returned.
- (Alternate) Construct and insert GeoJSON Polygon/MultiPolygon data.
- (Alternate) Construct a Region-contains-Point predicate, make a query request, and process the records returned.
Known limitations
- Using UDFs to insert or update GeoJSON data types is not supported.
- Duplicate records can be returned.
- For namespaces with data-in-memory true, GeoJSON particles allocate up to 2KB more than the reported particle size, which can lead to high memory consumption in some cases. This problem is fixed in Aerospike Database 4.9.0 and later.
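Because a query can return the same record more than once, an application that needs unique results can deduplicate on the record digest. The sketch below assumes result tuples shaped like the Python client's (key, metadata, bins), where key is (namespace, set, primary_key, digest); the sample data and the `dedupe_results` name are hypothetical.

```python
def dedupe_results(results):
    """Keep the first occurrence of each record, identified by its key digest."""
    seen = set()
    unique = []
    for key, metadata, bins in results:
        digest = bytes(key[3])  # digests may arrive as bytearray, which is unhashable
        if digest not in seen:
            seen.add(digest)
            unique.append((key, metadata, bins))
    return unique

k1 = ("test", "demo", 1, bytearray(b"\x01" * 20))
k2 = ("test", "demo", 2, bytearray(b"\x02" * 20))
results = [
    (k1, {"gen": 1}, {"loc": "a"}),
    (k2, {"gen": 1}, {"loc": "b"}),
    (k1, {"gen": 1}, {"loc": "a"}),
]
print(len(dedupe_results(results)))  # 2
```

The digest is used instead of the primary key because the primary key is only present in results when it was stored with the record.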