Map performance

The following 𝓞 run time performance analysis is worst-case.

Operation	Return Type	Unordered (1)	K / KV-Ordered (2)	K-Ordered (3)	KV-Ordered (4)
Set Type Ops
`set_type` to Unordered		1	1	1	1
`set_type` to K-Ordered		N log N	1	1	1
`set_type` to KV-Ordered		N log N	N log N	N log N	1
Modify Ops
`put`		N	N	log N	log N
`put_items`		(N+M)log M + N	M(log M + log N) + N	M(log M + log N) + N	M(log M + log N) + N
`increment` `decrement`		N	N	log N	log N
`clear`		1	1	1	1
`remove_by_key`	Base	N	N	log N	log N
	+Index	+N	+N	+N	+N
	+Ordered Map	+U log U
`remove_by_index`	Base	L log N + N	N	1	1
	+Rank	+N	+N	+N	+log N
	+Ordered Map	+U log U
`remove_by_rank`	Base	R log N + N	R log N + N	R log N	1
	+Index	+N	+1	+1	+1
	+Ordered Map	+U log U
`remove_by_key_interval`	Base	N + M	N + M	log N + M	log N + M
	+Rank	+N log N	+N log	+N log N	+N log N
	+Ordered Map	+U log U
`remove_by_index_range`	Base	L log N + N	N + M	M	M
	+Rank	+N log N	+N log	+N log N	+N log N
	+Ordered Map	+U log U
`remove_by_value_interval` `remove_all_by_value`	Base	N + M	N + M	N + M	log N + M
	+Index	+N + M	+M	+M	+M
	+Ordered Map	+U log U
`remove_by_rank_range`	Base	R log N + N	R log N + N	R log N	M
	+Index	+N log N	+M	+M	+M
	+Ordered Map	+U log U
`remove_all_by_key_list`	Base	(M+N)logM + N	M log N + N	M log N	M log N
	+Ordered Map	+U log U
`remove_all_by_value_list`	Base	(M+N)logM + N	(M+N)log M + N	(M+N)log M	M log N
	+Ordered Map	+U log U
Read Ops
`size`		1	1	1	1
`get_by_key`	Base	N	N	log N	log N
	+Index	+N	+N	+N	+log N
	+Ordered Map	+U log U
`get_by_index`	Base	L log N + N	N	1	1
	+Rank	+N	+N	+N	+log N
`get_by_index_range`	Base	L log N + N	N + M	M	M
	+Rank	+N log N	+N log	+N log N	+N log N
	+Ordered Map	+U log U
`get_by_rank`	Base	R log N + N	R log N + N	R log N	1
	+Index	+N	+1	+1	+1
	+Ordered Map	+U log U
`get_by_key_interval`	Base	N	N	log N + M	log N + M
	+Rank	+N log N	+N log	+N log N	+N log N
	+Ordered Map	+U log U
`get_by_value_interval` `get_all_by_value`	Base	N + M	N + M	N + M	log N + M
	+Index	+N + M	+M	+M	+M
	+Ordered Map	+U log U
`get_by_rank_range`	Base	R log N + N	R log N + N	R log N	M
	+Index	+N log N	+M	+M	+M
	+Ordered Map	+U log U
`get_all_by_key_list`	Base	(M+N)logM + N	M log N + N	M log N	M log N
	+Ordered Map	+U log U
`get_all_by_value_list`	Base	(M+N)logM + N	(M+N)log M + N	(M+N)log M	M log N
	+Ordered Map	+U log U
Mode Modifiers
Not-in-memory On-Disk	+D + W
Data-in-memory On-Disk	+W

Legend

1: Unordered Map
2: K-Ordered or KV-Ordered map, namespace stores its data on SSD or PMem.
3: K-Ordered Map, namespace stores its data in memory. For 7.1+, it’s for no PERSIST_INDEX.
4: KV-Ordered Map, namespace stores its data in memory. For 7.1+, it’s for PERSIST_INDEX bins.
D: Cost of load from disk.
W: Cost of write to disk.
N: Count of elements in the Map.
M: Count of elements in the range/interval or elements added to the Map.
R: for rank r, R = min(r, N - r - 1).
L: for index i, L = min(i, N - i - 1).
C: Cost of memcpy on packed Map.
U: Count of elements in the result when return type is Ordered Map.

Every modify op has a +C for copy on write for allowing rollback on failure.

Notes

In an in-memory namespace, K-ordered maps have an offset index, KV-ordered maps have a full index.
When a namespace is data-on-SSD or data-in-PMem, choosing column (2) K-ordered gives the best performance for all map operations at the cost of 4 extra bytes on disk. This is because maps in these namespaces do not carry indexes around (to save space on disk) and because being K-ordered has no performance drawbacks for all map operations.
The interval-based operations have parameters [start, stop) where the start value is inclusive and stop value is exclusive. They are generally faster than range-based operations when there is no map ordering and/or when data is not in memory (due to lack of map index metadata).
The range-based operations with parameters (rank, count) or (index, count) are very fast for ordered maps within data-in-memory namespaces.
Key-based operations are those with the suffixes by_index, by_index_range and suffixes with the word key. These operations generally incur additional performance penalties when returning rank type results. They are usually sped up by being K-ordered and having offset index metadata (available when namespace is data-in-memory).
Value-based operations are those with the suffixes by_rank, by_rank_range and suffixes with the word value. These operations generally incur additional performance penalties when returning index type results. They are usually sped up by being KV-ordered and having full index metadata (namespace is data-in-memory).
The Ordered Map return types (added in Database 6.3) carries the extra cost of ordering the result when the Map operation worked on Unordered Map bin data. There’s no extra cost when the Map is already K-Ordered or KV-Ordered. There’s no extra cost for the Unordered Map return type, because the result is returned as-is.
Reminder: worst-case, average-case and best-case performance are different categories. You should compare like to like.

Performance Analysis Example

Consider a storage model where the map key is a name and the map value is the score associated with the name.

map: {name1: score1, name2: score2, ...}

Let’s say you want to optimize performance for getting the names with the top 10 scores. The operation for getting the top 10 would be get_by_rank_range.

Data in memory

From the complexities table, one can see choosing KV-Ordered (4) for get_by_rank_range results in O(M) performance. Since this is data-in-memory, KV-Ordered has full map indexing. Also, because we’re getting top 10, the performance is O(10) or constant time.

There is an additional +C cost due to the copy on write nature of the database, and +W if there is disk backing for this namespace.

Performance: O(1) + C [+ W]

Return type extra cost

Looking at get_by_rank_range again, if the return type was index, there would be extra performance cost as stated on the row below get_by_rank_range with +Index under the Return Type column.

For a K-ordered data-in-memory map for example, the base performance of get_by_rank_range is: O((R + M) log N)

For get_by_rank_range(rank, count, opFlags=index): O((R + M) log N + M)