Map performance
The following 𝓞 run time performance analysis is worst-case.
Operation | Return Type | Unordered (1) | K / KV-Ordered (2) | K-Ordered (3) | KV-Ordered (4) |
---|---|---|---|---|---|
Set Type Ops | |||||
set_type to Unordered | 1 | 1 | 1 | 1 | |
set_type to K-Ordered | N log N | 1 | 1 | 1 | |
set_type to KV-Ordered | N log N | N log N | N log N | 1 | |
Modify Ops | |||||
put | N | N | log N | log N | |
put_items | (N+M)log M + N | M(log M + log N) + N | M(log M + log N) + N | M(log M + log N) + N | |
increment decrement | N | N | log N | log N | |
clear | 1 | 1 | 1 | 1 | |
remove_by_key | Base | N | N | log N | log N |
+Index | +N | +N | +N | +N | |
+Ordered Map | +U log U | ||||
remove_by_index | Base | L log N + N | N | 1 | 1 |
+Rank | +N | +N | +N | +log N | |
+Ordered Map | +U log U | ||||
remove_by_rank | Base | R log N + N | R log N + N | R log N | 1 |
+Index | +N | +1 | +1 | +1 | |
+Ordered Map | +U log U | ||||
remove_by_key_interval | Base | N + M | N + M | log N + M | log N + M |
+Rank | +N log N | +N log | +N log N | +N log N | |
+Ordered Map | +U log U | ||||
remove_by_index_range | Base | L log N + N | N + M | M | M |
+Rank | +N log N | +N log | +N log N | +N log N | |
+Ordered Map | +U log U | ||||
remove_by_value_interval remove_all_by_value | Base | N + M | N + M | N + M | log N + M |
+Index | +N + M | +M | +M | +M | |
+Ordered Map | +U log U | ||||
remove_by_rank_range | Base | R log N + N | R log N + N | R log N | M |
+Index | +N log N | +M | +M | +M | |
+Ordered Map | +U log U | ||||
remove_all_by_key_list | Base | (M+N)logM + N | M log N + N | M log N | M log N |
+Ordered Map | +U log U | ||||
remove_all_by_value_list | Base | (M+N)logM + N | (M+N)log M + N | (M+N)log M | M log N |
+Ordered Map | +U log U | ||||
Read Ops | |||||
size | 1 | 1 | 1 | 1 | |
get_by_key | Base | N | N | log N | log N |
+Index | +N | +N | +N | +log N | |
+Ordered Map | +U log U | ||||
gete_by_index | Base | L log N + N | N | 1 | 1 |
+Rank | +N | +N | +N | +log N | |
get_by_index_range | Base | L log N + N | N + M | M | M |
+Rank | +N log N | +N log | +N log N | +N log N | |
+Ordered Map | +U log U | ||||
get_by_rank | Base | R log N + N | R log N + N | R log N | 1 |
+Index | +N | +1 | +1 | +1 | |
+Ordered Map | +U log U | ||||
get_by_key_interval | Base | N | N | log N + M | log N + M |
+Rank | +N log N | +N log | +N log N | +N log N | |
+Ordered Map | +U log U | ||||
get_by_value_interval get_all_by_value | Base | N + M | N + M | N + M | log N + M |
+Index | +N + M | +M | +M | +M | |
+Ordered Map | +U log U | ||||
get_by_rank_range | Base | R log N + N | R log N + N | R log N | M |
+Index | +N log N | +M | +M | +M | |
+Ordered Map | +U log U | ||||
get_all_by_key_list | Base | (M+N)logM + N | M log N + N | M log N | M log N |
+Ordered Map | +U log U | ||||
get_all_by_value_list | Base | (M+N)logM + N | (M+N)log M + N | (M+N)log M | M log N |
+Ordered Map | +U log U | ||||
Mode Modifiers | |||||
Not-in-memory On-Disk | +D + W | ||||
Data-in-memory On-Disk | +W |
Legend
1
: Unordered Map2
: K-Ordered or KV-Ordered map, namespace stores its data on SSD or PMem3
: K-Ordered Map, namespace stores its data in memory4
: KV-Ordered Map, namespace stores its data in memoryD
: Cost of load from diskW
: Cost of write to diskN
: Count of elements in the MapM
: Count of elements in the range/interval or elements added to the MapR
: for rank r, R = min(r, N - r - 1)L
: for index i, L = min(i, N - i - 1)C
: Cost of memcpy on packed MapU
: Count of elements in the result when return type is Ordered Map
Every modify op has a +C
for copy on write for allowing rollback on failure.
Notes
- In an in-memory namespace, K-ordered maps have an offset index, KV-ordered maps have a full index.
- When a namespace is data-on-SSD or data-in-PMem, choosing column (2) K-ordered gives the best performance for all map operations at the cost of 4 extra bytes on disk. This is because maps in these namespaces do not carry indexes around (to save space on disk) and because being K-ordered has no performance drawbacks for all map operations.
- The interval-based operations have parameters [start, stop) where the start value is inclusive and stop value is exclusive. They are generally faster than range-based operations when there is no map ordering and/or when data is not in memory (due to lack of map index metadata).
- The range-based operations with parameters (rank, count) or (index, count) are very fast for ordered maps within data-in-memory namespaces.
- Key-based operations are those with the suffixes
by_index
,by_index_range
and suffixes with the wordkey
. These operations generally incur additional performance penalties when returningrank
type results. They are usually sped up by being K-ordered and having offset index metadata (available when namespace is data-in-memory). - Value-based operations are those with the suffixes
by_rank
,by_rank_range
and suffixes with the wordvalue
. These operations generally incur additional performance penalties when returningindex
type results. They are usually sped up by being KV-ordered and having full index metadata (namespace is data-in-memory). - The Ordered Map return types (added in Database 6.3) carries the extra cost of ordering the result when the Map operation worked on Unordered Map bin data. There's no extra cost when the Map is already K-Ordered or KV-Ordered. There's no extra cost for the Unordered Map return type, because the result is returned as-is.
- Reminder: worst-case, average-case and best-case performance are different categories. You should compare like to like.
Performance Analysis Example
Consider a storage model where the map key is a name and the map value is the score associated with the name.
map: {name1: score1, name2: score2, ...}
Let's say you want to optimize performance for getting the names with the top 10 scores. The operation for getting the top 10 would be get_by_rank_range
.
Data in memory
From the complexities table, one can see choosing KV-Ordered (4) for get_by_rank_range
results in O(M) performance. Since this is data-in-memory, KV-Ordered has full map indexing. Also, because we're getting top 10, the performance is O(10) or constant time.
There is an additional +C
cost due to the copy on write nature of the database, and +W
if there is disk backing for this namespace.
Performance: O(1) + C [+ W]
Return type extra cost
Looking at get_by_rank_range
again, if the return type was index, there would be extra performance cost as stated on the row below get_by_rank_range
with +Index
under the Return Type
column.
For a K-ordered data-in-memory map for example, the base performance of get_by_rank_range
is:
O((R + M) log N)
For get_by_rank_range(rank, count, opFlags=index)
:
O((R + M) log N + M)