Skip to main content
Loading

Map performance

The following 𝓞 run time performance analysis is worst-case.

OperationReturn TypeUnordered (1)K / KV-Ordered (2)K-Ordered (3)KV-Ordered (4)
Set Type Ops
set_type to Unordered1111
set_type to K-OrderedN log N111
set_type to KV-OrderedN log NN log NN log N1
Modify Ops
putNNlog Nlog N
put_items(N+M)log M + NM(log M + log N) + NM(log M + log N) + NM(log M + log N) + N
increment
decrement
NNlog Nlog N
clear1111
remove_by_keyBaseNNlog Nlog N
+Index+N+N+N+N
+Ordered Map+U log U   
remove_by_indexBaseL log N + NN11
+Rank+N+N+N+log N
+Ordered Map+U log U   
remove_by_rankBaseR log N + NR log N + NR log N1
+Index+N+1+1+1
+Ordered Map+U log U   
remove_by_key_intervalBaseN + MN + Mlog N + Mlog N + M
+Rank+N log N+N log+N log N+N log N
+Ordered Map+U log U   
remove_by_index_rangeBaseL log N + NN + MMM
+Rank+N log N+N log+N log N+N log N
+Ordered Map+U log U   
remove_by_value_interval
remove_all_by_value
BaseN + MN + MN + Mlog N + M
+Index+N + M+M+M+M
+Ordered Map+U log U   
remove_by_rank_rangeBaseR log N + NR log N + NR log NM
+Index+N log N+M+M+M
+Ordered Map+U log U   
remove_all_by_key_listBase(M+N)logM + NM log N + NM log NM log N
+Ordered Map+U log U   
remove_all_by_value_listBase(M+N)logM + N(M+N)log M + N(M+N)log MM log N
+Ordered Map+U log U   
Read Ops
size1111
get_by_keyBaseNNlog Nlog N
+Index+N+N+N+log N
+Ordered Map+U log U   
gete_by_indexBaseL log N + NN11
+Rank+N+N+N+log N
get_by_index_rangeBaseL log N + NN + MMM
+Rank+N log N+N log+N log N+N log N
+Ordered Map+U log U   
get_by_rankBaseR log N + NR log N + NR log N1
+Index+N+1+1+1
+Ordered Map+U log U   
get_by_key_intervalBaseNNlog N + Mlog N + M
+Rank+N log N+N log+N log N+N log N
+Ordered Map+U log U   
get_by_value_interval
get_all_by_value
BaseN + MN + MN + Mlog N + M
+Index+N + M+M+M+M
+Ordered Map+U log U   
get_by_rank_rangeBaseR log N + NR log N + NR log NM
+Index+N log N+M+M+M
+Ordered Map+U log U   
get_all_by_key_listBase(M+N)logM + NM log N + NM log NM log N
+Ordered Map+U log U   
get_all_by_value_listBase(M+N)logM + N(M+N)log M + N(M+N)log MM log N
+Ordered Map+U log U   
Mode Modifiers
Not-in-memory On-Disk+D + W
Data-in-memory On-Disk+W

Legend

  • 1: Unordered Map
  • 2: K-Ordered or KV-Ordered map, namespace stores its data on SSD or PMem
  • 3: K-Ordered Map, namespace stores its data in memory
  • 4: KV-Ordered Map, namespace stores its data in memory
  • D: Cost of load from disk
  • W: Cost of write to disk
  • N: Count of elements in the Map
  • M: Count of elements in the range/interval or elements added to the Map
  • R: for rank r, R = min(r, N - r - 1)
  • L: for index i, L = min(i, N - i - 1)
  • C: Cost of memcpy on packed Map
  • U: Count of elements in the result when return type is Ordered Map

Every modify op has a +C for copy on write for allowing rollback on failure.

Notes

  • In an in-memory namespace, K-ordered maps have an offset index, KV-ordered maps have a full index.
  • When a namespace is data-on-SSD or data-in-PMem, choosing column (2) K-ordered gives the best performance for all map operations at the cost of 4 extra bytes on disk. This is because maps in these namespaces do not carry indexes around (to save space on disk) and because being K-ordered has no performance drawbacks for all map operations.
  • The interval-based operations have parameters [start, stop) where the start value is inclusive and stop value is exclusive. They are generally faster than range-based operations when there is no map ordering and/or when data is not in memory (due to lack of map index metadata).
  • The range-based operations with parameters (rank, count) or (index, count) are very fast for ordered maps within data-in-memory namespaces.
  • Key-based operations are those with the suffixes by_index, by_index_range and suffixes with the word key. These operations generally incur additional performance penalties when returning rank type results. They are usually sped up by being K-ordered and having offset index metadata (available when namespace is data-in-memory).
  • Value-based operations are those with the suffixes by_rank, by_rank_range and suffixes with the word value. These operations generally incur additional performance penalties when returning index type results. They are usually sped up by being KV-ordered and having full index metadata (namespace is data-in-memory).
  • The Ordered Map return types (added in Database 6.3) carries the extra cost of ordering the result when the Map operation worked on Unordered Map bin data. There's no extra cost when the Map is already K-Ordered or KV-Ordered. There's no extra cost for the Unordered Map return type, because the result is returned as-is.
  • Reminder: worst-case, average-case and best-case performance are different categories. You should compare like to like.

Performance Analysis Example

Consider a storage model where the map key is a name and the map value is the score associated with the name.

map: {name1: score1, name2: score2, ...}

Let's say you want to optimize performance for getting the names with the top 10 scores. The operation for getting the top 10 would be get_by_rank_range.

Data in memory

From the complexities table, one can see choosing KV-Ordered (4) for get_by_rank_range results in O(M) performance. Since this is data-in-memory, KV-Ordered has full map indexing. Also, because we're getting top 10, the performance is O(10) or constant time.

There is an additional +C cost due to the copy on write nature of the database, and +W if there is disk backing for this namespace.

Performance: O(1) + C [+ W]

Return type extra cost

Looking at get_by_rank_range again, if the return type was index, there would be extra performance cost as stated on the row below get_by_rank_range with +Index under the Return Type column.

For a K-ordered data-in-memory map for example, the base performance of get_by_rank_range is: O((R + M) log N)

For get_by_rank_range(rank, count, opFlags=index): O((R + M) log N + M)