# Map Performance

The following 𝓞 run time performance analysis is worst-case.

Operation | Return Type | Unordered (1) | K / KV-Ordered (2) | K-Ordered (3) | KV-Ordered (4) |
---|---|---|---|---|---|

Set Type Ops | |||||

`set_type` to Unordered | 1 | 1 | 1 | 1 | |

`set_type` to K-Ordered | N log N | 1 | 1 | 1 | |

`set_type` to KV-Ordered | N log N | N log N | N log N | 1 | |

Modify Ops | |||||

`put` | N | N | log N | log N | |

`put_items` | (N+M)log M + N | M(log M + log N) + N | M(log M + log N) + N | M(log M + log N) + N | |

`increment` `decrement` | N | N | log N | log N | |

`clear` | 1 | 1 | 1 | 1 | |

`remove_by_key` | Base | N | N | log N | log N |

+Index | +N | +N | +N | +N | |

+Ordered Map | +U log U | ||||

`remove_by_index` | Base | L log N + N | N | 1 | 1 |

+Rank | +N | +N | +N | +log N | |

+Ordered Map | +U log U | ||||

`remove_by_rank` | Base | R log N + N | R log N + N | R log N | 1 |

+Index | +N | +1 | +1 | +1 | |

+Ordered Map | +U log U | ||||

`remove_by_key_interval` | Base | N + M | N + M | log N + M | log N + M |

+Rank | +N log N | +N log | +N log N | +N log N | |

+Ordered Map | +U log U | ||||

`remove_by_index_range` | Base | L log N + N | N + M | M | M |

+Rank | +N log N | +N log | +N log N | +N log N | |

+Ordered Map | +U log U | ||||

`remove_by_value_interval` `remove_all_by_value` | Base | N + M | N + M | N + M | log N + M |

+Index | +N + M | +M | +M | +M | |

+Ordered Map | +U log U | ||||

`remove_by_rank_range` | Base | R log N + N | R log N + N | R log N | M |

+Index | +N log N | +M | +M | +M | |

+Ordered Map | +U log U | ||||

`remove_all_by_key_list` | Base | (M+N)logM + N | M log N + N | M log N | M log N |

+Ordered Map | +U log U | ||||

`remove_all_by_value_list` | Base | (M+N)logM + N | (M+N)log M + N | (M+N)log M | M log N |

+Ordered Map | +U log U | ||||

Read Ops | |||||

`size` | 1 | 1 | 1 | 1 | |

`get_by_key` | Base | N | N | log N | log N |

+Index | +N | +N | +N | +log N | |

+Ordered Map | +U log U | ||||

`gete_by_index` | Base | L log N + N | N | 1 | 1 |

+Rank | +N | +N | +N | +log N | |

`get_by_index_range` | Base | L log N + N | N + M | M | M |

+Rank | +N log N | +N log | +N log N | +N log N | |

+Ordered Map | +U log U | ||||

`get_by_rank` | Base | R log N + N | R log N + N | R log N | 1 |

+Index | +N | +1 | +1 | +1 | |

+Ordered Map | +U log U | ||||

`get_by_key_interval` | Base | N | N | log N + M | log N + M |

+Rank | +N log N | +N log | +N log N | +N log N | |

+Ordered Map | +U log U | ||||

`get_by_value_interval` `get_all_by_value` | Base | N + M | N + M | N + M | log N + M |

+Index | +N + M | +M | +M | +M | |

+Ordered Map | +U log U | ||||

`get_by_rank_range` | Base | R log N + N | R log N + N | R log N | M |

+Index | +N log N | +M | +M | +M | |

+Ordered Map | +U log U | ||||

`get_all_by_key_list` | Base | (M+N)logM + N | M log N + N | M log N | M log N |

+Ordered Map | +U log U | ||||

`get_all_by_value_list` | Base | (M+N)logM + N | (M+N)log M + N | (M+N)log M | M log N |

+Ordered Map | +U log U | ||||

Mode Modifiers | |||||

Not-in-memory On-Disk | +D + W | ||||

Data-in-memory On-Disk | +W |

## Legend

`1`

: Unordered Map`2`

: K-Ordered or KV-Ordered map, namespace stores its data on SSD or PMem`3`

: K-Ordered Map, namespace stores its data in memory`4`

: KV-Ordered Map, namespace stores its data in memory`D`

: Cost of load from disk`W`

: Cost of write to disk`N`

: Count of elements in the Map`M`

: Count of elements in the range/interval or elements added to the Map`R`

: for rank r, R = min(r, N - r - 1)`L`

: for index i, L = min(i, N - i - 1)`C`

: Cost of memcpy on packed Map`U`

: Count of elements in the result when return type is Ordered Map

Every modify op has a +`C`

for copy on write for allowing rollback on failure.

## Notes

- In an in-memory namespace, K-ordered maps have an offset index, KV-ordered maps have a full index.
- When a namespace is data-on-SSD or data-in-PMem, choosing column (2)
**K-ordered gives the best performance**for all map operations at the cost of 4 extra bytes on disk. This is because maps in these namespaces do not carry indexes around (to save space on disk) and because being K-ordered has no performance drawbacks for all map operations. - The interval-based operations have parameters [start, stop) where the start value is inclusive and stop value is exclusive. They are generally faster than range-based operations when there is no map ordering and/or when data is not in memory (due to lack of map index metadata).
- The range-based operations with parameters (rank, count) or (index, count) are very fast for ordered maps within data-in-memory namespaces.
- Key-based operations are those with the suffixes
`by_index`

,`by_index_range`

and suffixes with the word`key`

. These operations generally incur additional performance penalties when returning`rank`

type results. They are usually sped up by being K-ordered and having offset index metadata (available when namespace is data-in-memory). - Value-based operations are those with the suffixes
`by_rank`

,`by_rank_range`

and suffixes with the word`value`

. These operations generally incur additional performance penalties when returning`index`

type results. They are usually sped up by being KV-ordered and having full index metadata (namespace is data-in-memory). - The Ordered Map return types (added in Database 6.3) carries the extra cost of ordering the result when the Map operation worked on Unordered Map bin data. There's no extra cost when the Map is already K-Ordered or KV-Ordered. There's no extra cost for the Unordered Map return type, because the result is returned as-is.
- Reminder: worst-case, average-case and best-case performance are different categories. You should compare like to like.

## Performance Analysis Example

Consider a storage model where the map key is a name and the map value is the score associated with the name.

`map: {name1: score1, name2: score2, ...}`

Let's say you want to optimize performance for getting the names with the top 10 scores. The operation for getting the top 10 would be `get_by_rank_range`

.

### Data in memory

From the complexities table, one can see choosing KV-Ordered (4) for `get_by_rank_range`

results in O(M) performance. Since this is data-in-memory, KV-Ordered has full map indexing. Also, because we're getting top 10, the performance is O(10) or constant time.

There's is an additional +`C`

cost due to the copy on write nature of the database, and +`W`

if there is disk backing for this namespace.

Performance: O(1) + C [+ W]

### Return type extra cost

Looking at `get_by_rank_range`

again, if the return type was index, there would be extra performance cost as stated on the row below `get_by_rank_range`

with `+Index`

under the `Return Type`

column.

For a K-ordered data-in-memory map for example, the base performance of `get_by_rank_range`

is:
O((R + M) log N)

For `get_by_rank_range(rank, count, opFlags=index)`

:
O((R + M) log N + M)