Rack Awareness
In many cloud environments, it is a best practice to build a cluster that spans multiple availability zones. Aerospike's Rack Awareness is a good fit when you need to split the database across racks or zones. For example, if you configure a replication-factor of 2, the master copy of a partition and its replica are stored on separate hardware failure groups. Rack Awareness also provides a mechanism that allows database clients to preferentially read from servers in their closest rack or zone, which can significantly reduce cross-AZ traffic charges as well as provide lower latency and increased stability.
For more details, visit Rack Awareness.
To add Rack Awareness
Rack-specific configuration for the Aerospike cluster CR file:
rackConfig:
  namespaces:
    - test
  racks:
    - id: 1
      zone: us-central1-b
      aerospikeConfig:
        service:
          proto-fd-max: 18000
      storage:
        filesystemVolumePolicy:
          initMethod: deleteFiles
          cascadeDelete: true
        blockVolumePolicy:
          cascadeDelete: true
        volumes:
          - storageClass: ssd
            path: /opt/aerospike
            volumeMode: filesystem
            sizeInGB: 1
          - path: /opt/aerospike/data
            storageClass: ssd
            volumeMode: filesystem
            sizeInGB: 3
    - id: 2
      zone: us-central1-a
      aerospikeConfig:
        service:
          proto-fd-max: 16000
.
.
.
aerospikeConfig:
  service:
    feature-key-file: /etc/aerospike/secret/features.conf
  security:
    enable-security: true
  namespaces:
    - name: test
      memory-size: 3000000000
      replication-factor: 1
      storage-engine:
        type: device
        files:
          - /opt/aerospike/data/test.dat
        filesize: 2000000000
        data-in-memory: true
    - name: testMem
      memory-size: 3000000000
      replication-factor: 1
      storage-engine:
        type: memory
Get full CR file here.
Deploy the cluster
Follow the instructions here to deploy this configuration.
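As a quick sketch, assuming the CR above is saved in a file named aerospike-cluster.yaml (the file name is illustrative, not prescribed by the Operator), the configuration can be applied with kubectl:

# Apply the rack-aware cluster CR; adjust the file name to your setup
kubectl apply -f aerospike-cluster.yaml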
Cluster node distribution in racks
Cluster nodes are distributed across racks as evenly as possible. The cluster size is divided by the number of racks to get the nodes per rack. If there are remainder nodes, they are distributed one by one across racks starting from the first rack.
For example:
Nodes: 10, Racks: 4
Topology:
- NodesForRack1: 3
- NodesForRack2: 3
- NodesForRack3: 2
- NodesForRack4: 2
Adding a new rack in the config
Add a new rack section to the config YAML file under the rackConfig.racks section and apply the config file using kubectl.
rackConfig:
  namespaces:
    - test
  racks:
    .
    .
    .
    - id: 3
      zone: us-central1-c
The Operator redistributes cluster nodes across racks whenever the cluster size is updated or the number of racks is changed. If a rack is added without increasing the cluster size, the old racks are scaled down and the new rack is scaled up according to the new node distribution, as shown below.
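For instance, applying the distribution rule above to a 6-node cluster that grows from two racks to three racks without a change in cluster size:

Before (Nodes: 6, Racks: 2):
- NodesForRack1: 3
- NodesForRack2: 3
After adding rack 3 (Nodes: 6, Racks: 3):
- NodesForRack1: 2
- NodesForRack2: 2
- NodesForRack3: 2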
Setting rack-level storage and aerospikeConfig
A rack can also have its own local storage and aerospikeConfig. If local storage is specified for a rack, the rack uses that storage; otherwise the common global storage is used. The rack-level aerospikeConfig is a configuration patch that is merged with the common global aerospikeConfig and then used for that rack.
rackConfig:
  namespaces:
    - test
  racks:
    - id: 1
      zone: us-central1-b
      aerospikeConfig:
        service:
          proto-fd-max: 18000
      storage:
        filesystemVolumePolicy:
          initMethod: deleteFiles
          cascadeDelete: true
        blockVolumePolicy:
          cascadeDelete: true
        volumes:
          - storageClass: ssd
            path: /opt/aerospike
            volumeMode: filesystem
            sizeInGB: 1
          - path: /opt/aerospike/data
            storageClass: ssd
            volumeMode: filesystem
            sizeInGB: 3
Merging AerospikeConfig
The rack-local aerospikeConfig patch is merged with the common global base aerospikeConfig using the following rules:
- New elements in the patch configMap are added to the base configMap.
- A base element is replaced with the patch element if:
  - the element's value type has changed
  - the element value is a primitive type and has been updated
  - the element value is a primitive list and has been updated
  - the element key is storage-engine and the storage-engine type has changed (storage-engine can be of device, file, or memory type)
- If the element is of map type, the patch and base elements are merged recursively.
- If the elements are lists of maps, new entries in the patch list are appended to the base list, and corresponding entries are merged using the same merge algorithm. The order of elements in the base list is maintained. (Corresponding entries are matched by the special name key in the maps; this list of maps is actually a map of maps, and the outer map keys are added to the sub-maps under the key name to convert the map of maps into a list of maps.)
For example, given the following rack-local aerospikeConfig and common global aerospikeConfig:
rackConfig:
  racks:
    - aerospikeConfig:
        service:
          proto-fd-max: 18000
        namespaces:
          - name: test
            storage-engine:
              type: device
              devices:
                - /dev/nvme0n2 /dev/sdf2
          - name: bar
            memory-size: 6000000000
            storage-engine:
              type: memory
.
.
.
aerospikeConfig:
  service:
    feature-key-file: /etc/aerospike/secret/features.conf
  security:
    enable-security: true
  namespaces:
    - name: test
      memory-size: 3000000000
      replication-factor: 2
      storage-engine:
        type: device
        devices:
          - /dev/nvme0n1 /dev/sdf
    - name: bar
      memory-size: 3000000000
      replication-factor: 2
      storage-engine:
        type: device
        devices:
          - /dev/nvme0n10 /dev/sdf10
The aerospikeConfig for the rack after merging:
aerospikeConfig:
  service:
    proto-fd-max: 18000
    feature-key-file: /etc/aerospike/secret/features.conf
  security:
    enable-security: true
  namespaces:
    - name: test
      memory-size: 3000000000
      replication-factor: 2
      # storage-engine type is not changed, hence it is merged recursively
      storage-engine:
        type: device
        devices:
          - /dev/nvme0n2 /dev/sdf2
    - name: bar
      memory-size: 6000000000
      replication-factor: 2
      # storage-engine type is changed, hence it is replaced
      storage-engine:
        type: memory
Removing a rack from the config
Remove the desired rack section from the config YAML file under the rackConfig.racks section and apply the config file using kubectl.
This scales the desired rack down to size 0, removing one node at a time. After all nodes have been removed from the rack, the rack itself is removed. If a rack is removed without decreasing the cluster size, the other racks are scaled up according to the new node distribution.
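As a minimal sketch, removing the rack with id 3 that was added earlier would leave the rackConfig looking like this (the zones are carried over from the examples above):

rackConfig:
  namespaces:
    - test
  racks:
    - id: 1
      zone: us-central1-b
    - id: 2
      zone: us-central1-a
    # the entry for rack id 3 has been deleted, so the Operator scales that rack down to 0 and removes it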
Simultaneously adding and removing racks
If the Operator has to scale up some racks and scale down others in a single update, it always scales up first and then scales down. Hence, for a short duration the actual cluster size may exceed the desired cluster size.
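As an illustration (the sizes here are only an example), consider a 6-node cluster that is changed from racks 1 and 2 to racks 2 and 3 without changing the cluster size:

- Desired state: Nodes: 6, Racks: 2 and 3, i.e. 3 nodes per rack
- Scale-up phase: rack 3 is brought up to 3 nodes while rack 1 still holds its 3 nodes, so the cluster temporarily runs 9 nodes
- Scale-down phase: rack 1 is scaled down to 0 and removed, returning the cluster to the desired 6 nodes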