# Troubleshooting startup problems

This page describes startup problems, and how to handle them.

## Overview

The `asd` daemon is the main Aerospike process. On System V Linux variants, start it with `/etc/init.d/aerospike start`. On systemd Linux variants, start it through `systemctl`. See [Aerospike Daemon Management](https://aerospike.com/docs/database/8.0.0/manage/database/daemon).

## The asd daemon will not start

### Server aborts because it could not allocate enough shared memory

**ISSUE**

In Aerospike Database 7.0.0 or later, if you’re using an in-memory namespace, the following error means that your operating system’s `kernel.shmmax` or `kernel.shmall` is set too low to pre-allocate the in-memory data storage for this namespace.

```plaintext
May 09 2024 22:31:59 GMT: CRITICAL (drv-mem): (drv_mem_ee.c:1078) {test} could not allocate 1342177280-byte shmem stripe
May 09 2024 22:31:59 GMT: WARNING (as): (signal.c:259) SIGUSR1 received, aborting Aerospike Enterprise Edition build 7.0.0.8 os ubuntu20.0>
```

**SOLUTION** If you're using in-memory storage without storage-backed persistence, set `kernel.shmmax` to at least one-eighth of the `data-size`. Otherwise, set it to at least the `filesize` or the `device` size of your storage-backed persistence.

```bash
# Current maximum size of a single shared memory segment, in bytes
sysctl kernel.shmmax
kernel.shmmax = 2147483648

# If data-size is 64GiB, 2GiB shmmax is too small for the 8GiB stripes
sysctl -w kernel.shmmax=17592186044416

# Current system-wide shared memory limit, in pages
sysctl kernel.shmall
kernel.shmall = 2097152

getconf PAGE_SIZE
4096

# 8GiB (2097152 * 4096) shmall is too small for a 64GiB data-size
sysctl -w kernel.shmall=4294967296
```

See [configuring namespace data storage](https://aerospike.com/docs/database/8.0.0/manage/namespace/storage/config) for more details.
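The sizing rule in this section can be sketched as a small shell calculation. This is a minimal sketch, assuming a namespace without storage-backed persistence and the one-eighth rule stated in the solution. The function names `min_shmmax_bytes` and `min_shmall_pages` are hypothetical helpers, not part of Aerospike or `sysctl`, and the results are lower bounds only, because other users of shared memory on the host also count against `kernel.shmall`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: estimate lower bounds for kernel.shmmax and
# kernel.shmall for an in-memory namespace of a given data-size.

# kernel.shmmax is in bytes; it must hold one stripe,
# which is 1/8th of data-size (rounded up).
min_shmmax_bytes() {
    local data_size_bytes=$1
    echo $(( (data_size_bytes + 7) / 8 ))
}

# kernel.shmall is in pages; it must cover the whole data-size (rounded up).
min_shmall_pages() {
    local data_size_bytes=$1 page_size=$2
    echo $(( (data_size_bytes + page_size - 1) / page_size ))
}

data_size=$(( 64 * 1024 * 1024 * 1024 ))   # 64GiB data-size
min_shmmax_bytes "$data_size"               # prints 8589934592 (8GiB)
min_shmall_pages "$data_size" 4096          # prints 16777216
```

On a real host, the page size argument would come from `getconf PAGE_SIZE`, and the results can be compared with the values that `sysctl` reports.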

### The header has not been zeroized

**ISSUE** When you start Aerospike Database 6.0.0 or later, the following error means that one of the devices configured as namespace storage is not recognized as an Aerospike device. This check is a safeguard: it prevents a misconfiguration from causing the server to write over the wrong device, such as the root partition.

```plaintext
Apr 13 2022 05:42:46 GMT: CRITICAL (drv_ssd): (drv_ssd.c:2216) /dev/nvme1n1p1: not an Aerospike device but not erased - check config or erase device
```

**SOLUTION**

1.  Verify that the devices in the namespace configuration are actually supposed to be used by the server.
    
2.  Verify that the devices have been properly [initialized](https://aerospike.com/docs/database/8.0.0/manage/planning/ssd/manage).
    

### Permission denied for /var/lock/subsys/aerospike

**ISSUE** You try to start `asd` and get the following error:

```plaintext
touch: cannot touch `/var/lock/subsys/aerospike`: Permission denied
```

**SOLUTION** Confirm you are starting the process as the correct user with the appropriate permissions, using `sudo` when needed.

::: note
You must be logged in as the root user to start the daemon.
:::

## Failed to get the feature-key

**ISSUE**

If you try to start `asd` and get the following error, you must provide a feature-key file:

```plaintext
Apr 09 2021 06:35:12 GMT: CRITICAL (config): (features_ee.c:142) failed to get feature key /etc/aerospike/features.conf
```

::: note
Starting with Database 6.1.0, a simple feature-key file is included. This feature-key file only allows deployment of a single-node cluster.
:::

**SOLUTION** Provide a valid feature-key file at the location the server is configured to read it from. For more information, see [Configuring the Feature-Key File](https://aerospike.com/docs/database/8.0.0/manage/planning/feature-key/).

## Problem with network interface

**ISSUE** The server won’t start due to inability to get the physical address, and you see the following message in the log file:

```plaintext
Jun 22 2014 02:34:10 GMT: WARNING (cf:misc): (id.c::249) Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno 19 No such device
Jun 22 2014 02:34:10 GMT: CRITICAL (config): (cfg.c:3363) could not get unique id and/or ip address
Jun 22 2014 02:34:10 GMT: WARNING (as): (signal.c::120) SIGINT received, shutting down
Jun 22 2014 02:34:10 GMT: WARNING (as): (signal.c::123) startup was not complete, exiting immediately
```

**SOLUTION**

1.  Check the name of your network interface:

```bash
ifconfig -a
```

2.  Specify the [`node-id-interface`](https://aerospike.com/docs/database/reference/config#service__node-id-interface) in the configuration.

The interface name in the following example is `p2p1`:

```plaintext
service {
    ...
    node-id-interface p2p1
    ...
}
...
```

For more information on the configuration, see [Configuration Reference](https://aerospike.com/docs/database/reference/config).
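Before editing the configuration, it can help to confirm that the interface name you plan to use exists on the node. The following is a minimal sketch, assuming a Linux host where each network interface appears as an entry under `/sys/class/net`. The function `interface_exists` is a hypothetical helper and not an Aerospike tool.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named network interface exists.
# Linux exposes one entry per interface under /sys/class/net; the optional
# second argument overrides that directory (useful for testing).
interface_exists() {
    local name=$1 sysfs_dir=${2:-/sys/class/net}
    [ -n "$name" ] && [ -e "$sysfs_dir/$name" ]
}

# p2p1 is the example interface name used in this section.
if interface_exists p2p1; then
    echo "p2p1 found"
else
    echo "p2p1 not found; available interfaces:"
    ls /sys/class/net 2>/dev/null || true
fi
```

If the name is not found, the listing shows the names that are available to use as the value of `node-id-interface`.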

## Not enough file descriptors error in log

**ISSUE** The server fails to start, and the Aerospike log contains the following message:

```plaintext
Aug 24 2012 16:43:10 GMT: INFO (as): (base/as.c:172) File descriptor limit is : 1024 and proto-fd-max is : 2048
Aug 24 2012 16:43:10 GMT: CRITICAL GLOBAL (as): (base/as.c:174) Not enough file descriptors, Starting with 1024 and needs 2048
critical error: backtrace: frame 0 /usr/bin/asd() [0x460cef]
critical error: backtrace: frame 1 /usr/bin/asd() [0x404b59]
critical error: backtrace: frame 2 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f620b53030d]
critical error: backtrace: frame 3 /usr/bin/asd() [0x403e79]
```

When the Aerospike server starts, the value of `proto-fd-max` must not exceed the system's file descriptor limit.

**SOLUTION** To avoid a startup problem, there are two alternatives:

-   Decrease the value of `proto-fd-max` in your Aerospike configuration file.
-   Increase the process file descriptor limit (Linux system setting).

Prior to Aerospike Database 4.9.0, for a dynamic change, this limit was enforced only if the new value was lower than the system setting.
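The comparison behind this error can be sketched in shell. This is a minimal sketch, not the server's own check: `fd_limit_sufficient` is a hypothetical helper, the `proto-fd-max` value of 2048 is taken from the sample log message, and `ulimit -n` reports the soft limit of the current shell, which may differ from the limit that applies to the `asd` process when a service manager starts it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file descriptor limit can
# accommodate the configured proto-fd-max.
fd_limit_sufficient() {
    local fd_limit=$1 proto_fd_max=$2
    # A limit reported as "unlimited" is always sufficient.
    [ "$fd_limit" = "unlimited" ] || [ "$fd_limit" -ge "$proto_fd_max" ]
}

proto_fd_max=2048            # assumed value, from the sample log message
current_limit=$(ulimit -n)   # soft limit of the current shell

if fd_limit_sufficient "$current_limit" "$proto_fd_max"; then
    echo "limit $current_limit is enough for proto-fd-max $proto_fd_max"
else
    echo "limit $current_limit is below proto-fd-max $proto_fd_max"
fi
```

With the values from the sample log (a limit of 1024 and a `proto-fd-max` of 2048), the helper reports that the limit is too low, which matches the startup failure.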

## Stuck in defrag loop at startup

At startup, Aerospike requires a minimum percentage of free storage before it joins the cluster. We recommend keeping the SSD at 50% utilization to allow efficient defragmentation. The minimum percentage of storage that is required at startup depends on the value of [`defrag-startup-minimum`](https://aerospike.com/docs/database/reference/config#namespace__defrag-startup-minimum).

In Database 5.7.0 and later, this value is 0 by default, meaning the server will never get stuck in a defrag loop at startup. If nonzero, a typical value might be 10, which is the default in server versions prior to 5.7.0:

```plaintext
defrag-startup-minimum 10
```

A node may be stuck in a defrag loop if it does not start up for a long time and the log file at `/var/log/aerospike/aerospike.log` shows messages like the following:

```plaintext
Aug 22 2012 21:16:38 GMT: INFO (as): (base/as.c:265) waiting for defrag: namespace devices percent 0 waiting for 10
Apr 24 2013 17:19:59 GMT: INFO (drv_ssd): (storage/drv_ssd.c:1544) read_bin: could not read first
Apr 24 2013 17:20:00 GMT: WARNING (drv_ssd): (storage/drv_ssd.c:1390) **** ssd_read: record f8de4ae1d039c87 has no block associated, fail
Apr 24 2013 17:20:00 GMT: WARNING (drv_ssd): (storage/drv_ssd.c:1390) **** ssd_read: record f8de4ae1d039c87 has no block associated, fail
```

When the server starts, it defragments storage and does not finish starting until the available space reaches the startup minimum. If SSD utilization is too high, the node may not be able to free enough contiguous space to start up.
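To see how far a node is from the startup minimum, you can extract the two percentages from the `waiting for defrag` message. The following is a minimal sketch, assuming the log format matches the sample message in this section; newer server versions may word the message differently. The function `defrag_progress` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print
# "<available percent> <required percent>" for the most recent
# "waiting for defrag" message. Prints nothing if no line matches.
defrag_progress() {
    sed -n 's/.*waiting for defrag: namespace devices percent \([0-9][0-9]*\) waiting for \([0-9][0-9]*\).*/\1 \2/p' | tail -n 1
}

sample='Aug 22 2012 21:16:38 GMT: INFO (as): (base/as.c:265) waiting for defrag: namespace devices percent 0 waiting for 10'
printf '%s\n' "$sample" | defrag_progress   # prints: 0 10
```

Against a live node, the same function could read the log file directly, for example `defrag_progress < /var/log/aerospike/aerospike.log`. A first number that stays below the second across repeated runs suggests that the node is not freeing space.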

To get out of the defrag loop:

-   For an in-memory database with persistence, the filesize should generally be 8x the amount of memory. For example, if the memory for data is 20 GB, the filesize should be 160 GB. Increase the filesize to 8x or higher. A database with very high traffic may require more than 8x.
-   For nodes with SSDs, lower `high-water-memory-pct` in the [Aerospike configuration file](https://aerospike.com/docs/database/8.0.0/manage/namespace/retention) so that the node starts evicting objects to create free space. If the node still does not start up or does not evict, lower the value further so that the node evicts more records and frees enough space to start up.

_Eviction_ is typically a back-stopping strategy – data should _expire_ before you reach the high water marks that trigger eviction. This problem is a symptom of a larger problem:

-   If your storage is insufficient, re-evaluate your storage/capacity strategy, as described in [Managing storage capacity](https://aerospike.com/docs/database/8.0.0/manage/namespace/storage/config).
-   Contact Aerospike for the capacity planning spreadsheet to help you re-configure your cluster capacity.

## With a non-standard network device, the server won’t start

**ISSUE**

On some distributions and with some unusual network devices, the server won’t start and the log shows the following error. This means the network interface name on the node is something other than “eth”, “bond”, or “wlan”.

```plaintext
Aug 22 2012 06:34:18 GMT: WARNING (cf:misc): (id.c:163) can't get physical address, tried eth, bond, wlan. fatal: 19 No such device
```

**SOLUTION**

Add the `network-interface-name` parameter to your configuration file. In the following example, the device is named `vlan708`:

```plaintext
network {
    service {
        address 10.0.2.131
        port 3000
        network-interface-name vlan708
    }
}
```

If you have multiple network devices, such as `eth0` and `eth1`, Aerospike chooses one of them. A common situation is a node that has two Ethernet ports, one for the Internet and one for internal traffic. In this case, Aerospike needs to access the Internet port, but it may choose the wrong one, resulting in a node that does not see traffic correctly. To select a specific network device, add the [`access-address`](https://aerospike.com/docs/database/reference/config#network__access-address) parameter to your configuration file.

## Warning messages on starting Aerospike

If you start Aerospike and get the following warnings, they are related to memory management in the Linux kernel.

1.  **Warning about SHMMAX:**

SHMMAX is the maximum size of a single shared memory segment. If you see the following output when you start a server node, it means that the system was configured with a shared memory maximum block size that’s less than the 1GB required by Aerospike. The start script dynamically raises the limit to 1GB.

```plaintext
sudo service aerospike start
kernel.shmmax too low, setting to 1GB
kernel.shmmax = 1073741824
Starting aerospike:     [OK]
```

Unless the machine is rebooted, you should only see this happen on the first start.

2.  **Warning about SHMALL:**

SHMALL is the system-wide limit on the total amount of shared memory, measured in pages. If you see the following output when you start a server node, it means the start script dynamically raised the maximum number of shared memory pages. Unless the machine is rebooted, you should only see this happen on the first start. It's possible to see both limits raised during a single (first) start.

```plaintext
sudo service aerospike start
kernel.shmall too low, setting to 4G pages
kernel.shmall = 4294967296
Starting aerospike:     [OK]
```

## Other problems

For other problems, check for consistent settings between all nodes in the cluster, including service and network settings. Make sure that the namespaces are configured the same on each node.

Verify that no firewall is interfering with communication between nodes (ports 3000 through 3004).