Data masking
This page describes Aerospike’s data masking feature, available in Database 8.1.1.
Data masking overview
Data masking obfuscates sensitive data, such as Personally Identifiable Information (PII), by applying dynamic transformations. Because this is dynamic data masking, the underlying data stored on disk remains unaltered and the data is obscured in real time during queries. To protect the data on the storage devices themselves, use encryption at rest.
Users without appropriate privileges are served the masked value, while authorized users maintain access to the unmodified original data. This provides a critical layer of protection against accidental data exposure within application results.
Administrators define a masking rule by selecting a data masking function. This function applies a dynamic transformation to a specific bin of a dataset located within a designated namespace.
Defining a data masking rule automatically enables it for all users except for those granted permission to unmask it.
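The behavior described above can be modeled with a short sketch. The following Python is purely illustrative and is not Aerospike's API: `MaskingRule`, `serve_record`, and the `may_unmask` flag are hypothetical names, and the sketch assumes a rule is identified by namespace, set, and bin. It shows that the stored record is left untouched, that users without permission receive the masked value, and that users permitted to unmask receive the original.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class MaskingRule:
    """Hypothetical in-memory stand-in for a masking rule."""
    namespace: str
    set_name: str
    bin_name: str
    mask: Callable[[Any], Any]  # masking function chosen by the administrator


def serve_record(
    stored: Dict[str, Any],
    namespace: str,
    set_name: str,
    rules: List[MaskingRule],
    may_unmask: bool,
) -> Dict[str, Any]:
    """Return the view of a record that a given user would receive."""
    view = dict(stored)  # work on a copy: the stored record is never modified
    if may_unmask:
        return view  # authorized users see the original values
    for rule in rules:
        same_location = (rule.namespace, rule.set_name) == (namespace, set_name)
        if same_location and rule.bin_name in view:
            view[rule.bin_name] = rule.mask(view[rule.bin_name])
    return view


# Sample data for illustration only.
stored = {"name": "Peter Griffin", "age": 43}
rules = [MaskingRule("test", "customers", "name", lambda _: "John Doe")]

masked_view = serve_record(stored, "test", "customers", rules, may_unmask=False)
full_view = serve_record(stored, "test", "customers", rules, may_unmask=True)
```

In this sketch, `masked_view` carries the replacement value in the `name` bin, while `full_view` and the stored record keep the original.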
The critical role of data masking
Data masking addresses several critical issues in database security and management.
Regulatory compliance and security
Data masking is essential for adhering to data privacy regulations and industry standards, such as:
- GDPR (General Data Protection Regulation): Requires protecting the personal data of EU residents.
- Reserve Bank of India’s Cyber Security Framework: Mandates unauthorized access prevention for Indian financial institutions.
- HIPAA (Health Insurance Portability and Accountability Act): Mandates the protection of protected health information (PHI).
- PCI DSS (Payment Card Industry Data Security Standard): Requires protecting cardholder data.
Applying a dynamic transformation to specific bins of PII, such as credit card details, Social Security numbers, or Aadhaar numbers, mitigates the risk of unauthorized data exposure and the legal penalties that can follow. Data masking ensures that sensitive data within a namespace is protected during real-time access.
Risk mitigation
Masking minimizes the risk of a data breach when dealing with non-production environments.
- In development or testing environments, developers, testers, and third-party vendors often need realistic data to ensure applications function correctly.
- If they use copies of the real production data, any security oversight, accidental exposure, or malicious intent in these non-production systems could lead to a catastrophic data leak.
- Data masking makes the data useless to an attacker even if the non-production environment is compromised, because the masked data is not the actual sensitive information.
Maintaining data utility and integrity
Unlike simply deleting or scrambling data randomly, effective data masking techniques ensure the masked data remains structurally and contextually similar to the original. This is vital because:
- Application testing: The application still receives the correct data type and data length, for example a properly formatted email address or a valid-looking 16-digit credit card number, so features, performance, and logic can be tested accurately without using real customer data.
- Enabling outsourcing and collaboration: When working with external vendors, contractors, or offshore teams for development, testing, or analytics, data masking lets you share the data they need for their tasks without granting them access to actual customer secrets. It enables secure collaboration and ensures proprietary or private customer data stays within the organization’s control.
When to use data masking
- You need to copy data to a less secure environment (dev, test, vendor).
- You need to prevent PII from being visible to human users and unauthorized applications.
Masking methods
Aerospike offers redactions and constant replacements for masking.
Redactions
Redaction replaces sensitive data elements with a placeholder character, such as an asterisk (*) or X, or with a defined, irreversible pattern. It can be applied fully or partially.
You would use redactions in the following cases:
- Verification: Used when an application needs to display a partial identifier so that a user can confirm their identity, such as “Is this your credit card ending in 1234?”
- Compliance: Used to meet regulatory requirements (like PCI DSS) that mandate obscuring most of a card number from display while showing only the last few digits.
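As a rough illustration of partial and full redaction, the following Python sketch replaces a run of characters with a placeholder. It borrows the parameter names position, length, and value from the built-in redact function, but the semantics used here (a zero-based position, a single-character placeholder, and ranges clamped to the string length) are assumptions made for illustration, not documented server behavior.

```python
def redact(text: str, position: int, length: int, value: str = "*") -> str:
    """Replace `length` characters of `text`, starting at `position`, with `value`.

    Assumed semantics for this sketch: `position` is zero-based, `value` is a
    single placeholder character, and the range is clamped to the string.
    """
    if len(value) != 1:
        raise ValueError("value must be a single placeholder character")
    if position < 0 or length < 0:
        raise ValueError("position and length must be non-negative")
    start = min(position, len(text))
    end = min(start + length, len(text))
    return text[:start] + value * (end - start) + text[end:]


card = "4111111111111234"  # sample 16-digit number, not a real card

# Partial redaction: hide the first 12 digits, keep the last 4 for verification.
partially_redacted = redact(card, 0, 12, "*")

# Full redaction: hide every character.
fully_redacted = redact(card, 0, len(card), "X")
```

Keeping only the last four digits visible corresponds to the verification and compliance cases: the user can still recognize the card, but the full number is never displayed.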
Constants (nulling out / fixed value)
The constant masking method replaces every original value in a sensitive bin with the exact same predefined fixed value or string across all records.
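A minimal sketch of this idea in Python, assuming records are plain dictionaries keyed by bin name. The helper `mask_constant` and the sample records are hypothetical and only model the effect of the method; they are not part of any Aerospike client.

```python
from typing import Any, Dict, List


def mask_constant(
    records: List[Dict[str, Any]], bin_name: str, value: Any
) -> List[Dict[str, Any]]:
    """Return copies of the records with `bin_name` set to one fixed value."""
    masked = []
    for record in records:
        view = dict(record)         # leave the input record untouched
        if bin_name in view:
            view[bin_name] = value  # the same replacement for every record
        masked.append(view)
    return masked


# Sample records for illustration only.
records = [
    {"name": "Peter Griffin", "state": "RI"},
    {"name": "Jane Roe", "state": "CA"},
]
masked = mask_constant(records, "name", "John Doe")
```

Every record in `masked` now carries the same `name` value, so the original names cannot be recovered from the masked output, while the other bins are unchanged.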
You would use constants in the following cases:
- Complete removal of PII: Used when a sensitive field, such as a customer’s name, provides no utility for the testing or development environment and its complete, irreversible removal is the goal.
- Eliminating variance: Used to ensure that a field is ignored by downstream analysis. For instance, replacing all “State of Origin” values with “UNKNOWN” ensures testing won’t be skewed by regional data variance.
- Unneeded data: Ideal for scenarios where data is mandatory for the database schema but is irrelevant or dangerous in the non-production environment, such as replacing the LastLoginIPAddress with 127.0.0.1 or 0.0.0.0.
The following examples show constant replacements:
| Value | Masking rule | Result |
|---|---|---|
| Peter Griffin | Replace with John Doe | John Doe |
| Male | Replace with "" | "" |
Masking-related audit logs
Masking events, such as creating or modifying rules, are logged by the masking info command to the audit trail under the action masking.
When an operation violates a masking rule, the detail string records information about the attempt.
Built-in masking functions
| Function | Parameter | Type |
|---|---|---|
| constant | value | string |
| constant | value | integer |
| constant | value | float |
| constant | value | boolean |
| redact | position, length, value | string |
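The table can also be read as a small lookup structure. The Python sketch below transcribes it and checks a rule definition against it. The `check_rule` helper and the rule shape are hypothetical, and reading the Type column as the bin value type that a function supports is an assumption of this sketch. The check deliberately does not assume that any parameter is required.

```python
from typing import Any, Dict, List

# Transcribed from the table above. Treating "Type" as the supported bin
# value type is an assumption of this sketch.
BUILT_IN_FUNCTIONS = {
    "constant": {
        "params": {"value"},
        "types": {"string", "integer", "float", "boolean"},
    },
    "redact": {
        "params": {"position", "length", "value"},
        "types": {"string"},
    },
}


def check_rule(function: str, bin_type: str, params: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a hypothetical rule definition."""
    spec = BUILT_IN_FUNCTIONS.get(function)
    if spec is None:
        return [f"unknown masking function: {function}"]
    problems = []
    if bin_type not in spec["types"]:
        problems.append(f"{function} does not support type {bin_type}")
    # Report parameter names that the table does not list for this function.
    for name in sorted(set(params) - spec["params"]):
        problems.append(f"{function} has no parameter named {name}")
    return problems


ok = check_rule("redact", "string", {"position": 0, "length": 12, "value": "*"})
bad_type = check_rule("redact", "integer", {"position": 0, "length": 4, "value": "*"})
bad_param = check_rule("constant", "string", {"value": "John Doe", "length": 3})
```

An empty list means the definition is consistent with the table; it says nothing about whether the server would accept the rule.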
Related topics
- Quickstart: Data masking - Step-by-step tutorial to configure and test data masking
- Configure access control - Learn about RBAC privileges including masking-admin, read-masked, and write-masked
- Audit trail - Configure audit logging for masking events
- XDR security - Data masking considerations for cross-datacenter replication
- System limits - Data masking limitations and constraints