What is data tokenization?
Data tokenization is a process applied to datasets to protect sensitive information, most commonly data at rest. The technique replaces sensitive data with non-sensitive stand-in values known as token data. Token data typically preserves the format of the original data; for example, a tokenized credit card number still appears as sixteen digits.
With data tokenization, the non-sensitive token data remains in the dataset, while the mapping between each token and the original sensitive data is stored securely outside of the system, typically in a token server. When the original sensitive data is needed again, that mapping is looked up on the token server; this is called detokenization.
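To make the round trip concrete, here is a minimal sketch in Python. The `TokenVault` class and its in-memory dictionaries are illustrative stand-ins for a real, hardened token server, and the card number shown is a standard test value.

```python
import secrets

class TokenVault:
    """Illustrative stand-in for a secure token server.
    A real vault is a separate, hardened service, not an in-memory dict."""

    def __init__(self):
        self._token_to_value = {}  # token -> original sensitive value
        self._value_to_token = {}  # so repeated values map to one token

    def tokenize(self, value: str) -> str:
        """Replace a sensitive value with a same-format random token."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = self._random_digits(len(value))
        while token in self._token_to_value:  # avoid the rare collision
            token = self._random_digits(len(value))
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        """Look the token up in the vault to recover the original value."""
        return self._token_to_value[token]

    @staticmethod
    def _random_digits(n: int) -> str:
        return "".join(secrets.choice("0123456789") for _ in range(n))

vault = TokenVault()
token = vault.tokenize("4111111111111111")
print(token)                    # sixteen random digits with no intrinsic meaning
print(vault.detokenize(token))  # 4111111111111111
```

Note that the token itself has no mathematical relationship to the original number; only the lookup in the vault connects the two.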
Types of data tokenization
Companies have two options for data tokenization, which differ primarily in how quickly data can be detokenized.
- Vault tokenization: Vault tokenization stores the mapping between the original sensitive data and its corresponding token in a secure, separate token server vault, which is queried whenever the original data is needed. Each detokenization requires a vault lookup, which adds latency, so companies that need to detokenize at scale can consider vaultless tokenization.
- Vaultless tokenization: Vaultless tokenization does not use a token server vault; instead, tokens are generated and reversed with cryptographic devices or algorithms, so the original data is recovered by computation rather than lookup. Companies might choose this method to avoid the latency of vault lookups (see the sketch after this list).
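For contrast with the vault sketch above, a vaultless scheme derives the token from the original value using a key, so detokenization needs no lookup. Production systems use vetted format-preserving encryption (for example, NIST's FF1 mode); the digit-wise keyed keystream below is only a toy illustration of the reversibility property, not a secure algorithm, and the key shown is hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-key"  # illustrative only; real keys live in an HSM or key manager

def _keystream(key: bytes, length: int) -> list:
    """Derive a repeatable digit keystream from the key."""
    digits, counter = [], 0
    while len(digits) < length:
        block = hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        digits.extend(b % 10 for b in block)
        counter += 1
    return digits[:length]

def tokenize(value: str, key: bytes = KEY) -> str:
    """Shift each digit by the keystream -- same sixteen-digit format out."""
    ks = _keystream(key, len(value))
    return "".join(str((int(d) + k) % 10) for d, k in zip(value, ks))

def detokenize(token: str, key: bytes = KEY) -> str:
    """Reverse the shift with the same key; no vault lookup required."""
    ks = _keystream(key, len(token))
    return "".join(str((int(d) - k) % 10) for d, k in zip(token, ks))

card = "4111111111111111"
token = tokenize(card)
assert detokenize(token) == card
```

Because detokenization here is pure computation, it avoids the per-lookup latency of a vault, which is the trade-off the two approaches turn on.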
Benefits of using data tokenization
Data tokenization techniques are most commonly used in the payment processing industry. The Payment Card Industry Data Security Standard (PCI-DSS) requires that sensitive data such as credit card numbers be protected, and tokenization is recognized as a method of achieving this. However, data tokenization can be used to protect any kind of sensitive data.
One common use case for data tokenization methods is protecting patients' health information.
Companies use data tokenization to:
- Meet industry security standards: Companies use data tokenization to meet industry security standard requirements, such as protecting sensitive payment data to meet PCI-DSS compliance requirements.
- Reduce data misuse: Tokenization removes a risk factor for misused or abused data. Without access to the token vault or the cryptographic devices needed to detokenize it, tokenized data is useless outside of its system; a bad actor cannot, for example, make purchases using tokenized credit card information.
- Improve customer confidence: Customers want to know that important information such as their payment information is securely stored. Data tokenization can help customers feel confident that the companies they do business with protect their data.
Impacts of using data tokenization
The most common impact of using data tokenization techniques is stronger protection for sensitive information.
- Reduce threat vectors: Tokenizing data reduces the ability of bad actors to misuse sensitive data.
- Reduce the need for advanced security controls: Using tokenized data can reduce the amount of sensitive data that requires more advanced security controls.
Data tokenization vs. data masking
Data tokenization is more commonly used to protect data at rest. The technique can introduce latency when the relationship between the token data and the original sensitive data must be looked up in the token server vault.
Data masking is more commonly used to protect data in use, most commonly for testing with realistic but non-sensitive copies of production data, or in applications used by employees with limited access to actual data. Unlike tokenization, masking is typically one-way: the original value is not meant to be recoverable from the masked output.
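A minimal masking sketch, for contrast with the tokenization examples above. The function below shows one common pattern (keep the last four digits, obscure the rest); unlike a token, its output cannot be turned back into the original.

```python
def mask_card_number(card: str, visible: int = 4) -> str:
    """One-way masking: keep the last few digits, obscure the rest."""
    return "*" * (len(card) - visible) + card[-visible:]

print(mask_card_number("4111111111111111"))  # ************1111
```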