This repository contains code samples and documentation for implementing data anonymization in AWS RDS PostgreSQL instances using the built-in pgcrypto extension. Since PostgreSQL Anonymizer is not available in AWS RDS, this approach provides an alternative solution for masking and anonymizing sensitive data.
AWS RDS PostgreSQL doesn't support custom extensions like PostgreSQL Anonymizer, but it does include the pgcrypto extension, which provides cryptographic functions that can be used as building blocks for data anonymization. This repository demonstrates how to leverage pgcrypto to implement common anonymization techniques.
- Blog Post - Detailed explanation of the approach and implementation
- SQL Scripts - Ready-to-use SQL scripts for implementing anonymization
- Examples - Example use cases and implementation scenarios
- Enable the pgcrypto extension in your AWS RDS PostgreSQL instance:
CREATE EXTENSION IF NOT EXISTS pgcrypto;- Create an anonymization schema to organize your functions:
CREATE SCHEMA IF NOT EXISTS anon;- Implement basic anonymization functions:
-- Create a function to hash strings (one-way anonymization)
CREATE OR REPLACE FUNCTION anon.hash_string(text)
RETURNS TEXT AS
$$
SELECT encode(digest($1, 'sha256'), 'hex');
$$
LANGUAGE SQL IMMUTABLE;
-- Create a function to partially mask email addresses
CREATE OR REPLACE FUNCTION anon.partial_email(email text)
RETURNS text AS
$$
BEGIN
RETURN substring(email, 1, 2) || '******' || substring(email from position('@' in email));
END;
$$
LANGUAGE plpgsql IMMUTABLE;
-- Create a function to generate fake first names
CREATE OR REPLACE FUNCTION anon.fake_first_name()
RETURNS text AS
$$
DECLARE
names text[] := ARRAY['John', 'Jane', 'Robert', 'Mary', 'William', 'Patricia', 'James', 'Jennifer', 'Michael', 'Linda',
'David', 'Elizabeth', 'Richard', 'Barbara', 'Joseph', 'Susan', 'Thomas', 'Jessica', 'Charles', 'Sarah'];
BEGIN
RETURN names[1 + (random() * (array_length(names, 1) - 1))::integer];
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Create a function to generate fake last names
CREATE OR REPLACE FUNCTION anon.fake_last_name()
RETURNS text AS
$$
DECLARE
names text[] := ARRAY['Smith', 'Johnson', 'Williams', 'Jones', 'Brown', 'Davis', 'Miller', 'Wilson', 'Moore', 'Taylor',
'Anderson', 'Thomas', 'Jackson', 'White', 'Harris', 'Martin', 'Thompson', 'Garcia', 'Martinez', 'Robinson'];
BEGIN
RETURN names[1 + (random() * (array_length(names, 1) - 1))::integer];
END;
$$
LANGUAGE plpgsql VOLATILE;- Create masked views of your sensitive data:
CREATE VIEW masked_users AS
SELECT
id,
anon.partial_email(email) AS email,
anon.fake_first_name() AS first_name,
anon.fake_last_name() AS last_name,
anon.hash_string(password) AS password_hash
FROM users;- Development and Testing: Provide realistic but anonymized data for development and testing environments
- Data Sharing: Share data with third parties while protecting sensitive information
- Compliance: Meet regulatory requirements like GDPR, HIPAA, and CCPA
- Analytics: Allow analysis of data patterns without exposing PII
- Less comprehensive than dedicated anonymization extensions
- Requires manual implementation of anonymization functions
- Limited built-in masking techniques
- No dynamic masking based on user roles
Contributions to improve the anonymization functions or add new techniques are welcome. Please submit pull requests or open issues to suggest improvements.
This project is licensed under the MIT License - see the LICENSE file for details.