Skip to content

sendtoshailesh/aws-rds-data-masking-data-anonymization

Repository files navigation

Data Anonymization in AWS RDS PostgreSQL using pgcrypto

This repository contains code samples and documentation for implementing data anonymization in AWS RDS PostgreSQL instances using the built-in pgcrypto extension. Since PostgreSQL Anonymizer is not available in AWS RDS, this approach provides an alternative solution for masking and anonymizing sensitive data.

Overview

AWS RDS PostgreSQL doesn't support custom extensions like PostgreSQL Anonymizer, but it does include the pgcrypto extension, which provides cryptographic functions that can be used as building blocks for data anonymization. This repository demonstrates how to leverage pgcrypto to implement common anonymization techniques.

Contents

  • Blog Post - Detailed explanation of the approach and implementation
  • SQL Scripts - Ready-to-use SQL scripts for implementing anonymization
  • Examples - Example use cases and implementation scenarios

Quick Start

  1. Enable the pgcrypto extension in your AWS RDS PostgreSQL instance:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
  1. Create an anonymization schema to organize your functions:
CREATE SCHEMA IF NOT EXISTS anon;
  1. Implement basic anonymization functions:
-- Create a function to hash strings (one-way anonymization)
CREATE OR REPLACE FUNCTION anon.hash_string(text) 
RETURNS TEXT AS 
$$
  SELECT encode(digest($1, 'sha256'), 'hex');
$$ 
LANGUAGE SQL IMMUTABLE;

-- Create a function to partially mask email addresses
CREATE OR REPLACE FUNCTION anon.partial_email(email text)
RETURNS text AS
$$
BEGIN
  RETURN substring(email, 1, 2) || '******' || substring(email from position('@' in email));
END;
$$ 
LANGUAGE plpgsql IMMUTABLE;

-- Create a function to generate fake first names
CREATE OR REPLACE FUNCTION anon.fake_first_name()
RETURNS text AS
$$
DECLARE
  names text[] := ARRAY['John', 'Jane', 'Robert', 'Mary', 'William', 'Patricia', 'James', 'Jennifer', 'Michael', 'Linda', 
                        'David', 'Elizabeth', 'Richard', 'Barbara', 'Joseph', 'Susan', 'Thomas', 'Jessica', 'Charles', 'Sarah'];
BEGIN
  RETURN names[1 + (random() * (array_length(names, 1) - 1))::integer];
END;
$$
LANGUAGE plpgsql VOLATILE;

-- Create a function to generate fake last names
CREATE OR REPLACE FUNCTION anon.fake_last_name()
RETURNS text AS
$$
DECLARE
  names text[] := ARRAY['Smith', 'Johnson', 'Williams', 'Jones', 'Brown', 'Davis', 'Miller', 'Wilson', 'Moore', 'Taylor', 
                        'Anderson', 'Thomas', 'Jackson', 'White', 'Harris', 'Martin', 'Thompson', 'Garcia', 'Martinez', 'Robinson'];
BEGIN
  RETURN names[1 + (random() * (array_length(names, 1) - 1))::integer];
END;
$$
LANGUAGE plpgsql VOLATILE;
  1. Create masked views of your sensitive data:
CREATE VIEW masked_users AS 
SELECT 
  id, 
  anon.partial_email(email) AS email, 
  anon.fake_first_name() AS first_name, 
  anon.fake_last_name() AS last_name, 
  anon.hash_string(password) AS password_hash 
FROM users;

Use Cases

  • Development and Testing: Provide realistic but anonymized data for development and testing environments
  • Data Sharing: Share data with third parties while protecting sensitive information
  • Compliance: Meet regulatory requirements like GDPR, HIPAA, and CCPA
  • Analytics: Allow analysis of data patterns without exposing PII

Limitations

  • Less comprehensive than dedicated anonymization extensions
  • Requires manual implementation of anonymization functions
  • Limited built-in masking techniques
  • No dynamic masking based on user roles

Contributing

Contributions to improve the anonymization functions or add new techniques are welcome. Please submit pull requests or open issues to suggest improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

many AWS users face a significant limitation: PostgreSQL Anonymizer is not available/compatible with Amazon Aurora or RDS PostgreSQL instances. This repo specifically addresses that gap, providing alternative solutions for AWS environments where custom extensions cannot be installed.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors