When masking is enabled, every column is masked by default unless it is explicitly listed as unmasked.
The columns to unmask are listed in a configuration file. Masking must be enabled in the redshiftbatcher configuration:
mask: true
maskSalt: sample-salt
maskFile: "/usr/inventory.yaml"

Mask all the columns in all the tables of the inventory database except the id column in the customers table.
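The mask-by-default behaviour can be pictured with a minimal sketch. The helper names `mask_value` and `mask_row` are hypothetical, and the salted SHA-256 hash is only an assumption for illustration; the hashing scheme that redshiftbatcher actually applies is not described here.

```python
import hashlib

def mask_value(value, salt):
    # Hypothetical masking function: a salted SHA-256 hex digest (assumption).
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()

def mask_row(row, non_pii_columns, salt):
    # Keep the columns listed as non-PII as they are and mask everything else.
    return {
        col: val if col in non_pii_columns else mask_value(val, salt)
        for col, val in row.items()
    }

row = {"id": 1, "email": "a@example.com"}
masked = mask_row(row, {"id"}, "sample-salt")
```

In this sketch `masked["id"]` is left untouched, while `masked["email"]` is replaced by a digest.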
/usr/inventory.yaml
non_pii_keys:
  customers:
    - id

Conditional NonPiiKeys unmask a column if its value matches any pattern in the pattern list.
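A rough sketch of this matching rule, using the patterns configured for customers.email in this section. It assumes that `%` behaves like the SQL LIKE wildcard (any run of characters) and that matching is case-sensitive; neither is confirmed here, and `like_to_regex` and `is_unmasked` are hypothetical helper names.

```python
import re

def like_to_regex(pattern):
    # Assumption: '%' matches any run of characters, as in SQL LIKE.
    parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile("^" + ".*".join(parts) + "$", re.DOTALL)

def is_unmasked(value, patterns):
    # The column stays unmasked if the value matches any pattern in the list.
    return any(like_to_regex(p).match(value) for p in patterns)

patterns = ["%example.com", "%exampledev.com"]
internal = is_unmasked("jane@example.com", patterns)
external = is_unmasked("jane@gmail.com", patterns)
```

Under these assumptions `internal` is true and `external` is false, so only the first address would be stored unmasked.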
conditional_non_pii_keys:
  customers:
    email:
      - '%example.com'
      - '%exampledev.com'

Dependent NonPiiKeys unmask a column based on the values of other columns.
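One way to read this rule is sketched below. It assumes that the dependent column (first_name) is unmasked whenever the provider column (last_name) holds one of the listed values; that interpretation comes from the comments in the configuration, and `dependent_unmasked` is a hypothetical helper name.

```python
def dependent_unmasked(row, rules):
    # rules: {dependent_column: {provider_column: [values that unmask it]}}
    unmasked = set()
    for dependent, providers in rules.items():
        for provider, values in providers.items():
            if row.get(provider) in values:
                unmasked.add(dependent)
    return unmasked

rules = {"first_name": {"last_name": ["Jones", "Dhoni"]}}
hit = dependent_unmasked({"first_name": "MS", "last_name": "Dhoni"}, rules)
miss = dependent_unmasked({"first_name": "Ann", "last_name": "Smith"}, rules)
```

Here `hit` contains first_name and `miss` is empty, so first_name would be left unmasked only in the first row.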
dependent_non_pii_keys:
  customers:
    # dependentColumnName
    first_name:
      # providerColumn
      last_name:
        - 'Jones'
        - 'Dhoni'

Length keys create an extra column containing the length of the original column. For example, email_length is created containing the length of the data in the email column.
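The derived column can be sketched as follows. The helper name `add_length_columns` is hypothetical, and treating a missing or null value as length 0 is an assumption rather than documented behaviour.

```python
def add_length_columns(row, length_keys):
    out = dict(row)
    for col in length_keys:
        value = row.get(col)
        # Assumption: a missing or null value yields a length of 0.
        out[col + "_length"] = len(value) if value is not None else 0
    return out

with_length = add_length_columns({"email": "a@example.com"}, ["email"])
```

The length would be computed from the original value, so it remains available even when the email column itself is masked.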
length_keys:
  customers:
    - email

Mobile keys, if specified, copy the first five characters of an E.164-formatted mobile number (the + sign and the first four digits) into an additional column.
Eg: If mobile_number is +919812345678, +9198 is stored in mobile_number_init5.
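The +9198 example can be reproduced with a short sketch. It assumes that the prefix is a plain five-character slice and that the `_init5` suffix reflects that length; `add_mobile_init_columns` is a hypothetical helper name.

```python
def add_mobile_init_columns(row, mobile_keys, prefix_len=5):
    out = dict(row)
    for col in mobile_keys:
        # Assumption: a missing or null number yields an empty prefix.
        value = row.get(col) or ""
        out[f"{col}_init{prefix_len}"] = value[:prefix_len]
    return out

with_prefix = add_mobile_init_columns(
    {"mobile_number": "+919812345678"}, ["mobile_number"]
)
```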
mobile_keys:
  customers:
    - mobile_number

Mapping PII Keys, if specified, add new columns containing the masked values. When this key is specified, it overrides all the other keys and unmasks all the other columns.
Eg: id stays as it is (unmasked) and hashed_id is added with the masked values.
mapping_pii_keys:
  establishments:
    - id

Specify one or more columns in a table as the Redshift Sort Key.
sort_keys:
  customers:
    - created_at

Specify one or more columns in a table as the Redshift Distribution Key.
dist_keys:
  customers:
    - account_id

Include tables restrict the tables that are allowed to be sunk. The operator narrows the tables matched by kafkaTopicRegex further using include_tables. This feature is supported only if you are using the RedshiftSink operator.
For example: if kafkaTopicRegex: ts.inventory.* matches 10 tables, then include_tables narrows them down to the two tables listed.
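The two-stage selection can be sketched as follows. This is only an illustration: it assumes that the table name is the last dot-separated part of the topic name and that the regex must match the whole topic, neither of which is stated here. `select_tables` and the topic names are hypothetical.

```python
import re

def select_tables(topics, topic_regex, include_tables):
    # Stage 1: keep the topics matched by the regex (full match assumed).
    pattern = re.compile(topic_regex)
    matched = [t for t in topics if pattern.fullmatch(t)]
    # Stage 2: keep only topics whose table name is listed in include_tables.
    # Assumption: the table name is the last dot-separated part of the topic.
    return [t for t in matched if t.rsplit(".", 1)[-1] in include_tables]

topics = [
    "ts.inventory.customers",
    "ts.inventory.orders",
    "ts.inventory.products",
    "ts.billing.invoices",
]
selected = select_tables(topics, "ts.inventory.*", ["customers", "orders"])
```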
include_tables:
- customers
- orders

Regex pattern boolean keys keep free-text columns masked and add boolean columns that describe the kind of value in the free-text column.
For example: we add a boolean column favourite_quote_has_philosphy.
If the value in the favourite_quote column matches the regex 'life|time', the value in the extra column favourite_quote_has_philosphy is true, otherwise false.
The regex match is case-insensitive.
regex_pattern_boolean_keys:
  customers:
    favourite_quote:
      has_philosphy: 'life|time'
      has_text_funny: 'funny'
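The configuration above can be read as the following sketch. It assumes that a pattern may match anywhere in the value (a search rather than a full match) and that a missing value counts as no match; `add_regex_boolean_columns` is a hypothetical helper name, and the masking of the original column is left out.

```python
import re

def add_regex_boolean_columns(row, rules):
    # rules: {column: {column_suffix: regex}}
    out = dict(row)
    for col, patterns in rules.items():
        value = row.get(col) or ""
        for suffix, regex in patterns.items():
            # Case-insensitive match, as stated in the text.
            out[f"{col}_{suffix}"] = bool(re.search(regex, value, re.IGNORECASE))
    return out

rules = {
    "favourite_quote": {"has_philosphy": "life|time", "has_text_funny": "funny"}
}
flagged = add_regex_boolean_columns({"favourite_quote": "Life is short"}, rules)
```

For this row, favourite_quote_has_philosphy comes out true and favourite_quote_has_text_funny comes out false.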