1 change: 1 addition & 0 deletions SUMMARY.md
@@ -376,6 +376,7 @@
* [Defining flat views with view definitions](modules/sql-on-fhir/defining-flat-views-with-view-definitions.md)
* [Migrate to the spec-compliant ViewDefinition format](modules/sql-on-fhir/migrate-to-the-spec-compliant-viewdefinition-format.md)
* [Query data from flat views](modules/sql-on-fhir/query-data-from-flat-views.md)
* [De-identification](modules/sql-on-fhir/de-identification.md)
* [Reference](modules/sql-on-fhir/reference.md)
* [Integration Toolkit](modules/integration-toolkit/README.md)
* [C-CDA / FHIR Converter](modules/integration-toolkit/ccda-converter/README.md)
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -383,6 +383,7 @@
* [Defining flat views with view definitions](modules/sql-on-fhir/defining-flat-views-with-view-definitions.md)
* [Migrate to the spec-compliant ViewDefinition format](modules/sql-on-fhir/migrate-to-the-spec-compliant-viewdefinition-format.md)
* [Query data from flat views](modules/sql-on-fhir/query-data-from-flat-views.md)
* [De-identification](modules/sql-on-fhir/de-identification.md)
* [Reference](modules/sql-on-fhir/reference.md)
* [Integration Toolkit](modules/integration-toolkit/README.md)
* [C-CDA / FHIR Converter](modules/integration-toolkit/ccda-converter/README.md)
6 changes: 6 additions & 0 deletions docs/modules/sql-on-fhir/README.md
@@ -22,6 +22,12 @@ Once your flat view is defined and materialized, you can query data from it usin

See [Query data from flat views](./query-data-from-flat-views.md).

## De-identification

ViewDefinition columns can be annotated with de-identification methods that transform sensitive data during SQL generation. Supported methods include `redact`, `cryptoHash`, `dateshift`, `encrypt`, `substitute`, `perturb`, and `custom_function` (a user-provided PostgreSQL function).

See [De-identification](./de-identification.md).

## SQL on FHIR reference

To dive deeper into the nuances of using SQL on FHIR in Aidbox, consult the reference page.
294 changes: 294 additions & 0 deletions docs/modules/sql-on-fhir/de-identification.md
@@ -0,0 +1,294 @@
# De-identification

Aidbox supports per-column de-identification in ViewDefinitions via a FHIR extension. When a column has a de-identification extension, the SQL compiler wraps the column expression with a PostgreSQL function that transforms the value before it reaches the output.

This works with all ViewDefinition operations: `$run`, `$sql`, and `$materialize`.

## Extension format

Add the de-identification extension to any column in a `column` array of the ViewDefinition's `select`:

```json
{
  "name": "birth_date",
  "path": "birthDate",
  "extension": [
    {
      "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
      "extension": [
        {"url": "method", "valueCode": "dateshift"},
        {"url": "dateShiftKey", "valueString": "my-secret-key"}
      ]
    }
  ]
}
```

The extension uses sub-extensions for the method and its parameters. The `method` sub-extension is required and specifies which de-identification method to apply.
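
When many columns need the same structure, the extension can be generated rather than written by hand. The following is a minimal Python sketch, not part of Aidbox: the helper name `deident_extension` and its calling convention are hypothetical, and only the extension URL and sub-extension names come from this page.

```python
# Hypothetical helper (not an Aidbox API): builds the extension JSON shown above.
DEIDENT_URL = "http://health-samurai.io/fhir/core/StructureDefinition/de-identification"


def deident_extension(method, **params):
    """Return the de-identification extension for one ViewDefinition column.

    Each keyword argument maps a sub-extension url to a (value_key, value)
    pair, for example dateShiftKey=("valueString", "my-secret-key").
    """
    subs = [{"url": "method", "valueCode": method}]
    for url, (value_key, value) in params.items():
        subs.append({"url": url, value_key: value})
    return {"url": DEIDENT_URL, "extension": subs}


# Reproduces the birth_date column from the example above.
column = {
    "name": "birth_date",
    "path": "birthDate",
    "extension": [
        deident_extension("dateshift", dateShiftKey=("valueString", "my-secret-key"))
    ],
}
```

The `method` sub-extension is always emitted first here for readability; nothing on this page indicates that sub-extension order matters.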

## Methods

### redact

Replaces the value with NULL. No parameters.

```json
{"url": "method", "valueCode": "redact"}
```

### cryptoHash

Replaces the value with its HMAC-SHA256 hash (hex-encoded). The hash is deterministic (the same input and key always produce the same hash) and one-way (it cannot be reversed).

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| cryptoHashKey | string | yes | HMAC secret key |

```json
[
  {"url": "method", "valueCode": "cryptoHash"},
  {"url": "cryptoHashKey", "valueString": "my-hash-key"}
]
```
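
To see what a hex-encoded HMAC-SHA256 looks like outside the database, here is a minimal Python sketch. It illustrates the technique only and is not the Aidbox implementation: UTF-8 encoding of the key and value, and returning `None` for a missing value, are assumptions.

```python
import hashlib
import hmac


def crypto_hash(value, key):
    """Hex-encoded HMAC-SHA256 of `value` under `key`.

    Assumptions (not confirmed by the source): both strings are UTF-8
    encoded, and a missing value stays missing.
    """
    if value is None:
        return None
    return hmac.new(key.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()


hashed = crypto_hash("patient-123", "my-hash-key")
```

The result is always 64 hex characters. Hashing the same value with the same key gives the same string, which is what lets hashed ids still be joined across views.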

### dateshift

Shifts date and dateTime values by a deterministic offset derived from the resource id. All dates within the same resource shift by the same number of days, preserving temporal relationships. The offset range is -50 to +50 days.

Year-only values (`"2000"`) and year-month values (`"2000-06"`) cannot be shifted meaningfully and are replaced with NULL.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| dateShiftKey | string | yes | HMAC key used to compute the per-resource offset |

```json
[
  {"url": "method", "valueCode": "dateshift"},
  {"url": "dateShiftKey", "valueString": "my-shift-key"}
]
```
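
The behaviour described above can be sketched in Python. This is an illustration under stated assumptions, not the Aidbox algorithm: the source only says the offset is derived from the resource id with an HMAC key and lies between -50 and +50 days, so the exact derivation below (first 8 bytes of an HMAC-SHA256 digest, modulo 101) is hypothetical. The sketch handles full dates only, not dateTime values.

```python
import hashlib
import hmac
import re
from datetime import date, timedelta

FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def shift_days(resource_id, key):
    """Deterministic per-resource offset in [-50, 50] (hypothetical derivation)."""
    digest = hmac.new(key.encode("utf-8"), resource_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % 101 - 50


def date_shift(value, resource_id, key):
    """Shift a YYYY-MM-DD string; year-only and year-month values become None."""
    if value is None or not FULL_DATE.fullmatch(value):
        return None
    shifted = date.fromisoformat(value) + timedelta(days=shift_days(resource_id, key))
    return shifted.isoformat()


birth = date_shift("2000-06-15", "pt-1", "my-shift-key")
visit = date_shift("2000-07-15", "pt-1", "my-shift-key")
```

Because both dates belong to the same resource they move by the same offset, so the 30-day gap between them survives the shift.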

### encrypt

Encrypts the value with AES-128-CBC and returns a base64-encoded string. The result can be decrypted with the key. A zero initialization vector is used so that the output is deterministic: the same plaintext always produces the same ciphertext.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| encryptKey | string | yes | Hex-encoded encryption key (8-32 hex characters, even length) |

```json
[
  {"url": "method", "valueCode": "encrypt"},
  {"url": "encryptKey", "valueString": "0123456789abcdef0123456789abcdef"}
]
```

### substitute

Replaces the value with a fixed string.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| replaceWith | string | yes | Replacement value |

```json
[
  {"url": "method", "valueCode": "substitute"},
  {"url": "replaceWith", "valueString": "REDACTED"}
]
```

### perturb

Adds random noise to numeric values. The result is non-deterministic — each query produces different output.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| span | decimal | no | Noise magnitude. Default: 1.0 |
| rangeType | code | no | `fixed` (absolute noise) or `proportional` (relative to value). Default: `fixed` |
| roundTo | integer | no | Decimal places to round to. 0 means integer. Default: 0 |

With `fixed` range type, noise is in the range ±span/2. With `proportional`, noise is ±(span × value)/2.

```json
[
  {"url": "method", "valueCode": "perturb"},
  {"url": "span", "valueDecimal": 10},
  {"url": "rangeType", "valueCode": "fixed"},
  {"url": "roundTo", "valueInteger": 0}
]
```
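
The two range types can be compared with a small Python sketch. It follows the noise ranges stated above but is not the Aidbox implementation: uniform noise, the use of the absolute value for the proportional range, and returning an `int` when `roundTo` is 0 are assumptions.

```python
import random


def perturb(value, span=1.0, range_type="fixed", round_to=0, rng=random):
    """Add noise of at most span/2 (fixed) or span*|value|/2 (proportional).

    Assumption: the noise is drawn uniformly; the source does not name
    the distribution.
    """
    if value is None:
        return None
    half_width = span / 2 if range_type == "fixed" else abs(span * value) / 2
    noisy = value + rng.uniform(-half_width, half_width)
    rounded = round(noisy, round_to)
    return int(rounded) if round_to == 0 else rounded


fixed_sample = perturb(100, span=10)  # somewhere in 95..105
relative_sample = perturb(200, span=0.1, range_type="proportional", round_to=2)  # 190..210
```

With `span` 10 and `fixed`, a value of 100 lands between 95 and 105. With `span` 0.1 and `proportional`, a value of 200 lands between 190 and 210.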

### custom_function

Applies a user-provided PostgreSQL function. The function must already exist in the database. Its first argument is the column value cast to text. An optional second argument can be passed via `custom_arg`.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| custom_function | string | yes | PostgreSQL function name. Must match `^[a-zA-Z][a-zA-Z0-9_.]*$` |
| custom_arg | string, integer, decimal, boolean, or code | no | Optional second argument |

```json
[
  {"url": "method", "valueCode": "custom_function"},
  {"url": "custom_function", "valueString": "left"},
  {"url": "custom_arg", "valueInteger": 4}
]
```

This example uses the built-in PostgreSQL `left` function to keep only the first 4 characters (e.g. extracting just the year from a date string).

## Example ViewDefinition

A complete ViewDefinition that de-identifies Patient data:

```json
{
  "resourceType": "ViewDefinition",
  "id": "deident-patients",
  "name": "deident_patients",
  "status": "active",
  "resource": "Patient",
  "select": [{
    "column": [{
      "name": "id",
      "path": "id",
      "extension": [{
        "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
        "extension": [{
          "url": "method",
          "valueCode": "cryptoHash"
        }, {
          "url": "cryptoHashKey",
          "valueString": "patient-hash-key"
        }]
      }]
    }, {
      "name": "gender",
      "path": "gender"
    }, {
      "name": "birth_date",
      "path": "birthDate",
      "extension": [{
        "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
        "extension": [{
          "url": "method",
          "valueCode": "dateshift"
        }, {
          "url": "dateShiftKey",
          "valueString": "date-shift-key"
        }]
      }]
    }]
  }, {
    "forEach": "name",
    "select": [{
      "column": [{
        "name": "use",
        "path": "use"
      }, {
        "name": "family",
        "path": "family",
        "extension": [{
          "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
          "extension": [{
            "url": "method",
            "valueCode": "redact"
          }]
        }]
      }]
    }]
  }, {
    "forEach": "address",
    "select": [{
      "column": [{
        "name": "state",
        "path": "state"
      }, {
        "name": "postal_code",
        "path": "postalCode",
        "extension": [{
          "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
          "extension": [{
            "url": "method",
            "valueCode": "substitute"
          }, {
            "url": "replaceWith",
            "valueString": "000"
          }]
        }]
      }]
    }]
  }]
}
```

In this example:

- `id` is replaced with a consistent hash
- `gender` passes through unchanged (no extension)
- `birthDate` is shifted by a deterministic offset per patient
- `name.family` is redacted (NULL)
- `name.use` passes through (code values are not PHI)
- `address.state` passes through (safe at state level)
- `address.postalCode` is replaced with "000"

## Using the UI

The ViewDefinition builder in Aidbox UI includes a de-identification picker on each column. Click the shield icon next to a column's path to open the configuration popover.

When a de-identification method is configured, the shield icon turns blue. Hovering shows the current method name.

## Writing custom PostgreSQL functions

Custom functions referenced via `custom_function` must:

- Accept `text` as the first argument (the column value)
- Optionally accept a second argument of any type (passed via `custom_arg`)
- Already exist in the database before the ViewDefinition is executed

Example:

```sql
CREATE OR REPLACE FUNCTION my_mask(value text)
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE AS $$
  SELECT CASE
    WHEN value IS NULL THEN NULL
    ELSE left(value, 1) || repeat('*', greatest(length(value) - 1, 0))
  END;
$$;
```

Then reference it in a column:

```json
{
  "url": "http://health-samurai.io/fhir/core/StructureDefinition/de-identification",
  "extension": [
    {"url": "method", "valueCode": "custom_function"},
    {"url": "custom_function", "valueString": "my_mask"}
  ]
}
```

## Security considerations

### Key management

Cryptographic keys (`cryptoHashKey`, `dateShiftKey`, `encryptKey`) are stored as plaintext strings inside the ViewDefinition resource. Anyone with read access to the ViewDefinition can see the keys.

Restrict access to ViewDefinition resources using [AccessPolicy](../../access-control/authorization/README.md) to ensure only authorized users can view or modify de-identification configurations.

### SQL injection prevention

The `custom_function` parameter is validated against `^[a-zA-Z][a-zA-Z0-9_.]*$`: the name must start with a letter and may contain only letters, digits, underscores, and dots. This validation happens both in the Aidbox UI and in the SQL compiler. String arguments passed via `custom_arg` are safely escaped by the SQL generator.
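
A client that generates ViewDefinitions can apply the same pattern before submitting them. The Python sketch below is an illustration, not Aidbox code, and the function name `is_valid_function_name` is hypothetical. It uses `re.fullmatch` because in Python `$` also matches just before a trailing newline.

```python
import re

# Same character rules as the documented pattern ^[a-zA-Z][a-zA-Z0-9_.]*$
FUNCTION_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_.]*")


def is_valid_function_name(name):
    """True if `name` starts with a letter and contains only letters, digits, '_' and '.'."""
    return FUNCTION_NAME.fullmatch(name) is not None


accepted = [n for n in ["left", "my_schema.my_mask"] if is_valid_function_name(n)]
rejected = [n for n in ["1abc", "f(); DROP TABLE patient", "left\n", ""]
            if not is_valid_function_name(n)]
```

Schema-qualified names such as `my_schema.my_mask` pass because dots are allowed. Anything containing parentheses, quotes, whitespace, or semicolons is rejected.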

### Encryption limitations

The `encrypt` method uses AES-128-CBC with a zero initialization vector. This makes encryption deterministic: the same plaintext always produces the same ciphertext. That is useful for consistent de-identification but leaks frequency information, because anyone reading the output can tell which rows share a value and how often each value occurs. It is not suitable for general-purpose encryption.
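
The frequency leak applies to any deterministic transformation. The Python sketch below demonstrates it with a keyed hash as a stand-in, since the standard library has no AES; it does not reproduce the `encrypt` method itself, and the sample data and key are made up.

```python
import hashlib
import hmac
from collections import Counter


def deterministic_token(value, key):
    """Stand-in for deterministic encryption: HMAC-SHA256, not AES-128-CBC."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


# Made-up sample column: three rows share one value, two share another.
states = ["CA", "CA", "CA", "NY", "NY", "TX"]
tokens = [deterministic_token(s, b"demo-key") for s in states]

plain_counts = sorted(Counter(states).values(), reverse=True)
token_counts = sorted(Counter(tokens).values(), reverse=True)
```

Both count lists come out as `[3, 2, 1]`: the tokens hide the state names but not the shape of the distribution, so a low-cardinality column can often be re-identified from public statistics.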

See also:

- [Defining flat views with view definitions](./defining-flat-views-with-view-definitions.md)
- [$run operation](./operation-run.md)
- [$materialize operation](./operation-materialize.md)