senzing-mapper

Overview

The Rzolut mapper converts Rzolut compliance/risk dataset records (JSONL) into Senzing-compatible JSON for entity resolution. It handles PEP, sanctions, watchlists, enforcement, and adverse press data.

Repository Contents

src/
  rzolut_mapper.py      # The mapper
  rzolut_codes.csv      # Identifier type classification (338 types)
sample/
  input.jsonl           # Example Rzolut source records
  input_pretty.json     # Pretty-printed example input
  output.json           # Example mapped Senzing JSON output
  output_pretty.json    # Pretty-printed example output
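
To peek at a JSONL file such as sample/input.jsonl before mapping it, a small reader along these lines may help. It is a minimal sketch that only assumes one JSON object per line; the `id` field in the demo data is hypothetical and not taken from the Rzolut schema.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed JSON object per non-blank line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number}: {err}") from err


# In-memory stand-in for a JSONL file; the "id" field is hypothetical.
demo = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
records = list(iter_jsonl(demo))
print(len(records))
```

For a real file, pass an open file handle instead, for example `open("sample/input.jsonl", encoding="utf-8")`.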

Usage

python3 src/rzolut_mapper.py -i <input_file> -o <output_file> -d <data_source_code> [-l <log_file>]

Arguments:

-i, --input_file -- Path to the Rzolut JSONL input file
-o, --output_file -- Path for the mapped Senzing JSON output
-d, --data_source -- Data source code (e.g., RZOLUT)
-l, --log_file -- (Optional) Path to write processing statistics as JSON

Example:

python3 src/rzolut_mapper.py -i data/rzolut_full.jsonl -o output/rzolut.json -d RZOLUT -l output/stats.json
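
If the mapper needs to be called from another Python script, the command above can be assembled and run with the standard library. This is a sketch only: `build_mapper_command` and `run_mapper` are hypothetical helper names, and it assumes the script is run from the repository root. Whether the mapper creates the output directory itself is not stated here, so the helper creates it first.

```python
import subprocess
import sys
from pathlib import Path


def build_mapper_command(input_file, output_file, data_source, log_file=None):
    """Assemble the argument list using the documented -i, -o, -d, and -l flags."""
    cmd = [
        sys.executable, "src/rzolut_mapper.py",
        "-i", str(input_file),
        "-o", str(output_file),
        "-d", data_source,
    ]
    if log_file is not None:
        cmd += ["-l", str(log_file)]  # the log file is optional
    return cmd


def run_mapper(input_file, output_file, data_source, log_file=None):
    """Run the mapper; subprocess raises CalledProcessError on a non-zero exit."""
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    cmd = build_mapper_command(input_file, output_file, data_source, log_file)
    subprocess.run(cmd, check=True)
```

Calling `run_mapper("data/rzolut_full.jsonl", "output/rzolut.json", "RZOLUT", "output/stats.json")` would then be equivalent to the example command.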

Prerequisites

Python 3.10+
No external dependencies (stdlib only)

Identifier Codes (`src/rzolut_codes.csv`)

The mapper uses rzolut_codes.csv to classify identifier types. Each row maps a Rzolut identifier name either to a Senzing feature type or to payload-only handling, as recorded in its disposition.

CSV columns:

Column	Description
`num`	Sequential row number
`code_type`	Always "Identifier"
`code`	The identifier name as it appears in the Rzolut data
`country`	Country where this identifier is used
`subject_type`	Individual, Organization, etc.
`senzing_feature`	Senzing feature type (NATIONAL_ID, TAX_ID, PASSPORT, etc.)
`senzing_type_value`	Sub-type value (AADHAAR, CPF, CIN, etc.)
`disposition`	How the mapper handles this type (see below)
`notes`	Description of the identifier

Dispositions:

FEATURE -- Mapped to a Senzing feature for entity resolution (e.g., PASSPORT, NATIONAL_ID, TAX_ID)
PAYLOAD -- Stored as record payload for human review, not used in matching (e.g., arrest warrant numbers, certificate numbers)
MISSING -- Auto-added by the mapper when an unknown identifier type is encountered; needs manual classification
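
To see how the dispositions are distributed in the codes file, the rows can be tallied with the standard csv module. This is a sketch that assumes the file is comma-delimited with a header row matching the column names listed above; the sample rows are illustrative and not taken from the real file.

```python
import csv
import io
from collections import Counter


def disposition_counts(csv_file):
    """Count rows per disposition in an open rzolut_codes.csv-style file."""
    return Counter(row["disposition"] for row in csv.DictReader(csv_file))


# Tiny in-memory stand-in for src/rzolut_codes.csv (illustrative rows only).
SAMPLE = """num,code_type,code,country,subject_type,senzing_feature,senzing_type_value,disposition,notes
1,Identifier,Example Passport,,Individual,PASSPORT,,FEATURE,illustrative row
2,Identifier,Example Warrant No,,Individual,,,PAYLOAD,illustrative row
3,Identifier,Example Unknown,,,,,MISSING,illustrative row
"""

counts = disposition_counts(io.StringIO(SAMPLE))
print(dict(counts))
```

A non-zero MISSING count indicates rows that still need manual classification.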

Handling MISSING Codes

If the mapper encounters identifier types not in the CSV, it:

Stores the value as record payload (safe default -- no impact on entity resolution)
Appends the new type to rzolut_codes.csv with disposition MISSING
Prints a message at the end: N new identifier codes added to rzolut_codes.csv (disposition: MISSING)

What to do: Open rzolut_codes.csv, find the rows with disposition MISSING, and classify them:

Set senzing_feature and senzing_type_value and change disposition to FEATURE if the identifier is useful for entity resolution
Change disposition to PAYLOAD if it is not useful for matching
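
For more than a handful of MISSING rows, the manual edit described above can be scripted. The following is a sketch, not part of the mapper: `classify_missing` is a hypothetical helper, the identifier names and the `HYPOTHETICAL_TAX` type value are made up for illustration, and the CSV is assumed to be comma-delimited with the documented column names as its header.

```python
import csv
import io


def classify_missing(rows, decisions):
    """Apply manual decisions to rows whose disposition is MISSING.

    decisions maps an identifier code to ("FEATURE", feature, type_value)
    or ("PAYLOAD",). Codes without a decision stay MISSING.
    """
    for row in rows:
        if row["disposition"] != "MISSING" or row["code"] not in decisions:
            continue
        decision = decisions[row["code"]]
        row["disposition"] = decision[0]
        if decision[0] == "FEATURE":
            row["senzing_feature"] = decision[1]
            row["senzing_type_value"] = decision[2]
    return rows


# In-memory stand-in for rows the mapper appended (illustrative only).
SAMPLE = """num,code_type,code,country,subject_type,senzing_feature,senzing_type_value,disposition,notes
1,Identifier,Hypothetical Tax Number,,,,,MISSING,
2,Identifier,Hypothetical Case Number,,,,,MISSING,
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = classify_missing(list(reader), {
    "Hypothetical Tax Number": ("FEATURE", "TAX_ID", "HYPOTHETICAL_TAX"),
    "Hypothetical Case Number": ("PAYLOAD",),
})

# Write the updated rows back out with the original column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Against the real file, read from and write to src/rzolut_codes.csv instead of the in-memory buffers, and keep a backup copy before overwriting.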

Statistics Log File

When you use the -l flag, the mapper writes a JSON file with processing statistics including:

!IDTYPE -- Counts and examples of every identifier type encountered
!INFO / BAD_DATE -- Any dates that could not be parsed

This is useful for reviewing the distribution of identifier types in your data and confirming that the codes CSV covers your dataset.
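
The coverage check could be sketched as below. The layout of the log file is not documented here, so this assumes the top level is a JSON object whose `!IDTYPE` entry is itself an object keyed by identifier type name; the sample contents and the `count` field are hypothetical, and the real nesting may differ.

```python
import json


def idtype_names(stats):
    """Return type names under "!IDTYPE", assuming it is an object keyed by type name."""
    section = stats.get("!IDTYPE", {})
    return sorted(section) if isinstance(section, dict) else []


def uncovered(stats, known_codes):
    """List identifier types seen in the log that are absent from known_codes."""
    return [name for name in idtype_names(stats) if name not in known_codes]


# Hypothetical log contents; the real file's structure may differ.
sample_stats = json.loads(
    '{"!IDTYPE": {"Example Passport": {"count": 2}, "Example Unknown": {"count": 1}}}'
)
result = uncovered(sample_stats, {"Example Passport"})
print(result)  # prints ['Example Unknown']
```

In practice, `known_codes` would be the set of values in the `code` column of rzolut_codes.csv, and `sample_stats` would come from `json.load` on the file passed to -l.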
