For some applications, it would be useful to have a mapping from each token/mask (the redacted value) back to the original text, e.g.:
patient [NAAM-1] met tel. nr. [TELEFOONNUMMER-1]
should have a mapping:
{'NAAM-1': 'Jan Jansen', 'TELEFOONNUMMER-1': '0612345678'}
That allows re-identification downstream. This requires some thought, because:
- Annotations are frozen dataclasses, so it's hard to set a value (i.e. the mask) on them after creation
- It's not straightforward to require every docdeid.process.Redactor to add a mask to an annotation. For redactors like RedactAllText, this does not even make sense.
- The current Deduce redactor does fuzzy matching, so a token potentially maps to multiple original values
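One possible direction, sketched below, is to leave the annotations untouched and have the redaction step return the mapping alongside the redacted text. This is only an illustration under assumptions: the `Annotation` class here is a hypothetical stand-in (not the docdeid class), `redact_with_mapping` is a made-up helper, and exact string matching is used in place of the fuzzy matching Deduce does. Each mask maps to a list of originals, so a fuzzy-matching redactor could append several values to one mask.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for an annotation; NOT the docdeid class."""
    text: str
    start_char: int
    end_char: int
    tag: str


def redact_with_mapping(text, annotations):
    """Replace each annotation with a [TAG-n] mask.

    Returns (redacted_text, mapping), where mapping goes from mask to a
    list of original values. Assumes annotations do not overlap.
    """
    mask_for = {}                 # (tag, original text) -> mask
    counters = defaultdict(int)   # per-tag running number
    mapping = defaultdict(list)   # mask -> original values
    pieces, cursor = [], 0

    for ann in sorted(annotations, key=lambda a: a.start_char):
        key = (ann.tag, ann.text)
        if key not in mask_for:
            # Exact matching here; a fuzzy matcher would look up a
            # "close enough" existing key instead of creating a new mask.
            counters[ann.tag] += 1
            mask_for[key] = f"{ann.tag.upper()}-{counters[ann.tag]}"
        mask = mask_for[key]
        if ann.text not in mapping[mask]:
            mapping[mask].append(ann.text)
        pieces.append(text[cursor:ann.start_char])
        pieces.append(f"[{mask}]")
        cursor = ann.end_char

    pieces.append(text[cursor:])
    return "".join(pieces), dict(mapping)


text = "patient Jan Jansen met tel. nr. 0612345678"
name, phone = "Jan Jansen", "0612345678"
annotations = [
    Annotation(name, text.index(name), text.index(name) + len(name), "naam"),
    Annotation(phone, text.index(phone), text.index(phone) + len(phone), "telefoonnummer"),
]
redacted, mapping = redact_with_mapping(text, annotations)
print(redacted)
print(mapping)
```

Because the mapping is returned rather than stored on the annotations, the frozen dataclasses stay as they are, and redactors for which a mapping makes no sense could simply return an empty one.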