To add privacy-preserving PII redaction pipeline prototype

Hi @Gautam-Rajeev, I went through the README.md and realized that preserving privacy in the data is necessary before we pipeline it into model training.
So I added a minimal PII redaction pipeline (#12; a prototype for now, and I would love the chance to make it production-grade) that handles common identifiers: emails, phone numbers, credit card numbers, UUIDs, and URLs containing tokens. The pipeline normalizes heterogeneous log entries into a canonical event schema, audits them for PII, applies consistent placeholder redaction, and exports SFT- and DPO-ready datasets.
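For reviewers, here is a minimal sketch of what the detection, consistent-placeholder, and audit steps could look like, assuming simple regex detectors. The patterns, the `Redactor`/`audit`/`luhn_valid` names, and the `<LABEL_n>` placeholder format are illustrative assumptions, not the code from #12; the canonical event schema and the SFT/DPO export are not covered here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical detectors (assumed patterns, not the ones in #12).
# Order matters: whole token-bearing URLs go first, UUIDs before digit-based rules.
PATTERNS: List[Tuple[str, re.Pattern]] = [
    ("URL", re.compile(r"https?://\S*[?&](?:token|key|access_token|api_key)=\S+", re.I)),
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("UUID", re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),  # 13-19 digits, ends on a digit
    ("PHONE", re.compile(r"(?<!\d)(?:\+\d{1,3}[ -]?)?(?:\(\d{3}\)|\d{3})[ -]?\d{3}[ -]?\d{4}(?!\d)")),
]


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to drop digit runs that cannot be card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        checksum += d
    return checksum % 10 == 0


class Redactor:
    """Replaces detected PII with consistent placeholders such as <EMAIL_1>.

    The same raw value always maps to the same placeholder for the lifetime
    of one Redactor, so multi-turn conversations stay coherent after redaction.
    """

    def __init__(self) -> None:
        self._placeholders: Dict[Tuple[str, str], str] = {}
        self._counts: Dict[str, int] = {}

    def _placeholder(self, label: str, value: str) -> str:
        key = (label, value)
        if key not in self._placeholders:
            self._counts[label] = self._counts.get(label, 0) + 1
            self._placeholders[key] = f"<{label}_{self._counts[label]}>"
        return self._placeholders[key]

    def redact(self, text: str) -> str:
        for label, pattern in PATTERNS:
            def _sub(m: re.Match, label: str = label) -> str:
                value = m.group(0)
                if label == "CARD" and not luhn_valid(value):
                    return value  # digit run that fails Luhn: leave it untouched
                return self._placeholder(label, value)
            text = pattern.sub(_sub, text)
        return text


def audit(text: str) -> Dict[str, int]:
    """Counts candidate PII matches per label (e.g. residual PII after redaction)."""
    counts = {label: len(p.findall(text)) for label, p in PATTERNS}
    return {label: n for label, n in counts.items() if n}


if __name__ == "__main__":
    r = Redactor()
    log = ("user alice@example.com paid with 4111 1111 1111 1111, "
           "call +1 415-555-0100; retry https://api.example.com/v1?token=abc123 "
           "for session 123e4567-e89b-12d3-a456-426614174000; cc alice@example.com")
    clean = r.redact(log)
    print(clean)
    # user <EMAIL_1> paid with <CARD_1>, call <PHONE_1>; retry <URL_1> for session <UUID_1>; cc <EMAIL_1>
    print(audit(clean))  # {}
```

Keeping one `Redactor` per dataset (rather than per log line) is what makes the placeholders consistent across turns; running `audit` on the redacted output is a cheap check for anything the detectors missed.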

Also, I'm a dual-degree student at IIT Madras and would really like to contribute to this. Looking forward to it!

To add privacy-preserving PII redaction pipeline prototype #13