PII detection and transformation tools for LangChain, powered by Tonic Textual.
Detect sensitive data in text, JSON, HTML, and files, then synthesize it with realistic fakes, tokenize it with reversible placeholders, or extract the raw entities. Drop the tools into any LangChain chain or agent as standard tools.
pip install langchain-textual

export TONIC_TEXTUAL_API_KEY="your-api-key"

from langchain_textual import TonicTextualRedactText
tool = TonicTextualRedactText()
tool.invoke("My name is John Smith and my email is john@example.com.")
# "My name is [NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx] and my email is [EMAIL_ADDRESS_xxxx]."

| Tool | Input | Use for |
|---|---|---|
| TonicTextualRedactText | Plain text string | Synthesize or tokenize PII in raw text, .txt file contents |
| TonicTextualRedactJson | JSON string | Synthesize or tokenize PII in raw JSON, .json file contents |
| TonicTextualRedactHtml | HTML string | Synthesize or tokenize PII in raw HTML, .html/.htm file contents |
| TonicTextualRedactFile | File path | Synthesize or tokenize PII in PDFs, images (JPG, PNG), CSVs, TSVs |
| TonicTextualExtractEntities | Plain text string | Extract detected PII entities with type, value, location, and confidence |
| TonicTextualPiiTypes | None | List all supported PII entity types |

from langchain_textual import TonicTextualRedactText
tool = TonicTextualRedactText()
tool.invoke("My name is John Smith and my email is john@example.com.")
# "My name is [NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx] and my email is [EMAIL_ADDRESS_xxxx]."

from langchain_textual import TonicTextualRedactJson
tool = TonicTextualRedactJson()
tool.invoke('{"name": "John Smith", "email": "john@example.com"}')
# '{"name": "[NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx]", "email": "[EMAIL_ADDRESS_xxxx]"}'

from langchain_textual import TonicTextualRedactHtml
tool = TonicTextualRedactHtml()
tool.invoke("<p>Contact John Smith at john@example.com</p>")
# "<p>Contact [NAME_GIVEN_xxxx] [NAME_FAMILY_xxxx] at [EMAIL_ADDRESS_xxxx]</p>"

from langchain_textual import TonicTextualRedactFile
tool = TonicTextualRedactFile()
tool.invoke({"file_path": "/path/to/scan.pdf"})
# "/path/to/scan_redacted.pdf"
tool.invoke({"file_path": "/path/to/photo.jpg", "output_path": "/tmp/redacted.jpg"})
# "/tmp/redacted.jpg"

For .txt, .json, and .html/.htm files, read the file contents and pass them to the corresponding text, JSON, or HTML tool instead.
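
The routing rule above can be sketched as a small helper that decides, from the file extension, which tool to use and whether to pass the file contents or the path. This is only an illustration: `pick_tool`, `CONTENT_TOOLS`, and `FILE_TOOL_EXTENSIONS` are hypothetical names that are not part of langchain-textual, and the extension sets are simply copied from the tool table in this README.

```python
from pathlib import Path

# Hypothetical routing table (not part of langchain-textual):
# extension -> name of the tool class that should receive the file CONTENTS.
CONTENT_TOOLS = {
    ".txt": "TonicTextualRedactText",
    ".json": "TonicTextualRedactJson",
    ".html": "TonicTextualRedactHtml",
    ".htm": "TonicTextualRedactHtml",
}

# Extensions the README lists for the file tool, which receives a PATH.
FILE_TOOL_EXTENSIONS = {".pdf", ".jpg", ".png", ".csv", ".tsv"}


def pick_tool(path: str) -> tuple[str, bool]:
    """Return (tool class name, pass_contents) for a file path.

    pass_contents is True when the file should be read and its contents
    passed to the tool, and False when the path itself should be passed.
    """
    ext = Path(path).suffix.lower()
    if ext in CONTENT_TOOLS:
        return CONTENT_TOOLS[ext], True
    if ext in FILE_TOOL_EXTENSIONS:
        return "TonicTextualRedactFile", False
    raise ValueError(f"Unsupported file type: {ext or path!r}")
```

A caller would then read the file with `Path(path).read_text()` when `pass_contents` is true, or invoke the file tool with `{"file_path": path}` otherwise.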
from langchain_textual import TonicTextualExtractEntities
tool = TonicTextualExtractEntities()
tool.invoke("My name is John Smith and my email is john@example.com.")
# '[{"label": "NAME_GIVEN", "text": "John", "start": 11, "end": 15, "score": 0.9}, ...]'

Returns a JSON array of detected entities, each with label, text, start, end, and score fields.
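
Because the result is a JSON string, it has to be parsed before the fields can be used. The sketch below shows one way to do that with a hand-written sample payload instead of a live API call. The helper names and the second entity (including its score) are invented for illustration. It also assumes `end` is an exclusive offset, which is consistent with the `John` example above but is not stated by the package.

```python
import json

text = "My name is John Smith and my email is john@example.com."

# Hand-written sample shaped like the tool output; values are illustrative only.
raw = (
    '[{"label": "NAME_GIVEN", "text": "John", "start": 11, "end": 15, "score": 0.9},'
    ' {"label": "EMAIL_ADDRESS", "text": "john@example.com",'
    ' "start": 38, "end": 54, "score": 0.6}]'
)


def confident_entities(payload: str, min_score: float = 0.8) -> list[dict]:
    """Parse the JSON array and keep entities scoring at least min_score."""
    return [e for e in json.loads(payload) if e["score"] >= min_score]


def spans_match(source: str, payload: str) -> bool:
    """Check that each entity's start/end offsets slice out its text.

    Assumes `end` is exclusive, as in Python slicing.
    """
    return all(
        source[e["start"]:e["end"]] == e["text"] for e in json.loads(payload)
    )


print([e["label"] for e in confident_entities(raw)])
print(spans_match(text, raw))
```

Filtering on `score` is one way to trade recall for precision before acting on the detected entities. The threshold of 0.8 is arbitrary.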
All tools share the same configuration options.
Synthesis mode (replace PII with realistic fake data instead of placeholders):
tool = TonicTextualRedactText(generator_default="Synthesis")
tool.invoke("Contact Jane Doe at jane.doe@example.com.")
# "Contact Maria Chen at maria.chen@gmail.com."

Per-entity control (set handling per PII type with generator_config):
tool = TonicTextualRedactText(
    generator_default="Off",
    generator_config={
        "NAME_GIVEN": "Synthesis",
        "NAME_FAMILY": "Synthesis",
        "EMAIL_ADDRESS": "Redaction",
    },
)
tool.invoke("Contact Jane Doe at jane.doe@example.com.")
# "Contact Maria Chen at chen@[EMAIL_ADDRESS_xxxx]."

Use TonicTextualPiiTypes to list all supported entity type names:
from langchain_textual import TonicTextualPiiTypes
TonicTextualPiiTypes().invoke("")
# "NUMERIC_VALUE, LANGUAGE, MONEY, ..., EMAIL_ADDRESS, NAME_GIVEN, NAME_FAMILY, ..."

Self-hosted deployment:
tool = TonicTextualRedactText(tonic_textual_base_url="https://textual.your-company.com")

Explicit API key (instead of env var):
tool = TonicTextualRedactText(tonic_textual_api_key="your-api-key")

Every tool in this package is a standard LangChain tool, so it works anywhere LangChain tools do. Give your agent whichever combination it needs:
from langchain_textual import (
    TonicTextualRedactText,
    TonicTextualRedactJson,
    TonicTextualRedactFile,
    TonicTextualExtractEntities,
)
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-4o-mini")
tools = [
    TonicTextualRedactText(),
    TonicTextualRedactJson(),
    TonicTextualRedactFile(),
    TonicTextualExtractEntities(),
]
agent = create_react_agent(llm, tools)

# install dependencies
uv sync --group dev --group test --group lint --group typing
# install pre-commit hooks (auto-runs ruff on each commit)
uv tool install pre-commit
pre-commit install
# run unit tests
make test
# run integration tests (requires TONIC_TEXTUAL_API_KEY)
make integration_tests
# lint & format (run from the project root)
make lint
make format

Note: All make commands must be run from the project root (langchain-textual/), not from subdirectories like examples/.

MIT