From d2512c635204a1e9d6e617a7f7866c3179dc3d2f Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 08:36:19 -0400 Subject: [PATCH 01/23] Text edits. --- docs/source/index.rst | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 07bc9db..ce9df04 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -26,9 +26,9 @@ Before you get started, you must install the Textual Python SDK: Set up a Textual API key ------------------------ -To authenticate with Tonic Textual, you must set up an API key. After |signup_link|, to obtain an API key, go to the **User API Keys** page. +To authenticate with Tonic Textual, you must set up an API key. After |signup_link|, to obtain an API key, go to the **User API Keys** section of the **User Profile** page. -After, you obtain the key, you can optionally set it as an environment variable: +After you obtain the key, you can optionally set it as an environment variable: .. code-block:: bash @@ -40,7 +40,7 @@ You can can also pass the API key as a parameter when you create your Textual cl Creating a Textual client -------------------------- -To redact text or files, use our TextualNer client. To parse files, which is useful for extracting information from files such as PDF and DOCX, use our TextualParse client. +To redact text or files, use the TextualNer client. To parse files, which is useful for extracting information from files such as PDF and DOCX, use the TextualParse client. .. code-block:: python @@ -50,14 +50,14 @@ To redact text or files, use our TextualNer client. To parse files, which is use textual = TextualNer() textual = TextualParse() -Both client support several optional arguments: +Both clients support the following optional arguments: -* **base_url** - The URL of the server that hosts Tonic Textual. 
Defaults to https://textual.tonic.ai +* **base_url** - The URL of the server that hosts Tonic Textual. Default: `https://textual.tonic.ai` -* **api_key** - Your API key. If not specified, you must set TONIC_TEXTUAL_API_KEY in your environment. +* **api_key** - Your API key. If not specified, you must set `TONIC_TEXTUAL_API_KEY` in your environment. -* **verify** - Whether to verify SSL certification. Default is true. +* **verify** - Whether to verify SSL certification. Default: `true` .. |signup_link| raw:: html - creating your account + you create your account From 20ebdde6a26000d4f743dd286c47c39ee36fe94d Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 08:38:25 -0400 Subject: [PATCH 02/23] Fix code highlighting. --- docs/source/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index ce9df04..12feaf9 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -52,11 +52,11 @@ To redact text or files, use the TextualNer client. To parse files, which is use Both clients support the following optional arguments: -* **base_url** - The URL of the server that hosts Tonic Textual. Default: `https://textual.tonic.ai` +* **base_url** - The URL of the server that hosts Tonic Textual. Default: ``https://textual.tonic.ai`` -* **api_key** - Your API key. If not specified, you must set `TONIC_TEXTUAL_API_KEY` in your environment. +* **api_key** - Your API key. If not specified, you must set ``TONIC_TEXTUAL_API_KEY`` in your environment. -* **verify** - Whether to verify SSL certification. Default: `true` +* **verify** - Whether to verify SSL certification. Default: ``true`` .. 
|signup_link| raw:: html From 50485db7a5f638103d41b8ec92f98a9e88ee01d0 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 08:49:34 -0400 Subject: [PATCH 03/23] Text edits --- docs/source/audio/generate_redacted_audio.rst | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/docs/source/audio/generate_redacted_audio.rst b/docs/source/audio/generate_redacted_audio.rst index 92c8abe..66bf333 100644 --- a/docs/source/audio/generate_redacted_audio.rst +++ b/docs/source/audio/generate_redacted_audio.rst @@ -1,6 +1,8 @@ Generate redacted audio files ============================= -Textual can also generated a redacted audio file, where PII are replaced with 'beeps'. This can be accomplished via our :meth:`redact_audio_file` method. +Textual can generate a redacted audio file, where sensitive content is replaced with 'beeps'. + +To do this, use the :meth:`redact_audio_file` method. .. code-block:: python @@ -15,8 +17,10 @@ Textual can also generated a redacted audio file, where PII are replaced with 'b textual.redact_audio('input.mp3','output.mp3', generator_config=gc, generator_default='Off') -.. rubric:: Additional Remarks +.. rubric:: Additional remarks + +Before you call this method, in addition to the ``tonic_textual`` library, you must install pydub. -Calling this method requires that pydub be installed in addition to the tonic_textual library. +When you use Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25MB or less. -When using the Textual Cloud (https://textual.tonic.ai) file uploads are limited to 25MB or less. Supported file types are m4a, mp3, webm, mpga, wav. 
\ No newline at end of file +Textual supports the following audio file types: m4a, mp3, webm, mpga, wav From 36446c92ea1503c5ae164434a14166142347cd03 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:20:20 -0400 Subject: [PATCH 04/23] Text edits. --- docs/source/audio/generate_transcript.rst | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/docs/source/audio/generate_transcript.rst b/docs/source/audio/generate_transcript.rst index a17fa67..fa8c5db 100644 --- a/docs/source/audio/generate_transcript.rst +++ b/docs/source/audio/generate_transcript.rst @@ -1,7 +1,8 @@ Generate transcript =================== -Textual can also generated a transcript from an audio file. This can be accomplished via our :meth:`get_audio_transcript` method: -To generate a transcript. +Textual can generate a transcript from an audio file. To do this, use the :meth:`get_audio_transcript` method. + +To generate a transcript: .. code-block:: python @@ -11,15 +12,24 @@ To generate a transcript. transcription = textual.get_audio_transcript('path_to_file.mp3') -This will generate a :class:`transcription_result`. It will contain the full text of the transcription, the detected language, and a list of audio segments. Each segment will be some portion of the transcription with start and end times in milliseconds. +This generates a :class:`transcription_result`. + +It contains: -It'll look something like this: +* The full text of the transcription. +* The detected language. +* A list of audio segments. Each segment is some portion of the transcription with start and end times in milliseconds. + +It looks something like this: .. literalinclude:: transcription_result.json :language: JSON +.. rubric:: Additional remarks + +When you use the Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25MB or less. -.. 
rubric:: Additional Remarks +Textual supports the following file types: m4a, mp3, webm, mpga, wav. -When using the Textual Cloud (https://textual.tonic.ai) file uploads are limited to 25MB or less. Supported file types are m4a, mp3, webm, mpga, wav. For file types like m4a you'll need to make sure your build of ffmpeg has the necessary libraries. \ No newline at end of file +For file types such as m4a, make sure that your build of ffmpeg has the necessary libraries. From 37247c82313fa0b6accecbc60d36fc1b48d47d6e Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:34:08 -0400 Subject: [PATCH 05/23] Text edits --- docs/source/audio/index.rst | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/docs/source/audio/index.rst b/docs/source/audio/index.rst index bd3d8d5..958e5b2 100644 --- a/docs/source/audio/index.rst +++ b/docs/source/audio/index.rst @@ -4,12 +4,18 @@ Audio The Textual audio functionality allows you to process audio files in different ways. With this module you can: - Generate a transcript -- Sanitize the transcript by synthesizing/redacting it +- Synthesize or redact sensitive values in the transcript - Generate a redacted (beeped-out) audio file from the original recording Before you can use these functions, read the :doc:`Getting started ` guide and create an API key. -Textual audio processing supports m4a, mp3, webm, mpga, wav files. For file types like m4a you'll need to make sure your build of ffmpeg has the necessary libraries. If you are using the Textual cloud or you are self-hosting but using the Azure AI Whisper integration then you'll have to limit your file sizes to 25MB or less. If you are self-hosting Textual's ASR containers then there are no file size limitations. 
+Textual audio processing supports the following audio file types: m4a, mp3, webm, mpga, wav + +For file types such as m4a, make sure that your build of ffmpeg has the necessary libraries. + +If you use Textual Cloud, or you self-host using the Azure AI Whisper integration, then file sizes are limited to 25MB or smaller. + +If you self-host using Textual's Automatic Speech Recognition (ASR) containers, then there are no limitations on file size. .. toctree:: :hidden: @@ -18,4 +24,4 @@ Textual audio processing supports m4a, mp3, webm, mpga, wav files. For file type generate_transcript redact_transcript generate_redacted_audio - api \ No newline at end of file + api From a2f7a63f01f257c4df83b356865b15f25a79347b Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:38:25 -0400 Subject: [PATCH 06/23] Text edits --- docs/source/audio/redact_transcript.rst | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/docs/source/audio/redact_transcript.rst b/docs/source/audio/redact_transcript.rst index fdb43de..bf0f5e2 100644 --- a/docs/source/audio/redact_transcript.rst +++ b/docs/source/audio/redact_transcript.rst @@ -1,8 +1,10 @@ Redacting a transcript ---------------------- -To redact a transcript you'll first need to generate a transcription result, which you can do via the :meth:`get_audio_transcript` method (see :doc:`here for an example `). +Before you can redact a transcript, you must first generate a transcription result. To do this, use the :meth:`get_audio_transcript` method. For an example, go to see :doc:`here for an example `. -Once you have a transcript you can call :meth:`redact_audio_transcript`. Here is an example: +Once you have a transcript, call :meth:`redact_audio_transcript`. + +For example: .. 
code-block:: python @@ -18,8 +20,17 @@ Once you have a transcript you can call :meth:`redact_audio_transcript` which will include the original transcription, the redacted/synthesized text of the transcription, a list of redacted_segments, and the usage. +The :py:func:`redact_audio_transcript` returns a :class:`redacted_transcript_result`, which includes: + +* The original transcription. +* The redacted or synthesized text of the transcription. +* A list of redacted_segments. +* The usage. + +.. rubric:: Additional remarks + +When you use Textual Cloud (https://textual.tonic.ai), file uploads are limited to 25MB or smaller. -.. rubric:: Additional Remarks +Textual supports the following audio file types: m4a, mp3, webm, mpga, wav -When using the Textual Cloud (https://textual.tonic.ai) file uploads are limited to 25MB or less. Supported file types are m4a, mp3, webm, mpga, wav. For file types like m4a you'll need to make sure your build of ffmpeg has the necessary libraries. \ No newline at end of file +For file types such as m4a, make sure that your build of ffmpeg has the necessary libraries. From 9b48467621d99a9f1c83b2edc3144663e407d7f2 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:41:17 -0400 Subject: [PATCH 07/23] Text edits --- docs/source/datasets/downloading_files.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/source/datasets/downloading_files.rst b/docs/source/datasets/downloading_files.rst index 3733b2a..71836f0 100644 --- a/docs/source/datasets/downloading_files.rst +++ b/docs/source/datasets/downloading_files.rst @@ -1,7 +1,11 @@ Downloading a redacted dataset file ===================================== -To download the redacted or synthesized version of the file, get the specific file from the dataset, then call the **download** function. +To download the redacted or synthesized version of the file: + +1. 
Get the specific file from the dataset. + +2. Call the **download** function. For example: @@ -20,4 +24,4 @@ To download a specific file in a dataset that you fetch by name: file = txt_file = list(filter(lambda x: x.name=='', dataset.files))[0] file_bytes = file.download() with open('', 'wb') as f: - f.write(file_bytes) \ No newline at end of file + f.write(file_bytes) From 0a3a8059d49b839a45582802dfa8b7ce94ba491f Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:42:37 -0400 Subject: [PATCH 08/23] Text edits --- docs/source/datasets/index.rst | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/source/datasets/index.rst b/docs/source/datasets/index.rst index 6964d58..f5da924 100644 --- a/docs/source/datasets/index.rst +++ b/docs/source/datasets/index.rst @@ -1,9 +1,11 @@ Datasets ========================= -A dataset is a collection of files that are all redacted and synthesized in the same way. Datasets are a helpful organization tool to ensure that you can easily track a collections of files and how sensitive data is removed from those files. +A dataset is a collection of files that are all redacted and synthesized in the same way. Datasets are a helpful organization tool to ensure that you can easily track a collection of files and how sensitive data is removed from those files. -Typically, you configure datasets from the Textual application, but for ease of use, the SDK supports many dataset operations. However, some operations can only be performed from the Textual application. +Typically, you configure datasets from the Textual application, but for ease of use, the SDK supports many dataset operations. + +However, some operations can only be performed from the Textual application. 
@@ -19,4 +21,4 @@ Typically, you configure datasets from the Textual application, but for ease of viewing_files downloading_files viewing_config - api \ No newline at end of file + api From c741b66022f0b12fb7912c18d7f296cebc2d3a1b Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 09:44:35 -0400 Subject: [PATCH 09/23] Text edits --- docs/source/datasets/uploading_files.rst | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/docs/source/datasets/uploading_files.rst b/docs/source/datasets/uploading_files.rst index cfcf0a4..55887b1 100644 --- a/docs/source/datasets/uploading_files.rst +++ b/docs/source/datasets/uploading_files.rst @@ -1,8 +1,13 @@ Uploading files to a dataset ============================= -You can upload files to your dataset from the SDK. Provide the complete path to the file, and the complete name of the file as you want it to appear in Textual. +You can upload files to your dataset from the SDK. + +When you upload a file, you provide: + +* The complete path to the file. +* The complete name of the file as it should appear in Textual. .. 
code-block:: python - dataset.add_file('','') \ No newline at end of file + dataset.add_file('','') From 54e77a8e97a615c16f6d490ec1b62fca2f44aa16 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 10:02:48 -0400 Subject: [PATCH 10/23] Text edits --- docs/source/datasets/viewing_config.rst | 53 +++++++++++++++++++------ 1 file changed, 41 insertions(+), 12 deletions(-) diff --git a/docs/source/datasets/viewing_config.rst b/docs/source/datasets/viewing_config.rst index 65fd19e..b6ee888 100644 --- a/docs/source/datasets/viewing_config.rst +++ b/docs/source/datasets/viewing_config.rst @@ -1,7 +1,12 @@ -Viewing the PII information for a dataset ------------------------------------------ +Viewing detected entities for a dataset +======================================= -You can also retrieve a list of entities found in the files of a dataset. You can retrieve all entities found or just specific entity types. The below will retrieve information on ALL entities. +You can retrieve a list of entities that were detected in the dataset files. + +Retrieving all entities for a dataset +------------------------------------- + +To retrieve the complete list of entities for a dataset: .. code-block:: python @@ -13,26 +18,43 @@ You can also retrieve a list of entities found in the files of a dataset. You c for file in files: entities = file.get_entities() -It will return a response a dictionary whose key is the type of PII and whose value is a list of found entities. The returned entity includes the original text value of the entity as well as the few words preceding and following the entity, e.g. +It returns a response in the form of a dictionary where: + +* The key is the entity type. +* The value is the list of detected entities of that type. + +For each entity, the response includes: + +* The original text value of the entity. +* To provide context, a few words that precede and follow the entity. 
+ +For example: .. literalinclude:: pii_occurence_response.json :language: JSON +Retrieving specific types of entities for a dataset +--------------------------------------------------- -The call to get_entities() can also take an optional list of entities. For example, you could pass in a hard coded list as: +The call to ``get_entities()`` can take an optional list of entity types. + +For example, you could pass in a hard-coded list of entity types: .. code-block:: python file.get_entities(['NAME_GIVEN','NAME_FAMILY']) -Or do the same using the PiiType enum +Or you could use the ``PiiType`` enum: .. code-block:: python from tonic_textual.enums.pii_type import PiiType file.get_entities([PiiType.NAME_GIVEN, PiiType.NAME_FAMILY]) -Or you could even just pass in the current set of entities enabled by the dataset configuration, e.g. +Retrieving the entities for the enabled entity types for a dataset +------------------------------------------------------------------ + +To pass in the current set of entities that are enabled by the dataset configuration: .. code-block:: python @@ -44,12 +66,10 @@ Or you could even just pass in the current set of entities enabled by the datase file.get_entities(entities) -Viewing redaction and synthesis mappings for a dataset +Viewing entity mappings for a dataset ------------------------------------------------------ -You can retrieve the original, redacted, synthetic, and final output values for -entities in a dataset after the current generator configuration is applied. The -response is grouped by file. +You can retrieve mappings for each detected entity in a dataset. .. code-block:: python @@ -58,4 +78,13 @@ response is grouped by file. for file in mappings.files: for entity in file.entities: - print(file.file_name, entity.text, entity.output_text) \ No newline at end of file + print(file.file_name, entity.text, entity.output_text) + +The response is grouped by file. + +Each entity mapping includes: + +* The original entity value. 
+* The redacted version of the entity value. +* The synthesized version of the entity value. +* The final output value based on the current dataset configuration. From bde7030532fedc315d4b11d7dd06a33d1c0331fd Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 10:08:21 -0400 Subject: [PATCH 11/23] Text edits --- docs/source/parse/parsing_files.rst | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/docs/source/parse/parsing_files.rst b/docs/source/parse/parsing_files.rst index 78a4656..372fa6d 100644 --- a/docs/source/parse/parsing_files.rst +++ b/docs/source/parse/parsing_files.rst @@ -1,7 +1,7 @@ Parsing files ================= -When Textual parses files, it convert unstructured files, such as PDF and DOCX, into a more structured JSON form. Textual uses the same JSON schema for all of its supported file types. +When Textual parses files, it converts unstructured files, such as PDF and DOCX, into a more structured JSON form. Textual uses the same JSON schema for all of its supported file types. To parse a single file, call the **parse_file** function. The function is synchronous. It only returns when the file parsing is complete. For very large files, such as PDFS that are several hundred pages long, this process can take a few minutes. @@ -22,9 +22,9 @@ To parse a single file from a local file system, start with the following snippe To read the files, use the 'rb' access mode, which opens the file for read in binary format. -In the **parse_file** command, you can set an optional timeout. The timeout indicates the number of seconds after which to stop waiting for the parsed result. +In the ``parse_file`` command, you can set an optional timeout. The timeout indicates the number of seconds after which to stop waiting for the parsed result. -To set a timeout for for all parse requests from the SDK, set the environment variable TONIC_TEXTUAL_PARSE_TIMEOUT_IN_SECONDS. 
+To set a timeout for all parse requests from the SDK, set the environment variable ``TONIC_TEXTUAL_PARSE_TIMEOUT_IN_SECONDS``. Parsing a file from Amazon S3 ----------------------------- @@ -40,10 +40,12 @@ Because this uses the boto3 library to fetch the file from Amazon S3, you must f Understanding the parsed result ------------------------------- -The parsed result is a :class:`FileParseResult`. It is a wrapper around the JSON that is generated during processing. +The parsed result is a :class:`FileParseResult`. -To learn more about the structure of the parsed result, go to |parsed_structure_external_link| in the Textual documentation. +It is a wrapper around the JSON that is generated during processing. + +To learn more about the structure of the parsed result, go to the |parsed_structure_external_link| in the Textual documentation. .. |parsed_structure_external_link| raw:: html - Parsed JSON structure + JSON output structure information From 4c0b9df4e38dbb240bd35b1fa6e9f1943cd16a97 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 10:10:14 -0400 Subject: [PATCH 12/23] Text edits --- docs/source/parse/working_with_parsed_output.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/source/parse/working_with_parsed_output.rst b/docs/source/parse/working_with_parsed_output.rst index 8f32aef..e4162f8 100644 --- a/docs/source/parse/working_with_parsed_output.rst +++ b/docs/source/parse/working_with_parsed_output.rst @@ -5,16 +5,16 @@ After a file is parsed, either directly or as part of a dataset, you can begin t Typically, users build pipelines to feed vector databases for RAG applications, or to prepare datasets to fine-tune or build an LLM. -The parsed result is documented in the Textual documentation in |parsed_structure_external_link|. This topic describes the JSON schema that is used to store the parsed result. 
+In the Textual documentation, the |parsed_structure_external_link| topic describes the JSON schema that is used to store the parsed result. The SDK provides access to the raw JSON in the form of a Python dictionary. It also provides a helper methods and utilities to perform common actions. Examples of actions that the SDK supports include: -- Get the content of the file in Markdown or plain text -- Redact or synthesize the file content +- Get the content of the file in Markdown or plain text. +- Redact or synthesize the file content. - Chunk the file. You can redact or synthesize the chunks and also enrich them with additional entity metadata. -- List all of the identified tables and key-value pairs that were found in a document +- List all of the identified tables and key-value pairs that were found in a document. The below snippet includes most of these supported actions. @@ -42,4 +42,4 @@ For a list of all of the available operations, go to the :class:`FileParseResult .. |parsed_structure_external_link| raw:: html - Parsed JSON structure + JSON output structure From a490525eaed4957f5ff359aef0e7dccb9faf7806 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 10:30:47 -0400 Subject: [PATCH 13/23] Text edits --- docs/source/redact/csv_helper.rst | 50 ++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 17 deletions(-) diff --git a/docs/source/redact/csv_helper.rst b/docs/source/redact/csv_helper.rst index 5672caf..1db72f1 100644 --- a/docs/source/redact/csv_helper.rst +++ b/docs/source/redact/csv_helper.rst @@ -1,15 +1,28 @@ Processing data in CSV files ----------------------------- +============================ -Typically, in a CSV file you only wish to process a specific column or columns of data. 
Additionally, different rows of data might relate to each other, for example if the CSV stored a chat conversation where each row was a single message, but multiple rows together formed a conversation. This helper class can be used to group rows of data together to yield more accurate identification by maximizing context sent to our NER model. It can then return either entity information of each row or a new, redacted CSV. +Typically, in a CSV file you only want to process a specific column or columns of data. -As an example, imagine a CSV with 3 columns. +Also, different rows of data might relate to each other. For example, a CSV stores a chat conversation where each row is a single message, but multiple rows together form a conversation. + +You can use this helper class to group rows of data to yield more accurate identification by maximizing the context that you send to our NER model. + +It can then return either the entity information for each row or a new, redacted CSV. + +For example, a CSV file contains the following columns: #. message_id #. conversation_id #. message -So this particular CSV can store many messages spread across many conversations. Ideally, we would want to create a single document matching each conversation prior to processing in order to ensure the best quality identification. This can be solved with the code below. This first example returns redaction results for each row and handles the pre-processing transparently. +This CSV file stores many messages spread across many conversations. + +Creating a single document to group the conversations +----------------------------------------------------- + +Before processing, to ensure the best quality detections, we create a single document that matches each conversation. + +This can be solved with the code below. This example returns redaction results for each row and handles the pre-processing transparently. .. 
code-block:: python @@ -22,15 +35,18 @@ So this particular CSV can store many messages spread across many conversations. with open('original.csv', 'r') as f: response = helper.redact(f, True, lambda row: row['conversation_id'], lambda row: row['message'], lambda x: ner.redact(x)) -The key call here is to the helper's redact method. This function requires you to pass in several arguments: +The key call here is to the helper's ``redact`` method. This function requires you to pass in the following arguments: + +* The CSV file. +* Whether to treat the first row as a header. +* A function that shows how to group columns into messages. If not specified all rows are grouped together. +* A function that shows how to retrieve the necessary text. +* A function for redacting. This is normally a wrapper around the TextualNer ``redact()`` method. -* The csv file -* Whether or not the first row should be treated as a header -* A function which shows how to group columns into messages, if not specified we group all rows together -* A function which shows how to retrieve the necessary text -* A function for redacting, this normally is a wrapper around the TextualNer redact() method +Creating a new redacted CSV file +----------------------------------------------------- -You can also create a new redacted file. The function signature is similar. Here is an example: +You can also create a new redacted file. The following example writes the redacted CSV back to disk: .. code-block:: python @@ -46,10 +62,10 @@ You can also create a new redacted file. The function signature is similar. He with open('redacted.csv', mode='w') as f: print(buf.getvalue(), file=f) -In this example we write the redacted CSV back to disk. The function arguments are also slightly different. 
They are: +The function arguments to create a redacted file are slightly different: -* The csv file -* Whether or not the first row should be treated as a header -* The column used for grouping, if not specified we group all rows together -* The column containing the text -* A function for redacting, this normally is a wrapper around the TextualNer redact() method \ No newline at end of file +* The CSV file. +* Whether to treat the first row as a header +* The column used for grouping. If not specified, all rows are grouped together. +* The column that contains the text. +* A function to use to redact the file. This normally is a wrapper around the TextualNer ``redact()`` method. From 103c7434fd380529125d828bfe764e1487b52675 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 10:36:14 -0400 Subject: [PATCH 14/23] Text edits --- docs/source/redact/index.rst | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/docs/source/redact/index.rst b/docs/source/redact/index.rst index 7cb7c02..27770ab 100644 --- a/docs/source/redact/index.rst +++ b/docs/source/redact/index.rst @@ -1,13 +1,19 @@ Synthesize / Redact ==================== -The Textual redact functionality allows you to identify entities in files, and then optionally tokenize or synthesize these entities to create a safe version of your unstructured text. This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats. +The Textual redact functionality allows you to identify entities in text, and then optionally tokenize or synthesize these entities to create a safe version of your unstructured text. + +This functionality works on both raw strings and files, including PDF, DOCX, XLSX, and other formats. Before you can use these functions, read the :doc:`Getting started ` guide and create an API key. -Textual operates on your data in a two step process. 
First, sensitive information is identified. Textual supports identification of 30+ built-in entity types which you can read about `here `_. Textual also supports defining your own `custom entities `_. Second, this information of where entities are located is used to then tokenize or synthesize the data. +When Textual operates on your data: + +1. It first identifies sensitive information. Textual can identify 30+ `built-in entity types `_. You can also define your own `custom entity types `_. + +2. Second, it uses information about where entities are located to tokenize or synthesize the data. -In the following section :doc:`Choosing tokenization or synthesis <./redact_config>` you can learn different ways to configure your output. +In :doc:`Choosing tokenization or synthesis <./redact_config>` you can learn different ways to configure your output. .. toctree:: @@ -22,4 +28,4 @@ In the following section :doc:`Choosing tokenization or synthesis <./redact_conf redacting_dataframes redacting_large_data csv_helper - api \ No newline at end of file + api From e61fe563cdf44289c19a10fb1cf3c2c67d1200c3 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 12:14:36 -0400 Subject: [PATCH 15/23] Text edits --- docs/source/redact/redact_config.rst | 71 ++++++++++++++++------------ 1 file changed, 42 insertions(+), 29 deletions(-) diff --git a/docs/source/redact/redact_config.rst b/docs/source/redact/redact_config.rst index 5b19c97..658b601 100644 --- a/docs/source/redact/redact_config.rst +++ b/docs/source/redact/redact_config.rst @@ -3,56 +3,60 @@ Choosing tokenization or synthesis ==================================== -Each built-in entity type supported by Textual can be configured. Whether you are redacting text, json, html, or binary files like PDF the configuration is the same. This configuration determines how each entity is handled and ultimately determines the look and feel of your output data. 
+You can configure each Textual built-in and custom entity type. Whether you redact text, JSON, HTML, or binary files such as PDF, the configuration is the same. This configuration determines how each entity is handled and ultimately determines the look and feel of your output data. -Basic configuration --------------------- +Available states for entity types +--------------------------------- -Each built-in and custom entity type supported by Textual can be set to one of four different states. These states determine what the value looks like in the output. They are: +Each built-in and custom entity type that Textual supports can be set to one of the following states. These states determine what the value looks like in the output. + +* Ignored +* Redacted or tokenized +* Synthesized +* Group synthesized Ignored ^^^^^^^^^^^^ -When ignored, an entity is left alone in the output +When ignored, an entity is kept as is in the output. Redaction / Tokenization ^^^^^^^^^^^^^^^^^^^^^^^^^ -When tokenized (also referred to as redacted) entities are replaced with *unique* and *reversible* tokens, e.g.:: +Tokenized (also referred to as redacted) entities are replaced with *unique* and *reversible* tokens. For example:: My name is John Smith. -> My name is [NAME_GIVEN_dySb5] [NAME_FAMILY_7w4Db3]. -Synthesize +Synthesis ^^^^^^^^^^^^ -When synthesized, entities are replaced with realistic fake values, e.g.:: +Synthesized entities are replaced with realistic fake values. For example:: My name is John Smith. -> My name is Alan Johnson These fake values are consistent. So in the above example, John goes to Alan and will do so in all cases within the document and optionally across documents as well. -Group synthesize +Group synthesis ^^^^^^^^^^^^^^^^^^ -When group synthesized, entities are also replaced with realistic values however an entity-linking operation is also performed. - - -There are two primary ways to configure how built-in entity types are treated. 
All SDK functions that operate on data support the `generator_default` and `generator_config` function parameters. +Group synthesized entities are also replaced with realistic values. However, Textual also performs an entity-linking operation. -`generator_default` defines the default configuration for all entities. If not set, the default is set to `Redaction` meaning all entity types will be redacted/tokenized. +Configuring handling for entity types +------------------------------------- -`generator_config` allows for more fine-grain control. Different entities can be set to different options. One common strategy, for example, is to set the `generator_default` to 'Off'. This will tell Textual to ignore all entity types. The `generator_config` can then be used to re-enable the specific entity types youc are about. +To configure how built-in entity types are treated, all SDK functions that operate on data support the ``generator_default`` and ``generator_config`` function parameters. +``generator_default`` defines the default configuration for all entity types. If not set, the default is ``Redaction``, meaning that entities of all types are redacted or tokenized. -In code, the `generator_default`and `generator_config` accept the following possible values (case-sensitive). +``generator_config`` allows for more fine-grained control. Different entity types can use different options. For example, one common strategy is to set ``generator_default`` to ``Off``. This tells Textual to ignore all entity types. ``generator_config`` is then used to re-enable redaction or synthesis for specific entity types that are relevant to you. 
-Textual supports different synthesis options: -- `Off`: Ignores the entity -- `Redaction`: Tokenizes the entity -- `Synthesis`: Standard synthesis with realistic replacement values -- `GroupingSynthesis`: LLM-based synthesis that maintains contextual relationships between entities +In code, ``generator_default`` and ``generator_config`` accept the following possible values, which are case-sensitive. -The following example passes a string to the `redact` method. We set the `generator_default` to 'Off' while then specifying a handful of entities as 'Synthesis'. +- ``Off``: Ignores entities +- ``Redaction``: Tokenizes entities +- ``Synthesis``: Standard synthesis with realistic replacement values +- ``GroupingSynthesis``: LLM-based synthesis that maintains contextual relationships between entities +The following example passes a string to the ``redact`` method. It sets ``generator_default`` to ``Off``, and configures a handful of entity types with ``Synthesis``. .. code-block:: python @@ -97,15 +101,20 @@ This produces the following output: Advanced configuration ----------------------- -Built-in entity types can be modified via Regex. Regex can be used to classify more text as a given entity type or less. All SDK functions that modify data accept the parameters `label_allow_lists` and `label_block_lists`. These lists are set **per entity**. +For built-in entity types, you can use regular expressions to modify the detection. A regular expression can be used to: + +* Identify additional values for a given entity type. +* Exclude specified values from a given entity type. -Let's start by excluding certain matches from the NAME_FAMILY and ORGANIZATION entity types. Below, we provide a Regex expression for NAME_FAMILY. This could, for example, prevent 'Wilson' from being tagged as a last name in the below text +All SDK functions that modify data accept the parameters ``label_allow_lists`` and ``label_block_lists``. These lists are set **for each entity type**. 
+ +To start, we'll exclude certain matches from the NAME_FAMILY and ORGANIZATION entity types. Below, we provide a regular expression for NAME_FAMILY. For example, this could prevent 'Wilson' from being tagged as a last name in the below text .. code-block:: none I suffer from Wilson Disease -We also are excluding from Organization any of the exact string matches for Tonic. +We'll also exclude from ORGANIZATION any of the exact string matches for Tonic. .. code-block:: python @@ -117,7 +126,7 @@ We also are excluding from Organization any of the exact string matches for Toni ner.redact('', label_block_lists = label_block_lists) -Just like `label_block_lists` can be used to exclude text we can use `label_allow_lists` to bring in additional text. In the below example, we flag all matches of the below regex to HEALTHCARE_ID. +Similar to how you use ``label_block_lists`` to exclude text, you can use ``label_allow_lists`` to detect additional values. In the below example, we identify all matches of the below regular expression as HEALTHCARE_ID entity values. .. code-block:: python @@ -128,7 +137,11 @@ Just like `label_block_lists` can be used to exclude text we can use `label_allo ner.redact('', label_allow_lists = label_allow_lists) -Custom entity configuration ----------------------------- +Custom entity type configuration +-------------------------------- + +All of the configuration options above also apply to custom entity types. + +However, by default, custom entity types are not used unless you explicitly include them in a given request. -All of the configuration options above apply to custom entities as well. However, by default, custom entities are not used unless explicitly requested in a given request. Each Python SDK method supports a function parameter called `custom_entities`. It is a python list of custom entity names to include in the request. +Each Python SDK method supports a function parameter called ``custom_entities``. 
It is a Python list of names of custom entity types to include in the request. From af748b246e23f23cea0383463c1d5fd2a7d618e6 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 12:17:41 -0400 Subject: [PATCH 16/23] Text edits --- docs/source/redact/redacting_dataframes.rst | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/docs/source/redact/redacting_dataframes.rst b/docs/source/redact/redacting_dataframes.rst index f1f16da..4f172be 100644 --- a/docs/source/redact/redacting_dataframes.rst +++ b/docs/source/redact/redacting_dataframes.rst @@ -1,12 +1,22 @@ Working with DataFrames ========================= -The :meth:`redact` function can be called as a user-defined function (UDF) on a DataFrame column. As an example, lets read a CSV file redact a given column, and write the CSV back to disk. Make sure to first install pandas. +The :meth:`redact` function can be called as a user-defined function (UDF) on a DataFrame column. + +Before you do this, you must install pandas. .. code-block:: bash pip install pandas +The following example: + +1. Reads a CSV file. + +2. Redacts a given column. + +3. Writes the CSV back to disk. + .. 
code-block:: python from tonic_textual.redact_api import TextualNer @@ -19,4 +29,4 @@ The :meth:`redact` function can be c # Let's say there is a notes column in the CSV containing unstructured text df['notes'] = df['notes'].apply(lambda x: ner.redact(x).redacted_text if not pd.isnull(x) else None)) - df.to_csv('file_redacted.csv') \ No newline at end of file + df.to_csv('file_redacted.csv') From 509e249b56a6cbc6f9f49cb2108f712516cd1037 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 12:57:51 -0400 Subject: [PATCH 17/23] Text edits --- docs/source/redact/redacting_files.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/redact/redacting_files.rst b/docs/source/redact/redacting_files.rst index 422be6b..b421fd4 100644 --- a/docs/source/redact/redacting_files.rst +++ b/docs/source/redact/redacting_files.rst @@ -23,7 +23,7 @@ To redact an individual file: with open('', 'rb') as f: j = redact.start_file_redaction(f.read(),'') - # Specify generator_config to determine which entities are 'Redacted', 'Synthesis', and 'Off'. + # Specify generator_config to determine which entity types are 'Redacted', 'Synthesis', and 'Off'. # 'Redacted' is the default. To override the default, use the generator_default param. new_bytes = redact.download_redacted_file(j) @@ -35,7 +35,7 @@ Configure how to handle specific entity types By default, in the downloaded file, all of the entities are redacted. -To synthesize values for or ignore specific entities in the file, use the **generator_config** param. +To synthesize values for or ignore specific types of entities in the file, use the ``generator_config`` param. 
In this example, we disable the modification of numeric values and synthesize email addresses: From 7c6f2bb69a008c76f7a46b3584595a129b25c862 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 14:21:32 -0400 Subject: [PATCH 18/23] Text edits --- docs/source/redact/redacting_html.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/source/redact/redacting_html.rst b/docs/source/redact/redacting_html.rst index 1285ee7..f9b28a6 100644 --- a/docs/source/redact/redacting_html.rst +++ b/docs/source/redact/redacting_html.rst @@ -1,15 +1,17 @@ Redact HTML data ================= -To redact sensitive information from HTML, pass the HTML document string to the `redact_html` method. +To redact sensitive information from HTML, pass the HTML document string to the ``redact_html`` method. -Like other SDK functions that modify data the `redact_html` allows you to configure how different entity types are treated. You can learn more about the common parameters: +Similar to other SDK functions that modify data, ``redact_html`` allows you to configure how to treat different entity types. To learn more about the common parameters: -* generator_default -* generator_config -* label_allow_lists -* label_block_lists +* ``generator_default`` +* ``generator_config`` +* ``label_allow_lists`` +* ``label_block_lists`` -by reading :ref:`redact-config`. +go to :ref:`redact-config`. + +Here's an example of redacting HTML: .. code-block:: python @@ -34,4 +36,6 @@ by reading :ref:`redact-config`. xml_redaction = textual.redact_html(html_content) -The response includes entity level information, including the XPATH at which the sensitive entity is found. The start and end positions are relative to the beginning of thhe XPATH location where the entity is found. 
\ No newline at end of file +The response includes information about the detected entities, including the XPATH where each entity is found. + +The start and end positions are relative to the beginning of the XPATH location where the entity is found. From 44cc0d97ec29f83f15dfd342a1fd3f3df38da78f Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 15:04:08 -0400 Subject: [PATCH 19/23] Text edits --- docs/source/redact/redacting_json.rst | 36 ++++++++++++++++++--------- 1 file changed, 24 insertions(+), 12 deletions(-) diff --git a/docs/source/redact/redacting_json.rst b/docs/source/redact/redacting_json.rst index 9d5506c..42e7290 100644 --- a/docs/source/redact/redacting_json.rst +++ b/docs/source/redact/redacting_json.rst @@ -1,15 +1,21 @@ Redact JSON data =================== -To redact sensitive information from a JSON string or Python dict, pass the object to the `redact_json` method: -Like other SDK functions that modify data the `redact_html` allows you to configure how different entity types are treated. You can learn more about the common parameters: +Using redact_json +----------------- -* generator_default -* generator_config -* label_allow_lists -* label_block_lists +To redact sensitive information from a JSON string or Python dict, pass the object to the ``redact_json`` method. -by reading :ref:`redact-config`. +Similar to other SDK functions that modify data, ``redact_json`` allows you to configure how to treat different entity types. + +To learn more about the common parameters: + +* ``generator_default`` +* ``generator_config`` +* ``label_allow_lists`` +* ``label_block_lists`` + +go to :ref:`redact-config`. .. 
code-block:: python @@ -48,14 +54,20 @@ This produces the following output: Conversation data stored in JSON -------------------------------- -When conversation data (typically text transcribed from audio recordings) is stored in JSON it is common for different parts of the conversation are found spread across multiple locations in JSON. Using the redact_json method is not ideal because each piece of text is treated independently when performing NER identification. This can result in worse NER identification. The :class:`JsonConversationHelper` will process entire conversations in single NER calls yielding better performance and then return an NER result that still maps to your original JSON structure. +When conversation data, such as text transcribed from audio recordings, is stored in JSON, different parts of the conversation are often spread across multiple locations in JSON. -As an example, let's say you have a JSON document representing a conversation as follows: +Using the ``redact_json`` method is not ideal in this case, because NER identification treats each piece of text independently. This can result in worse NER identification. + +The :class:`JsonConversationHelper` processes entire conversations in single NER calls, which improves performance, and then returns an NER result that still maps to your original JSON structure. + +For example, the following JSON document represents a conversation: .. literalinclude:: json_conversation_example.json :language: JSON -Naively, we could process each speech utterance using our redact_json endpoint but we could lose context since each utterance would be run through our models independetly. Let's use the :class:`JsonConversationHelper` to improve our results. +Naively, we could use the ``redact_json`` endpoint to process each speech utterance. However, we might lose context, because each utterance runs through our models independently. + +To improve the results, we'll use the :class:`JsonConversationHelper`. .. 
code-block:: python @@ -79,7 +91,7 @@ Naively, we could process each speech utterance using our redact_json endpoint b response = helper.redact(data, lambda x: x["conversation"]["transcript"], lambda x: x["content"], lambda content: ner.redact(content)) -This yields the following redaction result below. Each piece of speech from the conversation is stored in its own element in the resulting array. The order of text in the response matches the order of text in the original conversation. +This produces the following redaction result. In the resulting array, each piece of speech from the conversation is stored in its own element. The order of the text in the response matches the order of text in the original conversation. .. literalinclude:: json_conversation_response.json - :language: JSON \ No newline at end of file + :language: JSON From 5dcc8f3480075b06fdab1554f77a2ae3382529c2 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 15:12:14 -0400 Subject: [PATCH 20/23] Text edits --- docs/source/redact/redacting_large_data.rst | 32 ++++++++++++++++----- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/docs/source/redact/redacting_large_data.rst b/docs/source/redact/redacting_large_data.rst index 9e66276..df7a26d 100644 --- a/docs/source/redact/redacting_large_data.rst +++ b/docs/source/redact/redacting_large_data.rst @@ -1,15 +1,22 @@ Working with large data sets ================================= -For most use cases the :meth:`redact` and :meth:`redact_bulk` functions are sufficient. However, sometimes you need to process a lot of data quickly. Typically this means making multiple redact requests concurrently instead of sequentially. +For most use cases, the :meth:`redact` and :meth:`redact_bulk` functions are sufficient. -We can accomplish this using Python's asyncio library which you can install below. +However, sometimes you need to process a lot of data quickly. 
Typically, this means making multiple redact requests concurrently instead of sequentially. + +To accomplish this, you can use Python's asyncio library. To install asyncio: .. code-block:: bash pip install asyncio -The below snippet can be used to process a large number of files through concurrent requests. **Note that this snippet will not run in in a Jupyter notebook due to how Jupyter notebook handles event loops. Below is a second example when running in Jupypter notebook** +Issuing concurrent requests +--------------------------- + +The below snippet can be used to process a large number of files through concurrent requests. + +**Note that because of how Jupyter notebook handles event loops, this snippet cannot run in a Jupyter notebook. A later example shows how to run in a Jupyter notebook.** .. code-block:: python @@ -27,7 +34,11 @@ The below snippet can be used to process a large number of files through concurr results = [task.result() for task in tasks] -If you run the above and see an error like **The event loop is already running** this is likely because you are running in a Jupyter notebook. To successfully run in a Jupyter notebook please use the following: +Running in a Jupyter notebook +----------------------------- +If you run the above and see an error similar to **The event loop is already running**, this is likely because you are running in a Jupyter notebook. + +To successfully run in a Jupyter notebook, use the following: .. 
code-block:: python @@ -36,7 +47,7 @@ If you run the above and see an error like **The event loop is already running** ner = TextualNer() - file_names = ['...'] # The list of files to be processed asynchronously + file_names = ['...'] # The list of files to process asynchronously async def async_redact(t): return ner.redact(t) @@ -47,7 +58,14 @@ If you run the above and see an error like **The event loop is already running** results = [task.result() for task in tasks] -In another case, perhaps you are processing DataFrames but the frames themselves are quite large and you wish to redact rows in parallel. For this we can use Dask, a framework that sits on top of Pandas for concurrent execution. Make sure to first install dask[dataframe] and pandas. +Processing large DataFrames +--------------------------- + +In another case, you might be processing a very large DataFrame and want to redact rows in parallel. + +For this, you can use Dask, a framework that sits on top of pandas for concurrent execution. + +Before you use Dask, you must install ``dask[dataframe]`` and pandas. .. code-block:: bash @@ -64,4 +82,4 @@ In another case, perhaps you are processing DataFrames but the frames themselves df = get_dataframe() npartitions=25 # Sets the number of requests to make concurrently. 
- df[col] = dd.from_pandas(df[col], npartitions=npartitions).apply(lambda x: redact(x) if not pd.isnull(x) else x, meta=pd.Series(dtype='str', name=col)).compute() \ No newline at end of file + df[col] = dd.from_pandas(df[col], npartitions=npartitions).apply(lambda x: redact(x) if not pd.isnull(x) else x, meta=pd.Series(dtype='str', name=col)).compute() From cafc1fc8843a74b98b8c9df8f2a33c90e52dd97d Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 15:17:54 -0400 Subject: [PATCH 21/23] Text edits --- docs/source/redact/redacting_text.rst | 42 ++++++++++++++++++--------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/docs/source/redact/redacting_text.rst b/docs/source/redact/redacting_text.rst index a86f54e..4528102 100644 --- a/docs/source/redact/redacting_text.rst +++ b/docs/source/redact/redacting_text.rst @@ -1,15 +1,21 @@ Redact text ================ -To redact sensitive information from a text string, pass the string to the `redact` method. -Like other SDK functions that modify data the `redact_html` allows you to configure how different entity types are treated. You can learn more about the common parameters: +Redact a single string +---------------------- + +To redact sensitive information from a text string, pass the string to the ``redact`` method. + +Similar to other SDK functions that modify data, ``redact`` allows you to configure how to treat different entity types. + +To learn more about the common parameters: -* generator_default -* generator_config -* label_allow_lists -* label_block_lists +* ``generator_default`` +* ``generator_config`` +* ``label_allow_lists`` +* ``label_block_lists`` -by reading :ref:`redact-config`. +go to :ref:`redact-config`. .. 
code-block:: python @@ -59,15 +65,15 @@ This produces the following output: "new_text": "[ORGANIZATION_P5XLAH]" } -You can also record `redact` calls, so that you can view and analyze results in the Textual application. To learn more, read :ref:`record-api-call-section` +You can also record ``redact`` calls, so that you can view and analyze results in the Textual application. To learn more, go to :ref:`record-api-call-section` Bulk redact raw text --------------------- -In the same way that you use the `redact` method to redact strings, you can use the `redact_bulk` method to redact many strings at the same time. +In the same way that you use the ``redact`` method to redact strings, you can use the ``redact_bulk`` method to redact many strings at the same time. Each string is redacted individually. Each string is fed into our model independently and cannot affect other strings. -To redact sensitive information from a list of text strings, pass the list to the `redact_bulk` method: +To redact sensitive information from a list of text strings, pass the list to the ``redact_bulk`` method: .. code-block:: python @@ -134,7 +140,11 @@ This produces the following output: Recording API requests ---------------------- -When you use the :meth:`redact` method to redact text, you can optionally record these requests to view and analyze later in the Textual application. The `redact` method takes an optional `record_options` (:class:`RecordApiRequestOptions`) argument. To record an API request: +When you use the :meth:`redact` method to redact text, you can optionally record these requests to view and analyze later in the Textual application. + +The ``redact`` method takes an optional ``record_options`` (:class:`RecordApiRequestOptions`) argument. + +To record an API request: .. 
code-block:: python @@ -149,15 +159,19 @@ When you use the :meth:`redact` meth tags=["my_first_request"]) ) -The above code runs the redaction in the same way as any other redaction request, and then records the API request and its results. The request itself is automatically purged after 1 hour. You can view the results from the **API Explorer** page in Textual. The retention time is specified in hours and can be set to a value between 1 and 720. +The above code runs the redaction in the same way as any other redaction request, and then records the API request and its results. + +The request itself is automatically purged after 1 hour. + +You can view the results from the **API Explorer** page in Textual. The retention time for the results is specified in hours and can be set to a value between 1 and 720. Replacing values in your redaction response ------------------------------------------- -Tonic Textual includes additional utilities for customizing responses. The :class:`ReplaceTextHelper` can take a redaction response from our redact call and modify the replacement values. +Tonic Textual includes additional utilities to customize responses. The :class:`ReplaceTextHelper` can take a redaction response from our redact call and modify the replacement values. -For example, the below example will modify the replacement values for first names and cities and replace them with equal length strings comprised of just 'x'. +For example, the following code modifies the replacement values for first names and cities. It replaces them with strings of equal length that consist of 'x'. .. 
code-block:: python From 750a50aa3487eb39520c678f32ee467882b14e3a Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 15:19:27 -0400 Subject: [PATCH 22/23] Update redacting_xml.rst --- docs/source/redact/redacting_xml.rst | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/docs/source/redact/redacting_xml.rst b/docs/source/redact/redacting_xml.rst index 187d63b..f56544a 100644 --- a/docs/source/redact/redacting_xml.rst +++ b/docs/source/redact/redacting_xml.rst @@ -1,15 +1,17 @@ Redact XML data ================ -To redact sensitive information from XML, pass the XML document string to the `redact_xml` method. +To redact sensitive information from XML, pass the XML document string to the ``redact_xml`` method. -Like other SDK functions that modify data the `redact_html` allows you to configure how different entity types are treated. You can learn more about the common parameters: +Similar to other SDK functions that modify data, ``redact_xml`` allows you to configure how to treat different entity types. -* generator_default -* generator_config -* label_allow_lists -* label_block_lists +To learn more about the common parameters: -by reading :ref:`redact-config`. +* ``generator_default`` +* ``generator_config`` +* ``label_allow_lists`` +* ``label_block_lists`` + +go to :ref:`redact-config`. .. code-block:: python @@ -39,4 +41,6 @@ by reading :ref:`redact-config`. xml_redaction = textual.redact_xml(xml_string) -The response includes entity level information, including the XPATH at which the sensitive entity is found. The start and end positions are relative to the beginning of thhe XPATH location where the entity is found. \ No newline at end of file +The response includes entity-level information, including the XPATH where the sensitive entity is found. + +The start and end positions are relative to the beginning of the XPATH location where the entity is found. 
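Reviewer note on the XPATH wording in the patch above: the statement that start and end positions are relative to the beginning of the XPATH location can be illustrated with a small standard-library sketch. This is not Textual SDK code. The entity shape used here (``xpath``, ``start``, ``end`` keys) and the helper name ``text_at`` are assumptions made only for illustration, and ``xml.etree.ElementTree`` supports just a subset of XPath, so the path is written relative to the root element.

```python
import xml.etree.ElementTree as ET


def text_at(xml_string, xpath, start, end):
    """Return the slice [start:end) of the text of the element found at xpath.

    Hypothetical helper: offsets are counted from the start of that element's
    text, not from the start of the whole document.
    """
    root = ET.fromstring(xml_string)
    node = root.find(xpath)  # ElementTree handles only a limited XPath subset
    if node is None or node.text is None:
        raise ValueError(f"No text found at {xpath!r}")
    return node.text[start:end]


xml_string = "<note><to>Dear John Smith</to><body>See you soon.</body></note>"

# Assumed, illustrative entity record; the real response fields may differ.
entity = {"xpath": "./to", "start": 5, "end": 15}

print(text_at(xml_string, entity["xpath"], entity["start"], entity["end"]))
```

In this sketch, the offsets 5 and 15 index into the text of the ``to`` element only, which is why the same offsets would point somewhere else if they were applied to the full document string.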
From 6148e05f894a22716ae01d3ce27f70ffaec1f227 Mon Sep 17 00:00:00 2001 From: Janice Manwiller <107077736+JaniceManwiller@users.noreply.github.com> Date: Tue, 24 Mar 2026 15:20:29 -0400 Subject: [PATCH 23/23] Text edits --- docs/source/index.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/index.rst b/docs/source/index.rst index 12feaf9..ce575f7 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -52,11 +52,11 @@ To redact text or files, use the TextualNer client. To parse files, which is use Both clients support the following optional arguments: -* **base_url** - The URL of the server that hosts Tonic Textual. Default: ``https://textual.tonic.ai`` +* ``base_url`` - The URL of the server that hosts Tonic Textual. Default: ``https://textual.tonic.ai`` -* **api_key** - Your API key. If not specified, you must set ``TONIC_TEXTUAL_API_KEY`` in your environment. +* ``api_key`` - Your API key. If not specified, you must set ``TONIC_TEXTUAL_API_KEY`` in your environment. -* **verify** - Whether to verify SSL certification. Default: ``true`` +* ``verify`` - Whether to verify SSL certification. Default: ``true`` .. |signup_link| raw:: html