Merged
16 commits
53 changes: 53 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,59 @@

All notable changes to this project will be documented in this file.

## [2.0.2] - 2026-05-06
### Added
- Dict context support for Conditional Data Access.

## [2.0.1] - 2026-04-29
### Fixed
- Fern client re-initialisation on token refresh.

## [2.0.0] - 2025-11-11
### Added
- Multi-vault and multi-connection support via fluent builder (`Skyflow.builder()`).
- New typed request and response classes for all vault operations (`InsertRequest`, `GetRequest`, `UpdateRequest`, `DeleteRequest`, `QueryRequest`, `DetokenizeRequest`, `TokenizeRequest`, `FileUploadRequest`).
- Detect API: `deidentify_text`, `reidentify_text`, `deidentify_file`, and `get_detect_run`.
- File upload support via `vault().upload_file()`.
- Flexible credential types: API key, static bearer token, service account credentials string, credentials file path, and `SKYFLOW_CREDENTIALS` environment variable.
- `SkyflowError` now includes `http_code`, `grpc_code`, `http_status`, `request_id`, and `details` fields.
- `set_log_level()` on the client for runtime log level changes.

### Changed
- Complete rewrite of the SDK public API. See [docs/migrate_to_v2.md](docs/migrate_to_v2.md) for migration instructions.

## [1.16.0] - 2025-09-23
### Fixed
- Remote disconnect error in vault operations.

## [1.15.8] - 2025-09-30
### Fixed
- Retry logic when `continue_on_error` is set to `true` in insert.

## [1.15.7] - 2025-09-23
### Fixed
- Retry handling for errors in insert method.

## [1.15.6] - 2025-09-22
### Fixed
- Added retry logic for transient errors.

## [1.15.5] - 2025-09-18
### Fixed
- Remote disconnected errors in vault operations.

## [1.15.4] - 2025-09-12
### Fixed
- Retry on exception during vault requests.

## [1.15.3] - 2025-09-12
### Fixed
- Retry on exception during vault requests.

## [1.15.2] - 2025-09-12
### Fixed
- Retry on connection error in insert method.

## [1.15.1] - 2023-12-07
### Fixed
- Not receiving tokens when calling Get with options tokens as true.
59 changes: 55 additions & 4 deletions README.md
@@ -1,5 +1,9 @@
# Skyflow Python SDK

> **This is the current, recommended version of the Skyflow SDK.** V2.1.0 brings flexible auth, multi-vault support, native data types, and rich error diagnostics.
>
> Migrating from v1? See the **[Migration Guide](https://github.com/skyflowapi/skyflow-python/blob/main/docs/migrate_to_v2.md)** for step-by-step instructions. V1 is in maintenance mode and will reach End of Life on October 31, 2026.

The Skyflow Python SDK is designed to help with integrating Skyflow into a Python backend.

## Table of Contents
@@ -703,18 +707,65 @@ options = {

Embed context values into a bearer token during generation so you can reference those values in your policies. This enables more flexible access controls, such as tracking end-user identity when making API calls using service accounts, and facilitates using signed data tokens during detokenization.

Generate bearer tokens containing context information using a service account with the context_id identifier. Context information is represented as a JWT claim in a Skyflow-generated bearer token. Tokens generated from such service accounts include a context_identifier claim, are valid for 60 minutes, and can be used to make API calls to the Data and Management APIs, depending on the service account's permissions.
Generate bearer tokens containing context information using a service account with the `context_id` identifier. Context information is represented as a JWT claim in a Skyflow-generated bearer token. Tokens generated from such service accounts include a `context_identifier` claim, are valid for 60 minutes, and can be used to make API calls to the Data and Management APIs, depending on the service account's permissions.
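
Because the context travels as a JWT claim, it can help to look at a token's payload while debugging a policy. The following is a minimal sketch using only the standard library. It does not verify the signature, the token is a locally built stand-in rather than real output from `generate_bearer_token`, and the claim name `ctx` is an assumption made for illustration; inspect one of your own tokens to see the actual claim names.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    payload_b64 = token.split('.')[1]
    # JWT segments are base64url-encoded without padding; restore it first.
    payload_b64 += '=' * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b'=').decode()


# Stand-in token for illustration only. The 'ctx' claim name is an assumption.
demo_token = '.'.join([
    _b64url({'alg': 'none', 'typ': 'JWT'}),
    _b64url({'ctx': 'user_12345', 'exp': 1770000000}),
    'signature-placeholder',
])

claims = decode_jwt_claims(demo_token)
print(claims['ctx'])
```

Decoding without verification is only suitable for inspection; it should not be used to make trust decisions.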

The `ctx` parameter accepts either a **string** or a **dict**:

**String context** — use when your policy references a single context value:

```python
options = {'ctx': 'user_12345'}
token, _ = generate_bearer_token(filepath, options)
```

**Dict context** — use when your policy needs multiple context values for conditional data access. Each key in the dict maps to a Skyflow CEL policy variable under `request.context.*`:

```python
options = {
    'ctx': {
        'role': 'admin',
        'department': 'finance',
        'user_id': 'user_12345',
    }
}
token, _ = generate_bearer_token(filepath, options)
```

With the dict above, your Skyflow policies can reference `request.context.role`, `request.context.department`, and `request.context.user_id` to make conditional access decisions.

Dict keys must contain only alphanumeric characters and underscores (`[a-zA-Z0-9_]`). Invalid keys will raise a `SkyflowError`.
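
The key rule above can be checked locally before requesting a token. The sketch below is not the SDK's own validation: `policy_variables` is a hypothetical helper, it raises `ValueError` rather than `SkyflowError` so that it runs without the SDK installed, and treating an empty key as invalid is an assumption. It also lists the `request.context.*` names that a given dict would expose to policies.

```python
import re

# Allowed key characters as stated in the README: letters, digits, underscore.
_KEY_PATTERN = re.compile(r'[a-zA-Z0-9_]+')


def policy_variables(ctx: dict) -> list:
    """Return the policy variable names for a ctx dict (hypothetical helper).

    Raises ValueError if any key is not a non-empty string made up of
    alphanumeric characters and underscores.
    """
    invalid = [
        key for key in ctx
        if not isinstance(key, str) or not _KEY_PATTERN.fullmatch(key)
    ]
    if invalid:
        raise ValueError(f'Invalid ctx keys: {invalid}')
    return [f'request.context.{key}' for key in ctx]


print(policy_variables({'role': 'admin', 'user_id': 'user_12345'}))

try:
    policy_variables({'user-id': 'user_12345'})  # hyphen is not allowed
except ValueError as error:
    print(error)
```

Running a check like this on the client side surfaces a bad key before any network call is made.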

> [!TIP]
> See the full example in the samples directory: [token_generation_with_context_example.py](samples/service_account/token_generation_with_context_example.py)
> See [docs.skyflow.com](https://docs.skyflow.com) for more details on authentication, access control, and governance for Skyflow.
> See the full example in the samples directory: [token_generation_with_context_example.py](samples/service_account/token_generation_with_context_example.py)
> See Skyflow's [context-aware authorization](https://docs.skyflow.com) and [conditional data access](https://docs.skyflow.com) docs for policy variable syntax like `request.context.*`.

#### Generate signed data tokens: `generate_signed_data_tokens(filepath, options)`

Digitally sign data tokens with a service account's private key to add an extra layer of protection. Skyflow generates data tokens when sensitive data is inserted into the vault. A signed data token can be detokenized only by providing it together with a bearer token generated from the service account's credentials. The service account must have the necessary permissions and context to detokenize the signed data tokens.

The `ctx` parameter on signed data tokens also accepts either a **string** or a **dict**, using the same format as bearer tokens:

```python
# String context
options = {
    'ctx': 'user_12345',
    'data_tokens': ['dataToken1', 'dataToken2'],
    'time_to_live': 90,
}

# Dict context
options = {
    'ctx': {
        'role': 'analyst',
        'department': 'research',
    },
    'data_tokens': ['dataToken1', 'dataToken2'],
    'time_to_live': 90,
}
```

> [!TIP]
> See the full example in the samples directory: [signed_token_generation_example.py](samples/service_account/signed_token_generation_example.py)
> See the full example in the samples directory: [signed_token_generation_example.py](samples/service_account/signed_token_generation_example.py)
> See [docs.skyflow.com](https://docs.skyflow.com) for more details on authentication, access control, and governance for Skyflow.

## Logging
124 changes: 64 additions & 60 deletions samples/detect_api/deidentify_file.py
@@ -1,7 +1,14 @@
from skyflow.error import SkyflowError
from skyflow import Env, Skyflow, LogLevel
from skyflow.utils.enums import DetectEntities, MaskingMethod, DetectOutputTranscriptions
from skyflow.vault.detect import DeidentifyFileRequest, TokenFormat, Transformations, DateTransformation, Bleep, FileInput
from skyflow.vault.detect import (
    DeidentifyFileRequest,
    TokenFormat,
    Transformations,
    DateTransformation,
    Bleep,
    FileInput,
)

"""
* Skyflow Deidentify File Example
@@ -11,6 +18,7 @@
* spreadsheets, presentations, structured text.
"""


def perform_file_deidentification():
    try:
        # Step 1: Configure Credentials
@@ -23,7 +31,7 @@ def perform_file_deidentification():
            'vault_id': '<YOUR_VAULT_ID>',  # Replace with your vault ID
            'cluster_id': '<YOUR_CLUSTER_ID>',  # Replace with your cluster ID
            'env': Env.PROD,  # Deployment environment
            'credentials': credentials
            'credentials': credentials,
        }

        # Step 3: Configure & Initialize Skyflow Client
@@ -36,70 +44,66 @@ def perform_file_deidentification():

        # Step 4: Create File Object
        file_path = '<FILE_PATH>'  # Replace with your file path
        file = open(file_path, 'rb')
        # Step 5: Configure Deidentify File Request with all options
        deidentify_request = DeidentifyFileRequest(
            file=FileInput(file),  # File to de-identify (can also provide a file path)
            entities=[DetectEntities.SSN, DetectEntities.CREDIT_CARD],  # Entities to detect
            allow_regex_list=['<YOUR_REGEX_PATTERN>'],  # Optional: Patterns to allow
            restrict_regex_list=['<YOUR_REGEX_PATTERN>'],  # Optional: Patterns to restrict

            # Token format configuration
            token_format=TokenFormat(
                vault_token=[DetectEntities.SSN],  # Use vault tokens for these entities
            ),

            # Optional: Custom transformations
            # transformations=Transformations(
            #     shift_dates=DateTransformation(
            #         max_days=30,
            #         min_days=10,
            #         entities=[DetectEntities.DOB]
            #     )
            # ),

            # Output configuration
            output_directory='<OUTPUT_DIRECTORY_PATH>',  # Where to save processed file
            wait_time=15,  # Max wait time in seconds (max 64)

            # Image-specific options
            output_processed_image=True,  # Include processed image in output
            output_ocr_text=True,  # Include OCR text in response
            masking_method=MaskingMethod.BLACKBOX,  # Masking method for images

            # PDF-specific options
            pixel_density=15,  # Pixel density for PDF processing
            max_resolution=2000,  # Max resolution for PDF

            # Audio-specific options
            output_processed_audio=True,  # Include processed audio
            output_transcription=DetectOutputTranscriptions.PLAINTEXT_TRANSCRIPTION,  # Transcription type

            # Audio bleep configuration

            # bleep=Bleep(
            #     gain=5,  # Loudness in dB
            #     frequency=1000,  # Pitch in Hz
            #     start_padding=0.1,  # Padding at start (seconds)
            #     stop_padding=0.2  # Padding at end (seconds)
            # )
        )

        # Step 6: Call deidentifyFile API
        response = skyflow_client.detect().deidentify_file(deidentify_request)
        # Step 5: Configure Deidentify File Request and call API
        with open(file_path, 'rb') as file:
            deidentify_request = DeidentifyFileRequest(
                file=FileInput(file),  # File to de-identify (can also provide a file path)
                entities=[DetectEntities.SSN, DetectEntities.CREDIT_CARD],  # Entities to detect
                allow_regex_list=['<YOUR_REGEX_PATTERN>'],  # Optional: Patterns to allow
                restrict_regex_list=['<YOUR_REGEX_PATTERN>'],  # Optional: Patterns to restrict
                # Token format configuration
                token_format=TokenFormat(
                    vault_token=[DetectEntities.SSN],  # Use vault tokens for these entities
                ),
                # Optional: Custom transformations
                # transformations=Transformations(
                #     shift_dates=DateTransformation(
                #         max_days=30,
                #         min_days=10,
                #         entities=[DetectEntities.DOB]
                #     )
                # ),
                # Output configuration
                output_directory='<OUTPUT_DIRECTORY_PATH>',  # Where to save processed file
                wait_time=15,  # Max wait time in seconds (max 64)
                # Image-specific options
                output_processed_image=True,  # Include processed image in output
                output_ocr_text=True,  # Include OCR text in response
                masking_method=MaskingMethod.BLACKBOX,  # Masking method for images
                # PDF-specific options
                pixel_density=15,  # Pixel density for PDF processing
                max_resolution=2000,  # Max resolution for PDF
                # Audio-specific options
                output_processed_audio=True,  # Include processed audio
                output_transcription=DetectOutputTranscriptions.PLAINTEXT_TRANSCRIPTION,  # Transcription type
                # Audio bleep configuration
                # bleep=Bleep(
                #     gain=5,  # Loudness in dB
                #     frequency=1000,  # Pitch in Hz
                #     start_padding=0.1,  # Padding at start (seconds)
                #     stop_padding=0.2  # Padding at end (seconds)
                # )
            )

            # Step 6: Call deidentifyFile API
            response = skyflow_client.detect().deidentify_file(deidentify_request)

        # Handle Successful Response
        print("\nDeidentify File Response:", response)
        print('\nDeidentify File Response:', response)

    except SkyflowError as error:
        # Handle Skyflow-specific errors
        print('\nSkyflow Error:', {
            'http_code': error.http_code,
            'grpc_code': error.grpc_code,
            'http_status': error.http_status,
            'message': error.message,
            'details': error.details
        })
        print(
            '\nSkyflow Error:',
            {
                'http_code': error.http_code,
                'grpc_code': error.grpc_code,
                'http_status': error.http_status,
                'message': error.message,
                'details': error.details,
            },
        )
    except Exception as error:
        # Handle unexpected errors
        print('Unexpected Error:', error)