Skip to content

Commit 01a2356

Browse files
SK-2813: resolve pr comments and update claude.md file
1 parent 38527d1 commit 01a2356

15 files changed

Lines changed: 878 additions & 51 deletions

File tree

.claude/CLAUDE.md

Lines changed: 84 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1,84 @@
1-
See @../AGENTS.md for documentation on how to configure and use Claude agents in this workspace.
1+
# CLAUDE.md
2+
3+
## Commands
4+
5+
**Install dependencies:**
6+
```bash
7+
pip install -r requirements.txt
8+
pip install ".[dev]" # includes ruff and codespell
9+
```
10+
11+
**Run all tests:**
12+
```bash
13+
python -m coverage run --source=skyflow \
14+
--omit=skyflow/generated/*,skyflow/utils/validations/*,skyflow/vault/data/*,skyflow/vault/detect/*,skyflow/vault/tokens/*,skyflow/vault/connection/*,skyflow/error/*,skyflow/utils/enums/*,skyflow/vault/controller/_audit.py,skyflow/vault/controller/_bin_look_up.py \
15+
-m unittest discover
16+
```
17+
18+
**Run a single test:**
19+
```bash
20+
python -m unittest tests.vault.controller.test__vault.TestVault.test_insert
21+
```
22+
23+
**Lint:**
24+
```bash
25+
ruff check . --output-format=github
26+
```
27+
28+
**Spell check:**
29+
```bash
30+
codespell
31+
```
32+
33+
**Build package:**
34+
```bash
35+
python setup.py sdist bdist_wheel
36+
```
37+
38+
**Commit message format:** All commits must include a Jira ticket ID, e.g. `SDK-123: description`. CI enforces this on PRs.
39+
40+
## Architecture
41+
42+
### Entry point and builder
43+
44+
`skyflow/client/skyflow.py``Skyflow` is a facade built via `Skyflow.builder()`. The inner `Builder` class accumulates vault configs, connection configs, and optional Skyflow-level credentials. Calling `.build()` validates all configs and instantiates `VaultClient` + controller objects per config. Each `add_vault_config()` call creates a `Vault` controller and a `Detect` controller backed by the same `VaultClient`.
45+
46+
### Three controllers
47+
48+
- `skyflow/vault/controller/_vault.py``Vault`: insert, get, update, delete, query, detokenize, tokenize, upload_file. All methods follow: validate → `__initialize()` → call generated API → parse response.
49+
- `skyflow/vault/controller/_connections.py``Connection`: single `invoke()` method that builds an HTTP request via `requests.PreparedRequest` and sends it directly (bypasses the generated client).
50+
- `skyflow/vault/controller/_detect.py``Detect`: deidentify/reidentify text and files. File operations use long-polling (`__poll_for_processed_file`) with exponential back-off up to a configurable `wait_time` (max 64 s).
51+
52+
### VaultClient (auth and token caching)
53+
54+
`skyflow/vault/client/client.py` — Holds the generated REST client (`skyflow/generated/rest/`) and manages bearer token lifecycle. `initialize_client_configuration()` is called before every API request; it skips re-initialization if a non-expired token exists. Credential resolution order: config-level credentials → Skyflow-level credentials → `SKYFLOW_CREDENTIALS` env var.
55+
56+
### Service account / JWT layer
57+
58+
`skyflow/service_account/_utils.py` — Standalone functions for bearer token generation (`generate_bearer_token`, `generate_bearer_token_from_creds`) and signed data tokens (`generate_signed_data_tokens`). Signs a JWT with RS256 then exchanges it at the `tokenURI` via `AuthClient`. `is_expired()` decodes JWT without verification to check `exp`.
59+
60+
### Generated REST client
61+
62+
`skyflow/generated/rest/`**Do not edit.** Auto-generated by Fern from the Skyflow API spec. All vault/token/query/detect/file API calls go through this layer. The `Skyflow` class in `generated/rest/client.py` is the raw HTTP client (separate from `skyflow/client/skyflow.py`).
63+
64+
### Validation
65+
66+
`skyflow/utils/validations/_validations.py` — All request and config validation lives here. Every public SDK method calls a corresponding `validate_*` function before touching the API. Config-level validation (vault, connection, credentials) runs at `build()` / `add_*_config()` time.
67+
68+
### Error handling
69+
70+
`skyflow/error/_skyflow_error.py``SkyflowError(message, http_code, request_id, grpc_code, http_status, details)`. `handle_exception()` in `skyflow/utils/_utils.py` converts raw generated-client exceptions into `SkyflowError` by inspecting `content-type` of the error response.
71+
72+
### Messages and constants
73+
74+
- `skyflow/utils/_skyflow_messages.py` — All user-facing error strings, log strings, and HTTP status enums. Strings are grouped into `Error`, `ErrorLogs`, `Info`, `ErrorCodes`, `HttpStatus` inner enums.
75+
- `skyflow/utils/constants.py` — Named constant classes (`CredentialField`, `JWT`, `SdkMetricsKey`, etc.) and top-level constants (`PROTOCOL`, `SKY_META_DATA_HEADER`).
76+
- `skyflow/utils/enums/``LogLevel`, `Env`, `RedactionType`, `TokenMode`, `ContentType`, `DetectEntities`, etc.
77+
78+
### Request/Response objects
79+
80+
Plain dataclasses in `skyflow/vault/data/`, `skyflow/vault/tokens/`, `skyflow/vault/detect/`, and `skyflow/vault/connection/`. Response objects always have an `errors` field (list or `None`).
81+
82+
### SDK metadata header
83+
84+
Every API call appends `sky-metadata: <json>` via `__get_headers()`. The metrics dict (`skyflow/utils/_utils.py:get_metrics()`) is computed once per process and cached in `_CACHED_METRICS`.

skyflow/error/_skyflow_error.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,4 +14,4 @@ def __init__(self,
1414
self.http_status = http_status if http_status else SkyflowMessages.HttpStatus.BAD_REQUEST.value
1515
self.details = details
1616
self.request_id = request_id
17-
super().__init__()
17+
super().__init__(message)

skyflow/service_account/_utils.py

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -240,8 +240,7 @@ def generate_signed_data_tokens_from_creds(credentials, options):
240240
return get_signed_tokens(json_credentials, options)
241241

242242
def get_signed_data_token_response_object(signed_token, actual_token):
243-
response_object = {
243+
return {
244244
ResponseField.TOKEN: actual_token,
245-
ResponseField.SIGNED_TOKEN: signed_token
245+
ResponseField.SIGNED_TOKEN: signed_token,
246246
}
247-
return response_object.get(ResponseField.TOKEN), response_object.get(ResponseField.SIGNED_TOKEN)

skyflow/utils/_utils.py

Lines changed: 50 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -421,22 +421,30 @@ def parse_invoke_connection_response(api_response: requests.Response):
421421
error_response = json.loads(content)
422422
error_from_client = api_response.headers.get(HttpHeader.ERROR_FROM_CLIENT)
423423

424-
status_code = error_response.get(ResponseField.ERROR, {}).get(ResponseField.HTTP_CODE, status_code)
425-
http_status = error_response.get(ResponseField.ERROR, {}).get(ResponseField.HTTP_STATUS)
426-
grpc_code = error_response.get(ResponseField.ERROR, {}).get(ResponseField.GRPC_CODE)
427-
details = error_response.get(ResponseField.ERROR, {}).get(ResponseField.DETAILS)
428-
message = error_response.get(ResponseField.ERROR, {}).get(ResponseField.MESSAGE, SkyflowMessages.Error.UNKNOWN_ERROR_DEFAULT_MESSAGE.value)
429-
424+
http_status = None
425+
grpc_code = None
426+
details = None
427+
428+
error_obj = error_response.get(ResponseField.ERROR) if isinstance(error_response, dict) else None
429+
if isinstance(error_obj, dict):
430+
status_code = error_obj.get(ResponseField.HTTP_CODE, status_code)
431+
http_status = error_obj.get(ResponseField.HTTP_STATUS)
432+
grpc_code = error_obj.get(ResponseField.GRPC_CODE)
433+
details = error_obj.get(ResponseField.DETAILS)
434+
message = error_obj.get(ResponseField.MESSAGE, message)
435+
elif isinstance(error_obj, str) and error_obj:
436+
message = error_obj
437+
430438
if error_from_client is not None:
431-
if details is None:
439+
if details is None:
432440
details = []
433441
error_from_client_bool = error_from_client.lower() == BooleanString.TRUE
434442
details.append({ResponseField.ERROR_FROM_CLIENT: error_from_client_bool})
435443

436444
raise SkyflowError(message, status_code, request_id, grpc_code, http_status, details)
437-
445+
438446
except json.JSONDecodeError:
439-
raise SkyflowError(content if content else message, status_code, request_id)
447+
raise SkyflowError(message, status_code, request_id)
440448

441449
def parse_deidentify_text_response(api_response: DeidentifyStringResponse):
442450
entities = [convert_detected_entity_to_entity_info(entity) for entity in api_response.entities]
@@ -480,21 +488,46 @@ def handle_exception(error, logger):
480488

481489
def handle_json_error(err, data, request_id, logger):
482490
try:
483-
if isinstance(data, dict): # If data is already a dict
491+
if isinstance(data, dict):
484492
description = data
485493
elif isinstance(data, ErrorResponse):
486494
description = data.dict()
487495
else:
488496
description = json.loads(data)
489-
status_code = description.get(ResponseField.ERROR, {}).get(ResponseField.HTTP_CODE, HttpStatusCode.INTERNAL_SERVER_ERROR)
490-
http_status = description.get(ResponseField.ERROR, {}).get(ResponseField.HTTP_STATUS)
491-
grpc_code = description.get(ResponseField.ERROR, {}).get(ResponseField.GRPC_CODE)
492-
details = description.get(ResponseField.ERROR, {}).get(ResponseField.DETAILS, [])
493497

494-
description_message = description.get(ResponseField.ERROR, {}).get(ResponseField.MESSAGE, SkyflowMessages.Error.UNKNOWN_ERROR_DEFAULT_MESSAGE.value)
495-
log_and_reject_error(description_message, status_code, request_id, http_status, grpc_code, details, logger = logger)
498+
if ResponseField.ERROR in description:
499+
error_obj = description.get(ResponseField.ERROR, {})
500+
status_code = error_obj.get(ResponseField.HTTP_CODE, HttpStatusCode.INTERNAL_SERVER_ERROR)
501+
http_status = error_obj.get(ResponseField.HTTP_STATUS)
502+
grpc_code = error_obj.get(ResponseField.GRPC_CODE)
503+
details = error_obj.get(ResponseField.DETAILS, [])
504+
description_message = error_obj.get(ResponseField.MESSAGE, SkyflowMessages.Error.UNKNOWN_ERROR_DEFAULT_MESSAGE.value)
505+
elif ResponseField.RESPONSES in description:
506+
responses = description.get(ResponseField.RESPONSES, [])
507+
messages = []
508+
status_code = HttpStatusCode.INTERNAL_SERVER_ERROR
509+
for resp in responses:
510+
resp_status = resp.get(ResponseField.STATUS, HttpStatusCode.INTERNAL_SERVER_ERROR)
511+
resp_body = resp.get(ResponseField.BODY, {})
512+
if isinstance(resp_status, int) and resp_status >= 400:
513+
status_code = resp_status
514+
error_msg = resp_body.get(ResponseField.ERROR)
515+
if error_msg:
516+
messages.append(str(error_msg))
517+
description_message = '; '.join(messages) if messages else SkyflowMessages.Error.UNKNOWN_ERROR_DEFAULT_MESSAGE.value
518+
http_status = None
519+
grpc_code = None
520+
details = []
521+
else:
522+
status_code = HttpStatusCode.INTERNAL_SERVER_ERROR
523+
http_status = None
524+
grpc_code = None
525+
details = []
526+
description_message = SkyflowMessages.Error.UNKNOWN_ERROR_DEFAULT_MESSAGE.value
527+
528+
log_and_reject_error(description_message, status_code, request_id, http_status, grpc_code, details, logger=logger)
496529
except json.JSONDecodeError:
497-
log_and_reject_error(SkyflowMessages.Error.INVALID_JSON_RESPONSE.value, err, request_id, logger = logger)
530+
log_and_reject_error(SkyflowMessages.Error.INVALID_JSON_RESPONSE.value, err, request_id, logger=logger)
498531

499532
def handle_text_error(err, data, request_id, logger):
500533
log_and_reject_error(data, err.status, request_id, logger = logger)

skyflow/utils/constants.py

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,7 @@ class ResponseField:
117117
TYPE = 'type'
118118
TOKENIZED_DATA = 'tokenized_data'
119119
SIGNED_TOKEN = 'signed_token'
120+
RESPONSES = 'responses'
120121

121122

122123
class CredentialField:
@@ -195,6 +196,7 @@ class DeidentifyFileRequestField:
195196
OUTPUT_OCR_TEXT = 'output_ocr_text'
196197
MASKING_METHOD = 'masking_method'
197198
PIXEL_DENSITY = 'pixel_density'
199+
DENSITY = 'density'
198200
MAX_RESOLUTION = 'max_resolution'
199201
OUTPUT_PROCESSED_AUDIO = 'output_processed_audio'
200202
OUTPUT_TRANSCRIPTION = 'output_transcription'
@@ -230,6 +232,7 @@ class DeidentifyField:
230232
ENTITY_UNQ_COUNTER = 'entity_unq_counter'
231233
ENTITY_UNIQUE_COUNTER = 'entity_unique_counter'
232234
ENTITY_ONLY = 'entity_only'
235+
VAULT_TOKEN = 'vault_token'
233236
ENTITIES = 'entities'
234237
MAX_DAYS = 'max_days'
235238
MIN_DAYS = 'min_days'

skyflow/utils/enums/detect_output_transcriptions.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,4 +4,5 @@ class DetectOutputTranscriptions(Enum):
44
DIARIZED_TRANSCRIPTION = "diarized_transcription"
55
MEDICAL_DIARIZED_TRANSCRIPTION = "medical_diarized_transcription"
66
MEDICAL_TRANSCRIPTION = "medical_transcription"
7-
TRANSCRIPTION = "transcription"
7+
TRANSCRIPTION = "transcription"
8+
PLAINTEXT_TRANSCRIPTION = "plaintext_transcription"

skyflow/utils/validations/_validations.py

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -232,8 +232,10 @@ def validate_update_vault_config(logger, config):
232232
if ConfigField.ENV in config and config.get(ConfigField.ENV) not in Env:
233233
raise SkyflowError(SkyflowMessages.Error.INVALID_ENV.value.format(vault_id), invalid_input_error_code)
234234

235-
if ConfigField.CREDENTIALS in config and config.get(ConfigField.CREDENTIALS):
236-
validate_credentials(logger, config.get(ConfigField.CREDENTIALS), ConfigType.VAULT, vault_id)
235+
if ConfigField.CREDENTIALS not in config:
236+
raise SkyflowError(SkyflowMessages.Error.EMPTY_CREDENTIALS.value.format(ConfigType.VAULT, vault_id), invalid_input_error_code)
237+
238+
validate_credentials(logger, config.get(ConfigField.CREDENTIALS), ConfigType.VAULT, vault_id)
237239

238240
return True
239241

@@ -255,8 +257,10 @@ def validate_connection_config(logger, config):
255257
SkyflowMessages.Error.INVALID_CONNECTION_URL.value.format(connection_id)
256258
)
257259

258-
if ConfigField.CREDENTIALS in config:
259-
validate_credentials(logger, config.get(ConfigField.CREDENTIALS), ConfigType.CONNECTION, connection_id)
260+
if ConfigField.CREDENTIALS not in config:
261+
raise SkyflowError(SkyflowMessages.Error.EMPTY_CREDENTIALS.value.format(ConfigType.CONNECTION, connection_id), invalid_input_error_code)
262+
263+
validate_credentials(logger, config.get(ConfigField.CREDENTIALS), ConfigType.CONNECTION, connection_id)
260264

261265
return True
262266

@@ -298,7 +302,7 @@ def validate_file_from_request(file_input: FileInput):
298302
if has_file:
299303
file = file_input.file
300304
# Validate file object has required attributes
301-
if not hasattr(file, FileUploadField.FILE_NAME) or not isinstance(file.name, str) or not file.name.strip():
305+
if not hasattr(file, FileUploadField.NAME) or not isinstance(file.name, str) or not file.name.strip():
302306
raise SkyflowError(SkyflowMessages.Error.INVALID_FILE_TYPE.value, invalid_input_error_code)
303307

304308
# Validate file name

skyflow/vault/controller/_detect.py

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -187,7 +187,7 @@ def output_to_dict_list(output):
187187
return DeidentifyFileResponse(
188188
file_base64=base64_string,
189189
file=file_obj,
190-
type=first_output.get(DeidentifyField.TYPE, DetectStatus.UNKNOWN),
190+
type=first_output.get(DeidentifyField.TYPE, None),
191191
extension=extension,
192192
word_count=word_count,
193193
char_count=char_count,
@@ -207,6 +207,7 @@ def __get_token_format(self, request):
207207
DeidentifyField.DEFAULT: getattr(request.token_format, DeidentifyField.DEFAULT, None),
208208
DeidentifyField.ENTITY_UNQ_COUNTER: getattr(request.token_format, DeidentifyField.ENTITY_UNIQUE_COUNTER, None),
209209
DeidentifyField.ENTITY_ONLY: getattr(request.token_format, DeidentifyField.ENTITY_ONLY, None),
210+
DeidentifyField.VAULT_TOKEN: getattr(request.token_format, DeidentifyField.VAULT_TOKEN, None)
210211
}
211212

212213
def __get_transformations(self, request):
@@ -346,7 +347,7 @@ def deidentify_file(self, request: DeidentifyFileRequest):
346347
DeidentifyField.ALLOW_REGEX: request.allow_regex_list,
347348
DeidentifyField.RESTRICT_REGEX: request.restrict_regex_list,
348349
DeidentifyFileRequestField.MAX_RESOLUTION: getattr(request, DeidentifyFileRequestField.MAX_RESOLUTION, None),
349-
DeidentifyFileRequestField.PIXEL_DENSITY: getattr(request, DeidentifyFileRequestField.PIXEL_DENSITY, None),
350+
DeidentifyFileRequestField.DENSITY: getattr(request, DeidentifyFileRequestField.PIXEL_DENSITY, None),
350351
DeidentifyField.REQUEST_OPTIONS: {'additional_headers': self.__get_headers()}
351352
}
352353

skyflow/vault/detect/_deidentify_file_response.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
import io
2+
from typing import Optional
23
from skyflow.vault.detect._file import File
34

45
class DeidentifyFileResponse:
@@ -17,6 +18,7 @@ def __init__(
1718
entities: list = None, # list of dicts with keys 'file' and 'extension'
1819
run_id: str = None,
1920
status: str = None,
21+
errors: Optional[list] = None,
2022
):
2123
self.file_base64 = file_base64
2224
self.file = File(file) if file else None
@@ -31,6 +33,7 @@ def __init__(
3133
self.entities = entities if entities is not None else []
3234
self.run_id = run_id
3335
self.status = status
36+
self.errors = errors
3437

3538
def __repr__(self):
3639
return (
@@ -40,7 +43,7 @@ def __repr__(self):
4043
f"char_count={self.char_count!r}, size_in_kb={self.size_in_kb!r}, "
4144
f"duration_in_seconds={self.duration_in_seconds!r}, page_count={self.page_count!r}, "
4245
f"slide_count={self.slide_count!r}, entities={self.entities!r}, "
43-
f"run_id={self.run_id!r}, status={self.status!r})"
46+
f"run_id={self.run_id!r}, status={self.status!r}, errors={self.errors!r})"
4447
)
4548

4649
def __str__(self):
Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,19 +1,21 @@
1-
from typing import List
1+
from typing import List, Optional
22
from ._entity_info import EntityInfo
33

44
class DeidentifyTextResponse:
5-
def __init__(self,
5+
def __init__(self,
66
processed_text: str,
7-
entities: List[EntityInfo],
7+
entities: List[EntityInfo],
88
word_count: int,
9-
char_count: int):
9+
char_count: int,
10+
errors: Optional[list] = None):
1011
self.processed_text = processed_text
1112
self.entities = entities
1213
self.word_count = word_count
1314
self.char_count = char_count
15+
self.errors = errors
1416

1517
def __repr__(self):
16-
return f"DeidentifyTextResponse(processed_text='{self.processed_text}', entities={self.entities}, word_count={self.word_count}, char_count={self.char_count})"
18+
return f"DeidentifyTextResponse(processed_text='{self.processed_text}', entities={self.entities}, word_count={self.word_count}, char_count={self.char_count}, errors={self.errors})"
1719

1820
def __str__(self):
1921
return self.__repr__()

0 commit comments

Comments
 (0)