| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +This is the Skyflow Python SDK (`skyflow-python`). It provides a Python interface to the Skyflow Data Privacy Vault API — vault operations (insert, get, update, delete, query, tokenize, detokenize, upload_file), service account authentication (bearer tokens, signed data tokens), connections, and detect (deidentify/reidentify text and files). |
| 6 | + |
| 7 | +**Current version:** 2.x |
| 8 | + |
| 9 | +## Critical Boundary — Generated Code |
| 10 | + |
| 11 | +**Never edit files under `skyflow/generated/`.** |
| 12 | + |
| 13 | +These are auto-generated by [Fern](https://buildwithfern.com) from the Skyflow API definition. Manual edits are overwritten on the next generation run. If you find a bug in generated code, report it — do not patch it directly. |
| 14 | + |
| 15 | +The `ruff.toml` and coverage omit lists already exclude `generated/` from all checks. |
| 16 | + |
| 17 | +## Project Structure |
| 18 | + |
| 19 | +``` |
| 20 | +skyflow/ |
| 21 | + client/ |
| 22 | + skyflow.py # Skyflow facade + Builder (entry point) |
| 23 | + vault/ |
| 24 | + client/ |
| 25 | + client.py # VaultClient — auth, token caching, generated client holder |
| 26 | + controller/ |
| 27 | + _vault.py # Vault: insert, get, update, delete, query, tokenize, detokenize, upload_file |
| 28 | + _connections.py # Connection: invoke() |
| 29 | + _detect.py # Detect: deidentify_text, reidentify_text, deidentify_file, get_detect_run |
| 30 | + data/ # Request/Response objects: InsertRequest, GetResponse, FileUploadRequest, etc. |
| 31 | + tokens/ # DetokenizeRequest/Response, TokenizeRequest/Response |
| 32 | + connection/ # InvokeConnectionRequest/Response |
| 33 | + detect/ # DeidentifyTextRequest, ReidentifyTextRequest, DeidentifyFileRequest, etc. |
| 34 | + service_account/ |
| 35 | + _utils.py # generate_bearer_token, generate_bearer_token_from_creds, generate_signed_data_tokens |
| 36 | + auth/ # AuthClient — JWT exchange at tokenURI |
| 37 | + utils/ |
| 38 | + _skyflow_messages.py # All error/log/info strings (Error, ErrorLogs, Info, ErrorCodes, HttpStatus enums) |
| 39 | + constants.py # CredentialField, JWT, SdkMetricsKey, OptionField, and top-level constants |
| 40 | + _utils.py # handle_exception(), get_metrics(), is_expired() |
| 41 | + validations/ |
| 42 | + _validations.py # ALL request and config validation — validate_*() functions |
| 43 | + enums/ # LogLevel, Env, RedactionType, TokenMode, ContentType, DetectEntities, etc. |
| 44 | + logger.py # log_info(), log_error_log(), Logger |
| 45 | + error/ |
| 46 | + _skyflow_error.py # SkyflowError(message, http_code, request_id, grpc_code, http_status, details) |
| 47 | + generated/ # ← FERN-GENERATED, DO NOT EDIT |
| 48 | + rest/ # Raw HTTP client, API classes |
| 49 | +tests/ # unittest tests mirroring skyflow/ structure |
| 50 | +samples/ |
| 51 | + vault_api/ # Vault operation samples |
| 52 | + service_account/ # Bearer token / signed token samples |
| 53 | + connection/ # Connection samples |
| 54 | + detect_api/ # Detect samples |
| 55 | +docs/ |
| 56 | + migrate_to_v2.md # v1 → v2 migration guide |
| 57 | + advanced_initialization.md |
| 58 | + auth_credentials.md |
| 59 | +``` |
| 60 | + |
| 61 | +## Naming Conventions |
| 62 | + |
| 63 | +- **Methods / variables / parameters:** `snake_case` — `vault_id`, `get_records`, `token_uri` |
| 64 | +- **Classes / Exceptions / Enums:** `PascalCase` — `InsertRequest`, `SkyflowError`, `RedactionType` |
| 65 | +- **Constants / module-level values:** `UPPER_SNAKE_CASE` — `SKY_META_DATA_HEADER`, `PROTOCOL` |
| 66 | +- **Private methods / attributes:** `_snake_case` — `_validate_ctx`, `_cached_headers` |
| 67 | +- **Acronyms are all-lowercase in identifiers:** `skyflow_id` (not `skyflow_ID`), `token_uri` (not `token_URI`), `api_key` (not `API_key`) |
| 68 | +- **Response objects:** always use `snake_case` field names — `skyflow_id`, `inserted_fields`, `detokenized_fields` |
| 69 | +- **Deprecated methods:** use `@deprecated` from `typing_extensions` so type checkers and IDEs flag the call (strikethrough), plus `warnings.warn(msg, DeprecationWarning, stacklevel=2)` for runtime console output |
| 70 | +- **Error messages:** always use `SkyflowMessages` enum constants — never hardcode strings in controllers or validators |
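
The deprecation convention above can be sketched as follows. The `Vault` class and the `legacy_get` / `get` method names are hypothetical stand-ins, not the SDK's real API. One detail worth checking against the real code: `typing_extensions.deprecated` emits its own `DeprecationWarning` at call time by default, so this sketch passes `category=None` to keep the explicit `warnings.warn` as the only runtime warning. Whether the SDK does the same is an assumption.

```python
import warnings

try:
    from typing_extensions import deprecated
except ImportError:
    # Fallback so the sketch still runs where typing_extensions is absent.
    # It is a no-op and gives no static-analysis marking.
    def deprecated(message, *, category=DeprecationWarning, stacklevel=1):
        def decorator(func):
            return func
        return decorator


class Vault:  # hypothetical stand-in, not the SDK's Vault controller
    def get(self, request):
        return {"data": request}

    # category=None: the decorator only marks the method for type checkers
    # and IDEs; it does not raise a second warning at runtime.
    @deprecated("legacy_get() is deprecated; use get() instead.", category=None)
    def legacy_get(self, request):
        warnings.warn(
            "legacy_get() is deprecated; use get() instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.get(request)
```

With `stacklevel=2` the warning points at the line that called `legacy_get`, which is usually what a user needs in order to find the call to migrate.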
| 71 | + |
| 72 | +## Build and Test |
| 73 | + |
| 74 | +```bash |
| 75 | +# Install dependencies |
| 76 | +pip install -r requirements.txt |
| 77 | +pip install ".[dev]" # includes ruff and codespell |
| 78 | + |
| 79 | +# Lint |
| 80 | +ruff check . --output-format=github |
| 81 | + |
| 82 | +# Spell check |
| 83 | +codespell |
| 84 | + |
| 85 | +# Run all tests with coverage |
| 86 | +python -m coverage run --source=skyflow \ |
| 87 | + --omit='skyflow/generated/*,skyflow/utils/validations/*,skyflow/vault/data/*,skyflow/vault/detect/*,skyflow/vault/tokens/*,skyflow/vault/connection/*,skyflow/error/*,skyflow/utils/enums/*,skyflow/vault/controller/_audit.py,skyflow/vault/controller/_bin_look_up.py' \ |
| 88 | + -m unittest discover |
| 89 | + |
| 90 | +# Coverage report |
| 91 | +python -m coverage report --show-missing |
| 92 | + |
| 93 | +# Run a single test |
| 94 | +python -m unittest tests.vault.controller.test__vault.TestVault.test_insert |
| 95 | + |
| 96 | +# Build package |
| 97 | +python setup.py sdist bdist_wheel |
| 98 | +``` |
| 99 | + |
| 100 | +**Commit message format:** All commits must include a Jira ticket ID, e.g. `SK-123: description`. CI enforces this on PRs. |
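
A local pre-check for the commit message format might look like the sketch below. The regular expression is an assumption based only on the `SK-123: description` example; the pattern CI actually enforces may differ. The `has_ticket_id` helper is hypothetical.

```python
import re

# Assumed shape: Jira project key, hyphen, digits, colon, space, description.
COMMIT_PATTERN = re.compile(r"^[A-Z][A-Z0-9]*-\d+: \S.*")


def has_ticket_id(message: str) -> bool:
    """Return True if the first line of the message starts with a ticket ID."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return COMMIT_PATTERN.match(first_line) is not None
```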
| 101 | + |
| 102 | +## Credentials Format |
| 103 | + |
| 104 | +The SDK accepts credentials as a dict with one of the following key patterns: |
| 105 | + |
| 106 | +```python |
| 107 | +# Service account credentials string (JSON) |
| 108 | +credentials = {'credentials_string': '{"clientID":"...","tokenURI":"...","keyID":"...","privateKey":"..."}'} |
| 109 | + |
| 110 | +# Service account credentials file path |
| 111 | +credentials = {'path': 'credentials.json'} |
| 112 | + |
| 113 | +# API key |
| 114 | +credentials = {'api_key': '<YOUR_API_KEY>'} |
| 115 | + |
| 116 | +# Static bearer token |
| 117 | +credentials = {'token': '<BEARER_TOKEN>'} |
| 118 | +``` |
| 119 | + |
| 120 | +The canonical credential JSON field names are `clientID`, `tokenURI`, `keyID`, `privateKey`. These are accessed via `CredentialField` constants in `skyflow/utils/constants.py`. |
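
As a rough illustration of the shapes above, the sketch below classifies a credentials dict by its key and lists which canonical fields a credentials string lacks. Both helpers (`credential_kind`, `missing_credential_fields`) are hypothetical and are not the SDK's validators. The field names are hard-coded here rather than read from `CredentialField`, whose member names are not shown in this document.

```python
import json

# Key patterns and field names copied from the section above.
CREDENTIAL_KEYS = ("credentials_string", "path", "api_key", "token")
REQUIRED_FIELDS = ("clientID", "tokenURI", "keyID", "privateKey")


def credential_kind(credentials: dict) -> str:
    """Return the single credential key present, or raise ValueError."""
    present = [key for key in CREDENTIAL_KEYS if credentials.get(key)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one credential key, got {present}")
    return present[0]


def missing_credential_fields(credentials_string: str) -> list:
    """List canonical fields that are absent or empty in the JSON string."""
    parsed = json.loads(credentials_string)
    if not isinstance(parsed, dict):
        raise ValueError("credentials string must be a JSON object")
    return [name for name in REQUIRED_FIELDS if not parsed.get(name)]
```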
| 121 | + |
| 122 | +## Key Design Patterns |
| 123 | + |
| 124 | +### Controller method flow |
| 125 | +Every public controller method follows this exact sequence: |
| 126 | +1. `log_info(SkyflowMessages.Info.XXX_TRIGGERED.value, logger)` |
| 127 | +2. `validate_xxx_request(logger, request)` — raises `SkyflowError` on invalid input |
| 128 | +3. `self.__initialize()` — refreshes bearer token if expired |
| 129 | +4. Call generated API via `self.__vault_client.get_xxx_api()` |
| 130 | +5. Parse response into typed response object |
| 131 | +6. `log_info(SkyflowMessages.Info.XXX_SUCCESS.value, logger)` |
| 132 | +7. `except Exception as e: handle_exception(e, logger)` |
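
The seven steps can be sketched with stand-ins as shown below. Everything except the step order is an assumption: the fake client and API classes, the `refresh_token_if_expired` and `get_records_api` names, the plain-string log messages, the scope of the `try` block, and the behavior of `handle_exception` (assumed here to re-raise as `SkyflowError`).

```python
from dataclasses import dataclass, field


class SkyflowError(Exception):
    """Stand-in for the SDK's SkyflowError."""


events = []  # captured log messages, so the flow can be inspected


def log_info(message, logger=None):
    events.append(message)


def validate_insert_request(logger, request):
    if not request.get("table"):
        raise SkyflowError("table name is required")


def handle_exception(error, logger=None):
    # Assumption: the real helper converts failures into SkyflowError.
    raise SkyflowError(str(error)) from error


@dataclass
class InsertResponse:
    inserted_fields: list = field(default_factory=list)


class FakeRecordsApi:  # hypothetical generated API
    def insert(self, table, records):
        return [{"skyflow_id": f"id-{i}", **r} for i, r in enumerate(records)]


class FakeVaultClient:  # hypothetical VaultClient
    def __init__(self):
        self.refresh_calls = 0

    def refresh_token_if_expired(self):
        self.refresh_calls += 1

    def get_records_api(self):
        return FakeRecordsApi()


class Vault:
    def __init__(self, vault_client, logger=None):
        self.__vault_client = vault_client
        self.__logger = logger

    def __initialize(self):
        self.__vault_client.refresh_token_if_expired()

    def insert(self, request):
        try:
            log_info("INSERT_TRIGGERED", self.__logger)            # step 1
            validate_insert_request(self.__logger, request)        # step 2
            self.__initialize()                                    # step 3
            api = self.__vault_client.get_records_api()            # step 4
            raw = api.insert(request["table"], request["values"])
            response = InsertResponse(inserted_fields=raw)         # step 5
            log_info("INSERT_SUCCESS", self.__logger)              # step 6
            return response
        except Exception as e:                                     # step 7
            handle_exception(e, self.__logger)
```

Validating before the token refresh means invalid input fails fast without a network round trip, which is one reason to keep steps 2 and 3 in that order.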
| 133 | + |
| 134 | +### Validation pattern |
| 135 | +All validators in `_validations.py` follow the same two steps when input is invalid: |
| 136 | +1. `log_error_log(SkyflowMessages.ErrorLogs.XXX.value, logger)` — log before raising |
| 137 | +2. `raise SkyflowError(SkyflowMessages.Error.XXX.value, SkyflowMessages.ErrorCodes.INVALID_INPUT.value)` |
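
A minimal sketch of that log-then-raise pattern, using stand-in classes so it runs on its own. The enum members (`EMPTY_TABLE_NAME`), their message text, the `400` value for `INVALID_INPUT`, and the `validate_table_name` function are all assumptions for illustration. In the SDK these enums are nested under `SkyflowMessages`.

```python
from enum import Enum


class SkyflowError(Exception):  # stand-in for the SDK's SkyflowError
    def __init__(self, message, http_code):
        super().__init__(message)
        self.message = message
        self.http_code = http_code


class ErrorLogs(Enum):
    EMPTY_TABLE_NAME = "Invalid request. Table name is empty."  # assumed text


class Error(Enum):
    EMPTY_TABLE_NAME = "Table name must be a non-empty string."  # assumed text


class ErrorCodes(Enum):
    INVALID_INPUT = 400  # assumed value


logged = []  # captured error logs, so the order can be inspected


def log_error_log(message, logger=None):
    logged.append(message)


def validate_table_name(logger, table):
    if not isinstance(table, str) or not table.strip():
        # Step 1: log before raising, so the failure is recorded even if
        # the caller swallows the exception.
        log_error_log(ErrorLogs.EMPTY_TABLE_NAME.value, logger)
        # Step 2: raise with a message constant, never a hardcoded string.
        raise SkyflowError(
            Error.EMPTY_TABLE_NAME.value, ErrorCodes.INVALID_INPUT.value
        )
```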
| 138 | + |
| 139 | +### Credential resolution order |
| 140 | +`VaultClient` resolves credentials in this priority order: |
| 141 | +1. Config-level credentials (passed to `add_vault_config()`) |
| 142 | +2. Skyflow-level credentials (passed to `add_skyflow_credentials()`) |
| 143 | +3. `SKYFLOW_CREDENTIALS` environment variable |
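
The priority order amounts to a first-match lookup, sketched below. The `resolve_credentials` function is hypothetical, and treating the environment variable's value as a `credentials_string` is an assumption; `VaultClient` may interpret it differently.

```python
import os


def resolve_credentials(config_credentials=None, skyflow_credentials=None,
                        environ=None):
    """Return the first credentials source available, in priority order."""
    environ = os.environ if environ is None else environ
    if config_credentials:        # 1. passed to add_vault_config()
        return config_credentials
    if skyflow_credentials:       # 2. passed to add_skyflow_credentials()
        return skyflow_credentials
    env_value = environ.get("SKYFLOW_CREDENTIALS")
    if env_value:                 # 3. environment variable
        # Assumption: the variable holds a service account JSON string.
        return {"credentials_string": env_value}
    raise ValueError("no credentials found at any level")
```

Passing `environ` explicitly keeps the function easy to test without mutating the real process environment.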
| 144 | + |
| 145 | +## Known Pre-existing Coverage Exclusions |
| 146 | + |
| 147 | +These modules are excluded from coverage measurement — omissions here are not regressions: |
| 148 | + |
| 149 | +| Path | Reason | |
| 150 | +|---|---| |
| 151 | +| `skyflow/generated/*` | Fern-generated REST client | |
| 152 | +| `skyflow/utils/validations/*` | Validation-only, tested indirectly via controller tests | |
| 153 | +| `skyflow/vault/data/*` | Plain dataclasses, no logic | |
| 154 | +| `skyflow/vault/detect/*` | Detect request/response dataclasses | |
| 155 | +| `skyflow/vault/tokens/*` | Token request/response dataclasses | |
| 156 | +| `skyflow/vault/connection/*` | Connection request/response dataclasses | |
| 157 | +| `skyflow/error/*` | Error class, minimal logic | |
| 158 | +| `skyflow/utils/enums/*` | Enum definitions only | |
| 159 | +| `skyflow/vault/controller/_audit.py` | Audit controller, not yet in test suite | |
| 160 | +| `skyflow/vault/controller/_bin_look_up.py` | BIN lookup controller, not yet in test suite | |
| 161 | + |
| 162 | +## Slash Commands |
| 163 | + |
| 164 | +- `/code-review` — full review: SDK patterns + naming + test coverage + code smells + security |
| 165 | +- `/code-smell` — standalone structural smell analysis (long methods, dead code, misplaced validation) |
| 166 | +- `/code-security` — standalone security audit (credentials, input validation, path traversal, HTTP security) |
| 167 | +- `/sdk-sample <feature>` — generate a runnable sample file for a vault or service account feature |
| 168 | +- `/test [module.path]` — run quality pipeline (lint → spell check → tests → coverage report) |