18 commits
ad095e4
SK-1964: add sample for handling deidentify file response in async fo…
skyflow-himanshu Feb 19, 2026
3626913
SK-2777: Public Release - Update Fern client re-initialisation (#240)
saileshwar-skyflow Apr 29, 2026
e3e6b35
[AUTOMATED] Public Release - 2.0.1
saileshwar-skyflow Apr 29, 2026
3c53165
SK-2681: Add dict context support for Conditional Data Access (#241)
saileshwar-skyflow May 6, 2026
7ce51fb
[AUTOMATED] Public Release - 2.0.2
saileshwar-skyflow May 6, 2026
0490e6e
SK-2813: Merge branch 'main' into saileshwar/SK-2813-python-v2-code-c…
saileshwar-skyflow May 11, 2026
9fbbaa4
SK-2813: python sdk v2 code clean up
saileshwar-skyflow May 11, 2026
38527d1
SK-2813: add unit tests
saileshwar-skyflow May 11, 2026
01a2356
SK-2813: resolve pr comments and update claude.md file
saileshwar-skyflow May 14, 2026
170cbbc
SK-2813: fixed literal 400 in utils is now replaced with constant
saileshwar-skyflow May 14, 2026
7be6e4c
SK-2813: Remove .claude folder from git tracking
saileshwar-skyflow May 14, 2026
c27cc7e
SK-2813: revert get signed data tokens response to tuple
saileshwar-skyflow May 14, 2026
f2aeee1
SK-2813: fix unit tests
saileshwar-skyflow May 14, 2026
2bb244c
SK-2813: add unit tests
saileshwar-skyflow May 14, 2026
89d91ad
SK-2833: add deprecation warnings
saileshwar-skyflow May 18, 2026
ed4956a
SK-2833: update file upload request
saileshwar-skyflow May 18, 2026
0d08bba
SK-2838: update CHANGELOG.md with latest releases
saileshwar-skyflow May 19, 2026
005efed
SK-2838: add claude setup
saileshwar-skyflow May 19, 2026
168 changes: 168 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,168 @@
# CLAUDE.md

## Project Overview

This is the Skyflow Python SDK (`skyflow-python`). It provides a Python interface to the Skyflow Data Privacy Vault API — vault operations (insert, get, update, delete, query, tokenize, detokenize, upload_file), service account authentication (bearer tokens, signed data tokens), connections, and detect (deidentify/reidentify text and files).

**Current version:** 2.x

## Critical Boundary — Generated Code

**Never edit files under `skyflow/generated/`.**

These are auto-generated by [Fern](https://buildwithfern.com) from the Skyflow API definition. Manual edits are overwritten on the next generation run. If you find a bug in generated code, report it — do not patch it directly.

The `ruff.toml` and coverage omit lists already exclude `generated/` from all checks.

## Project Structure

```
skyflow/
  client/
    skyflow.py              # Skyflow facade + Builder (entry point)
  vault/
    client/
      client.py             # VaultClient — auth, token caching, generated client holder
    controller/
      _vault.py             # Vault: insert, get, update, delete, query, tokenize, detokenize, upload_file
      _connections.py       # Connection: invoke()
      _detect.py            # Detect: deidentify_text, reidentify_text, deidentify_file, get_detect_run
    data/                   # Request/Response objects: InsertRequest, GetResponse, FileUploadRequest, etc.
    tokens/                 # DetokenizeRequest/Response, TokenizeRequest/Response
    connection/             # InvokeConnectionRequest/Response
    detect/                 # DeidentifyTextRequest, ReidentifyTextRequest, DeidentifyFileRequest, etc.
  service_account/
    _utils.py               # generate_bearer_token, generate_bearer_token_from_creds, generate_signed_data_tokens
    auth/                   # AuthClient — JWT exchange at tokenURI
  utils/
    _skyflow_messages.py    # All error/log/info strings (Error, ErrorLogs, Info, ErrorCodes, HttpStatus enums)
    constants.py            # CredentialField, JWT, SdkMetricsKey, OptionField, and top-level constants
    _utils.py               # handle_exception(), get_metrics(), is_expired()
    validations/
      _validations.py       # ALL request and config validation — validate_*() functions
    enums/                  # LogLevel, Env, RedactionType, TokenMode, ContentType, DetectEntities, etc.
    logger.py               # log_info(), log_error_log(), Logger
  error/
    _skyflow_error.py       # SkyflowError(message, http_code, request_id, grpc_code, http_status, details)
  generated/                # ← FERN-GENERATED, DO NOT EDIT
    rest/                   # Raw HTTP client, API classes
tests/                      # unittest tests mirroring skyflow/ structure
samples/
  vault_api/                # Vault operation samples
  service_account/          # Bearer token / signed token samples
  connection/               # Connection samples
  detect_api/               # Detect samples
docs/
  migrate_to_v2.md          # v1 → v2 migration guide
  advanced_initialization.md
  auth_credentials.md
```

## Naming Conventions

- **Methods / variables / parameters:** `snake_case` — `vault_id`, `get_records`, `token_uri`
- **Classes / Exceptions / Enums:** `PascalCase` — `InsertRequest`, `SkyflowError`, `RedactionType`
- **Constants / module-level values:** `UPPER_SNAKE_CASE` — `SKY_META_DATA_HEADER`, `PROTOCOL`
- **Private methods / attributes:** `_snake_case` — `_validate_ctx`, `_cached_headers`
- **Acronyms are all-lowercase in identifiers:** `skyflow_id` (not `skyflow_ID`), `token_uri` (not `token_URI`), `api_key` (not `API_key`)
- **Response objects:** always use `snake_case` field names — `skyflow_id`, `inserted_fields`, `detokenized_fields`
- **Deprecated methods:** use `@deprecated` from `typing_extensions` so that type checkers and IDEs flag call sites (for example with strikethrough), plus `warnings.warn(msg, DeprecationWarning, stacklevel=2)` for a runtime warning
- **Error messages:** always use `SkyflowMessages` enum constants — never hardcode strings in controllers or validators
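
A minimal sketch of the deprecation convention above. `ExampleVault` and `get_records()` are hypothetical stand-ins, not SDK code. One detail worth knowing: by default `typing_extensions.deprecated` already emits a `DeprecationWarning` when the decorated function is called, so this sketch passes `category=None` to avoid a duplicate alongside the explicit `warnings.warn` call. The `ImportError` fallback exists only so the sketch runs without `typing_extensions` installed.

```python
import warnings

try:
    from typing_extensions import deprecated
except ImportError:
    # Fallback no-op decorator so the sketch runs without typing_extensions.
    def deprecated(message, *, category=None, stacklevel=1):
        def decorator(func):
            return func
        return decorator


class ExampleVault:
    """Hypothetical controller used only to illustrate the convention."""

    def get(self, request):
        return {"data": request}

    # category=None keeps the static marker but disables the decorator's own
    # runtime warning, so callers see exactly one DeprecationWarning.
    @deprecated("get_records() is deprecated; use get()", category=None)
    def get_records(self, request):
        warnings.warn(
            "get_records() is deprecated; use get()",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.get(request)
```

If the explicit `warnings.warn` call is dropped instead, leaving `category` at its default gives the same single runtime warning.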

## Build and Test

```bash
# Install dependencies
pip install -r requirements.txt
pip install ".[dev]" # includes ruff and codespell

# Lint
ruff check . --output-format=github

# Spell check
codespell

# Run all tests with coverage
python -m coverage run --source=skyflow \
--omit=skyflow/generated/*,skyflow/utils/validations/*,skyflow/vault/data/*,skyflow/vault/detect/*,skyflow/vault/tokens/*,skyflow/vault/connection/*,skyflow/error/*,skyflow/utils/enums/*,skyflow/vault/controller/_audit.py,skyflow/vault/controller/_bin_look_up.py \
-m unittest discover

# Coverage report
python -m coverage report --show-missing

# Run a single test
python -m unittest tests.vault.controller.test__vault.TestVault.test_insert

# Build package
python setup.py sdist bdist_wheel
```

**Commit message format:** All commits must include a Jira ticket ID, e.g. `SK-123: description`. CI enforces this on PRs.

## Credentials Format

The SDK accepts credentials as a dict with one of the following key patterns:

```python
# Service account credentials string (JSON)
credentials = {'credentials_string': '{"clientID":"...","tokenURI":"...","keyID":"...","privateKey":"..."}'}

# Service account credentials file path
credentials = {'path': 'credentials.json'}

# API key
credentials = {'api_key': '<YOUR_API_KEY>'}

# Static bearer token
credentials = {'token': '<BEARER_TOKEN>'}
```

The canonical credential JSON field names are `clientID`, `tokenURI`, `keyID`, `privateKey`. These are accessed via `CredentialField` constants in `skyflow/utils/constants.py`.

## Key Design Patterns

### Controller method flow
Every public controller method follows this exact sequence:
1. `log_info(SkyflowMessages.Info.XXX_TRIGGERED.value, logger)`
2. `validate_xxx_request(logger, request)` — raises `SkyflowError` on invalid input
3. `self.__initialize()` — refreshes bearer token if expired
4. Call generated API via `self.__vault_client.get_xxx_api()`
5. Parse response into typed response object
6. `log_info(SkyflowMessages.Info.XXX_SUCCESS.value, logger)`
7. `except Exception as e: handle_exception(e, logger)`
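
The seven steps can be sketched as follows. Every name here (`ExampleVault`, the simplified `SkyflowError`, the helper bodies, the message strings) is a hypothetical stand-in so the block runs on its own; the real controllers use the `SkyflowMessages` constants and the generated API client. Where the `try` block starts is also an assumption.

```python
import logging

logger = logging.getLogger("skyflow-sketch")


class SkyflowError(Exception):
    """Simplified stand-in for the SDK error class."""

    def __init__(self, message, http_code=400):
        super().__init__(message)
        self.message = message
        self.http_code = http_code


def log_info(message, logger):
    logger.info(message)


def validate_insert_request(logger, request):
    # Stand-in validator: raises SkyflowError on invalid input.
    if not request.get("table"):
        raise SkyflowError("Invalid table name", 400)


def handle_exception(error, logger):
    # Stand-in: re-raise SDK errors, wrap everything else with a fixed message.
    if isinstance(error, SkyflowError):
        raise error
    raise SkyflowError("Request failed", 500) from error


class ExampleVault:
    def __init__(self, api_call):
        self._api_call = api_call  # stand-in for the generated API client
        self._token_ready = False

    def _initialize(self):
        # Stand-in for the bearer-token refresh.
        self._token_ready = True

    def insert(self, request):
        log_info("Insert triggered", logger)                     # step 1
        validate_insert_request(logger, request)                 # step 2
        self._initialize()                                       # step 3
        try:
            raw = self._api_call(request)                        # step 4
            response = {"inserted_fields": raw, "errors": None}  # step 5
            log_info("Insert success", logger)                   # step 6
            return response
        except Exception as e:                                   # step 7
            handle_exception(e, logger)
```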

### Validation pattern
All validators in `_validations.py` follow:
1. `log_error_log(SkyflowMessages.ErrorLogs.XXX.value, logger)` — log before raising
2. `raise SkyflowError(SkyflowMessages.Error.XXX.value, SkyflowMessages.ErrorCodes.INVALID_INPUT.value)`
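
A sketch of one validator in this shape, assuming stand-in definitions for `SkyflowError`, `log_error_log`, and the error code. `validate_delete_request`, the message strings, and the `400` value are illustrative assumptions; real validators take their strings and codes from `SkyflowMessages`.

```python
import logging

logger = logging.getLogger("skyflow-sketch")

INVALID_INPUT = 400  # assumed value of SkyflowMessages.ErrorCodes.INVALID_INPUT


class SkyflowError(Exception):
    """Simplified stand-in for the SDK error class."""

    def __init__(self, message, http_code):
        super().__init__(message)
        self.message = message
        self.http_code = http_code


def log_error_log(message, logger):
    logger.error(message)


def validate_delete_request(logger, request):
    """Hypothetical validator: log first, then raise."""
    ids = request.get("ids")
    if not isinstance(ids, list) or not ids:
        log_error_log("Delete request rejected: ids must be a non-empty list", logger)
        raise SkyflowError("Invalid ids: expected a non-empty list", INVALID_INPUT)
```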

### Credential resolution order
`VaultClient` resolves credentials in this priority order:
1. Config-level credentials (passed to `add_vault_config()`)
2. Skyflow-level credentials (passed to `add_skyflow_credentials()`)
3. `SKYFLOW_CREDENTIALS` environment variable
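
The priority order can be sketched as a single fall-through function. `resolve_credentials` is a hypothetical helper, not the SDK's implementation, and treating the environment variable's value as a credentials JSON string is an assumption.

```python
import os


def resolve_credentials(config_credentials=None, skyflow_credentials=None, env=None):
    """Return the first credentials source available, in priority order."""
    env = os.environ if env is None else env

    if config_credentials:        # 1. config-level
        return config_credentials
    if skyflow_credentials:       # 2. Skyflow-level
        return skyflow_credentials

    env_value = env.get("SKYFLOW_CREDENTIALS")
    if env_value:                 # 3. environment variable
        # Assumption: the variable holds a credentials JSON string.
        return {"credentials_string": env_value}

    raise ValueError("No credentials found at any level")
```

Passing `env` explicitly keeps the function easy to test without mutating `os.environ`.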

## Known Pre-existing Coverage Exclusions

These modules are excluded from coverage measurement — omissions here are not regressions:

| Path | Reason |
|---|---|
| `skyflow/generated/*` | Fern-generated REST client |
| `skyflow/utils/validations/*` | Validation-only, tested indirectly via controller tests |
| `skyflow/vault/data/*` | Plain dataclasses, no logic |
| `skyflow/vault/detect/*` | Detect request/response dataclasses |
| `skyflow/vault/tokens/*` | Token request/response dataclasses |
| `skyflow/vault/connection/*` | Connection request/response dataclasses |
| `skyflow/error/*` | Error class, minimal logic |
| `skyflow/utils/enums/*` | Enum definitions only |
| `skyflow/vault/controller/_audit.py` | Audit controller, not yet in test suite |
| `skyflow/vault/controller/_bin_look_up.py` | BIN lookup controller, not yet in test suite |

## Slash Commands

- `/code-review` — full review: SDK patterns + naming + test coverage + code smells + security
- `/code-smell` — standalone structural smell analysis (long methods, dead code, misplaced validation)
- `/code-security` — standalone security audit (credentials, input validation, path traversal, HTTP security)
- `/sdk-sample <feature>` — generate a runnable sample file for a vault or service account feature
- `/test [module.path]` — run quality pipeline (lint → spell check → tests → coverage report)
136 changes: 136 additions & 0 deletions .claude/commands/code-review.md
@@ -0,0 +1,136 @@
---
name: code-review
description: Full code review — SDK patterns, naming, test coverage, code smells, and security. Reads code-smell.md and code-security.md inline.
paths:
- skyflow/**/*.py
- tests/**/*.py
---

You are a senior engineer performing a thorough code review on the Skyflow Python SDK.

## Review Mode

Use `$ARGUMENTS` to determine scope:
- `full review` — scan all files under `skyflow/` recursively (exclude `skyflow/generated/`)
- A file or directory path — review only that path
- Empty / default — review files changed on current branch vs `main`:
```bash
git diff main...HEAD --name-only | grep '\.py$' | grep -v 'generated'
```

**Skip entirely:** `skyflow/generated/` — Fern-generated REST client, read-only.

---

## 1. Request / Response patterns

- Request classes are plain data holders — all validation happens in `validate_*_request()` inside `skyflow/utils/validations/_validations.py`, not in `__init__`. Flag if validation logic is duplicated outside `_validations.py`.
- Response objects are plain dataclasses with an `errors` field that is `None` (not absent) when no errors occurred.
- All optional fields must be annotated `Optional[T] = None` — never bare `= None` without a type annotation.
- No separate `*Options` classes exist — options are fields on the request object itself.
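
As an illustration of these rules, a request/response pair might look like the sketch below. `ExampleInsertRequest` and `ExampleInsertResponse` and their fields are hypothetical; the real classes live under `skyflow/vault/data/` and may be shaped differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ExampleInsertRequest:
    # Plain data holder: no validation here, options are ordinary fields.
    table: str
    values: List[Dict[str, Any]]
    return_tokens: bool = False
    upsert: Optional[str] = None  # optional field: annotated, defaults to None


@dataclass
class ExampleInsertResponse:
    inserted_fields: Optional[List[Dict[str, Any]]] = None
    # Always present; None (not absent) when no errors occurred.
    errors: Optional[List[Dict[str, Any]]] = None
```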

---

## 2. Error handling

- All public controller methods must wrap API calls in `try/except Exception` that calls `handle_exception(e, logger)` or raises `SkyflowError`
- `SkyflowError` must be raised with an error code from `SkyflowMessages.ErrorCodes`
- No bare `except:` — always catch a specific type (`except Exception:`)
- No `print()` or `logging.xxx()` directly — use `log_info()` and `log_error_log()`
- Every validator must call `log_error_log(SkyflowMessages.ErrorLogs.XXX.value, logger)` before raising `SkyflowError`

---

## 3. Naming conventions

| Identifier | Style | Example |
|---|---|---|
| Variable / parameter / method | `snake_case` | `vault_id`, `get_records` |
| Constant / module-level value | `UPPER_SNAKE_CASE` | `SKY_META_DATA_HEADER` |
| Class / Exception / Enum | `PascalCase` | `InsertRequest`, `SkyflowError` |
| Private method / attribute | `_snake_case` | `_validate_ctx` |
| Source file | `snake_case.py` | `_file_upload_request.py` |

- Acronyms are all-lowercase in snake_case: `skyflow_id` not `skyflow_ID`, `token_uri` not `token_URI`
- Deprecated methods must use `@deprecated` from `typing_extensions` (static warning in type checkers and IDEs) plus a `warnings.warn(msg, DeprecationWarning, stacklevel=2)` call at runtime

---

## 4. Response field normalisation

- All response objects must use `snake_case` field names (`skyflow_id`, not `skyflowId`)
- `errors` must be present on every response class, defaulting to `None`

---

## 5. Test coverage

- Every public method must have at least one positive and one negative test
- Tests must use `assertEqual` / `assertIsNone` / `assertRaises` — not just bare `assert`
- No mocking of the class under test
- Use `unittest.mock.patch` / `MagicMock` for external dependencies (HTTP, file I/O)
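
A small sketch of a test class that meets these rules: one positive and one negative test, `unittest` assertions, and a `MagicMock` for the external dependency only. `RecordFetcher` is a hypothetical class under test, not part of the SDK.

```python
import unittest
from unittest.mock import MagicMock


class RecordFetcher:
    """Hypothetical class under test; `http` is its external dependency."""

    def __init__(self, http):
        self._http = http

    def fetch(self, skyflow_id):
        if not skyflow_id:
            raise ValueError("skyflow_id is required")
        return self._http.get(f"/records/{skyflow_id}")


class TestRecordFetcher(unittest.TestCase):
    def setUp(self):
        self.http = MagicMock()                  # mock the dependency...
        self.fetcher = RecordFetcher(self.http)  # ...never the class under test

    def test_fetch_success(self):
        self.http.get.return_value = {"skyflow_id": "id-1"}
        self.assertEqual(self.fetcher.fetch("id-1"), {"skyflow_id": "id-1"})
        self.http.get.assert_called_once_with("/records/id-1")

    def test_fetch_rejects_empty_id(self):
        with self.assertRaises(ValueError):
            self.fetcher.fetch("")
        self.http.get.assert_not_called()
```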

---

## 6. Code quality

- No magic strings for API field names — use `CredentialField`, `OptionField`, or `SkyflowMessages` constants
- No duplicate validation logic across request classes — belongs in `_validations.py`
- No `# noqa` without a comment explaining why
- `warnings.warn(msg, DeprecationWarning, stacklevel=2)` must be used for deprecation; never `print()` to stderr

---

## 7. Code smells

Code smells are structural signals — report at **Smell** severity.

### Method & class size
- **Long method** — any method over 50 lines. Candidate for decomposition.
- **Large parameter list** — more than 5 parameters. Consider a request object.

### Responsibility violations
- **Business logic in Request/Response classes** — these are data holders. Flag any conditional logic beyond simple attribute assignment.
- **Validation outside `_validations.py`** — any `if x is None: raise SkyflowError(...)` outside `skyflow/utils/validations/` is misplaced.

### Control flow
- **Deep nesting** — more than 3 levels of `if`/`for`/`try`. Extract inner blocks to named helpers or use early returns.
- **Long if-else chains** — more than 4 branches. Consider a dispatch dict.

### Data
- **Magic numbers** — literal integers used in comparisons or sizes without a named constant.
- **Mutable default arguments** — `def f(x=[])` or `def f(x={})`. Replace with `None` and initialise in the body.
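
To make the mutable-default smell concrete, the sketch below contrasts the problematic form with the recommended one. Both function names are illustrative.

```python
from typing import List, Optional


def collect_ids_buggy(new_id: str, ids: List[str] = []) -> List[str]:
    # The default list is created once, at definition time, and is shared
    # by every call that omits `ids`.
    ids.append(new_id)
    return ids


def collect_ids(new_id: str, ids: Optional[List[str]] = None) -> List[str]:
    # A fresh list is created on each call that omits `ids`.
    if ids is None:
        ids = []
    ids.append(new_id)
    return ids
```

Calling `collect_ids_buggy` twice without `ids` returns the same list object both times, so the second result also contains the first call's value.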

### Dead code
- **Unused private methods** or **unused imports** — run `ruff check --select=F401`.
- **Commented-out code** — remove or add a `# TODO: [ticket]` reference.

---

## Output Format

Group findings by file:

```
### skyflow/path/to/file.py

| Severity | Line | Finding |
|------------|------|------------------------------------------------------------|
| Critical | 42 | SkyflowError swallowed in except block |
| Bug | 87 | skyflow_id not set on response object |
| Quality | 103 | Magic string "records" — use OptionField constant |
| Smell | 210 | Method is 65 lines — candidate for decomposition |
```

**Severities:**
| Level | Meaning |
|---|---|
| **Critical** | Data loss, silent failure, security risk — must fix before merge |
| **Bug** | Wrong behaviour, incorrect output — must fix before merge |
| **Edge Case** | Unhandled input that will cause runtime failure — fix before merge |
| **Quality** | Maintainability issue, naming violation, missing pattern — fix before merge |
| **Smell** | Structural signal, technical debt — flag and track |

End with:
1. A tech-debt summary table grouped by category (Error handling / Naming / Smells / Tests)
2. A verdict: `APPROVE` / `APPROVE WITH FIXES` / `REQUEST CHANGES`
80 changes: 80 additions & 0 deletions .claude/commands/code-security.md
@@ -0,0 +1,80 @@
---
name: code-security
description: Security audit — credential exposure, input validation, path traversal, HTTP security, token lifecycle, dependency CVEs.
paths:
- skyflow/service_account/**/*.py
- skyflow/vault/client/**/*.py
- skyflow/vault/controller/**/*.py
- skyflow/utils/**/*.py
- requirements.txt
---

You are a security engineer auditing the Skyflow Python SDK for vulnerabilities.

## Audit Scope

Use `$ARGUMENTS` to determine target files. If none provided, run:
```bash
git diff main...HEAD --name-only | grep '\.py$' | grep -v 'generated'
```

**Skip:** `skyflow/generated/` — observations only, no edits.

## Security Checks

### 1. Credential and token exposure (Critical)
- Bearer tokens, API keys, and private keys must never appear in log messages (`log_info`, `log_error_log`), `SkyflowError` message strings, or `__repr__` / `__str__` output
- Credential values read via `CredentialField` constants (`privateKey`, `clientID`, `tokenURI`) must not be serialised to logs
- JWT claims must not be logged
- `except` blocks must not log `str(e)` or `repr(e)` when the exception may contain auth headers or credential fields

### 2. Input validation (High)
- All user-supplied strings passed to `open()`, `os.path.exists()`, or `os.path.join()` must be validated for path traversal (`../`, `..\\`, null bytes `\x00`)
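- Regex patterns from user input must be compiled inside `try`/`except re.error` so that an invalid pattern is rejected cleanly. Catching `re.error` does not prevent ReDoS; that needs a separate mitigation, such as limiting pattern and input length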
- `base64.b64decode()` on external data must use `validate=True` and have a size check before decoding
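
A sketch of two of these checks using only the standard library. `resolve_inside`, `decode_base64`, and the `MAX_ENCODED_BYTES` limit are hypothetical helpers and values chosen for illustration, not SDK functions.

```python
import base64
import binascii
import os

MAX_ENCODED_BYTES = 1_000_000  # assumed limit; tune to the expected payload size


def resolve_inside(base_dir: str, user_path: str) -> str:
    """Resolve user_path under base_dir, rejecting traversal and null bytes."""
    if "\x00" in user_path:
        raise ValueError("path contains a null byte")
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    # After resolving symlinks and "..", the result must still sit under base.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes the allowed directory")
    return candidate


def decode_base64(data: str) -> bytes:
    """Decode external base64 with a size check and strict alphabet validation."""
    if len(data) > MAX_ENCODED_BYTES:
        raise ValueError("encoded payload too large")
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
```

Comparing resolved paths, rather than scanning for `../` substrings, also covers absolute paths and symlinks that point outside the base directory.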

### 3. File handling (High)
- All `open()` calls must use a context manager (`with open(...) as f:`) — bare `open()` leaks handles on exception paths
- User-supplied directory paths must be validated with `os.path.isdir()` before use — never call `os.makedirs()` on arbitrary user input
- Output file paths must be constructed with `os.path.join(validated_base, sanitised_name)` — never string concatenation with unsanitised components
- Temporary files must use `tempfile.NamedTemporaryFile` or `tempfile.mkstemp()`, never `"/tmp/" + user_value`

### 4. HTTP security (Medium)
- All API calls must use HTTPS — flag any hardcoded `http://` URL or URL assembled without a scheme check
- SSL verification must never be disabled (`verify=False` in `httpx` or `requests` calls)
- Auth headers (`Authorization`, `X-Skyflow-Authorization`) must not be logged at any level
- HTTP clients must be configured with connection and read timeouts — flag absent `timeout=` parameters
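
The scheme check in the first bullet could be sketched as below. `require_https` is a hypothetical helper; where the SDK assembles its URLs is not shown in this document.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return url unchanged if it is an absolute https:// URL, else raise."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so "HTTPS://..." is accepted as well.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError("URL must be an absolute https:// URL")
    return url
```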

### 5. Error information leakage (Medium)
- `SkyflowError` messages must not include raw server response bodies, internal file system paths, or Python tracebacks
- `handle_exception()` must strip sensitive server details before wrapping in `SkyflowError`
- `except` blocks must log only controlled message strings from `SkyflowMessages.ErrorLogs` — never `str(e)` from exceptions that may contain credentials

### 6. Dependency vulnerabilities (Low)
- Run `pip-audit` against `requirements.txt` and report HIGH / CRITICAL CVEs:
```bash
pip install pip-audit && pip-audit -r requirements.txt
```
- Flag unpinned dependencies on security-sensitive packages (`cryptography`, `PyJWT`, `httpx`, `requests`) — prefer `~=` or exact pins over open `>=`

### 7. Authentication lifecycle (Medium)
- Bearer tokens fetched via `generate_bearer_token()` must be checked with `is_expired()` immediately before each API call
- `is_expired()` must decode without signature verification only for expiry checking — it must not bypass actual auth decisions
- JWT signing must use `RS256` — flag any path where the algorithm could be set to `HS256` with a user-supplied secret
- Service account credentials files must not be world-readable. Check that `os.stat(path).st_mode & 0o004` is zero: a mode such as `0o644` is world-readable, whereas `0o600` is not
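
A minimal sketch of the world-readable check on POSIX systems, using the `stat` module's named permission bit. `is_world_readable` is a hypothetical helper; permission bits behave differently on Windows.

```python
import os
import stat


def is_world_readable(path: str) -> bool:
    """True if the 'others' read bit is set on the file at path (POSIX)."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)  # S_IROTH == 0o004
```

Testing the bit, rather than comparing the whole mode to one value, also catches modes such as `0o664` or `0o604`.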

## Output Format

For each finding:

```
### skyflow/path/to/file.py : line N

**Severity:** Critical / High / Medium / Low / Info
**Risk:** What an attacker or misconfiguration could cause
**Trigger:** Input or code path that triggers the vulnerability
**Fix:** Concrete remediation
**CWE:** CWE-NNN
```

End with a summary table and overall risk rating (Critical / High / Medium / Low).