Commit 005efed

SK-2838: add claude setup
1 parent 0d08bba commit 005efed

8 files changed

Lines changed: 841 additions & 0 deletions

.claude/CLAUDE.md

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# CLAUDE.md

## Project Overview

This is the Skyflow Python SDK (`skyflow-python`). It provides a Python interface to the Skyflow Data Privacy Vault API: vault operations (insert, get, update, delete, query, tokenize, detokenize, upload_file), service account authentication (bearer tokens, signed data tokens), connections, and detect (deidentify/reidentify text and files).

**Current version:** 2.x

## Critical Boundary: Generated Code

**Never edit files under `skyflow/generated/`.**

These are auto-generated by [Fern](https://buildwithfern.com) from the Skyflow API definition. Manual edits are overwritten on the next generation run. If you find a bug in generated code, report it; do not patch it directly.

The `ruff.toml` and coverage omit lists already exclude `generated/` from all checks.

## Project Structure

```
skyflow/
  client/
    skyflow.py              # Skyflow facade + Builder (entry point)
  vault/
    client/
      client.py             # VaultClient: auth, token caching, generated client holder
    controller/
      _vault.py             # Vault: insert, get, update, delete, query, tokenize, detokenize, upload_file
      _connections.py       # Connection: invoke()
      _detect.py            # Detect: deidentify_text, reidentify_text, deidentify_file, get_detect_run
    data/                   # Request/Response objects: InsertRequest, GetResponse, FileUploadRequest, etc.
    tokens/                 # DetokenizeRequest/Response, TokenizeRequest/Response
    connection/             # InvokeConnectionRequest/Response
    detect/                 # DeidentifyTextRequest, ReidentifyTextRequest, DeidentifyFileRequest, etc.
  service_account/
    _utils.py               # generate_bearer_token, generate_bearer_token_from_creds, generate_signed_data_tokens
    auth/                   # AuthClient: JWT exchange at tokenURI
  utils/
    _skyflow_messages.py    # All error/log/info strings (Error, ErrorLogs, Info, ErrorCodes, HttpStatus enums)
    constants.py            # CredentialField, JWT, SdkMetricsKey, OptionField, and top-level constants
    _utils.py               # handle_exception(), get_metrics(), is_expired()
    validations/
      _validations.py       # ALL request and config validation: validate_*() functions
    enums/                  # LogLevel, Env, RedactionType, TokenMode, ContentType, DetectEntities, etc.
    logger.py               # log_info(), log_error_log(), Logger
  error/
    _skyflow_error.py       # SkyflowError(message, http_code, request_id, grpc_code, http_status, details)
  generated/                # ← FERN-GENERATED, DO NOT EDIT
    rest/                   # Raw HTTP client, API classes
tests/                      # unittest tests mirroring skyflow/ structure
samples/
  vault_api/                # Vault operation samples
  service_account/          # Bearer token / signed token samples
  connection/               # Connection samples
  detect_api/               # Detect samples
docs/
  migrate_to_v2.md          # v1 → v2 migration guide
  advanced_initialization.md
  auth_credentials.md
```

## Naming Conventions

- **Methods / variables / parameters:** `snake_case`, e.g. `vault_id`, `get_records`, `token_uri`
- **Classes / Exceptions / Enums:** `PascalCase`, e.g. `InsertRequest`, `SkyflowError`, `RedactionType`
- **Constants / module-level values:** `UPPER_SNAKE_CASE`, e.g. `SKY_META_DATA_HEADER`, `PROTOCOL`
- **Private methods / attributes:** `_snake_case`, e.g. `_validate_ctx`, `_cached_headers`
- **Acronyms are all-lowercase in identifiers:** `skyflow_id` (not `skyflow_ID`), `token_uri` (not `token_URI`), `api_key` (not `API_key`)
- **Response objects:** always use `snake_case` field names, e.g. `skyflow_id`, `inserted_fields`, `detokenized_fields`
- **Deprecated methods:** use `@deprecated` from `typing_extensions` so that IDEs and type checkers show a strikethrough, plus `warnings.warn(msg, DeprecationWarning, stacklevel=2)` for runtime console output
- **Error messages:** always use `SkyflowMessages` enum constants; never hardcode strings in controllers or validators

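The deprecation convention in the list above could look roughly like the sketch below. `get_records_legacy` and `get_records` are hypothetical names, and the fallback decorator exists only so the sketch runs where `typing_extensions` is missing or too old. Passing `category=None` to `@deprecated` is a choice made for this sketch: it stops the decorator from emitting its own runtime warning, so the explicit `warnings.warn` call is the only runtime signal. Whether the SDK does the same is not stated here.

```python
import warnings

try:
    from typing_extensions import deprecated
except ImportError:
    # Fallback (assumption): a no-op decorator so the sketch still runs.
    def deprecated(message, *, category=None, stacklevel=1):
        def decorator(func):
            return func
        return decorator


def get_records(vault_id):
    """Hypothetical replacement method."""
    return {"vault_id": vault_id, "records": []}


# category=None: the decorator only marks the function for IDEs and type
# checkers; the runtime warning comes from warnings.warn below.
@deprecated("Use get_records() instead.", category=None)
def get_records_legacy(vault_id):
    """Hypothetical deprecated method."""
    warnings.warn(
        "get_records_legacy() is deprecated; use get_records() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return get_records(vault_id)
```

`stacklevel=2` makes the warning point at the code that called the deprecated method rather than at the method body.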
## Build and Test

```bash
# Install dependencies
pip install -r requirements.txt
pip install ".[dev]"  # includes ruff and codespell

# Lint
ruff check . --output-format=github

# Spell check
codespell

# Run all tests with coverage
python -m coverage run --source=skyflow \
  --omit=skyflow/generated/*,skyflow/utils/validations/*,skyflow/vault/data/*,skyflow/vault/detect/*,skyflow/vault/tokens/*,skyflow/vault/connection/*,skyflow/error/*,skyflow/utils/enums/*,skyflow/vault/controller/_audit.py,skyflow/vault/controller/_bin_look_up.py \
  -m unittest discover

# Coverage report
python -m coverage report --show-missing

# Run a single test
python -m unittest tests.vault.controller.test__vault.TestVault.test_insert

# Build package
python setup.py sdist bdist_wheel
```

**Commit message format:** All commits must include a Jira ticket ID, e.g. `SK-123: description`. CI enforces this on PRs.

## Credentials Format

The SDK accepts credentials as a dict with one of the following key patterns:

```python
# Service account credentials string (JSON)
credentials = {'credentials_string': '{"clientID":"...","tokenURI":"...","keyID":"...","privateKey":"..."}'}

# Service account credentials file path
credentials = {'path': 'credentials.json'}

# API key
credentials = {'api_key': '<YOUR_API_KEY>'}

# Static bearer token
credentials = {'token': '<BEARER_TOKEN>'}
```

The canonical credential JSON field names are `clientID`, `tokenURI`, `keyID`, `privateKey`. These are accessed via `CredentialField` constants in `skyflow/utils/constants.py`.

## Key Design Patterns

### Controller method flow
Every public controller method follows this exact sequence:
1. `log_info(SkyflowMessages.Info.XXX_TRIGGERED.value, logger)`
2. `validate_xxx_request(logger, request)`, which raises `SkyflowError` on invalid input
3. `self.__initialize()`, which refreshes the bearer token if it has expired
4. Call the generated API via `self.__vault_client.get_xxx_api()`
5. Parse the response into a typed response object
6. `log_info(SkyflowMessages.Info.XXX_SUCCESS.value, logger)`
7. On any exception: `except Exception as e: handle_exception(e, logger)`

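The seven steps above can be sketched with stand-in helpers. Everything in this block is hypothetical scaffolding rather than SDK code: the stub `log_info`, `validate_get_request`, `handle_exception`, `FakeRecordsApi`, and the plain string message names. `_initialize` stands in for the private `__initialize()` method, and where the `try` block begins is an assumption of the sketch.

```python
class SkyflowError(Exception):
    """Stand-in for the SDK's SkyflowError."""


LOG = []  # stand-in log sink so the call order can be inspected


def log_info(message, logger=None):
    LOG.append(message)


def validate_get_request(logger, request):
    if not request.get("table"):
        raise SkyflowError("table is required")


def handle_exception(error, logger):
    # Stand-in: wraps any failure in SkyflowError.
    raise SkyflowError(str(error)) from error


class FakeRecordsApi:
    """Stand-in for a generated API class."""

    def get(self, table):
        return {"records": [{"skyflow_id": "id-1", "table": table}]}


class Vault:
    def __init__(self, records_api, logger=None):
        self._records_api = records_api
        self._logger = logger

    def _initialize(self):
        """Stand-in for the bearer-token refresh step."""

    def get(self, request):
        log_info("GET_TRIGGERED", self._logger)                  # step 1
        validate_get_request(self._logger, request)              # step 2
        self._initialize()                                       # step 3
        try:
            raw = self._records_api.get(request["table"])        # step 4
            response = {"data": raw["records"], "errors": None}  # step 5
            log_info("GET_SUCCESS", self._logger)                # step 6
            return response
        except Exception as e:                                   # step 7
            handle_exception(e, self._logger)
```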
### Validation pattern
On invalid input, every validator in `_validations.py` does the following, in order:
1. `log_error_log(SkyflowMessages.ErrorLogs.XXX.value, logger)` to log before raising
2. `raise SkyflowError(SkyflowMessages.Error.XXX.value, SkyflowMessages.ErrorCodes.INVALID_INPUT.value)`

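A minimal sketch of the log-then-raise pattern, using stand-in enums and helpers. The enum members, their message strings, the `400` value for `INVALID_INPUT`, and `validate_vault_id` are all assumptions made for illustration.

```python
from enum import Enum


class ErrorLogs(Enum):
    VAULT_ID_REQUIRED = "Invalid config. Vault ID is required."  # hypothetical


class Error(Enum):
    INVALID_VAULT_ID = "Vault ID must be a non-empty string."  # hypothetical


class ErrorCodes(Enum):
    INVALID_INPUT = 400  # assumed value


class SkyflowError(Exception):
    """Stand-in carrying a message and an error code."""

    def __init__(self, message, http_code):
        super().__init__(message)
        self.message = message
        self.http_code = http_code


logged = []  # stand-in for the SDK logger output


def log_error_log(message, logger=None):
    logged.append(message)


def validate_vault_id(logger, vault_id):
    if not isinstance(vault_id, str) or not vault_id.strip():
        # 1. log first, 2. then raise with a code from ErrorCodes
        log_error_log(ErrorLogs.VAULT_ID_REQUIRED.value, logger)
        raise SkyflowError(Error.INVALID_VAULT_ID.value,
                           ErrorCodes.INVALID_INPUT.value)
```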
### Credential resolution order
`VaultClient` resolves credentials in this priority order:
1. Config-level credentials (passed to `add_vault_config()`)
2. Skyflow-level credentials (passed to `add_skyflow_credentials()`)
3. `SKYFLOW_CREDENTIALS` environment variable

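The priority order above amounts to a first-match lookup. The function below is a hypothetical sketch, not `VaultClient` code; treating the environment variable as a credentials string and raising `ValueError` when nothing is found are both assumptions.

```python
import os


def resolve_credentials(config_credentials=None, skyflow_credentials=None, env=None):
    """Return the first available credentials, highest priority first."""
    env = os.environ if env is None else env
    if config_credentials:            # 1. config-level
        return config_credentials
    if skyflow_credentials:           # 2. Skyflow-level
        return skyflow_credentials
    env_value = env.get("SKYFLOW_CREDENTIALS")
    if env_value:                     # 3. environment variable
        return {"credentials_string": env_value}
    raise ValueError("no credentials configured")
```

Accepting `env` as a parameter keeps the lookup testable without touching the real process environment.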
## Known Pre-existing Coverage Exclusions

These modules are excluded from coverage measurement, so omissions here are not regressions:

| Path | Reason |
|---|---|
| `skyflow/generated/*` | Fern-generated REST client |
| `skyflow/utils/validations/*` | Validation-only, tested indirectly via controller tests |
| `skyflow/vault/data/*` | Plain dataclasses, no logic |
| `skyflow/vault/detect/*` | Detect request/response dataclasses |
| `skyflow/vault/tokens/*` | Token request/response dataclasses |
| `skyflow/vault/connection/*` | Connection request/response dataclasses |
| `skyflow/error/*` | Error class, minimal logic |
| `skyflow/utils/enums/*` | Enum definitions only |
| `skyflow/vault/controller/_audit.py` | Audit controller, not yet in test suite |
| `skyflow/vault/controller/_bin_look_up.py` | BIN lookup controller, not yet in test suite |

## Slash Commands

- `/code-review`: full review covering SDK patterns, naming, test coverage, code smells, and security
- `/code-smell`: standalone structural smell analysis (long methods, dead code, misplaced validation)
- `/code-security`: standalone security audit (credentials, input validation, path traversal, HTTP security)
- `/sdk-sample <feature>`: generate a runnable sample file for a vault or service account feature
- `/test [module.path]`: run quality pipeline (lint → spell check → tests → coverage report)

.claude/commands/code-review.md

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
---
name: code-review
description: Full code review covering SDK patterns, naming, test coverage, code smells, and security. Reads code-smell.md and code-security.md inline.
paths:
  - skyflow/**/*.py
  - tests/**/*.py
---

You are a senior engineer performing a thorough code review on the Skyflow Python SDK.

## Review Mode

Use `$ARGUMENTS` to determine scope:
- `full review`: scan all files under `skyflow/` recursively (exclude `skyflow/generated/`)
- A file or directory path: review only that path
- Empty / default: review files changed on the current branch vs `main`:
  ```bash
  git diff main...HEAD --name-only | grep '\.py$' | grep -v 'generated'
  ```

**Skip entirely:** `skyflow/generated/` (Fern-generated REST client, read-only).

---

## 1. Request / Response patterns

- Request classes are plain data holders. All validation happens in `validate_*_request()` inside `skyflow/utils/validations/_validations.py`, not in `__init__`. Flag if validation logic is duplicated outside `_validations.py`.
- Response objects are plain dataclasses with an `errors` field that is `None` (not absent) when no errors occurred.
- All optional fields must be annotated `Optional[T] = None`; never use a bare `= None` without a type annotation.
- No separate `*Options` classes exist. Options are fields on the request object itself.

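These rules can be illustrated with a pair of hypothetical classes. `ExampleGetRequest` and `ExampleGetResponse` are not SDK classes, and modelling the request as a dataclass is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ExampleGetRequest:
    """Plain data holder: no validation here, that belongs in _validations.py."""

    table: str
    ids: Optional[List[str]] = None
    # Options live on the request itself; there is no ExampleGetOptions class.
    return_tokens: Optional[bool] = None


@dataclass
class ExampleGetResponse:
    data: Optional[List[Dict[str, Any]]] = None
    # Always present; None (not absent) when no errors occurred.
    errors: Optional[List[Dict[str, Any]]] = None
```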
---

## 2. Error handling

- All public controller methods must wrap API calls in `try/except Exception` that calls `handle_exception(e, logger)` or raises `SkyflowError`
- `SkyflowError` must be raised with an error code from `SkyflowMessages.ErrorCodes`
- No bare `except:`; always name the exception type (at minimum `except Exception:`)
- No `print()` or `logging.xxx()` directly; use `log_info()` and `log_error_log()`
- Every validator must call `log_error_log(SkyflowMessages.ErrorLogs.xxx.value, logger)` before raising `SkyflowError`

---

## 3. Naming conventions

| Identifier | Style | Example |
|---|---|---|
| Variable / parameter / method | `snake_case` | `vault_id`, `get_records` |
| Constant / module-level value | `UPPER_SNAKE_CASE` | `SKY_META_DATA_HEADER` |
| Class / Exception / Enum | `PascalCase` | `InsertRequest`, `SkyflowError` |
| Private method / attribute | `_snake_case` | `_validate_ctx` |
| Source file | `snake_case.py` | `_file_upload_request.py` |

- Acronyms are all-lowercase in snake_case: `skyflow_id` not `skyflow_ID`, `token_uri` not `token_URI`
- Deprecated methods must use `@deprecated` from `typing_extensions` (static warning in IDEs and type checkers) plus a `warnings.warn(msg, DeprecationWarning, stacklevel=2)` call at runtime

---

## 4. Response field normalisation

- All response objects must use `snake_case` field names (`skyflow_id`, not `skyflowId`)
- `errors` must be present on every response class, defaulting to `None`

---

## 5. Test coverage

- Every public method must have at least one positive and one negative test
- Tests must use `assertEqual` / `assertIsNone` / `assertRaises`, not just bare `assert`
- No mocking of the class under test
- Use `unittest.mock.patch` / `MagicMock` for external dependencies (HTTP, file I/O)

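A test module following these rules might look like the sketch below. `RecordFetcher` is a hypothetical class under test, invented here so the example is self-contained; only its external dependency (`api`) is mocked, never the class itself.

```python
import unittest
from unittest.mock import MagicMock


class RecordFetcher:
    """Hypothetical class under test; `api` is its external dependency."""

    def __init__(self, api):
        self._api = api

    def fetch(self, table):
        if not table:
            raise ValueError("table is required")
        return self._api.get(table)["records"]


class TestRecordFetcher(unittest.TestCase):
    def setUp(self):
        self.api = MagicMock()  # mock the dependency, not RecordFetcher
        self.fetcher = RecordFetcher(self.api)

    def test_fetch_returns_records(self):  # positive case
        self.api.get.return_value = {"records": [{"skyflow_id": "id-1"}]}
        self.assertEqual(self.fetcher.fetch("users"), [{"skyflow_id": "id-1"}])
        self.api.get.assert_called_once_with("users")

    def test_fetch_rejects_empty_table(self):  # negative case
        with self.assertRaises(ValueError):
            self.fetcher.fetch("")
        self.api.get.assert_not_called()
```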
---

## 6. Code quality

- No magic strings for API field names; use `CredentialField`, `OptionField`, or `SkyflowMessages` constants
- No duplicate validation logic across request classes; it belongs in `_validations.py`
- No `# noqa` without a comment explaining why
- `warnings.warn(msg, DeprecationWarning, stacklevel=2)` must be used for deprecation; never `print()` to stderr

---

## 7. Code smells

Code smells are structural signals. Report them at **Smell** severity.

### Method & class size
- **Long method**: any method over 50 lines. Candidate for decomposition.
- **Large parameter list**: more than 5 parameters. Consider a request object.

### Responsibility violations
- **Business logic in Request/Response classes**: these are data holders. Flag any conditional logic beyond simple attribute assignment.
- **Validation outside `_validations.py`**: any `if x is None: raise SkyflowError(...)` outside `skyflow/utils/validations/` is misplaced.

### Control flow
- **Deep nesting**: more than 3 levels of `if`/`for`/`try`. Extract inner blocks to named helpers or use early returns.
- **Long if-else chains**: more than 4 branches. Consider a dispatch dict.

### Data
- **Magic numbers**: literal integers used in comparisons or sizes without a named constant.
- **Mutable default arguments**: `def f(x=[])` or `def f(x={})`. Replace with `None` and initialise in the body.

104+
### Dead code
105+
- **Unused private methods** or **unused imports** — run `ruff check --select=F401`.
106+
- **Commented-out code** — remove or add a `# TODO: [ticket]` reference.
107+
---

## Output Format

Group findings by file:

```
### skyflow/path/to/file.py

| Severity   | Line | Finding                                                    |
|------------|------|------------------------------------------------------------|
| Critical   | 42   | SkyflowError swallowed in except block                     |
| Bug        | 87   | skyflow_id not set on response object                      |
| Quality    | 103  | Magic string "records": use OptionField constant           |
| Smell      | 210  | Method is 65 lines, candidate for decomposition            |
```

**Severities:**

| Level | Meaning |
|---|---|
| **Critical** | Data loss, silent failure, security risk. Must fix before merge |
| **Bug** | Wrong behaviour, incorrect output. Must fix before merge |
| **Edge Case** | Unhandled input that will cause runtime failure. Fix before merge |
| **Quality** | Maintainability issue, naming violation, missing pattern. Fix before merge |
| **Smell** | Structural signal, technical debt. Flag and track |

End with:
1. A tech-debt summary table grouped by category (Error handling / Naming / Smells / Tests)
2. A verdict: `APPROVE` / `APPROVE WITH FIXES` / `REQUEST CHANGES`

.claude/commands/code-security.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
---
name: code-security
description: Security audit covering credential exposure, input validation, path traversal, HTTP security, token lifecycle, dependency CVEs.
paths:
  - skyflow/service_account/**/*.py
  - skyflow/vault/client/**/*.py
  - skyflow/vault/controller/**/*.py
  - skyflow/utils/**/*.py
  - requirements.txt
---

You are a security engineer auditing the Skyflow Python SDK for vulnerabilities.

## Audit Scope

Use `$ARGUMENTS` to determine target files. If none provided, run:
```bash
git diff main...HEAD --name-only | grep '\.py$' | grep -v 'generated'
```

**Skip:** `skyflow/generated/` (observations only, no edits).

## Security Checks

### 1. Credential and token exposure (Critical)
- Bearer tokens, API keys, and private keys must never appear in log messages (`log_info`, `log_error_log`), `SkyflowError` message strings, or `__repr__` / `__str__` output
- The credential values read via `CredentialField` (`privateKey`, `clientID`, `tokenURI`) must not be serialised to logs
- JWT claims must not be logged
- `except` blocks must not log `str(e)` or `repr(e)` when the exception may contain auth headers or credential fields

### 2. Input validation (High)
- All user-supplied strings passed to `open()`, `os.path.exists()`, or `os.path.join()` must be validated for path traversal (`../`, `..\\`, null bytes `\x00`)
- Regex patterns from user input must be compiled inside `try`/`except re.error` so that an invalid pattern cannot raise an unhandled `re.error`. Catching `re.error` does not mitigate ReDoS, so also flag user-controlled patterns that are matched against unbounded input
- `base64.b64decode()` on external data must use `validate=True` and have a size check before decoding

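Two of the checks above can be sketched with the standard library. The helper names and the `MAX_ENCODED_BYTES` limit are hypothetical. The path check resolves the candidate path and confirms it stays under an allowed base directory, which is one possible way to reject `../` sequences, not necessarily the SDK's.

```python
import base64
import binascii
import os

MAX_ENCODED_BYTES = 1_000_000  # hypothetical size limit


def is_safe_relative_path(user_path: str, base_dir: str) -> bool:
    """Return True only if user_path resolves to a location under base_dir."""
    if "\x00" in user_path:
        return False
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, user_path))
    return os.path.commonpath([base, candidate]) == base


def decode_base64(data: str) -> bytes:
    """Size-check first, then decode strictly."""
    if len(data) > MAX_ENCODED_BYTES:
        raise ValueError("encoded payload too large")
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(data, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 input") from exc
```

Resolving the joined path before comparing also covers absolute paths supplied by the user, because `os.path.join` discards the base when its second argument is absolute.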
### 3. File handling (High)
- All `open()` calls must use a context manager (`with open(...) as f:`); a bare `open()` leaks handles on exception paths
- User-supplied directory paths must be validated with `os.path.isdir()` before use; never call `os.makedirs()` on arbitrary user input
- Output file paths must be constructed with `os.path.join(validated_base, sanitised_name)`, never by string concatenation with unsanitised components
- Temporary files must use `tempfile.NamedTemporaryFile` or `tempfile.mkstemp()`, never `"/tmp/" + user_value`

### 4. HTTP security (Medium)
- All API calls must use HTTPS; flag any hardcoded `http://` URL or URL assembled without a scheme check
- SSL verification must never be disabled (`verify=False` in `httpx` or `requests` calls)
- Auth headers (`Authorization`, `X-Skyflow-Authorization`) must not be logged at any level
- HTTP clients must be configured with connection and read timeouts; flag absent `timeout=` parameters

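A scheme check of the kind the first bullet asks for could be as small as the sketch below. `require_https` is a hypothetical helper, not an SDK function.

```python
from urllib.parse import urlparse


def require_https(url: str) -> str:
    """Return url unchanged if it is an absolute https:// URL, else raise."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted too.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("only absolute https:// URLs are allowed")
    return url
```

Checking `netloc` as well as the scheme rejects scheme-less strings such as `example.com/v1`, which `urlparse` treats as a bare path.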
### 5. Error information leakage (Medium)
- `SkyflowError` messages must not include raw server response bodies, internal file system paths, or Python tracebacks
- `handle_exception()` must strip sensitive server details before wrapping in `SkyflowError`
- `except` blocks must log only controlled message strings from `SkyflowMessages.ErrorLogs`, never `str(e)` from exceptions that may contain credentials

### 6. Dependency vulnerabilities (Low)
- Run `pip-audit` against `requirements.txt` and report HIGH / CRITICAL CVEs:
  ```bash
  pip install pip-audit && pip-audit -r requirements.txt
  ```
- Flag unpinned dependencies on security-sensitive packages (`cryptography`, `PyJWT`, `httpx`, `requests`); prefer `~=` or exact pins over open `>=`

### 7. Authentication lifecycle (Medium)
- Bearer tokens fetched via `generate_bearer_token()` must be checked with `is_expired()` immediately before each API call
- `is_expired()` may decode the token without signature verification, but only to read its expiry; the result must never feed an authentication or authorisation decision
- JWT signing must use `RS256`; flag any path where the algorithm could be set to `HS256` with a user-supplied secret
- Service account credentials files must not be world-readable: check that `os.stat(path).st_mode & stat.S_IROTH` is `0` (a mode such as `0o644` fails this check)

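The file-permission check in the last bullet can be sketched as follows. Both helper names are hypothetical. The test looks at the "others read" bit rather than comparing against one exact mode, so it also catches modes such as `0o604` or `0o755`.

```python
import os
import stat


def mode_is_world_readable(mode: int) -> bool:
    """True if the 'others' read bit (0o004) is set in a st_mode value."""
    return bool(mode & stat.S_IROTH)


def is_world_readable(path: str) -> bool:
    """True if any user on the system can read the file at path."""
    return mode_is_world_readable(os.stat(path).st_mode)
```

On POSIX systems, a credentials file created with mode `0o600` passes this check, while the common default of `0o644` does not.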
## Output Format

For each finding:

```
### skyflow/path/to/file.py : line N

**Severity:** Critical / High / Medium / Low / Info
**Risk:** What an attacker or misconfiguration could cause
**Trigger:** Input or code path that triggers the vulnerability
**Fix:** Concrete remediation
**CWE:** CWE-NNN
```

End with a summary table and overall risk rating (Critical / High / Medium / Low).
