Commit c9f2f2a

Merge pull request #8 from khnumdev/copilot/simplify-config-iteration
Simplify config documentation and clarify default iteration behavior
2 parents f3a2e08 + 94e87e9 commit c9f2f2a

6 files changed

Lines changed: 54 additions & 53 deletions

README.md

Lines changed: 32 additions & 37 deletions
@@ -83,40 +83,35 @@ brew install python@3.12

 ## Configuration

-- Create a local `config.yaml` in your working directory. It is gitignored and not included in the repo.
-- Any CLI flag overrides values from `config.yaml`.
-- If neither config nor flags provide a value, the tool falls back to environment variables (for emulator detection) or sensible defaults.
+Create an optional `config.yaml` in your working directory to customize behavior. **By default, all commands iterate over all namespaces and all kinds** unless you specify filters.

-Example `config.yaml` (full example with comments):
+### Minimal Example

 ```yaml
-# Project / environment
-project_id: "my-project" # (string) GCP project id. If omitted, ADC or DATASTORE_PROJECT_ID env var will be used.
-emulator_host: "localhost:8010" # (string) Datastore emulator host (host:port). If set, the emulator path is used.
+# Optional: specify project and emulator
+project_id: "my-project"
+emulator_host: "localhost:8010"
+```

-# Explicit filters (empty -> iterate all)
-namespaces: [""] # (list) Namespaces to include. [""] means include default namespace and allow discovery of others.
-kinds: [] # (list) Kinds to include. Empty/omit means discover all kinds per namespace.
+### Common Options

-# Defaults used by some commands (optional)
-kind: "" # (string) Default kind used by analyze-fields when CLI --kind is not provided.
-namespace: "" # (string) Default namespace used when CLI --namespace is omitted.
+```yaml
+# Optional filters (omit to process all namespaces and kinds)
+namespaces: ["custom-ns"] # List specific namespaces, or omit to process all
+kinds: ["MyKind"] # List specific kinds, or omit to process all

 # Cleanup settings
-ttl_field: "expireAt" # (string) Property name that contains the TTL/expiry timestamp.
-delete_missing_ttl: true # (bool) If true, entities missing the TTL field will be deleted by cleanup.
-batch_size: 500 # (int) Number of keys to delete per batch when running cleanup (tunable).
+ttl_field: "expireAt" # Field name containing expiry timestamp
+batch_size: 500 # Delete batch size

 # Analysis settings
-group_by_field: null # (string|null) Field name to group analysis by (e.g., batchId). Null means no grouping.
-sample_size: 500 # (int) Max entities to sample per-kind/per-group to bound analysis work. Set 0 or null to disable sampling.
-enable_parallel: true # (bool) Enable multi-threaded processing for analysis and deletion. Set false to force single-threaded.
-
-# Logging
-log_level: "INFO" # (string) Logging level (DEBUG, INFO, WARNING, ERROR).
+sample_size: 500 # Max entities to sample per analysis (0 = no limit)
 ```

-The keys above map directly to CLI flags (CLI flags override values in `config.yaml`). Omit any option to use sensible defaults.
+**Notes:**
+- CLI flags always override config values
+- If no config is provided, sensible defaults are used
+- Environment variables `DATASTORE_PROJECT_ID` and `DATASTORE_EMULATOR_HOST` are also supported

 ## Quickstart

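Review note: the precedence described in the Configuration notes (CLI flags first, then `config.yaml`, then fallbacks) can be sketched roughly as below. This is a minimal illustration, not the repository's `load_config`. The `resolve_setting` helper, the `ENV_VARS` mapping, and the `DEFAULTS` table are hypothetical, and placing environment variables after config values is an assumption carried over from the removed README wording.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

# Illustrative defaults only (assumption); the real defaults live in the project's config module.
DEFAULTS: dict[str, Any] = {"sample_size": 500, "namespaces": [], "kinds": []}

# Environment variables named in the README notes, mapped to config keys (hypothetical mapping).
ENV_VARS = {
    "project_id": "DATASTORE_PROJECT_ID",
    "emulator_host": "DATASTORE_EMULATOR_HOST",
}


def resolve_setting(
    key: str,
    cli_flags: Mapping[str, Any],
    file_config: Mapping[str, Any],
    environ: Mapping[str, str] | None = None,
) -> Any:
    """Return the first value found: CLI flag, config file, env var, default."""
    env = os.environ if environ is None else environ
    if cli_flags.get(key) is not None:
        return cli_flags[key]
    if file_config.get(key) is not None:
        return file_config[key]
    env_name = ENV_VARS.get(key)
    if env_name and env.get(env_name):
        return env[env_name]
    return DEFAULTS.get(key)
```

Passing `environ` explicitly keeps the lookup easy to test without touching the real process environment.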
@@ -148,20 +143,21 @@ Use these targets to get a working dev environment quickly.

 ### Basic CLI examples
 ```bash
-# list kinds (scans stats or samples)
-python3 cli.py analyze-kinds --project my-project
+# Analyze all kinds in all namespaces (default behavior)
+lsu analyze-kinds

-# analyze fields for a kind
-python3 cli.py analyze-fields --kind MyKind --group-by batchId
+# Analyze specific kind across all namespaces
+lsu analyze-fields --kind MyKind

-# dry-run cleanup sample
-python3 cli.py cleanup --ttl-field expireAt --dry-run
-```
+# Analyze with grouping
+lsu analyze-fields --kind MyKind --group-by batchId

-### Configuration
+# Dry-run cleanup for all kinds and namespaces
+lsu cleanup --dry-run

-- Local `config.yaml` is supported; CLI flags override config values.
-- Example keys: `project_id`, `emulator_host`, `namespaces`, `kinds`, `kind`, `ttl_field`, `batch_size`, `sample_size`, `enable_parallel`.
+# Filter to specific namespace and kind
+lsu cleanup --kind MyKind --namespace custom-ns --dry-run
+```

 ### Emulator & integration testing

@@ -198,7 +194,6 @@ The release workflow selects the appropriate token based on the `publish_target`

 ## Notes

-- `sample_size` bounds per-kind/group analysis to avoid scanning entire datasets. Set to 0 or `null` in config to disable sampling.
-- `enable_parallel` (default true) enables multi-threaded processing during analysis and deletion; set to false to force single-threaded behavior.
-
-If you'd like a short walkthrough or to change the default Makefile targets, tell me what you'd prefer and I can adjust the README or Makefile.
+- **By default, all commands iterate over all namespaces and all kinds** unless you specify filters via config or CLI flags
+- `sample_size` bounds per-kind analysis to avoid scanning entire large datasets (set to 0 to disable)
+- Multi-threaded processing is enabled by default for better performance

commands/analyze_entity_fields.py

Lines changed: 2 additions & 7 deletions
@@ -56,7 +56,6 @@ def _process_entity(e: datastore.Entity):
     if enable_parallel:
         from concurrent.futures import ThreadPoolExecutor, as_completed

-        results_iter = []
         with ThreadPoolExecutor(max_workers=8) as exe:
             futures = {exe.submit(_process_entity, e): e for e in ents}
             for fut in tqdm(as_completed(futures), total=len(futures), desc="Analyzing field contributions", unit="entity"):
@@ -162,18 +161,14 @@ def analyze_field_contributions(
     sample_size = getattr(config, "sample_size", 500)
     enable_parallel = getattr(config, "enable_parallel", True)

-    # If no namespace provided, or config.namespaces is None/empty, iterate all namespaces
+    # If no namespace provided, iterate all namespaces
     if namespace is None:
-        if hasattr(config, "namespaces") and (not config.namespaces):
-            ns_list = list_namespaces(client)
-        else:
-            ns_list = [namespace] if namespace else list_namespaces(client)
+        ns_list = list_namespaces(client)
         results: Dict[str, Dict] = {}
         for ns in ns_list:
             results[ns or ""] = _analyze_single_namespace(
                 client, kind=kind, namespace=ns, group_by_field=group_by_field, only_fields=only_fields, sample_size=sample_size
             )
-
         return {"by_namespace": results}

     # Single namespace

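Review note: inside the `namespace is None` branch, both removed paths evaluated to `list_namespaces(client)`, so the simplification in `analyze_field_contributions` should not change behavior. The resulting control flow can be sketched as below. `analyze_by_namespace` and its two callable parameters are hypothetical stand-ins for `list_namespaces` and `_analyze_single_namespace`, and the return value of the single-namespace branch is an assumption, since that part of the function is not shown in the diff.

```python
from __future__ import annotations

from typing import Callable, Dict, Iterable, Optional


def analyze_by_namespace(
    namespace: Optional[str],
    list_all: Callable[[], Iterable[Optional[str]]],
    analyze_one: Callable[[Optional[str]], Dict],
) -> Dict:
    """Fan out over every namespace when none is given, else analyze just one."""
    if namespace is None:
        # No filter: one result per discovered namespace. The default
        # namespace (None or "") is stored under the "" key, as in the diff.
        return {"by_namespace": {(ns or ""): analyze_one(ns) for ns in list_all()}}
    # Assumed shape for the single-namespace path (not visible in the hunk).
    return analyze_one(namespace)
```

Injecting the two callables keeps the sketch runnable without a Datastore client or emulator.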
commands/analyze_kinds.py

Lines changed: 4 additions & 5 deletions
@@ -1,9 +1,8 @@
 from __future__ import annotations

 import logging
-from typing import Dict, List, Optional, Tuple
+from typing import Dict, List

-from google.cloud import datastore
 from google.cloud.datastore.helpers import entity_to_protobuf

 from .config import (
@@ -17,7 +16,7 @@
 logger = logging.getLogger(__name__)


-def get_kind_stats(client, kind: str, namespace: Optional[str] = None) -> Tuple[Optional[int], Optional[int]]:
+def get_kind_stats(client, kind: str, namespace: str | None = None) -> tuple[int | None, int | None]:
     """
     Returns (count, bytes) for the given kind/namespace using Datastore statistics.
     Falls back to None if not found.
@@ -39,7 +38,7 @@ def get_kind_stats(client, kind: str, namespace: Optional[str] = None) -> Tuple[
     return None, None


-def estimate_entity_count_and_size(client, kind: str, namespace: Optional[str], sample_size: int = 100) -> Tuple[int, int]:
+def estimate_entity_count_and_size(client, kind: str, namespace: str | None, sample_size: int = 100) -> tuple[int, int]:
     """
     Original keys-only method: exact count, approximate bytes via sampling.
     """
@@ -65,7 +64,7 @@ def estimate_entity_count_and_size(client, kind: str, namespace: Optional[str],
     return total_count, int(avg_size * total_count)


-def analyze_kinds(config: AppConfig, method: Optional[str] = None) -> List[Dict]:
+def analyze_kinds(config: AppConfig, method: str | None = None) -> List[Dict]:
     """
     Analyze kinds using either:
     - 'stats' (default) => fast built-in Datastore statistics

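Review note: the switch from `Optional[...]`/`Tuple[...]` to `X | None` and `tuple[...]` relies on the file's existing `from __future__ import annotations`, which defers annotation evaluation so the newer syntax in signatures is not evaluated at import time. A small sketch of the same pattern follows. `lookup_kind_stats` and `_FAKE_STATS` are hypothetical and only mimic the `(count, bytes)` return shape of `get_kind_stats`.

```python
from __future__ import annotations

# Hypothetical in-memory table standing in for Datastore statistics.
_FAKE_STATS = {("MyKind", None): (10, 2048)}


def lookup_kind_stats(kind: str, namespace: str | None = None) -> tuple[int | None, int | None]:
    """Return (count, bytes) for a kind, or (None, None) when nothing is recorded."""
    return _FAKE_STATS.get((kind, namespace), (None, None))


# With deferred evaluation, each annotation is kept as a plain string.
print(type(lookup_kind_stats.__annotations__["namespace"]).__name__)
```

Callers still need to handle the `(None, None)` case, which is why the return type keeps both elements optional.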
commands/cleanup_expired.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

 import logging
 from datetime import datetime, timezone
-from typing import Dict, List, Optional
+from typing import Dict, List

 from google.cloud import datastore


commands/config.py

Lines changed: 3 additions & 3 deletions
@@ -69,10 +69,10 @@ def load_config(path: Optional[str] = None, overrides: Optional[Dict] = None) ->
     config.namespaces = _as_list(merged.get("namespaces"))
     config.kinds = _as_list(merged.get("kinds"))

-    # Normalise: treat [""] as empty
-    if config.namespaces == [""] or config.namespaces is None:
+    # Normalise: treat [""] as empty (meaning "iterate all")
+    if config.namespaces == [""]:
         config.namespaces = []
-    if config.kinds == [""] or config.kinds is None:
+    if config.kinds == [""]:
         config.kinds = []

     # Optional defaults used by some commands

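Review note: dropping the `is None` checks is only safe if `_as_list` never returns `None`. Its body is not part of this diff, so the sketch below assumes it maps `None` to an empty list. Both `as_list` and `normalise_filter` are hypothetical names used to illustrate the normalisation, not the project's actual helpers.

```python
from __future__ import annotations

from typing import Any, List


def as_list(value: Any) -> List[str]:
    """Hypothetical stand-in for `_as_list`: None -> [], scalar -> [scalar]."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def normalise_filter(value: Any) -> List[str]:
    """Collapse [""] to [] so that an 'empty' filter means 'iterate all'."""
    items = as_list(value)
    return [] if items == [""] else items
```

Under that assumption, an omitted key, an empty list, and `[""]` all normalise to `[]`, which is falsy and therefore takes the "iterate all" path.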
tests/test_config.py

Lines changed: 12 additions & 0 deletions
@@ -17,6 +17,18 @@ def test_load_config_normalizes_namespaces(tmp_path: tempfile.TemporaryDirectory
     assert cfg.kinds == []


+def test_empty_lists_mean_iterate_all():
+    """Empty namespaces and kinds lists should mean 'iterate all'."""
+    cfg = AppConfig()
+    # Default config should have empty lists
+    assert cfg.namespaces == []
+    assert cfg.kinds == []
+
+    # Empty lists evaluate to False, triggering "iterate all" logic
+    assert not cfg.namespaces
+    assert not cfg.kinds
+
+
 def test_format_size_small_and_large():
     assert format_size(512) == "512.00 B"
     assert format_size(1024) == "1.00 KB"
