Commit b3720e1

IronAdamant and claude committed
v0.8.0: Jest test extraction, skip coverage artifacts, lower coupling threshold
- Extract Jest/Mocha/Vitest describe/it/test blocks as code units (test_suite/test_case types). This unblocks the entire JS/TS test edge pipeline — require() and import dep extraction already worked but was unreachable without test units.
- Add coverage/, .next/, .nuxt/ to _SKIP_DIRS to exclude build/test artifacts.
- Halve coupling threshold formula: max(2, int(log2(N)/2)+1). Small projects now surface coupling signal (10 commits → threshold 2 vs 4 before).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ff13f47 commit b3720e1

5 files changed

Lines changed: 43 additions & 22 deletions

CLAUDE.md

Lines changed: 3 additions & 2 deletions
@@ -35,10 +35,11 @@ chisel/
 - **Multi-agent safety**: `project.py` provides: (1) `detect_project_root()` canonicalizes via git common dir so worktrees share identity, (2) `normalize_path()` ensures consistent relative paths, (3) `resolve_storage_dir()` defaults to project-local `.chisel/` (priority: explicit > env > project-local > ~/.chisel/), (4) `ProcessLock` for cross-process coordination — shared locks for reads, exclusive for writes. Cross-platform: `fcntl.flock` on Unix, `LockFileEx` on Windows.
 - **SQLite concurrency**: 30s `busy_timeout` + exponential-backoff retry on `_execute` for cross-process SQLITE_BUSY.
 - **Ownership vs Reviewers**: `ownership` = blame-based (who wrote the code, `role: "original_author"`). `who_reviews` = commit-activity-based (who maintains it, `role: "suggested_reviewer"`).
-- **Shared constants**: `_SKIP_DIRS` and `_EXTENSION_MAP` live in `ast_utils.py`. `_CODE_EXTENSIONS` in `engine.py` is derived from `_EXTENSION_MAP`.
+- **Shared constants**: `_SKIP_DIRS` and `_EXTENSION_MAP` live in `ast_utils.py`. `_CODE_EXTENSIONS` in `engine.py` is derived from `_EXTENSION_MAP`. `_SKIP_DIRS` includes `coverage`, `.next`, `.nuxt` to exclude build/test output artifacts.
 - **Shared dispatch**: `dispatch_tool()` in `mcp_server.py` is used by both HTTP and stdio servers. Tool schemas and dispatch tables live in `schemas.py`.
 - **Edge weighting**: Test edges carry a weight (0.4-1.0) based on file proximity. Python import-path matching (`from myapp.utils import foo` → `myapp/utils.py:foo`) takes priority over name-only matching. `_compute_proximity_weight()` and `_matches_import_path()` in `test_mapper.py`.
 - **AST regex improvements**: C#/Java support nested generics `<A<B>>` and annotations/attributes `@Override`/`[Test]`. Kotlin supports extension functions `fun String.foo()`. C++ supports template functions and destructors `~Foo()`. Swift supports `@objc`-style attributes. Dart supports factory constructors and getters/setters.
+- **Jest/Mocha/Vitest test block extraction**: `_JS_JEST_BLOCK_RE` in `ast_utils.py` matches `describe('name', ...)`, `it('name', ...)`, `test('name', ...)` (plus `.only`/`.skip`/`.todo` modifiers) as code units with `unit_type` "test_suite" or "test_case". `_TEST_UNIT_TYPES` in `test_mapper.py` ensures these are recognized as test units regardless of `_is_test_name()`. This enables test edge building for JS/TS projects — the `require()`/`import` dep extraction already worked but was unreachable without test units.
 - **Pluggable extractors**: `register_extractor(lang, fn)` in `ast_utils.py` lets users override built-in regex extractors with tree-sitter or LSP-backed ones. `_custom_extractors` checked before `_EXTRACTORS` in `extract_code_units()`. Zero-dep — the registry is just hooks.
 - **Batch SQL queries**: `storage.py` provides `get_*_batch()` methods for edges, code units, co-changes, churn, and blame. `impact.get_risk_map()` uses these to compute all risk scores in ~5 queries total instead of N*5. `_chunked()` helper splits large batches to stay under SQLite's variable limit.
 - **Process-level read locks**: All read tool methods in `engine.py` acquire `_process_lock.shared()` (outer) + `lock.read_lock()` (inner). Writes acquire `_process_lock.exclusive()` + `lock.write_lock()`. This allows concurrent reads from multiple processes while blocking during writes.
@@ -91,4 +92,4 @@ Each wired through: engine.tool_*() → CLI subcommand, HTTP POST /call, stdio M
 - **`stats`**: Returns summary counts for all database tables plus `coupling_threshold` (when commits > 0) so LLM agents can diagnose coupling=0.0 results.
 - **`triage`**: Combined risk_map + test_gaps + stale_tests for top-N riskiest files. Single command for pre-audit/refactor prioritization. Returns `{top_risk_files, test_gaps, stale_tests, summary}`.
 - **`limit` parameter**: All list-returning tools accept `limit` to cap result size. Also applies to dict-wrapped responses with a `files` key (e.g. `risk_map`).
-- **Adaptive coupling threshold**: `max(3, int(log2(commits)) + 1)`, logarithmic scaling. Previous `commits // 4` was too aggressive (400 commits → 100 threshold, killing all signal). New formula: 10→4, 50→6, 200→8, 1000→11, 10000→14. Defined in `_coupling_threshold()` in `engine.py`.
+- **Adaptive coupling threshold**: `max(2, int(log2(commits) / 2) + 1)`, half-log scaling. Previous `max(3, int(log2(N)) + 1)` was too aggressive for small/medium projects (10 commits → threshold 4, killing all signal when max co-change is 2-3). Half-log with floor of 2 surfaces early coupling signal: 10→2, 50→3, 100→4, 200→4, 1000→5, 10000→7. Defined in `_coupling_threshold()` in `engine.py`.
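The `_SKIP_DIRS` entries mentioned in the Shared constants bullet can be pictured with a small directory walk. The sketch below is an assumption about how such a set is commonly applied (pruning `os.walk` in place); this diff does not show the project's own traversal code. `SKIP_DIRS`, `iter_source_files`, and `demo` are hypothetical names, and the set shown is only a subset of the real one.

```python
import os
import tempfile

# Illustrative subset of _SKIP_DIRS; the real set lives in ast_utils.py.
SKIP_DIRS = {"node_modules", "coverage", ".next", ".nuxt"}


def iter_source_files(root):
    """Yield paths relative to root, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Mutating dirnames in place tells os.walk not to recurse into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in filenames:
            yield os.path.relpath(os.path.join(dirpath, filename), root)


def demo():
    """Build a throwaway tree and return the files that survive pruning."""
    with tempfile.TemporaryDirectory() as root:
        for rel in (
            "src/cart.js",
            "coverage/lcov-report/cart.js.html",
            ".next/server/page.js",
        ):
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        # Materialize the generator before the temporary tree is deleted.
        return sorted(iter_source_files(root))
```

Calling `demo()` leaves only the file under `src/`; the coverage report and the `.next` build output are never visited, so they cannot produce code units.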

chisel/ast_utils.py

Lines changed: 14 additions & 0 deletions
@@ -20,6 +20,7 @@
     ".git", "node_modules", "__pycache__", ".tox", ".venv", "venv",
     "env", ".mypy_cache", ".pytest_cache", ".ruff_cache", "dist",
     "build", ".eggs", "target", "vendor", "Pods",
+    "coverage", ".next", ".nuxt",
 }


@@ -319,6 +320,10 @@ def _py_block_end(lines: list[str], start_idx: int, indent: int) -> int:
     r"^\s*(?:export\s+)?(?:const|let|var)\s+(?P<name>[A-Za-z_$]\w*)"
     r"\s*=\s*(?:async\s+)?(?:\([^)]*\)|[A-Za-z_$]\w*)\s*=>",
 )
+# Jest / Mocha / Vitest test block calls: describe('name', ...), it('name', ...), test('name', ...)
+_JS_JEST_BLOCK_RE = re.compile(
+    r"""^\s*(?P<keyword>describe|it|test)(?:\.(?:only|skip|todo))?\s*\(\s*(?P<q>['"`])(?P<name>.*?)(?P=q)""",
+)

 # ---------------------------------------------------------------------------
 # Go
@@ -512,10 +517,19 @@ def _name_kind(m):
     return m.group("name"), m.group("kind")


+def _jest_block_type(m):
+    """Extract (name, type) from a Jest/Mocha/Vitest test block match."""
+    name = m.group("name")
+    if m.group("keyword") == "describe":
+        return name, "test_suite"
+    return name, "test_case"
+
+
 _JS_TS_PATTERNS = [
     (_JS_NAMED_FUNC_RE, "function"),
     (_JS_CLASS_RE, "class"),
     (_JS_ARROW_RE, "function"),
+    (_JS_JEST_BLOCK_RE, _jest_block_type),
 ]

 _GO_PATTERNS = [
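To see what the new pattern accepts, the regex and classifier from this diff can be run against a few sample lines. This is a minimal sketch: the pattern and the keyword-to-type mapping are copied from the hunks above, while the line-by-line `match` loop in `extract_blocks` is an assumption about how an extractor might apply it, and the sample lines are made up.

```python
import re

# Pattern copied verbatim from the ast_utils.py change.
_JS_JEST_BLOCK_RE = re.compile(
    r"""^\s*(?P<keyword>describe|it|test)(?:\.(?:only|skip|todo))?\s*\(\s*(?P<q>['"`])(?P<name>.*?)(?P=q)""",
)


def jest_block_type(m):
    """Map a match to (name, unit_type), mirroring _jest_block_type."""
    name = m.group("name")
    if m.group("keyword") == "describe":
        return name, "test_suite"
    return name, "test_case"


def extract_blocks(lines):
    """Hypothetical driver: test each line against the pattern."""
    units = []
    for line in lines:
        m = _JS_JEST_BLOCK_RE.match(line)
        if m:
            units.append(jest_block_type(m))
    return units


SAMPLE = [
    "describe('Cart', () => {",
    '  it("adds an item", () => {});',
    "  test.skip(`removes an item`, () => {});",
    "  item('not a test');",  # starts with "it" but is not followed by "("
    "});",
]

blocks = extract_blocks(SAMPLE)
```

All three quote styles are matched because the closing quote is tied to the opening one through the `(?P=q)` backreference. As written, the pattern does not match parameterized forms such as `it.each(...)('name', ...)`, since `.each` is not among the listed modifiers.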

chisel/engine.py

Lines changed: 8 additions & 7 deletions
@@ -17,16 +17,17 @@
 _CODE_EXTENSIONS = frozenset(_EXTENSION_MAP)

 def _coupling_threshold(commit_count):
-    """Adaptive co-change threshold with logarithmic scaling.
+    """Adaptive co-change threshold with half-log scaling.

-    Previous formula ``max(3, commits // 4)`` was too aggressive — a project
-    with 400 commits required 100 co-commits, filtering out all signal.
-    Logarithmic scaling keeps the noise floor reasonable at any size:
-    10 → 4, 50 → 6, 200 → 8, 1000 → 11, 10000 → 14.
+    Previous ``max(3, int(log2(N)) + 1)`` was too aggressive for small/medium
+    projects — 10 commits → threshold 4, killing all signal when max co-change
+    is typically 2-3. Half-log with a floor of 2 surfaces early coupling
+    signal while still scaling to filter noise in large repos:
+    10 → 2, 50 → 3, 100 → 4, 200 → 4, 1000 → 5, 10000 → 7.
     """
     if commit_count <= 0:
-        return 3
-        return max(3, int(math.log2(commit_count)) + 1)
+        return 2
+    return max(2, int(math.log2(commit_count) / 2) + 1)


 def _diagnose_uniform(comp, value, stats):
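The effect of the formula change is easiest to check side by side. The sketch below re-implements both versions from the removed and added lines of this hunk; `old_threshold`, `new_threshold`, and `comparison` are illustrative names, not functions in `engine.py`.

```python
import math


def old_threshold(commit_count):
    """Formula removed by this commit: max(3, int(log2(N)) + 1)."""
    if commit_count <= 0:
        return 3
    return max(3, int(math.log2(commit_count)) + 1)


def new_threshold(commit_count):
    """Formula added by this commit: max(2, int(log2(N) / 2) + 1)."""
    if commit_count <= 0:
        return 2
    return max(2, int(math.log2(commit_count) / 2) + 1)


# (commits, old, new) for the sizes quoted in the docstring.
comparison = [
    (n, old_threshold(n), new_threshold(n))
    for n in (10, 50, 100, 200, 1000, 10000)
]

for n, old, new in comparison:
    print(f"{n:>6} commits: old={old:>2}  new={new}")
```

Evaluated this way, the old formula gives 10 at 1000 commits, which agrees with the removed test assertion rather than the 11 quoted in the removed docstring.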

chisel/test_mapper.py

Lines changed: 5 additions & 1 deletion
@@ -39,6 +39,10 @@
 ]


+# Unit types that are inherently test constructs (from Jest/Mocha/Vitest block extraction).
+_TEST_UNIT_TYPES = frozenset({"test_suite", "test_case"})
+
+
 class TestMapper:
     """Discovers test files, parses them, extracts dependencies, builds edges."""

@@ -119,7 +123,7 @@ def parse_test_file(self, file_path):

         test_units = []
         for unit in units:
-            if _is_test_name(unit.name, framework):
+            if unit.unit_type in _TEST_UNIT_TYPES or _is_test_name(unit.name, framework):
                 tid = f"{rel_path}:{unit.name}"
                 test_units.append({
                     "id": tid,

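The widened condition in `parse_test_file` matters because Jest titles are free-form strings that rarely look like test function names. A minimal sketch of the predicate follows, assuming a simplified unit record and a prefix-based name check; `Unit`, `is_test_name`, and `select_test_units` are hypothetical stand-ins, not the project's actual classes or helpers.

```python
from dataclasses import dataclass

# Mirrors the constant added in test_mapper.py.
_TEST_UNIT_TYPES = frozenset({"test_suite", "test_case"})


@dataclass
class Unit:
    """Hypothetical stand-in for the project's code-unit record."""
    name: str
    unit_type: str


def is_test_name(name, framework):
    """Hypothetical stand-in for _is_test_name(): a simple prefix check."""
    return name.startswith("test")


def select_test_units(units, framework="jest"):
    """Keep units that are test blocks by type or look like tests by name."""
    return [
        u for u in units
        if u.unit_type in _TEST_UNIT_TYPES or is_test_name(u.name, framework)
    ]


units = [
    Unit("adds an item", "test_case"),  # Jest title, no test-like name
    Unit("Cart", "test_suite"),
    Unit("formatPrice", "function"),    # helper defined in the test file
]
selected = [u.name for u in select_test_units(units)]
```

With a name-only check, none of these three units would qualify under the stand-in rule; the type check admits the two Jest blocks and still excludes the plain helper function.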
tests/test_engine.py

Lines changed: 13 additions & 12 deletions
@@ -262,7 +262,7 @@ def test_tool_stats(self, engine):
         if result["commits"] > 0:
             assert "coupling_threshold" in result
             import math
-            expected = max(3, int(math.log2(result["commits"])) + 1)
+            expected = max(2, int(math.log2(result["commits"]) / 2) + 1)
             assert result["coupling_threshold"] == expected


@@ -639,18 +639,19 @@ def test_record_result_acquires_exclusive_lock(self, engine):

 class TestCouplingThreshold:
     def test_minimum_floor(self):
-        assert _coupling_threshold(0) == 3
-        assert _coupling_threshold(1) == 3
-        assert _coupling_threshold(4) == 3
+        assert _coupling_threshold(0) == 2
+        assert _coupling_threshold(1) == 2
+        assert _coupling_threshold(4) == 2

-    def test_logarithmic_scaling(self):
-        assert _coupling_threshold(10) == 4  # log2(10)=3.3 → 3+1=4
-        assert _coupling_threshold(50) == 6  # log2(50)=5.6 → 5+1=6
-        assert _coupling_threshold(200) == 8  # log2(200)=7.6 → 7+1=8
-        assert _coupling_threshold(1000) == 10  # log2(1000)=9.9 → 9+1=10
+    def test_half_log_scaling(self):
+        # Half-log: int(log2(N)/2) + 1, floor 2
+        assert _coupling_threshold(10) == 2  # log2(10)/2=1.66 → 1+1=2
+        assert _coupling_threshold(50) == 3  # log2(50)/2=2.82 → 2+1=3
+        assert _coupling_threshold(200) == 4  # log2(200)/2=3.82 → 3+1=4
+        assert _coupling_threshold(1000) == 5  # log2(1000)/2=4.98 → 4+1=5

     def test_large_repos_reasonable(self):
-        # At 10k commits, threshold should be ~14, not 2500
+        # At 10k commits, threshold should be ~7, not 2500
         threshold = _coupling_threshold(10000)
-        assert threshold <= 15
-        assert threshold >= 10
+        assert threshold <= 8
+        assert threshold >= 5
