Reanalyze: refactor DCE to pure pipeline architecture #8043
base: master
Conversation
I don't see an explicit mention of monorepos or npm/yarn/pnpm workspaces. In this context "dependencies" only refers to other files, but there is still a missing use case when one has shared code like this: where the root package.json has

"workspaces": [
  "common",
  "mobile",
  "web"
]

and the root has

"dependencies": [
  "common",
  "mobile",
  "web"
]

whereas one package has "dependencies": [] and both of the others have

"dependencies": [
  "common"
]

And we want DCE to mark stuff as dead that is part of the shared package. This needs to be an option to configure because when e.g. developing a library with tests, I might have tests dependent on the main package, but not 100 % coverage. The knowledge about package manager monorepos is mostly in the package manager itself. Also tagging @nojaf and @DZakh who are users of bun and pnpm monorepos.
This is a behavior-preserving refactor.
- Introduce BindingContext in DeadValue to track the current binding location during dead-code traversal, so binding context is explicit and locally encapsulated.
- Introduce ReportingContext in DeadCommon to track, per file, the end position of the last reported value when deciding whether to suppress nested warnings.
- Replace addValueReference_state with addValueReference ~binding, so value-reference bookkeeping is driven by an explicit binding location rather than a threaded analysis state.
- Update dead-code value and exception handling to use the new addValueReference API.
- Refresh DEADCODE_REFACTOR_PLAN.md to mark these state-localisation steps as completed and to narrow the remaining follow-up to making the binding context fully pure.
- Verified with make test-analysis that behaviour and expected outputs remain unchanged.
- Remove BindingContext module wrapper (was just forwarding to Current)
- Remove Current module entirely (unnecessary abstraction)
- Simplify to pass Location.t directly instead of a record type
- Remove unused max_value_pos_end field
- Refactor traverseStructure to use pure functional mapper creation
- Update DEADCODE_REFACTOR_PLAN.md to mark task 4.3 as complete

This eliminates ~40 lines of wrapper code and makes the binding state tracking pure and simpler to understand.
The original plan was too granular, with many 'add scaffolding but don't use it yet' tasks. This rewrite focuses on:
- Problem-first structure: each task solves a real architectural issue
- Combined related changes: no pointless intermediate states
- Clear value propositions: why each task matters
- Testable success criteria: how we know it worked
- Realistic effort estimates

Reduces 14 fine-grained tasks down to 10 focused tasks that each leave the codebase measurably better.

Signed-off-by: Cursor AI <cursor@cursor.com>
Replace global config reads with explicit ~config parameter threading throughout the DCE analysis pipeline. This makes the analysis pure and testable with different configurations.

## Changes

### New module
- DceConfig: Encapsulates DCE configuration (cli + run config)
- DceConfig.current() captures global state once
- All analysis functions now take an explicit ~config parameter

### DCE Analysis (fully pure - no global reads)
- DeadCode: threads config to all Dead* modules
- DeadValue: replaced ~15 !Cli.debug reads with config.cli.debug
- DeadType: replaced ~7 !Cli.debug reads with config.cli.debug
- DeadOptionalArgs: takes ~config, passes to Log_.warning
- DeadModules: uses config.run.transitive
- DeadCommon: threads config through the reporting pipeline
- WriteDeadAnnotations: uses config.cli.write/json
- ProcessDeadAnnotations: uses config.cli.live_names/live_paths

### Logging infrastructure
- Log_.warning: now requires ~config (no optional)
- Log_.logIssue: now requires ~config (no optional)
- Log_.Stats.report: now requires ~config (no optional)
- Consistent API - no conditional logic on Some/None

### Non-DCE analyses (call DceConfig.current() at use sites)
- Exception: 4 call sites updated
- Arnold: 7 call sites updated
- TODO: Thread config through these for full purity

### Other
- Common.ml: removed unused lineAnnotationStr field
- Reanalyze: single DceConfig.current() call at entry point
- DEADCODE_REFACTOR_PLAN.md: updated Task 2, added verification task

## Impact
- ✅ DCE analysis is now pure - takes explicit config, no global reads
- ✅ All config parameters required (zero 'config option' types)
- ✅ Can run analysis with different configs without mutating globals
- ✅ All tests pass - no regressions

## Remaining Work (Task 2)
- Thread config through Exception/Arnold to eliminate DceConfig.current()
- Verify zero DceConfig.current() calls in analysis code

Signed-off-by: Cursor AI <ai@cursor.com>
Signed-off-by: Cristiano Calcagno <cristianoc@users.noreply.github.com>
- Replace DceConfig.current() and !Common.Cli.debug with explicit config parameter
- Thread config through Arnold.ml functions (Stats, ExtendFunctionTable, CheckExpressionWellFormed, Compile, Eval)
- Thread config through Exception.ml functions (Event.combine, Checks.doCheck/doChecks, traverseAst)
- Update Reanalyze.ml to pass config to all analysis functions
- Improves testability and eliminates global state dependencies
Task 1 of the dead code refactor plan: eliminate global mutable state for current file context.

Changes:
- Add DeadCommon.FileContext.t with source_path, module_name, is_interface
- Thread ~file parameter through DeadCode, DeadValue, DeadType, DeadCommon
- Thread ~file through Exception.processCmt and Arnold.processCmt
- Remove Common.currentSrc, currentModule, currentModuleName globals

Design improvement:
- FileContext.module_name is now a raw string (e.g. "ExnB"), not Name.t
- Added FileContext.module_name_tagged helper to create Name.t when needed
- This avoids confusion: raw name for hashtable keys, tagged name for paths
- Previously the interface encoding (+prefix) leaked into code that expected raw names

This makes it possible to process files concurrently or out of order, as analysis no longer depends on hidden global state for file context.
- Create AnnotationState module with explicit state type and accessor functions
- Thread annotation_state through DeadCode.processCmt and the Reanalyze pipeline
- Update declIsDead, doReportDead, resolveRecursiveRefs to use explicit state
- Update DeadOptionalArgs.check to take explicit state
- Remove global positionsAnnotated hashtable from ProcessDeadAnnotations
- Remove unused ~config parameter from iterFilesFromRootsToLeaves

Note: The current implementation still mixes input (source annotations) with output (analysis results). This will be properly separated in Task 8.
Key design principles added:
- Separate per-file input (keyed by filename) from project-wide analysis
- Analysis results are immutable - returned by the solver, not mutated
- Enable incremental updates by replacing one file's data

Updated tasks to emphasize:
- Per-file state with merge functions for the project-wide view
- Solver returns AnalysisResult.t instead of mutating its input
- Task 3 marked partial (input/output mixing fixed in Task 8)
This refactors source annotations to use a clean architectural pattern:
1. **Two types**: FileAnnotations.builder (mutable) and FileAnnotations.t (immutable)
2. **Map phase**: process_cmt_file returns a builder per file
3. **Collect phase**: processCmtFiles collects builders into a list
4. **Merge phase**: merge_all combines all builders into immutable t
5. **Analyze phase**: solver receives immutable t (read-only)

Key benefits:
- Order independence: builders can be collected in any order
- Parallelizable: map phase can run concurrently (future)
- Incremental: replace one builder, re-merge (future)
- Type-safe: t has no mutation functions in its API

New files:
- FileAnnotations.ml/.mli: builder/t types with clear API separation
- DceFileProcessing.ml/.mli: AST processing returns builder

Pattern documented in DEADCODE_REFACTOR_PLAN.md as a reusable template for Tasks 4-7 (declarations, references, delayed items, file deps).
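The builder/t pattern above can be sketched in a few lines of OCaml. This is a minimal illustration of the idea, not the actual FileAnnotations API: the signature exposes mutation only on `builder`, while `t` is read-only by construction.

```ocaml
(* Sketch of the builder (mutable) / t (immutable) pattern.
   Names are illustrative; the real FileAnnotations module differs. *)
module Annotations : sig
  type builder
  type t (* no mutation functions exposed on t *)
  val create : unit -> builder
  val add : builder -> string -> unit
  val merge_all : builder list -> t
  val mem : t -> string -> bool
end = struct
  type builder = (string, unit) Hashtbl.t
  type t = string list (* frozen view *)
  let create () = Hashtbl.create 16
  let add b name = Hashtbl.replace b name ()
  let merge_all builders =
    List.concat_map
      (fun b -> Hashtbl.fold (fun k () acc -> k :: acc) b [])
      builders
  let mem t name = List.mem name t
end

let () =
  (* Map phase: one builder per file, filled in any order *)
  let b1 = Annotations.create () in
  Annotations.add b1 "A.f";
  let b2 = Annotations.create () in
  Annotations.add b2 "B.g";
  (* Merge phase: freeze into an immutable view for the solver *)
  let frozen = Annotations.merge_all [ b2; b1 ] in
  assert (Annotations.mem frozen "A.f" && Annotations.mem frozen "B.g");
  print_endline "ok"
```

Because the merge is the only way to obtain a `t`, the solver cannot accidentally observe a half-built state, and the order in which builders are collected does not matter.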
Applies the same architectural pattern from Task 3 (annotations) to declarations.
## New modules
- Declarations.ml/.mli: builder (mutable) and t (immutable) types
- CollectAnnotations.ml/.mli: AST traversal for @dead/@live/@genType annotations
## Key changes
1. **Declarations pattern**:
- process_cmt_file creates local Declarations.builder
- processCmtFiles collects file_data list (annotations + decls)
- Declarations.merge_all combines into immutable t
- Solver uses Declarations.t (read-only)
2. **Global state removed**:
- Deleted global DeadCommon.decls
- All declaration access now through explicit ~decls parameter
3. **AST processing separation**:
- CollectAnnotations.ml: annotation traversal (~150 lines)
- DceFileProcessing.ml: coordinator (~80 lines, was ~230)
4. **Threading ~decls:builder through AST processing**:
- addDeclaration_, addValueDeclaration
- DeadValue.processStructure, processSignatureItem, traverseStructure
- DeadType.addDeclaration
- DeadException.add
- DeadOptionalArgs.addFunctionReference
## Data flow
process_cmt_file (per-file)
├── CollectAnnotations → FileAnnotations.builder
└── DeadValue/Type/Exception → Declarations.builder
Merge phase:
FileAnnotations.merge_all → FileAnnotations.t
Declarations.merge_all → Declarations.t
Solver:
reportDead ~annotations ~decls (both immutable)
## Benefits
- Order independence: files can be processed in any order
- Parallelizable: map phase can run concurrently (future)
- Incremental: replace one file's builders, re-merge (future)
- Type-safe: t types have no mutation functions
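Item 4 above (threading `~decls:builder` through AST processing) is the mechanical part of the refactor: every function that used to write to a global table instead receives the builder as a labelled argument. A hypothetical sketch of the shape, with invented names:

```ocaml
(* Sketch: replacing a global table with an explicit ~decls labelled
   argument threaded through processing functions. Names are hypothetical. *)
module Declarations = struct
  type builder = (int, string) Hashtbl.t (* position -> declaration name *)
  let create () : builder = Hashtbl.create 16
  let add (b : builder) ~pos ~name = Hashtbl.replace b pos name
  let find_opt (b : builder) pos = Hashtbl.find_opt b pos
end

(* Before: this would have read and written a global table.
   After: the builder is passed explicitly, so each file's
   processing is self-contained. *)
let add_value_declaration ~decls ~pos ~name =
  Declarations.add decls ~pos ~name

let process_file ~decls items =
  List.iter (fun (pos, name) -> add_value_declaration ~decls ~pos ~name) items

let () =
  let decls = Declarations.create () in
  process_file ~decls [ (1, "x"); (2, "y") ];
  assert (Declarations.find_opt decls 1 = Some "x");
  print_endline "ok"
```

The payoff is that two files can each get their own builder, so per-file processing has no shared mutable state to race on or order-depend on.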
Applies the map → list → merge pattern to references and cross-file items.
## Task 5: References
New module: References.ml/.mli
- builder (mutable) for AST processing
- t (immutable) for solver
- Tracks both value refs and type refs
- PosSet for position sets
Changes:
- Thread ~refs:References.builder through AST processing
- addValueReference, addTypeReference use References API
- Solver uses References.find_value_refs, References.find_type_refs
- Deleted global ValueReferences.table and TypeReferences.table
## Task 6: CrossFileItems (renamed from DelayedItems)
New module: CrossFileItems.ml/.mli
- builder (mutable) for AST processing
- t (immutable) for processing after merge
- Three item types: exception_refs, optional_arg_calls, function_refs
Changes:
- Thread ~cross_file:CrossFileItems.builder through AST processing
- DeadException.markAsUsed adds to cross_file builder
- DeadOptionalArgs.addReferences, addFunctionReference add to cross_file
- Deleted global delayedItems refs from DeadException and DeadOptionalArgs
## Data flow
process_cmt_file (per-file)
→ file_data { annotations; decls; refs; cross_file }
Merge phase:
FileAnnotations.merge_all → annotations (immutable)
Declarations.merge_all → decls (immutable)
CrossFileItems.merge_all → cross_file (immutable)
References builders merged into refs_builder
Process cross-file items:
process_exception_refs → writes to refs_builder
process_optional_args → reads decls
Freeze:
refs_builder → refs (immutable)
Solver:
reportDead ~annotations ~decls ~refs
## Global state deleted
- DeadCommon.ValueReferences.table
- DeadCommon.TypeReferences.table
- DeadException.delayedItems
- DeadOptionalArgs.delayedItems
- DeadOptionalArgs.functionReferences
## Naming
Renamed DelayedItems → CrossFileItems because it better describes
the semantic meaning: items that reference across file boundaries.
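The key property of cross-file items is that they are only recorded during the per-file map phase and resolved after the merge, once all declarations are known. A minimal sketch under assumed, illustrative types (the real CrossFileItems module tracks three item kinds, not one):

```ocaml
(* Sketch: cross-file references recorded per file, resolved only after
   merge, so resolution never depends on file processing order. *)
type exception_ref = { from_file : string; target : string }

module CrossFileItems = struct
  type builder = exception_ref list ref
  let create () : builder = ref []
  let add_exception_ref b r = b := r :: !b
  let merge_all (bs : builder list) : exception_ref list =
    List.concat_map (fun b -> !b) bs
end

let () =
  (* Per-file map phase: A.ml references an exception defined in B *)
  let b_a = CrossFileItems.create () in
  CrossFileItems.add_exception_ref b_a { from_file = "A.ml"; target = "B.Exn" };
  let b_b = CrossFileItems.create () in
  (* Merge, then resolve against the full declaration set
     (modelled here as a plain list) *)
  let all = CrossFileItems.merge_all [ b_a; b_b ] in
  let declared = [ "B.Exn" ] in
  let resolved = List.filter (fun r -> List.mem r.target declared) all in
  assert (List.length resolved = 1);
  print_endline "ok"
```

Deferring resolution this way is what lets the deleted `delayedItems` globals go: nothing has to wait in hidden mutable state for "the other file" to be processed.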
Applies the map → list → merge pattern to file dependencies.
## New module: FileDeps.ml/.mli
- `builder` (mutable) for AST processing
- `t` (immutable) for solver
- `add_file`: Register a file as existing
- `add_dep`: Add a dependency from one file to another
- `merge_all`: Merge all builders into immutable result
- `iter_files_from_roots_to_leaves`: Pure topological ordering
## Changes
- Thread `~file_deps:FileDeps.builder` through AST processing
- `addValueReference` records cross-file dependencies to builder
- `process_cmt_file` returns `file_deps` in `file_data`
- `reportDead` takes `~file_deps:FileDeps.t` (immutable)
- Moved topological sort from DeadCommon to FileDeps (pure function)
## Global state deleted
- `Common.FileReferences.table` - replaced by per-file FileDeps builders
## Data flow
process_cmt_file (per-file)
→ file_data { ..., file_deps: builder }
Merge phase:
FileDeps builders merged for cross-file items
Freeze:
file_deps_builder → file_deps (immutable)
Solver:
reportDead ~file_deps (uses iter_files_from_roots_to_leaves)
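The "pure topological ordering" mentioned above can be sketched as a depth-first post-order walk over an immutable dependency map. This is an illustration in the spirit of `iter_files_from_roots_to_leaves`, not its actual implementation (in particular it ignores cycles, which a real version must handle):

```ocaml
(* Sketch of a pure roots-to-leaves ordering over a dependency map.
   deps maps a file to the files it depends on; the result lists
   dependents before their dependencies. *)
module StringMap = Map.Make (String)

let order_roots_to_leaves (deps : string list StringMap.t) : string list =
  let visited = Hashtbl.create 16 in
  let out = ref [] in
  let rec visit f =
    if not (Hashtbl.mem visited f) then begin
      Hashtbl.replace visited f ();
      let ds = Option.value ~default:[] (StringMap.find_opt f deps) in
      List.iter visit ds;
      (* post-order cons: f ends up before everything it depends on *)
      out := f :: !out
    end
  in
  StringMap.iter (fun f _ -> visit f) deps;
  !out

let () =
  let deps = StringMap.(empty |> add "Main" [ "Lib" ] |> add "Lib" []) in
  assert (order_roots_to_leaves deps = [ "Main"; "Lib" ]);
  print_endline "ok"
```

Because the function only reads the map and returns a list, it can be called any number of times with the frozen `FileDeps.t` without affecting analysis state.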
The `-write` flag that auto-inserted `@dead` annotations into source files was removed, as it added significant complexity for a rarely used feature.

## Deleted
- `WriteDeadAnnotations.ml` (156 lines)
- `Common.Cli.write` ref
- `DceConfig.cli.write` field
- `type line` and `type lineAnnotation` in Common.ml
- `shouldWriteLineAnnotation` and `lineAnnotation` fields in DeadWarning
- `-write` CLI argument
- `~config` parameter from `logAdditionalInfo` (now unused)

## Simplified
- `emitWarning` no longer computes line annotations
- `logAdditionalInfo` no longer needs a config parameter
- DeadWarning type now just has: deadWarning, path, message

## Rationale
The feature:
- Added file I/O during analysis (violated pure analysis principles)
- Maintained global state (currentFile, currentFileLines refs)
- Required threading lineAnnotation through the warning system
- Was rarely used (most users want to delete dead code, not annotate it)

Users who want to suppress warnings can still manually add `@dead` annotations.
This completes Tasks 8 and 8b of the DCE refactor plan, making the analysis
phase fully pure - returning an immutable AnalysisResult.t with no inline logging.
Key changes:
- Add AnalysisResult module with immutable result type and issue constructors
- makeDeadIssue: pure function to create dead code issues
- Decl.report: returns issue list (includes dead module + dead value issues)
- DeadOptionalArgs.check: returns issue list instead of logging
- DeadModules.checkModuleDead: returns issue option instead of logging
- reportDead: returns AnalysisResult.t, caller logs issues
- resolveRecursiveRefs: collects all issues (optional args, incorrect annotations)
via ~issues ref instead of logging inline
Architecture after this change:
merged_view (immutable)
│
▼
reportDead (pure function, no logging)
│
▼
AnalysisResult.t (immutable)
│
▼
report (all side effects here)
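The diagram above boils down to a "pure solver, effectful reporter" split: the solver returns an immutable list of issues and all printing happens afterwards. A minimal sketch with invented types (the real AnalysisResult.t and solver are richer):

```ocaml
(* Sketch: the solver computes issues without logging; the caller
   performs all side effects. Names and types are illustrative. *)
type issue = { path : string; message : string }

(* Pure: no logging, no mutation of inputs. A declaration with no
   recorded references becomes a dead-code issue. *)
let solve_dead ~decls ~refs : issue list =
  List.filter_map
    (fun (pos, name) ->
      if List.mem_assoc pos refs then None
      else Some { path = name; message = "is never used" })
    decls

(* Effectful: the only place output happens *)
let report issues =
  List.iter (fun i -> Printf.printf "%s %s\n" i.path i.message) issues

let () =
  let decls = [ (1, "A.dead"); (2, "A.live") ] in
  let refs = [ (2, [ 10 ]) ] in
  let issues = solve_dead ~decls ~refs in
  assert (List.map (fun i -> i.path) issues = [ "A.dead" ]);
  report issues
```

With this split, tests can assert directly on the returned issue list instead of capturing log output, which is what makes the solver testable in isolation.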
OptionalArgs.t is now fully immutable, with no mutation of declarations.

Key changes:
- OptionalArgs.t: removed mutable fields
- apply_call: pure function, returns new state
- combine_pair: pure function, returns pair of new states
- OptionalArgsState module in Common.ml for the computed state map
- compute_optional_args_state: returns immutable OptionalArgsState.t
- DeadOptionalArgs.check: looks up state from the map

Architecture:
- Declaration's optionalArgs = initial state (what args exist)
- OptionalArgsState.t = computed state (after all calls/combines)
- Solver uses OptionalArgsState.find_opt to get the final state

This completes the pure analysis pipeline - no mutation anywhere.
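An immutable `apply_call` of this kind can be sketched as a fold over call sites, where each call returns a new state instead of mutating the declaration. The field names below are assumptions for illustration, not the real OptionalArgs record:

```ocaml
(* Sketch: optional-argument state as an immutable value; apply_call
   returns a new state rather than mutating anything. Illustrative only. *)
module StringSet = Set.Make (String)

type optional_args = {
  declared : StringSet.t;        (* optional args the function declares *)
  always_supplied : StringSet.t; (* supplied at every call seen so far *)
}

let init declared_names =
  let s = StringSet.of_list declared_names in
  { declared = s; always_supplied = s }

(* Pure: intersect with the args supplied at this call site *)
let apply_call state ~supplied =
  { state with
    always_supplied =
      StringSet.inter state.always_supplied (StringSet.of_list supplied) }

let () =
  let s0 = init [ "x"; "y" ] in
  let s1 = apply_call s0 ~supplied:[ "x" ] in
  (* s0 is unchanged; s1 reflects the call *)
  assert (StringSet.elements s1.always_supplied = [ "x" ]);
  assert (StringSet.elements s0.always_supplied = [ "x"; "y" ]);
  print_endline "ok"
```

Since every call produces a fresh value, the computed states can be collected into a position-keyed map (the OptionalArgsState.t of the commit) and looked up by the solver without any declaration ever being written to.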
Extract focused modules from Common.ml:
- Cli.ml: CLI option refs
- Pos.ml: Position utilities
- PosSet.ml, PosHash.ml: Position collections
- StringSet.ml, FileSet.ml, FileHash.ml: String/file collections
- DcePath.ml: Dead code path type (renamed from Path to avoid shadowing)
- Decl.ml: Declaration types (Kind, t, posAdjustment)
- Issue.ml: Issue types (severity, deadWarning, description, etc.)
- LocSet.ml: Location set
- OptionalArgs.ml, OptionalArgsState.ml: Optional args tracking

This eliminates the Common.ml 'kitchen sink' that was causing:
- Circular dependency issues
- Poor code organization
- Difficulty understanding module boundaries

Each module now has a single responsibility.

Signed-off-by: Cristiano Calcagno
- Rename reportDead → solveDead to reflect its pure nature
- Add -test-shuffle CLI flag to randomize file processing order
- Add test-order-independence.sh script (runs 3 shuffled iterations)
- Add make test-reanalyze-order-independence target (not in default test)
- Add ARCHITECTURE.md with full pipeline diagram and documentation
- Mark Task 11 complete in DEADCODE_REFACTOR_PLAN.md
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 let shouldAdd =
-  match PosHash.find_opt decls posTo with
+  match Declarations.find_opt_builder decls posTo with
   | Some {declKind = Value {optionalArgs}} ->
     not (OptionalArgs.isEmpty optionalArgs)
   | _ -> false
```
Capture optional args across files
Cross-file higher-order calls are no longer tracked for optional argument analysis. addFunctionReference filters on Declarations.find_opt_builder decls posTo, but decls is the per-file builder passed from DeadValue.processValueDependency (value dependencies emitted from other modules). When posTo refers to a function defined in a different .cmt, this lookup always fails, so the cross-file function reference is dropped and optional-argument state is never propagated between modules. As a result, functions with optional args that are passed across module boundaries will keep their initial state and can be incorrectly reported as unused/always-supplied even though other files use them.
Reanalyze: Pure Pipeline Refactor for Dead Code Analysis
Summary
This PR refactors the reanalyze dead code analysis from a mutation-heavy, order-dependent architecture into a pure pipeline.
Architecture
The DCE analysis is now structured as a 4-phase pure pipeline. See ARCHITECTURE.md for the full documentation with detailed diagrams.

Key Changes
Global state elimination:
- `currentSrc`, `currentModule`, `currentModuleName` → explicit `file_context`
- `decls`, `ValueReferences.table`, `TypeReferences.table` → per-file builders
- `FileReferences.table` → `FileDeps` module with builder pattern
- Explicit `DceConfig.t` parameter threading
- `Common.ml` kitchen sink module → dedicated modules

Immutable data structures after AST processing:
- `FileAnnotations.t` - source annotations (`@dead`, `@live`)
- `Declarations.t` - all exported declarations
- `References.t` - value/type references
- `FileDeps.t` - cross-file dependencies
- `OptionalArgsState.t` - computed optional arg state
- `AnalysisResult.t` - solver output with issues

Pure solver:
- `DeadCommon.solveDead` takes all inputs explicitly, returns `AnalysisResult.t`
- `reportDead` renamed to `solveDead` to reflect its pure nature

Order-independence verification:
- `-test-shuffle` CLI flag for testing
- `make test-reanalyze-order-independence` test that verifies shuffled file orders produce identical results

Why This Matters
- Incremental updates - when a file changes, we can re-run Phase 1 for just that file, then re-merge and re-solve (pure operations on immutable data)
- Testability - each phase can be tested independently with pure inputs/outputs
- Parallelization potential - Phases 1-3 work on immutable data
- Easier reasoning - no hidden order dependencies or global state mutations
Documentation
- `ARCHITECTURE.md` - Full architecture documentation with pipeline diagram
- `DEADCODE_REFACTOR_PLAN.md` - Detailed refactor plan with task tracking