Reanalyze: refactor DCE to pure pipeline architecture #8043
base: master
Conversation
I don't see an explicit mention of monorepos or npm/yarn/pnpm workspaces. In this context "dependencies" only refers to other files, but there is still a missing use case when one has shared code like this: where the root package.json has

"workspaces": [
  "common",
  "mobile",
  "web"
]

and the root has

"dependencies": [
  "common",
  "mobile",
  "web"
]

whereas one package has "dependencies": [] and both of the others have

"dependencies": [
  "common"
]

And we want DCE to mark stuff as dead that is part of the shared package. This needs to be an option to configure because when e.g. developing a library with tests, I might have tests dependent on the main package, but not 100 % coverage. The knowledge about package manager monorepos is mostly in the package manager itself. Also tagging @nojaf and @DZakh who are users of bun and pnpm monorepos.
This is a behavior-preserving refactor.
- Introduce BindingContext in DeadValue to track the current binding location during dead-code traversal, so binding context is explicit and locally encapsulated.
- Introduce ReportingContext in DeadCommon to track, per file, the end position of the last reported value when deciding whether to suppress nested warnings.
- Replace addValueReference_state with addValueReference ~binding, so value-reference bookkeeping is driven by an explicit binding location rather than a threaded analysis state.
- Update dead-code value and exception handling to use the new addValueReference API.
- Refresh DEADCODE_REFACTOR_PLAN.md to mark these state-localisation steps as completed and to narrow the remaining follow-up to making the binding context fully pure.
- Verified with make test-analysis that behaviour and expected outputs remain unchanged.
- Remove BindingContext module wrapper (was just forwarding to Current)
- Remove Current module entirely (unnecessary abstraction)
- Simplify to pass Location.t directly instead of a record type
- Remove unused max_value_pos_end field
- Refactor traverseStructure to use pure functional mapper creation
- Update DEADCODE_REFACTOR_PLAN.md to mark task 4.3 as complete

This eliminates ~40 lines of wrapper code and makes the binding state tracking pure and simpler to understand.
The original plan was too granular, with many 'add scaffolding but don't use it yet' tasks. This rewrite focuses on:
- Problem-first structure: each task solves a real architectural issue
- Combined related changes: no pointless intermediate states
- Clear value propositions: why each task matters
- Testable success criteria: how we know it worked
- Realistic effort estimates

Reduces 14 fine-grained tasks down to 10 focused tasks that each leave the codebase measurably better.

Signed-off-by: Cursor AI <cursor@cursor.com>
Replace global config reads with explicit ~config parameter threading throughout the DCE analysis pipeline. This makes the analysis pure and testable with different configurations.

## Changes

### New module
- DceConfig: Encapsulates DCE configuration (cli + run config)
- DceConfig.current() captures global state once
- All analysis functions now take an explicit ~config parameter

### DCE Analysis (fully pure - no global reads)
- DeadCode: threads config to all Dead* modules
- DeadValue: replaced ~15 !Cli.debug reads with config.cli.debug
- DeadType: replaced ~7 !Cli.debug reads with config.cli.debug
- DeadOptionalArgs: takes ~config, passes to Log_.warning
- DeadModules: uses config.run.transitive
- DeadCommon: threads config through the reporting pipeline
- WriteDeadAnnotations: uses config.cli.write/json
- ProcessDeadAnnotations: uses config.cli.live_names/live_paths

### Logging infrastructure
- Log_.warning: now requires ~config (no optional)
- Log_.logIssue: now requires ~config (no optional)
- Log_.Stats.report: now requires ~config (no optional)
- Consistent API - no conditional logic on Some/None

### Non-DCE analyses (call DceConfig.current() at use sites)
- Exception: 4 call sites updated
- Arnold: 7 call sites updated
- TODO: Thread config through these for full purity

### Other
- Common.ml: removed unused lineAnnotationStr field
- Reanalyze: single DceConfig.current() call at entry point
- DEADCODE_REFACTOR_PLAN.md: updated Task 2, added verification task

## Impact
- ✅ DCE analysis is now pure - takes explicit config, no global reads
- ✅ All config parameters required (zero 'config option' types)
- ✅ Can run analysis with different configs without mutating globals
- ✅ All tests pass - no regressions

## Remaining Work (Task 2)
- Thread config through Exception/Arnold to eliminate DceConfig.current()
- Verify zero DceConfig.current() calls in analysis code

Signed-off-by: Cursor AI <ai@cursor.com>
Signed-off-by: Cristiano Calcagno <cristianoc@users.noreply.github.com>
- Replace DceConfig.current() and !Common.Cli.debug with explicit config parameter
- Thread config through Arnold.ml functions (Stats, ExtendFunctionTable, CheckExpressionWellFormed, Compile, Eval)
- Thread config through Exception.ml functions (Event.combine, Checks.doCheck/doChecks, traverseAst)
- Update Reanalyze.ml to pass config to all analysis functions
- Improves testability and eliminates global state dependencies
Task 1 of the dead code refactor plan: eliminate global mutable state for current file context.

Changes:
- Add DeadCommon.FileContext.t with source_path, module_name, is_interface
- Thread ~file parameter through DeadCode, DeadValue, DeadType, DeadCommon
- Thread ~file through Exception.processCmt and Arnold.processCmt
- Remove Common.currentSrc, currentModule, currentModuleName globals

Design improvement:
- FileContext.module_name is now a raw string (e.g. "ExnB"), not Name.t
- Added FileContext.module_name_tagged helper to create Name.t when needed
- This avoids confusion: raw name for hashtable keys, tagged name for paths
- Previously the interface encoding (+prefix) leaked into code that expected raw names

This makes it possible to process files concurrently or out of order, as analysis no longer depends on hidden global state for file context.
- Create AnnotationState module with explicit state type and accessor functions
- Thread annotation_state through DeadCode.processCmt and the Reanalyze pipeline
- Update declIsDead, doReportDead, resolveRecursiveRefs to use explicit state
- Update DeadOptionalArgs.check to take explicit state
- Remove global positionsAnnotated hashtable from ProcessDeadAnnotations
- Remove unused ~config parameter from iterFilesFromRootsToLeaves

Note: The current implementation still mixes input (source annotations) with output (analysis results). This will be properly separated in Task 8.
Key design principles added:
- Separate per-file input (keyed by filename) from project-wide analysis
- Analysis results are immutable - returned by the solver, not mutated
- Enable incremental updates by replacing one file's data

Updated tasks to emphasize:
- Per-file state with merge functions for the project-wide view
- Solver returns AnalysisResult.t instead of mutating its input
- Task 3 marked partial (input/output mixing fixed in Task 8)
This refactors source annotations to use a clean architectural pattern:
1. **Two types**: FileAnnotations.builder (mutable) and FileAnnotations.t (immutable)
2. **Map phase**: process_cmt_file returns a builder per file
3. **Collect phase**: processCmtFiles collects builders into a list
4. **Merge phase**: merge_all combines all builders into immutable t
5. **Analyze phase**: solver receives immutable t (read-only)

Key benefits:
- Order independence: builders can be collected in any order
- Parallelizable: map phase can run concurrently (future)
- Incremental: replace one builder, re-merge (future)
- Type-safe: t has no mutation functions in its API

New files:
- FileAnnotations.ml/.mli: builder/t types with clear API separation
- DceFileProcessing.ml/.mli: AST processing returns builder

Pattern documented in DEADCODE_REFACTOR_PLAN.md as a reusable template for Tasks 4-7 (declarations, references, delayed items, file deps).
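The builder/t pattern above can be sketched in a few lines of OCaml. This is a minimal illustration of the idea, not the actual FileAnnotations API: the signature exposes mutation only on `builder`, while `t` is read-only by construction.

```ocaml
(* Sketch of the builder (mutable) / t (immutable) pattern.
   Names are illustrative; the real FileAnnotations module differs. *)
module Annotations : sig
  type builder
  type t (* no mutation functions exposed on t *)
  val create : unit -> builder
  val add : builder -> string -> unit
  val merge_all : builder list -> t
  val mem : t -> string -> bool
end = struct
  type builder = (string, unit) Hashtbl.t
  type t = string list (* frozen view *)
  let create () = Hashtbl.create 16
  let add b name = Hashtbl.replace b name ()
  let merge_all builders =
    List.concat_map
      (fun b -> Hashtbl.fold (fun k () acc -> k :: acc) b [])
      builders
  let mem t name = List.mem name t
end

let () =
  (* Map phase: one builder per file, filled in any order *)
  let b1 = Annotations.create () in
  Annotations.add b1 "A.f";
  let b2 = Annotations.create () in
  Annotations.add b2 "B.g";
  (* Merge phase: freeze into an immutable view for the solver *)
  let frozen = Annotations.merge_all [ b2; b1 ] in
  assert (Annotations.mem frozen "A.f" && Annotations.mem frozen "B.g");
  print_endline "ok"
```

Because the merge is the only way to obtain a `t`, the solver cannot accidentally observe a half-built state, and the order in which builders are collected does not matter.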
Applies the same architectural pattern from Task 3 (annotations) to declarations.
## New modules
- Declarations.ml/.mli: builder (mutable) and t (immutable) types
- CollectAnnotations.ml/.mli: AST traversal for @dead/@live/@genType annotations
## Key changes
1. **Declarations pattern**:
- process_cmt_file creates local Declarations.builder
- processCmtFiles collects file_data list (annotations + decls)
- Declarations.merge_all combines into immutable t
- Solver uses Declarations.t (read-only)
2. **Global state removed**:
- Deleted global DeadCommon.decls
- All declaration access now through explicit ~decls parameter
3. **AST processing separation**:
- CollectAnnotations.ml: annotation traversal (~150 lines)
- DceFileProcessing.ml: coordinator (~80 lines, was ~230)
4. **Threading ~decls:builder through AST processing**:
- addDeclaration_, addValueDeclaration
- DeadValue.processStructure, processSignatureItem, traverseStructure
- DeadType.addDeclaration
- DeadException.add
- DeadOptionalArgs.addFunctionReference
## Data flow
process_cmt_file (per-file)
├── CollectAnnotations → FileAnnotations.builder
└── DeadValue/Type/Exception → Declarations.builder
Merge phase:
FileAnnotations.merge_all → FileAnnotations.t
Declarations.merge_all → Declarations.t
Solver:
reportDead ~annotations ~decls (both immutable)
## Benefits
- Order independence: files can be processed in any order
- Parallelizable: map phase can run concurrently (future)
- Incremental: replace one file's builders, re-merge (future)
- Type-safe: t types have no mutation functions
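Item 4 above (threading `~decls:builder` through AST processing) is the mechanical part of the refactor: every function that used to write to a global table instead receives the builder as a labelled argument. A hypothetical sketch of the shape, with invented names:

```ocaml
(* Sketch: replacing a global table with an explicit ~decls labelled
   argument threaded through processing functions. Names are hypothetical. *)
module Declarations = struct
  type builder = (int, string) Hashtbl.t (* position -> declaration name *)
  let create () : builder = Hashtbl.create 16
  let add (b : builder) ~pos ~name = Hashtbl.replace b pos name
  let find_opt (b : builder) pos = Hashtbl.find_opt b pos
end

(* Before: this would have read and written a global table.
   After: the builder is passed explicitly, so each file's
   processing is self-contained. *)
let add_value_declaration ~decls ~pos ~name =
  Declarations.add decls ~pos ~name

let process_file ~decls items =
  List.iter (fun (pos, name) -> add_value_declaration ~decls ~pos ~name) items

let () =
  let decls = Declarations.create () in
  process_file ~decls [ (1, "x"); (2, "y") ];
  assert (Declarations.find_opt decls 1 = Some "x");
  print_endline "ok"
```

The payoff is that two files can each get their own builder, so per-file processing has no shared mutable state to race on or order-depend on.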
Applies the map → list → merge pattern to references and cross-file items.
## Task 5: References
New module: References.ml/.mli
- builder (mutable) for AST processing
- t (immutable) for solver
- Tracks both value refs and type refs
- PosSet for position sets
Changes:
- Thread ~refs:References.builder through AST processing
- addValueReference, addTypeReference use References API
- Solver uses References.find_value_refs, References.find_type_refs
- Deleted global ValueReferences.table and TypeReferences.table
## Task 6: CrossFileItems (renamed from DelayedItems)
New module: CrossFileItems.ml/.mli
- builder (mutable) for AST processing
- t (immutable) for processing after merge
- Three item types: exception_refs, optional_arg_calls, function_refs
Changes:
- Thread ~cross_file:CrossFileItems.builder through AST processing
- DeadException.markAsUsed adds to cross_file builder
- DeadOptionalArgs.addReferences, addFunctionReference add to cross_file
- Deleted global delayedItems refs from DeadException and DeadOptionalArgs
## Data flow
process_cmt_file (per-file)
→ file_data { annotations; decls; refs; cross_file }
Merge phase:
FileAnnotations.merge_all → annotations (immutable)
Declarations.merge_all → decls (immutable)
CrossFileItems.merge_all → cross_file (immutable)
References builders merged into refs_builder
Process cross-file items:
process_exception_refs → writes to refs_builder
process_optional_args → reads decls
Freeze:
refs_builder → refs (immutable)
Solver:
reportDead ~annotations ~decls ~refs
## Global state deleted
- DeadCommon.ValueReferences.table
- DeadCommon.TypeReferences.table
- DeadException.delayedItems
- DeadOptionalArgs.delayedItems
- DeadOptionalArgs.functionReferences
## Naming
Renamed DelayedItems → CrossFileItems because it better describes
the semantic meaning: items that reference across file boundaries.
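The key property of cross-file items is that they are only recorded during the per-file map phase and resolved after the merge, once all declarations are known. A minimal sketch under assumed, illustrative types (the real CrossFileItems module tracks three item kinds, not one):

```ocaml
(* Sketch: cross-file references recorded per file, resolved only after
   merge, so resolution never depends on file processing order. *)
type exception_ref = { from_file : string; target : string }

module CrossFileItems = struct
  type builder = exception_ref list ref
  let create () : builder = ref []
  let add_exception_ref b r = b := r :: !b
  let merge_all (bs : builder list) : exception_ref list =
    List.concat_map (fun b -> !b) bs
end

let () =
  (* Per-file map phase: A.ml references an exception defined in B *)
  let b_a = CrossFileItems.create () in
  CrossFileItems.add_exception_ref b_a { from_file = "A.ml"; target = "B.Exn" };
  let b_b = CrossFileItems.create () in
  (* Merge, then resolve against the full declaration set
     (modelled here as a plain list) *)
  let all = CrossFileItems.merge_all [ b_a; b_b ] in
  let declared = [ "B.Exn" ] in
  let resolved = List.filter (fun r -> List.mem r.target declared) all in
  assert (List.length resolved = 1);
  print_endline "ok"
```

Deferring resolution this way is what lets the deleted `delayedItems` globals go: nothing has to wait in hidden mutable state for "the other file" to be processed.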
Applies the map → list → merge pattern to file dependencies.
## New module: FileDeps.ml/.mli
- `builder` (mutable) for AST processing
- `t` (immutable) for solver
- `add_file`: Register a file as existing
- `add_dep`: Add a dependency from one file to another
- `merge_all`: Merge all builders into immutable result
- `iter_files_from_roots_to_leaves`: Pure topological ordering
## Changes
- Thread `~file_deps:FileDeps.builder` through AST processing
- `addValueReference` records cross-file dependencies to builder
- `process_cmt_file` returns `file_deps` in `file_data`
- `reportDead` takes `~file_deps:FileDeps.t` (immutable)
- Moved topological sort from DeadCommon to FileDeps (pure function)
## Global state deleted
- `Common.FileReferences.table` - replaced by per-file FileDeps builders
## Data flow
process_cmt_file (per-file)
→ file_data { ..., file_deps: builder }
Merge phase:
FileDeps builders merged for cross-file items
Freeze:
file_deps_builder → file_deps (immutable)
Solver:
reportDead ~file_deps (uses iter_files_from_roots_to_leaves)
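The "pure topological ordering" mentioned above can be sketched as a depth-first post-order walk over an immutable dependency map. This is an illustration in the spirit of `iter_files_from_roots_to_leaves`, not its actual implementation (in particular it ignores cycles, which a real version must handle):

```ocaml
(* Sketch of a pure roots-to-leaves ordering over a dependency map.
   deps maps a file to the files it depends on; the result lists
   dependents before their dependencies. *)
module StringMap = Map.Make (String)

let order_roots_to_leaves (deps : string list StringMap.t) : string list =
  let visited = Hashtbl.create 16 in
  let out = ref [] in
  let rec visit f =
    if not (Hashtbl.mem visited f) then begin
      Hashtbl.replace visited f ();
      let ds = Option.value ~default:[] (StringMap.find_opt f deps) in
      List.iter visit ds;
      (* post-order cons: f ends up before everything it depends on *)
      out := f :: !out
    end
  in
  StringMap.iter (fun f _ -> visit f) deps;
  !out

let () =
  let deps = StringMap.(empty |> add "Main" [ "Lib" ] |> add "Lib" []) in
  assert (order_roots_to_leaves deps = [ "Main"; "Lib" ]);
  print_endline "ok"
```

Because the function only reads the map and returns a list, it can be called any number of times with the frozen `FileDeps.t` without affecting analysis state.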
The `-write` flag that auto-inserted `@dead` annotations into source files was removed, as it added significant complexity for a rarely used feature.

## Deleted
- `WriteDeadAnnotations.ml` (156 lines)
- `Common.Cli.write` ref
- `DceConfig.cli.write` field
- `type line` and `type lineAnnotation` in Common.ml
- `shouldWriteLineAnnotation` and `lineAnnotation` fields in DeadWarning
- `-write` CLI argument
- `~config` parameter from `logAdditionalInfo` (now unused)

## Simplified
- `emitWarning` no longer computes line annotations
- `logAdditionalInfo` no longer needs a config parameter
- DeadWarning type now just has: deadWarning, path, message

## Rationale
The feature:
- Added file I/O during analysis (violated pure analysis principles)
- Maintained global state (currentFile, currentFileLines refs)
- Required threading lineAnnotation through the warning system
- Was rarely used (most users want to delete dead code, not annotate it)

Users who want to suppress warnings can still manually add `@dead` annotations.
This completes Tasks 8 and 8b of the DCE refactor plan, making the analysis
phase fully pure - returning an immutable AnalysisResult.t with no inline logging.
Key changes:
- Add AnalysisResult module with immutable result type and issue constructors
- makeDeadIssue: pure function to create dead code issues
- Decl.report: returns issue list (includes dead module + dead value issues)
- DeadOptionalArgs.check: returns issue list instead of logging
- DeadModules.checkModuleDead: returns issue option instead of logging
- reportDead: returns AnalysisResult.t, caller logs issues
- resolveRecursiveRefs: collects all issues (optional args, incorrect annotations)
via ~issues ref instead of logging inline
Architecture after this change:
merged_view (immutable)
│
▼
reportDead (pure function, no logging)
│
▼
AnalysisResult.t (immutable)
│
▼
report (all side effects here)
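The diagram above boils down to a "pure solver, effectful reporter" split: the solver returns an immutable list of issues and all printing happens afterwards. A minimal sketch with invented types (the real AnalysisResult.t and solver are richer):

```ocaml
(* Sketch: the solver computes issues without logging; the caller
   performs all side effects. Names and types are illustrative. *)
type issue = { path : string; message : string }

(* Pure: no logging, no mutation of inputs. A declaration with no
   recorded references becomes a dead-code issue. *)
let solve_dead ~decls ~refs : issue list =
  List.filter_map
    (fun (pos, name) ->
      if List.mem_assoc pos refs then None
      else Some { path = name; message = "is never used" })
    decls

(* Effectful: the only place output happens *)
let report issues =
  List.iter (fun i -> Printf.printf "%s %s\n" i.path i.message) issues

let () =
  let decls = [ (1, "A.dead"); (2, "A.live") ] in
  let refs = [ (2, [ 10 ]) ] in
  let issues = solve_dead ~decls ~refs in
  assert (List.map (fun i -> i.path) issues = [ "A.dead" ]);
  report issues
```

With this split, tests can assert directly on the returned issue list instead of capturing log output, which is what makes the solver testable in isolation.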
OptionalArgs.t is now fully immutable, with no mutation of declarations.

Key changes:
- OptionalArgs.t: removed mutable fields
- apply_call: pure function, returns new state
- combine_pair: pure function, returns pair of new states
- OptionalArgsState module in Common.ml for the computed state map
- compute_optional_args_state: returns immutable OptionalArgsState.t
- DeadOptionalArgs.check: looks up state from the map

Architecture:
- Declaration's optionalArgs = initial state (what args exist)
- OptionalArgsState.t = computed state (after all calls/combines)
- Solver uses OptionalArgsState.find_opt to get the final state

This completes the pure analysis pipeline - no mutation anywhere.
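An immutable `apply_call` of this kind can be sketched as a fold over call sites, where each call returns a new state instead of mutating the declaration. The field names below are assumptions for illustration, not the real OptionalArgs record:

```ocaml
(* Sketch: optional-argument state as an immutable value; apply_call
   returns a new state rather than mutating anything. Illustrative only. *)
module StringSet = Set.Make (String)

type optional_args = {
  declared : StringSet.t;        (* optional args the function declares *)
  always_supplied : StringSet.t; (* supplied at every call seen so far *)
}

let init declared_names =
  let s = StringSet.of_list declared_names in
  { declared = s; always_supplied = s }

(* Pure: intersect with the args supplied at this call site *)
let apply_call state ~supplied =
  { state with
    always_supplied =
      StringSet.inter state.always_supplied (StringSet.of_list supplied) }

let () =
  let s0 = init [ "x"; "y" ] in
  let s1 = apply_call s0 ~supplied:[ "x" ] in
  (* s0 is unchanged; s1 reflects the call *)
  assert (StringSet.elements s1.always_supplied = [ "x" ]);
  assert (StringSet.elements s0.always_supplied = [ "x"; "y" ]);
  print_endline "ok"
```

Since every call produces a fresh value, the computed states can be collected into a position-keyed map (the OptionalArgsState.t of the commit) and looked up by the solver without any declaration ever being written to.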
Extract focused modules from Common.ml:
- Cli.ml: CLI option refs
- Pos.ml: Position utilities
- PosSet.ml, PosHash.ml: Position collections
- StringSet.ml, FileSet.ml, FileHash.ml: String/file collections
- DcePath.ml: Dead code path type (renamed from Path to avoid shadowing)
- Decl.ml: Declaration types (Kind, t, posAdjustment)
- Issue.ml: Issue types (severity, deadWarning, description, etc.)
- LocSet.ml: Location set
- OptionalArgs.ml, OptionalArgsState.ml: Optional args tracking

This eliminates the Common.ml 'kitchen sink' that was causing:
- Circular dependency issues
- Poor code organization
- Difficulty understanding module boundaries

Each module now has a single responsibility.

Signed-off-by: Cristiano Calcagno
- Rename reportDead → solveDead to reflect its pure nature
- Add -test-shuffle CLI flag to randomize file processing order
- Add test-order-independence.sh script (runs 3 shuffled iterations)
- Add make test-reanalyze-order-independence target (not in default test)
- Add ARCHITECTURE.md with full pipeline diagram and documentation
- Mark Task 11 complete in DEADCODE_REFACTOR_PLAN.md
💡 Codex Review
Here are some automated review suggestions for this pull request.
```diff
 let shouldAdd =
-  match PosHash.find_opt decls posTo with
+  match Declarations.find_opt_builder decls posTo with
   | Some {declKind = Value {optionalArgs}} ->
     not (OptionalArgs.isEmpty optionalArgs)
   | _ -> false
```
Capture optional args across files
Cross-file higher-order calls are no longer tracked for optional argument analysis. addFunctionReference filters on Declarations.find_opt_builder decls posTo, but decls is the per-file builder passed from DeadValue.processValueDependency (value dependencies emitted from other modules). When posTo refers to a function defined in a different .cmt, this lookup always fails, so the cross-file function reference is dropped and optional-argument state is never propagated between modules. As a result, functions with optional args that are passed across module boundaries will keep their initial state and can be incorrectly reported as unused/always-supplied even though other files use them.
Reanalyze: Pure Pipeline Refactor for Dead Code Analysis
Summary
This PR refactors the reanalyze dead code analysis from a mutation-heavy, order-dependent architecture into a pure pipeline.
Architecture
The DCE analysis is now structured as a 4-phase pure pipeline. See ARCHITECTURE.md for the full documentation with detailed diagrams.

Key Changes
Global state elimination:
- `currentSrc`, `currentModule`, `currentModuleName` → explicit `file_context`
- `decls`, `ValueReferences.table`, `TypeReferences.table` → per-file builders
- `FileReferences.table` → `FileDeps` module with builder pattern
- Explicit `DceConfig.t` parameter threading
- `Common.ml` kitchen sink module → dedicated modules

Immutable data structures after AST processing:
- `FileAnnotations.t` - source annotations (`@dead`, `@live`)
- `Declarations.t` - all exported declarations
- `References.t` - value/type references
- `FileDeps.t` - cross-file dependencies
- `OptionalArgsState.t` - computed optional arg state
- `AnalysisResult.t` - solver output with issues

Pure solver:
- `DeadCommon.solveDead` takes all inputs explicitly, returns `AnalysisResult.t`
- `reportDead` renamed to `solveDead` to reflect its pure nature

Order-independence verification:
- `-test-shuffle` CLI flag for testing
- `make test-reanalyze-order-independence` test that verifies shuffled file orders produce identical results

Why This Matters
- Incremental updates - when a file changes, we can re-run Phase 1 for just that file, then re-merge and re-solve (pure operations on immutable data)
- Testability - each phase can be tested independently with pure inputs/outputs
- Parallelization potential - Phases 1-3 work on immutable data
- Easier reasoning - no hidden order dependencies or global state mutations
Documentation
- `ARCHITECTURE.md` - Full architecture documentation with pipeline diagram
- `DEADCODE_REFACTOR_PLAN.md` - Detailed refactor plan with task tracking