TelosLabs · moyom96 · Mar 12, 2026
diff --git a/.github/prompts/tech_debt_analysis.md b/.github/prompts/tech_debt_analysis.md
@@ -0,0 +1,89 @@
+# Role
+
+You are a principal-level Ruby on Rails architect performing a semantic tech debt review.
+
+You specialize in detecting structural debt that static linters cannot catch, with particular attention to **AI-generated debt** -- patterns commonly introduced by AI coding agents: duplicated business logic across modules, ghost methods that are defined but never wired into the application, and bypassed Rails conventions (skipping service objects, concerns, or query objects in favor of inline procedural code).
+
+# Input Format
+
+You will receive a JSON object with:
+
+- `candidates`: an array of signals from static analysis tools (dead code detectors, complexity scorers). Each has `file`, `identifier`, `type`, `detail`, and `score`.
+- `code_snippets`: a map of `{ "file_path": "source code contents" }` for the flagged files.
+
+# Task
+
+1. Analyze all candidates and their corresponding source code.
+2. Confirm, reject, or reclassify each candidate. Reject weak or noisy signals.
+3. Discover **new** findings not covered by the candidates -- especially semantic duplication and leaked business logic that only appear when reading the actual code.
+4. Merge candidates that point to the same underlying issue into a single finding.
+5. Return a JSON array of actionable findings. If nothing qualifies, return `[]`.
+
+# Debt Type Definitions
+
+| Type | Definition | When to flag |
+|---|---|---|
+| `fat_controller` | A controller action or class that contains business logic, data transformation, or orchestration that belongs in a service/model. | Action > 15 lines of non-routing logic, or controller class doing work beyond params/auth/render. |
+| `leaked_business_logic` | Business rules living outside the domain layer (in controllers, views, helpers, jobs, or rake tasks). | Calculations, state transitions, validations, or policy checks outside models/services. |
+| `semantic_duplication` | Functionally identical or near-identical logic in two or more locations, even if variable names or structure differ. | Two code paths that achieve the same business outcome (e.g., discount calculation in both OrderService and InvoiceService). |
+| `missing_concern` | Shared behavior across multiple models/controllers that should be extracted into a Rails Concern. | Same callback pattern, scope, or method group copy-pasted across 2+ classes. |
+| `dead_code` | Methods, classes, or modules that are defined but never called or referenced anywhere in the application. | Confirmed by static analysis AND code inspection -- not just unused by one caller. |
+| `high_complexity` | A method with deeply nested conditionals, excessive branching, or a high cyclomatic/flog complexity score. | Flog score above the configured threshold, or clearly unreadable control flow. |
+
+# Severity Criteria
+
+- **high**: Actively causes bugs, blocks feature work, or creates significant maintenance burden. Refactoring is urgent.
+- **medium**: Creates friction or risk but does not block day-to-day work. Should be addressed within 1-2 sprints.
+- **low**: Minor code smell or improvement opportunity. Address opportunistically.
+
+# Scoring
+
+The `score` field is a **numeric impact estimate** (a raw, unnormalized number; values above 100 are possible):
+
+- For `high_complexity`: use the flog score directly from the candidate input.
+- For `fat_controller` / `leaked_business_logic`: estimate as lines of misplaced logic.
+- For `semantic_duplication`: estimate as the number of duplicated lines across all locations.
+- For `dead_code`: set to the number of dead lines/methods.
+- For `missing_concern`: estimate as lines of duplicated concern-worthy code.
+
+# canonical_pattern (Semantic Duplication Only)
+
+When `debt_type` is `semantic_duplication`, you MUST provide a `canonical_pattern` -- a stable, descriptive, snake_case slug that identifies the shared behavior independent of file paths or variable names.
+
+Examples:
+- `percentage_based_discount_calculation`
+- `user_role_authorization_check`
+- `date_range_filtering_query`
+- `csv_export_row_formatting`
+
+This slug must be **deterministic**: if the same duplication is detected in a future run (even if files change), the same `canonical_pattern` must be produced. Focus on the *business intent*, not the implementation details.
+
+For all other debt types, set `canonical_pattern` to `null`.
+
+# Output Schema
+
+Return a raw JSON array (no markdown fences, no commentary). Each element:
+
+```json
+{
+  "file_path": "app/controllers/orders_controller.rb",
+  "identifier": "OrdersController#create",
+  "debt_type": "fat_controller",
+  "severity": "high",
+  "title": "OrdersController#create embeds tax calculation logic",
+  "description": "The create action contains 47 lines of tax calculation and discount application that should be extracted to a dedicated TaxCalculator service. This is a common AI-generated pattern where inline logic was preferred over service extraction.",
+  "suggested_refactor": "Extract tax logic to app/services/tax_calculator.rb, call from controller as TaxCalculator.new(order_params).calculate.",
+  "canonical_pattern": null,
+  "score": 47
+}
+```
+
+# Rules
+
+1. **Only reference files and identifiers that appear in the input.** Never fabricate file paths or class names.
+2. **Be strict.** Suppress findings that are marginal, speculative, or would not survive a senior engineer's code review.
+3. **Merge duplicates.** If multiple candidates describe the same underlying problem, emit one finding.
+4. **Prefer Rails conventions.** Suggested refactors should use services, concerns, query objects, form objects, or POROs as appropriate.
+5. **Title must be specific.** Not "Complex method" but "UsersController#update has 6 nested conditionals for role-based field access."
+6. **Description must explain the 'why'.** State the concrete risk or cost, not just the symptom.
+7. **Return `[]` if no findings meet the bar.** An empty array is better than noise.
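Only the prompt text enforces the output schema above. A model can still return an unknown `debt_type`, an invalid severity, a fabricated `file_path`, or a non-null `canonical_pattern` on a non-duplication finding. The following is a minimal Ruby sketch of a post-parse validator. It assumes the triage step (`Semantic::Triage`, not shown in this diff) can hand over the raw response and the list of input file paths. `FindingValidator`, `errors_for`, and `filter` are hypothetical names, not part of this PR.

```ruby
# frozen_string_literal: true

require "json"

# Hypothetical post-parse validator for the triage response (not part of this PR).
# Mirrors the Output Schema and Rules in .github/prompts/tech_debt_analysis.md.
module FindingValidator
  DEBT_TYPES = %w[
    fat_controller leaked_business_logic semantic_duplication
    missing_concern dead_code high_complexity
  ].freeze
  SEVERITIES = %w[high medium low].freeze
  REQUIRED_KEYS = %w[
    file_path identifier debt_type severity title
    description suggested_refactor canonical_pattern score
  ].freeze
  # Lowercase words joined by single underscores, e.g. "user_role_authorization_check".
  SNAKE_CASE = /\A[a-z][a-z0-9]*(?:_[a-z0-9]+)*\z/

  module_function

  # Returns a list of problems for one finding; an empty list means it is valid.
  # allowed_files enforces Rule 1 (only reference files that appear in the input).
  def errors_for(finding, allowed_files:)
    missing = REQUIRED_KEYS - finding.keys
    return ["missing keys: #{missing.join(", ")}"] unless missing.empty?

    errors = []
    type = finding["debt_type"]
    errors << "unknown debt_type: #{type}" unless DEBT_TYPES.include?(type)
    errors << "unknown severity: #{finding["severity"]}" unless SEVERITIES.include?(finding["severity"])
    errors << "file not in input: #{finding["file_path"]}" unless allowed_files.include?(finding["file_path"])
    errors << "score must be numeric" unless finding["score"].is_a?(Numeric)

    pattern = finding["canonical_pattern"]
    if type == "semantic_duplication"
      errors << "canonical_pattern must be a snake_case slug" unless pattern.is_a?(String) && SNAKE_CASE.match?(pattern)
    elsif !pattern.nil?
      errors << "canonical_pattern must be null for #{type}"
    end
    errors
  end

  # Parses the raw model response and keeps only findings that pass validation.
  def filter(raw_json, allowed_files:)
    parsed = JSON.parse(raw_json)
    return [] unless parsed.is_a?(Array)

    parsed.select { |f| f.is_a?(Hash) && errors_for(f, allowed_files: allowed_files).empty? }
  end
end
```

In this design, `allowed_files` would be the keys of the `code_snippets` map sent to the model. Rule 1 ("never fabricate file paths") then becomes a hard check instead of depending only on the prompt.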
diff --git a/.github/workflows/ai_tech_debt_scan.yml b/.github/workflows/ai_tech_debt_scan.yml
@@ -0,0 +1,42 @@
+name: AI Tech Debt Scan
+
+on:
+  push:
+    branches:
+      - feature/agent_debt_collector
+  schedule:
+    - cron: "0 6 * * 1"
+  workflow_dispatch:
+    inputs:
+      dry_run:
+        description: "Run in dry-run mode (no issue creation)"
+        required: false
+        default: "false"
+
+permissions:
+  contents: read
+  issues: write
+
+jobs:
+  tech_debt_scan:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup Ruby
+        uses: ruby/setup-ruby@v1
+        with:
+          ruby-version: "3.3"
+          bundler-cache: true
+
+      - name: Run AI tech debt collector
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        run: |
+          if [ "${{ github.event.inputs.dry_run }}" = "true" ]; then
+            bundle exec ruby bin/agent_debt_collector --dry-run
+          else
+            bundle exec ruby bin/agent_debt_collector
+          fi
diff --git a/.gitignore b/.gitignore
@@ -4,3 +4,4 @@
 *.gem
 Gemfile.lock
 .DS_Store
+.env
diff --git a/Gemfile b/Gemfile
@@ -8,3 +8,11 @@ group :development do
   gem "rake"
   gem "standard"
 end
+
+group :tech_debt do
+  gem "debride", "~> 1.12"
+  gem "faraday-retry"
+  gem "flog", "~> 4.8"
+  gem "octokit", "~> 9.0"
+  gem "ruby-openai", "~> 7.0"
+end
diff --git a/bin/agent_debt_collector b/bin/agent_debt_collector
@@ -0,0 +1,42 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require "optparse"
+require "json"
+require_relative "../lib/tech_debt/config"
+require_relative "../lib/tech_debt/analyzer"
+
+options = {
+  config_path: "config/tech_debt_settings.yml",
+  prompt_path: ".github/prompts/tech_debt_analysis.md",
+  dry_run: false,
+  skip_llm: false
+}
+
+OptionParser.new do |opts|
+  opts.banner = "Usage: bin/agent_debt_collector [options]"
+
+  opts.on("--config PATH", "Path to tech debt settings YAML") do |value|
+    options[:config_path] = value
+  end
+
+  opts.on("--prompt PATH", "Path to semantic triage prompt markdown") do |value|
+    options[:prompt_path] = value
+  end
+
+  opts.on("--dry-run", "Run analysis but skip GitHub API issue operations") do
+    options[:dry_run] = true
+  end
+
+  opts.on("--skip-llm", "Skip LLM triage and use raw collector findings") do
+    options[:skip_llm] = true
+  end
+end.parse!
+
+config = TechDebt::Config.load(options[:config_path])
+summary = TechDebt::Analyzer.new(config, prompt_path: options[:prompt_path]).run(
+  dry_run: options[:dry_run],
+  skip_llm: options[:skip_llm]
+)
+
+puts JSON.pretty_generate(summary)
diff --git a/config/tech_debt_settings.yml b/config/tech_debt_settings.yml
@@ -0,0 +1,63 @@
+version: 1
+
+llm:
+  provider: "openai"
+  model: "gpt-4.1"
+  api_key_env: "OPENAI_API_KEY"
+  max_tokens: 16384
+  temperature: 0.2
+
+analysis:
+  paths:
+    - "app/controllers/**/*.rb"
+    - "app/models/**/*.rb"
+    - "app/services/**/*.rb"
+    - "app/queries/**/*.rb"
+    - "app/jobs/**/*.rb"
+    - "lib/**/*.rb"
+  exclude_paths:
+    - "vendor/**"
+    - "db/migrate/**"
+    - "node_modules/**"
+  debt_types:
+    fat_controller:
+      enabled: true
+      threshold_lines: 100
+      severity: "high"
+    leaked_business_logic:
+      enabled: true
+      severity: "high"
+    semantic_duplication:
+      enabled: true
+      similarity_threshold: 0.85
+      severity: "medium"
+    dead_code:
+      enabled: true
+      severity: "low"
+    missing_concern:
+      enabled: true
+      severity: "medium"
+    high_complexity:
+      enabled: true
+      flog_threshold: 25
+      severity: "high"
+
+github:
+  repo: null
+  labels:
+    - name: "tech-debt"
+      color: "d93f0b"
+    - name: "ai-detected"
+      color: "7057ff"
+    - name: "severity:high"
+      color: "b60205"
+    - name: "severity:medium"
+      color: "fbca04"
+    - name: "severity:low"
+      color: "0e8a16"
+  issue_prefix: "[Tech Debt]"
+  max_issues_per_run: 10
+
+reporting:
+  generate_summary: true
+  summary_path: "tmp/tech_debt_report.json"
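`bin/agent_debt_collector` calls `TechDebt::Config.load`, and `Analyzer` reads `@config.analysis`, `@config.github`, and `@config.reporting` as string-keyed hashes. However, `lib/tech_debt/config.rb` is not included in this diff. The following is a hypothetical sketch of a loader that would satisfy those call sites, assuming plain `YAML.safe_load` and `{}` defaults for missing sections. The `enabled_debt_types` helper is an illustrative addition, not something this PR defines.

```ruby
# frozen_string_literal: true

require "yaml"

module TechDebt
  # Hypothetical sketch of lib/tech_debt/config.rb, which this diff requires
  # but does not include. Each top-level section becomes a string-keyed Hash,
  # matching how Analyzer reads it (analysis.fetch("debt_types", {}), etc.).
  class Config
    attr_reader :llm, :analysis, :github, :reporting

    def self.load(path)
      new(YAML.safe_load(File.read(path)) || {})
    end

    def initialize(data)
      # Missing sections default to {} so callers can use fetch with defaults.
      @llm = data["llm"] || {}
      @analysis = data["analysis"] || {}
      @github = data["github"] || {}
      @reporting = data["reporting"] || {}
    end

    # Names of debt types whose `enabled` flag is truthy; a missing flag counts as enabled.
    def enabled_debt_types
      analysis.fetch("debt_types", {}).select { |_, opts| opts.fetch("enabled", true) }.keys
    end
  end
end
```

In the code shown in this diff, the `--skip-llm` path (`candidates_to_findings`) does not consult the `enabled` flags. A helper like `enabled_debt_types` is one possible way to filter candidates there.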
diff --git a/lib/tech_debt/analyzer.rb b/lib/tech_debt/analyzer.rb
@@ -0,0 +1,116 @@
+# frozen_string_literal: true
+
+require "fileutils"
+require "json"
+require_relative "collectors/debride_collector"
+require_relative "collectors/complexity_collector"
+require_relative "github/fingerprint"
+require_relative "github/issue_manager"
+require_relative "semantic/triage"
+
+module TechDebt
+  class Analyzer
+    def initialize(config, prompt_path:)
+      @config = config
+      @prompt_path = prompt_path
+    end
+
+    def run(dry_run: false, skip_llm: false)
+      candidates = collect_candidates
+      findings = if skip_llm
+                   candidates_to_findings(candidates)
+                 else
+                   Semantic::Triage.new(@config, prompt_path: @prompt_path).call(candidates)
+                 end
+
+      summary = process_findings(findings, dry_run: dry_run)
+      write_summary(summary) if @config.reporting["generate_summary"]
+      summary
+    end
+
+    private
+
+    def collect_candidates
+      collectors = [
+        Collectors::DebrideCollector.new(@config),
+        Collectors::ComplexityCollector.new(@config)
+      ]
+
+      collectors.flat_map(&:call)
+    end
+
+    def candidates_to_findings(candidates)
+      severity_map = @config.analysis.fetch("debt_types", {}).transform_values { |v| v["severity"] || "medium" }
+      candidates.map do |item|
+        {
+          "file_path" => item[:file],
+          "identifier" => item[:identifier],
+          "debt_type" => item[:type],
+          "severity" => severity_map.fetch(item[:type], "medium"),
+          "title" => "#{item[:type].tr('_', ' ')} detected for #{item[:identifier]}",
+          "description" => item[:detail],
+          "suggested_refactor" => "Review and refactor this section following Rails conventions.",
+          "canonical_pattern" => nil,
+          "score" => item[:score]
+        }
+      end
+    end
+
+    def process_findings(findings, dry_run:)
+      return dry_run_summary(findings.first(max_issues_per_run)) if dry_run
+
+      manager = Github::IssueManager.new(@config)
+      manager.ensure_labels!
+
+      created = []
+      skipped = []
+      findings.each do |item|
+        break if created.size >= max_issues_per_run # cap counts created issues only
+        fingerprint = Github::Fingerprint.for_item(item)
+        if manager.issue_exists_by_fingerprint?(fingerprint)
+          skipped << item.merge("fingerprint" => fingerprint, "reason" => "already_reported")
+          next
+        end
+
+        issue = manager.create_issue(item, fingerprint)
+        created << {
+          "number" => issue.number,
+          "url" => issue.html_url,
+          "title" => issue.title,
+          "fingerprint" => fingerprint
+        }
+      end
+
+      {
+        "mode" => "live",
+        "total_findings" => findings.size,
+        "created_count" => created.size,
+        "skipped_count" => skipped.size,
+        "created" => created,
+        "skipped" => skipped
+      }
+    end
+
+    def dry_run_summary(findings)
+      simulated = findings.map do |item|
+        item.merge("fingerprint" => Github::Fingerprint.for_item(item))
+      end
+      {
+        "mode" => "dry_run",
+        "total_findings" => findings.size,
+        "would_create_count" => simulated.size,
+        "would_create" => simulated
+      }
+    end
+
+    def write_summary(summary)
+      path = @config.reporting.fetch("summary_path", "tmp/tech_debt_report.json")
+      FileUtils.mkdir_p(File.dirname(path))
+      File.write(path, JSON.pretty_generate(summary))
+    end
+
+    def max_issues_per_run
+      @config.github.fetch("max_issues_per_run", 10).to_i
+    end
+  end
+end
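`Github::Fingerprint.for_item` determines whether a finding is `already_reported`, but `lib/tech_debt/github/fingerprint.rb` is not in this diff. The following is a hypothetical sketch of one way to keep fingerprints stable across runs:

- Key semantic duplication on `canonical_pattern`, since the prompt requires that slug to be deterministic "even if files change".
- Key every other finding on `debt_type`, `file_path`, and `identifier`.
- Leave out LLM-worded fields (`title`, `description`, `score`), which may vary between runs.

The SHA-256 hash and the 16-character truncation are illustrative choices, not the PR's.

```ruby
# frozen_string_literal: true

require "digest"

module TechDebt
  module Github
    # Hypothetical sketch of lib/tech_debt/github/fingerprint.rb (not in this diff).
    module Fingerprint
      module_function

      # Builds a short, stable ID for a string-keyed finding hash, as produced
      # by Analyzer#candidates_to_findings or the LLM triage step.
      def for_item(item)
        parts =
          if item["debt_type"] == "semantic_duplication" && item["canonical_pattern"]
            # Duplication can move between files; key on the business-intent slug.
            ["semantic_duplication", item["canonical_pattern"]]
          else
            # Everything else is anchored to a concrete location.
            [item["debt_type"], item["file_path"], item["identifier"]]
          end
        Digest::SHA256.hexdigest(parts.join("|"))[0, 16]
      end
    end
  end
end
```

Excluding the wording fields means a rephrased title on the next weekly run still maps to the same existing issue. The trade-off is that two distinct problems in the same method with the same `debt_type` collapse into one fingerprint.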
diff --git a/lib/tech_debt/collectors/base_collector.rb b/lib/tech_debt/collectors/base_collector.rb
@@ -0,0 +1,17 @@
+# frozen_string_literal: true
+
+module TechDebt
+  module Collectors
+    class BaseCollector
+      attr_reader :config
+
+      def initialize(config)
+        @config = config
+      end
+
+      def call
+        raise NotImplementedError, "#{self.class} must implement #call"
+      end
+    end
+  end
+end
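`BaseCollector` only fixes the interface: `#call` must return an array of candidate hashes. `Analyzer#candidates_to_findings` reads them with symbol keys (`:file`, `:identifier`, `:type`, `:detail`, `:score`) and calls `String#tr` on `:type`, so `type` must be a String. The following is a hypothetical subclass that honours that contract by flagging controller files longer than `fat_controller.threshold_lines`. Several parts are assumptions for illustration:

- the class name;
- the extra `root:` keyword;
- the use of whole-file line counts.

The real `DebrideCollector` and `ComplexityCollector` are not shown in this diff. `BaseCollector` is repeated so the sketch runs standalone.

```ruby
# frozen_string_literal: true

module TechDebt
  module Collectors
    # Copy of BaseCollector from this PR so the sketch runs on its own.
    class BaseCollector
      attr_reader :config

      def initialize(config)
        @config = config
      end

      def call
        raise NotImplementedError, "#{self.class} must implement #call"
      end
    end

    # Hypothetical collector (not in this PR): flags controller files longer
    # than analysis.debt_types.fat_controller.threshold_lines.
    class ControllerLengthCollector < BaseCollector
      DEFAULT_THRESHOLD = 100

      # root: is an assumed extra keyword; Analyzer only passes config, so it
      # defaults to the current working directory.
      def initialize(config, root: Dir.pwd)
        super(config)
        @root = root
      end

      def call
        settings = config.analysis.fetch("debt_types", {}).fetch("fat_controller", {})
        return [] unless settings.fetch("enabled", true)

        threshold = settings.fetch("threshold_lines", DEFAULT_THRESHOLD).to_i
        Dir.glob(File.join(@root, "app/controllers/**/*.rb")).sort.filter_map do |path|
          line_count = File.foreach(path).count
          next if line_count <= threshold

          {
            file: path.delete_prefix("#{@root}/"), # repo-relative, like the prompt's file_path example
            # Naive camelize; ignores namespaces such as Admin::.
            identifier: File.basename(path, ".rb").split("_").map(&:capitalize).join,
            type: "fat_controller", # must be a String: Analyzer calls #tr on it
            detail: "#{line_count} lines (threshold #{threshold})",
            score: line_count
          }
        end
      end
    end
  end
end
```

Because `root:` has a default, such a class could be added to `collect_candidates` alongside the existing collectors without changing how `Analyzer` constructs them.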