Merged
22 commits
32ece2e
[feat] add deterministic duplicate code report CLI
b4prog Jun 25, 2026
a4e1b14
[ci] wrap CodeRabbit workflow script in async IIFE
b4prog Jun 25, 2026
cb9c7e8
[fix] deduplicate resolved explicit source files
b4prog Jun 25, 2026
d998211
[fix] reject overlapping duplicate ranges in the same file
b4prog Jun 25, 2026
d886d25
[docs] document cargo installation from GitHub and local source
b4prog Jun 25, 2026
a61844b
[chore] add clippy lint threshold configuration
b4prog Jun 25, 2026
cbe018a
[docs] document agent verification requirements and local checks
b4prog Jun 25, 2026
945d07c
[refactor] reduce nesting in duplicate and language helpers
b4prog Jun 25, 2026
9417bae
[chore] satisfy stricter clippy lint requirements
b4prog Jun 25, 2026
5e77570
[ci] add Clippy validation to the Rust CI workflow
b4prog Jun 25, 2026
436936f
[ci] enable CodeRabbit request changes approval workflow
b4prog Jun 25, 2026
f9054a0
[ci] enable detailed CodeRabbit reviews and disable poems
b4prog Jun 25, 2026
1ea1269
[test] add coverage for parser, discovery, duplicate, and path edge c…
b4prog Jun 25, 2026
b9ecef2
[test] make duplicate sort fixture fail without sorting
b4prog Jun 25, 2026
38662c8
[feat] add CLI help output for duplicate reports
b4prog Jun 25, 2026
17e24d4
[refactor] rename duplicate mitigation line patterns
b4prog Jun 25, 2026
d6178ea
[feat] add punctuation duplicate mitigation patterns
b4prog Jun 25, 2026
0366065
[fix] use language registry for default duplicate report extensions
b4prog Jun 25, 2026
606268d
[feat] add verbose mode
b4prog Jun 25, 2026
49ff176
[feat] add regex duplicate mitigation patterns
b4prog Jun 25, 2026
e4461c6
[fix] reject double-dash CLI options
b4prog Jun 25, 2026
71d01df
[feat] add git branch duplicate report scanning
b4prog Jun 25, 2026
4 changes: 4 additions & 0 deletions .coderabbit.yaml
@@ -0,0 +1,4 @@
reviews:
  request_changes_workflow: true
  review_details: true
  poem: false
34 changes: 34 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,34 @@
name: Rust CI

on:
  push:
    branches:
      - main
  pull_request:

permissions:
  contents: read

jobs:
  rust:
    name: Build, test, and format
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install Rust toolchain
        run: rustup toolchain install stable --profile minimal --component rustfmt --component clippy

      - name: Check formatting
        run: cargo fmt --all -- --check

      - name: Run Clippy
        run: cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo

      - name: Build
        run: cargo build --locked --all-targets

      - name: Test
        run: cargo test --locked --all-targets
79 changes: 79 additions & 0 deletions .github/workflows/coderabbit-review.yml
@@ -0,0 +1,79 @@
name: CodeRabbit Review Gate

on:
  pull_request_review:
    types:
      - submitted
      - edited
      - dismissed

permissions:
  contents: read
  pull-requests: read

jobs:
  coderabbit-review:
    name: Validate CodeRabbit review
    if: github.event.pull_request.draft == false && github.event.review.user.login == 'coderabbitai[bot]'
    runs-on: ubuntu-latest

    steps:
      - name: Check CodeRabbit review state
        env:
          GITHUB_TOKEN: ${{ github.token }}
          GITHUB_REPOSITORY: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          node <<'NODE'
          const token = process.env.GITHUB_TOKEN;
          const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/");
          const prNumber = process.env.PR_NUMBER;
          const headSha = process.env.PR_HEAD_SHA;

          async function fetchReviews(page = 1, reviews = []) {
            const url = `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/reviews?per_page=100&page=${page}`;
            const response = await fetch(url, {
              headers: {
                Authorization: `Bearer ${token}`,
                Accept: "application/vnd.github+json",
                "X-GitHub-Api-Version": "2022-11-28",
              },
            });

            if (!response.ok) {
              const body = await response.text();
              throw new Error(`GitHub review lookup failed: ${response.status} ${body}`);
            }

            const pageReviews = await response.json();
            if (pageReviews.length === 0) {
              return reviews;
            }
            return fetchReviews(page + 1, reviews.concat(pageReviews));
          }

          (async () => {
            const reviews = await fetchReviews();
            const codeRabbitReviews = reviews
              .filter((review) => review.user?.login === "coderabbitai[bot]")
              .filter((review) => review.commit_id === headSha)
              .sort((left, right) => new Date(left.submitted_at) - new Date(right.submitted_at));

            const latestReview = codeRabbitReviews.at(-1);
            if (!latestReview) {
              console.error(`CodeRabbit has not submitted a review for ${headSha}.`);
              process.exit(1);
            }

            if (latestReview.state === "CHANGES_REQUESTED") {
              console.error("CodeRabbit requested changes on this pull request.");
              process.exit(1);
            }

            console.log(`CodeRabbit review state for ${headSha}: ${latestReview.state}`);
          })().catch((error) => {
            console.error(error);
            process.exit(1);
          });
          NODE
Comment thread
coderabbitai[bot] marked this conversation as resolved.
19 changes: 19 additions & 0 deletions AGENTS.md
@@ -0,0 +1,19 @@
# Agent Instructions

These instructions apply to code agents working in this repository, including Codex.

## Before finishing a change

Run the repository verification commands from the workspace root and fix any issues before handing work back:

```bash
cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo
rtk cargo build --locked --all-targets
```

## Notes

- Treat Clippy warnings as errors for generated or edited code.
- Prefer changes that satisfy the repository `clippy.toml` configuration without adding `#[allow(...)]` attributes unless a maintainer explicitly asks for them.
- If a command cannot be run in the current environment, call that out clearly in the handoff.
61 changes: 61 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

13 changes: 13 additions & 0 deletions Cargo.toml
@@ -0,0 +1,13 @@
[package]
name = "codem8"
version = "0.1.0"
edition = "2021"
license = "MIT"
description = "A deterministic source code analysis CLI for duplicate code reports."
repository = "https://github.com/b4prog/CodeM8"
keywords = ["cli", "duplicate-detection", "source-code", "analysis"]
categories = ["command-line-utilities", "development-tools"]

[dependencies]
regex = "1"
xxhash-rust = { version = "0.8", features = ["xxh3"] }
154 changes: 153 additions & 1 deletion README.md
@@ -1 +1,153 @@
# CodeM8
# CodeM8

CodeM8 is a Rust command-line application for deterministic source code reports.
The initial report detects duplicated line-based code blocks in a repository:

```bash
codem8 --report-duplicate
```

The duplicate report is designed for both human developers and coding agents. It
trims source lines, ignores empty lines, hashes normalized lines with XXH3
128-bit, classifies syntax-only lines as block-only, groups repeated blocks, and
prints a stable plain-text report sorted by duplicate weight.

## Installation

Install `codem8` from the GitHub source with Cargo:

```bash
cargo install --git https://github.com/b4prog/CodeM8 codem8
```

Build from a local checkout with Cargo:

```bash
cargo build --release
```

Install from a local checkout:

```bash
cargo install --path .
```

Run from the local checkout without installing:

```bash
cargo run -- --report-duplicate
```

## Usage

Analyze TypeScript files from the current directory:

```bash
codem8 --report-duplicate
```

Analyze multiple extensions:

```bash
codem8 --report-duplicate -file-extension=ts,tsx,js,jsx
```

Analyze an explicit list of files instead of recursively discovering files:

```bash
codem8 --report-duplicate -file-extension=ts,js -files=src/a.ts,src/b.js
```

Analyze files changed on the current local Git branch compared to the origin
base branch:

```bash
codem8 --report-duplicate -git-branch
```

Include duplicate block metrics:

```bash
codem8 --report-duplicate -verbose
```

## Duplicate Report

By default, CodeM8 analyzes `.ts` files. Recursive discovery skips common
irrelevant directories such as `.git`, `node_modules`, `target`, `dist`,
`build`, `coverage`, `.next`, `.nuxt`, `.svelte-kit`, `.idea`, and `.vscode`.
Symbolic links are not followed.

Every non-empty line is normalized with Rust string trimming, so leading and
trailing Unicode whitespace are removed before hashing and comparison. Empty
trimmed lines are ignored. CodeM8 currently expects UTF-8 source files; invalid
UTF-8 produces a clear error rather than lossy output.
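
The normalization rules above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `normalize_lines` and `decode_utf8` are hypothetical names, not functions taken from the CodeM8 source, and the real implementation may differ.

```rust
/// Hypothetical helper (not CodeM8's actual API): trims every line with
/// `str::trim`, which removes leading and trailing Unicode whitespace, and
/// drops lines that are empty after trimming. Returns pairs of
/// (1-based line number, normalized text).
fn normalize_lines(source: &str) -> Vec<(usize, &str)> {
    source
        .lines()
        .enumerate()
        .map(|(index, line)| (index + 1, line.trim()))
        .filter(|(_, line)| !line.is_empty())
        .collect()
}

/// Hypothetical helper: strict UTF-8 decoding. Invalid input yields an error
/// instead of being replaced lossily (as `String::from_utf8_lossy` would do).
fn decode_utf8(bytes: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes)
}
```

Keeping the original line numbers next to the normalized text is one way to report locations that still match the file on disk after empty lines are skipped.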

Use `-git-branch` to analyze only files changed on the current local branch
compared to the origin base branch. CodeM8 resolves that base from `origin/HEAD`
with `origin/main` and `origin/master` fallbacks. This includes committed,
staged, unstaged, and untracked files that still exist in the worktree. The
option requires a Git repository and cannot be combined with `-files`.

Duplicate block weight is calculated as:

```text
(occurrences - 1) * duplicated_line_count * cumulative_normalized_character_count
```
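
As a worked illustration of the formula, the following Rust sketch computes the weight for one duplicate block. The function name, the `u64` types, and the saturating arithmetic are assumptions made for the example, not details confirmed by the CodeM8 source.

```rust
/// Hypothetical sketch of the documented weight formula:
/// (occurrences - 1) * duplicated_line_count * cumulative_normalized_character_count.
/// Saturating arithmetic is an assumption here; it avoids overflow panics and
/// makes a block that occurs only once weigh zero.
fn duplicate_weight(occurrences: u64, line_count: u64, char_count: u64) -> u64 {
    occurrences
        .saturating_sub(1)
        .saturating_mul(line_count)
        .saturating_mul(char_count)
}
```

For example, a 4-line block with 120 normalized characters that appears 3 times would weigh `(3 - 1) * 4 * 120 = 960` under this sketch.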

Reports are sorted deterministically by descending weight, then by line count,
character count, first location, and normalized block text.
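
A deterministic multi-key ordering like this can be expressed as a chained comparator. The sketch below is hypothetical: the `DuplicateBlock` struct and its fields are invented for illustration, and the tie-break directions (descending for line and character counts, ascending for location and text) are assumptions, since the text above only fixes the direction for weight.

```rust
use std::cmp::Ordering;

/// Hypothetical record for one duplicate block (not CodeM8's actual type).
#[derive(Debug, Clone, PartialEq, Eq)]
struct DuplicateBlock {
    weight: u64,
    line_count: usize,
    char_count: usize,
    /// (file path, 1-based start line) of the first occurrence.
    first_location: (String, usize),
    /// Normalized block text, used as the final tie-breaker.
    text: String,
}

/// Total order: descending weight, then (assumed) descending line count and
/// character count, then ascending first location and normalized text.
fn compare_blocks(left: &DuplicateBlock, right: &DuplicateBlock) -> Ordering {
    right
        .weight
        .cmp(&left.weight)
        .then_with(|| right.line_count.cmp(&left.line_count))
        .then_with(|| right.char_count.cmp(&left.char_count))
        .then_with(|| left.first_location.cmp(&right.first_location))
        .then_with(|| left.text.cmp(&right.text))
}

/// Sorts blocks in place using the comparator above.
fn sort_blocks(blocks: &mut [DuplicateBlock]) {
    blocks.sort_by(compare_blocks);
}
```

Because the last keys are the location and the block text, two distinct blocks should rarely compare equal, so the output order does not depend on the order in which files were discovered.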

By default, each duplicate block prints the duplicated code before its
locations. Use `-verbose` to also show weight, line count, and occurrence
count. Character counts are used internally for scoring and sorting, but are
not printed.

## Language Heuristics

CodeM8 includes a hard-coded registry of block-only line patterns for common
languages and markup formats:

- TypeScript / JavaScript
- Rust
- C / C++ / Objective-C
- C#
- Java / Kotlin / Scala
- Go
- Python
- Ruby
- PHP
- Swift
- Shell
- PowerShell
- HTML / XML
- CSS / SCSS / Sass / Less
- SQL
- YAML / JSON / TOML

Block-only lines, such as braces or closing tags, cannot start a duplicate by
themselves. They can still be included inside a larger duplicated block when
surrounding comparison lines match.
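
The block-only rule can be pictured with a small Rust sketch. It is deliberately simplified and hypothetical: the two checks below (punctuation-only lines and closing tags) stand in for the per-language registry, whose actual patterns are not shown here.

```rust
/// Hypothetical classifier, not CodeM8's registry: treats a normalized line as
/// block-only when it consists solely of structural punctuation (such as `}`
/// or `});`) or looks like a closing tag (such as `</div>`).
fn is_block_only(normalized: &str) -> bool {
    let punctuation_only = !normalized.is_empty()
        && normalized
            .chars()
            .all(|c| matches!(c, '{' | '}' | '(' | ')' | '[' | ']' | ';' | ','));
    let closing_tag = normalized.starts_with("</") && normalized.ends_with('>');
    punctuation_only || closing_tag
}

/// Index of the first line that is allowed to start a duplicate block.
/// Block-only lines are skipped as starting points; in this sketch they are
/// not removed, so they can still sit inside a larger matched block.
fn first_start_candidate(lines: &[&str]) -> Option<usize> {
    lines.iter().position(|line| !is_block_only(line))
}
```

With this sketch, `first_start_candidate(&["}", "</li>", "let a = 1;", "}"])` returns `Some(2)`: the leading brace and closing tag cannot open a duplicate, while the trailing `}` may still be part of one.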

## Development

Run the full local verification set:

```bash
cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo
cargo build --locked --all-targets
cargo test --locked --all-targets
```

The repository includes GitHub Actions workflows for Rust CI and a CodeRabbit
review gate. CI verifies formatting, Clippy lints, build success, and tests on
pushes to `main` and on pull requests. The CodeRabbit gate runs when a
CodeRabbit pull request review is submitted, edited, or dismissed. It fails if
CodeRabbit has not reviewed the current PR head or if its latest review of that
head requests changes.

## Dependency Policy

CodeM8 avoids external packages for functionality that is simple to implement
and maintain directly. `Cargo.toml` declares two runtime dependencies:
`xxhash-rust`, for the required XXH3 128-bit hash implementation, and `regex`.
Both crates are widely used and permissively licensed.
5 changes: 5 additions & 0 deletions clippy.toml
@@ -0,0 +1,5 @@
too-many-lines-threshold = 80
too-many-arguments-threshold = 5
type-complexity-threshold = 200
excessive-nesting-threshold = 4
cognitive-complexity-threshold = 20