Initialize CodeM8 project #1
Merged
22 commits
32ece2e
[feat] add deterministic duplicate code report CLI
b4prog a4e1b14
[ci] wrap CodeRabbit workflow script in async IIFE
b4prog cb9c7e8
[fix] deduplicate resolved explicit source files
b4prog d998211
[fix] reject overlapping duplicate ranges in the same file
b4prog d886d25
[docs] document cargo installation from GitHub and local source
b4prog a61844b
[chore] add clippy lint threshold configuration
b4prog cbe018a
[docs] document agent verification requirements and local checks
b4prog 945d07c
[refactor] reduce nesting in duplicate and language helpers
b4prog 9417bae
[chore] satisfy stricter clippy lint requirements
b4prog 5e77570
[ci] add Clippy validation to the Rust CI workflow
b4prog 436936f
[ci] enable CodeRabbit request changes approval workflow
b4prog f9054a0
[ci] enable detailed CodeRabbit reviews and disable poems
b4prog 1ea1269
[test] add coverage for parser, discovery, duplicate, and path edge c…
b4prog b9ecef2
[test] make duplicate sort fixture fail without sorting
b4prog 38662c8
[feat] add CLI help output for duplicate reports
b4prog 17e24d4
[refactor] rename duplicate mitigation line patterns
b4prog d6178ea
[feat] add punctuation duplicate mitigation patterns
b4prog 0366065
[fix] use language registry for default duplicate report extensions
b4prog 606268d
[feat] add verbose mode
b4prog 49ff176
[feat] add regex duplicate mitigation patterns
b4prog e4461c6
[fix] reject double-dash CLI options
b4prog 71d01df
[feat] add git branch duplicate report scanning
b4prog
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| reviews: | ||
| request_changes_workflow: true | ||
| review_details: true | ||
| poem: false |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| name: Rust CI | ||
|
|
||
| on: | ||
| push: | ||
| branches: | ||
| - main | ||
| pull_request: | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| jobs: | ||
| rust: | ||
| name: Build, test, and format | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Checkout repository | ||
| uses: actions/checkout@v4 | ||
|
|
||
| - name: Install Rust toolchain | ||
| run: rustup toolchain install stable --profile minimal --component rustfmt --component clippy | ||
|
|
||
| - name: Check formatting | ||
| run: cargo fmt --all -- --check | ||
|
|
||
| - name: Run Clippy | ||
| run: cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo | ||
|
|
||
| - name: Build | ||
| run: cargo build --locked --all-targets | ||
|
|
||
| - name: Test | ||
| run: cargo test --locked --all-targets |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| name: CodeRabbit Review Gate | ||
|
|
||
| on: | ||
| pull_request_review: | ||
| types: | ||
| - submitted | ||
| - edited | ||
| - dismissed | ||
|
|
||
| permissions: | ||
| contents: read | ||
| pull-requests: read | ||
|
|
||
| jobs: | ||
| coderabbit-review: | ||
| name: Validate CodeRabbit review | ||
| if: github.event.pull_request.draft == false && github.event.review.user.login == 'coderabbitai[bot]' | ||
| runs-on: ubuntu-latest | ||
|
|
||
| steps: | ||
| - name: Check CodeRabbit review state | ||
| env: | ||
| GITHUB_TOKEN: ${{ github.token }} | ||
| GITHUB_REPOSITORY: ${{ github.repository }} | ||
| PR_NUMBER: ${{ github.event.pull_request.number }} | ||
| PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }} | ||
| run: | | ||
| node <<'NODE' | ||
| const token = process.env.GITHUB_TOKEN; | ||
| const [owner, repo] = process.env.GITHUB_REPOSITORY.split("/"); | ||
| const prNumber = process.env.PR_NUMBER; | ||
| const headSha = process.env.PR_HEAD_SHA; | ||
|
|
||
| async function fetchReviews(page = 1, reviews = []) { | ||
| const url = `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/reviews?per_page=100&page=${page}`; | ||
| const response = await fetch(url, { | ||
| headers: { | ||
| Authorization: `Bearer ${token}`, | ||
| Accept: "application/vnd.github+json", | ||
| "X-GitHub-Api-Version": "2022-11-28", | ||
| }, | ||
| }); | ||
|
|
||
| if (!response.ok) { | ||
| const body = await response.text(); | ||
| throw new Error(`GitHub review lookup failed: ${response.status} ${body}`); | ||
| } | ||
|
|
||
| const pageReviews = await response.json(); | ||
| if (pageReviews.length === 0) { | ||
| return reviews; | ||
| } | ||
| return fetchReviews(page + 1, reviews.concat(pageReviews)); | ||
| } | ||
|
|
||
| (async () => { | ||
| const reviews = await fetchReviews(); | ||
| const codeRabbitReviews = reviews | ||
| .filter((review) => review.user?.login === "coderabbitai[bot]") | ||
| .filter((review) => review.commit_id === headSha) | ||
| .sort((left, right) => new Date(left.submitted_at) - new Date(right.submitted_at)); | ||
|
|
||
| const latestReview = codeRabbitReviews.at(-1); | ||
| if (!latestReview) { | ||
| console.error(`CodeRabbit has not submitted a review for ${headSha}.`); | ||
| process.exit(1); | ||
| } | ||
|
|
||
| if (latestReview.state === "CHANGES_REQUESTED") { | ||
| console.error("CodeRabbit requested changes on this pull request."); | ||
| process.exit(1); | ||
| } | ||
|
|
||
| console.log(`CodeRabbit review state for ${headSha}: ${latestReview.state}`); | ||
| })().catch((error) => { | ||
| console.error(error); | ||
| process.exit(1); | ||
| }); | ||
| NODE | ||
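The decision made by the workflow script above can be restated as a pure function, which makes the pass and fail conditions easier to see and to test. The following is a minimal Rust sketch, not code from this pull request: the `Review` struct, the `GateOutcome` enum, and `evaluate_gate` are hypothetical names, and the sketch assumes `submitted_at` values are ISO 8601 UTC timestamps in a uniform format, so they can be ordered as plain strings.

```rust
/// Hypothetical stand-in for the review fields the workflow script reads.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Review {
    login: String,
    commit_id: String,
    state: String,
    /// Assumed to be an ISO 8601 UTC timestamp such as "2024-01-01T12:00:00Z".
    submitted_at: String,
}

/// Hypothetical result type for the gate decision.
#[derive(Debug, PartialEq, Eq)]
enum GateOutcome {
    /// The latest matching review exists and does not request changes.
    Pass(String),
    /// No CodeRabbit review targets the current head commit.
    NoReview,
    /// The latest matching review requests changes.
    ChangesRequested,
}

/// Keep only CodeRabbit reviews for `head_sha`, pick the most recently
/// submitted one, and fail when it is missing or requests changes.
fn evaluate_gate(reviews: &[Review], head_sha: &str) -> GateOutcome {
    let latest = reviews
        .iter()
        .filter(|review| review.login == "coderabbitai[bot]")
        .filter(|review| review.commit_id == head_sha)
        // Uniform ISO 8601 UTC strings compare chronologically as text.
        .max_by(|left, right| left.submitted_at.cmp(&right.submitted_at));

    match latest {
        None => GateOutcome::NoReview,
        Some(review) if review.state == "CHANGES_REQUESTED" => GateOutcome::ChangesRequested,
        Some(review) => GateOutcome::Pass(review.state.clone()),
    }
}
```

Both failing outcomes correspond to the `process.exit(1)` paths in the script. A later approving review on the same head commit supersedes an earlier request for changes.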
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| # Agent Instructions | ||
|
|
||
| These instructions apply to code agents working in this repository, including Codex. | ||
|
|
||
| ## Before finishing a change | ||
|
|
||
| Run the repository verification commands from the workspace root and fix any issues before handing work back: | ||
|
|
||
| ```bash | ||
| cargo fmt --all -- --check | ||
| cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo | ||
| rtk cargo build --locked --all-targets | ||
| ``` | ||
|
|
||
| ## Notes | ||
|
|
||
| - Treat Clippy warnings as errors for generated or edited code. | ||
| - Prefer changes that satisfy the repository `clippy.toml` configuration without adding `#[allow(...)]` attributes unless a maintainer explicitly asks for them. | ||
| - If a command cannot be run in the current environment, call that out clearly in the handoff. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| [package] | ||
| name = "codem8" | ||
| version = "0.1.0" | ||
| edition = "2021" | ||
| license = "MIT" | ||
| description = "A deterministic source code analysis CLI for duplicate code reports." | ||
| repository = "https://github.com/b4prog/CodeM8" | ||
| keywords = ["cli", "duplicate-detection", "source-code", "analysis"] | ||
| categories = ["command-line-utilities", "development-tools"] | ||
|
|
||
| [dependencies] | ||
| regex = "1" | ||
| xxhash-rust = { version = "0.8", features = ["xxh3"] } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1,153 @@ | ||
| # CodeM8 | ||
| # CodeM8 | ||
|
|
||
| CodeM8 is a Rust command-line application for deterministic source code reports. | ||
| The initial report detects duplicated line-based code blocks in a repository: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate | ||
| ``` | ||
|
|
||
| The duplicate report is designed for both human developers and coding agents. It | ||
| trims source lines, ignores empty lines, hashes normalized lines with XXH3 | ||
| 128-bit, classifies syntax-only lines as block-only, groups repeated blocks, and | ||
| prints a stable plain-text report sorted by duplicate weight. | ||
|
|
||
| ## Installation | ||
|
|
||
| Install `codem8` from the GitHub source with Cargo: | ||
|
|
||
| ```bash | ||
| cargo install --git https://github.com/b4prog/CodeM8 codem8 | ||
| ``` | ||
|
|
||
| Build from a local checkout with Cargo: | ||
|
|
||
| ```bash | ||
| cargo build --release | ||
| ``` | ||
|
|
||
| Install from a local checkout: | ||
|
|
||
| ```bash | ||
| cargo install --path . | ||
| ``` | ||
|
|
||
| Run from the local checkout without installing: | ||
|
|
||
| ```bash | ||
| cargo run -- --report-duplicate | ||
| ``` | ||
|
|
||
| ## Usage | ||
|
|
||
| Analyze TypeScript files from the current directory: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate | ||
| ``` | ||
|
|
||
| Analyze multiple extensions: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate -file-extension=ts,tsx,js,jsx | ||
| ``` | ||
|
|
||
| Analyze an explicit list of files instead of recursively discovering files: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate -file-extension=ts,js -files=src/a.ts,src/b.js | ||
| ``` | ||
|
|
||
| Analyze files changed on the current local Git branch compared to the origin | ||
| base branch: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate -git-branch | ||
| ``` | ||
|
|
||
| Include duplicate block metrics: | ||
|
|
||
| ```bash | ||
| codem8 --report-duplicate -verbose | ||
| ``` | ||
|
|
||
| ## Duplicate Report | ||
|
|
||
| By default, CodeM8 analyzes `.ts` files. Recursive discovery skips common | ||
| irrelevant directories such as `.git`, `node_modules`, `target`, `dist`, | ||
| `build`, `coverage`, `.next`, `.nuxt`, `.svelte-kit`, `.idea`, and `.vscode`. | ||
| Symbolic links are not followed. | ||
|
|
||
| Every non-empty line is normalized with Rust string trimming, so leading and | ||
| trailing Unicode whitespace is removed before hashing and comparison. Empty | ||
| trimmed lines are ignored. CodeM8 currently expects UTF-8 source files; invalid | ||
| UTF-8 produces a clear error rather than lossy output. | ||
|
|
||
| Use `-git-branch` to analyze only files changed on the current local branch | ||
| compared to the origin base branch. CodeM8 resolves that base from `origin/HEAD` | ||
| with `origin/main` and `origin/master` fallbacks. This includes committed, | ||
| staged, unstaged, and untracked files that still exist in the worktree. The | ||
| option requires a Git repository and cannot be combined with `-files`. | ||
|
|
||
| Duplicate block weight is calculated as: | ||
|
|
||
| ```text | ||
| (occurrences - 1) * duplicated_line_count * cumulative_normalized_character_count | ||
| ``` | ||
|
|
||
| Reports are sorted deterministically by descending weight, then by line count, | ||
| character count, first location, and normalized block text. | ||
|
|
||
| By default, each duplicate block prints the duplicated code before its | ||
| locations. Use `-verbose` to also show weight, line count, and occurrence | ||
| count. Character counts are used internally for scoring and sorting, but are | ||
| not printed. | ||
|
|
||
| ## Language Heuristics | ||
|
|
||
| CodeM8 includes a hard-coded registry of block-only line patterns for common | ||
| languages and markup formats: | ||
|
|
||
| - TypeScript / JavaScript | ||
| - Rust | ||
| - C / C++ / Objective-C | ||
| - C# | ||
| - Java / Kotlin / Scala | ||
| - Go | ||
| - Python | ||
| - Ruby | ||
| - PHP | ||
| - Swift | ||
| - Shell | ||
| - PowerShell | ||
| - HTML / XML | ||
| - CSS / SCSS / Sass / Less | ||
| - SQL | ||
| - YAML / JSON / TOML | ||
|
|
||
| Block-only lines, such as braces or closing tags, cannot start a duplicate by | ||
| themselves. They can still be included inside a larger duplicated block when | ||
| surrounding comparison lines match. | ||
|
|
||
| ## Development | ||
|
|
||
| Run the full local verification set: | ||
|
|
||
| ```bash | ||
| cargo fmt --all -- --check | ||
| cargo clippy --workspace --all-targets --all-features -- -D warnings -W clippy::too_many_lines -W clippy::too_many_arguments -W clippy::type_complexity -W clippy::excessive_nesting -W clippy::cognitive_complexity -W clippy::pedantic -W clippy::nursery -W clippy::cargo | ||
| cargo build --locked --all-targets | ||
| cargo test --locked --all-targets | ||
| ``` | ||
|
|
||
| The repository includes GitHub Actions workflows for Rust CI and a CodeRabbit | ||
| review gate. CI checks formatting, Clippy lints, build success, and tests on | ||
| pushes to `main` and on pull requests. The CodeRabbit gate runs when a | ||
| CodeRabbit pull request review is submitted, edited, or dismissed. It fails if | ||
| CodeRabbit has not reviewed the current PR head, or if its latest review of | ||
| that head requests changes. | ||
|
|
||
| ## Dependency Policy | ||
|
|
||
| CodeM8 avoids external packages for functionality that is simple to implement | ||
| and maintain directly. The first implementation uses two runtime dependencies: | ||
| `regex`, for the regex duplicate mitigation patterns, and `xxhash-rust`, for | ||
| the required XXH3 128-bit hash implementation. Both crates are widely used and | ||
| permissively licensed (`regex` under MIT or Apache-2.0, `xxhash-rust` under | ||
| BSL-1.0). |
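The README's description of line normalization, the weight formula, and the deterministic sort order can be illustrated with a short Rust sketch. This is not the CodeM8 implementation: `DuplicateBlock`, its fields, and the function names are hypothetical. The sketch also makes three assumptions the README does not spell out. It counts Unicode scalar values as "characters", it sorts line count and character count in descending order, and it sorts first location and block text in ascending order.

```rust
use std::cmp::Ordering;

/// Trim every line and drop lines that are empty after trimming.
fn normalize_lines(source: &str) -> Vec<&str> {
    source
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .collect()
}

/// Hypothetical summary of one duplicate block (names are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
struct DuplicateBlock {
    /// Normalized lines that make up the duplicated block.
    lines: Vec<String>,
    /// (path, 1-based start line) for each occurrence, already sorted.
    locations: Vec<(String, usize)>,
}

impl DuplicateBlock {
    /// Cumulative character count over all normalized lines.
    fn char_count(&self) -> u64 {
        self.lines.iter().map(|line| line.chars().count() as u64).sum()
    }

    /// (occurrences - 1) * duplicated_line_count * cumulative character count.
    fn weight(&self) -> u64 {
        let occurrences = self.locations.len() as u64;
        occurrences.saturating_sub(1) * self.lines.len() as u64 * self.char_count()
    }
}

/// Total ordering with tie-breakers, so equal weights never reorder between runs.
fn compare_blocks(a: &DuplicateBlock, b: &DuplicateBlock) -> Ordering {
    b.weight()
        .cmp(&a.weight())
        .then_with(|| b.lines.len().cmp(&a.lines.len()))
        .then_with(|| b.char_count().cmp(&a.char_count()))
        .then_with(|| a.locations.first().cmp(&b.locations.first()))
        .then_with(|| a.lines.cmp(&b.lines))
}

fn sort_report(blocks: &mut [DuplicateBlock]) {
    blocks.sort_by(compare_blocks);
}
```

For example, a two-line block of 12 characters that occurs three times has weight `(3 - 1) * 2 * 12 = 48`, so it sorts ahead of a one-line block of 4 characters that occurs twice, which has weight `(2 - 1) * 1 * 4 = 4`.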
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| too-many-lines-threshold = 80 | ||
| too-many-arguments-threshold = 5 | ||
| type-complexity-threshold = 200 | ||
| excessive-nesting-threshold = 4 | ||
| cognitive-complexity-threshold = 20 |
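With `too-many-arguments-threshold = 5`, Clippy's `too_many_arguments` lint flags functions that take more than five parameters. One common way to stay under that limit without an `#[allow(...)]` attribute is to group related parameters into a struct. The sketch below is illustrative only: `ScanOptions`, its fields, and `describe_scan` are hypothetical and do not come from the CodeM8 source.

```rust
/// Hypothetical options bundle; the field names are assumptions for illustration.
#[derive(Debug, Clone, Default)]
struct ScanOptions {
    extensions: Vec<String>,
    explicit_files: Vec<String>,
    git_branch: bool,
    verbose: bool,
    max_results: usize,
}

/// Takes two parameters instead of six separate ones, which keeps the
/// signature below the configured argument threshold.
fn describe_scan(root: &str, options: &ScanOptions) -> String {
    format!(
        "root={root} extensions={} files={} git_branch={} verbose={} max_results={}",
        options.extensions.join(","),
        options.explicit_files.len(),
        options.git_branch,
        options.verbose,
        options.max_results,
    )
}
```

Callers can fill in only the fields they need and take the rest from `ScanOptions::default()`, which also keeps call sites readable as options are added.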