Thanks for your interest in making LLM detection better! 🎲
- Fork the repo
- Clone your fork
- Make your changes
- Test against a few repos (see Testing)
- Open a PR
Adding a new detection signal is the most impactful way to contribute. Here's how:
Before writing code, answer these questions:
- What does it measure? (e.g. "comment density per file")
- Why does it indicate LLM usage? (e.g. "LLMs over-comment; humans are lazier")
- What's the scoring range? (follow the existing pattern: positive points for suspicious, negative for clearly human)
- What are the false positive risks?
All scoring happens in `score_repo()` in `llm_detector.py`. Follow the existing pattern:
```python
# --- N. Your Signal Name (X-Y pts, can subtract up to -Z) ---
if some_condition:
    if very_suspicious:
        score += max_points
        reasons.append(f"Description of what was found [{max_points:+.0f}]")
    elif somewhat_suspicious:
        score += partial_points
        reasons.append(f"Milder description [{partial_points:+.0f}]")
    elif clearly_human:
        penalty = -Z
        score += penalty
        reasons.append(f"Human-like pattern [{penalty:+.0f}]")
```

Key principles:
- Use `authored_total`, not `total_changes` — generated files should already be filtered out
- Exclude bot authors — use `is_bot_author()` where relevant
- Include negative signals — if your heuristic can identify clearly human patterns, subtract points
- Always append to `reasons` — every scoring decision should be explainable
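As a concrete illustration of the pattern and principles, here is a hypothetical comment-density signal. The inputs (`comment_lines`, `authored_total`), all thresholds, and the stubbed `is_bot_author()` are invented for this sketch, not the tool's real values:

```python
def is_bot_author(name: str) -> bool:
    """Stub for illustration; the real helper lives in llm_detector.py."""
    return name.endswith("[bot]")

def score_comment_density(score: int, reasons: list[str],
                          comment_lines: int, authored_total: int,
                          author: str) -> int:
    """Score comment density: very high ratios look LLM-ish, sparse ones look human."""
    if is_bot_author(author) or authored_total == 0:
        return score  # skip bots and empty repos
    ratio = comment_lines / authored_total
    if ratio > 0.40:  # thresholds here are illustrative only
        score += 10
        reasons.append(f"Very high comment density ({ratio:.0%}) [+10]")
    elif ratio > 0.25:
        score += 5
        reasons.append(f"Elevated comment density ({ratio:.0%}) [+5]")
    elif ratio < 0.05:
        score += -5  # negative signal: sparse comments look human
        reasons.append(f"Sparse comments ({ratio:.0%}) [-5]")
    return score
```

Note that every branch appends to `reasons`, and the bot check happens before any scoring — both are requirements from the principles above.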
Update `print_report()` to show the raw data for your signal, and update the JSON output in `main()` if applicable.
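A minimal sketch of what surfacing raw data in a report row might look like; the actual `print_report()` layout is not shown here, so the column widths and the `!` marker are assumptions:

```python
def signal_row(label: str, value: float, threshold: float) -> str:
    """Format one report row, flagging values over the threshold with '!'."""
    marker = " !" if value > threshold else ""
    return f"  {label:<28}{value:>8.2f}{marker}"
```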
- Add your signal to the Scoring Signals table
- Add any new thresholds to the relevant threshold tables
- Update the point ranges in the Note
Always test against a spread of repos:
```sh
# Known LLM-generated (should score high)
python3 llm_detector.py anthropics/claudes-c-compiler

# Known human multi-contributor (should score low)
python3 llm_detector.py django/django
python3 llm_detector.py golang/go

# Known human single-contributor (trickier — should still score low)
python3 llm_detector.py some/solo-human-project

# Your target repo
python3 llm_detector.py owner/repo
```

Golden rule: Don't inflate scores on known-human repos just to catch one more LLM repo. False positives are worse than false negatives for a tool like this.
- Zero dependencies — stdlib only. Don't add `requests`, `numpy`, or anything else.
- Type hints — use them for function signatures
- Docstrings — every function gets one
- f-strings — for all string formatting
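Putting those rules together, a new helper might look like this (the function itself is hypothetical, shown only to demonstrate the expected style):

```python
from collections import Counter

def top_extensions(paths: list[str], n: int = 3) -> str:
    """Summarize the n most common file extensions in a list of paths."""
    counts = Counter(p.rsplit(".", 1)[-1] for p in paths if "." in p)
    return ", ".join(f"{ext} ({count})" for ext, count in counts.most_common(n))
```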
Found a repo that scores wrong? Open an issue with:
- The repo URL
- The score it got
- What you think the score should be
- Why (what the tool is getting wrong)
This is extremely valuable — it helps us tune thresholds and find blind spots.
- Diff complexity / entropy analysis — are the diffs structured or chaotic?
- File-type breakdown — LLMs love generating configs and boilerplate
- Comment density analysis — LLMs over-explain; humans under-explain
- Code style consistency — LLMs are eerily consistent across files
- Cross-file similarity — LLMs repeat patterns; humans get creative (or sloppy)
- Language-specific tuning — different languages have different "normal" velocities
- Commit time-of-day analysis — 4am coding sessions hitting 500 lines/hr?
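For example, the time-of-day idea could start as small as this; the commit representation (timestamp plus lines changed) and the 1am-5am window are assumptions made for the sketch:

```python
from datetime import datetime

def late_night_velocity(commits: list[tuple[datetime, int]]) -> float:
    """Lines changed per hour across commits landing between 1am and 5am."""
    night = sorted((ts, lines) for ts, lines in commits if 1 <= ts.hour < 5)
    if len(night) < 2:
        return 0.0  # not enough nocturnal commits to measure a rate
    hours = (night[-1][0] - night[0][0]).total_seconds() / 3600
    total = sum(lines for _, lines in night)
    return total / hours if hours else 0.0
```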
By contributing, you agree that your contributions will be licensed under the MIT License.