ML Research Project Template

AI-assisted documentation and process framework for reproducible ML research.

Getting Started

1. Get the template

Option A: Clone and start fresh

git clone https://github.com/pr4deepr/ml-research-template.git my-project
cd my-project
rm -rf .git
git init

Option B: Download manually

Go to the repository page, click Code > Download ZIP, and extract the contents into your project folder.

2. Set up with your AI tool

Open your project folder in your AI coding tool, then tell it: "Read setup.md and help me set up this project."

Claude Code (CLI):

cd my-project
claude

Claude Code (Desktop app or VS Code / JetBrains extension): Open the project folder, then start a new conversation.

Other tools (Cursor, Copilot, Windsurf, etc.): Open the project folder in the editor. Some tools look for a differently named instructions file, so check "Using With Other AI Tools" for the rename each one needs.

That's it. The AI reads your answers to the setup.md questionnaire and fills in CLAUDE.md, methodology.md, experiment_roadmap.md, and the other template files. You review and adjust.

No AI tool? Fill in setup.md manually, then use it as a reference to populate the template files yourself. Start with CLAUDE.md, then methodology.md, then experiment_roadmap.md.

What to Fill In When

| When | Files | Why |
| --- | --- | --- |
| Day 1 | CLAUDE.md, user_role.md, environment.yml, setup.md, methodology.md (scaffold) | Define the project |
| After first experiment | methodology.md (refine metrics), EXPERIMENT_LOG.md | Lock in eval protocol |
| Ongoing | architecture_decisions.md, experiment_roadmap.md, CHANGELOG.md | Track decisions |
| Mid/late project | critical_assessment.md, retrospective.md, manuscript/ | Reflect and write up |

Structure

project/
├── CLAUDE.md                       # Navigation hub (start here)
├── README.md                       # This file
├── setup.md                        # Project setup questionnaire
├── CHANGELOG.md                    # Chronological decisions
├── environment.yml                 # Conda environment
├── data/
│   └── README.md                   # Data provenance
├── docs/
│   ├── methodology.md              # Eval metrics, protocol, constraints
│   ├── architecture_decisions.md   # Choices with evidence
│   ├── experiment_roadmap.md       # What to run and why
│   ├── critical_assessment.md      # Honest limitations
│   └── retrospective.md            # Lessons learned
├── evaluation/
│   └── README.md                   # Shared eval code principles
├── experiments/
│   ├── EXPERIMENT_LOG.md           # ALL results, one file
│   └── configs/                    # YAML config per experiment
│       └── README.md               # Config format guide
├── references/                     # External feedback & literature
│   └── README.md                   # What goes here
├── manuscript/
│   └── paper_outline.md            # Paper structure
└── memory/                         # Cross-session context
    ├── MEMORY.md                   # Index
    ├── project_key_learnings.md    # Full narrative
    └── user_role.md                # User background

Customization by Project Type

The AI assistant will adjust the template based on your project type. Here's what changes:

Supervised Learning (classification / regression)

Metrics: accuracy, F1, AUC as primary; cross-validation protocol
Workflow: baseline -> hyperparameter search -> ablations -> final eval
Constraints emphasis: stratified splits, class imbalance handling
Relax: screening constraint if dataset is small enough for fast full runs
Suggested first experiment: train baseline model, establish metric floor
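
The stratified-split constraint above can be illustrated with a small standard-library sketch. The function name `stratified_split` and the toy labels are assumptions made for illustration, not part of the template. In a real project a library routine (for example scikit-learn's `train_test_split` with `stratify=`) would likely be the better choice.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split at the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):  # sorted for a deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one test example per class when the class has >1 item.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["neg"] * 90 + ["pos"] * 10  # imbalanced toy labels (hypothetical)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
test_pos = sum(labels[i] == "pos" for i in test_idx)
print(len(train_idx), len(test_idx), test_pos)
```

With a purely random split, the rare class could end up with zero test examples. Splitting per class keeps the 90/10 ratio in both partitions.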

Self-Supervised / Unsupervised Learning

Metrics: downstream probe as primary (not pretraining loss), random-init baseline required
Workflow: pretraining screen -> probe evaluation -> representation analysis
Constraints emphasis: proxy metric warning (training loss != feature quality), effective N for clustering
Skip: manuscript/ sections on supervised baselines if not applicable
Suggested first experiment: train with default config, evaluate with linear probe vs random init

Fine-tuning / Transfer Learning (LLMs, foundation models)

Metrics: downstream task metrics, catastrophic forgetting checks
Workflow: baseline (zero-shot) -> prompt engineering -> fine-tune -> eval
Constraints emphasis: data contamination checks, eval set leakage, cost tracking
Skip: architecture_decisions.md if using a frozen pretrained model; experiment_roadmap.md tiers if iterating on prompts
Suggested first experiment: zero-shot baseline on your eval set
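
One simple form of the eval set leakage check mentioned above is an exact-match comparison after light normalization. The sketch below is an assumption about how such a check might look. The function names and example texts are hypothetical, and the check only catches duplicates that differ in case or whitespace, not paraphrases or partial overlap.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return " ".join(text.lower().split())


def fingerprint(text):
    """Hash the normalized text so large corpora can be compared cheaply."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_leaked(train_texts, eval_texts):
    """Return eval examples whose normalized text also appears in training data."""
    train_hashes = {fingerprint(t) for t in train_texts}
    return [t for t in eval_texts if fingerprint(t) in train_hashes]


train_texts = ["The cat sat on the mat.", "Paris is the capital of France."]
eval_texts = ["the cat  sat on the mat.", "Berlin is the capital of Germany."]
leaked = find_leaked(train_texts, eval_texts)
print(f"{len(leaked)} of {len(eval_texts)} eval examples overlap with training data")
```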

Applied / Engineering (deployment-focused)

Metrics: latency, throughput, accuracy-at-threshold
Workflow: baseline -> optimization -> A/B test -> deploy
Constraints emphasis: production-readiness, reproducibility across environments
Skip: manuscript/ (unless writing a technical report), retrospective Part B (unless novel findings)
Suggested first experiment: benchmark current system, establish performance baseline

Core Principles

1. One source of truth per information type

Results go in EXPERIMENT_LOG.md. Decisions go in architecture_decisions.md. Status goes in CLAUDE.md. Methodology goes in methodology.md.

2. Navigation hub, not monolith

CLAUDE.md links to everything but stays short, roughly 100-150 lines at most.

3. Save incrementally, skip existing

Never lose progress to crashes. Save results per experiment. On restart, skip what's done.
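
A minimal sketch of this pattern, assuming one JSON result file per experiment: `run_experiment`, the config fields, and the file layout are hypothetical placeholders, not something the template prescribes.

```python
import json
import tempfile
from pathlib import Path


def run_experiment(config):
    """Placeholder for real training and evaluation (hypothetical)."""
    return {"name": config["name"], "score": config["lr"] * 10}


def run_all(configs, results_dir):
    """Run each config once, writing one JSON file per experiment."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    ran = []
    for config in configs:
        out_path = results_dir / f"{config['name']}.json"
        if out_path.exists():
            continue  # finished in an earlier run, so skip it
        result = run_experiment(config)
        # Write to a temp file, then rename, so a crash mid-write
        # never leaves a half-written result that would be skipped later.
        tmp_path = out_path.with_name(out_path.name + ".tmp")
        tmp_path.write_text(json.dumps(result))
        tmp_path.replace(out_path)
        ran.append(config["name"])
    return ran


configs = [{"name": "exp_001", "lr": 0.1}, {"name": "exp_002", "lr": 0.01}]
with tempfile.TemporaryDirectory() as tmp:
    first = run_all(configs, tmp)   # runs both experiments
    second = run_all(configs, tmp)  # nothing left to run
print(first, second)
```

Saving inside the loop rather than after it is the important part: if the process dies on the second experiment, the first result is already on disk and the restart resumes from there.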

4. Always compare to baseline

Include random init / majority class / "doing nothing" in every evaluation. Without it, you can't tell if your method helps.
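
As a rough illustration for classification, a majority-class baseline can be computed in a few lines. The labels and the `model_pred` list below are made-up stand-ins for real data and real model output.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_class_baseline(y_train, n_eval):
    """Predict the most frequent training label for every eval example."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return [majority_label] * n_eval


y_train = [0] * 80 + [1] * 20
y_eval = [0] * 8 + [1] * 2
model_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]  # stand-in for a real model's predictions

baseline_acc = accuracy(y_eval, majority_class_baseline(y_train, len(y_eval)))
model_acc = accuracy(y_eval, model_pred)
print(f"baseline={baseline_acc:.2f} model={model_acc:.2f} gain={model_acc - baseline_acc:+.2f}")
```

On imbalanced data like this, a model's raw accuracy looks strong until it is placed next to the score that "doing nothing" already achieves.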

5. Report effective N

Independent samples != correlated observations. Know your independent unit.
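
One way to make this concrete is to aggregate observations per independent unit before reporting. The sketch below assumes the independent unit is a patient and uses invented scores. In another project the unit might be a subject, a site, or a recording session.

```python
from collections import defaultdict
from statistics import mean


def per_unit_scores(unit_ids, scores):
    """Average correlated observations within each independent unit."""
    grouped = defaultdict(list)
    for unit, score in zip(unit_ids, scores):
        grouped[unit].append(score)
    return {unit: mean(values) for unit, values in grouped.items()}


# Hypothetical data: 6 scored images drawn from only 3 patients.
patient_ids = ["p1", "p1", "p1", "p2", "p2", "p3"]
scores = [0.9, 0.9, 0.9, 0.5, 0.5, 0.1]

unit_scores = per_unit_scores(patient_ids, scores)
n_observations = len(scores)
effective_n = len(unit_scores)
print(f"observations={n_observations}, effective N={effective_n}")
print(f"mean over observations={mean(scores):.2f}, mean over units={mean(unit_scores.values()):.2f}")
```

Here the per-observation mean is pulled upward by the patient with the most images, while the per-unit mean weights each patient equally and reports the sample size the statistics can actually support.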

6. Map experiments to deliverables

Every experiment serves a paper section or a decision. If it doesn't, don't run it.

7. Update protocol

When something changes, CLAUDE.md's Update Protocol tells you which files need updating.

8. Test the eval, not just the model

The evaluation code is what you make decisions from. If it has a bug, every decision downstream is wrong.
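
A small example of what testing the eval can look like: check the metric against cases whose answer can be worked out by hand, including edge cases. The `f1_score` function here is an illustrative pure-Python implementation, not the template's evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the given positive label; returns 0.0 when undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def test_f1_score():
    # Perfect predictions score 1.0.
    assert f1_score([1, 0, 1], [1, 0, 1]) == 1.0
    # No true positives scores 0.0 instead of dividing by zero.
    assert f1_score([1, 1, 0], [0, 0, 0]) == 0.0
    # Hand-computed case: tp=1, fp=1, fn=1 -> precision=recall=0.5 -> F1=0.5.
    assert abs(f1_score([1, 1, 0, 0], [1, 0, 1, 0]) - 0.5) < 1e-12


test_f1_score()
print("eval checks passed")
```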

9. Document WHY, not just WHAT

"We use X because screen showed Y" beats "We use X."

10. Memory for cross-session continuity

project_key_learnings.md is a handoff to a colleague who's never seen the project.

What This Template Does NOT Include

Model code (project-specific)
Data loading (project-specific)
Training loops (project-specific)
CI/CD configuration
Metric visualization (use W&B/TensorBoard when complexity demands it)

The template provides documentation and process structure around whatever code you write.

Using With Other AI Tools

This template is designed for Claude Code, but roughly 90% of it is model-agnostic markdown. Only the memory/ directory and the CLAUDE.md filename are Claude-specific.

| Tool | Changes needed |
| --- | --- |
| ChatGPT | Rename `CLAUDE.md` to `PROJECT.md`. Copy `memory/project_key_learnings.md` into Custom Instructions. |
| Gemini CLI | Rename `CLAUDE.md` to `GEMINI.md`. Memory files work as context docs. |
| Cursor | Rename `CLAUDE.md` to `.cursorrules`. Paste the memory files into project context. |
| Copilot / Windsurf | Rename `CLAUDE.md` to `AGENTS.md`. Same structure works. |
| Open source (Ollama, etc.) | No persistent memory. Paste `project_key_learnings.md` into the system prompt each session. |

The setup.md questionnaire works with any AI tool that can read files.

What's universal vs Claude-specific

| Component | Universal | Claude-specific |
| --- | --- | --- |
| All `docs/`, `experiments/`, `manuscript/`, `data/`, `references/` | Yes | -- |
| Update Protocol, Core Principles | Yes | -- |
| `.gitignore`, `environment.yml`, `CHANGELOG.md` | Yes | -- |
| `CLAUDE.md` (navigation hub) | Concept is universal | Filename and session startup |
| `memory/` (persistent context) | Concept is universal | YAML frontmatter, auto-loading |

Origin

Built during a real ML research project (2026-04-03 to 2026-04-06), then generalized. Every pattern solves a real problem: orphaned GPU processes, lost experiment results, stale docs, wrong metrics, inflated sample sizes, and context loss between sessions.
