把 Claude Code 变成一个完整的虚拟数据科学团队 — 62 个 AI 智能体、45 条工作流命令、10 个自动化钩子,完整映射真实 DS/ML 团队的组织架构。
Turn Claude Code into a complete virtual data science organization: 62 AI agents, 45 workflow commands, 10 automated hooks, and a full coordination system that mirrors the hierarchy of a real DS/ML team.
🇨🇳 中文文档
Claude Code DS Studios 将 Claude Code 变成一整个数据科学部门。62 个专业智能体覆盖 ML 全生命周期:数据工程、特征工程、模型训练、部署上线、监控运维、分析报表、实验设计和数据治理。
这不是一个传统的应用程序——而是一个元框架。你把 .claude/ 目录和 CLAUDE.md 复制到自己的数据科学项目中,就能获得完整的 AI 团队协助。
灵感来源于 Claude-Code-Game-Studios,在其基础上大幅扩展:
| 特性 | Game Studios | DS Studios |
|---|---|---|
| 智能体 | 48 | 62 |
| 斜杠命令 | 37 | 45 |
| 规则文件 | 11 | 15 |
| 自动化钩子 | 8 | 10 |
| 模板 | 28 | 35 |
| 技术栈专家 | 3 | 12 |
| MCP 集成 | 无 | 原生支持 |
| Notebook 支持 | 无 | 原生 (NotebookEdit) |
| 数据血缘 | 无 | /data-lineage |
| 实验追踪 | 无 | /experiment + MLflow |
| 模型注册表 | 无 | Model Registry |
| 团队编排 | 无 | 3 条编排命令 |
| 成本优化 | 无 | cost-optimizer 智能体 |
| 数据治理 | 无 | 完整框架 |
# 1. 克隆仓库(或将 .claude/ 目录复制到你的 DS 项目中)
git clone https://github.com/your-org/Claude-Code-DS-Studios.git
cd Claude-Code-DS-Studios
# 2. 启动 Claude Code
claude
# 3. 运行引导流程
/start

/start 命令会自动检测你的项目状态并引导设置:
- 全新项目? 帮你定义范围、选择技术栈、搭建项目骨架
- 已有项目? 分析现有内容,识别缺失的部分
- 回来继续? 恢复上次的会话状态,展示下一步
                                        [人类数据科学家]
                                                |
         +------------------+-------------------+-------------------+-------------------+
         |                  |                   |                   |                   |
    首席数据官         ML架构总监         数据工程总监          分析总监            项目经理
   (chief-data-         (ml-arch-           (de-dir)           (analytics-           (ds-pm)
     officer)             dir)                                    dir)
         |                  |                   |                   |                   |
      (战略)          +-----+-----+         +---+---+         +-----+-----+          (协调)
                      |     |     |         |       |         |     |     |
                   ML主管   |    NLP     DE主管   治理      分析 可视化 实验
                      |    CV     |         |       |         |     |     |
                   +--+-+   |    ++-+     +-+-+    +-+        |    +-+    |
                   |  | |   |    |  |     | | |    | |        |    | |    |
                                  [33 个专家 + 12 个技术栈专家]
战略决策、架构审查、资源分配
领域管理、团队协调、质量标准制定
数据工程、ML、分析、MLOps 等领域的实际执行
Python、R、SQL、Spark、dbt、AWS/GCP/Azure 等技术深度专家
10 个 Shell 钩子在 Claude Code 生命周期的关键节点自动运行,无需人工干预:
| 钩子 | 触发时机 | 作用 |
|---|---|---|
| session-start.sh | 会话启动 | 加载项目上下文,恢复上次会话状态 |
| detect-gaps.sh | 会话启动 | 检测缺失的依赖文件、配置、测试等 |
| pre-tool-use.sh | 执行命令前 | 阻止删除原始数据、阻止暴露凭证 |
| validate-commit.sh | 执行命令前 | 拦截大文件提交、.env 文件暂存 |
| validate-push.sh | 执行命令前 | 阻止 force-push,警告直接推送到 main |
| post-tool-use.sh | 写文件后 | 检测硬编码的密码和绝对路径 |
| notebook-save.sh | 写文件后 | 保存 Notebook 时检查质量 |
| pre-compact.sh | 上下文压缩前 | 保存会话状态,防止信息丢失 |
| stop.sh | 会话结束 | 归档状态文件,记录会话日志 |
| log-agent.sh | 子智能体启动 | 记录智能体调用审计日志 |
| 类别 | 命令 |
|---|---|
| 入门 | /start、/setup-stack、/configure-mcp |
| 数据 | /ingest、/eda、/data-profile、/data-quality、/data-catalog、/schema-design、/data-lineage |
| 特征 | /feature-engineer、/feature-store、/feature-select |
| 建模 | /train、/evaluate、/hyperopt、/experiment、/compare-models、/explain、/automl |
| 部署 | /deploy、/serve、/monitor、/retrain、/rollback |
| 流水线 | /pipeline、/orchestrate、/schedule、/backfill |
| 分析 | /dashboard、/report、/ab-test、/cohort-analysis、/forecast |
| 质量 | /code-review、/notebook-review、/sql-review、/security-audit |
| 团队 | /brainstorm、/sprint-plan、/team-eda、/team-modeling、/team-deploy |
| 元信息 | /status、/help、/agent-roster |
智能体可通过 NotebookEdit 工具直接创建和编辑 Jupyter Notebook。/notebook-review 命令执行 DS 专用的质量检查。
/experiment 命令集成 MLflow,追踪参数、指标和产物。experimentation-lead 智能体协调 A/B 测试和 ML 实验。
完整的治理框架:data-governance-lead、data-quality-engineer 和 metadata-engineer 三个智能体协同保障数据质量、血缘和合规。
project/experiments/session-state/active.md 作为跨会话的持久记忆。钩子自动在上下文压缩前保存状态,会话启动时恢复。对话可能丢失,但文件不会。
AWS(SageMaker)、GCP(Vertex AI)、Azure(Azure ML)三大云平台的技术栈专家,提供云原生方案同时保持可移植性。
cost-optimizer 智能体监控云平台计算成本,推荐实例调整、Spot 实例和高效查询模式。
project/ 目录是标准的数据科学项目模板:
project/
├── data/{raw,interim,processed,external,features}/ # 数据(raw 不可变)
├── notebooks/{exploratory,modeling,evaluation,reporting}/ # Notebook
├── src/{data,features,models,pipelines,serving,visualization,utils}/ # 生产代码
├── dbt/{models,seeds,macros,tests}/ # dbt 数据转换
├── dags/ # Airflow DAG
├── configs/ # 外部化配置
├── tests/{unit,integration,data}/ # 测试
├── models/ # 模型产物
├── reports/figures/ # 报告和图表
├── experiments/ # 实验记录
└── docker/ # Docker 配置
用户驱动,非自主执行。 每个任务遵循:
提问 → 给出选项 → 用户决定 → 出草稿 → 用户批准
- 智能体在写文件前必须征求用户许可
- 多文件修改需要整体批准
- 不能自动提交代码
- Notebook 修改需要逐节或逐单元格确认
完整文档在 .claude/docs/:
| 文件 | 内容 |
|---|---|
| quick-start.md | 快速入门指南 |
| agent-roster.md | 完整智能体列表 |
| coordination-map.md | 委派和工作流模式 |
| coordination-rules.md | 5 条核心协调规则 |
| coding-standards.md | Python、SQL、Notebook 编码规范 |
| context-management.md | 上下文管理与恢复策略 |
| mcp-integration-guide.md | MCP 服务器配置 |
| skills-reference.md | Skills 技能参考 |
| stack-guides/ | 各技术栈深度指南 |
- Claude Code(最新版)
- Git
- Python 3.10+(推荐)
- jq(钩子依赖:brew install jq / apt install jq)
🇺🇸 English Documentation
Claude Code DS Studios transforms Claude Code into an entire data science department. 62 specialized agents span the full ML lifecycle: data engineering, feature engineering, model training, deployment, monitoring, analytics, experimentation, and governance.
This is not a traditional application — it's a meta-framework. Copy the .claude/ directory and CLAUDE.md into your own data science project to get a fully coordinated AI team.
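The copy step described above can be sketched as a small shell function. This is a minimal illustration, not something shipped with the repository: the function name `install_ds_studios` and the refusal to overwrite existing files are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: copy the framework files from a cloned
# Claude-Code-DS-Studios checkout into an existing project directory.
install_ds_studios() {
    src=$1   # path to the cloned repository
    dest=$2  # path to your own data science project

    if [ ! -d "$src/.claude" ] || [ ! -f "$src/CLAUDE.md" ]; then
        echo "error: $src does not contain .claude/ and CLAUDE.md" >&2
        return 1
    fi
    # Assumed policy: never overwrite an existing configuration silently.
    if [ -e "$dest/.claude" ] || [ -e "$dest/CLAUDE.md" ]; then
        echo "error: $dest already has .claude/ or CLAUDE.md" >&2
        return 1
    fi
    mkdir -p "$dest" &&
    cp -R "$src/.claude" "$dest/.claude" &&
    cp "$src/CLAUDE.md" "$dest/CLAUDE.md"
}

# Example (paths are placeholders):
# install_ds_studios ./Claude-Code-DS-Studios ./my-ds-project
```

Refusing to overwrite keeps an existing CLAUDE.md safe; merge the two files by hand if your project already has one.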
Inspired by Claude-Code-Game-Studios, expanded significantly:
| Feature | Game Studios | DS Studios |
|---|---|---|
| Agents | 48 | 62 |
| Slash Commands | 37 | 45 |
| Rules | 11 | 15 |
| Hooks | 8 | 10 |
| Templates | 28 | 35 |
| Stack Specialists | 3 | 12 |
| MCP Integration | None | Native |
| Notebook Support | None | Native (NotebookEdit) |
| Data Lineage | None | /data-lineage |
| Experiment Tracking | None | /experiment + MLflow |
| Model Registry | None | Model Registry |
| Team Orchestration | None | 3 commands |
| Cost Optimization | None | cost-optimizer agent |
| Data Governance | None | Full framework |
# 1. Clone this repo (or copy .claude/ into your DS project)
git clone https://github.com/your-org/Claude-Code-DS-Studios.git
cd Claude-Code-DS-Studios
# 2. Open Claude Code
claude
# 3. Run the onboarding flow
/start

The /start command detects your project state and guides you through setup:
- No project yet? Helps you define scope, choose a stack, and scaffold the project
- Existing project? Analyzes what you have and identifies gaps
- Returning? Recovers your session state and shows next steps
                                     [Human Data Scientist]
                                                |
         +------------------+-------------------+-------------------+-------------------+
         |                  |                   |                   |                   |
chief-data-officer     ml-arch-dir           de-dir           analytics-dir           ds-pm
         |                  |                   |                   |                   |
    (strategy)        +-----+-----+         +---+---+         +-----+-----+      (coordination)
                      |     |     |         |       |         |     |     |
                   ml-lead  |    nlp     de-lead   gov       ana   viz   exp
                      |    cv     |         |       |         |     |     |
                   +--+--+  |    ++-+     +-+-+   +-+-+       |    +-+    |
                   |  |  |  |    |  |     | | |   | | |       |    | |    |
                               [33 Specialists + 12 Stack Experts]
Strategic decision-making, architecture review, resource allocation
Domain ownership, team coordination, quality standards
Hands-on implementation across data engineering, ML, analytics, MLOps
Deep expertise in specific technologies: Python, R, SQL, Spark, dbt, cloud platforms
10 shell hooks run automatically at key Claude Code lifecycle events:
| Hook | Trigger | Purpose |
|---|---|---|
| session-start.sh | SessionStart | Load project context, recover previous session |
| detect-gaps.sh | SessionStart | Flag missing dependencies, configs, tests |
| pre-tool-use.sh | PreToolUse (Bash) | Block raw data deletion, credential exposure |
| validate-commit.sh | PreToolUse (Bash) | Catch large files, .env staging, notebook bloat |
| validate-push.sh | PreToolUse (Bash) | Block force-push, warn on main branch pushes |
| post-tool-use.sh | PostToolUse (Write/Edit) | Detect hardcoded credentials and paths |
| notebook-save.sh | PostToolUse (Write/Edit) | Validate notebook quality on save |
| pre-compact.sh | PreCompact | Dump session state to survive context compression |
| stop.sh | Stop | Archive state file, log session summary |
| log-agent.sh | SubagentStart | Audit trail of agent invocations |
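To make the PreToolUse (Bash) rows more concrete, here is a rough sketch of how such a guard could classify a command. It is not the repository's actual pre-tool-use.sh or validate-push.sh. The function names and the matching rules are assumptions for illustration. The entry point also assumes that Claude Code passes the hook payload as JSON on stdin with the command under `tool_input.command`, and that exit code 2 blocks the call, so verify both against the current Claude Code hooks reference.

```shell
#!/bin/sh
# Illustrative sketch only: the rules below are assumptions, not the
# repository's real hook logic. Matching is deliberately crude.

# Print a verdict for a shell command: "block", "warn", or "allow".
check_command() {
    cmd=$1
    case $cmd in
        *"git push"*"--force"*|*"git push"*" -f"*)
            echo block ;;   # force-push
        *"rm "*"data/raw"*)
            echo block ;;   # deleting immutable raw data
        *"git push"*" main"*)
            echo warn ;;    # direct push to main
        *)
            echo allow ;;
    esac
}

# Hook entry point (assumed payload shape): read JSON from stdin,
# extract the Bash command with jq, and return 2 to block the tool call.
run_hook() {
    cmd=$(jq -r '.tool_input.command // empty')
    case $(check_command "$cmd") in
        block) echo "Blocked by policy: $cmd" >&2; return 2 ;;
        warn)  echo "Warning: pushing directly to main" >&2 ;;
    esac
    return 0
}

# In a real hook script the last line would be:  run_hook; exit $?
```

Keeping the classification in a pure function makes the rules testable without jq or a live session. Substring matching will produce false positives, so a production hook would need stricter parsing.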
| Category | Commands |
|---|---|
| Onboarding | /start, /setup-stack, /configure-mcp |
| Data | /ingest, /eda, /data-profile, /data-quality, /data-catalog, /schema-design, /data-lineage |
| Feature | /feature-engineer, /feature-store, /feature-select |
| Modeling | /train, /evaluate, /hyperopt, /experiment, /compare-models, /explain, /automl |
| Deployment | /deploy, /serve, /monitor, /retrain, /rollback |
| Pipeline | /pipeline, /orchestrate, /schedule, /backfill |
| Analytics | /dashboard, /report, /ab-test, /cohort-analysis, /forecast |
| Quality | /code-review, /notebook-review, /sql-review, /security-audit |
| Team | /brainstorm, /sprint-plan, /team-eda, /team-modeling, /team-deploy |
| Meta | /status, /help, /agent-roster |
Agents create and edit Jupyter notebooks via the NotebookEdit tool. /notebook-review applies DS-specific quality standards.
/experiment integrates with MLflow for tracking parameters, metrics, and artifacts. experimentation-lead coordinates A/B tests and ML experiments.
Data governance is covered by a complete framework: the data-governance-lead, data-quality-engineer, and metadata-engineer agents work together to ensure data quality, lineage, and compliance.
project/experiments/session-state/active.md serves as persistent memory across compactions and crashes. Hooks auto-save before compression and auto-recover on startup. Conversations are ephemeral; files persist.
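The save-before-compaction and restore-on-startup pattern can be sketched in a few lines of shell. Only the file path comes from the description above. The function names, the `STATE_FILE` variable, and the one-note-per-line format are assumptions for illustration, not the real pre-compact.sh or session-start.sh.

```shell
#!/bin/sh
# Sketch of file-backed session memory. The path is taken from the README;
# function names and the note format are illustrative assumptions.
STATE_FILE=${STATE_FILE:-project/experiments/session-state/active.md}

# Append a timestamped note so it survives context compaction.
save_state() {
    mkdir -p "$(dirname "$STATE_FILE")" || return 1
    printf '%s %s %s\n' "-" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$STATE_FILE"
}

# Print the saved notes at session start, or a hint if none exist.
restore_state() {
    if [ -s "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo "No saved session state found."
    fi
}

# Example:
# save_state "finished EDA, next step: feature selection"
# restore_state
```

Appending rather than overwriting keeps earlier notes if a session is interrupted between saves.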
Stack specialists for AWS (SageMaker), GCP (Vertex AI), and Azure (Azure ML) provide cloud-native guidance while maintaining portability.
The cost-optimizer agent monitors compute costs and recommends right-sizing, spot instances, and efficient query patterns.
The project/ directory is a standard data science project template:
project/
├── data/{raw,interim,processed,external,features}/ # Data (raw is immutable)
├── notebooks/{exploratory,modeling,evaluation,reporting}/ # Notebooks
├── src/{data,features,models,pipelines,serving,visualization,utils}/ # Production code
├── dbt/{models,seeds,macros,tests}/ # dbt transformations
├── dags/ # Airflow DAGs
├── configs/ # Externalized configs
├── tests/{unit,integration,data}/ # Tests
├── models/ # Model artifacts
├── reports/figures/ # Reports and charts
├── experiments/ # Experiment tracking
└── docker/ # Docker setup
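If you want to recreate this layout by hand rather than through /start, a plain POSIX shell loop is enough. The helper name `scaffold_project` is hypothetical; the directory names are copied from the tree above.

```shell
#!/bin/sh
# Create the directory layout listed above under <root>/project.
# scaffold_project is a hypothetical helper name, not part of the repo.
scaffold_project() {
    root=${1:-.}/project
    for dir in \
        data/raw data/interim data/processed data/external data/features \
        notebooks/exploratory notebooks/modeling notebooks/evaluation notebooks/reporting \
        src/data src/features src/models src/pipelines src/serving src/visualization src/utils \
        dbt/models dbt/seeds dbt/macros dbt/tests \
        dags configs \
        tests/unit tests/integration tests/data \
        models reports/figures experiments docker
    do
        mkdir -p "$root/$dir" || return 1
    done
}

# Example: scaffold_project .   # creates ./project/...
```

The explicit list avoids brace expansion, which plain sh does not support.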
User-driven collaboration, not autonomous execution. Every task follows:
Question → Options → Decision → Draft → Approval
- Agents MUST ask before writing files
- Multi-file changes require full changeset approval
- No commits without user instruction
- Notebook changes require cell-by-cell or section approval
Full documentation in .claude/docs/:
| File | Content |
|---|---|
| quick-start.md | Getting started guide |
| agent-roster.md | Complete agent list with roles and tiers |
| coordination-map.md | Delegation and workflow patterns |
| coordination-rules.md | 5 core coordination rules |
| coding-standards.md | Python, SQL, notebook standards |
| context-management.md | Context budget and recovery strategies |
| mcp-integration-guide.md | MCP server configuration |
| skills-reference.md | Skills reference |
| stack-guides/ | Per-technology deep guides |
- Claude Code (latest version)
- Git
- Python 3.10+ (recommended)
- jq (required by the hooks; install with brew install jq or apt install jq)
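A quick way to check these prerequisites before the first session is to test whether each tool is on PATH. The sketch below uses a hypothetical `missing_tools` helper. The binary names in the example (`claude`, `python3`) are assumptions about a typical installation, and the check does not verify versions.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example (binary names are assumptions about a typical setup):
# missing_tools claude git python3 jq
```

Empty output means every listed tool was found.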
MIT