[docs] Add AGENTS.md - AI agent coding guide #2922
vaibhavk1992 wants to merge 5 commits into apache:main
This documentation extracts and documents coding conventions, patterns, and standards from the existing Apache Fluss codebase to assist AI coding agents. All rules and examples are derived from actual source code, Checkstyle configuration, and build files.
- 11 comprehensive sections covering critical rules, API patterns, testing, dependencies, configuration, and build/CI
- 100+ concrete code examples with DO/DON'T comparisons
- Direct file references to canonical examples in the codebase
- Fully compliant with Apache generative AI guidelines
Also updated .gitignore to exclude CLAUDE.md (personal development notes)
Hi @vaibhavk1992, thanks for the initial PR! Just sharing some quick observations.

I took a look and it looks quite solid, with lots of useful information for coding agents. For comparison I examined Airflow's AGENTS.md (https://github.com/apache/airflow/blob/main/AGENTS.md). I noticed that Airflow's AGENTS.md has a section that tells the AI how to properly run git commands, in addition to identifying its work as AI-generated: https://github.com/apache/airflow/blob/main/AGENTS.md#creating-pull-requests There's also a section of explicit boundaries for the agent that may have some overlap for Fluss: https://github.com/apache/airflow/blob/main/AGENTS.md#boundaries

I noticed that the AGENTS.md files here and in Airflow both focus primarily on code contribution rather than deployment. I've read that AGENTS.md could serve a dual purpose: guiding contributors, and helping developers use AI to quickly set up and deploy a new repository from day one. However, my understanding is that AGENTS.md is intended to be relatively brief (<500 lines?), and this document has already exceeded that. I am not sure how best to balance the two.

Thanks again for your work on this.

Edit: A workaround for the lengthier AGENTS.md is to provide a breadcrumbs route, pointing the agent towards further documentation in other subfolders for additional information (e.g.,

Edit 2: I just saw here that module-level AGENTS.md would be in phase 3: https://cwiki.apache.org/confluence/display/FLUSS/FIP-34%3A+Making+Fluss+an+AI-Native+Project
Restore trailing newline at end of .gitignore file to match apache/fluss upstream. Previous commit accidentally removed it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@qzyu999 Thanks for reviewing it.
### Package Structure
See CLAUDE.md for full module/package organization. Key modules: `fluss-common`, `fluss-rpc`, `fluss-client`, `fluss-server`.
I don't see a CLAUDE.md in the repo. Airflow does have a CLAUDE.md, but its contents simply point to AGENTS.md: https://github.com/apache/airflow/blob/main/CLAUDE.md. I am thinking we could do the same.
## 10. Module Boundaries
**Module structure:** See CLAUDE.md for full module organization
Same as my previous comment above, no visible CLAUDE.md.
AGENTS.md
Outdated
Detailed explanation of changes and motivation.
Co-Authored-By: Claude <ai-assistant@anthropic.com>
I believe we should make this doc agent-agnostic. Airflow's AGENTS.md mentions <Agent Name and Version> and links to a separate, explicit post about Gen-AI assisted contributions: https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions. Perhaps we need our own Fluss-specific post?
Edit: I just saw here that the AI-assisted contribution guidelines are phase 2: https://cwiki.apache.org/confluence/display/FLUSS/FIP-34%3A+Making+Fluss+an+AI-Native+Project
Yes, I think we can add that in phase 2 under a different issue.
AGENTS.md
Outdated
**Component tags:** `[client]`, `[server]`, `[rpc]`, `[flink]`, `[spark]`, `[docs]`, `[build]`, `[test]`
**AI-generated code identification:** ALWAYS include `Co-Authored-By: Claude <ai-assistant@anthropic.com>` in commit messages for AI-generated changes.
Same as above, referring to being agent-agnostic with <Agent Name and Version>.
Hi @vaibhavk1992, awesome job on the compression! I noted some examples where it's specifically referring to some form of "Claude", when I think we should try to be more agent agnostic. I refer to some examples in the Airflow

Edit: I just saw here https://lists.apache.org/thread/xm35s36fsqt8dyhbkkvq05nwm7l48rp2 that this was mentioned explicitly: The link also mentions the same
@qzyu999 I use CLAUDE.md for my local setup. To make it agnostic I created an AGENTS.md. As per the FIP and this task, only an AGENTS.md is needed. I can get rid of CLAUDE.md wherever it is referred to. Please confirm.
Hi @vaibhavk1992, actually it mentions here to keep the Edit: I also mention above that the repo should have a |
Following the Apache Airflow repository pattern, CLAUDE.md is now a symlink to AGENTS.md. This maintains a single source of truth while supporting both filename conventions for AI coding assistants. Co-Authored-By: Claude <ai-assistant@anthropic.com>
@qzyu999 I have made the suggested changes. Hopefully things look better now.
qzyu999 left a comment
Hi @vaibhavk1992, thanks for addressing my points in your PR. My overall goal was to try and map your AGENTS.md with what I understand to be a general template (although the concept is new and likely evolving). I believe you resolved them well, reducing the 1k+ lines to under 500 while keeping the core message.
I took a closer look at https://cwiki.apache.org/confluence/display/FLUSS/FIP-34%3A+Making+Fluss+an+AI-Native+Project (something I should've done better, apologies) and I am noting the following based on this being part of Phase 1.
We will add an AGENTS.md at the root of apache/fluss covering:
- Prerequisites and environment setup (Java 11, Maven wrapper)
- Build and test commands
- Repository structure and module overview (17 modules)
- Architecture boundaries (CoordinatorServer vs TabletServer, Log vs KV, Client vs Server)
- Coding standards (Java 11 target, Spotless formatting, Checkstyle)
- Testing conventions (JUnit 5 + AssertJ, *Test.java / *ITCase.java naming)
- Commit message format and PR conventions
- AI-assisted contribution rules (Generated-by tag, no AI Co-Authored-By)
- Based on the above requirements from the thread, I believe we still need the module overview, which perhaps could fit in section 3 (similar in spirit to here: https://github.com/apache/airflow/blob/main/AGENTS.md#repository-structure, but the exact way it's done may differ).
- The thread explicitly states "no AI Co-Authored-By", but AGENTS.md still includes it in the Commit Guidelines section (as we have previously discussed). Additionally, section 1.3 of the above thread mentions:
Following the ASF Generative Tooling Guidance, we will update .github/PULL_REQUEST_TEMPLATE.md to add:
An AI disclosure checkbox
A Generated-by: tag section
This aligns with what Flink, Airflow, and Paimon are doing.
Based on this it may make sense to include .github/PULL_REQUEST_TEMPLATE.md (https://github.com/apache/fluss/blob/main/.github/PULL_REQUEST_TEMPLATE.md) into this PR, as the changes there are tightly linked to the changes here.
BTW, here are the AGENTS.md and PR templates from those respective repos:
- Airflow
- Flink
- No AGENTS.md as of yet
- https://github.com/apache/flink/blob/master/.github/PULL_REQUEST_TEMPLATE.md
- Paimon
Only in the Airflow AGENTS.md do I see the AI-generated checkbox guideline, which we could use as an example.
I do think that there is some vagueness, as there seems to be overlap between phases 1 and 2. I would like to see what others have to say. I think overall LGTM, but there are these minor points that could definitely benefit from a second pair of eyes. Great work @vaibhavk1992!
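To make the commit conventions discussed above a bit more concrete, here is a minimal Java sketch of how a commit message could be checked against them. Everything in it is an assumption for illustration: the class `CommitMessageCheck` and its methods are hypothetical and not part of Fluss, the component tag set is copied from the draft AGENTS.md and may not be exhaustive, and the exact spelling of the `Generated-by:` trailer is assumed from the FIP wording.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the commit conventions discussed in this PR (not Fluss code). */
final class CommitMessageCheck {

    // Assumption: tag list taken from the draft AGENTS.md; the real list may differ.
    private static final Set<String> COMPONENTS =
            new HashSet<>(
                    Arrays.asList(
                            "client", "server", "rpc", "flink", "spark", "docs", "build", "test"));

    // Subject line shape: "[component] Short description".
    private static final Pattern SUBJECT = Pattern.compile("^\\[([a-z]+)\\] \\S.*$");

    private CommitMessageCheck() {}

    /** True if the first line starts with a known component tag followed by a description. */
    static boolean hasValidSubject(String message) {
        String subject = message.split("\n", 2)[0];
        Matcher matcher = SUBJECT.matcher(subject);
        return matcher.matches() && COMPONENTS.contains(matcher.group(1));
    }

    /**
     * True if any line is a Co-Authored-By trailer. Simplification: this cannot tell an AI
     * co-author from a human one, so a match only means "needs a closer look".
     */
    static boolean hasCoAuthoredBy(String message) {
        return hasTrailer(message, "co-authored-by:");
    }

    /** True if any line is a Generated-by trailer (assumed spelling). */
    static boolean hasGeneratedBy(String message) {
        return hasTrailer(message, "generated-by:");
    }

    private static boolean hasTrailer(String message, String lowerCasePrefix) {
        for (String line : message.split("\n")) {
            if (line.trim().toLowerCase(Locale.ROOT).startsWith(lowerCasePrefix)) {
                return true;
            }
        }
        return false;
    }
}
```

The code sticks to Java 8 APIs on purpose. A real check would more likely live in CI or a commit hook, and the project may prefer a different mechanism entirely.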
Hi @vaibhavk1992, thanks for putting this together, really solid guide overall!

One thing I noticed: the doc mentions "Java 11" in Prerequisites (Section 11) and briefly notes "runtime Java 8 compatible" in the CI section, but there's no explicit rule telling AI agents that all code must compile at Java 8 source level. This is actually one of the most common pitfalls for AI coding agents: if they see "Java 11" as a prerequisite, they'll naturally reach for Java 9+ APIs and language features, which will then fail the

I'd suggest adding a dedicated rule in Section 1 (Critical Rules), something like:

### Java Version Compatibility
**Source level: Java 8** — All code MUST compile with JDK 8. CI enforces this via `compile-on-jdk8`.
**FORBIDDEN Java 9+ features:**
- ❌ `var` keyword (Java 10)
- ❌ `List.of()`, `Map.of()`, `Set.of()` (Java 9) → ✅ `Collections.singletonList()`, `Collections.unmodifiableMap()`
- ❌ `Optional.isEmpty()` (Java 11) → ✅ `!optional.isPresent()`
- ❌ `String.strip()`, `String.isBlank()` (Java 11) → ✅ `String.trim()`, `string.trim().isEmpty()`
- ❌ Switch expressions, text blocks, records, sealed classes, pattern matching
- ❌ `Map.entry()`, `Map.ofEntries()` (Java 9)
- ❌ `InputStream.transferTo()` (Java 9)
- ❌ `Stream.toList()` (Java 16) → ✅ `Collectors.toList()`

This follows the same DO/DON'T pattern already used in the doc (e.g., forbidden imports, MapUtils), so it should fit in naturally. Given how frequently AI agents trip over this, I think it deserves a prominent spot in the Critical Rules section. What do you think?
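As a rough illustration of the replacements listed above, the following sketch wraps a few of the Java 8-compatible alternatives in small helper methods. The class `Java8Compat` and its method names are hypothetical and only for demonstration; they are not part of the Fluss codebase.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical helpers showing Java 8 alternatives to newer APIs (not Fluss code). */
final class Java8Compat {

    private Java8Compat() {}

    // Instead of List.of(value) (Java 9): an immutable single-element list.
    static List<String> singleton(String value) {
        return Collections.singletonList(value);
    }

    // Instead of optional.isEmpty() (Java 11).
    static boolean isAbsent(Optional<String> optional) {
        return !optional.isPresent();
    }

    // Instead of text.isBlank() (Java 11). Note: trim() only strips characters
    // <= U+0020, so this is not identical to isBlank() for every Unicode whitespace.
    static boolean isBlank(String text) {
        return text.trim().isEmpty();
    }

    // Instead of stream.toList() (Java 16).
    static List<String> upperCase(List<String> values) {
        return values.stream().map(String::toUpperCase).collect(Collectors.toList());
    }
}
```

One caveat worth keeping in mind: `Collectors.toList()` makes no guarantee about the mutability of the returned list, whereas `Stream.toList()` returns an unmodifiable one, so the two are not drop-in equivalents where immutability matters.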
- Add Java Version Compatibility section to clarify Java 8 source level requirement
- Add Repository Structure overview with module descriptions
- Remove Co-Authored-By references from commit guidelines
- Add ASF-compliant AI disclosure checkbox to PR template
- Add Generated-by tag template aligned with Apache Airflow approach
@qzyu999 I have added the additional details and suggestions.
@platinumhamburg Good catch, I have added this too.
Purpose
Linked issue: #2921
Brief change log
Tests
API and Format
Documentation