From ba61fd43fee3488c55cfee14314bced6c6459026 Mon Sep 17 00:00:00 2001 From: vaibhav kumar Date: Tue, 24 Mar 2026 15:31:39 +0530 Subject: [PATCH 1/5] [docs] Add AGENTS.md - AI agent coding guide This documentation extracts and documents coding conventions, patterns, and standards from the existing Apache Fluss codebase to assist AI coding agents. All rules and examples are derived from actual source code, Checkstyle configuration, and build files. - 11 comprehensive sections covering critical rules, API patterns, testing, dependencies, configuration, and build/CI - 100+ concrete code examples with DO/DON'T comparisons - Direct file references to canonical examples in the codebase - Fully compliant with Apache generative AI guidelines Also updated .gitignore to exclude CLAUDE.md (personal development notes) --- .gitignore | 1 + AGENTS.md | 1729 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 1730 insertions(+) create mode 100644 AGENTS.md diff --git a/.gitignore b/.gitignore index 845fa7ff97..1fe4b030fe 100644 --- a/.gitignore +++ b/.gitignore @@ -24,6 +24,7 @@ dependency-reduced-pom.xml ### claude code ### .claude/ +CLAUDE.md ### Mac OS ### .DS_Store diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..37e92c7c35 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1729 @@ + + +# Apache Fluss - AI Agent Coding Guide + +This document provides AI coding agents with comprehensive project-specific context for Apache Fluss, including critical conventions, architectural patterns, and quality standards. + +## About This Document + +This documentation was created by systematically analyzing the Apache Fluss codebase, including: +- Checkstyle configuration (`tools/maven/checkstyle.xml`) +- Existing code patterns and conventions +- Build configurations and CI pipelines +- Test frameworks and utilities + +All rules, patterns, and examples are derived from the actual project source code. The structure and explanations were organized with AI assistance (Claude Code) to make the content accessible to AI coding agents. + +**Content Sources:** +- Critical rules: Extracted from Checkstyle enforcement rules +- Code patterns: Identified from production code in `fluss-client`, `fluss-server`, `fluss-common` +- Testing patterns: Documented from existing test classes and test utilities +- All code examples: Real examples from the Apache Fluss codebase + +## Table of Contents + +1. [Critical Rules (MUST/NEVER)](#1-critical-rules-mustnever) +2. [API Design Patterns](#2-api-design-patterns) +3. [Code Organization](#3-code-organization) +4. [Error Handling](#4-error-handling) +5. [Concurrency & Thread Safety](#5-concurrency--thread-safety) +6. [Testing Standards](#6-testing-standards) +7. [Dependencies & Shading](#7-dependencies--shading) +8. [Configuration Patterns](#8-configuration-patterns) +9. [Serialization & RPC](#9-serialization--rpc) +10. [Module Boundaries](#10-module-boundaries) +11. [Build & CI](#11-build--ci) + +--- + +## 1. Critical Rules (MUST/NEVER) + +These rules catch 80% of common violations. **Enforced by Checkstyle** - violations will fail CI. + +### 1.1 Dependency Management (MANDATORY) + +#### NEVER Use These Imports: + +```java +// ❌ DON'T: +import com.google.common.* // Use fluss-shaded-guava +import com.fasterxml.jackson.* // Use fluss-shaded-jackson2 +import org.codehaus.jackson.* // Use fluss-shaded-jackson2 +import com.google.common.base.Preconditions; // Use Fluss Preconditions +import com.google.common.annotations.VisibleForTesting; // Use Fluss annotation +import org.apache.commons.lang.*; // Use commons-lang3 +import org.apache.commons.lang3.Validate; // Use Fluss Preconditions +import org.apache.commons.lang3.SerializationUtils; // Use Fluss InstantiationUtil +``` + +#### ALWAYS Use Shaded Versions: + +| Forbidden | Required | +|-----------|----------| +| `com.google.common.*` | `org.apache.fluss.shaded.guava.*` | +| `com.fasterxml.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | +| `org.codehaus.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | +| `io.netty.*` | `org.apache.fluss.shaded.netty4.*` | +| `org.apache.arrow.*` | `org.apache.fluss.shaded.arrow.*` | +| `org.apache.zookeeper.*` | `org.apache.fluss.shaded.zookeeper38.*` | + +**Rationale:** Prevents dependency conflicts with user applications. + +### 1.2 Utility Classes (MANDATORY) + +#### NEVER Instantiate ConcurrentHashMap: + +```java +// ❌ DON'T: +Map map = new ConcurrentHashMap<>(); + +// ✅ DO: +import org.apache.fluss.utils.MapUtils; +Map map = MapUtils.newConcurrentMap(); +``` + +**Rationale:** See https://github.com/apache/fluss/issues/375 + +#### ALWAYS Use Fluss Preconditions: + +```java +// ❌ DON'T: +import com.google.common.base.Preconditions; +import org.apache.commons.lang3.Validate; + +// ✅ DO: +import static org.apache.fluss.utils.Preconditions.checkNotNull; +import static org.apache.fluss.utils.Preconditions.checkArgument; +import static org.apache.fluss.utils.Preconditions.checkState; + +public void process(String input) { + checkNotNull(input, "input cannot be null"); + checkArgument(!input.isEmpty(), "input cannot be empty"); +} +``` + +**MUST import statically** - non-static imports will fail Checkstyle. + +**Reference:** `fluss-common/src/main/java/org/apache/fluss/utils/Preconditions.java` + +#### ALWAYS Use Fluss VisibleForTesting: + +```java +// ❌ DON'T: +import com.google.common.annotations.VisibleForTesting; + +// ✅ DO: +import org.apache.fluss.annotation.VisibleForTesting; + +@VisibleForTesting +void helperMethod() { } +``` + +### 1.3 Banned Methods (MANDATORY) + +#### NEVER Use These Methods: + +```java +// ❌ DON'T - These read system properties incorrectly: +Boolean.getBoolean("property.name") +Integer.getInteger("property.name") +Long.getLong("property.name") + +// ✅ DO - Read system properties correctly: +Boolean.parseBoolean(System.getProperty("property.name")) +Integer.parseInt(System.getProperty("property.name")) +Long.parseLong(System.getProperty("property.name")) +``` + +#### NEVER Use SerializationUtils: + +```java +// ❌ DON'T: +import org.apache.commons.lang3.SerializationUtils; + +// ✅ DO: +import org.apache.fluss.utils.InstantiationUtil; +``` + +### 1.4 Testing Assertions (MANDATORY) + +#### ALWAYS Use AssertJ (NOT JUnit Assertions): + +```java +// ❌ DON'T: +import org.junit.jupiter.api.Assertions; +Assertions.assertEquals(expected, actual); +Assertions.assertTrue(condition); +Assertions.assertThrows(Exception.class, () -> doSomething()); + +// ✅ DO: +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +assertThat(actual).isEqualTo(expected); +assertThat(condition).isTrue(); +assertThat(list).hasSize(3).contains("a", "b", "c"); +assertThatThrownBy(() -> doSomething()) + .isInstanceOf(IllegalArgumentException.class) + .hasMessageContaining("expected message"); +``` + +**Rationale:** AssertJ provides more readable and expressive assertions. + +**Plugin:** Use "Assertions2Assertj" IntelliJ plugin to convert existing JUnit assertions. + +#### NEVER Set @Timeout on Tests: + +```java +// ❌ DON'T: +import org.junit.jupiter.api.Timeout; +@Timeout(5) +@Test +void testSomething() { } + +// ✅ DO: +// Rely on global timeout instead +@Test +void testSomething() { } +``` + +**Rationale:** Per-test timeouts make debugging harder. Use global timeout configuration. + +### 1.5 Code Style (MANDATORY) + +#### NEVER Use Star Imports: + +```java +// ❌ DON'T: +import java.util.*; +import org.apache.fluss.client.*; + +// ✅ DO: +import java.util.List; +import java.util.Map; +import org.apache.fluss.client.Connection; +import org.apache.fluss.client.Admin; +``` + +**Enforcement:** Set import threshold to 9999 in IDE settings. + +#### NEVER Have Trailing Whitespace: + +Run `./mvnw spotless:apply` before committing to auto-fix. + +#### ALWAYS Use Java-Style Array Declarations: + +```java +// ❌ DON'T: +String args[] +int values[] + +// ✅ DO: +String[] args +int[] values +``` + +#### ALWAYS Require Braces: + +```java +// ❌ DON'T: +if (condition) doSomething(); + +// ✅ DO: +if (condition) { + doSomething(); +} +``` + +### 1.6 Comments & Documentation (MANDATORY) + +#### NEVER Use TODOs with Usernames: + +```java +// ❌ DON'T: +// TODO(username): Fix this +// TODO(john.doe) Implement feature + +// ✅ DO: +// TODO: Fix this +// TODO Implement feature +``` + +#### NEVER Use These Comment Tags: + +```java +// ❌ DON'T: +// FIXME: Should be refactored +// XXX: Hack +// @author John Doe + +// ✅ DO: +// TODO: Should be refactored +// (No XXX or @author tags - use git history instead) +``` + +### 1.7 File Size Limits (MANDATORY) + +**Maximum file length:** 3000 lines + +If a file exceeds this limit, split it into multiple classes. + +### 1.8 Javadoc Requirements (MANDATORY) + +#### ALWAYS Document Protected and Public APIs: + +```java +// ✅ DO: +/** + * Creates a new connection to the Fluss cluster. + * + * @param configuration The configuration for the connection + * @return A new connection instance + * @throws FlussException if connection fails + */ +public static Connection createConnection(Configuration configuration) + throws FlussException { + // implementation +} +``` + +**Required:** All protected and public classes, interfaces, enums, and methods must have Javadoc. + +--- + +## 2. API Design Patterns + +### 2.1 API Stability Annotations + +Use annotations to signal API stability guarantees to users: + +```java +@PublicStable // Stable - breaking changes only in major versions +@PublicEvolving // May change in minor versions (new features being stabilized) +@Internal // Not public API - can change anytime +@VisibleForTesting // Exposed only for testing +``` + +**Rules:** +- **ALWAYS annotate** classes/interfaces in public packages (`fluss-client`, `fluss-common` public APIs) +- Use `@PublicStable` for core APIs: `Connection`, `Admin`, `ConfigOption`, `TableDescriptor` +- Use `@PublicEvolving` for new features: `LakeTieringManager`, new connector APIs +- Use `@Internal` for: RPC messages, server internals, `*JsonSerde` classes, test utilities + +**Examples:** +```java +@PublicStable +public class ConnectionFactory { + // Stable public API +} + +@PublicEvolving +public interface LakeTieringManager { + // New feature, API may evolve +} + +@Internal +public class MetadataJsonSerde { + // Internal implementation detail +} +``` + +**Reference:** `fluss-common/src/main/java/org/apache/fluss/annotation/` + +### 2.2 Builder Pattern + +Use fluent builder pattern for complex object construction: + +```java +// ✅ Example: ConfigOption builder +ConfigOption tempDirs = ConfigBuilder + .key("tmp.dir") + .stringType() + .defaultValue("/tmp") + .withDescription("Temporary directory path"); + +ConfigOption> ports = ConfigBuilder + .key("application.ports") + .intType() + .asList() + .defaultValues(8000, 8001, 8002); + +ConfigOption timeout = ConfigBuilder + .key("client.timeout") + .durationType() + .defaultValue(Duration.ofSeconds(30)); +``` + +**Pattern structure:** +- Create static inner class `Builder` +- Use method chaining (return `this`) +- Provide `build()` method to construct final object +- Make constructor private or package-private + +**Reference:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` + +### 2.3 Factory Pattern + +Use static factory methods for object creation (not public constructors): + +```java +@PublicEvolving +public class ConnectionFactory { + + // Private constructor - prevent direct instantiation + private ConnectionFactory() {} + + /** + * Creates a new connection to Fluss cluster. + */ + public static Connection createConnection(Configuration conf) { + return new FlussConnection(conf); + } + + /** + * Creates connection with custom metric registry. + */ + public static Connection createConnection( + Configuration conf, + MetricRegistry metricRegistry) { + return new FlussConnection(conf, metricRegistry); + } +} + +// ✅ Usage: +Connection conn = ConnectionFactory.createConnection(config); +``` + +**Rules:** +- Make constructor private +- Provide static factory methods +- Return interface types (not concrete implementations) +- Allow method overloading for different parameter combinations + +**Reference:** `fluss-client/src/main/java/org/apache/fluss/client/ConnectionFactory.java` + +### 2.4 Interface Segregation Pattern + +Provide both generic and typed interface variants: + +```java +// Generic interface (works with raw types) +public interface Lookuper extends AutoCloseable { + CompletableFuture lookup(InternalRow key); +} + +// Typed interface (works with POJOs) +public interface TypedLookuper extends AutoCloseable { + CompletableFuture lookup(T key); +} + +// Same pattern for writers: +public interface AppendWriter extends AutoCloseable { + CompletableFuture append(InternalRow row); +} + +public interface TypedAppendWriter extends AutoCloseable { + CompletableFuture append(T record); +} +``` + +**Implementation:** Typed variant delegates to generic version internally. + +### 2.5 Result Objects Pattern + +Use immutable result objects for return values: + +```java +public final class AppendResult { + private final long offset; + private final long timestamp; + + public AppendResult(long offset, long timestamp) { + this.offset = offset; + this.timestamp = timestamp; + } + + public long getOffset() { return offset; } + public long getTimestamp() { return timestamp; } + + @Override + public boolean equals(Object o) { /* ... */ } + + @Override + public int hashCode() { /* ... */ } + + @Override + public String toString() { /* ... */ } +} +``` + +**Rules:** +- Mark class `final` +- Make all fields `private final` +- No setters (only getters) +- Initialize all fields in constructor +- Implement `equals()`, `hashCode()`, `toString()` + +### 2.6 Thread Safety Documentation + +Document thread safety using annotations: + +```java +@ThreadSafe +public class Connection { + // Multiple threads can safely use this connection +} + +@NotThreadSafe +public class Lookuper { + // Each thread should have its own Lookuper instance +} +``` + +**Javadoc example:** +```java +/** + * Connection to a Fluss cluster. + * + *

This class is thread-safe. A single connection can be shared + * across multiple threads. However, Table instances obtained from + * the connection are NOT thread-safe - each thread should create + * its own Table instance. + */ +@ThreadSafe +public interface Connection extends AutoCloseable { } +``` + +--- + +## 3. Code Organization + +### 3.1 Naming Conventions + +#### Interfaces: +No prefix or suffix - use plain descriptive names: +```java +Connection, Admin, Table, Lookuper, AppendWriter, LogScanner +``` + +#### Implementation Classes: +Suffix with `Impl` when directly implementing an interface: +```java +FlussConnection (implements Connection) +AdminImpl (implements Admin) +LogScannerImpl (implements LogScanner) +TypedAppendWriterImpl (implements TypedAppendWriter) +``` + +Alternative: Use descriptive implementation name: +```java +NettyClient, RocksDBKvStore, ZooKeeperCoordinator +``` + +#### Abstract Base Classes: +Prefix with `Abstract`: +```java +AbstractAutoCloseableRegistry +AbstractAuthorizer +AbstractGoal +AbstractIterator +``` + +#### Utility Classes: +Suffix with `Utils`, make constructor private, all methods static: +```java +ArrayUtils, MapUtils, CollectionUtils, StringUtils +BytesUtils, BinaryStringUtils +ExceptionUtils, TimeUtils, DateTimeUtils +NetUtils, IOUtils, FileUtils +MathUtils, MurmurHashUtils +``` + +#### Test Classes: +- Unit tests: `*Test.java` (e.g., `ConfigBuilderTest`, `KvWriteBatchTest`) +- Integration tests: `*ITCase.java` (e.g., `Flink120TableSourceITCase`, `ServerITCaseBase`) +- Base test classes: `*TestBase.java` or `*TestUtils.java` +- Test utilities: Prefix with `Testing` (e.g., `TestingRemoteLogStorage`, `TestingMetricGroups`) + +#### Exceptions: +Descriptive name + `Exception` suffix: +```java +TableNotExistException +DatabaseNotExistException +InvalidRecordException +StaleMetadataException +OutOfOrderSequenceException +``` + +### 3.2 Package Structure + +Standard package organization: + +``` +fluss-common/ # Common utilities, data types, configs + /annotation # API stability annotations + /config # Configuration framework + /exception # Exception hierarchy + /fs # Filesystem abstraction + /memory # Memory management + /metrics # Metrics definitions + /record # Record formats (log records, KV records) + /row # Row types and encoding + /types # Data types + /utils # Utility classes + +fluss-rpc/ # RPC layer + /gateway # RPC gateway interfaces + /messages # Protobuf messages + /netty # Netty-based implementation + +fluss-client/ # Java client SDK + /admin # Admin operations + /write # Write APIs + /lookup # Point lookup APIs + /table # Table abstraction + /scanner # Scanning/reading operations + /metadata # Metadata management + /token # Security token management + /connection # Connection management + +fluss-server/ # Server implementations + /coordinator # CoordinatorServer + /tablet # TabletServer + /log # LogStore implementation + /kv # KvStore implementation + /replica # Replication logic + /metadata # Metadata management + /zk # ZooKeeper integration + /metrics # Server metrics + /authorizer # Authorization +``` + +### 3.3 Class Member Organization + +**Field ordering:** +1. Static constants (`public static final`) +2. Static fields (`private static`) +3. Instance fields (`private` or `protected`) + +**Method ordering:** +1. Constructors +2. Static factory methods +3. Public methods +4. Package-private methods +5. Protected methods +6. Private methods +7. Static utility methods (at bottom) + +**Modifier order** (enforced by Checkstyle): +``` +public, protected, private, abstract, static, final, +transient, volatile, synchronized, native, strictfp +``` + +Example: +```java +public class Example { + // 1. Static constants + public static final int DEFAULT_SIZE = 100; + private static final Logger LOG = LoggerFactory.getLogger(Example.class); + + // 2. Static fields + private static int instanceCount = 0; + + // 3. Instance fields + private final String name; + private int value; + + // 4. Constructor + public Example(String name) { + this.name = name; + instanceCount++; + } + + // 5. Static factory method + public static Example create(String name) { + return new Example(name); + } + + // 6. Public methods + public void doSomething() { } + + // 7. Private methods + private void helperMethod() { } + + // 8. Static utility (at bottom) + private static String formatName(String name) { + return name.toUpperCase(); + } +} +``` + +### 3.4 Import Organization + +**Order** (enforced by Spotless): +1. `org.apache.fluss` imports +2. Blank line +3. Other imports (`javax`, `java`, third-party) +4. Blank line +5. Static imports (`\#`) + +**Example:** +```java +package org.apache.fluss.client; + +import org.apache.fluss.config.Configuration; +import org.apache.fluss.exception.FlussException; + +import javax.annotation.Nullable; + +import java.io.IOException; +import java.util.concurrent.CompletableFuture; + +import static org.apache.fluss.utils.Preconditions.checkNotNull; +import static org.apache.fluss.utils.Preconditions.checkState; +``` + +**Enforcement:** Run `./mvnw spotless:apply` to auto-format imports. + +--- + +## 4. Error Handling + +### 4.1 Exception Hierarchy + +``` +FlussException (base checked exception) +├── ApiException (client-facing errors) +│ ├── TableNotExistException +│ ├── DatabaseNotExistException +│ ├── InvalidOffsetException +│ ├── InvalidConfigException +│ └── ... +├── RetriableException (transient errors) +│ ├── NotEnoughReplicasException +│ ├── SchemaNotExistException +│ ├── TimeoutException +│ └── ... +├── InvalidMetadataException +│ └── StaleMetadataException +└── FlussRuntimeException (base unchecked exception) + ├── CorruptRecordException + ├── LogStorageException + └── ... +``` + +**When to extend:** +- `ApiException`: User-facing errors (invalid input, not found, etc.) +- `RetriableException`: Transient failures (retry may succeed) +- `FlussRuntimeException`: Programming errors or unrecoverable failures + +### 4.2 Input Validation Pattern + +Use Preconditions for input validation at API boundaries: + +```java +import static org.apache.fluss.utils.Preconditions.checkArgument; +import static org.apache.fluss.utils.Preconditions.checkNotNull; +import static org.apache.fluss.utils.Preconditions.checkState; + +public void createTable(String databaseName, TableDescriptor descriptor) { + // Null checks + checkNotNull(databaseName, "database name cannot be null"); + checkNotNull(descriptor, "table descriptor cannot be null"); + + // Argument validation + checkArgument(!databaseName.isEmpty(), "database name cannot be empty"); + checkArgument(descriptor.getBucketCount() > 0, + "bucket count must be positive, got: %s", descriptor.getBucketCount()); + + // State validation + checkState(isInitialized(), "connection not initialized"); + checkState(!isClosed(), "connection is closed"); +} +``` + +**Preconditions API:** +- `checkNotNull(T obj, String message, Object... args)` → throws `NullPointerException` +- `checkArgument(boolean condition, String message, Object... args)` → throws `IllegalArgumentException` +- `checkState(boolean condition, String message, Object... args)` → throws `IllegalStateException` + +**Message templates:** Use `%s` placeholders (not `String.format` style). + +### 4.3 Error Propagation + +#### For Async Operations: + +```java +public CompletableFuture append(InternalRow row) { + return CompletableFuture.supplyAsync(() -> { + try { + return doAppend(row); + } catch (IOException e) { + throw ExceptionUtils.wrapAsUnchecked(e); + } + }, executorService); +} +``` + +#### Composition with Error Handling: + +```java +return fetchMetadata(tablePath) + .thenCompose(metadata -> fetchData(metadata)) + .thenApply(data -> processData(data)) + .exceptionally(ex -> { + Throwable cause = ExceptionUtils.stripCompletionException(ex); + if (cause instanceof TableNotExistException) { + LOG.warn("Table not found: {}", tablePath); + return null; + } + throw ExceptionUtils.wrapAsUnchecked(cause); + }); +``` + +**Utilities:** +- `ExceptionUtils.wrapAsUnchecked(Exception e)`: Wrap checked exceptions +- `ExceptionUtils.stripCompletionException(Throwable t)`: Unwrap `CompletionException` + +--- + +## 5. Concurrency & Thread Safety + +### 5.1 Thread Safety Annotations + +Document synchronization requirements: + +```java +import javax.annotation.concurrent.GuardedBy; +import javax.annotation.concurrent.ThreadSafe; +import javax.annotation.concurrent.NotThreadSafe; + +@ThreadSafe +public class ServerConnection { + private final Object lock = new Object(); + + @GuardedBy("lock") + private volatile boolean connected; + + @GuardedBy("lock") + private Channel channel; + + public void connect() { + synchronized (lock) { + if (connected) { + return; + } + channel = createChannel(); + connected = true; + } + } + + public void disconnect() { + synchronized (lock) { + if (!connected) { + return; + } + channel.close(); + connected = false; + } + } +} +``` + +**Rules:** +- Mark thread-safe classes with `@ThreadSafe` +- Use `@GuardedBy("lockName")` to document protected fields +- Declare lock fields explicitly (avoid implicit `this` locks) +- Use `volatile` for fields accessed outside synchronized blocks + +**Reference:** `fluss-server/src/main/java/org/apache/fluss/server/log/LogTablet.java` + +### 5.2 CompletableFuture Patterns + +#### Async API Pattern: + +```java +public CompletableFuture lookup(InternalRow key) { + return CompletableFuture.supplyAsync(() -> { + // Perform async lookup operation + return performLookup(key); + }, executorService); +} +``` + +#### Composition: + +```java +CompletableFuture tableInfoFuture = fetchTableInfo(tablePath); + +return tableInfoFuture + .thenCompose(tableInfo -> fetchSchema(tableInfo.getSchemaId())) + .thenCombine(fetchPartitions(tablePath), (schema, partitions) -> { + return new TableMetadata(schema, partitions); + }) + .thenApply(metadata -> processMetadata(metadata)) + .exceptionally(ex -> handleError(ex)); +``` + +#### Waiting for Multiple Futures: + +```java +import org.apache.fluss.utils.concurrent.FutureUtils; + +List> futures = new ArrayList<>(); +futures.add(future1); +futures.add(future2); +futures.add(future3); + +CompletableFuture allFutures = FutureUtils.completeAll(futures); +allFutures.get(); // Wait for all +``` + +**Utilities:** +- `FutureUtils.completeAll(Collection>)`: Wait for all futures +- `FutureUtils.orTimeout(CompletableFuture, Duration)`: Add timeout + +### 5.3 Map Creation Rules + +**NEVER instantiate ConcurrentHashMap directly** (enforced by Checkstyle): + +```java +// ❌ DON'T: +Map map = new ConcurrentHashMap<>(); + +// ✅ DO: +import org.apache.fluss.utils.MapUtils; +Map map = MapUtils.newConcurrentMap(); +``` + +**Other MapUtils methods:** +- `MapUtils.newHashMap()`: Create HashMap with default capacity +- `MapUtils.newHashMap(int capacity)`: Create HashMap with specified capacity +- `MapUtils.newLinkedHashMap()`: Create LinkedHashMap + +### 5.4 Resource Management + +Use try-with-resources for AutoCloseable resources: + +```java +// ✅ Single resource: +try (Connection connection = ConnectionFactory.createConnection(conf)) { + Admin admin = connection.getAdmin(); + admin.createDatabase("my_db", DatabaseDescriptor.EMPTY, false).get(); +} // Auto-closed + +// ✅ Multiple resources: +try (Connection connection = ConnectionFactory.createConnection(conf); + Admin admin = connection.getAdmin(); + Table table = connection.getTable(tablePath)) { + // Use resources +} // All auto-closed in reverse order +``` + +**Implement AutoCloseable:** +```java +public class MyResource implements AutoCloseable { + private volatile boolean closed = false; + + @Override + public void close() throws IOException { + if (closed) { + return; + } + try { + // Release resources + cleanupResources(); + } finally { + closed = true; + } + } +} +``` + +--- + +## 6. Testing Standards + +### 6.1 Test Framework + +- **JUnit 5 (Jupiter)** for all tests +- **AssertJ** for assertions (NOT JUnit assertions) +- **Mockito** for mocking (use sparingly - prefer custom test doubles) + +### 6.2 Test Naming + +Use descriptive test names that explain what is being tested: + +```java +@Test +void testAppendWithValidData() { } + +@Test +void testLookupThrowsExceptionWhenTableNotExist() { } + +@Test +void testTryAppendWithWriteLimit() { } + +@Test +void testHandleHybridSnapshotLogSplitChangesAndFetch() { } +``` + +**Test class naming:** +- Unit tests: `*Test.java` +- Integration tests: `*ITCase.java` + +### 6.3 Assertion Patterns + +**ALWAYS use AssertJ** (Checkstyle enforces this): + +```java +import static org.assertj.core.api.Assertions.assertThat; +import static org.assertj.core.api.Assertions.assertThatThrownBy; + +@Test +void testValidation() { + // Basic assertions + assertThat(result).isNotNull(); + assertThat(result.getOffset()).isEqualTo(100L); + assertThat(result.isComplete()).isTrue(); + + // Collection assertions + assertThat(list).hasSize(3); + assertThat(list).contains("a", "b", "c"); + assertThat(list).containsExactly("a", "b", "c"); // Order matters + assertThat(map).containsEntry("key", "value"); + + // Exception assertions + assertThatThrownBy(() -> service.lookup(null)) + .isInstanceOf(NullPointerException.class) + .hasMessageContaining("cannot be null"); + + // Future assertions (custom FlussAssertions) + assertThatFuture(completableFuture) + .eventuallySucceeds() + .isEqualTo(expectedValue); +} +``` + +**Never use JUnit assertions:** +```java +// ❌ DON'T: +import org.junit.jupiter.api.Assertions; +Assertions.assertEquals(expected, actual); +Assertions.assertTrue(condition); +``` + +### 6.4 Test Base Classes + +Use test base classes for common setup: + +```java +// For Flink integration tests +class MyFlinkTest extends FlinkTestBase { + // Provides: FLUSS_CLUSTER_EXTENSION, connection, admin + // Helper methods: createTable(), writeRows(), etc. + + @Test + void testFlinkSource() throws Exception { + TablePath tablePath = TablePath.of("test_db", "test_table"); + createTable(tablePath, DEFAULT_PK_TABLE_DESCRIPTOR); + // Test implementation + } +} + +// For server tests +class MyServerTest extends ServerTestBase { + // Provides: ZooKeeper setup, server configuration +} + +// For lake tiering tests +class MyLakeTest extends FlinkPaimonTieringTestBase { + // Provides: Paimon lake integration setup +} +``` + +**Common base classes:** +- `FlinkTestBase`: Flink cluster + Fluss cluster +- `ServerTestBase`: Coordinator/Tablet server setup +- `FlinkTieringTestBase`: Lake tiering infrastructure +- `LogTestBase`, `KvTestBase`: Record format testing + +**Reference:** `fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkTestBase.java` + +### 6.5 Test Extensions + +Use JUnit 5 extensions for complex setup: + +```java +@RegisterExtension +public static final AllCallbackWrapper ZOO_KEEPER_EXTENSION = + new AllCallbackWrapper<>(new ZooKeeperExtension()); + +@RegisterExtension +public static final FlussClusterExtension FLUSS_CLUSTER_EXTENSION = + FlussClusterExtension.builder() + .setNumOfTabletServers(3) + .setClusterConf(new Configuration() + .set(ConfigOptions.KV_MAX_RETAINED_SNAPSHOTS, Integer.MAX_VALUE)) + .build(); + +@Test +void testWithCluster() { + // Extensions handle cluster start/stop +} +``` + +**Available extensions:** +- `FlussClusterExtension`: Embedded Fluss cluster +- `ZooKeeperExtension`: Embedded ZooKeeper server +- `ParameterizedTestExtension`: Custom parameterized testing + +### 6.6 Parameterized Tests + +```java +@ParameterizedTest +@ValueSource(ints = {1, 2, 4, 8}) +void testWithDifferentBucketCounts(int bucketCount) { + // Test logic using bucketCount +} + +@ParameterizedTest +@CsvSource({ + "LOG_TABLE, false", + "PRIMARY_KEY_TABLE, true" +}) +void testTableTypes(String tableType, boolean hasPrimaryKey) { + // Test logic +} + +@ParameterizedTest +@MethodSource("provideTableConfigs") +void testWithCustomProvider(TableConfig config) { + // Test logic +} + +private static Stream provideTableConfigs() { + return Stream.of( + new TableConfig("log", false), + new TableConfig("pk", true) + ); +} +``` + +### 6.7 Test Utilities + +**CommonTestUtils:** +```java +import org.apache.fluss.testutils.common.CommonTestUtils; + +// Wait for condition with timeout +CommonTestUtils.waitUntil( + () -> server.isRunning(), + Duration.ofSeconds(30), + "Server failed to start" +); + +// Wait for value to become available +Optional result = CommonTestUtils.waitValue( + () -> fetchValue(), + Duration.ofSeconds(10), + "Value not found" +); + +// Retry assertion with backoff +CommonTestUtils.retry( + Duration.ofSeconds(5), + () -> assertThat(getValue()).isEqualTo(expected) +); +``` + +**FlussAssertions:** +```java +import org.apache.fluss.testutils.common.FlussAssertions; + +// Custom future assertions +assertThatFuture(completableFuture) + .eventuallySucceeds() + .isEqualTo(expectedValue); + +// Exception chain matching +FlussAssertions.assertThatChainOfCauses(exception) + .anyCauseMatches(TableNotExistException.class, "table_name"); +``` + +### 6.8 Parallel Test Execution + +**Opt-in to parallelism** (tests are sequential by default): + +```java +import org.junit.jupiter.api.parallel.Execution; +import org.junit.jupiter.api.parallel.ExecutionMode; + +@Execution(ExecutionMode.CONCURRENT) +class MyParallelTest { + // Tests in this class can run in parallel + + @Test + void test1() { } + + @Test + void test2() { } +} +``` + +**Only use for truly independent tests** - most tests should remain sequential. + +--- + +## 7. Dependencies & Shading + +### 7.1 Shaded Dependencies + +**ALWAYS use shaded versions** to prevent dependency conflicts: + +| Instead of | Use | +|------------|-----| +| `com.google.common.*` | `org.apache.fluss.shaded.guava.*` | +| `com.fasterxml.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | +| `org.codehaus.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | +| `io.netty.*` | `org.apache.fluss.shaded.netty4.*` | +| `org.apache.arrow.*` | `org.apache.fluss.shaded.arrow.*` | +| `org.apache.zookeeper.*` | `org.apache.fluss.shaded.zookeeper38.*` | + +**Example:** +```java +// ❌ DON'T: +import com.google.common.collect.ImmutableList; +import com.fasterxml.jackson.databind.ObjectMapper; + +// ✅ DO: +import org.apache.fluss.shaded.guava32.com.google.common.collect.ImmutableList; +import org.apache.fluss.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper; +``` + +### 7.2 Utility Class Substitutions + +Use Fluss utilities instead of external libraries: + +| Instead of | Use | +|------------|-----| +| Guava `Preconditions` | `org.apache.fluss.utils.Preconditions` | +| Guava `VisibleForTesting` | `org.apache.fluss.annotation.VisibleForTesting` | +| Commons `Validate` | `org.apache.fluss.utils.Preconditions` | +| Commons `SerializationUtils` | `org.apache.fluss.utils.InstantiationUtil` | + +**Available Fluss utilities:** +- `ArrayUtils`, `MapUtils`, `CollectionUtils`: Collection operations +- `BytesUtils`, `BinaryStringUtils`: Byte/string manipulation +- `ExceptionUtils`: Error handling utilities +- `FutureUtils`: CompletableFuture utilities +- `TimeUtils`, `DateTimeUtils`: Time handling +- `NetUtils`, `IOUtils`, `FileUtils`: I/O operations +- `MathUtils`, `MurmurHashUtils`: Math operations + +### 7.3 Module Dependencies + +**Dependency hierarchy:** + +``` +fluss-common (foundation) + ↑ +fluss-rpc + ↑ +fluss-client ← fluss-server (peer modules) + ↑ ↑ + | | + +----+------+ + | + connectors (flink, spark, kafka) + | + lake integrations (iceberg, paimon, lance) +``` + +**Rules:** +- `fluss-common` depends only on: JDK + shaded libs (guava, jackson, arrow, etc.) +- `fluss-client` **cannot** depend on `fluss-server` +- `fluss-server` **cannot** depend on connectors +- Connectors **can** depend on `fluss-client` +- Define interfaces in lower-level modules, implementations in higher-level modules + +--- + +## 8. Configuration Patterns + +### 8.1 ConfigOption Definition + +Use `ConfigBuilder` to define configuration options: + +```java +import org.apache.fluss.config.ConfigBuilder; +import org.apache.fluss.config.ConfigOption; + +// String option with default +public static final ConfigOption BOOTSTRAP_SERVERS = + ConfigBuilder.key("bootstrap.servers") + .stringType() + .defaultValue("localhost:9092") + .withDescription("Comma-separated list of Fluss server addresses"); + +// Integer option +public static final ConfigOption CLIENT_TIMEOUT_MS = + ConfigBuilder.key("client.request.timeout.ms") + .intType() + .defaultValue(30000) + .withDescription("Request timeout in milliseconds"); + +// Duration option +public static final ConfigOption SESSION_TIMEOUT = + ConfigBuilder.key("client.session.timeout") + .durationType() + .defaultValue(Duration.ofSeconds(60)) + .withDescription("Client session timeout duration"); + +// MemorySize option +public static final ConfigOption BUFFER_SIZE = + ConfigBuilder.key("client.buffer.size") + .memoryType() + .defaultValue(MemorySize.ofMebiBytes(32)) + .withDescription("Buffer size for network operations"); + +// List option +public static final ConfigOption> HOSTS = + ConfigBuilder.key("client.hosts") + .stringType() + .asList() + .defaultValues("localhost", "127.0.0.1") + .withDescription("List of host addresses to connect to"); + +// No default value +public static final ConfigOption USERNAME = + ConfigBuilder.key("user.name") + .stringType() + .noDefaultValue() + .withDescription("Username for authentication"); + +// Password option (sensitive) +public static final ConfigOption PASSWORD = + ConfigBuilder.key("user.password") + .passwordType() + .noDefaultValue() + .withDescription("Password for authentication (stored securely)"); + +// Deprecated keys support +public static final ConfigOption THRESHOLD = + ConfigBuilder.key("cpu.utilization.threshold") + .doubleType() + .defaultValue(0.9) + .withDeprecatedKeys("cpu.threshold", "old.cpu.threshold") + .withDescription("CPU utilization threshold"); +``` + +**Reference:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` + +### 8.2 Configuration Usage + +```java +Configuration conf = new Configuration(); + +// Set values +conf.setString("bootstrap.servers", "server1:9092,server2:9092"); +conf.setInteger(CLIENT_TIMEOUT_MS, 60000); + +// Get values with defaults +String servers = conf.getString(BOOTSTRAP_SERVERS); +int timeout = conf.getInteger(CLIENT_TIMEOUT_MS); +Duration sessionTimeout = conf.get(SESSION_TIMEOUT); + +// Get optional values +Optional username = conf.getOptional(USERNAME); +if (username.isPresent()) { + // Use username +} +``` + +### 8.3 Configuration Naming Convention + +Use hierarchical dot-separated keys with hyphens for compound words: + +``` +{category}.{subcategory}.{option-name} + +Examples: +- remote.data.dir +- remote.fs.write-buffer-size +- client.request.timeout.ms +- default.bucket.number +- kv.snapshot.interval +``` + +--- + +## 9. Serialization & RPC + +### 9.1 Protocol Buffers + +**Proto file conventions:** + +```protobuf +syntax = "proto2"; // Currently proto2, migrating to proto3 + +package fluss.rpc.messages; + +option java_package = "org.apache.fluss.rpc.messages"; +option java_outer_classname = "FlussApiProtos"; +option optimize_for = LITE_RUNTIME; // Smaller code size + +message GetTableSchemaRequest { + required PbTablePath table_path = 1; + optional int32 schema_id = 2; +} + +message GetTableSchemaResponse { + required PbSchema schema = 1; + optional string error_message = 2; +} +``` + +**Rules for proto3 migration:** +- **DO NOT** use `default` keyword (prepare for proto3) +- **DO NOT** use `enum` type for now (will be supported in proto3) +- Use `required` for mandatory fields (proto2 only) +- Use `optional` for optional fields +- Use `repeated` for lists + +### 9.2 Regenerating Protobuf Code + +After modifying `.proto` files, regenerate Java code: + +```bash +./mvnw clean install -DskipTests -pl fluss-protogen,fluss-rpc +``` + +**Reference:** `fluss-rpc/src/main/proto/FlussApi.proto` + +### 9.3 RPC Message Pattern + +```java +@Internal +public class GetTableSchemaRequest { + private final TablePath tablePath; + @Nullable + private final Integer schemaId; + + public GetTableSchemaRequest(TablePath tablePath, @Nullable Integer schemaId) { + this.tablePath = checkNotNull(tablePath); + this.schemaId = schemaId; + } + + public TablePath getTablePath() { return tablePath; } + public Optional getSchemaId() { + return Optional.ofNullable(schemaId); + } +} + +@Internal +public class GetTableSchemaResponse { + private final Schema schema; + + public GetTableSchemaResponse(Schema schema) { + this.schema = checkNotNull(schema); + } + + public Schema getSchema() { return schema; } +} +``` + +**Rules:** +- Mark RPC messages as `@Internal` +- Use immutable pattern (final fields, no setters) +- Provide clear constructors and getters +- Use `@Nullable` for optional fields + +--- + +## 10. Module Boundaries + +### 10.1 Core Modules + +**fluss-common**: Foundation layer +- **Depends on:** JDK, shaded libraries only +- **Used by:** All other modules +- **Contains:** Utilities, data types, configurations, exceptions + +**fluss-rpc**: RPC communication layer +- **Depends on:** `fluss-common` +- **Used by:** `fluss-client`, `fluss-server` +- **Contains:** RPC gateways, protobuf messages, Netty implementation + +**fluss-client**: Java client SDK +- **Depends on:** `fluss-common`, `fluss-rpc` +- **Cannot depend on:** `fluss-server` (strict boundary) +- **Used by:** Applications, connectors +- **Contains:** Connection, Admin, Table, Writers, Scanners, Lookupers + +**fluss-server**: Server implementations +- **Depends on:** `fluss-common`, `fluss-rpc` +- **Cannot depend on:** `fluss-client`, connectors +- **Contains:** CoordinatorServer, TabletServer, LogStore, KvStore + +### 10.2 Connector Modules + +**fluss-flink**: Apache Flink connector +- **Structure:** + - `fluss-flink-common`: Common Flink connector code + - `fluss-flink-tiering`: Lake tiering service + - `fluss-flink-1.18`, `1.19`, `1.20`, `2.2`: Version-specific implementations +- **Depends on:** `fluss-client`, specific Flink version +- **Shading:** Each version shades Flink dependencies + +**fluss-spark**: Apache Spark connector +- **Structure:** + - `fluss-spark-common`: Common Spark connector code + - `fluss-spark-ut`: Unit test utilities + - `fluss-spark-3.4`, `3.5`: Version-specific implementations +- **Depends on:** `fluss-client`, specific Spark version + +**fluss-kafka**: Kafka compatibility layer +- **Provides:** Kafka-compatible API surface +- **Depends on:** `fluss-client` + +### 10.3 Lake Integration Modules + +**fluss-lake**: Base lake integration +- Sub-modules: + - `fluss-lake-iceberg`: Apache Iceberg format + - `fluss-lake-paimon`: Apache Paimon format + - `fluss-lake-lance`: Lance columnar format + +### 10.4 Cross-Module Communication + +**Rules:** +1. Use **interfaces** (not implementations) for cross-module APIs +2. Define interfaces in **lower-level modules** +3. Implementations can be in **higher-level modules** +4. **Never** create circular dependencies + +**Example:** +```java +// In fluss-common: +public interface FileSystem { + InputStream openFile(Path path) throws IOException; + OutputStream createFile(Path path) throws IOException; +} + +// In fluss-filesystems/fluss-fs-hdfs: +public class HdfsFileSystem implements FileSystem { + // HDFS-specific implementation +} +``` + +--- + +## 11. Build & CI + +### 11.1 Build Commands + +**Full build (skip tests):** +```bash +./mvnw clean install -DskipTests +``` + +**Parallel build (faster):** +```bash +./mvnw clean install -DskipTests -T 1C +``` + +**Build and run all tests:** +```bash +./mvnw clean verify +``` + +**Test specific module:** +```bash +./mvnw verify -pl fluss-server +./mvnw verify -pl fluss-client +``` + +**Test by stage (as in CI):** +```bash +# Core tests (excludes Flink, Spark, Lake) +./mvnw verify -pl '!fluss-flink/**,!fluss-spark/**,!fluss-lake/**' + +# Flink tests +./mvnw verify -pl fluss-flink/** + +# Spark tests +./mvnw verify -Pspark3 -pl fluss-spark/** + +# Lake tests +./mvnw verify -pl fluss-lake/** +``` + +**Single test class:** +```bash +./mvnw test -Dtest=ConfigBuilderTest -pl fluss-common +``` + +**Single test method:** +```bash +./mvnw test -Dtest=ConfigBuilderTest#testStringType -pl fluss-common +``` + +### 11.2 Code Formatting + +**Auto-format before committing:** +```bash +./mvnw spotless:apply +``` + +**Check formatting without changing files:** +```bash +./mvnw spotless:check +``` + +**Format style:** +- Java: google-java-format (AOSP style) +- Scala: scalafmt (v3.10.2) +- Config: `.scalafmt.conf` in repository root + +**IntelliJ IDEA plugin:** +- Install: google-java-format v1.7.0.6 +- **DO NOT update** - newer versions have incompatibilities + +**Reference:** `tools/maven/checkstyle.xml` + +### 11.3 CI Pipeline + +**GitHub Actions workflow** runs tests in four parallel stages: + +1. **compile-on-jdk8**: Compile with Java 8 for compatibility +2. **core**: Test all modules except Flink, Spark, Lake +3. **flink**: Test Flink connector (all versions) +4. **spark3**: Test Spark connector +5. **lake**: Test Lake integrations + legacy Flink (1.18, 1.19) + +**Java compatibility:** +- **Build on:** Java 11 (required) +- **Runtime:** Java 8 compatible (validated with `-Pjava8` profile) +- **CI validates:** Both Java 8 compile and Java 11 tests + +**Build command used in CI:** +```bash +# Compile +mvn -T 1C -B clean install -DskipTests + +# Test with coverage +mvn -T 1C verify -Ptest-coverage +``` + +**Reference:** `.github/workflows/ci.yaml` + +### 11.4 Test Coverage + +**Generate coverage report:** +```bash +./mvnw verify -Ptest-coverage +``` + +**View report:** +Open `fluss-test-coverage/target/site/jacoco-aggregate/index.html` in browser. + +**Coverage tool:** JaCoCo (aggregated across all modules) + +### 11.5 License Headers + +All files must have Apache 2.0 license header (enforced by Apache RAT plugin): + +```java +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +``` + +**Enforcement:** Run `./mvnw validate` to check license headers. + +--- + +## Additional Resources + +### Key Source Files for Reference + +1. **Checkstyle configuration:** `tools/maven/checkstyle.xml` + - All MUST/NEVER rules and style enforcement + +2. **Builder pattern:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` + - Canonical fluent API design + +3. **Factory pattern:** `fluss-client/src/main/java/org/apache/fluss/client/ConnectionFactory.java` + - Static factory methods with private constructor + +4. **Preconditions:** `fluss-common/src/main/java/org/apache/fluss/utils/Preconditions.java` + - Input validation utility (must be imported statically) + +5. **Test base:** `fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkTestBase.java` + - Common test setup patterns + +6. **CI configuration:** `.github/workflows/ci.yaml` + - Test stages and build commands + +### Development Checklist + +Before submitting a PR: + +- [ ] Run `./mvnw spotless:apply` to format code +- [ ] Run `./mvnw clean verify` to ensure all tests pass +- [ ] Check that no forbidden imports are used +- [ ] Verify all public APIs have Javadoc +- [ ] Ensure tests use AssertJ (not JUnit assertions) +- [ ] Verify Preconditions are imported statically +- [ ] Check that ConcurrentHashMap is not directly instantiated +- [ ] Ensure file length < 3000 lines +- [ ] Remove trailing whitespace +- [ ] Follow naming conventions (Impl suffix, Abstract prefix, etc.) + +--- + +## Document Information + +**Last Updated:** 2026-03-24 +**Fluss Version:** 0.10-SNAPSHOT +**Generated-by:** AI-assisted analysis of Apache Fluss codebase using Claude Code + +**Content Verification:** +All rules, patterns, and code examples in this document are derived from and verifiable against the Apache Fluss source code. The patterns are not generated—they are documented observations of existing conventions enforced by Checkstyle, Spotless, and established through the project's codebase. + +**Contributing:** +For questions, corrections, or suggestions about this guide, please open an issue at https://github.com/apache/fluss/issues + +**License:** +This document is licensed under the Apache License 2.0, consistent with the Apache Fluss project. From 460f790e3e75ef5555f4991925033acbd0b40907 Mon Sep 17 00:00:00 2001 From: vaibhav kumar Date: Thu, 26 Mar 2026 16:09:23 +0530 Subject: [PATCH 2/5] Updating AGENTS.md --- .gitignore | 3 +- AGENTS.md | 1722 +++++++--------------------------------------------- 2 files changed, 218 insertions(+), 1507 deletions(-) diff --git a/.gitignore b/.gitignore index 1fe4b030fe..fd06d2827a 100644 --- a/.gitignore +++ b/.gitignore @@ -24,7 +24,6 @@ dependency-reduced-pom.xml ### claude code ### .claude/ -CLAUDE.md ### Mac OS ### .DS_Store @@ -45,4 +44,4 @@ website/package-lock.json website/versioned_docs website/versioned_sidebars website/versions.json -website/pnpm-lock.yaml +website/pnpm-lock.yaml \ No newline at end of file diff --git a/AGENTS.md b/AGENTS.md index 37e92c7c35..6eeeb65126 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -18,1712 +18,424 @@ # Apache Fluss - AI Agent Coding Guide -This document provides AI coding agents with comprehensive project-specific context for Apache Fluss, including critical conventions, architectural patterns, and quality standards. - -## About This Document - -This documentation was created by systematically analyzing the Apache Fluss codebase, including: -- Checkstyle configuration (`tools/maven/checkstyle.xml`) -- Existing code patterns and conventions -- Build configurations and CI pipelines -- Test frameworks and utilities - -All rules, patterns, and examples are derived from the actual project source code. The structure and explanations were organized with AI assistance (Claude Code) to make the content accessible to AI coding agents. - -**Content Sources:** -- Critical rules: Extracted from Checkstyle enforcement rules -- Code patterns: Identified from production code in `fluss-client`, `fluss-server`, `fluss-common` -- Testing patterns: Documented from existing test classes and test utilities -- All code examples: Real examples from the Apache Fluss codebase - -## Table of Contents - -1. [Critical Rules (MUST/NEVER)](#1-critical-rules-mustnever) -2. [API Design Patterns](#2-api-design-patterns) -3. [Code Organization](#3-code-organization) -4. [Error Handling](#4-error-handling) -5. [Concurrency & Thread Safety](#5-concurrency--thread-safety) -6. [Testing Standards](#6-testing-standards) -7. [Dependencies & Shading](#7-dependencies--shading) -8. [Configuration Patterns](#8-configuration-patterns) -9. [Serialization & RPC](#9-serialization--rpc) -10. [Module Boundaries](#10-module-boundaries) -11. [Build & CI](#11-build--ci) +AI coding guide for Apache Fluss with critical rules, patterns, and standards derived from codebase analysis and Checkstyle enforcement. ---- - -## 1. Critical Rules (MUST/NEVER) - -These rules catch 80% of common violations. **Enforced by Checkstyle** - violations will fail CI. - -### 1.1 Dependency Management (MANDATORY) - -#### NEVER Use These Imports: - -```java -// ❌ DON'T: -import com.google.common.* // Use fluss-shaded-guava -import com.fasterxml.jackson.* // Use fluss-shaded-jackson2 -import org.codehaus.jackson.* // Use fluss-shaded-jackson2 -import com.google.common.base.Preconditions; // Use Fluss Preconditions -import com.google.common.annotations.VisibleForTesting; // Use Fluss annotation -import org.apache.commons.lang.*; // Use commons-lang3 -import org.apache.commons.lang3.Validate; // Use Fluss Preconditions -import org.apache.commons.lang3.SerializationUtils; // Use Fluss InstantiationUtil -``` - -#### ALWAYS Use Shaded Versions: - -| Forbidden | Required | -|-----------|----------| -| `com.google.common.*` | `org.apache.fluss.shaded.guava.*` | -| `com.fasterxml.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | -| `org.codehaus.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | -| `io.netty.*` | `org.apache.fluss.shaded.netty4.*` | -| `org.apache.arrow.*` | `org.apache.fluss.shaded.arrow.*` | -| `org.apache.zookeeper.*` | `org.apache.fluss.shaded.zookeeper38.*` | - -**Rationale:** Prevents dependency conflicts with user applications. - -### 1.2 Utility Classes (MANDATORY) - -#### NEVER Instantiate ConcurrentHashMap: - -```java -// ❌ DON'T: -Map map = new ConcurrentHashMap<>(); - -// ✅ DO: -import org.apache.fluss.utils.MapUtils; -Map map = MapUtils.newConcurrentMap(); -``` - -**Rationale:** See https://github.com/apache/fluss/issues/375 - -#### ALWAYS Use Fluss Preconditions: - -```java -// ❌ DON'T: -import com.google.common.base.Preconditions; -import org.apache.commons.lang3.Validate; - -// ✅ DO: -import static org.apache.fluss.utils.Preconditions.checkNotNull; -import static org.apache.fluss.utils.Preconditions.checkArgument; -import static org.apache.fluss.utils.Preconditions.checkState; - -public void process(String input) { - checkNotNull(input, "input cannot be null"); - checkArgument(!input.isEmpty(), "input cannot be empty"); -} -``` +**Purpose:** This guide helps AI coding agents contribute to Apache Fluss by providing project-specific conventions, architectural patterns, and quality standards. It covers both code contribution (Sections 1-10) and deployment/setup guidance (Section 11). -**MUST import statically** - non-static imports will fail Checkstyle. +**Sections:** 1. Critical Rules | 2. API Patterns | 3. Code Organization | 4. Error Handling | 5. Concurrency | 6. Testing | 7. Dependencies | 8. Configuration | 9. Serialization/RPC | 10. Module Boundaries | 11. Build & CI | 12. Git & Pull Requests | 13. AI Agent Boundaries -**Reference:** `fluss-common/src/main/java/org/apache/fluss/utils/Preconditions.java` - -#### ALWAYS Use Fluss VisibleForTesting: - -```java -// ❌ DON'T: -import com.google.common.annotations.VisibleForTesting; - -// ✅ DO: -import org.apache.fluss.annotation.VisibleForTesting; +--- -@VisibleForTesting -void helperMethod() { } -``` +## 1. Critical Rules (MUST/NEVER) -### 1.3 Banned Methods (MANDATORY) +**Enforced by Checkstyle** - violations will fail CI. -#### NEVER Use These Methods: +### Dependencies & Utilities +**FORBIDDEN imports** (use shaded versions - see Section 7): ```java -// ❌ DON'T - These read system properties incorrectly: -Boolean.getBoolean("property.name") -Integer.getInteger("property.name") -Long.getLong("property.name") - -// ✅ DO - Read system properties correctly: -Boolean.parseBoolean(System.getProperty("property.name")) -Integer.parseInt(System.getProperty("property.name")) -Long.parseLong(System.getProperty("property.name")) +import com.google.common.* // → org.apache.fluss.shaded.guava.* +import com.fasterxml.jackson.*, org.codehaus.jackson.* // → org.apache.fluss.shaded.jackson2.* +import io.netty.* // → org.apache.fluss.shaded.netty4.* +import org.apache.arrow.* // → org.apache.fluss.shaded.arrow.* +import org.apache.zookeeper.* // → org.apache.fluss.shaded.zookeeper38.* ``` -#### NEVER Use SerializationUtils: - +**MANDATORY utility substitutions:** ```java -// ❌ DON'T: -import org.apache.commons.lang3.SerializationUtils; - -// ✅ DO: -import org.apache.fluss.utils.InstantiationUtil; +// ❌ new ConcurrentHashMap<>() → ✅ MapUtils.newConcurrentMap() (see https://github.com/apache/fluss/issues/375) +// ❌ com.google.common.base.Preconditions → ✅ org.apache.fluss.utils.Preconditions (import statically) +// ❌ com.google.common.annotations.VisibleForTesting → ✅ org.apache.fluss.annotation.VisibleForTesting +// ❌ org.apache.commons.lang3.SerializationUtils → ✅ org.apache.fluss.utils.InstantiationUtil +// ❌ Boolean.getBoolean("prop") → ✅ Boolean.parseBoolean(System.getProperty("prop")) ``` -### 1.4 Testing Assertions (MANDATORY) - -#### ALWAYS Use AssertJ (NOT JUnit Assertions): +### Testing +**MANDATORY: Use AssertJ, NOT JUnit assertions:** ```java -// ❌ DON'T: -import org.junit.jupiter.api.Assertions; -Assertions.assertEquals(expected, actual); -Assertions.assertTrue(condition); -Assertions.assertThrows(Exception.class, () -> doSomething()); +// ❌ Assertions.assertEquals(expected, actual) +// ✅ assertThat(actual).isEqualTo(expected) -// ✅ DO: import static org.assertj.core.api.Assertions.assertThat; import static org.assertj.core.api.Assertions.assertThatThrownBy; -assertThat(actual).isEqualTo(expected); -assertThat(condition).isTrue(); -assertThat(list).hasSize(3).contains("a", "b", "c"); -assertThatThrownBy(() -> doSomething()) - .isInstanceOf(IllegalArgumentException.class) - .hasMessageContaining("expected message"); -``` - -**Rationale:** AssertJ provides more readable and expressive assertions. - -**Plugin:** Use "Assertions2Assertj" IntelliJ plugin to convert existing JUnit assertions. - -#### NEVER Set @Timeout on Tests: - -```java -// ❌ DON'T: -import org.junit.jupiter.api.Timeout; -@Timeout(5) -@Test -void testSomething() { } - -// ✅ DO: -// Rely on global timeout instead -@Test -void testSomething() { } -``` - -**Rationale:** Per-test timeouts make debugging harder. Use global timeout configuration. - -### 1.5 Code Style (MANDATORY) - -#### NEVER Use Star Imports: - -```java -// ❌ DON'T: -import java.util.*; -import org.apache.fluss.client.*; - -// ✅ DO: -import java.util.List; -import java.util.Map; -import org.apache.fluss.client.Connection; -import org.apache.fluss.client.Admin; -``` - -**Enforcement:** Set import threshold to 9999 in IDE settings. - -#### NEVER Have Trailing Whitespace: - -Run `./mvnw spotless:apply` before committing to auto-fix. - -#### ALWAYS Use Java-Style Array Declarations: - -```java -// ❌ DON'T: -String args[] -int values[] - -// ✅ DO: -String[] args -int[] values -``` - -#### ALWAYS Require Braces: - -```java -// ❌ DON'T: -if (condition) doSomething(); - -// ✅ DO: -if (condition) { - doSomething(); -} +assertThat(list).hasSize(3).contains("a", "b"); +assertThatThrownBy(() -> doSomething()).isInstanceOf(IllegalArgumentException.class); ``` -### 1.6 Comments & Documentation (MANDATORY) - -#### NEVER Use TODOs with Usernames: - -```java -// ❌ DON'T: -// TODO(username): Fix this -// TODO(john.doe) Implement feature - -// ✅ DO: -// TODO: Fix this -// TODO Implement feature -``` - -#### NEVER Use These Comment Tags: - -```java -// ❌ DON'T: -// FIXME: Should be refactored -// XXX: Hack -// @author John Doe - -// ✅ DO: -// TODO: Should be refactored -// (No XXX or @author tags - use git history instead) -``` - -### 1.7 File Size Limits (MANDATORY) - -**Maximum file length:** 3000 lines - -If a file exceeds this limit, split it into multiple classes. +**NEVER use @Timeout on tests** - rely on global timeout -### 1.8 Javadoc Requirements (MANDATORY) +### Code Style -#### ALWAYS Document Protected and Public APIs: +- **NEVER use star imports** (`import java.util.*;`) - set IDE threshold to 9999 +- **NEVER have trailing whitespace** - run `./mvnw spotless:apply` +- **ALWAYS use Java-style arrays:** `String[] args` NOT `String args[]` +- **ALWAYS require braces:** `if (x) { doIt(); }` NOT `if (x) doIt();` +- **NEVER use TODO(username)** - use `TODO:` without username +- **NEVER use FIXME, XXX, or @author tags** - use git history -```java -// ✅ DO: -/** - * Creates a new connection to the Fluss cluster. - * - * @param configuration The configuration for the connection - * @return A new connection instance - * @throws FlussException if connection fails - */ -public static Connection createConnection(Configuration configuration) - throws FlussException { - // implementation -} -``` +### Documentation -**Required:** All protected and public classes, interfaces, enums, and methods must have Javadoc. +- **File size limit:** 3000 lines max +- **Javadoc REQUIRED:** All protected/public classes, interfaces, enums, methods --- ## 2. API Design Patterns -### 2.1 API Stability Annotations - -Use annotations to signal API stability guarantees to users: +### API Stability Annotations ```java @PublicStable // Stable - breaking changes only in major versions -@PublicEvolving // May change in minor versions (new features being stabilized) +@PublicEvolving // May change in minor versions @Internal // Not public API - can change anytime -@VisibleForTesting // Exposed only for testing -``` - -**Rules:** -- **ALWAYS annotate** classes/interfaces in public packages (`fluss-client`, `fluss-common` public APIs) -- Use `@PublicStable` for core APIs: `Connection`, `Admin`, `ConfigOption`, `TableDescriptor` -- Use `@PublicEvolving` for new features: `LakeTieringManager`, new connector APIs -- Use `@Internal` for: RPC messages, server internals, `*JsonSerde` classes, test utilities - -**Examples:** -```java -@PublicStable -public class ConnectionFactory { - // Stable public API -} - -@PublicEvolving -public interface LakeTieringManager { - // New feature, API may evolve -} - -@Internal -public class MetadataJsonSerde { - // Internal implementation detail -} ``` +**Usage:** `@PublicStable` for core APIs (`Connection`, `Admin`); `@PublicEvolving` for new features; `@Internal` for RPC/internals **Reference:** `fluss-common/src/main/java/org/apache/fluss/annotation/` -### 2.2 Builder Pattern - -Use fluent builder pattern for complex object construction: +### Builder Pattern ```java -// ✅ Example: ConfigOption builder -ConfigOption tempDirs = ConfigBuilder - .key("tmp.dir") - .stringType() - .defaultValue("/tmp") - .withDescription("Temporary directory path"); - -ConfigOption> ports = ConfigBuilder - .key("application.ports") - .intType() - .asList() - .defaultValues(8000, 8001, 8002); - ConfigOption timeout = ConfigBuilder .key("client.timeout") .durationType() .defaultValue(Duration.ofSeconds(30)); ``` -**Pattern structure:** -- Create static inner class `Builder` -- Use method chaining (return `this`) -- Provide `build()` method to construct final object -- Make constructor private or package-private - +**Pattern:** Static inner `Builder` class, method chaining, private constructor, `build()` method **Reference:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` -### 2.3 Factory Pattern - -Use static factory methods for object creation (not public constructors): +### Factory Pattern ```java -@PublicEvolving public class ConnectionFactory { + private ConnectionFactory() {} // Private constructor - // Private constructor - prevent direct instantiation - private ConnectionFactory() {} - - /** - * Creates a new connection to Fluss cluster. - */ public static Connection createConnection(Configuration conf) { return new FlussConnection(conf); } - - /** - * Creates connection with custom metric registry. - */ - public static Connection createConnection( - Configuration conf, - MetricRegistry metricRegistry) { - return new FlussConnection(conf, metricRegistry); - } } - -// ✅ Usage: -Connection conn = ConnectionFactory.createConnection(config); ``` -**Rules:** -- Make constructor private -- Provide static factory methods -- Return interface types (not concrete implementations) -- Allow method overloading for different parameter combinations - +**Rules:** Private constructor, static factory methods, return interface types **Reference:** `fluss-client/src/main/java/org/apache/fluss/client/ConnectionFactory.java` -### 2.4 Interface Segregation Pattern +### Additional Patterns -Provide both generic and typed interface variants: - -```java -// Generic interface (works with raw types) -public interface Lookuper extends AutoCloseable { - CompletableFuture lookup(InternalRow key); -} - -// Typed interface (works with POJOs) -public interface TypedLookuper extends AutoCloseable { - CompletableFuture lookup(T key); -} - -// Same pattern for writers: -public interface AppendWriter extends AutoCloseable { - CompletableFuture append(InternalRow row); -} - -public interface TypedAppendWriter extends AutoCloseable { - CompletableFuture append(T record); -} -``` - -**Implementation:** Typed variant delegates to generic version internally. - -### 2.5 Result Objects Pattern - -Use immutable result objects for return values: - -```java -public final class AppendResult { - private final long offset; - private final long timestamp; - - public AppendResult(long offset, long timestamp) { - this.offset = offset; - this.timestamp = timestamp; - } - - public long getOffset() { return offset; } - public long getTimestamp() { return timestamp; } - - @Override - public boolean equals(Object o) { /* ... */ } - - @Override - public int hashCode() { /* ... */ } - - @Override - public String toString() { /* ... */ } -} -``` - -**Rules:** -- Mark class `final` -- Make all fields `private final` -- No setters (only getters) -- Initialize all fields in constructor -- Implement `equals()`, `hashCode()`, `toString()` - -### 2.6 Thread Safety Documentation - -Document thread safety using annotations: - -```java -@ThreadSafe -public class Connection { - // Multiple threads can safely use this connection -} - -@NotThreadSafe -public class Lookuper { - // Each thread should have its own Lookuper instance -} -``` - -**Javadoc example:** -```java -/** - * Connection to a Fluss cluster. - * - *

This class is thread-safe. A single connection can be shared - * across multiple threads. However, Table instances obtained from - * the connection are NOT thread-safe - each thread should create - * its own Table instance. - */ -@ThreadSafe -public interface Connection extends AutoCloseable { } -``` +- **Interface Segregation:** Provide generic (`Lookuper`) and typed (`TypedLookuper`) variants +- **Result Objects:** Immutable `final` classes with `private final` fields, no setters, implement `equals()`/`hashCode()`/`toString()` +- **Thread Safety:** Document with `@ThreadSafe` or `@NotThreadSafe` annotations --- ## 3. Code Organization -### 3.1 Naming Conventions - -#### Interfaces: -No prefix or suffix - use plain descriptive names: -```java -Connection, Admin, Table, Lookuper, AppendWriter, LogScanner -``` - -#### Implementation Classes: -Suffix with `Impl` when directly implementing an interface: -```java -FlussConnection (implements Connection) -AdminImpl (implements Admin) -LogScannerImpl (implements LogScanner) -TypedAppendWriterImpl (implements TypedAppendWriter) -``` +### Naming Conventions -Alternative: Use descriptive implementation name: -```java -NettyClient, RocksDBKvStore, ZooKeeperCoordinator -``` +| Type | Convention | Example | +|------|-----------|---------| +| **Interface** | Plain descriptive name | `Connection`, `Admin`, `LogScanner` | +| **Implementation** | Suffix `Impl` or descriptive name | `AdminImpl`, `NettyClient`, `RocksDBKvStore` | +| **Abstract class** | Prefix `Abstract` | `AbstractIterator`, `AbstractGoal` | +| **Utility class** | Suffix `Utils` (private constructor, static methods) | `MapUtils`, `StringUtils`, `IOUtils` | +| **Test class** | Suffix `Test` (unit) or `ITCase` (integration) | `ConfigBuilderTest`, `ServerITCaseBase` | +| **Test utility** | Prefix `Testing` | `TestingRemoteLogStorage` | +| **Exception** | Suffix `Exception` | `TableNotExistException` | -#### Abstract Base Classes: -Prefix with `Abstract`: -```java -AbstractAutoCloseableRegistry -AbstractAuthorizer -AbstractGoal -AbstractIterator -``` +### Package Structure -#### Utility Classes: -Suffix with `Utils`, make constructor private, all methods static: -```java -ArrayUtils, MapUtils, CollectionUtils, StringUtils -BytesUtils, BinaryStringUtils -ExceptionUtils, TimeUtils, DateTimeUtils -NetUtils, IOUtils, FileUtils -MathUtils, MurmurHashUtils -``` +See CLAUDE.md for full module/package organization. Key modules: `fluss-common`, `fluss-rpc`, `fluss-client`, `fluss-server`. -#### Test Classes: -- Unit tests: `*Test.java` (e.g., `ConfigBuilderTest`, `KvWriteBatchTest`) -- Integration tests: `*ITCase.java` (e.g., `Flink120TableSourceITCase`, `ServerITCaseBase`) -- Base test classes: `*TestBase.java` or `*TestUtils.java` -- Test utilities: Prefix with `Testing` (e.g., `TestingRemoteLogStorage`, `TestingMetricGroups`) +### Class Member Order -#### Exceptions: -Descriptive name + `Exception` suffix: -```java -TableNotExistException -DatabaseNotExistException -InvalidRecordException -StaleMetadataException -OutOfOrderSequenceException -``` +**Fields:** Static constants → static fields → instance fields +**Methods:** Constructors → static factories → public → package-private → protected → private → static utilities +**Modifier order:** `public protected private abstract static final transient volatile synchronized native strictfp` (Checkstyle enforced) -### 3.2 Package Structure +### Imports -Standard package organization: - -``` -fluss-common/ # Common utilities, data types, configs - /annotation # API stability annotations - /config # Configuration framework - /exception # Exception hierarchy - /fs # Filesystem abstraction - /memory # Memory management - /metrics # Metrics definitions - /record # Record formats (log records, KV records) - /row # Row types and encoding - /types # Data types - /utils # Utility classes - -fluss-rpc/ # RPC layer - /gateway # RPC gateway interfaces - /messages # Protobuf messages - /netty # Netty-based implementation - -fluss-client/ # Java client SDK - /admin # Admin operations - /write # Write APIs - /lookup # Point lookup APIs - /table # Table abstraction - /scanner # Scanning/reading operations - /metadata # Metadata management - /token # Security token management - /connection # Connection management - -fluss-server/ # Server implementations - /coordinator # CoordinatorServer - /tablet # TabletServer - /log # LogStore implementation - /kv # KvStore implementation - /replica # Replication logic - /metadata # Metadata management - /zk # ZooKeeper integration - /metrics # Server metrics - /authorizer # Authorization -``` - -### 3.3 Class Member Organization - -**Field ordering:** -1. Static constants (`public static final`) -2. Static fields (`private static`) -3. Instance fields (`private` or `protected`) - -**Method ordering:** -1. Constructors -2. Static factory methods -3. Public methods -4. Package-private methods -5. Protected methods -6. Private methods -7. Static utility methods (at bottom) - -**Modifier order** (enforced by Checkstyle): -``` -public, protected, private, abstract, static, final, -transient, volatile, synchronized, native, strictfp -``` - -Example: -```java -public class Example { - // 1. Static constants - public static final int DEFAULT_SIZE = 100; - private static final Logger LOG = LoggerFactory.getLogger(Example.class); - - // 2. Static fields - private static int instanceCount = 0; - - // 3. Instance fields - private final String name; - private int value; - - // 4. Constructor - public Example(String name) { - this.name = name; - instanceCount++; - } - - // 5. Static factory method - public static Example create(String name) { - return new Example(name); - } - - // 6. Public methods - public void doSomething() { } - - // 7. Private methods - private void helperMethod() { } - - // 8. Static utility (at bottom) - private static String formatName(String name) { - return name.toUpperCase(); - } -} -``` - -### 3.4 Import Organization - -**Order** (enforced by Spotless): -1. `org.apache.fluss` imports -2. Blank line -3. Other imports (`javax`, `java`, third-party) -4. Blank line -5. Static imports (`\#`) - -**Example:** -```java -package org.apache.fluss.client; - -import org.apache.fluss.config.Configuration; -import org.apache.fluss.exception.FlussException; - -import javax.annotation.Nullable; - -import java.io.IOException; -import java.util.concurrent.CompletableFuture; - -import static org.apache.fluss.utils.Preconditions.checkNotNull; -import static org.apache.fluss.utils.Preconditions.checkState; -``` - -**Enforcement:** Run `./mvnw spotless:apply` to auto-format imports. +**Order:** `org.apache.fluss.*` → blank line → other imports → blank line → static imports +**Enforcement:** `./mvnw spotless:apply` --- ## 4. Error Handling -### 4.1 Exception Hierarchy - -``` -FlussException (base checked exception) -├── ApiException (client-facing errors) -│ ├── TableNotExistException -│ ├── DatabaseNotExistException -│ ├── InvalidOffsetException -│ ├── InvalidConfigException -│ └── ... -├── RetriableException (transient errors) -│ ├── NotEnoughReplicasException -│ ├── SchemaNotExistException -│ ├── TimeoutException -│ └── ... -├── InvalidMetadataException -│ └── StaleMetadataException -└── FlussRuntimeException (base unchecked exception) - ├── CorruptRecordException - ├── LogStorageException - └── ... -``` - -**When to extend:** -- `ApiException`: User-facing errors (invalid input, not found, etc.) -- `RetriableException`: Transient failures (retry may succeed) -- `FlussRuntimeException`: Programming errors or unrecoverable failures - -### 4.2 Input Validation Pattern - -Use Preconditions for input validation at API boundaries: +**Exception hierarchy:** `FlussException` (checked) → `ApiException` (user errors), `RetriableException` (transient), `FlussRuntimeException` (unchecked) -```java -import static org.apache.fluss.utils.Preconditions.checkArgument; -import static org.apache.fluss.utils.Preconditions.checkNotNull; -import static org.apache.fluss.utils.Preconditions.checkState; - -public void createTable(String databaseName, TableDescriptor descriptor) { - // Null checks - checkNotNull(databaseName, "database name cannot be null"); - checkNotNull(descriptor, "table descriptor cannot be null"); - - // Argument validation - checkArgument(!databaseName.isEmpty(), "database name cannot be empty"); - checkArgument(descriptor.getBucketCount() > 0, - "bucket count must be positive, got: %s", descriptor.getBucketCount()); - - // State validation - checkState(isInitialized(), "connection not initialized"); - checkState(!isClosed(), "connection is closed"); -} -``` - -**Preconditions API:** -- `checkNotNull(T obj, String message, Object... args)` → throws `NullPointerException` -- `checkArgument(boolean condition, String message, Object... args)` → throws `IllegalArgumentException` -- `checkState(boolean condition, String message, Object... args)` → throws `IllegalStateException` - -**Message templates:** Use `%s` placeholders (not `String.format` style). - -### 4.3 Error Propagation - -#### For Async Operations: - -```java -public CompletableFuture append(InternalRow row) { - return CompletableFuture.supplyAsync(() -> { - try { - return doAppend(row); - } catch (IOException e) { - throw ExceptionUtils.wrapAsUnchecked(e); - } - }, executorService); -} -``` +**Input validation:** Use `Preconditions.checkNotNull()`, `checkArgument()`, `checkState()` (see Section 1) at API boundaries with `%s` placeholders -#### Composition with Error Handling: - -```java -return fetchMetadata(tablePath) - .thenCompose(metadata -> fetchData(metadata)) - .thenApply(data -> processData(data)) - .exceptionally(ex -> { - Throwable cause = ExceptionUtils.stripCompletionException(ex); - if (cause instanceof TableNotExistException) { - LOG.warn("Table not found: {}", tablePath); - return null; - } - throw ExceptionUtils.wrapAsUnchecked(cause); - }); -``` - -**Utilities:** -- `ExceptionUtils.wrapAsUnchecked(Exception e)`: Wrap checked exceptions -- `ExceptionUtils.stripCompletionException(Throwable t)`: Unwrap `CompletionException` +**Error propagation:** Async operations use `ExceptionUtils.wrapAsUnchecked()` to wrap checked exceptions; use `.exceptionally()` for CompletableFuture error handling --- ## 5. Concurrency & Thread Safety -### 5.1 Thread Safety Annotations - -Document synchronization requirements: - -```java -import javax.annotation.concurrent.GuardedBy; -import javax.annotation.concurrent.ThreadSafe; -import javax.annotation.concurrent.NotThreadSafe; - -@ThreadSafe -public class ServerConnection { - private final Object lock = new Object(); - - @GuardedBy("lock") - private volatile boolean connected; - - @GuardedBy("lock") - private Channel channel; - - public void connect() { - synchronized (lock) { - if (connected) { - return; - } - channel = createChannel(); - connected = true; - } - } - - public void disconnect() { - synchronized (lock) { - if (!connected) { - return; - } - channel.close(); - connected = false; - } - } -} -``` - -**Rules:** -- Mark thread-safe classes with `@ThreadSafe` -- Use `@GuardedBy("lockName")` to document protected fields -- Declare lock fields explicitly (avoid implicit `this` locks) -- Use `volatile` for fields accessed outside synchronized blocks - -**Reference:** `fluss-server/src/main/java/org/apache/fluss/server/log/LogTablet.java` - -### 5.2 CompletableFuture Patterns - -#### Async API Pattern: - -```java -public CompletableFuture lookup(InternalRow key) { - return CompletableFuture.supplyAsync(() -> { - // Perform async lookup operation - return performLookup(key); - }, executorService); -} -``` - -#### Composition: - -```java -CompletableFuture tableInfoFuture = fetchTableInfo(tablePath); - -return tableInfoFuture - .thenCompose(tableInfo -> fetchSchema(tableInfo.getSchemaId())) - .thenCombine(fetchPartitions(tablePath), (schema, partitions) -> { - return new TableMetadata(schema, partitions); - }) - .thenApply(metadata -> processMetadata(metadata)) - .exceptionally(ex -> handleError(ex)); -``` - -#### Waiting for Multiple Futures: - -```java -import org.apache.fluss.utils.concurrent.FutureUtils; - -List> futures = new ArrayList<>(); -futures.add(future1); -futures.add(future2); -futures.add(future3); - -CompletableFuture allFutures = FutureUtils.completeAll(futures); -allFutures.get(); // Wait for all -``` - -**Utilities:** -- `FutureUtils.completeAll(Collection>)`: Wait for all futures -- `FutureUtils.orTimeout(CompletableFuture, Duration)`: Add timeout - -### 5.3 Map Creation Rules - -**NEVER instantiate ConcurrentHashMap directly** (enforced by Checkstyle): - -```java -// ❌ DON'T: -Map map = new ConcurrentHashMap<>(); - -// ✅ DO: -import org.apache.fluss.utils.MapUtils; -Map map = MapUtils.newConcurrentMap(); -``` - -**Other MapUtils methods:** -- `MapUtils.newHashMap()`: Create HashMap with default capacity -- `MapUtils.newHashMap(int capacity)`: Create HashMap with specified capacity -- `MapUtils.newLinkedHashMap()`: Create LinkedHashMap +**Thread safety annotations:** `@ThreadSafe`, `@NotThreadSafe`, `@GuardedBy("lockName")` for documentation -### 5.4 Resource Management +**Locking:** Explicit lock objects (`private final Object lock`), use `synchronized(lock)`, `volatile` for fields accessed outside locks -Use try-with-resources for AutoCloseable resources: +**ConcurrentHashMap:** NEVER instantiate directly - use `MapUtils.newConcurrentMap()` (Checkstyle enforced - see Section 1) -```java -// ✅ Single resource: -try (Connection connection = ConnectionFactory.createConnection(conf)) { - Admin admin = connection.getAdmin(); - admin.createDatabase("my_db", DatabaseDescriptor.EMPTY, false).get(); -} // Auto-closed - -// ✅ Multiple resources: -try (Connection connection = ConnectionFactory.createConnection(conf); - Admin admin = connection.getAdmin(); - Table table = connection.getTable(tablePath)) { - // Use resources -} // All auto-closed in reverse order -``` +**CompletableFuture:** Use `.thenCompose()`, `.thenCombine()`, `.thenApply()` for composition; `FutureUtils.completeAll()` for multiple futures -**Implement AutoCloseable:** -```java -public class MyResource implements AutoCloseable { - private volatile boolean closed = false; - - @Override - public void close() throws IOException { - if (closed) { - return; - } - try { - // Release resources - cleanupResources(); - } finally { - closed = true; - } - } -} -``` +**Resources:** Use try-with-resources for `AutoCloseable`; implement with idempotent `close()` method --- ## 6. Testing Standards -### 6.1 Test Framework - -- **JUnit 5 (Jupiter)** for all tests -- **AssertJ** for assertions (NOT JUnit assertions) -- **Mockito** for mocking (use sparingly - prefer custom test doubles) +**Framework:** JUnit 5 with AssertJ assertions (MANDATORY - see Section 1.4) -### 6.2 Test Naming +**Test naming:** Descriptive method names (`testAppendWithValidData`); classes: `*Test.java` (unit), `*ITCase.java` (integration) -Use descriptive test names that explain what is being tested: - -```java -@Test -void testAppendWithValidData() { } - -@Test -void testLookupThrowsExceptionWhenTableNotExist() { } - -@Test -void testTryAppendWithWriteLimit() { } - -@Test -void testHandleHybridSnapshotLogSplitChangesAndFetch() { } -``` - -**Test class naming:** -- Unit tests: `*Test.java` -- Integration tests: `*ITCase.java` - -### 6.3 Assertion Patterns - -**ALWAYS use AssertJ** (Checkstyle enforces this): +### Assertions ```java import static org.assertj.core.api.Assertions.assertThat; import static org.assertj.core.api.Assertions.assertThatThrownBy; -@Test -void testValidation() { - // Basic assertions - assertThat(result).isNotNull(); - assertThat(result.getOffset()).isEqualTo(100L); - assertThat(result.isComplete()).isTrue(); - - // Collection assertions - assertThat(list).hasSize(3); - assertThat(list).contains("a", "b", "c"); - assertThat(list).containsExactly("a", "b", "c"); // Order matters - assertThat(map).containsEntry("key", "value"); - - // Exception assertions - assertThatThrownBy(() -> service.lookup(null)) - .isInstanceOf(NullPointerException.class) - .hasMessageContaining("cannot be null"); - - // Future assertions (custom FlussAssertions) - assertThatFuture(completableFuture) - .eventuallySucceeds() - .isEqualTo(expectedValue); -} -``` - -**Never use JUnit assertions:** -```java -// ❌ DON'T: -import org.junit.jupiter.api.Assertions; -Assertions.assertEquals(expected, actual); -Assertions.assertTrue(condition); +assertThat(result).isNotNull().isEqualTo(expected); +assertThat(list).hasSize(3).contains("a", "b"); +assertThatThrownBy(() -> service.lookup(null)) + .isInstanceOf(NullPointerException.class) + .hasMessageContaining("cannot be null"); ``` -### 6.4 Test Base Classes - -Use test base classes for common setup: - -```java -// For Flink integration tests -class MyFlinkTest extends FlinkTestBase { - // Provides: FLUSS_CLUSTER_EXTENSION, connection, admin - // Helper methods: createTable(), writeRows(), etc. - - @Test - void testFlinkSource() throws Exception { - TablePath tablePath = TablePath.of("test_db", "test_table"); - createTable(tablePath, DEFAULT_PK_TABLE_DESCRIPTOR); - // Test implementation - } -} - -// For server tests -class MyServerTest extends ServerTestBase { - // Provides: ZooKeeper setup, server configuration -} - -// For lake tiering tests -class MyLakeTest extends FlinkPaimonTieringTestBase { - // Provides: Paimon lake integration setup -} -``` +### Test Base Classes -**Common base classes:** -- `FlinkTestBase`: Flink cluster + Fluss cluster -- `ServerTestBase`: Coordinator/Tablet server setup +Common base classes for setup: +- `FlinkTestBase`: Flink + Fluss cluster (see `fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkTestBase.java`) +- `ServerTestBase`: Coordinator/TabletServer setup - `FlinkTieringTestBase`: Lake tiering infrastructure - `LogTestBase`, `KvTestBase`: Record format testing -**Reference:** `fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkTestBase.java` - -### 6.5 Test Extensions - -Use JUnit 5 extensions for complex setup: +### Extensions ```java -@RegisterExtension -public static final AllCallbackWrapper ZOO_KEEPER_EXTENSION = - new AllCallbackWrapper<>(new ZooKeeperExtension()); - @RegisterExtension public static final FlussClusterExtension FLUSS_CLUSTER_EXTENSION = - FlussClusterExtension.builder() - .setNumOfTabletServers(3) - .setClusterConf(new Configuration() - .set(ConfigOptions.KV_MAX_RETAINED_SNAPSHOTS, Integer.MAX_VALUE)) - .build(); - -@Test -void testWithCluster() { - // Extensions handle cluster start/stop -} -``` - -**Available extensions:** -- `FlussClusterExtension`: Embedded Fluss cluster -- `ZooKeeperExtension`: Embedded ZooKeeper server -- `ParameterizedTestExtension`: Custom parameterized testing - -### 6.6 Parameterized Tests - -```java -@ParameterizedTest -@ValueSource(ints = {1, 2, 4, 8}) -void testWithDifferentBucketCounts(int bucketCount) { - // Test logic using bucketCount -} - -@ParameterizedTest -@CsvSource({ - "LOG_TABLE, false", - "PRIMARY_KEY_TABLE, true" -}) -void testTableTypes(String tableType, boolean hasPrimaryKey) { - // Test logic -} - -@ParameterizedTest -@MethodSource("provideTableConfigs") -void testWithCustomProvider(TableConfig config) { - // Test logic -} - -private static Stream provideTableConfigs() { - return Stream.of( - new TableConfig("log", false), - new TableConfig("pk", true) - ); -} -``` - -### 6.7 Test Utilities - -**CommonTestUtils:** -```java -import org.apache.fluss.testutils.common.CommonTestUtils; - -// Wait for condition with timeout -CommonTestUtils.waitUntil( - () -> server.isRunning(), - Duration.ofSeconds(30), - "Server failed to start" -); - -// Wait for value to become available -Optional result = CommonTestUtils.waitValue( - () -> fetchValue(), - Duration.ofSeconds(10), - "Value not found" -); - -// Retry assertion with backoff -CommonTestUtils.retry( - Duration.ofSeconds(5), - () -> assertThat(getValue()).isEqualTo(expected) -); -``` - -**FlussAssertions:** -```java -import org.apache.fluss.testutils.common.FlussAssertions; - -// Custom future assertions -assertThatFuture(completableFuture) - .eventuallySucceeds() - .isEqualTo(expectedValue); - -// Exception chain matching -FlussAssertions.assertThatChainOfCauses(exception) - .anyCauseMatches(TableNotExistException.class, "table_name"); -``` - -### 6.8 Parallel Test Execution - -**Opt-in to parallelism** (tests are sequential by default): - -```java -import org.junit.jupiter.api.parallel.Execution; -import org.junit.jupiter.api.parallel.ExecutionMode; - -@Execution(ExecutionMode.CONCURRENT) -class MyParallelTest { - // Tests in this class can run in parallel - - @Test - void test1() { } - - @Test - void test2() { } -} + FlussClusterExtension.builder().setNumOfTabletServers(3).build(); ``` -**Only use for truly independent tests** - most tests should remain sequential. +**Available:** `FlussClusterExtension`, `ZooKeeperExtension` --- ## 7. Dependencies & Shading -### 7.1 Shaded Dependencies - -**ALWAYS use shaded versions** to prevent dependency conflicts: - -| Instead of | Use | -|------------|-----| -| `com.google.common.*` | `org.apache.fluss.shaded.guava.*` | -| `com.fasterxml.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | -| `org.codehaus.jackson.*` | `org.apache.fluss.shaded.jackson2.*` | -| `io.netty.*` | `org.apache.fluss.shaded.netty4.*` | -| `org.apache.arrow.*` | `org.apache.fluss.shaded.arrow.*` | -| `org.apache.zookeeper.*` | `org.apache.fluss.shaded.zookeeper38.*` | - -**Example:** -```java -// ❌ DON'T: -import com.google.common.collect.ImmutableList; -import com.fasterxml.jackson.databind.ObjectMapper; - -// ✅ DO: -import org.apache.fluss.shaded.guava32.com.google.common.collect.ImmutableList; -import org.apache.fluss.shaded.jackson2.com.fasterxml.jackson.databind.ObjectMapper; -``` - -### 7.2 Utility Class Substitutions - -Use Fluss utilities instead of external libraries: - -| Instead of | Use | -|------------|-----| -| Guava `Preconditions` | `org.apache.fluss.utils.Preconditions` | -| Guava `VisibleForTesting` | `org.apache.fluss.annotation.VisibleForTesting` | -| Commons `Validate` | `org.apache.fluss.utils.Preconditions` | -| Commons `SerializationUtils` | `org.apache.fluss.utils.InstantiationUtil` | - -**Available Fluss utilities:** -- `ArrayUtils`, `MapUtils`, `CollectionUtils`: Collection operations -- `BytesUtils`, `BinaryStringUtils`: Byte/string manipulation -- `ExceptionUtils`: Error handling utilities -- `FutureUtils`: CompletableFuture utilities -- `TimeUtils`, `DateTimeUtils`: Time handling -- `NetUtils`, `IOUtils`, `FileUtils`: I/O operations -- `MathUtils`, `MurmurHashUtils`: Math operations +**Shaded dependencies:** See Section 1 for forbidden imports. Always use `org.apache.fluss.shaded.*` versions (guava, jackson, netty, arrow, zookeeper) -### 7.3 Module Dependencies +**Fluss utilities:** `Preconditions`, `MapUtils`, `ArrayUtils`, `CollectionUtils`, `BytesUtils`, `ExceptionUtils`, `FutureUtils`, `TimeUtils`, `IOUtils`, `FileUtils` -**Dependency hierarchy:** - -``` -fluss-common (foundation) - ↑ -fluss-rpc - ↑ -fluss-client ← fluss-server (peer modules) - ↑ ↑ - | | - +----+------+ - | - connectors (flink, spark, kafka) - | - lake integrations (iceberg, paimon, lance) -``` - -**Rules:** -- `fluss-common` depends only on: JDK + shaded libs (guava, jackson, arrow, etc.) -- `fluss-client` **cannot** depend on `fluss-server` -- `fluss-server` **cannot** depend on connectors -- Connectors **can** depend on `fluss-client` -- Define interfaces in lower-level modules, implementations in higher-level modules +**Module dependency rules:** +- `fluss-common` (foundation) → `fluss-rpc` → `fluss-client`/`fluss-server` (peers) → connectors → lake +- `fluss-client` CANNOT depend on `fluss-server`; define interfaces in lower modules --- ## 8. Configuration Patterns -### 8.1 ConfigOption Definition - -Use `ConfigBuilder` to define configuration options: - -```java -import org.apache.fluss.config.ConfigBuilder; -import org.apache.fluss.config.ConfigOption; - -// String option with default -public static final ConfigOption BOOTSTRAP_SERVERS = - ConfigBuilder.key("bootstrap.servers") - .stringType() - .defaultValue("localhost:9092") - .withDescription("Comma-separated list of Fluss server addresses"); - -// Integer option -public static final ConfigOption CLIENT_TIMEOUT_MS = - ConfigBuilder.key("client.request.timeout.ms") - .intType() - .defaultValue(30000) - .withDescription("Request timeout in milliseconds"); - -// Duration option -public static final ConfigOption SESSION_TIMEOUT = - ConfigBuilder.key("client.session.timeout") - .durationType() - .defaultValue(Duration.ofSeconds(60)) - .withDescription("Client session timeout duration"); - -// MemorySize option -public static final ConfigOption BUFFER_SIZE = - ConfigBuilder.key("client.buffer.size") - .memoryType() - .defaultValue(MemorySize.ofMebiBytes(32)) - .withDescription("Buffer size for network operations"); - -// List option -public static final ConfigOption> HOSTS = - ConfigBuilder.key("client.hosts") - .stringType() - .asList() - .defaultValues("localhost", "127.0.0.1") - .withDescription("List of host addresses to connect to"); - -// No default value -public static final ConfigOption USERNAME = - ConfigBuilder.key("user.name") - .stringType() - .noDefaultValue() - .withDescription("Username for authentication"); - -// Password option (sensitive) -public static final ConfigOption PASSWORD = - ConfigBuilder.key("user.password") - .passwordType() - .noDefaultValue() - .withDescription("Password for authentication (stored securely)"); - -// Deprecated keys support -public static final ConfigOption THRESHOLD = - ConfigBuilder.key("cpu.utilization.threshold") - .doubleType() - .defaultValue(0.9) - .withDeprecatedKeys("cpu.threshold", "old.cpu.threshold") - .withDescription("CPU utilization threshold"); -``` - +**ConfigOption definition:** Use `ConfigBuilder.key("name").{type}Type().defaultValue(x).withDescription("...")` +**Types:** `stringType()`, `intType()`, `durationType()`, `memoryType()`, `passwordType()`, `asList()`, `noDefaultValue()`, `withDeprecatedKeys()` **Reference:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` -### 8.2 Configuration Usage - -```java -Configuration conf = new Configuration(); - -// Set values -conf.setString("bootstrap.servers", "server1:9092,server2:9092"); -conf.setInteger(CLIENT_TIMEOUT_MS, 60000); - -// Get values with defaults -String servers = conf.getString(BOOTSTRAP_SERVERS); -int timeout = conf.getInteger(CLIENT_TIMEOUT_MS); -Duration sessionTimeout = conf.get(SESSION_TIMEOUT); - -// Get optional values -Optional username = conf.getOptional(USERNAME); -if (username.isPresent()) { - // Use username -} -``` - -### 8.3 Configuration Naming Convention - -Use hierarchical dot-separated keys with hyphens for compound words: +**Usage:** `conf.setString()`, `conf.getInteger()`, `conf.get()`, `conf.getOptional()` -``` -{category}.{subcategory}.{option-name} - -Examples: -- remote.data.dir -- remote.fs.write-buffer-size -- client.request.timeout.ms -- default.bucket.number -- kv.snapshot.interval -``` +**Naming:** Hierarchical dot-separated keys with hyphens: `{category}.{subcategory}.{option-name}` (e.g., `client.request.timeout.ms`) --- ## 9. Serialization & RPC -### 9.1 Protocol Buffers +**Protocol Buffers:** proto2 (migrating to proto3); DO NOT use `default` keyword or `enum` type; use `required`/`optional`/`repeated` +**Regenerate:** `./mvnw clean install -DskipTests -pl fluss-protogen,fluss-rpc` +**RPC messages:** Mark `@Internal`, immutable (final fields, no setters), use `@Nullable` for optional fields -**Proto file conventions:** +--- -```protobuf -syntax = "proto2"; // Currently proto2, migrating to proto3 +## 10. Module Boundaries -package fluss.rpc.messages; +**Module structure:** See CLAUDE.md for full module organization -option java_package = "org.apache.fluss.rpc.messages"; -option java_outer_classname = "FlussApiProtos"; -option optimize_for = LITE_RUNTIME; // Smaller code size +**Core:** `fluss-common` (foundation) → `fluss-rpc` → `fluss-client`/`fluss-server` (peers, cannot cross-depend) +**Connectors:** `fluss-flink` (1.18/1.19/1.20/2.2), `fluss-spark` (3.4/3.5), `fluss-kafka` - depend on `fluss-client` +**Lake:** `fluss-lake-iceberg`, `fluss-lake-paimon`, `fluss-lake-lance` -message GetTableSchemaRequest { - required PbTablePath table_path = 1; - optional int32 schema_id = 2; -} +**Rules:** Define interfaces in lower modules, implementations in higher modules; no circular dependencies -message GetTableSchemaResponse { - required PbSchema schema = 1; - optional string error_message = 2; -} -``` +--- -**Rules for proto3 migration:** -- **DO NOT** use `default` keyword (prepare for proto3) -- **DO NOT** use `enum` type for now (will be supported in proto3) -- Use `required` for mandatory fields (proto2 only) -- Use `optional` for optional fields -- Use `repeated` for lists +## 11. Build, CI & Deployment -### 9.2 Regenerating Protobuf Code +### Quick Start (Deployment) -After modifying `.proto` files, regenerate Java code: +**Prerequisites:** Java 11, Maven 3.8.6+, Unix-like environment +**Build from source:** ```bash -./mvnw clean install -DskipTests -pl fluss-protogen,fluss-rpc +git clone https://github.com/apache/fluss.git +cd fluss +./mvnw clean install -DskipTests -T 1C ``` -**Reference:** `fluss-rpc/src/main/proto/FlussApi.proto` +**Binary location:** `fluss-dist/target/fluss-*-bin.tgz` -### 9.3 RPC Message Pattern - -```java -@Internal -public class GetTableSchemaRequest { - private final TablePath tablePath; - @Nullable - private final Integer schemaId; - - public GetTableSchemaRequest(TablePath tablePath, @Nullable Integer schemaId) { - this.tablePath = checkNotNull(tablePath); - this.schemaId = schemaId; - } - - public TablePath getTablePath() { return tablePath; } - public Optional getSchemaId() { - return Optional.ofNullable(schemaId); - } -} - -@Internal -public class GetTableSchemaResponse { - private final Schema schema; - - public GetTableSchemaResponse(Schema schema) { - this.schema = checkNotNull(schema); - } - - public Schema getSchema() { return schema; } -} +**Start cluster (local development):** +```bash +# 1. Start ZooKeeper (separate process) +# 2. Start CoordinatorServer +./bin/coordinator-server.sh start +# 3. Start TabletServer(s) +./bin/tablet-server.sh start ``` -**Rules:** -- Mark RPC messages as `@Internal` -- Use immutable pattern (final fields, no setters) -- Provide clear constructors and getters -- Use `@Nullable` for optional fields +**Configuration:** Edit `conf/server.yaml` for `zookeeper.address`, `bind.listeners`, `tablet-server.id` ---- +### Build Commands -## 10. Module Boundaries +**Build:** `./mvnw clean install -DskipTests` (parallel: `-T 1C`); test: `./mvnw clean verify` +**Test specific:** `./mvnw verify -pl fluss-server`; single test: `./mvnw test -Dtest=ConfigBuilderTest -pl fluss-common` +**Format code:** `./mvnw spotless:apply` (google-java-format AOSP style; IntelliJ plugin v1.7.0.6 - DO NOT update) -### 10.1 Core Modules - -**fluss-common**: Foundation layer -- **Depends on:** JDK, shaded libraries only -- **Used by:** All other modules -- **Contains:** Utilities, data types, configurations, exceptions - -**fluss-rpc**: RPC communication layer -- **Depends on:** `fluss-common` -- **Used by:** `fluss-client`, `fluss-server` -- **Contains:** RPC gateways, protobuf messages, Netty implementation - -**fluss-client**: Java client SDK -- **Depends on:** `fluss-common`, `fluss-rpc` -- **Cannot depend on:** `fluss-server` (strict boundary) -- **Used by:** Applications, connectors -- **Contains:** Connection, Admin, Table, Writers, Scanners, Lookupers - -**fluss-server**: Server implementations -- **Depends on:** `fluss-common`, `fluss-rpc` -- **Cannot depend on:** `fluss-client`, connectors -- **Contains:** CoordinatorServer, TabletServer, LogStore, KvStore - -### 10.2 Connector Modules - -**fluss-flink**: Apache Flink connector -- **Structure:** - - `fluss-flink-common`: Common Flink connector code - - `fluss-flink-tiering`: Lake tiering service - - `fluss-flink-1.18`, `1.19`, `1.20`, `2.2`: Version-specific implementations -- **Depends on:** `fluss-client`, specific Flink version -- **Shading:** Each version shades Flink dependencies - -**fluss-spark**: Apache Spark connector -- **Structure:** - - `fluss-spark-common`: Common Spark connector code - - `fluss-spark-ut`: Unit test utilities - - `fluss-spark-3.4`, `3.5`: Version-specific implementations -- **Depends on:** `fluss-client`, specific Spark version - -**fluss-kafka**: Kafka compatibility layer -- **Provides:** Kafka-compatible API surface -- **Depends on:** `fluss-client` - -### 10.3 Lake Integration Modules - -**fluss-lake**: Base lake integration -- Sub-modules: - - `fluss-lake-iceberg`: Apache Iceberg format - - `fluss-lake-paimon`: Apache Paimon format - - `fluss-lake-lance`: Lance columnar format - -### 10.4 Cross-Module Communication - -**Rules:** -1. Use **interfaces** (not implementations) for cross-module APIs -2. Define interfaces in **lower-level modules** -3. Implementations can be in **higher-level modules** -4. **Never** create circular dependencies - -**Example:** -```java -// In fluss-common: -public interface FileSystem { - InputStream openFile(Path path) throws IOException; - OutputStream createFile(Path path) throws IOException; -} +### CI Pipeline -// In fluss-filesystems/fluss-fs-hdfs: -public class HdfsFileSystem implements FileSystem { - // HDFS-specific implementation -} -``` +**CI stages:** compile-on-jdk8 → core (excludes Flink/Spark/Lake) → flink → spark3 → lake (`.github/workflows/ci.yaml`) +**Java:** Build on Java 11 (required); runtime Java 8 compatible + +**Test coverage:** `./mvnw verify -Ptest-coverage` → view `fluss-test-coverage/target/site/jacoco-aggregate/index.html` +**License headers:** Apache 2.0 required (enforced by RAT); check with `./mvnw validate` --- -## 11. Build & CI +## 12. Git & Pull Request Workflow -### 11.1 Build Commands +### Fork Management -**Full build (skip tests):** -```bash -./mvnw clean install -DskipTests -``` +**ALWAYS push to your fork, NEVER to apache/fluss upstream.** -**Parallel build (faster):** +Verify fork remote exists: ```bash -./mvnw clean install -DskipTests -T 1C +git remote -v +# Should show: fork https://github.com//fluss.git ``` -**Build and run all tests:** +If fork remote doesn't exist: ```bash -./mvnw clean verify -``` +# Via GitHub CLI (recommended) +gh repo fork apache/fluss --remote --remote-name fork -**Test specific module:** -```bash -./mvnw verify -pl fluss-server -./mvnw verify -pl fluss-client +# Or manually +git remote add fork https://github.com//fluss.git ``` -**Test by stage (as in CI):** -```bash -# Core tests (excludes Flink, Spark, Lake) -./mvnw verify -pl '!fluss-flink/**,!fluss-spark/**,!fluss-lake/**' - -# Flink tests -./mvnw verify -pl fluss-flink/** - -# Spark tests -./mvnw verify -Pspark3 -pl fluss-spark/** +### Commit Guidelines -# Lake tests -./mvnw verify -pl fluss-lake/** +**Commit message format:** ``` +[component] Brief description (under 70 chars) -**Single test class:** -```bash -./mvnw test -Dtest=ConfigBuilderTest -pl fluss-common -``` - -**Single test method:** -```bash -./mvnw test -Dtest=ConfigBuilderTest#testStringType -pl fluss-common -``` - -### 11.2 Code Formatting +Detailed explanation of changes and motivation. -**Auto-format before committing:** -```bash -./mvnw spotless:apply -``` - -**Check formatting without changing files:** -```bash -./mvnw spotless:check +Co-Authored-By: Claude ``` -**Format style:** -- Java: google-java-format (AOSP style) -- Scala: scalafmt (v3.10.2) -- Config: `.scalafmt.conf` in repository root - -**IntelliJ IDEA plugin:** -- Install: google-java-format v1.7.0.6 -- **DO NOT update** - newer versions have incompatibilities +**Component tags:** `[client]`, `[server]`, `[rpc]`, `[flink]`, `[spark]`, `[docs]`, `[build]`, `[test]` -**Reference:** `tools/maven/checkstyle.xml` +**AI-generated code identification:** ALWAYS include `Co-Authored-By: Claude ` in commit messages for AI-generated changes. -### 11.3 CI Pipeline +### Pre-Push Self-Review -**GitHub Actions workflow** runs tests in four parallel stages: +Before pushing, conduct thorough self-review: -1. **compile-on-jdk8**: Compile with Java 8 for compatibility -2. **core**: Test all modules except Flink, Spark, Lake -3. **flink**: Test Flink connector (all versions) -4. **spark3**: Test Spark connector -5. **lake**: Test Lake integrations + legacy Flink (1.18, 1.19) +1. **Review full diff:** `git diff main...HEAD` - verify only intentional changes +2. **Check code style:** `./mvnw spotless:check` +3. **Run tests:** `./mvnw verify -pl ` +4. **Verify Checkstyle:** Check no forbidden imports, AssertJ usage, MapUtils, etc. +5. **Audit security:** No secrets, credentials, or sensitive data committed -**Java compatibility:** -- **Build on:** Java 11 (required) -- **Runtime:** Java 8 compatible (validated with `-Pjava8` profile) -- **CI validates:** Both Java 8 compile and Java 11 tests +### Creating Pull Request -**Build command used in CI:** +**Rebase onto main:** ```bash -# Compile -mvn -T 1C -B clean install -DskipTests - -# Test with coverage -mvn -T 1C verify -Ptest-coverage +git fetch upstream main +git rebase upstream/main ``` -**Reference:** `.github/workflows/ci.yaml` - -### 11.4 Test Coverage - -**Generate coverage report:** +**Push to fork:** ```bash -./mvnw verify -Ptest-coverage +git push -u fork ``` -**View report:** -Open `fluss-test-coverage/target/site/jacoco-aggregate/index.html` in browser. +**Create PR:** +```bash +gh pr create --web --title "[component] Brief title (under 70 chars)" +``` -**Coverage tool:** JaCoCo (aggregated across all modules) +The `--web` flag opens browser for final review before submission. -### 11.5 License Headers +**PR description template:** +```markdown +## Summary +- Bullet point summary of changes +- Fixes issue #XXX (if applicable) -All files must have Apache 2.0 license header (enforced by Apache RAT plugin): +## Test Plan +- How changes were tested +- Affected modules/tests run -```java -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ +🤖 AI-assisted changes - reviewed by human developer ``` -**Enforcement:** Run `./mvnw validate` to check license headers. - --- -## Additional Resources +## 13. AI Agent Boundaries -### Key Source Files for Reference +### Ask Before Acting -1. **Checkstyle configuration:** `tools/maven/checkstyle.xml` - - All MUST/NEVER rules and style enforcement +**Large-scale changes requiring approval:** +- Cross-module refactoring affecting 5+ files +- New dependencies with broad impact +- Database schema or migration changes +- Changes to build system or CI pipeline +- Destructive operations (delete branches, force-push, reset --hard) -2. **Builder pattern:** `fluss-common/src/main/java/org/apache/fluss/config/ConfigBuilder.java` - - Canonical fluent API design +### Never Do (Without Explicit Permission) -3. **Factory pattern:** `fluss-client/src/main/java/org/apache/fluss/client/ConnectionFactory.java` - - Static factory methods with private constructor +**Prohibited actions:** +- ❌ Commit secrets, credentials, API keys, or tokens +- ❌ Push directly to `apache/fluss` upstream (always use fork) +- ❌ Force-push to shared branches (main, release branches) +- ❌ Execute destructive git operations (`reset --hard`, `clean -fdx`, `branch -D`) +- ❌ Modify generated files when code generation workflows exist +- ❌ Skip pre-commit hooks (`--no-verify`) without explicit request +- ❌ Add dependencies without discussing compatibility/licensing +- ❌ Disable Checkstyle/Spotless rules to make code pass -4. **Preconditions:** `fluss-common/src/main/java/org/apache/fluss/utils/Preconditions.java` - - Input validation utility (must be imported statically) +### Safe to Do (Within Scope) -5. **Test base:** `fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkTestBase.java` - - Common test setup patterns +**Encouraged autonomous actions:** +- ✅ Read any file in the repository +- ✅ Run tests and build commands +- ✅ Format code with `./mvnw spotless:apply` +- ✅ Fix Checkstyle violations following Section 1 rules +- ✅ Create feature branches in your fork +- ✅ Commit changes with proper attribution +- ✅ Push to your fork and create PRs -6. **CI configuration:** `.github/workflows/ci.yaml` - - Test stages and build commands +### Verification Requirements -### Development Checklist +**Before committing code:** +1. All tests pass: `./mvnw verify` +2. Code formatted: `./mvnw spotless:check` +3. No Checkstyle violations +4. License headers present: `./mvnw validate` +5. Self-review completed (Section 12) -Before submitting a PR: - -- [ ] Run `./mvnw spotless:apply` to format code -- [ ] Run `./mvnw clean verify` to ensure all tests pass -- [ ] Check that no forbidden imports are used -- [ ] Verify all public APIs have Javadoc -- [ ] Ensure tests use AssertJ (not JUnit assertions) -- [ ] Verify Preconditions are imported statically -- [ ] Check that ConcurrentHashMap is not directly instantiated -- [ ] Ensure file length < 3000 lines -- [ ] Remove trailing whitespace -- [ ] Follow naming conventions (Impl suffix, Abstract prefix, etc.) +**When in doubt:** Ask the user before proceeding with potentially destructive or far-reaching changes. --- -## Document Information - -**Last Updated:** 2026-03-24 -**Fluss Version:** 0.10-SNAPSHOT -**Generated-by:** AI-assisted analysis of Apache Fluss codebase using Claude Code - -**Content Verification:** -All rules, patterns, and code examples in this document are derived from and verifiable against the Apache Fluss source code. The patterns are not generated—they are documented observations of existing conventions enforced by Checkstyle, Spotless, and established through the project's codebase. - -**Contributing:** -For questions, corrections, or suggestions about this guide, please open an issue at https://github.com/apache/fluss/issues - -**License:** -This document is licensed under the Apache License 2.0, consistent with the Apache Fluss project. +**Version:** 0.10-SNAPSHOT | **License:** Apache License 2.0 From 2fe47f713edfbb8d0c526fc1c07798e7aade5b1c Mon Sep 17 00:00:00 2001 From: vaibhav kumar Date: Thu, 26 Mar 2026 16:16:47 +0530 Subject: [PATCH 3/5] [docs] Fix .gitignore trailing newline to match upstream Restore trailing newline at end of .gitignore file to match apache/fluss upstream. Previous commit accidentally removed it. Co-Authored-By: Claude Opus 4.6 --- .gitignore | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index fd06d2827a..845fa7ff97 100644 --- a/.gitignore +++ b/.gitignore @@ -44,4 +44,4 @@ website/package-lock.json website/versioned_docs website/versioned_sidebars website/versions.json -website/pnpm-lock.yaml \ No newline at end of file +website/pnpm-lock.yaml From 68630aaee8a719aa88ad88ae5e552ee3b79745cb Mon Sep 17 00:00:00 2001 From: vaibhav kumar Date: Fri, 27 Mar 2026 22:37:48 +0530 Subject: [PATCH 4/5] [docs] Make CLAUDE.md a symlink to AGENTS.md Following the Apache Airflow repository pattern, CLAUDE.md is now a symlink to AGENTS.md. This maintains a single source of truth while supporting both filename conventions for AI coding assistants. Co-Authored-By: Claude --- CLAUDE.md | 1 + 1 file changed, 1 insertion(+) create mode 120000 CLAUDE.md diff --git a/CLAUDE.md b/CLAUDE.md new file mode 120000 index 0000000000..47dc3e3d86 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1 @@ +AGENTS.md \ No newline at end of file From e09b83a55f7c7df2bbf33100576390f10aaf9897 Mon Sep 17 00:00:00 2001 From: vaibhav kumar Date: Sat, 28 Mar 2026 23:51:10 +0530 Subject: [PATCH 5/5] [docs] Enhance AGENTS.md and add AI disclosure to PR template - Add Java Version Compatibility section to clarify Java 8 source level requirement - Add Repository Structure overview with module descriptions - Remove Co-Authored-By references from commit guidelines - Add ASF-compliant AI disclosure checkbox to PR template - Add Generated-by tag template aligned with Apache Airflow approach --- .github/PULL_REQUEST_TEMPLATE.md | 7 ++ AGENTS.md | 120 +++++++++++++++++++++++++++++-- 2 files changed, 123 insertions(+), 4 deletions(-) diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 238dab59fa..528a432564 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -13,6 +13,13 @@ - Each pull request should address only one issue, not mix up code from multiple issues. + - **Generative AI disclosure:** Indicate whether generative AI tools were used in authoring this PR. If yes, specify the tool below. + - [ ] No generative AI tools used + - [ ] Yes (please specify the tool below) + + **(The sections below can be removed for hotfixes or typos)** --> diff --git a/AGENTS.md b/AGENTS.md index 6eeeb65126..0a1ada9133 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -50,6 +50,43 @@ import org.apache.zookeeper.* // → org.apache.fluss. // ❌ Boolean.getBoolean("prop") → ✅ Boolean.parseBoolean(System.getProperty("prop")) ``` +### Java Version Compatibility + +**Source level: Java 8** — All code MUST compile with JDK 8. CI enforces this via `compile-on-jdk8`. + +**Build requirement:** Java 11 is required to build the project, but all source code must remain Java 8 compatible. + +**FORBIDDEN Java 9+ features:** +```java +// ❌ var keyword (Java 10) +var list = new ArrayList<>(); // → ✅ ArrayList list = new ArrayList<>(); + +// ❌ List.of(), Map.of(), Set.of() (Java 9) +List.of("a", "b") // → ✅ Arrays.asList("a", "b") +Map.of("k", "v") // → ✅ Collections.singletonMap("k", "v") +Set.of("a", "b") // → ✅ new HashSet<>(Arrays.asList("a", "b")) + +// ❌ Optional.isEmpty() (Java 11) +optional.isEmpty() // → ✅ !optional.isPresent() + +// ❌ String.strip(), String.isBlank() (Java 11) +string.strip() // → ✅ string.trim() +string.isBlank() // → ✅ string.trim().isEmpty() + +// ❌ Stream.toList() (Java 16) +stream.toList() // → ✅ stream.collect(Collectors.toList()) + +// ❌ Map.entry() (Java 9) +Map.entry("k", "v") // → ✅ new AbstractMap.SimpleEntry<>("k", "v") + +// ❌ InputStream.transferTo() (Java 9) +inputStream.transferTo(out) // → ✅ IOUtils.copyBytes(inputStream, out) + +// ❌ Switch expressions, text blocks, records, sealed classes, pattern matching +``` + +**FORBIDDEN language features:** Switch expressions (Java 12), text blocks (Java 13), records (Java 14), sealed classes (Java 17), pattern matching (Java 16+) + ### Testing **MANDATORY: Use AssertJ, NOT JUnit assertions:** @@ -132,6 +169,85 @@ public class ConnectionFactory { ## 3. Code Organization +### Repository Structure + +Apache Fluss follows a layered Maven module architecture: `fluss-common` (foundation) → `fluss-rpc` → `fluss-client`/`fluss-server` (peers, cannot cross-depend) → connectors → lake tiering. + +#### Core Modules + +**`fluss-common`** - Foundation: data types, config, metadata, utilities, exceptions +- Key packages: `annotation`, `config`, `types`, `row`, `record`, `metadata`, `utils`, `exception`, `fs` + +**`fluss-metrics`** - Metrics system (reporters, metric groups) + +**`fluss-rpc`** - RPC framework, Protocol Buffer messages (proto2) +- Regenerate: `./mvnw clean install -DskipTests -pl fluss-protogen,fluss-rpc` + +**`fluss-client`** - Client library for table operations +- APIs: `Connection`, `Admin`, `Table`, `LogScanner`, `LookupClient`, `UpsertWriter`, `AppendWriter` +- Packages: `admin`, `table`, `write`, `lookup`, `scanner`, `metadata` + +**`fluss-server`** - CoordinatorServer (metadata, coordination) + TabletServer (data storage) +- Packages: `coordinator`, `tablet`, `log`, `kv`, `replica`, `zk`, `metadata`, `authorizer` + +#### Connectors + +**`fluss-flink/`** - Flink connectors: `fluss-flink-common`, `fluss-flink-{1.18,1.19,1.20,2.2}`, `fluss-flink-tiering` + +**`fluss-spark/`** - Spark connectors: `fluss-spark-common`, `fluss-spark-{3.4,3.5}`, `fluss-spark-ut` + +**`fluss-kafka/`** - Kafka-compatible producer/consumer APIs + +#### Lake Tiering + +**`fluss-lake/`** - Lake format integrations: `fluss-lake-{iceberg,paimon,lance}` + +#### Filesystems + +**`fluss-filesystems/`** - Pluggable filesystem implementations: `fluss-fs-{hadoop,hdfs,s3,oss,obs,azure,gs}` + +#### Support + +**`fluss-test-utils`** - JUnit 5 extensions (`FlussClusterExtension`, `ZooKeeperExtension`), test base classes + +**`fluss-dist`** - Binary distribution with `bin/` scripts and `conf/` templates + +**`fluss-protogen`** - Protocol Buffer code generation + +**`fluss-test-coverage`** - Aggregated JaCoCo test coverage + +**`fluss-jmh`** - Performance microbenchmarks + +**`fluss-docgen`** - Configuration documentation generation + +#### Key Directories + +``` +fluss/ +├── fluss-common/ # Foundation module +├── fluss-rpc/ # RPC framework +├── fluss-client/ # Client APIs +├── fluss-server/ # Server components +├── fluss-flink/ # Flink connectors +├── fluss-spark/ # Spark connectors +├── fluss-lake/ # Lake tiering +├── fluss-filesystems/ # Filesystem plugins +├── fluss-dist/ # Binary distribution +│ └── src/main/resources/ +│ ├── bin/ # coordinator-server.sh, tablet-server.sh, local-cluster.sh +│ └── conf/ # server.yaml, log4j.properties +├── .github/workflows/ # CI pipeline (ci.yaml) +└── pom.xml # Root Maven POM +``` + +#### Package Conventions + +- **`org.apache.fluss.`** - Module root package +- **`org.apache.fluss.shaded.*`** - Shaded dependencies (Guava, Jackson, Netty, Arrow, ZooKeeper) +- **Test packages** - Mirror main structure in `src/test/java` + +--- + ### Naming Conventions | Type | Convention | Example | @@ -338,14 +454,10 @@ git remote add fork https://github.com//fluss.git [component] Brief description (under 70 chars) Detailed explanation of changes and motivation. - -Co-Authored-By: Claude ``` **Component tags:** `[client]`, `[server]`, `[rpc]`, `[flink]`, `[spark]`, `[docs]`, `[build]`, `[test]` -**AI-generated code identification:** ALWAYS include `Co-Authored-By: Claude ` in commit messages for AI-generated changes. - ### Pre-Push Self-Review Before pushing, conduct thorough self-review: