diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000000..72ed6387e1a --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,428 @@ +# Apache Traffic Server - GitHub Copilot Instructions + +## Repository Overview + +Apache Traffic Server (ATS) is a high-performance HTTP/HTTPS caching proxy server written in C++20. It processes large-scale web traffic using an event-driven, multi-threaded architecture with a sophisticated plugin system. + +**Key Facts:** +- ~500K lines of C++20 code (strict - no C++23) +- Event-driven architecture using Continuation callbacks +- Handles HTTP/1.1, HTTP/2, HTTP/3 (QUIC) +- Supports TLS termination and caching +- Extensible via C/C++ plugins + +## Code Style Requirements + +### C++ Style Guidelines + +Some of these rules are enforced automatically by CI (via clang-format and clang-tidy); others are recommended conventions that will not themselves cause CI failures but should still be followed in code reviews. +**Formatting Rules:** +- **Line length:** 132 characters maximum +- **Indentation:** 2 spaces (never tabs) +- **Braces:** Linux kernel style (opening brace on same line) +```cpp +if (condition) { + // code here +} +``` +- **Pointer/reference alignment:** Right side +```cpp +Type *ptr; // Correct +Type &ref; // Correct +Type* ptr; // Wrong +``` +- **Empty line after declarations:** +```cpp +void function() { + int x = 5; + std::string name = "test"; + + // Empty line before first code statement + process_data(x, name); +} +``` +- **Always use braces** for if/while/for (no naked conditions): +```cpp +if (x > 0) { // Correct + foo(); +} + +if (x > 0) // Wrong - missing braces + foo(); +``` +- **Keep variable declarations together:** +```cpp +// Correct - declarations grouped together +void function() { + int count = 0; + std::string name = "test"; + auto *buffer = new_buffer(); + + // Empty line before first code statement + process_data(count, name, buffer); +} + +// Wrong - declarations scattered +void function() { + int count = 0; + process_count(count); + std::string name = "test"; // Don't scatter declarations +} +``` + +**Naming Conventions:** +- Classes: `CamelCase` → `HttpSM`, `NetVConnection`, `CacheProcessor` +- Functions/variables: `snake_case` → `handle_request()`, `server_port`, `cache_key` +- Constants/macros: `UPPER_CASE` → `HTTP_STATUS_OK`, `MAX_BUFFER_SIZE` +- Member variables: `snake_case` with no prefix → `connection_count`, `buffer_size` + +**C++20 Patterns (Use These):** +```cpp +// GOOD - Modern C++20 +auto buffer = std::make_unique(size); +for (const auto &entry : container) { + if (auto *ptr = entry.get(); ptr != nullptr) { + process(ptr); + } +} + +// AVOID - Legacy C-style +MIOBuffer *buffer = (MIOBuffer*)malloc(sizeof(MIOBuffer)); +for (int i = 0; i < container.size(); i++) { + process(container[i]); +} +``` + +**Memory Management:** +- Use RAII and smart pointers (`std::unique_ptr`, `std::shared_ptr`) +- Prefer smart pointers over raw `new`/`delete` when possible +- Use `ats_malloc()`/`ats_free()` for large allocations (not `malloc`) +- Use `IOBuffer` for network data (zero-copy design) +- Note: Some subsystems legitimately use explicit deletes / `delete this` (e.g., continuation-based code) + +**What NOT to Use:** +- ❌ C++23 features (code must compile with C++20) +- ❌ `malloc`/`free` for large allocations (use `ats_malloc`), or prefer heaps or stack allocations +- ❌ Blocking operations in event threads +- ❌ Creating threads manually (use async event system) + +### Comments and Documentation + +**Minimal comments philosophy:** +- Only add comments where code isn't self-explanatory +- **Don't** describe what the code does (the code already shows that) +- **Do** explain *why* something is done if not obvious +- Avoid stating the obvious + +```cpp +// BAD - stating the obvious +// Increment the counter +counter++; + +// GOOD - explaining why +// Skip the first element since it's always the sentinel value +counter++; + +// BAD - describing what +// Loop through all connections and close them +for (auto &conn : connections) { + conn.close(); +} + +// GOOD - explaining why (if not obvious) +// Must close connections before destroying the acceptor to avoid use-after-free +for (auto &conn : connections) { + conn.close(); +} +``` + +**When to add comments:** +- Non-obvious algorithms or math +- Workarounds for bugs in dependencies +- Performance optimizations that reduce clarity +- Security-critical sections +- Complex state machine transitions + +**When NOT to add comments:** +- Self-documenting code +- Obvious operations +- Function/variable names that explain themselves + +### Python Style + +- Python 3.11+ with type hints +- 4-space indentation (never tabs) +- Type annotations on all function signatures + +### License Headers + +**New source and test files** must start with Apache License 2.0 header (`.cc`, `.h`, `.py`, and other code files): +```cpp +/** @file + * + * Brief description of file + * + * @section license License + * + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +``` + +## Architecture Patterns + +### Event-Driven Model (CRITICAL) + +ATS uses **Continuation-based** asynchronous programming: + +```cpp +// Continuation is the core callback pattern +class MyContinuation : public Continuation { + int handle_event(int event, void *data) { + switch (event) { + case EVENT_SUCCESS: + // Handle success + return EVENT_DONE; + case EVENT_ERROR: + // Handle error + return EVENT_ERROR; + } + return EVENT_CONT; + } +}; +``` + +**Key Rules:** +- ⚠️ **NEVER block in event threads** - use async I/O or thread pools +- All async operations use `Continuation` callbacks +- Return `EVENT_DONE`, `EVENT_CONT`, or `EVENT_ERROR` from handlers +- Use `EThread::schedule()` for deferred work + +### HTTP State Machine + +The `HttpSM` class orchestrates HTTP request processing: + +```cpp +// HttpSM is the central state machine +class HttpSM : public Continuation { + // Processes requests through various states + // Hook into appropriate stage via plugin hooks + // Access transaction state via HttpTxn +}; +``` + +**Common hooks:** +- `TS_HTTP_READ_REQUEST_HDR_HOOK` - After reading client request +- `TS_HTTP_SEND_REQUEST_HDR_HOOK` - Before sending to origin +- `TS_HTTP_READ_RESPONSE_HDR_HOOK` - After reading origin response +- `TS_HTTP_SEND_RESPONSE_HDR_HOOK` - Before sending to client + +### Threading Model + +- **Event threads:** Handle most async work (never block here) +- **DNS threads:** Dedicated DNS resolution pool +- **Disk I/O threads:** Cache disk operations +- **Network threads:** Actually event threads handling network I/O + +**Rule:** Don't create threads. Use the event system or existing thread pools. + +### Debug Logging Pattern + +```cpp +// At file scope +static DbgCtl dbg_ctl{"http_sm"}; + +// In code (preferred) +Dbg(dbg_ctl, "Processing request for URL: %s", url); + +// Alternative (less common) +DbgPrint(dbg_ctl, "Processing request for URL: %s", url); +``` + +**Note:** Use `Dbg()` for new code. `DbgPrint()` exists but is rarely used (~60 vs ~3400 uses). + +## Project Structure (Key Paths) + +``` +trafficserver/ +├── src/ +│ ├── iocore/ # I/O subsystem +│ │ ├── eventsystem/ # Event engine (Continuation.h is core) +│ │ ├── cache/ # Cache implementation +│ │ ├── net/ # Network I/O, TLS, QUIC +│ │ └── dns/ # DNS resolution +│ ├── proxy/ # HTTP proxy logic +│ │ ├── http/ # HTTP/1.1 (HttpSM.cc is central) +│ │ ├── http2/ # HTTP/2 +│ │ ├── http3/ # HTTP/3 +│ │ ├── hdrs/ # Header parsing +│ │ └── logging/ # Logging +│ ├── tscore/ # Core utilities +│ ├── tsutil/ # Utilities (metrics, debugging) +│ └── api/ # Plugin API implementation +│ +├── include/ +│ ├── ts/ # Public plugin API (ts.h) +│ ├── tscpp/ # C++ plugin API +│ └── iocore/ # Internal headers +│ +├── plugins/ # Stable plugins +│ ├── header_rewrite/ # Header manipulation (see HRW.instructions.md) +│ └── experimental/ # Experimental plugins +│ +└── tools/ + └── hrw4u/ # Header Rewrite DSL compiler + +``` + +### Key Files to Understand + +- `include/iocore/eventsystem/Continuation.h` - Core async pattern +- `src/proxy/http/HttpSM.cc` - HTTP state machine (most important) +- `src/iocore/cache/Cache.cc` - Cache implementation +- `include/ts/ts.h` - Plugin API (most stable interface) +- `include/tscore/ink_memory.h` - Memory allocation functions + +## Common Patterns + +### Finding Examples + +**Before writing new code, look for similar existing code:** +- Plugin examples: `example/plugins/` for simple patterns +- Stable plugins: `plugins/` for production patterns +- Experimental plugins: `plugins/experimental/` for newer approaches + +**Pattern discovery:** +- Search for similar functionality in existing code +- Check `include/ts/ts.h` for plugin API patterns +- Look at tests in `tests/gold_tests/` for usage examples + +### Code Organization + +**Typical file structure for a plugin:** +``` +plugins/my_plugin/ +├── my_plugin.cc # Main plugin logic +├── handler.cc # Request/response handlers +├── handler.h # Handler interface +├── config.cc # Configuration parsing +└── CMakeLists.txt # Build configuration +``` + +**Typical class structure:** +- Inherit from `Continuation` for async operations +- Implement `handle_event()` for event processing +- Store state in class members, not globals +- Clean up resources in destructor (RAII) + +### Async Operation Pattern + +**General structure for async operations:** +1. Create continuation with callback +2. Initiate async operation (returns `Action*`) +3. Handle callback events in `handle_event()` +4. Return `EVENT_DONE` when complete + +**Always async, never blocking:** +- Network I/O → Use VConnection +- Cache operations → Use CacheProcessor +- DNS lookups → Use DNSProcessor +- Delayed work → Use `schedule_in()` or `schedule_at()` + +### Error Handling + +**Recoverable errors:** +- Return error codes +- Log with appropriate severity +- Clean up resources (RAII helps) + +**Unrecoverable errors:** +- Use `ink_release_assert()` for conditions that should never happen +- Log detailed context before asserting + +### Testing Approach + +**When adding new functionality:** +1. Check if unit tests exist in same directory (Catch2) +2. Add integration tests in `tests/gold_tests/` (autest) +3. Prefer `Test.ATSReplayTest()` with `replay.yaml` format (Proxy Verifier) +4. Test both success and error paths + +## Configuration + +### Adding New Configuration Records + +1. Define in `src/records/RecordsConfig.cc`: +```cpp +{RECT_CONFIG, "proxy.config.my_feature.enabled", RECD_INT, "0", RECU_RESTART_TS, RR_NULL, RECC_INT, nullptr, RECA_NULL} +``` + +2. Read in code: +```cpp +int enabled = 0; +REC_ReadConfigInteger(enabled, "proxy.config.my_feature.enabled"); +``` + +## What to Avoid + +### Common Mistakes + +❌ **Blocking in event threads:** +```cpp +// WRONG - blocks event thread +sleep(5); +blocking_network_call(); +``` + +✅ **Use async operations:** +```cpp +// CORRECT - schedules continuation +eventProcessor.schedule_in(this, HRTIME_SECONDS(5)); +``` + +❌ **Manual memory management:** +```cpp +// WRONG +auto *obj = new MyObject(); +// ... might leak if exception thrown +delete obj; +``` + +✅ **Use RAII:** +```cpp +// CORRECT +auto obj = std::make_unique(); +// Automatically cleaned up +``` + +❌ **Creating threads:** +```cpp +// WRONG +std::thread t([](){ do_work(); }); +``` + +✅ **Use event system:** +```cpp +// CORRECT +eventProcessor.schedule_imm(continuation, ET_CALL); +``` + +## Additional Resources + +- Plugin API: `include/ts/ts.h` +- Event system: `include/iocore/eventsystem/` +- HTTP state machine: `src/proxy/http/HttpSM.cc` +- Documentation: `doc/developer-guide/` diff --git a/.github/instructions/HRW.instructions.md b/.github/instructions/HRW.instructions.md new file mode 100644 index 00000000000..11f0036400e --- /dev/null +++ b/.github/instructions/HRW.instructions.md @@ -0,0 +1,223 @@ +--- +applyTo: + - "plugins/header_rewrite/**/*" + - "tools/hrw4u/**/*" +--- + +# Header Rewrite Plugin and HRW4U Transpiler + +## Overview + +Two closely related components that must be kept in sync: + +1. **header_rewrite plugin** (`plugins/header_rewrite/`) - ATS plugin for modifying HTTP headers +2. **hrw4u transpiler** (`tools/hrw4u/`) - DSL compiler for generating header_rewrite configurations + +## Critical Requirement: Feature Synchronization + +**Features added to either component may require corresponding changes in the other.** + +### When to Update Both + +- **New operator in header_rewrite** → Add syntax and code generation in hrw4u +- **New condition in header_rewrite** → Add parsing and symbols in hrw4u +- **New variable/resource in header_rewrite** → Update hrw4u symbol tables and types +- **New hook in header_rewrite** → Add hook syntax in hrw4u +- **New hrw4u syntax** → Ensure correct header_rewrite config generation + +### Bidirectional Compilation + +Both directions must work: +- **hrw4u** (forward): HRW4U source → header_rewrite config +- **u4wrh** (reverse): header_rewrite config → HRW4U source + +Round-trip test: `hrw4u example.hrw4u | u4wrh` should produce equivalent output. + +## Header Rewrite Plugin + +### Architecture + +**Core files:** +- `parser.cc/h` - Configuration syntax parser +- `factory.cc/h` - Factory for operators and conditions +- `operators.cc/h` - Header manipulation operations +- `conditions.cc/h` - Conditional logic +- `resources.cc/h` - Available variables (headers, IPs, etc.) +- `statement.cc/h` - Rule statement abstraction +- `ruleset.cc/h` - Rule collection and execution +- `matcher.cc/h` - Pattern matching +- `value.cc/h` - Value extraction and manipulation + +### Adding Features + +**New operator:** +1. Define class in `operators.h`, implement in `operators.cc` +2. Register in `factory.cc` +3. Update hrw4u: `tables.py` (forward mapping tables), `visitor.py` (forward compiler - HRW4UVisitor), and `generators.py` (reverse-resolution tables used by u4wrh) + +**New condition:** +1. Define class in `conditions.h`, implement in `conditions.cc` +2. Register in `factory.cc` +3. Update hrw4u: `visitor.py` for parsing, `tables.py` for symbol maps + +**New resource/variable:** +1. Define in `resources.h`, implement in `resources.cc` +2. Update hrw4u: `types.py` for type system, `tables.py` (OPERATOR_MAP/CONDITION_MAP/etc.) for symbol tables, `symbols.py` for resolver wiring, and `generators.py` for reverse mappings + +## HRW4U Transpiler + +### Purpose + +Provides readable DSL syntax that compiles to header_rewrite configuration. + +**Requirements:** Python 3.11+, ANTLR4 + +### Project Structure + +``` +tools/hrw4u/ +├── src/ # Python source +│ ├── common.py # Shared utilities +│ ├── types.py # Type system +│ ├── symbols.py # Symbol resolution +│ ├── hrw_symbols.py # Header rewrite symbols +│ ├── tables.py # Symbol/type tables +│ ├── visitor.py # Forward compiler (HRW4UVisitor - hrw4u script) +│ ├── hrw_visitor.py # Reverse compiler (HRWInverseVisitor - u4wrh script) +│ ├── generators.py # Reverse-resolution table generation +│ ├── validation.py # Semantic validation +│ └── lsp/ # LSP server +├── scripts/ # CLI tools +│ ├── hrw4u # Forward compiler (hrw4u → HRW config) +│ ├── u4wrh # Reverse compiler (HRW config → hrw4u) +│ └── hrw4u-lsp # LSP server +├── grammar/ # ANTLR4 grammars +└── tests/ # Test suite +``` + +### Key Modules + +**Type System (`types.py`):** +- HRW4U type hierarchy +- Variable types (string, int, bool, IP, etc.) +- Operator signatures +- Type checking and inference + +**Symbol Resolution (`symbols.py`, `hrw_symbols.py`, `tables.py`):** +- Symbol tables for variables, operators, functions +- Scope management +- Built-in symbols for header_rewrite resources + +**Reverse-Resolution Tables (`generators.py`):** +- Generates derived tables and reverse mappings from primary forward tables +- Used by u4wrh (reverse compiler) to map HRW config back to hrw4u syntax +- Eliminates duplication by maintaining single source of truth in forward tables + +**Visitors:** +- `visitor.py` (HRW4UVisitor) - Forward compilation: hrw4u DSL → header_rewrite config +- `hrw_visitor.py` (HRWInverseVisitor) - Reverse compilation: header_rewrite config → hrw4u DSL +- `kg_visitor.py` (KnowledgeGraphVisitor) - Extracts structured graph data for analysis/visualization (used by `hrw4u-kg` script, rarely modified) + +### Adding Features + +**New operator:** +1. Update grammar if new syntax needed +2. Add symbol definition in `hrw_symbols.py` +3. Add type signature in `types.py` +4. Update forward compiler in `visitor.py` (HRW4UVisitor) to handle new operator +5. Update `generators.py` to generate reverse mappings for u4wrh +6. Update reverse compiler in `hrw_visitor.py` (HRWInverseVisitor) if special handling needed +7. Add tests in `tests/test_ops.py` and `tests/test_ops_reverse.py` +8. Update corresponding header_rewrite plugin code + +**New condition:** +1. Update grammar if needed +2. Add symbol definition in `hrw_symbols.py` and type info in `types.py` +3. Update forward compiler in `visitor.py` (HRW4UVisitor) +4. Update `generators.py` for reverse mappings +5. Update reverse compiler in `hrw_visitor.py` (HRWInverseVisitor) if needed +6. Add tests +7. Update header_rewrite plugin + +**New variable:** +1. Add to symbol tables (`tables.py`, `hrw_symbols.py`) +2. Add type definition (`types.py`) +3. Update forward compiler in `visitor.py` (HRW4UVisitor) for property access +4. Update `generators.py` for reverse mappings +5. Add tests +6. Ensure header_rewrite supports it + +### Code Style + +**Python (3.11+):** +- 4-space indentation (never tabs) +- Type hints on all functions +- Dataclasses for structured data +- Modern Python features (match/case, walrus operator) + +**C++ (header_rewrite):** +- Follow ATS C++20 standards +- CamelCase classes, snake_case functions/variables +- 2-space indentation +- Empty line after declarations + +## Feature Addition Example + +**Hypothetical example to illustrate the workflow:** + +Adding a `has-prefix` operator (this operator does not exist): + +1. **header_rewrite plugin:** + ```cpp + // operators.h + class OperatorHasPrefix : public Operator { + void exec(const Resources &res) override; + }; + + // operators.cc - implement exec() + // factory.cc - register operator + ``` + +2. **hrw4u transpiler:** + ```python + # hrw_symbols.py + OPERATORS = { + 'has-prefix': OperatorSymbol( + name='has-prefix', + params=['target', 'prefix'], + return_type=BoolType() + ), + } + + # generators.py + def generate_has_prefix_op(target, prefix): + return f'has-prefix {target} {prefix}' + + # tests/test_ops.py + def test_has_prefix(): + # Test forward compilation + + # tests/test_ops_reverse.py + def test_has_prefix_reverse(): + # Test reverse compilation + ``` + +3. **Verify round-trip:** + ```bash + echo 'REMAP { if req.Host has-prefix "www." { } }' | hrw4u | u4wrh + ``` + +## Common Pitfalls + +1. **Forgetting to update both components** - Changes often need coordination +2. **Breaking round-trip** - Always test `hrw4u | u4wrh` round-trip +3. **Symbol table drift** - Keep hrw4u symbols synced with plugin capabilities +4. **Type mismatches** - Ensure type system matches plugin runtime behavior +5. **Missing tests** - Add tests for both forward and reverse compilation + +## Documentation + +- User docs: `doc/admin-guide/plugins/header_rewrite.en.html` +- Plugin README: `plugins/header_rewrite/README` +- HRW4U README: `tools/hrw4u/README.md` +- LSP README: `tools/hrw4u/LSP_README.md`