in_tail: harden db offset restore across restart#11691
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Adds DB-backed offset-marker persistence and validation to in_tail: stores a hash/size of the bytes preceding persisted offsets, runs DB schema migrations, validates markers on resume to detect rewrites/truncations, adjusts file positioning/stream offset logic, tracks aged-out inodes, and adds extensive integration and multiline tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Operator
    participant Tail as "Tail Plugin"
    participant FS as "File System"
    participant DB as "SQLite DB"
    Note over Tail,DB: Normal run — persist offset + marker
    Operator->>Tail: Start tailing file
    Tail->>FS: Read bytes
    Tail->>Tail: Compute DB offset and offset_marker
    Tail->>DB: INSERT/UPDATE offset, offset_marker, offset_marker_size
    DB-->>Tail: OK
    Note over Tail,DB: Resume after restart
    Operator->>Tail: Restart/resume
    Tail->>DB: SELECT offset, offset_marker, offset_marker_size
    DB-->>Tail: stored_offset + marker
    Tail->>FS: Read bytes preceding stored_offset
    Tail->>Tail: Recompute marker
    alt Marker matches
        Tail->>FS: Seek to stored_offset (resume)
        FS-->>Tail: Ready
    else Marker mismatch
        Tail->>Tail: Reset file offsets to 0
        Tail->>DB: UPDATE stored offset to 0
        Tail->>FS: Seek to file start (re-read)
        FS-->>Tail: Ready
    end
```
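The resume flow in the diagram can be sketched outside the plugin. The following is a hedged Python illustration only (the real implementation is C inside in_tail; the function names, the SHA-256 choice, and the 64-byte default are assumptions made for this sketch): the marker is a hash of the bytes immediately preceding the persisted offset, and any mismatch on resume falls back to offset 0.

```python
import hashlib

def compute_marker(path, offset, marker_size=64):
    # Hash up to `marker_size` bytes immediately preceding `offset`.
    size = min(offset, marker_size)
    with open(path, "rb") as f:
        f.seek(offset - size)
        data = f.read(size)
    return hashlib.sha256(data).hexdigest(), size

def resume_offset(path, stored_offset, stored_marker, stored_size):
    # Resume at stored_offset only if the preceding bytes still match;
    # a rewritten or truncated file yields a different marker, so restart at 0.
    try:
        marker, size = compute_marker(path, stored_offset, stored_size)
    except OSError:
        return 0
    if size == stored_size and marker == stored_marker:
        return stored_offset
    return 0
```

A truncated or rewritten file produces fewer or different preceding bytes, so the recomputed hash no longer matches and the sketch resets to 0, mirroring the "Marker mismatch" branch above.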
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
🧹 Nitpick comments (2)
tests/integration/scenarios/in_tail/tests/test_in_tail_001.py (1)

617-640: Double-close of `writer_old` is handled safely but could be cleaner.

Line 631 calls `writer_old.close()` and the `finally` block at line 639 calls it again. While the `close()` method handles this safely (it checks for `None`), consider removing the explicit close at line 631, since the `finally` block will handle cleanup.

♻️ Suggested simplification

```diff
     os.unlink(active_log)
-    writer_old.close()
     time.sleep(3)
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/integration/scenarios/in_tail/tests/test_in_tail_001.py` around lines 617 - 640: The test test_in_tail_delete_and_recreate_same_path_is_reingested calls writer_old.close() twice (an explicit close before the sleep and again in the finally block); remove the intermediate writer_old.close() and rely on the finally block to close writer_old, keeping cleanup centralized and avoiding redundant calls while preserving safety in PersistentWriter.close().

plugins/in_tail/tail_db.c (1)

224-228: Implicit signed-to-unsigned conversion for `offset_marker_size`.

`sqlite3_column_int64()` returns a signed `sqlite3_int64`, but `offset_marker_size` is `size_t` (unsigned). While the value should always be non-negative in practice (it is a byte count), consider adding a defensive check or cast to make the conversion explicit.

🔧 Suggested defensive cast

```diff
     /* offset_marker_size: column 5 */
-    *offset_marker_size = sqlite3_column_int64(ctx->stmt_get_file, 5);
+    *offset_marker_size = (size_t) sqlite3_column_int64(ctx->stmt_get_file, 5);
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 224 - 228: The assignment to *offset_marker_size uses sqlite3_column_int64(ctx->stmt_get_file, 5) (a signed sqlite3_int64) but offset_marker_size is a size_t; read the value into a sqlite3_int64 temporary (e.g. val = sqlite3_column_int64(...)), check whether val is negative and handle that case (e.g. treat as 0 or return an error), then assign (size_t) val to *offset_marker_size to make the conversion explicit and defensive; keep the current assignment for *offset_marker (column 4) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_tail/tail_sql.h`:
- Around line 38-39: The schema stores offset_marker and offset_marker_size as
INTEGER while the in-memory fields (uint64_t and size_t) and the code use
sqlite3_bind_int64()/sqlite3_column_int64(), which is unsafe for values >
INT64_MAX; change storage to an unsigned-safe representation (e.g., schema
columns "offset_marker" and "offset_marker_size" as BLOB or TEXT) and update the
binding/reading code to perform explicit safe conversions: when saving,
serialize the uint64_t/size_t to a fixed-width binary blob (use
sqlite3_bind_blob) or to a decimal string (use sqlite3_bind_text) and when
loading, parse/deserialize back into uint64_t/size_t with overflow checks;
alternatively, if you choose to keep INTEGER, enforce and validate values do not
exceed INT64_MAX on write/read and reject or clamp larger values to avoid UB.
Ensure changes touch the schema definition (offset_marker/offset_marker_size),
the code paths that call sqlite3_bind_int64()/sqlite3_column_int64(), and the
places that use the in-memory fields (uint64_t/size_t) so conversions and
overflow checks are consistent.
In `@tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c`:
- Around line 319-340: The test currently appends to stream id 1 without
creating a stream, conflating "no parser groups" with "invalid stream id"; in
run_empty_append() create a valid stream first (call flb_ml_stream_create(ml)
and check its return), then call flb_ml_append_text using the returned stream id
instead of the hardcoded 1, handle errors/teardown if stream creation fails, and
keep the rest of the function (time setup, append, logging, and teardown)
unchanged so the test isolates the empty-group append condition.
In
`@tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py`:
- Around line 41-49: The command list construction uses os.environ.get("CC",
"cc") as a single string which can include wrappers/flags (e.g., "ccache
clang"), causing ENOENT; change building of the command list so the CC value is
split into separate argv tokens before appending other args: obtain cc_value =
os.environ.get("CC", "cc"), split it (shlex.split or str.split) into tokens,
then extend the existing command list with those tokens followed by
str(PROBE_SOURCE), "-o", str(binary), *_load_probe_flags(), str(LIBFLUENT_BIT),
and the rpath flag so the executable and its flags are separate entries.
- Around line 18-31: The _load_probe_flags() routine currently ignores each
compile_commands.json entry's directory and treats CC as a single token; update
_load_probe_flags() to (1) use entry["directory"] as the working/base path when
resolving any relative -I flags produced from shlex.split(entry["command"]) so
that relative include paths are interpreted relative to entry["directory"]
(e.g., resolve "-I<path>" to an absolute or joined path when <path> is not
absolute), and (2) split the CC environment variable with
shlex.split(os.environ.get("CC", "cc")) so embedded options in CC are passed as
separate argv elements rather than a single program name; keep references to
COMPILE_COMMANDS and entries to locate the compile_commands.json parsing logic.
---
Nitpick comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 224-228: The assignment to *offset_marker_size uses
sqlite3_column_int64(ctx->stmt_get_file, 5) (signed sqlite3_int64) but
offset_marker_size is a size_t; update the code in the function that reads
columns from ctx->stmt_get_file to explicitly handle signed-to-unsigned
conversion by reading the value into a sqlite3_int64 temp (e.g. val =
sqlite3_column_int64(...)), check if val is negative and handle that case (e.g.
treat as 0 or return an error), then assign (size_t)val to *offset_marker_size
to make the conversion explicit and defensive; keep the current assignment for
*offset_marker (column 4) unchanged.
In `@tests/integration/scenarios/in_tail/tests/test_in_tail_001.py`:
- Around line 617-640: The test
test_in_tail_delete_and_recreate_same_path_is_reingested calls
writer_old.close() twice (an explicit close before sleep and again in the
finally block); remove the intermediate writer_old.close() and rely on the
finally block to close writer_old to keep cleanup centralized and avoid
redundant calls while preserving safety in PersistentWriter.close().
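The double-close finding above hinges on `close()` being idempotent. A minimal sketch of that guard pattern (illustrative only; the suite's actual `PersistentWriter` may differ in detail):

```python
class PersistentWriter:
    # Illustrative log writer whose close() is safe to call more than once.
    def __init__(self, path):
        self._fh = open(path, "a", encoding="utf-8")

    def write_line(self, line):
        self._fh.write(line + "\n")
        self._fh.flush()

    def close(self):
        # The None guard makes repeated close() calls a no-op, so a
        # `finally` block can always call close() unconditionally.
        if self._fh is not None:
            self._fh.close()
            self._fh = None
```

With this guard in place, an explicit mid-test close is redundant but harmless, which is why the review only flags it as a nitpick rather than a bug.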
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b7972091-0e84-429a-a332-027717ca2dd7
📒 Files selected for processing (27)
- plugins/in_tail/tail_db.c
- plugins/in_tail/tail_file.c
- plugins/in_tail/tail_file.h
- plugins/in_tail/tail_file_internal.h
- plugins/in_tail/tail_sql.h
- tests/integration/scenarios/in_tail/README.md
- tests/integration/scenarios/in_tail/config/parsers_tail_it.conf
- tests/integration/scenarios/in_tail/config/tail_docker_mode.yaml
- tests/integration/scenarios/in_tail/config/tail_exclude_path.yaml
- tests/integration/scenarios/in_tail/config/tail_generic_encoding.yaml
- tests/integration/scenarios/in_tail/config/tail_gzip.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_active_older.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_older.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_new_files_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_read_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_multiline.yaml
- tests/integration/scenarios/in_tail/config/tail_parser.yaml
- tests/integration/scenarios/in_tail/config/tail_rotate_wait_short.yaml
- tests/integration/scenarios/in_tail/config/tail_skip_long_lines.yaml
- tests/integration/scenarios/in_tail/config/tail_stat.yaml
- tests/integration/scenarios/in_tail/config/tail_stat_db_compare_filename.yaml
- tests/integration/scenarios/in_tail/config/tail_truncate_long_lines.yaml
- tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
- tests/integration/scenarios/multiline_interface/README.md
- tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c
- tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py
```c
" offset_marker INTEGER DEFAULT 0," \
" offset_marker_size INTEGER DEFAULT 0" \
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Verify how marker fields are bound/read and whether signed SQLite APIs are used.
rg -n -C4 'offset_marker|offset_marker_size|sqlite3_bind_int64|sqlite3_column_int64|sqlite3_bind_text|sqlite3_column_text' plugins/in_tail
```
Use unsigned-safe storage for offset_marker fields in database schema.
Lines 38–39 and 60–63 define offset_marker and offset_marker_size as SQLite INTEGER (signed 64-bit), while the in-memory fields in tail_file_internal.h (lines 119–120) are uint64_t and size_t. The code binds and reads these values using sqlite3_bind_int64() and sqlite3_column_int64(), which causes undefined behavior for values exceeding INT64_MAX. This can corrupt marker round-trips and trigger false marker mismatch during restart, resetting offsets unexpectedly.
Use BLOB or TEXT storage with explicit safe conversion, or document and enforce that markers never exceed INT64_MAX.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugins/in_tail/tail_sql.h` around lines 38 - 39, The schema stores
offset_marker and offset_marker_size as INTEGER while the in-memory fields
(uint64_t and size_t) and the code use
sqlite3_bind_int64()/sqlite3_column_int64(), which is unsafe for values >
INT64_MAX; change storage to an unsigned-safe representation (e.g., schema
columns "offset_marker" and "offset_marker_size" as BLOB or TEXT) and update the
binding/reading code to perform explicit safe conversions: when saving,
serialize the uint64_t/size_t to a fixed-width binary blob (use
sqlite3_bind_blob) or to a decimal string (use sqlite3_bind_text) and when
loading, parse/deserialize back into uint64_t/size_t with overflow checks;
alternatively, if you choose to keep INTEGER, enforce and validate values do not
exceed INT64_MAX on write/read and reject or clamp larger values to avoid UB.
Ensure changes touch the schema definition (offset_marker/offset_marker_size),
the code paths that call sqlite3_bind_int64()/sqlite3_column_int64(), and the
places that use the in-memory fields (uint64_t/size_t) so conversions and
overflow checks are consistent.
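The signed 64-bit ceiling this comment describes is easy to reproduce with Python's `sqlite3` module, which enforces the same limit as the C API. Below is a hedged sketch of the TEXT-based round-trip the comment proposes; the table and column names are illustrative, not the plugin's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE markers (as_int INTEGER, as_text TEXT)")

marker = 2**63 + 12345  # a uint64_t value above INT64_MAX

# Binding it as INTEGER fails: SQLite integers are signed 64-bit.
try:
    conn.execute("INSERT INTO markers (as_int) VALUES (?)", (marker,))
    raise AssertionError("expected OverflowError")
except OverflowError:
    pass

# A decimal-string serialization round-trips losslessly.
conn.execute("INSERT INTO markers (as_text) VALUES (?)", (str(marker),))
row = conn.execute("SELECT as_text FROM markers WHERE as_text IS NOT NULL").fetchone()
restored = int(row[0])
assert restored == marker
```

The same trade-off applies in C: TEXT or BLOB storage needs explicit serialization on write and overflow-checked parsing on read, while keeping INTEGER requires validating that values never exceed INT64_MAX.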
```c
static int run_empty_append(void)
{
    int ret;
    struct flb_config *config;
    struct flb_ml *ml;
    struct flb_time tm;

    config = flb_config_init();
    if (!config) {
        return 2;
    }

    ml = flb_ml_create(config, "empty-append");
    if (!ml) {
        return teardown(config, NULL, 2);
    }

    flb_time_get(&tm);
    ret = flb_ml_append_text(ml, 1, &tm, "hello", 5);
    fprintf(stdout, "append_ret=%d\n", ret);

    return teardown(config, ml, 0);
}
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Find the probe file
find . -name "ml_interface_probe.c" -type f

# Check the probe file's length
wc -l "tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c"

# Read the full file to examine run_empty_append and related functions
cat -n "tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c"
```
Isolate the empty-group append condition by creating a valid stream first.
The run_empty_append() function currently appends to stream 1 without calling flb_ml_stream_create(), unlike all other probe functions (run_single_group(), run_multi_group(), run_truncation()). This conflates two separate conditions: missing parser groups and an invalid stream id. The regression test should isolate the empty-group case with a valid stream to properly exercise the intended behavior.
🔧 Suggested fix

```diff
 static int run_empty_append(void)
 {
     int ret;
+    uint64_t stream_id;
     struct flb_config *config;
     struct flb_ml *ml;
     struct flb_time tm;
+    struct callback_state state = {0};

     config = flb_config_init();
     if (!config) {
         return 2;
     }

     ml = flb_ml_create(config, "empty-append");
     if (!ml) {
         return teardown(config, NULL, 2);
     }

+    ret = flb_ml_stream_create(ml, "empty-stream", -1,
+                               capture_flush, &state, &stream_id);
+    if (ret != 0) {
+        return teardown(config, ml, 2);
+    }
+
     flb_time_get(&tm);
-    ret = flb_ml_append_text(ml, 1, &tm, "hello", 5);
+    ret = flb_ml_append_text(ml, stream_id, &tm, "hello", 5);
     fprintf(stdout, "append_ret=%d\n", ret);

     return teardown(config, ml, 0);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c`
around lines 319 - 340, The test currently appends to stream id 1 without
creating a stream, conflating "no parser groups" with "invalid stream id"; in
run_empty_append() create a valid stream first (call flb_ml_stream_create(ml)
and check its return), then call flb_ml_append_text using the returned stream id
instead of the hardcoded 1, handle errors/teardown if stream creation fails, and
keep the rest of the function (time setup, append, logging, and teardown)
unchanged so the test isolates the empty-group append condition.
```python
def _load_probe_flags():
    with COMPILE_COMMANDS.open(encoding="utf-8") as handle:
        entries = json.load(handle)

    for entry in entries:
        if entry["file"].endswith("tests/internal/multiline.c"):
            command = shlex.split(entry["command"])
            return [
                argument
                for argument in command
                if argument.startswith("-I") or argument.startswith("-D")
            ]

    raise RuntimeError("could not find tests/internal/multiline.c in compile_commands.json")
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Locate the test file and inspect it
find . -path "*multiline_interface*" -name "test_multiline_interface_001.py" -type f
ls -la tests/integration/scenarios/multiline_interface/
cat -n tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py

# Check whether compile_commands.json exists and how it is generated
find . -name "compile_commands.json" -type f | head -5
rg "compile_commands" --type cmake
head -100 CMakeLists.txt | grep -i "compile_commands"
rg "COMPILE_COMMANDS|compile_commands\.json" | grep -v ".git"

# Inspect the probe source and the headers it needs
cat -n tests/integration/scenarios/multiline_interface/assets/ml_interface_probe.c | head -50
find . -path "*/include/*" -name "flb_config.h" -o -name "flb_ml.h" | head -10

# Check how tests/internal/multiline.c is built
find . -name "multiline.c" -type f
cat tests/internal/CMakeLists.txt | grep -A 10 -B 5 "multiline.c"
rg "include_directories|target_include_directories" tests/internal/ | head -10

# Check how CC and subprocess.run are used elsewhere in the tests
rg "os.environ.*CC" -B 2 -A 2
rg "subprocess.run" -B 2 -A 5 tests/ | head -50
cat tests/integration/scenarios/multiline_interface/README.md
```
Use the compile database working directory and properly split the CC environment variable.
The _load_probe_flags() function discards the directory field from each compile_commands.json entry. CMake-generated entries are valid only relative to their directory value. When relative -I... paths appear (which is valid per the compile_commands.json schema), reinterpreting them from the test's cwd instead of the build directory causes probe compilation to fail or use incorrect paths.
Additionally, os.environ.get("CC", "cc") is passed as a single command argument. If CC contains embedded options (e.g., CC="gcc -Wall"), subprocess.run receives an invalid program name instead of separate argv elements.
🔧 Suggested fix

```diff
 def _load_probe_flags():
     with COMPILE_COMMANDS.open(encoding="utf-8") as handle:
         entries = json.load(handle)

     for entry in entries:
         if entry["file"].endswith("tests/internal/multiline.c"):
             command = shlex.split(entry["command"])
-            return [
-                argument
-                for argument in command
-                if argument.startswith("-I") or argument.startswith("-D")
-            ]
+            return entry["directory"], [
+                argument
+                for argument in command
+                if argument.startswith("-I") or argument.startswith("-D")
+            ]

     raise RuntimeError("could not find tests/internal/multiline.c in compile_commands.json")


 @pytest.fixture(scope="session")
 def multiline_probe(tmp_path_factory):
     assert COMPILE_COMMANDS.exists(), f"missing {COMPILE_COMMANDS}"
     assert LIBFLUENT_BIT.exists(), f"missing {LIBFLUENT_BIT}"

     build_dir = tmp_path_factory.mktemp("multiline-probe")
     binary = build_dir / "ml_interface_probe"
+    compile_cwd, probe_flags = _load_probe_flags()

     command = [
-        os.environ.get("CC", "cc"),
+        *shlex.split(os.environ.get("CC", "cc")),
         str(PROBE_SOURCE),
         "-o",
         str(binary),
-        *_load_probe_flags(),
+        *probe_flags,
         str(LIBFLUENT_BIT),
         f"-Wl,-rpath,{LIBFLUENT_BIT.parent}",
     ]
-    subprocess.run(command, check=True, capture_output=True, text=True)
+    subprocess.run(command, check=True, capture_output=True, text=True, cwd=compile_cwd)

     return binary
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py`
around lines 18 - 31, The _load_probe_flags() routine currently ignores each
compile_commands.json entry's directory and treats CC as a single token; update
_load_probe_flags() to (1) use entry["directory"] as the working/base path when
resolving any relative -I flags produced from shlex.split(entry["command"]) so
that relative include paths are interpreted relative to entry["directory"]
(e.g., resolve "-I<path>" to an absolute or joined path when <path> is not
absolute), and (2) split the CC environment variable with
shlex.split(os.environ.get("CC", "cc")) so embedded options in CC are passed as
separate argv elements rather than a single program name; keep references to
COMPILE_COMMANDS and entries to locate the compile_commands.json parsing logic.
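The directory-relative resolution the prompt describes can be sketched as follows. This is a hedged illustration: `extract_probe_flags` and the sample entry are invented for the demo and are not taken from the test suite.

```python
import os
import shlex

def extract_probe_flags(entry):
    # Keep -I/-D flags, resolving relative -I paths against the
    # compile database entry's "directory" field.
    flags = []
    for arg in shlex.split(entry["command"]):
        if arg.startswith("-I"):
            path = arg[2:]
            if path and not os.path.isabs(path):
                path = os.path.normpath(os.path.join(entry["directory"], path))
            flags.append("-I" + path)
        elif arg.startswith("-D"):
            flags.append(arg)
    return flags

entry = {
    "directory": "/build/tests",
    "command": "cc -DFLB_HAVE_X -I../../include -I/usr/include -c multiline.c",
    "file": "tests/internal/multiline.c",
}
print(extract_probe_flags(entry))
# → ['-DFLB_HAVE_X', '-I/include', '-I/usr/include']
```

Absolute include paths pass through unchanged; only the relative `-I../../include` is rebased onto the entry's `directory`, matching the compile_commands.json semantics the comment relies on.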
```python
    command = [
        os.environ.get("CC", "cc"),
        str(PROBE_SOURCE),
        "-o",
        str(binary),
        *_load_probe_flags(),
        str(LIBFLUENT_BIT),
        f"-Wl,-rpath,{LIBFLUENT_BIT.parent}",
    ]
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Locate and inspect the test file
fd -t f 'test_multiline_interface_001.py'
cat -n tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py | head -60

# Check whether shlex is imported and how CC is handled elsewhere
rg 'import shlex' tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py
rg -A 3 -B 3 'os.environ.get.*CC' tests/ -t py
rg 'shlex.split.*CC' . -t py
```
Split $CC before putting it in the argv.
If CC is set to a wrapper or carries flags (ccache clang, zig cc, etc.), line 42 becomes one literal executable name and the probe compile fails with ENOENT.
🔧 Suggested fix

```diff
-    command = [
-        os.environ.get("CC", "cc"),
+    command = [
+        *shlex.split(os.environ.get("CC", "cc")),
         str(PROBE_SOURCE),
         "-o",
         str(binary),
         *_load_probe_flags(),
         str(LIBFLUENT_BIT),
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/integration/scenarios/multiline_interface/tests/test_multiline_interface_001.py`
around lines 41 - 49, The command list construction uses os.environ.get("CC",
"cc") as a single string which can include wrappers/flags (e.g., "ccache
clang"), causing ENOENT; change building of the command list so the CC value is
split into separate argv tokens before appending other args: obtain cc_value =
os.environ.get("CC", "cc"), split it (shlex.split or str.split) into tokens,
then extend the existing command list with those tokens followed by
str(PROBE_SOURCE), "-o", str(binary), *_load_probe_flags(), str(LIBFLUENT_BIT),
and the rpath flag so the executable and its flags are separate entries.
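The failure mode is easy to demonstrate without invoking a compiler; `cc_value` below is a made-up example of a wrapper-style CC setting:

```python
import shlex

# A CC value carrying a wrapper plus a flag, as CI images often export.
cc_value = "ccache clang -Qunused-arguments"

# As a single argv element, exec() would look for a program literally
# named "ccache clang -Qunused-arguments" and fail with ENOENT.
broken = [cc_value, "probe.c", "-o", "probe"]

# shlex.split tokenizes the value the way a shell would.
fixed = [*shlex.split(cc_value), "probe.c", "-o", "probe"]
assert fixed[:3] == ["ccache", "clang", "-Qunused-arguments"]
```

Passing `fixed` to `subprocess.run` makes the wrapper the executable and the remaining tokens ordinary arguments, which is what the suggested fix achieves.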
🧹 Nitpick comments (1)
tests/internal/multiline.c (1)
1894-1940: Consider using a more explicit skip indicator for Windows.

The crash tests are well designed for Unix platforms. However, the Windows stubs only emit a `TEST_MSG` and return normally, which means the tests pass without actually verifying anything. While this is a common pattern, some test frameworks support explicit skip indicators that make test reports clearer. If Fluent Bit's test framework supports it, consider a pattern that explicitly marks the test as skipped rather than passing silently.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/internal/multiline.c` around lines 1894 - 1940, Replace the Windows stub behavior that just prints TEST_MSG in test_known_bug_empty_context_flush_crashes and test_known_bug_empty_context_append_crashes with the test framework's explicit skip mechanism (e.g., use the TEST_SKIP or equivalent skip macro/function provided by the Fluent Bit test harness) so the tests are reported as skipped on Windows rather than passing silently; update both functions to call the skip macro with a clear reason string instead of TEST_MSG.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/internal/multiline.c`:
- Around line 1894-1940: Replace the Windows stub behavior that just prints
TEST_MSG in test_known_bug_empty_context_flush_crashes and
test_known_bug_empty_context_append_crashes with the test framework's explicit
skip mechanism (e.g., use the TEST_SKIP or equivalent skip macro/function
provided by the Fluent Bit test harness) so the tests are reported as skipped on
Windows rather than passing silently; update both functions to call the skip
macro with a clear reason string instead of TEST_MSG.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 19fddcef-a8b3-4bdc-80fd-0f812d9236fe
📒 Files selected for processing (1)
tests/internal/multiline.c
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/scenarios/in_tail/tests/test_in_tail_001.py`:
- Around line 966-984: The test
test_in_tail_discovers_file_after_permissions_are_restored should be guarded
when running as root: detect root by checking os.geteuid()==0 (or
os.getenv("SUDO_UID")/os.getuid() if necessary) and call pytest.skip or
pytest.xfail early in the test to avoid the permission-based race on root-run
CI; add the os and pytest imports if missing and place the guard before
writing/chmoding delayed_file or starting Service (refer to the
test_in_tail_discovers_file_after_permissions_are_restored function and
Service("tail_stat.yaml", ...) usage).
- Around line 891-905: The test
test_in_tail_ignore_active_older_files_stops_following_aged_file currently never
appends after the aging window so it can pass even if the plugin still follows
aged files; modify the test to append another line after the sleep to validate
the plugin stopped following: after the existing time.sleep(4) and
service.assert_no_new_records_for(1, quiet_period=4) call, call
write_and_sync(log_file, "second-line\n"), then wait briefly (e.g.,
service.wait_for_records with timeout or small sleep) and assert that the
records still only contain "first-line" (e.g., assert_log_set(records,
["first-line"]) or assert len(service.wait_for_records(...)) == 1) using the
same service, log_file, write_and_sync, service.wait_for_records and
service.assert_no_new_records_for identifiers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 82d3ca83-7172-455d-9fa1-f761f150f08f
📒 Files selected for processing (20)
- tests/integration/scenarios/in_tail/README.md
- tests/integration/scenarios/in_tail/config/parsers_tail_it.conf
- tests/integration/scenarios/in_tail/config/tail_docker_mode.yaml
- tests/integration/scenarios/in_tail/config/tail_exclude_path.yaml
- tests/integration/scenarios/in_tail/config/tail_generic_encoding.yaml
- tests/integration/scenarios/in_tail/config/tail_gzip.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_active_older.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_older.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_new_files_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_read_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_multiline.yaml
- tests/integration/scenarios/in_tail/config/tail_parser.yaml
- tests/integration/scenarios/in_tail/config/tail_rotate_wait_short.yaml
- tests/integration/scenarios/in_tail/config/tail_skip_long_lines.yaml
- tests/integration/scenarios/in_tail/config/tail_stat.yaml
- tests/integration/scenarios/in_tail/config/tail_stat_db_compare_filename.yaml
- tests/integration/scenarios/in_tail/config/tail_truncate_long_lines.yaml
- tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
- tests/internal/multiline.c
✅ Files skipped from review due to trivial changes (18)
- tests/integration/scenarios/in_tail/config/tail_multiline.yaml
- tests/integration/scenarios/in_tail/config/tail_generic_encoding.yaml
- tests/integration/scenarios/in_tail/config/tail_docker_mode.yaml
- tests/integration/scenarios/in_tail/config/tail_rotate_wait_short.yaml
- tests/integration/scenarios/in_tail/config/tail_parser.yaml
- tests/integration/scenarios/in_tail/config/tail_stat_db_compare_filename.yaml
- tests/integration/scenarios/in_tail/config/tail_gzip.yaml
- tests/integration/scenarios/in_tail/config/tail_skip_long_lines.yaml
- tests/integration/scenarios/in_tail/config/tail_exclude_path.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_older.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_read_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_truncate_long_lines.yaml
- tests/integration/scenarios/in_tail/config/parsers_tail_it.conf
- tests/integration/scenarios/in_tail/config/tail_inotify_new_files_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_stat.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_active_older.yaml
- tests/integration/scenarios/in_tail/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/internal/multiline.c
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/internal/multiline.c`:
- Around line 498-503: The test must not call flb_time_pop_from_msgpack when
msgpack_unpack_next failed; after calling msgpack_unpacked_init(&result) and ret
= msgpack_unpack_next(&result, ...), check if ret != MSGPACK_UNPACK_SUCCESS and
return or skip the remainder of the test (and destroy/uninit result if needed)
instead of continuing; update the block around msgpack_unpack_next / TEST_CHECK
to bail out early on failure before calling flb_time_pop_from_msgpack(&tm,
&result, &map).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 927968ef-e175-40f2-979e-7b298b1a7bf3
📒 Files selected for processing (1)
tests/internal/multiline.c
tests/internal/multiline.c
Outdated
msgpack_unpacked_init(&result);
ret = msgpack_unpack_next(&result, buf_data, buf_size, &off);
TEST_CHECK(ret == MSGPACK_UNPACK_SUCCESS);

flb_time_pop_from_msgpack(&tm, &result, &map);
TEST_CHECK(flb_time_to_nanosec(&tm) != 0L);
Missing early return on unpack failure could access uninitialized data.
If `msgpack_unpack_next` fails, `result.data` is uninitialized but `flb_time_pop_from_msgpack` will still be called. While `TEST_CHECK` logs the failure, the test continues with potentially invalid data.
🛡️ Proposed fix
msgpack_unpacked_init(&result);
ret = msgpack_unpack_next(&result, buf_data, buf_size, &off);
TEST_CHECK(ret == MSGPACK_UNPACK_SUCCESS);
+ if (ret != MSGPACK_UNPACK_SUCCESS) {
+ msgpack_unpacked_destroy(&result);
+ return 0;
+ }
  flb_time_pop_from_msgpack(&tm, &result, &map);
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
msgpack_unpacked_init(&result);
ret = msgpack_unpack_next(&result, buf_data, buf_size, &off);
TEST_CHECK(ret == MSGPACK_UNPACK_SUCCESS);
if (ret != MSGPACK_UNPACK_SUCCESS) {
    msgpack_unpacked_destroy(&result);
    return 0;
}
flb_time_pop_from_msgpack(&tm, &result, &map);
TEST_CHECK(flb_time_to_nanosec(&tm) != 0L);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/internal/multiline.c` around lines 498 - 503, The test must not call
flb_time_pop_from_msgpack when msgpack_unpack_next failed; after calling
msgpack_unpacked_init(&result) and ret = msgpack_unpack_next(&result, ...),
check if ret != MSGPACK_UNPACK_SUCCESS and return or skip the remainder of the
test (and destroy/uninit result if needed) instead of continuing; update the
block around msgpack_unpack_next / TEST_CHECK to bail out early on failure
before calling flb_time_pop_from_msgpack(&tm, &result, &map).
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/internal/multiline.c (1)
498-503: ⚠️ Potential issue | 🔴 Critical
Guard the unpack failure before touching `result`. `TEST_CHECK()` does not stop execution here, so line 502 can still call `flb_time_pop_from_msgpack()` on an invalid `msgpack_unpacked`. That turns a failed assertion into undefined behavior inside the helper.
Proposed fix:
  msgpack_unpacked_init(&result);
  ret = msgpack_unpack_next(&result, buf_data, buf_size, &off);
  TEST_CHECK(ret == MSGPACK_UNPACK_SUCCESS);
+ if (ret != MSGPACK_UNPACK_SUCCESS) {
+     msgpack_unpacked_destroy(&result);
+     return 0;
+ }
  flb_time_pop_from_msgpack(&tm, &result, &map);
Based on learnings: `TEST_CHECK` in tests/internal logs failures and continues execution, so this path still needs an explicit guard before using `result`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/internal/multiline.c` around lines 498 - 503, The test uses msgpack_unpacked (initialized with msgpack_unpacked_init) and calls msgpack_unpack_next into result, but TEST_CHECK does not abort on failure so flb_time_pop_from_msgpack(&tm, &result, &map) may operate on an invalid msgpack_unpacked; add an explicit guard after msgpack_unpack_next (check ret == MSGPACK_UNPACK_SUCCESS) and if it fails skip or return from the test (or call the test framework's fatal equivalent) before calling flb_time_pop_from_msgpack, ensuring you do not touch result when unpack failed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 261-275: flb_tail_file_update_offset_marker() can fail but its
return value is ignored here (and at the other caller around the 532-539 block);
check its return and abort or handle the insert when it fails: call
flb_tail_file_update_offset_marker(file) and if it returns < 0, do not proceed
to bind/execute ctx->stmt_insert_file — instead log the error and skip or mark
the DB row write as failed so stale/empty file->db_offset_marker and
file->db_offset_marker_size are not persisted; ensure both caller sites (this
insert using sqlite3_bind_* on ctx->stmt_insert_file and the other insert block)
perform the same check and error handling.
In `@tests/internal/multiline.c`:
- Around line 1790-1842: The test function
test_known_bug_multi_group_flush_only_first_group (and the other known_bug_*
tests referenced) assert currently-broken behavior for the
"known-bug-multi-group" multiline engine; change these to xfail/skip-style
regression tests instead of making the broken behavior the expected contract:
mark test_known_bug_multi_group_flush_only_first_group (and parallel known_bug_*
cases) as expected-to-fail or skip in the test harness (or wrap them with the
test framework's xfail/skip facility) and add a short TODO referencing the
intended correct behavior so CI won't treat these as passing regressions; ensure
you update any test discovery metadata for the parser/group names
("known-bug-multi-group", "bug_group_a", "bug_group_b") so they remain present
for regression coverage but no longer make the suite green when the engine is
incorrect.
---
Duplicate comments:
In `@tests/internal/multiline.c`:
- Around line 498-503: The test uses msgpack_unpacked (initialized with
msgpack_unpacked_init) and calls msgpack_unpack_next into result, but TEST_CHECK
does not abort on failure so flb_time_pop_from_msgpack(&tm, &result, &map) may
operate on an invalid msgpack_unpacked; add an explicit guard after
msgpack_unpack_next (check ret == MSGPACK_UNPACK_SUCCESS) and if it fails skip
or return from the test (or call the test framework's fatal equivalent) before
calling flb_time_pop_from_msgpack, ensuring you do not touch result when unpack
failed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: edea8872-7856-41d8-ab4a-a88caeaf95b6
📒 Files selected for processing (31)
- plugins/in_tail/tail_config.c
- plugins/in_tail/tail_config.h
- plugins/in_tail/tail_db.c
- plugins/in_tail/tail_file.c
- plugins/in_tail/tail_file.h
- plugins/in_tail/tail_file_internal.h
- plugins/in_tail/tail_scan.c
- plugins/in_tail/tail_scan.h
- plugins/in_tail/tail_scan_glob.c
- plugins/in_tail/tail_scan_win32.c
- plugins/in_tail/tail_sql.h
- tests/integration/scenarios/in_tail/README.md
- tests/integration/scenarios/in_tail/config/parsers_tail_it.conf
- tests/integration/scenarios/in_tail/config/tail_docker_mode.yaml
- tests/integration/scenarios/in_tail/config/tail_exclude_path.yaml
- tests/integration/scenarios/in_tail/config/tail_generic_encoding.yaml
- tests/integration/scenarios/in_tail/config/tail_gzip.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_active_older.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_older.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_new_files_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_read_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_multiline.yaml
- tests/integration/scenarios/in_tail/config/tail_parser.yaml
- tests/integration/scenarios/in_tail/config/tail_rotate_wait_short.yaml
- tests/integration/scenarios/in_tail/config/tail_skip_long_lines.yaml
- tests/integration/scenarios/in_tail/config/tail_stat.yaml
- tests/integration/scenarios/in_tail/config/tail_stat_db_compare_filename.yaml
- tests/integration/scenarios/in_tail/config/tail_truncate_long_lines.yaml
- tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
- tests/internal/multiline.c
✅ Files skipped from review due to trivial changes (18)
- tests/integration/scenarios/in_tail/config/tail_generic_encoding.yaml
- tests/integration/scenarios/in_tail/config/tail_gzip.yaml
- tests/integration/scenarios/in_tail/config/tail_multiline.yaml
- tests/integration/scenarios/in_tail/config/tail_stat_db_compare_filename.yaml
- tests/integration/scenarios/in_tail/config/tail_skip_long_lines.yaml
- tests/integration/scenarios/in_tail/config/tail_exclude_path.yaml
- tests/integration/scenarios/in_tail/config/tail_parser.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_read_from_tail.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify_new_files_from_tail.yaml
- tests/integration/scenarios/in_tail/README.md
- tests/integration/scenarios/in_tail/config/tail_rotate_wait_short.yaml
- tests/integration/scenarios/in_tail/config/tail_inotify.yaml
- tests/integration/scenarios/in_tail/config/parsers_tail_it.conf
- tests/integration/scenarios/in_tail/config/tail_stat.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_older.yaml
- tests/integration/scenarios/in_tail/config/tail_ignore_active_older.yaml
- tests/integration/scenarios/in_tail/config/tail_docker_mode.yaml
- tests/integration/scenarios/in_tail/tests/test_in_tail_001.py
🚧 Files skipped from review as they are similar to previous changes (4)
- plugins/in_tail/tail_file_internal.h
- tests/integration/scenarios/in_tail/config/tail_truncate_long_lines.yaml
- plugins/in_tail/tail_sql.h
- plugins/in_tail/tail_file.c
Persist and validate DB restore state more defensively so tail does not resume from stale positions after restart, and keep aged-out active files from being rediscovered on the next scan. This change adds a small offset marker derived from the bytes before the persisted DB offset and checks that marker before trusting the saved position on restart. If the marker no longer matches, tail resets to a safe position instead of seeking into replacement content after copytruncate or inode reuse. The restore path also persists the last resumable offset when a trailing partial line is buffered, so a restart completes that line exactly once instead of resuming after the fragment. Because the new restart metadata must survive restarts, the SQLite schema is extended and upgraded automatically in place. The migration is additive so existing databases are upgraded without manual intervention. The active-file purge path now records aged-out path and inode state so a later rescan keeps ignoring that same file while still allowing a new inode at the same path to be tailed again. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
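The offset-marker idea in this commit can be illustrated with a small sketch. This is plain Python using SHA-256 and an assumed 64-byte window; the actual plugin logic, marker size, and hashing live in the C code (`plugins/in_tail/tail_db.c`) and may differ:

```python
import hashlib
import os

MARKER_SIZE = 64  # assumed window of bytes hashed before the offset

def compute_marker(path, offset):
    # Hash up to MARKER_SIZE bytes immediately preceding `offset`.
    start = max(0, offset - MARKER_SIZE)
    with open(path, "rb") as f:
        f.seek(start)
        chunk = f.read(offset - start)
    return hashlib.sha256(chunk).hexdigest(), len(chunk)

def validate_resume(path, saved_offset, saved_marker, saved_size):
    # If the file shrank below the saved offset, or the bytes before the
    # offset changed (copytruncate / inode reuse), restart from 0
    # instead of seeking into replacement content.
    if os.path.getsize(path) < saved_offset:
        return 0
    marker, size = compute_marker(path, saved_offset)
    if size != saved_size or marker != saved_marker:
        return 0
    return saved_offset

# Example: persist a marker, rewrite the file, then validate on "restart".
with open("app.log", "wb") as f:
    f.write(b"first-line\n")
marker, size = compute_marker("app.log", 11)

with open("app.log", "wb") as f:  # simulate copytruncate + new content
    f.write(b"completely-different\n")
print(validate_resume("app.log", 11, marker, size))  # falls back to 0
```

The same check also rejects a truncated file, since either the size guard or the shortened marker window fails to match.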
Add a comprehensive Python integration scenario for in_tail and keep the scenario documentation with it. The new coverage exercises startup positioning, discovery from head and tail, rename rotation, repeated rotation, copytruncate, restart resume, partial-line restart recovery, automatic DB schema upgrade, multiline and parser modes, long-line handling, rotate_wait, gzip, generic encoding, ignore_older, ignore_active_older_files, exclude_path, docker mode, and delayed readability restoration. The scenario also guards the permission-restoration case when running as root and verifies that ignore_active_older_files stops ingesting new writes after the file has aged out. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Add internal multiline regression coverage for the known API-level defects that are not a good fit for the Python integration suite. The new tests cover the first-group-only flush behavior, truncation dropping the overflow line, and the empty context crash paths using the existing tests/internal multiline harness. Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
This branch hardens in_tail restart recovery and adds broad regression coverage. The main plugin change makes DB restore safer by validating persisted offsets before reuse, preventing resume into stale positions after copytruncate or inode reuse. It also persists the last resumable offset when a trailing partial line is buffered, so restarts complete that line exactly once. To support that metadata, the SQLite schema is extended with additive automatic migration.
On the test side, the branch adds a comprehensive in_tail integration scenario covering discovery, startup-from-tail, rename rotation, repeated rotation, copytruncate, restart resume, partial-line restart recovery, DB migration, multiline, parser mode, long-line handling, rotate_wait, gzip, generic encoding, ignore_older, exclude_path, docker mode, and permission restoration. It also adds a separate multiline_interface integration scenario with a compiled probe that proves current API-level multiline bugs through strict xfail regression cases.
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.