http_client: enhance parser in timeouts and protocol parsing #11686
Conversation
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Note: Reviews paused. This branch is under active development, so CodeRabbit automatically paused this review to avoid overwhelming it with comments from the influx of new commits. This behavior can be configured in the review settings.
📝 Walkthrough
Added chunked-transfer trailer parsing and trailer/header lookup to the HTTP client; exposed buffered-response processing and a connection detach API; introduced a per-request net_setup binding to apply and clamp socket receive timeouts; refactored timeout/read-idle handling and adjusted the OAuth2 reconnect flow; expanded unit and integration tests and test servers for chunked/timeout scenarios.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as flb_http_client
    participant NetSetup as request_net_setup
    participant Upstream as UpstreamConn (u_conn)
    participant Server as Remote HTTP Server
    rect rgba(200,230,255,0.5)
        Client->>NetSetup: bind u_conn->net = request_net_setup
    end
    Client->>Upstream: write HTTP request
    Upstream->>Server: TCP write
    Server-->>Upstream: chunked response fragments (chunks, final 0, trailers)
    Upstream-->>Client: deliver fragments into c->resp.data
    Client->>Client: flb_http_client_process_response_buffer()
    alt chunk payloads
        Client->>Client: assemble chunk bodies
    end
    alt final 0 chunk observed
        Client->>Client: set chunked_trailer_pending
        Client->>Client: buffer trailer bytes -> resp.trailer_buf
    end
    Client->>Client: flb_http_get_response_header(key) checks headers then trailer_buf
    rect rgba(200,230,255,0.5)
        Client->>NetSetup: unbind restore original_net_setup
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3b3b207c04
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/integration/src/server/http_server.py (1)
118-138: ⚠️ Potential issue | 🟠 Major: Make the new nullable knobs explicitly clearable.
`stream_fragments` and `hang_after_fragment_index` use `None` as the disabled state, but these helpers also use `None` to mean "argument omitted." After either field is set once, a later `configure_*_response(...)` call cannot restore the normal path without a full reset, so retry tests cannot cleanly switch from a hung/streamed first attempt to a normal second attempt.
🛠️ Proposed fix

```diff
+_UNSET = object()
+
-def configure_http_response(*, status_code=None, body=None, content_type=None,
-                            delay_seconds=None, stream_fragments=None,
-                            fragment_delay_seconds=None,
-                            hang_before_response=None,
-                            hang_after_fragment_index=None):
-    if status_code is not None:
+def configure_http_response(*, status_code=_UNSET, body=_UNSET, content_type=_UNSET,
+                            delay_seconds=_UNSET, stream_fragments=_UNSET,
+                            fragment_delay_seconds=_UNSET,
+                            hang_before_response=_UNSET,
+                            hang_after_fragment_index=_UNSET):
+    if status_code is not _UNSET:
         response_config["status_code"] = status_code
-    if body is not None:
+    if body is not _UNSET:
         response_config["body"] = body
-    if content_type is not None:
+    if content_type is not _UNSET:
         response_config["content_type"] = content_type
-    if delay_seconds is not None:
+    if delay_seconds is not _UNSET:
         response_config["delay_seconds"] = delay_seconds
-    if stream_fragments is not None:
-        response_config["stream_fragments"] = list(stream_fragments)
-    if fragment_delay_seconds is not None:
+    if stream_fragments is not _UNSET:
+        response_config["stream_fragments"] = (
+            None if stream_fragments is None else list(stream_fragments)
+        )
+    if fragment_delay_seconds is not _UNSET:
         response_config["fragment_delay_seconds"] = fragment_delay_seconds
-    if hang_before_response is not None:
+    if hang_before_response is not _UNSET:
         response_config["hang_before_response"] = hang_before_response
-    if hang_after_fragment_index is not None:
+    if hang_after_fragment_index is not _UNSET:
         response_config["hang_after_fragment_index"] = hang_after_fragment_index
```

Apply the same sentinel pattern to `configure_oauth_token_response(...)`.
Also applies to: 141-163
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/src/server/http_server.py` around lines 118 - 138, The configure_http_response helper currently treats None as both "omit argument" and "explicitly clear this field", preventing tests from restoring default behavior; introduce a unique sentinel (e.g., UNSET = object()) and change parameters for the nullable knobs (at least stream_fragments and hang_after_fragment_index, and other nullable knobs referenced) to default to UNSET instead of None, then check "if param is not UNSET" to assign response_config[...] (assign the exact None if the caller passed None); apply the same sentinel pattern to configure_oauth_token_response so retries can explicitly clear those fields.
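As a self-contained illustration of the sentinel pattern the comment asks for (names here are illustrative, not the test server's real helpers):

```python
_UNSET = object()  # unique sentinel: distinguishes "argument omitted" from an explicit None

config = {"stream_fragments": None}

def configure(*, stream_fragments=_UNSET):
    # Only touch the field when the caller actually passed the argument;
    # an explicit None clears it back to the disabled state.
    if stream_fragments is not _UNSET:
        config["stream_fragments"] = (
            None if stream_fragments is None else list(stream_fragments)
        )

configure(stream_fragments=["a", "b"])   # enable streaming
configure()                              # omitted: field untouched
configure(stream_fragments=None)         # explicit None: cleared
```

Because `_UNSET` is a fresh object, no caller-supplied value can collide with it, so `None` regains its meaning as "explicitly clear this knob."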
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/flb_http_client.c`:
- Around line 1429-1450: The helper http_client_clamp_connection_io_timeout
mutates shared net_setup->io_timeout making a short request permanently lower
the upstream default; instead, stop writing to net_setup->io_timeout and compute
the effective per-operation timeout on the stack and return it (or populate an
out-parameter) for callers to use for each read/write. Locate
http_client_clamp_connection_io_timeout and change its behavior to leave struct
flb_net_setup::io_timeout untouched, return min(timeout, net_setup->io_timeout)
(or timeout if net_setup->io_timeout <= 0), and update callers that previously
relied on the mutation to use the returned effective_timeout when performing
per-read timeouts on c->u_conn and its upstream.
- Around line 394-458: chunked_data_size currently accepts any trailing garbage
after the hex size and risks integer overflow when accumulating and using value;
update the function chunked_data_size to (1) parse only hex digits, then require
that any following bytes up to CRLF are either spaces/tabs or a single ';'
starting extensions (reject characters like 'g' or other letters), (2) add
explicit overflow checks on value before each multiply/add step when doing value
= (value * 16) + digit and also check before computing total_size += value + 2
(e.g., ensure value <= SIZE_MAX - 2 - total_size or similar), and (3) validate
indexes derived from value (such as accessing line_end[2 + value]) are within
the available length to avoid wrap/over-read; return FLB_HTTP_ERROR for invalid
formats and FLB_HTTP_MORE when the buffer is too short.
In `@tests/integration/src/server/http_server.py`:
- Around line 277-287: The oauth_token handler currently always uses jsonify()
for non-streaming responses which ignores configured content_type; update
oauth_token to mirror _build_response/_build_streaming_response behavior: when
oauth_token_response["body"] is a dict or list and content_type is None, use
jsonify(...) otherwise construct and return a
Response(oauth_token_response["body"],
status=oauth_token_response["status_code"],
content_type=oauth_token_response.get("content_type")) so raw/plaintext or
malformed bodies and explicit content_type values are honored; keep the existing
delay/hang logic and streaming branch that calls _build_streaming_response().
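To make the validation requested for `chunked_data_size` concrete, here is a minimal model of a strict chunk-size-line parser in Python (the actual fix belongs in the C code; the function name and the overflow cap are illustrative):

```python
MAX_CHUNK = 2**31 - 1  # illustrative cap standing in for the C overflow checks

def parse_chunk_size_line(line: bytes) -> int:
    """Parse 'SIZE[;extensions]' (without CRLF); return size or raise ValueError."""
    i = 0
    value = 0
    while i < len(line) and line[i:i+1] in b"0123456789abcdefABCDEF":
        digit = int(line[i:i+1], 16)
        # Overflow guard BEFORE value = value * 16 + digit.
        if value > (MAX_CHUNK - digit) // 16:
            raise ValueError("chunk size overflow")
        value = value * 16 + digit
        i += 1
    if i == 0:
        raise ValueError("no hex digits in chunk size")
    # Only optional whitespace, then a ';' starting extensions, may follow;
    # anything else (e.g. a stray 'g') is rejected as garbage.
    while i < len(line) and line[i:i+1] in b" \t":
        i += 1
    if i < len(line) and line[i:i+1] != b";":
        raise ValueError("garbage after chunk size")
    return value
```

The C version would map `ValueError` to `FLB_HTTP_ERROR` and a buffer that ends before CRLF to `FLB_HTTP_MORE`, and would bounds-check `line_end[2 + value]` against the available length before touching it.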
---
Outside diff comments:
In `@tests/integration/src/server/http_server.py`:
- Around line 118-138: The configure_http_response helper currently treats None
as both "omit argument" and "explicitly clear this field", preventing tests from
restoring default behavior; introduce a unique sentinel (e.g., UNSET = object())
and change parameters for the nullable knobs (at least stream_fragments and
hang_after_fragment_index, and other nullable knobs referenced) to default to
UNSET instead of None, then check "if param is not UNSET" to assign
response_config[...] (assign the exact None if the caller passed None); apply
the same sentinel pattern to configure_oauth_token_response so retries can
explicitly clear those fields.
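The non-mutating clamp suggested for `http_client_clamp_connection_io_timeout` amounts to a pure function over the two timeout values. A Python sketch of the intended semantics (illustrative; the real code is C in `src/flb_http_client.c`):

```python
def effective_io_timeout(request_timeout: int, upstream_io_timeout: int) -> int:
    """Compute the per-operation timeout without mutating shared net_setup state.

    A value <= 0 means "no timeout configured" on that side.
    """
    if upstream_io_timeout <= 0:
        return request_timeout
    if request_timeout <= 0:
        return upstream_io_timeout
    return min(request_timeout, upstream_io_timeout)
```

Callers would use the returned value for each read/write, leaving `flb_net_setup::io_timeout` untouched so one short request cannot permanently lower the upstream default.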
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8e7b04ca-af93-4391-ba16-70e07a8c0e7f
📒 Files selected for processing (16)
- include/fluent-bit/flb_http_client.h
- src/flb_http_client.c
- tests/integration/scenarios/out_http/config/out_http_chunked_basic.yaml
- tests/integration/scenarios/out_http/config/out_http_chunked_read_idle_timeout.yaml
- tests/integration/scenarios/out_http/config/out_http_chunked_response_timeout.yaml
- tests/integration/scenarios/out_http/config/out_http_chunked_retry.yaml
- tests/integration/scenarios/out_http/config/out_http_oauth2_timeout.yaml
- tests/integration/scenarios/out_http/config/out_http_tls_read_idle_timeout.yaml
- tests/integration/scenarios/out_http/config/out_http_tls_response_timeout.yaml
- tests/integration/scenarios/out_http/tests/test_out_http_001.py
- tests/integration/scenarios/out_http/tests/test_out_http_chunked_response_001.py
- tests/integration/src/server/chunked_http_server.py
- tests/integration/src/server/http_server.py
- tests/internal/http_client.c
- tests/runtime/CMakeLists.txt
- tests/runtime/http_client_chunked.c
Force-pushed from 811e9bc to eac2d6c
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/integration/scenarios/out_http/tests/test_out_http_001.py (1)
20-25: Minor race condition in `_find_free_port`, but acceptable for tests.
The socket is closed before returning, leaving a brief window in which another process could claim the port. This is a common test pattern and the window is small, but flakiness is possible under high load.
💡 Alternative: return the socket for the caller to close.
If flakiness occurs, consider returning the socket and having the caller close it after binding. The helper currently reads:

```python
def _find_free_port():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]
    sock.close()
    return port
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/scenarios/out_http/tests/test_out_http_001.py` around lines 20 - 25, The helper _find_free_port closes the socket before returning the port which creates a small race where another process can bind the port; to fix, change _find_free_port to keep the socket open and return both the socket and the port (or return the socket and let the caller call getsockname()), and ensure you set socket.SO_REUSEADDR on the socket (use sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)) so the caller can close the socket after it has been bound/used; update all call sites that expect a port to accept and close the returned socket when ready.
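A sketch of the race-free variant described above, where the caller receives the still-bound socket and closes it only once the port has been handed over (helper name is illustrative):

```python
import socket

def reserve_free_port():
    """Bind an ephemeral port and return (sock, port); the caller closes sock."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", 0))
    return sock, sock.getsockname()[1]

# Usage: keep the socket open until the real server is about to bind.
sock, port = reserve_free_port()
try:
    pass  # launch the server against `port` here
finally:
    sock.close()
```

Holding the socket open shrinks the window in which another process can grab the port, at the cost of requiring every call site to manage the socket's lifetime.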
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/flb_http_client.c`:
- Around line 1499-1501: The timeout-update helper is being skipped whenever
u_conn->net is NULL, so connections that inherit upstream->base.net never get
the request-specific timeout clamp; in http_client_bind_connection move or
duplicate the call to http_client_update_connection_io_timeout so it runs after
rebinding u_conn->net (or alter the helper to fall back to upstream->base.net
when u_conn->net is NULL), and apply the same fix to the other similar site
referenced (the block around the second occurrence). Ensure you reference
http_client_bind_connection, http_client_update_connection_io_timeout,
u_conn->net and upstream->base.net when making the change.
- Around line 559-565: The code currently allows a later partial chunk/trailer
to overwrite a previously-detected full chunk by returning FLB_HTTP_MORE; change
the logic in the chunk parsing path (around r->chunked_trailer_pending handling
and the found_full_chunk flag) so that once you set/decide
FLB_HTTP_CHUNK_AVAILABLE for the current parse you never downgrade it later in
the same call. Specifically, after calling chunked_trailer_block_size(...) and
setting found_full_chunk (or preparing trailer_bytes/trailer_raw_size), ensure
the function's return value preserves FLB_HTTP_CHUNK_AVAILABLE if it has been
set earlier (do not replace it with FLB_HTTP_MORE), and apply the same
preservation logic in the analogous block at lines 580-583 so partial/trailer
processing cannot override an already-available full chunk.
In `@tests/integration/src/server/http_server.py`:
- Around line 289-297: The branch that builds the OAuth token response should
JSON-serialize dict/list bodies the same way as _build_response does: if
oauth_token_response["body"] is a dict or list and the response content type is
absent or is "application/json" (or compatible JSON media type), call
jsonify(body) and return that with oauth_token_response["status_code"] instead
of passing the raw dict to Response; update the condition that checks
oauth_token_response["content_type"] and use
oauth_token_response.get("content_type")/comparison to "application/json" (or
treat None as JSON) so Response(...) only receives already-serialized
strings/bytes when the body is not a dict/list.
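The jsonify-vs-`Response` dispatch described for `oauth_token` can be modeled without Flask; a standalone sketch with illustrative names:

```python
import json

def render_body(body, content_type):
    """Return (payload, content_type), mirroring the jsonify-vs-Response split."""
    # dict/list bodies are JSON-serialized when the content type is absent
    # or explicitly JSON, matching _build_response's behavior.
    if isinstance(body, (dict, list)) and content_type in (None, "application/json"):
        return json.dumps(body), "application/json"
    # Raw/plaintext or deliberately malformed bodies pass through untouched.
    return body, content_type
```

The key point is that a serialized string, never a raw dict, reaches the `Response`-style path, while explicitly configured content types are still honored.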
---
Nitpick comments:
In `@tests/integration/scenarios/out_http/tests/test_out_http_001.py`:
- Around line 20-25: The helper _find_free_port closes the socket before
returning the port which creates a small race where another process can bind the
port; to fix, change _find_free_port to keep the socket open and return both the
socket and the port (or return the socket and let the caller call
getsockname()), and ensure you set socket.SO_REUSEADDR on the socket (use
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)) so the caller can
close the socket after it has been bound/used; update all call sites that expect
a port to accept and close the returned socket when ready.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0fd90683-561f-497e-9ce9-9c8ef126b74a
📒 Files selected for processing (6)
- include/fluent-bit/flb_http_client.h
- include/fluent-bit/flb_network.h
- src/flb_http_client.c
- tests/integration/scenarios/out_http/tests/test_out_http_001.py
- tests/integration/src/server/http_server.py
- tests/internal/http_client.c
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/internal/http_client.c
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/integration/src/server/http_server.py (1)
289-297: ⚠️ Potential issue | 🟠 Major: Dict/list bodies won't be JSON-serialized when `content_type` is `"application/json"` (the default).
The condition on line 290 only uses `jsonify()` when `content_type is None`, but the reset/default value is `"application/json"`. When a dict body is configured with the default content_type, `Response(dict, ...)` is called, which won't JSON-serialize the body; Flask's `Response` expects string/bytes/iterable. This differs from `_build_response()` (line 208), which unconditionally uses `jsonify()` for dict/list bodies.
🐛 Proposed fix

```diff
     body = oauth_token_response["body"]
-    if isinstance(body, (dict, list)) and oauth_token_response["content_type"] is None:
+    content_type = oauth_token_response["content_type"]
+    if isinstance(body, (dict, list)) and content_type in (None, "application/json"):
         return jsonify(body), oauth_token_response["status_code"]
     return Response(
         body,
         status=oauth_token_response["status_code"],
-        content_type=oauth_token_response.get("content_type"),
+        content_type=content_type,
     )
```

src/flb_http_client.c (1)
559-583: ⚠️ Potential issue | 🟠 Major: Preserve `FLB_HTTP_CHUNK_AVAILABLE` when full chunks exist before incomplete data.
When `found_full_chunk` is `FLB_TRUE` from processing earlier chunks, the early returns at lines 563-565 and 581-583 can return `FLB_HTTP_MORE` instead of `FLB_HTTP_CHUNK_AVAILABLE`. For example, with input `4\r\nWiki\r\n5\r\npe`, the `Wiki` chunk is fully received but the caller won't see it until more bytes arrive for the partial second chunk.
🔧 Proposed fix

```diff
     if (r->chunked_trailer_pending == FLB_TRUE) {
         ret = chunked_trailer_block_size(cursor, available,
                                          &trailer_bytes, &trailer_raw_size);
         if (ret != FLB_HTTP_OK) {
+            if (ret == FLB_HTTP_MORE && found_full_chunk == FLB_TRUE) {
+                return FLB_HTTP_CHUNK_AVAILABLE;
+            }
             return ret;
         }
```

```diff
     ret = chunked_data_size(cursor, available, &chunk_header_size);
     if (ret != FLB_HTTP_OK) {
+        if (ret == FLB_HTTP_MORE && found_full_chunk == FLB_TRUE) {
+            return FLB_HTTP_CHUNK_AVAILABLE;
+        }
         return ret;
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/flb_http_client.c` around lines 559 - 583, When early-returning from the chunked trailer and chunk header checks inside the chunk processing logic (the block guarded by r->chunked_trailer_pending and the call to chunked_data_size), preserve and propagate a previously-detected full-chunk signal by returning FLB_HTTP_CHUNK_AVAILABLE instead of blindly returning FLB_HTTP_MORE (or ret) when found_full_chunk is true; update the returns in the r->chunked_trailer_pending handling and immediately after chunked_data_size(...) to check the local found_full_chunk flag and return FLB_HTTP_CHUNK_AVAILABLE in that case, otherwise keep the existing return behavior (e.g., FLB_HTTP_MORE, FLB_HTTP_OK, FLB_HTTP_ERROR).
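The invariant being asked for, "never downgrade an already-available full chunk to MORE within the same parse pass," can be modeled in a few lines (Python sketch with illustrative names, not the C implementation):

```python
MORE, CHUNK_AVAILABLE, OK, ERROR = "MORE", "CHUNK_AVAILABLE", "OK", "ERROR"

def parse_chunks(step_results):
    """step_results: per-chunk sub-parser outcomes for one buffered payload."""
    found_full_chunk = False
    for status in step_results:
        if status == OK:
            found_full_chunk = True      # a complete chunk was assembled
            continue
        if status == MORE and found_full_chunk:
            return CHUNK_AVAILABLE       # surface the finished chunk now
        return status                    # MORE with nothing buffered, or ERROR
    return CHUNK_AVAILABLE if found_full_chunk else MORE
```

Note that a hard `ERROR` still propagates even after a full chunk, matching the proposed fix, which only intercepts the `FLB_HTTP_MORE` case.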
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/scenarios/out_http/tests/test_out_http_001.py`:
- Around line 20-25: The helper _find_free_port currently creates a socket and
may leak it if getsockname() (or any call) raises; wrap socket creation/use in a
try/finally or use a context manager to ensure sock.close() always runs (e.g.,
use with socket.socket(...) as sock or wrap bind/getsockname in try: ...
finally: sock.close()), leaving the function return logic intact and still
returning the port from sock.getsockname()[1].
---
Duplicate comments:
In `@src/flb_http_client.c`:
- Around line 559-583: When early-returning from the chunked trailer and chunk
header checks inside the chunk processing logic (the block guarded by
r->chunked_trailer_pending and the call to chunked_data_size), preserve and
propagate a previously-detected full-chunk signal by returning
FLB_HTTP_CHUNK_AVAILABLE instead of blindly returning FLB_HTTP_MORE (or ret)
when found_full_chunk is true; update the returns in the
r->chunked_trailer_pending handling and immediately after chunked_data_size(...)
to check the local found_full_chunk flag and return FLB_HTTP_CHUNK_AVAILABLE in
that case, otherwise keep the existing return behavior (e.g., FLB_HTTP_MORE,
FLB_HTTP_OK, FLB_HTTP_ERROR).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d8f5177f-1ab0-41df-971f-961191fc070b
📒 Files selected for processing (6)
- include/fluent-bit/flb_http_client.h
- include/fluent-bit/flb_network.h
- src/flb_http_client.c
- tests/integration/scenarios/out_http/tests/test_out_http_001.py
- tests/integration/src/server/http_server.py
- tests/internal/http_client.c
🚧 Files skipped from review as they are similar to previous changes (2)
- include/fluent-bit/flb_network.h
- tests/internal/http_client.c
Force-pushed from eac2d6c to 0b997aa
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/src/server/http_server.py`:
- Around line 329-331: The server is bound to 0.0.0.0 which exposes the test
endpoint; change the host passed to make_server in the server_instance creation
to the loopback address (e.g., "127.0.0.1" or "localhost") so the test only
listens on localhost, leaving port, app, threaded=True, and ssl_context
unchanged (update the call site where make_server("0.0.0.0", port, app,
threaded=True, ssl_context=ssl_context) is invoked).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 05ac0587-7db5-4901-9fc0-8eabf704fa6e
📒 Files selected for processing (6)
- include/fluent-bit/flb_http_client.h
- include/fluent-bit/flb_network.h
- src/flb_http_client.c
- tests/integration/scenarios/out_http/tests/test_out_http_001.py
- tests/integration/src/server/http_server.py
- tests/internal/http_client.c
🚧 Files skipped from review as they are similar to previous changes (4)
- include/fluent-bit/flb_network.h
- include/fluent-bit/flb_http_client.h
- tests/internal/http_client.c
- src/flb_http_client.c
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Force-pushed from 03cd39d to cd90179
This PR hardens the HTTP client around timeouts and chunked responses. It ensures response_timeout and read_idle_timeout are enforced consistently at the connection layer, improves failure handling for stalled or timed-out upstream reads, and fixes chunked response parsing to support incremental chunks, extensions, and trailer headers.
It also adds coverage across internal, runtime, and integration tests, including out_http retry scenarios for hung OAuth/token endpoints, stalled TLS responses, and chunked HTTP responses with trailers.
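For reference, a chunked body with extensions and a trailer, as exercised by the new tests, looks like the `raw` payload below on the wire. The sketch decodes one end-to-end (standalone Python illustrating the wire format, not Fluent Bit's C parser):

```python
def decode_chunked(raw: bytes):
    """Decode a complete chunked payload; return (body, trailer_headers)."""
    body, pos = b"", 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        # The size line may carry ';extensions' after the hex digits.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break                           # final zero-size chunk; trailers follow
        body += raw[pos:pos + size]
        pos += size + 2                     # skip chunk data plus its CRLF
    trailers = {}
    while True:
        line_end = raw.index(b"\r\n", pos)
        line = raw[pos:line_end]
        pos = line_end + 2
        if not line:
            break                           # blank line ends the trailer section
        key, _, val = line.partition(b":")
        trailers[key.strip().decode()] = val.strip().decode()
    return body, trailers

# A body split across two chunks (one with an extension) plus one trailer header.
raw = b"4\r\nWiki\r\n6;ext=1\r\npedia!\r\n0\r\nX-Checksum: abc\r\n\r\n"
```

This is the shape the client must now handle incrementally: chunk headers, chunk data, the terminating zero chunk, and trailer headers that become visible via `flb_http_get_response_header()`.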
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.