Skip to content

pceplib, pathd: improve pcep parsing and error-handling#21208

Merged
riw777 merged 4 commits intoFRRouting:masterfrom
mjstapp:fix_pceplib_parsing
Mar 24, 2026
Merged

pceplib, pathd: improve pcep parsing and error-handling#21208
riw777 merged 4 commits intoFRRouting:masterfrom
mjstapp:fix_pceplib_parsing

Conversation

@mjstapp
Copy link
Copy Markdown
Contributor

@mjstapp mjstapp commented Mar 17, 2026

Make handling of pcep messages more robust; fix several packet-handling errors and add validation in several paths. Replace some pathd asserts with less-dramatic error handling.

@frrbot frrbot bot added the bugfix label Mar 17, 2026
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR improves robustness of PCEP message parsing in FRR's pathd/pceplib by replacing crash-inducing assert() calls with graceful error handling, fixing several pre-existing parsing bugs, and adding length validation throughout the RO object decoder.

Key changes:

  • pathd/path_pcep_lib.c: Converts all duplicate-object asserts to silent if guards; refactors pcep_lib_parse_ero_sr to remove the next parameter, return NULL on any parse error, and frees the allocated hop via XFREE at a single done: label. The ERO hop chain is now built with a prepend pattern in the caller, preserving original ordering.
  • pathd/path_pcep_pcc.c: Replaces two asserts in pcep_pcc_enable with a state-validity guard logging EC_PATH_PCEP_RECOVERABLE_INTERNAL_ERROR.
  • pceplib/pcep_msg_objects_encoding.c: Major rework — fixes the IPv6 subobject decode using obj_buf instead of obj_buf + read_count (silent wrong-data bug), replaces all unsafe unaligned uint32_t * casts with memcpy, corrects the UNNUM read_count advancement (was += 2 instead of += 8), adds per-subobject length validation for every subtype, and fixes the error-path cleanup to call pcep_obj_free_object (fixing the previously noted memory leak).
  • pceplib/pcep_msg_tlvs_encoding.c: Caps memcpy length at MAX_POLICY_NAME to prevent a buffer overflow in pcep_decode_tlv_pol_name.
  • pceplib/pcep_socket_comm_loop.c: Fixes two bugs in write_message — incorrect loop condition (bytes_sent vs total_bytes_sent) and missing partial-write offset/remaining-length accounting.

Confidence Score: 3/5

  • PR fixes real bugs and improves robustness, but the heavily reworked RO object decoder warrants careful review before merge.
  • The changes are well-motivated and the overall direction is correct. Several pre-existing bugs are genuinely fixed (IPv6 wrong-pointer decode, UNNUM read_count off by 6, write_message partial-write loop, policy-name overflow). The assert-to-graceful-error conversions are straightforward. Score is 3 rather than higher because: (1) pcep_msg_objects_encoding.c is a large, densely packed decoder rewrite where each case manages read_count manually — a mis-counted advance in any single case would silently corrupt all subsequent subobject parses; (2) the new write_message silently drops a message on EAGAIN on non-blocking sockets; (3) the pcep_decode_tlv_pol_name fix can still produce an unterminated string when len == MAX_POLICY_NAME.
  • pceplib/pcep_msg_objects_encoding.c — the rewritten RO subobject decoder has per-case manual read_count management; any off-by-one in a single case silently corrupts subsequent subobject parsing. pceplib/pcep_msg_tlvs_encoding.c — potential missing null terminator when policy name length equals MAX_POLICY_NAME.

Important Files Changed

Filename Overview
pathd/path_pcep_lib.c Replaces asserts with graceful if-checks throughout message parsing; refactors pcep_lib_parse_ero_sr to drop the 'next' parameter and return NULL on error, with hop-chain built via prepend in the caller. The err_p/XFREE pattern at the 'done' label correctly frees allocated hops on error paths. Hop ordering is preserved correctly (tail→head iteration + prepend = original order).
pathd/path_pcep_pcc.c Replaces two asserts in pcep_pcc_enable with a guard that logs EC_PATH_PCEP_RECOVERABLE_INTERNAL_ERROR and returns 0 on invalid state. Clean, targeted change with appropriate error code.
pceplib/pcep_msg_objects_encoding.c Comprehensive rework of pcep_decode_obj_ro: fixes IPv6 decode using wrong buffer pointer (obj_buf vs obj_buf+read_count), replaces unsafe unaligned uint32_t* casts with memcpy, adds per-subobject length validation, introduces err_p-based cleanup via pcep_obj_free_object. The read_count accounting for UNNUM was previously wrong (+= 2 instead of += 8) and is now fixed. The bounds check guards against reading past the object body.
pceplib/pcep_msg_tlvs_encoding.c Adds a length cap before memcpy in pcep_decode_tlv_pol_name, fixing a buffer overflow when encoded_tlv_length > MAX_POLICY_NAME. Minor concern: when len == MAX_POLICY_NAME (256), the 256-byte name array is fully overwritten with no null terminator guaranteed.
pceplib/pcep_socket_comm_loop.c Fixes two bugs in write_message: the loop condition now correctly checks total_bytes_sent instead of (uint32_t)bytes_sent, and the write() call now passes the remaining length (msg_length - total_bytes_sent) and correct offset. Also simplifies error handling by returning immediately on any write error.

Sequence Diagram

sequenceDiagram
    participant Socket as pcep_socket_comm_loop
    participant Decoder as pcep_msg_objects_encoding
    participant TLVDec as pcep_msg_tlvs_encoding
    participant PathLib as path_pcep_lib
    participant PCC as path_pcep_pcc

    Socket->>Decoder: raw bytes after write_message fix
    Decoder->>Decoder: pcep_decode_obj_ro - validate subobj lengths
    alt parse error detected
        Decoder->>Decoder: err_p=true, pcep_obj_free_object
        Decoder-->>PathLib: return NULL
    else all subobjects valid
        Decoder-->>PathLib: return pcep_object_ro
    end

    TLVDec->>TLVDec: pcep_decode_tlv_pol_name - cap len at MAX_POLICY_NAME

    PathLib->>PathLib: pcep_lib_parse_path - if-guards replace asserts
    PathLib->>PathLib: pcep_lib_parse_ero - iterate tail to head
    loop each RO subobject
        PathLib->>PathLib: pcep_lib_parse_ero_sr - validate flag_s and nai_list
        alt hop valid
            PathLib->>PathLib: prepend hop to path->first_hop
        else hop NULL
            PathLib->>PathLib: flog_warn, skip subobject
        end
    end

    PCC->>PCC: pcep_pcc_enable - guard invalid state with flog_warn
Loading

Comments Outside Diff (2)

  1. pceplib/pcep_msg_tlvs_encoding.c, line 1186-1196 (link)

    Silent truncation may produce unterminated string

    name is declared as char name[MAX_POLICY_NAME] (256 bytes). When encoded_tlv_length == MAX_POLICY_NAME, the memcpy fills the buffer entirely, leaving no room for a null terminator. Because common_tlv_create uses calloc the buffer is zero-initialised before the copy, but the copy itself can overwrite the final byte with non-zero data. Any subsequent code that treats name as a C-string will read past the end of the array.

    Capping at MAX_POLICY_NAME - 1 (leaving one guaranteed zero byte) is the minimal safe fix. A log message on truncation would also help operators notice oversized names:

  2. pceplib/pcep_msg_objects_encoding.c, line 1488-1497 (link)

    Bounds check uses wrong baseline after header bytes are already consumed

    At this point in the loop, read_count has already been advanced past the 2-byte subobject header (type + length), yet obj_body_length - read_count gives the number of bytes remaining in the RO object body after those header bytes. The check correctly guards against an overly long body, but the LHS subobj_length - OBJECT_RO_SUBOBJ_HEADER_LENGTH subtracts the header length again. If subobj_length is the full wire length (including the 2-byte header), then:

    • LHS = body bytes promised by the wire header
    • RHS = bytes remaining in the buffer (already past that header)

    These are comparable, so the check is logically correct. However, the code then does not advance read_count by subobj_length - 2 at the end of each case — each case manually manages read_count. If any case under-advances read_count (e.g., the currently-preserved default: path that does read_count += subobj_length), successive iterations will start from the wrong position, potentially misinterpreting subobject boundaries. It is worth adding an explicit assertion or debug log after each case that read_count - case_start == subobj_length - OBJECT_RO_SUBOBJ_HEADER_LENGTH to catch drift early.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: pceplib/pcep_msg_tlvs_encoding.c
Line: 1186-1196

Comment:
**Silent truncation may produce unterminated string**

`name` is declared as `char name[MAX_POLICY_NAME]` (256 bytes). When `encoded_tlv_length == MAX_POLICY_NAME`, the `memcpy` fills the buffer entirely, leaving no room for a null terminator. Because `common_tlv_create` uses calloc the buffer is zero-initialised *before* the copy, but the copy itself can overwrite the final byte with non-zero data. Any subsequent code that treats `name` as a C-string will read past the end of the array.

Capping at `MAX_POLICY_NAME - 1` (leaving one guaranteed zero byte) is the minimal safe fix. A log message on truncation would also help operators notice oversized names:

```suggestion
	uint16_t len = tlv->header.encoded_tlv_length;

	if (len >= MAX_POLICY_NAME) {
		pcep_log(LOG_INFO,
			 "%s: policy name TLV length [%d] exceeds max [%d], truncating",
			 __func__, len, MAX_POLICY_NAME);
		len = MAX_POLICY_NAME - 1;
	}

	memcpy(tlv->name, tlv_body_buf, len);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: pceplib/pcep_msg_objects_encoding.c
Line: 1488-1497

Comment:
**Bounds check uses wrong baseline after header bytes are already consumed**

At this point in the loop, `read_count` has already been advanced **past** the 2-byte subobject header (`type` + `length`), yet `obj_body_length - read_count` gives the number of bytes remaining in the RO object body *after those header bytes*. The check correctly guards against an overly long body, but the LHS `subobj_length - OBJECT_RO_SUBOBJ_HEADER_LENGTH` subtracts the header length *again*. If `subobj_length` is the full wire length (including the 2-byte header), then:

- LHS = body bytes promised by the wire header  
- RHS = bytes remaining in the buffer (already past that header)

These are comparable, so the check is logically correct. However, the code then does **not** advance `read_count` by `subobj_length - 2` at the end of each case — each case manually manages `read_count`. If any case under-advances `read_count` (e.g., the currently-preserved `default:` path that does `read_count += subobj_length`), successive iterations will start from the wrong position, potentially misinterpreting subobject boundaries. It is worth adding an explicit assertion or debug log after each case that `read_count - case_start == subobj_length - OBJECT_RO_SUBOBJ_HEADER_LENGTH` to catch drift early.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "pathd: replace asser..."

Comment thread pceplib/pcep_msg_objects_encoding.c
Comment thread pathd/path_pcep_pcc.c
@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 8ecc2a0 to 243fc75 Compare March 17, 2026 14:21
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 17, 2026

pushed updates to address the greptile comments

Copy link
Copy Markdown
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One really minor question ...

Comment thread pceplib/pcep_msg_tlvs_encoding.c Outdated
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

@greptileai review

Comment thread pathd/path_pcep_lib.c
@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 243fc75 to 4aeeed2 Compare March 18, 2026 15:39
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

Updated to fix the comment about the error-handling for ERO subobjects

@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

@greptileai review

@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 4aeeed2 to e8ac707 Compare March 18, 2026 16:01
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

And updating again to fix another greptile comment

@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

@greptileai review

@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from e8ac707 to 9ffe8d6 Compare March 18, 2026 17:51
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

@greptileai review

@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 9ffe8d6 to 387246e Compare March 18, 2026 18:29
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

@greptileai review

@riw777
Copy link
Copy Markdown
Member

riw777 commented Mar 18, 2026

I don't think the lint is something that needs to be addressed here ... ??

@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

yeah, I'm afraid it is - it's better not to let those things in. I also want to wait to ensure greptile is clean

I don't think the lint is something that needs to be addressed here ... ??

@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 387246e to 2ee9699 Compare March 18, 2026 18:45
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 18, 2026

ok, pushed a checkpatch fix, and greptile seems satisfied now.

@Jafaral
Copy link
Copy Markdown
Member

Jafaral commented Mar 19, 2026

@greptile review

Mark Stapp added 4 commits March 19, 2026 09:47
Limit the number of POLICY_NAME octets copied to avoid
overrun.

Signed-off-by: Mark Stapp <mjs@cisco.com>
Correct the arithmetic in packet-sending: test the correct
message-length, continue sending from offset into buffer
rather than start of buffer.

Signed-off-by: Mark Stapp <mjs@cisco.com>
Compute and test required bytes available before accessing
message buffer. Avoid unaligned integer casts.

Signed-off-by: Mark Stapp <mjs@cisco.com>
Avoid asserts in some paths that involve incoming messages;
handle messages with errors more cleanly. Improve
error-handling with EROs; be more careful to avoid invalid
subobjects.

Signed-off-by: Mark Stapp <mjs@cisco.com>
@mjstapp mjstapp force-pushed the fix_pceplib_parsing branch from 2ee9699 to 22959e4 Compare March 19, 2026 20:20
@mjstapp
Copy link
Copy Markdown
Contributor Author

mjstapp commented Mar 19, 2026

greptile has three complaints; I think only one was valid, and I've pushed a fix for that.

  1. "unterminated" string. RFC 9862 4.5.1 says that the data is unterminated, so that's how it's treated in the pceplib code (for now)

  2. complicated subtlv parser for the route objects - true, but ... that's what it is, and I don't propose to re-write it at this time.

  3. "default" case in the complicated parser advances too far in some circumstances - that was correct I think, and I've pushed a fix for that case.

Copy link
Copy Markdown
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@riw777 riw777 merged commit 3169afe into FRRouting:master Mar 24, 2026
20 checks passed
@mjstapp mjstapp deleted the fix_pceplib_parsing branch April 8, 2026 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants