141 changes: 141 additions & 0 deletions pocs/linux/kernelctf/CVE-2025-39946_lts_cos/docs/exploit.md
@@ -0,0 +1,141 @@
# Triggering the Vulnerability

This is an out-of-bounds access vulnerability in the kernel TLS subsystem, specifically in `tls_strp_copyin_frag`, the function responsible for copying incoming TLS message data from the TCP socket buffer into TLS's own internal skb.

Normally, when the TCP receive queue contains a complete TLS header, kernel TLS simply references the skb from the TCP receive queue in its internal skb, even before receiving the full TLS message body. If TCP socket buffer space runs low (e.g., under memory pressure), kernel TLS switches to copy mode: it copies data from TCP's buffer into its own internal skb to free up TCP buffer space and prevent the connection from stalling. Both behaviors are safe as long as kernel TLS has already successfully parsed a complete TLS header, because it then knows how many bytes to expect.

The vulnerability is triggered when two conditions are met simultaneously:
1. TCP socket buffer space is low, forcing kernel TLS into copy mode.
2. Kernel TLS has **not** yet successfully parsed a complete TLS header (`strp->stm.full_len == 0`).

In this state, when `tls_rx_msg_size` fails to parse the header, `tls_strp_copyin_frag` returns the error but the connection is **not** aborted. Because the connection stays alive, `tls_strp_copyin_frag` is called again when more data arrives, and the cycle repeats:

```c
static int tls_strp_copyin_frag(struct tls_strparser *strp, struct sk_buff *skb,
				struct sk_buff *in_skb, unsigned int offset,
				size_t in_len)
{
	size_t len, chunk;
	skb_frag_t *frag;
	int sz;

	frag = &skb_shinfo(skb)->frags[skb->len / PAGE_SIZE]; // [1]

	len = in_len;
	/* First make sure we got the header */
	if (!strp->stm.full_len) { // [2]
		/* Assume one page is more than enough for headers */
		chunk = min_t(size_t, len, PAGE_SIZE - skb_frag_size(frag));
		WARN_ON_ONCE(skb_copy_bits(in_skb, offset,
					   skb_frag_address(frag) +
					   skb_frag_size(frag),
					   chunk)); // [3]

		skb->len += chunk;
		skb->data_len += chunk;
		skb_frag_size_add(frag, chunk);

		sz = tls_rx_msg_size(strp, skb); // [4]
		if (sz < 0)
			return sz;
		/*...*/
```

At **[1]**, the function indexes into the `frags` array using `skb->len / PAGE_SIZE`. At **[2]**, it enters the header-parsing path because `full_len` is still 0. At **[3]**, it copies incoming data into the page referenced by the current frag. At **[4]**, `tls_rx_msg_size` tries to parse the TLS header; if it fails, the function returns the error code but does not close the connection.

On each subsequent call, `skb->len` grows by the amount of data copied. Eventually, `skb->len / PAGE_SIZE` exceeds the bounds of the `frags` array, causing **[1]** to read an out-of-bounds frag entry. Since `frags` is not initialized when the skb is allocated, the OOB entries contain whatever data previously occupied that memory.
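The index growth described above can be reproduced with a small userspace model. This is only a sketch, not kernel code: `calls_until_oob`, `MODEL_PAGE_SIZE`, `MODEL_MAX_FRAGS`, and the `nr_frags` parameter are hypothetical names introduced for illustration, and the model assumes 4 KiB pages and a header parse that fails on every call.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define MODEL_MAX_FRAGS 64u   /* model-only bound, not a kernel constant */

/*
 * Simplified model of the header-parsing path: each call copies up to
 * bytes_per_call bytes into the frag selected by len / PAGE_SIZE, and the
 * header parse always fails. Returns how many calls complete before the
 * computed index reaches nr_frags, i.e. before the first entry past the
 * populated frags would be used.
 */
static size_t calls_until_oob(size_t nr_frags, size_t bytes_per_call)
{
    size_t frag_size[MODEL_MAX_FRAGS] = {0};
    size_t len = 0, calls = 0;

    if (nr_frags > MODEL_MAX_FRAGS || bytes_per_call == 0)
        return 0;

    for (;;) {
        size_t idx = len / MODEL_PAGE_SIZE; /* mirrors [1] */
        size_t room, chunk;

        if (idx >= nr_frags)
            return calls; /* the next access would be out of bounds */

        room = MODEL_PAGE_SIZE - frag_size[idx];
        chunk = bytes_per_call < room ? bytes_per_call : room;
        frag_size[idx] += chunk; /* mirrors skb_frag_size_add() */
        len += chunk;            /* mirrors skb->len += chunk */
        calls++;
    }
}
```

Under these assumptions, a caller that feeds a full page per call walks off the populated entries after `nr_frags` calls, while 5-byte copies need 820 calls per page (819 copies of 5 bytes plus one final 1-byte copy).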

To trigger this from userspace, we send `MSG_OOB` data (which causes TLS header parsing to fail) combined with large sends to create memory pressure:

```c
/* Use OOB+large send to trigger copy mode due to memory pressure.
* OOB causes a short read.
*/
TEST_F(tls_err, oob_pressure)
{
	char buf[1<<16];
	int i;

	memrnd(buf, sizeof(buf));

	EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
	EXPECT_EQ(send(self->fd2, buf, sizeof(buf), 0), sizeof(buf));
	for (i = 0; i < 64; i++)
		EXPECT_EQ(send(self->fd2, buf, 5, MSG_OOB), 5);
}
```

The OOB frags entries reside at `skb_shinfo(skb)->frags`, where `skb_shinfo(skb)` is at `skb->head + skb->end`. Since the frags array is uninitialized at allocation time, we can control its contents by shaping the heap beforehand.

# Exploit

## Step 1: Fill uninitialized frags with stale page pointers

The goal is to place freed page pointers into the memory region that will later become the OOB portion of the `frags` array. When the vulnerability is triggered, the kernel will write incoming TLS data to these stale pages, giving us a page UAF write primitive.

We allocate pages via the `io_setup` syscall (AIO context ring buffer) and splice references to these pages into the same kmalloc slab region where the TLS internal skb's `skb_shinfo->frags` will reside. We then call `io_destroy` to free the AIO pages. When the TLS internal skb is subsequently allocated from the same slab, its uninitialized `frags` entries contain the now-stale pointers to the freed AIO pages.

The technique for placing page references differs between targets:
- **LTS**: Uses vsock zero-copy (`MSG_ZEROCOPY`) to splice AIO pages into vsock skb frags, which land in the same slab as the TLS skb data.
- **COS**: Vsock zero-copy is not available. Instead, we use `vmsplice` to push AIO page references into a pipe, then `splice` from the pipe into a memfd. This invokes `iter_file_splice_write`, which places the page references into the correct kmalloc cache. This technique does not work on LTS because LTS allocates skb data from the dedicated `skb_small_head_cache`.

## Step 2: Spray pipe_buffer to reclaim the freed page

Before triggering the page UAF write, we spray `pipe_buffer` arrays (kmalloc-cg-192, 0xc0 bytes per allocation) to reclaim the freed AIO page. To maximize the chance of landing a `pipe_buffer` array on the target page, we interleave `pipe_buffer` allocations with `msg_msgseg` allocations:

1. Pre-allocate 0x600 pipes (each with 1 byte written to ensure a `pipe_buffer` exists).
2. Send 0x4000 messages to message queues, where each message's `msg_msgseg` continuation lands in kmalloc-cg-192.
3. Every 21 messages, expand two pipes via `fcntl(F_SETPIPE_SZ)`, which allocates new `pipe_buffer` arrays in kmalloc-cg-192.
4. Free all `msg_msgseg` objects by receiving the messages, leaving `pipe_buffer` objects scattered across the slab pages.
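The numbers in the list above line up with simple slab arithmetic, sketched below. Everything here is an assumption made for illustration: the struct is a userspace mirror of `struct pipe_buffer` on an LP64 target, the size-class table is the common default for general-purpose kmalloc caches, slabs are assumed to be single 4 KiB pages, and the four-slot array is a guess at what the resized pipes use. `kmalloc_bucket` and `objs_per_page` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace mirror of struct pipe_buffer, for size arithmetic only. */
struct pipe_buffer_model {
    void *page;
    unsigned int offset;
    unsigned int len;
    const void *ops;
    unsigned int flags;
    unsigned long private_data;
};

/* Smallest assumed kmalloc size class that can hold n bytes. */
static size_t kmalloc_bucket(size_t n)
{
    static const size_t classes[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
    };

    for (size_t i = 0; i < sizeof(classes) / sizeof(classes[0]); i++)
        if (n <= classes[i])
            return classes[i];
    return 0; /* larger than one page: outside this model */
}

/* Objects per slab, assuming the slab is a single 4 KiB page. */
static size_t objs_per_page(size_t object_size)
{
    return object_size ? 4096 / object_size : 0;
}
```

If those assumptions hold, a four-slot array (4 x 40 = 160 bytes) rounds up to the 192-byte class, and one slab page holds 21 such objects, which would explain why a pipe is resized every 21 messages.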

## Step 3: Trigger page UAF write with crafted pipe_buffer payload

We craft a buffer containing repeating fake `pipe_buffer` structures, each 0xc0 bytes. Each fake `pipe_buffer` contains:
- A fake `ops` pointer at offset +0x10, pointing to `module_sysfs_ops` (used as a fake `pipe_buf_operations` vtable).
- A stack pivot gadget address at offset +0x18, which is ultimately called (as `attribute->store`, see Step 4) when the kernel invokes `ops->release`.
- A ROP chain starting at offset +0x20.
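The offsets in this list can be checked against userspace mirrors of the two kernel structures that overlap in each slot. This is a sketch under stated assumptions: an LP64 target, a `struct attribute` without lockdep fields, and only the leading fields of `struct module_attribute`. The `*_model` types, `SLOT_SIZE`, and `slot_field_offset` are hypothetical names, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 0xc0u /* one kmalloc-cg-192 object, per the text */

/* Userspace mirror of struct pipe_buffer (LP64 assumed). */
struct pipe_buffer_model {
    void *page;                 /* +0x00 */
    unsigned int offset;        /* +0x08 */
    unsigned int len;           /* +0x0c */
    const void *ops;            /* +0x10 */
    unsigned int flags;         /* +0x18 */
    unsigned long private_data; /* +0x20 */
};

/* Mirror of struct attribute without lockdep fields (assumption). */
struct attribute_model {
    const char *name;
    unsigned short mode;
};

/* Mirror of the leading fields of struct module_attribute. */
struct module_attribute_model {
    struct attribute_model attr; /* +0x00 */
    void (*show)(void);          /* +0x10 */
    void (*store)(void);         /* +0x18 */
};

/* Byte offset of a field inside the n-th repeated slot of the payload. */
static size_t slot_field_offset(size_t slot, size_t field_offset)
{
    return slot * SLOT_SIZE + field_offset;
}
```

Under this layout the `ops` field of a `pipe_buffer` and the `show` field of a `module_attribute` share offset +0x10, and `store` sits at +0x18, matching the offsets listed above.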

We first send a 5-byte `MSG_OOB` to trigger the TLS parse failure, then send the full payload buffer (64 KB). The OOB frags write causes the kernel to copy our crafted payload onto the freed page, overwriting the real `pipe_buffer` objects that were sprayed there in Step 2.

## Step 4: Execute ROP chain via pipe close (MODULE_SYSFS_OPS JOP trick)

When we close all pipes, the kernel calls `pipe_buf_release(pipe, buf)` for each buffer, which invokes `buf->ops->release(pipe, buf)`. The key insight is that `pipe_buf_operations->release` is at offset +8 in the vtable. We exploit this by pointing `buf->ops` to `module_sysfs_ops`, a kernel global of type `struct sysfs_ops`:

```c
struct sysfs_ops {
	ssize_t (*show)(struct kobject *, struct attribute *, char *);                  // +0x00
	ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t);   // +0x08 <-- same offset as ops->release
};
```

Since `release` and `store` are at the same offset (+8), the kernel actually calls `module_sysfs_ops->store`, which is `module_attr_store`. This function does:

```c
static ssize_t module_attr_store(struct kobject *kobj,
				struct attribute *attr,
				const char *buf, size_t len)
{
	struct module_attribute *attribute = to_module_attr(attr);
	// ...
	attribute->store(attribute, mk, buf, len);
}
```

The crucial detail is that in `pipe_buf_release(pipe, buf)`, the second argument `buf` is a pointer to the `pipe_buffer` struct itself, which is our controlled data on the overwritten page. When `module_attr_store` is called via the vtable, `buf` becomes the `attr` parameter (second argument, `rsi`). Inside `module_attr_store`, the function reinterprets this as a `module_attribute` and calls `attribute->store(attribute, ...)`, passing `attribute` (our controlled `pipe_buffer` pointer) as `rdi`.

This gives us `rdi` pointing to our fully controlled data without needing to know any heap address in advance. We place a `push rdi; pop rsp` gadget (or equivalent stack pivot) as the `store` function pointer in our fake `pipe_buffer`. When it executes:
1. `push rdi` pushes the address of our controlled `pipe_buffer` onto the stack.
2. `pop rsp` sets the stack pointer to that address, pivoting the stack into our controlled buffer.
3. Execution continues into the ROP chain laid out at the start of the `pipe_buffer` slot.

On **LTS**, the stack pivot uses `push rdi; call [rdi]` followed by `pop rbx; pop rsp` to achieve the same effect. On **COS**, `push rdi; pop rsp` is used directly, but this gadget has a `dec [rdi]` side-effect, so the first value in the slot is pre-incremented by 1 to compensate.
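The dispatch described in this step can be modeled in userspace to show why the final callee receives the buffer itself as its first argument. This is a sketch, not kernel code: every callback uses one common two-argument signature (the real `release` and `store` prototypes differ, and only the first two argument registers matter here), and all `*_model` names, `final_callee`, and `first_arg_is_buf` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Common signature for every callback in this model (simplification). */
typedef long (*cb2_t)(void *arg0, void *arg1);

/* Stands in for both pipe_buf_operations and sysfs_ops. */
struct vtable_model {
    cb2_t slot0; /* +0x00: confirm / show */
    cb2_t slot1; /* +0x08: release / store */
};

/* A pipe_buffer slot, also readable as a module_attribute. */
struct buf_model {
    void *q0;                       /* +0x00 */
    unsigned long q1;               /* +0x08 */
    const struct vtable_model *ops; /* +0x10: pipe_buffer.ops */
    cb2_t store;                    /* +0x18: module_attribute.store */
};

static void *seen_first_arg;

/* Plays the role of the function pointer placed at +0x18. */
static long final_callee(void *arg0, void *arg1)
{
    (void)arg1;
    seen_first_arg = arg0; /* record what would arrive in rdi */
    return 0;
}

/* Model of module_attr_store(): reinterpret attr, then call ->store. */
static long module_attr_store_model(void *kobj, void *attr)
{
    struct buf_model *attribute = attr;

    return attribute->store(attribute, kobj);
}

static const struct vtable_model module_sysfs_ops_model = {
    NULL, module_attr_store_model
};

/* Model of pipe_buf_release(): call the vtable entry at offset +8. */
static long pipe_buf_release_model(void *pipe, struct buf_model *buf)
{
    return buf->ops->slot1(pipe, buf);
}

/* Runs the whole chain once and reports whether arg0 was the buffer. */
static int first_arg_is_buf(void)
{
    struct buf_model buf = { NULL, 0, &module_sysfs_ops_model, final_callee };
    int pipe_placeholder = 0;

    seen_first_arg = NULL;
    pipe_buf_release_model(&pipe_placeholder, &buf);
    return seen_first_arg == (void *)&buf;
}
```

In the model, the second argument of the release call becomes the first argument of the final callee, which is the property the stack pivot relies on.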

The ROP chain:

1. Calls `copy_from_user(core_pattern, &user_string, 0x30)` to overwrite the kernel's `core_pattern` with `|/proc/%P/fd/666 %P`.
2. Calls `msleep(0x10000)` to keep the hijacked task asleep for about 65 seconds (0x10000 ms), long enough for the rest of the exploit to finish.
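Two constants in this chain are easy to sanity-check: the copy length and the sleep duration. The sketch below does only that arithmetic; `fits_in_copy` and `msleep_whole_seconds` are hypothetical helper names, and the pattern string is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when the string plus its NUL terminator fits in copy_len bytes. */
static int fits_in_copy(const char *s, size_t copy_len)
{
    return strlen(s) + 1 <= copy_len;
}

/* msleep() takes milliseconds; convert to whole seconds for readability. */
static unsigned int msleep_whole_seconds(unsigned int msecs)
{
    return msecs / 1000u;
}
```

The pattern is 19 characters long, so it and its terminator fit comfortably in the 0x30 bytes copied, and 0x10000 ms is a little over 65 seconds.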

## Step 5: Privilege escalation via core_pattern

Before starting the exploit, we fork a child process that:
1. Copies the exploit binary into a memfd and dups it to fd 666.
2. Polls `/proc/sys/kernel/core_pattern` in a loop, waiting for the ROP chain to overwrite it.
3. Once the overwrite is detected, crashes itself with a null pointer dereference.

The kernel's core dump handler sees `|/proc/%P/fd/666 %P` in `core_pattern` and executes our binary (at fd 666) as root. The binary uses `pidfd_open` + `pidfd_getfd` to steal the parent process's stdin/stdout/stderr, then reads the flag.
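To make the pattern expansion concrete, here is a simplified userspace model of how a piped `core_pattern` turns into a command line. It is a sketch only: it handles just the leading `|` and the `%P` specifier (the PID of the dumped process as seen in the initial PID namespace), and `expand_pipe_pattern` is a hypothetical name rather than a kernel function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expands a piped core_pattern into out. Returns 0 on success, or -1 if
 * the pattern does not start with '|' or the result does not fit.
 */
static int expand_pipe_pattern(const char *pattern, long pid,
                               char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0 || pattern[0] != '|')
        return -1;
    out[0] = '\0';

    for (const char *p = pattern + 1; *p; p++) {
        int n;

        if (p[0] == '%' && p[1] == 'P') {
            n = snprintf(out + used, out_len - used, "%ld", pid);
            p++; /* skip the 'P' */
        } else {
            n = snprintf(out + used, out_len - used, "%c", *p);
        }
        if (n < 0 || (size_t)n >= out_len - used)
            return -1; /* output truncated */
        used += (size_t)n;
    }
    return 0;
}
```

For a crashing process with PID 1234, the model yields `/proc/1234/fd/666 1234`: the program path resolves to the memfd held open by the crashing child, and the PID is passed on as an argument.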
11 changes: 11 additions & 0 deletions pocs/linux/kernelctf/CVE-2025-39946_lts_cos/docs/vulnerability.md
@@ -0,0 +1,11 @@
- Requirements:
    - Capabilities: None
    - Kernel configuration: CONFIG_TLS
    - User namespaces required: no
- Introduced by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=84c61fe1a75b
- Fixed by: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0aeb54ac4cd5cf8f60131b4d9ec0b6dc9c27b20d
- Affected kernel versions: v6.0 - v6.16
- Affected component: net/tls
- Cause: Out-of-bounds access
- Syscall to disable: -
- Description: An out-of-bounds access in the Linux kernel net/tls subsystem
@@ -0,0 +1,3 @@
all: exploit
exploit: exploit.c
	gcc -static -o exploit exploit.c
Binary file not shown.