
get_writable_page verify flags breaks due to hugepage coalescing #64

@chc4

Occasionally, the mmap syscall will crash with:

CR0: 0x80050033  CR3: 0xA000
CR2: 0x0  CR4: 0x350E20
RAX: 0x21403000  RBX: 0x0  RCX: 0x1006338
RDX: 0x3  RSI: 0x3F000  RDI: 0x0
RIP: 0x21AA  RBP: 0x1198A10  RSP: 0x11989E8
SS: 0x10  CS: 0x8  DS: 0x23  FS: 0x0  GS: 0x0
FS BASE: 0x0  GS BASE: 0x6050
Machine exception: page_at: pt entry not user writable  Data: 0x21403000
terminate called after throwing an instance of 'tinykvm::MemoryException'
  what():  page_at: pt entry not user writable

with a callstack pointing to

auto* page = memory.get_writable_page(addr & ~PageMask(), memory.expectedUsermodeFlags(), true, false);

collect_state_guest = master_vm.mmap_allocate(0x1000, 0x7, false);
tinykvm::page_at(master_vm.main_memory(), collect_state_guest, [] (uint64_t addr, uint64_t& entry, size_t size) {
    // Make the page executable by the user (There is probably a better way to do this?)
    entry = (entry & ~PDE64_NX) | PDE64_DIRTY;
});

// Emulate the relevant mmap
auto new_page = master_vm.mmap_allocate(258048, 3);
master_vm.memzero(new_page, 258048);

The above is a reduced reproducer. The original crash showed up when a userspace program executed mmap(0x0, 258048, prot=3, flags=22, vfd=-1) = 0x21403000. The collect_state page is an executable memory page that I'm allocating from the VMM. The issue "goes away" if you don't set PDE64_DIRTY in page_at; however, removing all of the PDE64_DIRTY flags from the original program still causes a crash in a (slightly later) mmap call instead.

I believe the issue is due to the above page_at resolving to a hugepage that the newly mmap'd region is embedded within, so it sees that collect_state_guest has the dirty bit set and thus must_be_zeroed = true. The later get_writable_page then gets the hugepage, which has PDE64_NX cleared, and fails the flag check against vMemory::expectedUsermodeFlags.
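To make the "embedded within a hugepage" claim concrete, here is a minimal sketch of the address arithmetic, assuming the coalesced entry is a 2 MiB x86-64 hugepage. The helper names (`hugepage_base`, `range_in_one_hugepage`) are mine and not part of tinykvm; the address and length are the ones from the crash report.

```cpp
#include <cstdint>

// Assumption: the coalesced entry is a 2 MiB hugepage (x86-64 PD entry with PS set).
constexpr uint64_t kHugePageSize = 1ULL << 21;
constexpr uint64_t kPageSize     = 1ULL << 12;

// Base of the 2 MiB-aligned region containing addr (hypothetical helper).
constexpr uint64_t hugepage_base(uint64_t addr) {
    return addr & ~(kHugePageSize - 1);
}

// True when every byte of [addr, addr + len) lies in one 2 MiB region
// (hypothetical helper, len must be non-zero).
constexpr bool range_in_one_hugepage(uint64_t addr, uint64_t len) {
    return hugepage_base(addr) == hugepage_base(addr + len - 1);
}

// Values taken from the report: mmap returned 0x21403000 for 258048 bytes.
constexpr uint64_t kMmapAddr = 0x21403000;
constexpr uint64_t kMmapLen  = 258048;

static_assert(kMmapLen == 0x3F000, "matches RSI in the register dump");
static_assert(kMmapLen / kPageSize == 63, "the mapping spans 63 4 KiB pages");
static_assert(hugepage_base(kMmapAddr) == 0x21400000, "2 MiB region base");
static_assert(range_in_one_hugepage(kMmapAddr, kMmapLen),
              "the whole mapping sits inside a single 2 MiB region");
```

So any earlier allocation in 0x21400000..0x215FFFFF that gets its flags edited through a 2 MiB entry would also change the flags seen for this mapping.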

This is maybe a case of me holding tinykvm wrong, and executable pages should somehow be allocated separately from non-executable pages? But mmap_allocate throws away prot, so I'm not sure how else I'm supposed to allocate code pages from the VMM. It seems like the executable_heap MachineOption configures !NX everywhere via vMemory::expectedUsermodeFlags, and so would cause or hide this issue depending on e.g. whether you have a dynamic ELF and turn it off. But even in the non-executable case, it seems like you could get unlucky and have the initial machine-mapped .text page for your ELF coalesce with the first user-serviced mmap and get sad.
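A rough model of the flag mismatch I suspect, as a standalone sketch. The bit positions are the architectural x86-64 ones, but `expected_usermode_flags` and `flags_match` are hypothetical stand-ins: I am assuming the real check compares the permission bits of the entry against vMemory::expectedUsermodeFlags, and I have not confirmed that this is how tinykvm implements it.

```cpp
#include <cstdint>

// Architectural x86-64 page-table entry bits.
constexpr uint64_t kPresent = 1ULL << 0;
constexpr uint64_t kRW      = 1ULL << 1;
constexpr uint64_t kUser    = 1ULL << 2;
constexpr uint64_t kDirty   = 1ULL << 6;
constexpr uint64_t kNX      = 1ULL << 63;
constexpr uint64_t kPermMask = kPresent | kRW | kUser | kNX;

// HYPOTHETICAL stand-in for vMemory::expectedUsermodeFlags:
// NX is expected unless the heap is configured as executable.
constexpr uint64_t expected_usermode_flags(bool executable_heap) {
    return kPresent | kRW | kUser | (executable_heap ? uint64_t{0} : kNX);
}

// HYPOTHETICAL verification: permission bits must match exactly.
constexpr bool flags_match(uint64_t entry, uint64_t expected) {
    return (entry & kPermMask) == expected;
}

// What the reproducer's page_at callback does to the entry it is handed.
constexpr uint64_t make_executable(uint64_t entry) {
    return (entry & ~kNX) | kDirty;
}

// A user-writable, non-executable entry, as a fresh heap mapping would have.
constexpr uint64_t kHeapEntry = kPresent | kRW | kUser | kNX;

static_assert(flags_match(kHeapEntry, expected_usermode_flags(false)),
              "untouched entry passes the non-executable check");
static_assert(!flags_match(make_executable(kHeapEntry), expected_usermode_flags(false)),
              "entry with NX cleared fails the non-executable check");
static_assert(flags_match(make_executable(kHeapEntry), expected_usermode_flags(true)),
              "the same entry passes when executable_heap drops NX from the expectation");
```

If the edited entry is a hugepage shared with later mmap'd pages, the second case is what those pages would hit, and the third case is why executable_heap could hide it.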
