
fix(vfs): make dup/dup2/dup3 share open file description and move close_on_exec to per-fd#1783

Merged
fslongjin merged 2 commits into DragonOS-Community:master from fslongjin:refactor-fd-close-on-exec-per-fd
Feb 11, 2026

Conversation

@fslongjin
Member

Previously, dup/dup2/dup3 used try_clone() which created independent
File objects, violating the POSIX requirement that duplicated fds share
the same open file description (file offset, status flags). Also,
close_on_exec was stored inside File instead of being a per-fd attribute
in the fd table.

This commit:

  • Move close_on_exec from File to FileDescriptorVec as a per-fd bitmap, matching Linux's fdtable.close_on_exec semantics
  • Add alloc_fd_arc() to install a shared Arc for dup operations
  • Make dup/dup2/dup3 share Arc instead of cloning
  • Adapt all 26 call sites (open, pipe, socket, epoll, bpf, kvm, etc.) to pass explicit cloexec parameter to alloc_fd()
  • Fix CLONE_PIDFD to unconditionally set close-on-exec on pidfd
  • Fix pidfd_open to unconditionally set close-on-exec on pidfd
  • Fix do_dup3 to allow fd table expansion for high newfd values and return EBADF (not EMFILE) when newfd >= RLIMIT_NOFILE
  • Add unit tests covering shared offset, shared flags, per-fd cloexec, high fd dup2, dup3 EINVAL on same fd, and refcount after close

Signed-off-by: longjin <longjin@DragonOS.org>
@github-actions bot added the Bug fix label on Feb 11, 2026
@fslongjin
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ce35c7b2e


  let mut fd_table_guard = fd_table.write();

- let fd = fd_table_guard.alloc_fd(ep_file, None)?;
+ let fd = fd_table_guard.alloc_fd(ep_file, None, false)?;


P1: Honor EPOLL_CLOEXEC when allocating epoll fd

EventPoll::create_epoll now always calls alloc_fd(..., false), so epoll_create1(EPOLL_CLOEXEC) no longer sets FD_CLOEXEC on the returned descriptor even though create_epoll_file still accepts and forwards O_CLOEXEC in flags. This regresses close-on-exec behavior and can leak epoll fds into exec'd children.
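A minimal sketch of the kind of fix this comment suggests: derive the per-fd cloexec bit from the caller's flags instead of hardcoding `false` at the `alloc_fd` call. The constant value below matches Linux's `O_CLOEXEC` (with which `EPOLL_CLOEXEC` shares a value on Linux), and the helper name is hypothetical, not DragonOS's API.

```rust
// Hypothetical flag constant mirroring O_CLOEXEC / EPOLL_CLOEXEC on Linux.
const O_CLOEXEC: u32 = 0o2000000;

// Sketch of the fix: compute the cloexec bit from the flags the caller
// passed, then hand that bit to alloc_fd instead of a hardcoded `false`.
fn cloexec_from_flags(flags: u32) -> bool {
    flags & O_CLOEXEC != 0
}

fn main() {
    println!("{} {}", cloexec_from_flags(O_CLOEXEC), cloexec_from_flags(0));
}
```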


- let fd = fd_table.write().alloc_fd(file, None).map(|x| x as usize)?;
+ let fd = fd_table
+     .write()
+     .alloc_fd(file, None, false)


P1: Preserve PERF_FLAG_FD_CLOEXEC on perf_event_open

perf_event_open computes file_mode with O_CLOEXEC when PERF_FLAG_FD_CLOEXEC is present, but the fd installation now hardcodes alloc_fd(..., false), which drops the requested close-on-exec bit. Callers that explicitly request cloexec will receive inheritable perf fds after execve, which is a behavioral and security regression.


Comment on lines 1457 to 1461
  if self.fds.try_reserve(new_capacity - current_len).is_err() {
      return Err(SystemError::ENOMEM);
  }
  self.fds.resize(new_capacity, None);
  self.cloexec.resize(new_capacity, false);


P2: Reserve cloexec bitmap capacity before resizing

resize_to_capacity only checks allocation failure with self.fds.try_reserve(...), then unconditionally executes self.cloexec.resize(...). Under memory pressure this second allocation can fail outside the error path, causing a panic/abort instead of returning ENOMEM as intended by the surrounding logic.


- Update `create_epoll` and `perf_event_open`
  to pass the O_CLOEXEC flag from file flags to `alloc_fd`
- Ensure `FileDescriptorVec` properly reserves capacity for the `cloexec`
  vector when expanding

Signed-off-by: longjin <longjin@DragonOS.org>
@fslongjin fslongjin merged commit 50695ef into DragonOS-Community:master Feb 11, 2026
25 checks passed