fix(vfs): make dup/dup2/dup3 share open file description and move close_on_exec to per-fd#1783
Conversation
…se_on_exec to per-fd
Previously, dup/dup2/dup3 used try_clone() which created independent
File objects, violating the POSIX requirement that duplicated fds share
the same open file description (file offset, status flags). Also,
close_on_exec was stored inside File instead of being a per-fd attribute
in the fd table.
This commit:
- Move close_on_exec from File to FileDescriptorVec as a per-fd bitmap,
matching Linux's fdtable.close_on_exec semantics
- Add alloc_fd_arc() to install a shared Arc<File> for dup operations
- Make dup/dup2/dup3 share Arc<File> instead of cloning
- Adapt all 26 call sites (open, pipe, socket, epoll, bpf, kvm, etc.)
to pass explicit cloexec parameter to alloc_fd()
- Fix CLONE_PIDFD to unconditionally set close-on-exec on pidfd
- Fix pidfd_open to unconditionally set close-on-exec on pidfd
- Fix do_dup3 to allow fd table expansion for high newfd values and
return EBADF (not EMFILE) when newfd >= RLIMIT_NOFILE
- Add unit tests covering shared offset, shared flags, per-fd cloexec,
high fd dup2, dup3 EINVAL on same fd, and refcount after close
Signed-off-by: longjin <longjin@DragonOS.org>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2ce35c7b2e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let mut fd_table_guard = fd_table.write(); | ||
|
|
||
| let fd = fd_table_guard.alloc_fd(ep_file, None)?; | ||
| let fd = fd_table_guard.alloc_fd(ep_file, None, false)?; |
There was a problem hiding this comment.
Honor EPOLL_CLOEXEC when allocating epoll fd
EventPoll::create_epoll now always calls alloc_fd(..., false), so epoll_create1(EPOLL_CLOEXEC) no longer sets FD_CLOEXEC on the returned descriptor even though create_epoll_file still accepts and forwards O_CLOEXEC in flags. This regresses close-on-exec behavior and can leak epoll fds into exec'd children.
Useful? React with 👍 / 👎.
kernel/src/perf/mod.rs
Outdated
| let fd = fd_table.write().alloc_fd(file, None).map(|x| x as usize)?; | ||
| let fd = fd_table | ||
| .write() | ||
| .alloc_fd(file, None, false) |
There was a problem hiding this comment.
Preserve PERF_FLAG_FD_CLOEXEC on perf_event_open
perf_event_open computes file_mode with O_CLOEXEC when PERF_FLAG_FD_CLOEXEC is present, but the fd installation now hardcodes alloc_fd(..., false), which drops the requested close-on-exec bit. Callers that explicitly request cloexec will receive inheritable perf fds after execve, which is a behavioral and security regression.
Useful? React with 👍 / 👎.
| if self.fds.try_reserve(new_capacity - current_len).is_err() { | ||
| return Err(SystemError::ENOMEM); | ||
| } | ||
| self.fds.resize(new_capacity, None); | ||
| self.cloexec.resize(new_capacity, false); |
There was a problem hiding this comment.
Reserve cloexec bitmap capacity before resizing
resize_to_capacity only checks allocation failure with self.fds.try_reserve(...), then unconditionally executes self.cloexec.resize(...). Under memory pressure this second allocation can fail outside the error path, causing a panic/abort instead of returning ENOMEM as intended by the surrounding logic.
Useful? React with 👍 / 👎.
- Update`create_epoll`and`perf_event_open` to pass the O_CLOEXEC flag from file flags to`alloc_fd` - Ensure`FileDescriptorVec`properly reserves capacity for the`cloexec` vector when expanding Signed-off-by: longjin <longjin@DragonOS.org>
Previously, dup/dup2/dup3 used try_clone() which created independent
File objects, violating the POSIX requirement that duplicated fds share
the same open file description (file offset, status flags). Also,
close_on_exec was stored inside File instead of being a per-fd attribute
in the fd table.
This commit: