fix: drbd-module-loader failing due to missing mount on debian trixie #872

Open
BokuNoGF wants to merge 1 commit into piraeusdatastore:v2 from BokuNoGF:v2

@BokuNoGF BokuNoGF commented Sep 7, 2025

The DRBD module loader fails on Debian Trixie due to a chain of softlinks from /lib/modules -> /usr/src -> /usr/lib not being available for a dependency of the make call, so add it as a volume mount.

The specific error that is seen is:

/usr/src/linux-headers-6.12.43+deb13-common/Makefile:366: /usr/src/linux-headers-6.12.43+deb13-common/scripts/Kbuild.include:

This is because scripts is actually a softlink to the /usr/lib/linux-kbuild-6.12.43+deb13/scripts/ directory on Debian Trixie and the call fails since the hostPath is not mounted into the pod.
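
To see where the chain breaks on a given node, one can check whether a symlink's final target is reachable from inside the container. The following is a minimal sketch; the function name `check_link` is hypothetical and not part of the module loader.

```shell
#!/bin/sh
# check_link LINK: report how LINK resolves and flag a dangling symlink.
check_link() {
    link=$1
    if [ -L "$link" ] && [ ! -e "$link" ]; then
        # -L tests the link itself, -e follows it to the target.
        echo "dangling: $link -> $(readlink "$link")"
        return 1
    elif [ -e "$link" ]; then
        echo "ok: $link -> $(readlink -f "$link")"
    else
        echo "missing: $link"
        return 1
    fi
}
```

Run inside the drbd-module-loader container as `check_link /usr/src/linux-headers-6.12.43+deb13-common/scripts`, this should print a `dangling:` line whenever the host's /usr/lib is not visible there.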

This was replicated on Debian 13.1 using the quay.io/piraeusdatastore/drbd9-trixie:v9.2.14 image via podTemplate overrides. The fix was also tested this way.

BokuNoGF force-pushed the v2 branch 3 times, most recently from 6579bb8 to ed63cfe on September 7, 2025, 17:43
The DRBD module loader fails on Debian Trixie due to a chain of softlinks from /lib/modules -> /usr/src -> /usr/lib not being available for a dependency of the `make` call, so add it as a volume mount.

Signed-off-by: Boku NoGF <gokunobf@gmail.com>
Member

@WanzenBug WanzenBug left a comment

While this might fix the specific issue on Debian, it is very likely that this change would break all other distributions, and in some cases even Debian.

The issue is that /usr/lib already contains libraries used by the programs of the container, such as gcc, patch, etc... and now you are replacing them with some random version from the host OS.

This can and will fail. A lot.

What we have done instead is install linux-kbuild-* in the build containers, which provides the required scripts.

@BokuNoGF
Author

BokuNoGF commented Sep 8, 2025

Ah okay, yep that makes sense, good catch.

So would the fix then be to dynamically look up the host kernel in the build container and install the appropriate kernel build packages on every startup (since a host kernel upgrade would break build containers that are behind)?

I'm guessing this would require modifying or wrapping the entry script?
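
As a rough sketch of what such a lookup might involve: the helper below maps a kernel release string to a kbuild package name. It assumes the package is named after the directory seen above (/usr/lib/linux-kbuild-6.12.43+deb13), which is an assumption and not something confirmed in this thread; `kbuild_package_for` is a hypothetical name.

```shell
#!/bin/sh
# kbuild_package_for RELEASE: print the linux-kbuild package name that is
# ASSUMED to match a Trixie kernel release string, such as the output of
# `uname -r`. The flavour suffix (for example "-amd64") is dropped.
kbuild_package_for() {
    release=$1
    [ -n "$release" ] || return 1
    # Strip everything from the first "-" onwards:
    # 6.12.43+deb13-amd64 -> 6.12.43+deb13
    echo "linux-kbuild-${release%%-*}"
}
```

A wrapper around the entry script could call `kbuild_package_for "$(uname -r)"` and hand the result to the package manager. The mapping would not hold for releases that use a different version scheme, so it would need a guard for those.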

@WanzenBug
Member

Ah, I see now that Debian switched from having one linux-kbuild-X.Y package to having linux-kbuild-X.Y.Z packages, which makes this split in packaging completely pointless 🤦

It's a tricky situation. Installing the right package could work, but then we would have to modify the entry.sh script, which is already complicated enough for my taste.

@WanzenBug
Member

It seems that even mounting the host's /usr at /host/usr and doing some tricks with setting the right KDIR variable won't work, because the way the symlinks and make targets are set up makes it still search for /usr/src/linux-headers-6.12.43+deb13-common instead of /host/usr/src/linux-headers-6.12.43+deb13-common 🙈

@BokuNoGF
Author

BokuNoGF commented Sep 9, 2025

One hack I can think of is to have a top-level init container mount the host's root (or just the needed directories), copy the contents of /lib/modules into an emptyDir volume using cp -rL to resolve the symlinks, then mount that emptyDir into the drbd-module-loader container as /lib/modules.

But that is dirty, and modifying entry.sh to install the needed package dynamically would probably be better, since the copy would be ~600MB, at least on my machine.

We could also add logic to that init container to determine which specific build directory to copy, as that would be significantly smaller, but it would be more complex.

# du -sLh /lib/modules
587M    /lib/modules
# du -shL /lib/modules/6.12.43+deb13-amd64/build/
14M     /lib/modules/6.12.43+deb13-amd64/build/
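
The copy step described above could look roughly like the sketch below. `stage_build_dir` is a hypothetical helper. Note that absolute symlinks resolve against the root of whichever container runs the copy, so the host directories would have to be visible at their original paths (or the copy run under chroot) for `cp -L` to find the targets.

```shell
#!/bin/sh
# stage_build_dir SRC DEST: copy the contents of SRC into DEST, replacing
# every symlink with the file or directory it points to (cp -L), so that
# DEST is self-contained and can be mounted without the host's /usr.
stage_build_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # "$src/." copies the contents of SRC rather than SRC itself.
    # cp exits non-zero if a symlink inside SRC cannot be resolved.
    cp -rL "$src/." "$dest/"
}
```

For the smaller variant, the call might be `stage_build_dir "/lib/modules/$(uname -r)/build" /staging/build`, where /staging is an assumed emptyDir mount point.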

@WanzenBug
Member

Perhaps this is overkill, and super super hacky, but could we instead "merge" the host /usr directory with the one in the container by using overlayfs 🤔

We should be able to create our own temporary mounts: we already have permissions to insert kernel modules, so mounts should not be any worse from the permission perspective.

We would probably need to experiment with the order. I guess having the host directory as the lowerdir, the container as the upperdir, and an empty workdir should work.

@WanzenBug
Member

So this is a bit insane, but if we reconfigure the drbd-module-loader to mount /usr from the host to /usr-host in the container, and then update/replace the command to

#!/bin/sh
set -e
mount -t overlay overlay -olowerdir=/usr:/usr-host /usr
exec /entry.sh

We basically merge the whole /usr from the container on top of /usr from the host. This guarantees that all the build tools exist, since they are all part of the container's /usr. It also ensures that all the kernel sources, scripts and tools are present, because they get merged in from the host.

I'll have to think a bit more about whether this is actually a workable solution.
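
The order in the mount command above matters: when lowerdir lists several directories, overlayfs treats the leftmost one as the top layer, so a file present in both trees is served from the container's /usr. That lookup rule can be mimicked in plain shell without mounting anything. `resolve_in_layers` is a hypothetical helper for illustration only; it ignores directory merging and whiteouts.

```shell
#!/bin/sh
# resolve_in_layers RELPATH LAYER...: print the first layer, left to
# right, that contains RELPATH. This mirrors how overlayfs picks a file
# when lowerdir=LAYER1:LAYER2 lists several directories.
resolve_in_layers() {
    rel=$1
    shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ] || [ -L "$layer/$rel" ]; then
            echo "$layer/$rel"
            return 0
        fi
    done
    return 1
}
```

With the layers given as `/usr /usr-host`, a compiler binary would resolve to the container's copy, while a kernel header tree that exists only on the host would resolve under /usr-host.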

@mhkarimi1383

Hi

I have manually applied your changes to the satellite on one of my nodes, and here is the result:

Need a git checkout to regenerate drbd/.drbd_git_revision
make[1]: Entering directory '/tmp/pkg/drbd-9.2.15/drbd'

    Calling toplevel makefile of kernel source tree, which I believe is in
    KDIR=/lib/modules/6.12.48+deb13-amd64/build

make -C /lib/modules/6.12.48+deb13-amd64/build    "PRE_CFLAGS=" M=/tmp/pkg/drbd-9.2.15/drbd obj-m=dummy-for-compat.o dummy-for-compat-h.o
/bin/bash: line 1: x86_64-linux-gnu-gcc-14: command not found
/tmp/pkg/drbd-9.2.15/drbd/Kbuild:100: *** compiler not present or misbehaving.  Stop.
make[2]: *** [/usr/src/linux-headers-6.12.48+deb13-common/Makefile:1970: /tmp/pkg/drbd-9.2.15/drbd] Error 2
make[1]: Leaving directory '/tmp/pkg/drbd-9.2.15/drbd'
make[1]: *** [Makefile:236: compat.h] Error 2
make: *** [Makefile:131: module] Error 2

Could not find the expexted *.ko, see stderr for more details

On the other nodes, I'm getting the same error.

@WanzenBug
Member

Please try the following configuration:

apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: debian-loading
spec:
  podTemplate:
    spec:
      initContainers:
      - name: drbd-module-loader
        command:
        - sh
        - -exc
        - |
          mount -t overlay overlay -olowerdir=/usr:/usr-host /usr
          exec /entry.sh
        securityContext:
          privileged: true
        volumeMounts:
        - name: usr
          mountPath: /usr-host
          readOnly: true
      volumes:
      - name: usr
        hostPath:
          path: /usr
          type: Directory

@mhkarimi1383

mhkarimi1383 commented Oct 9, 2025

Unfortunately, I had to set up the cluster quickly, so I have switched back to Debian Bookworm.

@mhkarimi1383

@WanzenBug
Hi
Any progress on Debian Trixie Support?

@WanzenBug
Member

Can you confirm that it works with the above patch applied?

@mhkarimi1383

I will try it in my R&D cluster

@cometship
Contributor

I can confirm that this works 🎉


4 participants