initramfs: Move UFS and SDHCI storage drivers to initramfs #2286

Open
Kavinaya99 wants to merge 4 commits into qualcomm-linux:master from Kavinaya99:configs

@Kavinaya99
Contributor

Move UFS and SDHCI storage drivers from the static kernel build to initramfs modules to improve boot initialization timing.

Contributor

@lumag left a comment

> probe ordering issues and race conditions with other subsystems on some platforms.

Which probe issues? Which race conditions? Which platforms are affected and how? Please be exact.

Also drop the template or AI prompt, describing the changes. It's pretty obvious from the commit itself. Focus on something which is not obvious - reasons, errors, affected devices.

Comment thread recipes-kernel/linux/linux-qcom-6.18/configs/bsp-additions.cfg Outdated
initramfs-module-udev \
kernel-module-governor-simpleondemand \
kernel-module-ufshcd-core \
kernel-module-ufshcd-pltfrm \
Contributor

Aren't those pulled in by module dependencies?

Contributor Author

Without adding these modules to the recipe, we are getting "unable to mount root fs" errors.

kernel-module-ufshcd-core \
kernel-module-ufshcd-pltfrm \
kernel-module-ufs-qcom \
kernel-module-sdhci-msm \
Contributor

All of the kernel modules should go into the variable MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS, which is more appropriate for the purpose. In its current form, we will get an error if any of the kernel modules are built-in.

Contributor Author

Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules, but it does not guarantee that they will be loaded during boot.
I tried using MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS and saw boot failures because the modules were not loaded.
The error was: root '/dev/disk/by-partlabel/rootfs' doesn't exist or does not contain a /dev.

Contributor

> Moving kernel modules to MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS ensures they are included in the image when built as modules

Well, no. Packages listed in MACHINE_ESSENTIAL_EXTRA_RRECOMMENDS don't get included in the initramfs-rootfs-image. The recipe overrides PACKAGE_INSTALL (on purpose), so packagegroup-core-boot doesn't get included in the image. That's why you've observed errors.

At the same time, no, you can't list modules here. The image should be generic. Also, it should not fail to build if one changes the kernel config. If you need modules, you need to have a packagegroup which would recommend necessary packages. I'd have said that you should resurrect packagegroup-qcom-boot and initramfs-qcom-image, see commit 05b73a1 ("initramfs-qcom-image: remove the recipe and packagegroup").

Also, this commit should be the first one. Otherwise the image would fail to boot between the commits that turn the drivers into modules and this one, which breaks git bisect.

@Kavinaya99
Contributor Author

> > probe ordering issues and race conditions with other subsystems on some platforms.
>
> Which probe issues? Which race conditions? Which platforms are affected and how? Please be exact.
>
> Also drop the template or AI prompt, describing the changes. It's pretty obvious from the commit itself. Focus on something which is not obvious - reasons, errors, affected devices.

Updated the commit message

Contributor

@lumag left a comment

> On QCS8300 (Monaco), the kernel experiences a race condition
> during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
> Clock Controller) all initialize at the same device_initcall
> level (level 6). This creates a dependency chain where UFS
> requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
> finish initialization before SMMU times out waiting for it.

Hmm, no. The GPU SMMU and the UFS SMMU are two different SMMU instances. So, the fact that 3da0000.iommu has not probed should not affect probing of 15000000.iommu and the UFS.

Please continue and find the actual root cause. What exactly is causing UFS not to probe?

> The race condition manifests as:
> gcc-qcs8300 100000.clock-controller: sync_state() pending
> due to 3d90000.clock-controller
> arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
> dependency
> arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
> error -110
> Kernel panic in iommu_domain_free() at gmu_core_iommu_init()
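
To make the symptoms in the quoted log easier to compare across boots, the two driver-core messages can be pulled out of a captured kernel log with a small script. This is only a sketch: the function name `summarize` and the sample text are illustrative, and it assumes each message sits on a single line (as in raw `dmesg` output, optionally with a timestamp prefix) rather than wrapped as in the commit message. It reports which devices hit the deferred probe timeout and which probes failed; it says nothing about the root cause.

```python
import re

# Optional "[   12.345678] " timestamp, then "<driver> <device>: ".
PREFIX = r"^(?:\[\s*\d+\.\d+\]\s+)?(?P<driver>\S+) (?P<device>\S+): "
TIMEOUT = re.compile(PREFIX + r"deferred probe timeout, ignoring dependency$")
PROBE_FAILED = re.compile(
    PREFIX + r"probe with driver \S+ failed with error (?P<errno>-?\d+)$"
)


def summarize(log_text):
    """Return (devices that hit the deferred probe timeout,
    {device: error code} for probes that failed)."""
    timed_out = []
    failed = {}
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = TIMEOUT.match(line)
        if match:
            timed_out.append(match.group("device"))
            continue
        match = PROBE_FAILED.match(line)
        if match:
            failed[match.group("device")] = int(match.group("errno"))
    return timed_out, failed


# Sample text: the messages quoted above, unwrapped to one line each.
SAMPLE_LOG = """\
gcc-qcs8300 100000.clock-controller: sync_state() pending due to 3d90000.clock-controller
arm-smmu 3da0000.iommu: deferred probe timeout, ignoring dependency
arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with error -110
"""

if __name__ == "__main__":
    timed_out, failed = summarize(SAMPLE_LOG)
    print("deferred probe timeout:", timed_out)
    print("failed probes:", failed)
```

A device that shows up in both results, as `3da0000.iommu` does here, timed out waiting for a supplier and then failed its probe anyway.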

@shashim-quic

shashim-quic commented May 29, 2026

> > On QCS8300 (Monaco), the kernel experiences a race condition
> > during boot where UFS storage driver, ARM-SMMU, and GPUCC (GPU
> > Clock Controller) all initialize at the same device_initcall
> > level (level 6). This creates a dependency chain where UFS
> > requires SMMU, SMMU requires GPUCC clocks, but GPUCC may not
> > finish initialization before SMMU times out waiting for it.
>
> Hmm, no. The GPU SMMU and the UFS SMMU are two different SMMU instances. So, the fact that 3da0000.iommu has not probed should not affect probing of 15000000.iommu and the UFS.
>
> Please continue and find the actual root cause. What exactly is causing UFS not to probe?

The commit message needs to be rewritten. The issue is not the GPU SMMU blocking UFS. Rather, a few drivers are built into the static kernel image while their dependencies are configured as modules. This causes repeated probe deferrals, which sometimes delay UFS bring-up and in other cases exceed the deferred probe timeout, blocking further re-probes.

A few cases identified during the last debug session:

  • SMMU configured as 'y' while GPUCC as 'm' leading to repeated probe deferral of adreno smmu
  • USB controller configured as 'y' while usb phy as 'm'

I would prefer that such modules (dependencies of drivers that are part of the static kernel image) be moved to the initramfs so that boot delays (and the deferred probe timeout) can be better managed.

> > The race condition manifests as:
> > gcc-qcs8300 100000.clock-controller: sync_state() pending
> > due to 3d90000.clock-controller
> > arm-smmu 3da0000.iommu: deferred probe timeout, ignoring
> > dependency
> > arm-smmu 3da0000.iommu: probe with driver arm-smmu failed with
> > error -110
> > Kernel panic in iommu_domain_free() at gmu_core_iommu_init()
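
The built-in versus module mismatch described in this comment can be checked mechanically against a kernel `.config`. The sketch below is illustrative only: the pair list reuses the two examples named in this PR's commit messages, the helper names are hypothetical, and the sample config is made up for the demonstration (it deliberately sets the USB PHY to `=y` to show a pair that is not flagged). A real pair list would have to come from analysing the devicetree and drivers, since these are runtime supplier relationships rather than Kconfig dependencies.

```python
import re

# (consumer, supplier) pairs taken from the examples in this PR's commit
# messages; the list itself is an assumption, not an exhaustive inventory.
DEPENDENCIES = [
    ("CONFIG_ARM_SMMU_QCOM", "CONFIG_SA_GPUCC_8775P"),
    ("CONFIG_USB_DWC3_QCOM", "CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2"),
]

CONFIG_LINE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=([ym])$")


def parse_config(text):
    """Return {symbol: 'y' or 'm'} for every symbol set to y or m."""
    values = {}
    for line in text.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            values[match.group(1)] = match.group(2)
    return values


def find_mismatches(config_text, dependencies=DEPENDENCIES):
    """List pairs where the consumer is built in (=y) but its supplier is a module (=m)."""
    values = parse_config(config_text)
    return [
        (consumer, supplier)
        for consumer, supplier in dependencies
        if values.get(consumer) == "y" and values.get(supplier) == "m"
    ]


# Made-up sample config for illustration only.
SAMPLE_CONFIG = """\
CONFIG_ARM_SMMU_QCOM=y
CONFIG_SA_GPUCC_8775P=m
CONFIG_USB_DWC3_QCOM=y
CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2=y
# CONFIG_SCSI_UFS_QCOM is not set
"""

if __name__ == "__main__":
    for consumer, supplier in find_mismatches(SAMPLE_CONFIG):
        print(f"{consumer}=y depends on {supplier}=m")
```

Each reported pair is a candidate for either building the supplier in or shipping the supplier module in the initramfs, which is the choice this thread is debating.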

@lumag
Contributor

lumag commented May 30, 2026

> SMMU configured as 'y' while GPUCC as 'm' leading to repeated probe deferral of adreno smmu

Is this verified? Doesn't fw_devlink take care of this? There should be no probe deferrals (as in the probe() callback returning -EPROBE_DEFER).

> USB controller configured as 'y' while usb phy as 'm'

Ideally fw_devlink should take care of it and prevent extra probes and extra deferrals.

Anyway, I don't see changes related to either of your points in this PR.

Currently, some drivers are compiled as part of the
static kernel image (=y in the upstream defconfig) while
their dependencies are configured as modules (=m) and
reside in rootfs. Examples include:
1. CONFIG_ARM_SMMU_QCOM=y (for GPU) depends on
   CONFIG_SA_GPUCC_8775P=m
2. CONFIG_USB_DWC3_QCOM=y depends on
   CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2=m

Due to this configuration mismatch, built-in drivers
attempt to probe before their module dependencies are
loaded from rootfs, causing repeated probe deferrals.
Intermittently, the kernel's deferred probe timeout
expires, leading to probe failures.

This probe timeout issue has cascading effects. When
critical drivers like GPU SMMU or USB controller fail
to probe due to missing dependencies, it delays the
overall boot process and affects other subsystems,
causing secondary failures in storage, camera, and
other drivers.

These modules will be packaged in ramdisk to ensure
they are available early during boot, preventing
probe timeout issues and boot delays.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>

Create initramfs-qcom-image that extends initramfs-rootfs-image
with QCOM specific boot requirements via packagegroup-qcom-boot.

This ensures boot-critical kernel modules (GPUCC, FEMTO PHY, UFS)
are included in the initramfs, allowing built-in drivers to find
their module dependencies early during boot and avoiding deferred
probe timeout issues.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>

Remove CONFIG_SCSI_UFS_QCOM=y from bsp-additions.cfg, allowing
it to default to module (=m) configuration. The module will be
loaded from initramfs during early boot.

The current kernel configuration creates a dependency mismatch
where built-in drivers (=y) depend on modules (=m) located in
rootfs. Examples include:
1. CONFIG_ARM_SMMU_QCOM=y depends on CONFIG_SA_GPUCC_8775P=m
2. CONFIG_USB_DWC3_QCOM=y depends on
   CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2=m

This mismatch causes built-in drivers to probe before their
module dependencies are available, resulting in repeated probe
deferrals. When the kernel's deferred probe timeout expires,
these drivers fail to initialize, causing boot delays and
cascading failures in dependent subsystems.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>

Remove CONFIG_SCSI_UFS_QCOM=y from bsp-additions.cfg, allowing
it to default to module (=m) configuration. The module will be
loaded from initramfs during early boot.

The current kernel configuration creates a dependency mismatch
where built-in drivers (=y) depend on modules (=m) located in
rootfs. Examples include:
1. CONFIG_ARM_SMMU_QCOM=y depends on CONFIG_SA_GPUCC_8775P=m
2. CONFIG_USB_DWC3_QCOM=y depends on
   CONFIG_PHY_QCOM_USB_SNPS_FEMTO_V2=m

This mismatch causes built-in drivers to probe before their
module dependencies are available, resulting in repeated probe
deferrals. When the kernel's deferred probe timeout expires,
these drivers fail to initialize, causing boot delays and
cascading failures in dependent subsystems.

Signed-off-by: Kavinaya S <kavinaya@qti.qualcomm.com>

@Kavinaya99
Contributor Author

> > SMMU configured as 'y' while GPUCC as 'm' leading to repeated probe deferral of adreno smmu
>
> Is this verified? Doesn't fw_devlink take care of this? There should be no probe deferrals (as in the probe() callback returning -EPROBE_DEFER).
>
> > USB controller configured as 'y' while usb phy as 'm'
>
> Ideally fw_devlink should take care of it and prevent extra probes and extra deferrals.
>
> Anyway, I don't see changes related to either of your points in this PR.

Updated the contents and commit message accordingly.

@github-actions

github-actions Bot commented Jun 1, 2026

Test run workflow

Test jobs for commit 512a9f3

qcom-distro_linux-qcom-6.18
Pass: 1 | Fail: 6 | Total: 7
qcom-distro
Pass: 2 | Fail: 7 | Total: 9
nodistro
Pass: 3 | Fail: 6 | Total: 9

@test-reporting-app

Test Results

 25 files   25 suites   2h 25m 32s ⏱️
 13 tests   9 ✅ 0 💤  4 ❌
206 runs  130 ✅ 0 💤 76 ❌

For more details on these failures, see this check.

Results for commit 512a9f3.

4 participants