Commit 0f03885
raid5: introduce MD_BROKEN
commit 57668f0 upstream.
Raid456 module had allowed to achieve failed state. It was fixed by
fb73b35 ("raid5: block failing device if raid will be failed").
This fix introduces a bug, now if raid5 fails during IO, it may result
with a hung task without completion. Faulty flag on the device is
necessary to process all requests and is checked many times, mainly in
analyze_stripe().
Allow to set faulty on drive again and set MD_BROKEN if raid is failed.
As a result, this level is allowed to achieve failed state again, but
communication with userspace (via -EBUSY status) will be preserved.
This restores possibility to fail array via #mdadm --set-faulty command
and will be fixed by additional verification on mdadm side.
Reproduction steps:
mdadm -CR imsm -e imsm -n 3 /dev/nvme[0-2]n1
mdadm -CR r5 -e imsm -l5 -n3 /dev/nvme[0-2]n1 --assume-clean
mkfs.xfs /dev/md126 -f
mount /dev/md126 /mnt/root/
fio --filename=/mnt/root/file --size=5GB --direct=1 --rw=randrw
--bs=64k --ioengine=libaio --iodepth=64 --runtime=240 --numjobs=4
--time_based --group_reporting --name=throughput-test-job
--eta-newline=1 &
echo 1 > /sys/block/nvme2n1/device/device/remove
echo 1 > /sys/block/nvme1n1/device/device/remove
[ 1475.787779] Call Trace:
[ 1475.793111] __schedule+0x2a6/0x700
[ 1475.799460] schedule+0x38/0xa0
[ 1475.805454] raid5_get_active_stripe+0x469/0x5f0 [raid456]
[ 1475.813856] ? finish_wait+0x80/0x80
[ 1475.820332] raid5_make_request+0x180/0xb40 [raid456]
[ 1475.828281] ? finish_wait+0x80/0x80
[ 1475.834727] ? finish_wait+0x80/0x80
[ 1475.841127] ? finish_wait+0x80/0x80
[ 1475.847480] md_handle_request+0x119/0x190
[ 1475.854390] md_make_request+0x8a/0x190
[ 1475.861041] generic_make_request+0xcf/0x310
[ 1475.868145] submit_bio+0x3c/0x160
[ 1475.874355] iomap_dio_submit_bio.isra.20+0x51/0x60
[ 1475.882070] iomap_dio_bio_actor+0x175/0x390
[ 1475.889149] iomap_apply+0xff/0x310
[ 1475.895447] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.902736] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.909974] iomap_dio_rw+0x2f2/0x490
[ 1475.916415] ? iomap_dio_bio_actor+0x390/0x390
[ 1475.923680] ? atime_needs_update+0x77/0xe0
[ 1475.930674] ? xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.938455] xfs_file_dio_aio_read+0x6b/0xe0 [xfs]
[ 1475.946084] xfs_file_read_iter+0xba/0xd0 [xfs]
[ 1475.953403] aio_read+0xd5/0x180
[ 1475.959395] ? _cond_resched+0x15/0x30
[ 1475.965907] io_submit_one+0x20b/0x3c0
[ 1475.972398] __x64_sys_io_submit+0xa2/0x180
[ 1475.979335] ? do_io_getevents+0x7c/0xc0
[ 1475.986009] do_syscall_64+0x5b/0x1a0
[ 1475.992419] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 1476.000255] RIP: 0033:0x7f11fc27978d
[ 1476.006631] Code: Bad RIP value.
[ 1476.073251] INFO: task fio:3877 blocked for more than 120 seconds.
Cc: stable@vger.kernel.org
Fixes: fb73b35 ("raid5: block failing device if raid will be failed")
Reviewd-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>1 parent 8df42bc commit 0f03885
1 file changed
+22
-25
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
686 | 686 | | |
687 | 687 | | |
688 | 688 | | |
689 | | - | |
| 689 | + | |
690 | 690 | | |
691 | | - | |
| 691 | + | |
692 | 692 | | |
693 | | - | |
694 | | - | |
| 693 | + | |
| 694 | + | |
695 | 695 | | |
696 | | - | |
697 | | - | |
698 | | - | |
699 | | - | |
| 696 | + | |
| 697 | + | |
| 698 | + | |
| 699 | + | |
700 | 700 | | |
701 | 701 | | |
702 | 702 | | |
| |||
2877 | 2877 | | |
2878 | 2878 | | |
2879 | 2879 | | |
| 2880 | + | |
| 2881 | + | |
| 2882 | + | |
2880 | 2883 | | |
| 2884 | + | |
| 2885 | + | |
| 2886 | + | |
2881 | 2887 | | |
2882 | | - | |
2883 | | - | |
2884 | | - | |
2885 | | - | |
2886 | | - | |
2887 | | - | |
| 2888 | + | |
| 2889 | + | |
2888 | 2890 | | |
2889 | | - | |
2890 | | - | |
| 2891 | + | |
| 2892 | + | |
| 2893 | + | |
| 2894 | + | |
| 2895 | + | |
| 2896 | + | |
2891 | 2897 | | |
2892 | 2898 | | |
2893 | | - | |
2894 | | - | |
2895 | | - | |
2896 | 2899 | | |
2897 | 2900 | | |
2898 | 2901 | | |
2899 | 2902 | | |
2900 | 2903 | | |
2901 | 2904 | | |
2902 | | - | |
2903 | | - | |
2904 | | - | |
2905 | | - | |
2906 | | - | |
2907 | | - | |
2908 | 2905 | | |
2909 | 2906 | | |
2910 | 2907 | | |
| |||
0 commit comments