
Commit a627947

fdmanana authored and kdave committed
Btrfs: fix deadlock when allocating tree block during leaf/node split
When splitting a leaf or node from one of the trees that are modified when flushing pending block groups (extent, chunk, device and free space trees), we need to allocate a new tree block, which in turn can result in the need to allocate a new block group. After allocating the new block group we may need to flush new block groups that were previously allocated during the course of the current transaction, which is what may cause a deadlock due to attempts to write lock the same leaf or node twice, as when splitting a leaf or node we are holding a write lock on it and its parent node.

The same type of deadlock can also happen when increasing the tree's height, since we are holding a lock on the existing root while allocating the tree block to use as the new root node.

An example trace when the deadlock happens during the leaf split path is:

  [27175.293054] CPU: 0 PID: 3005 Comm: kworker/u17:6 Tainted: G W 4.19.16 #1
  [27175.293942] Hardware name: Penguin Computing Relion 1900/MD90-FS0-ZB-XX, BIOS R15 06/25/2018
  [27175.294846] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
  (...)
  [27175.298384] RSP: 0018:ffffab2087107758 EFLAGS: 00010246
  [27175.299269] RAX: 0000000000000bbd RBX: ffff9fadc7141c48 RCX: 0000000000000001
  [27175.300155] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9fadc7141c48
  [27175.301023] RBP: 0000000000000001 R08: ffff9faeb6ac1040 R09: ffff9fa9c0000000
  [27175.301887] R10: 0000000000000000 R11: 0000000000000040 R12: ffff9fb21aac8000
  [27175.302743] R13: ffff9fb1a64d6a20 R14: 0000000000000001 R15: ffff9fb1a64d6a18
  [27175.303601] FS: 0000000000000000(0000) GS:ffff9fb21fa00000(0000) knlGS:0000000000000000
  [27175.304468] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [27175.305339] CR2: 00007fdc8743ead8 CR3: 0000000763e0a006 CR4: 00000000003606f0
  [27175.306220] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [27175.307087] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [27175.307940] Call Trace:
  [27175.308802] btrfs_search_slot+0x779/0x9a0 [btrfs]
  [27175.309669] ? update_space_info+0xba/0xe0 [btrfs]
  [27175.310534] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
  [27175.311397] btrfs_insert_item+0x60/0xd0 [btrfs]
  [27175.312253] btrfs_create_pending_block_groups+0xee/0x210 [btrfs]
  [27175.313116] do_chunk_alloc+0x25f/0x300 [btrfs]
  [27175.313984] find_free_extent+0x706/0x10d0 [btrfs]
  [27175.314855] btrfs_reserve_extent+0x9b/0x1d0 [btrfs]
  [27175.315707] btrfs_alloc_tree_block+0x100/0x5b0 [btrfs]
  [27175.316548] split_leaf+0x130/0x610 [btrfs]
  [27175.317390] btrfs_search_slot+0x94d/0x9a0 [btrfs]
  [27175.318235] btrfs_insert_empty_items+0x67/0xc0 [btrfs]
  [27175.319087] alloc_reserved_file_extent+0x84/0x2c0 [btrfs]
  [27175.319938] __btrfs_run_delayed_refs+0x596/0x1150 [btrfs]
  [27175.320792] btrfs_run_delayed_refs+0xed/0x1b0 [btrfs]
  [27175.321643] delayed_ref_async_start+0x81/0x90 [btrfs]
  [27175.322491] normal_work_helper+0xd0/0x320 [btrfs]
  [27175.323328] ? move_linked_works+0x6e/0xa0
  [27175.324160] process_one_work+0x191/0x370
  [27175.324976] worker_thread+0x4f/0x3b0
  [27175.325763] kthread+0xf8/0x130
  [27175.326531] ? rescuer_thread+0x320/0x320
  [27175.327284] ? kthread_create_worker_on_cpu+0x50/0x50
  [27175.328027] ret_from_fork+0x35/0x40
  [27175.328741] ---[ end trace 300a1b9f0ac30e26 ]---

Fix this by preventing the flushing of new block groups when splitting a leaf/node and when inserting a new root node for one of the trees modified by the flushing operation, similar to what is done when COWing a node/leaf from one of these trees.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202383
Reported-by: Eli V <eliventer@gmail.com>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
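
The self-deadlock described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (block_lock, model_trans, model_split_leaf and so on) is hypothetical and not a kernel API, the lock is a plain boolean that reports a would-be deadlock instead of blocking, and every allocation is assumed to create a new block group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */

/* A non-recursive write lock on one tree block (leaf or node). */
struct block_lock {
	bool held;
};

/* Minimal stand-in for a transaction handle. */
struct model_trans {
	bool can_flush_pending_bgs;	/* plays the role of the kernel flag */
	int pending_bgs;		/* block groups not yet inserted in the trees */
	int self_deadlocks;		/* times we would have blocked on our own lock */
};

/* Returns false where a real non-recursive lock would block forever. */
static bool model_try_write_lock(struct block_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void model_write_unlock(struct block_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Flushing inserts items into the tree, so it needs the leaf's write lock. */
static void model_flush_pending_bgs(struct model_trans *trans,
				    struct block_lock *leaf)
{
	if (!model_try_write_lock(leaf)) {
		trans->self_deadlocks++;	/* the kernel would hang here */
		return;
	}
	trans->pending_bgs = 0;
	model_write_unlock(leaf);
}

/*
 * Assumption: every allocation creates a new block group, and pending
 * block groups are flushed right away when the flag allows it.
 */
static void model_alloc_tree_block(struct model_trans *trans,
				   struct block_lock *leaf)
{
	trans->pending_bgs++;
	if (trans->can_flush_pending_bgs)
		model_flush_pending_bgs(trans, leaf);
}

/*
 * Split path: the leaf's write lock is held across the allocation.
 * Returns the number of would-be self-deadlocks, or -1 if the leaf
 * was already locked on entry.
 */
static int model_split_leaf(struct model_trans *trans,
			    struct block_lock *leaf, bool defer_flush)
{
	if (!model_try_write_lock(leaf))
		return -1;
	if (defer_flush)
		trans->can_flush_pending_bgs = false;
	model_alloc_tree_block(trans, leaf);
	trans->can_flush_pending_bgs = true;
	model_write_unlock(leaf);
	return trans->self_deadlocks;
}
```

In this model, calling model_split_leaf() on a fresh transaction with defer_flush set to false returns 1, because the nested flush finds the leaf already locked by its own caller. With defer_flush set to true it returns 0, and the pending block group stays queued until model_flush_pending_bgs() runs without the lock held.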
1 parent fd340d0 commit a627947

1 file changed

Lines changed: 50 additions & 28 deletions


fs/btrfs/ctree.c

@@ -968,6 +968,48 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
 	return 0;
 }
 
+static struct extent_buffer *alloc_tree_block_no_bg_flush(
+					  struct btrfs_trans_handle *trans,
+					  struct btrfs_root *root,
+					  u64 parent_start,
+					  const struct btrfs_disk_key *disk_key,
+					  int level,
+					  u64 hint,
+					  u64 empty_size)
+{
+	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct extent_buffer *ret;
+
+	/*
+	 * If we are COWing a node/leaf from the extent, chunk, device or free
+	 * space trees, make sure that we do not finish block group creation of
+	 * pending block groups. We do this to avoid a deadlock.
+	 * COWing can result in allocation of a new chunk, and flushing pending
+	 * block groups (btrfs_create_pending_block_groups()) can be triggered
+	 * when finishing allocation of a new chunk. Creation of a pending block
+	 * group modifies the extent, chunk, device and free space trees,
+	 * therefore we could deadlock with ourselves since we are holding a
+	 * lock on an extent buffer that btrfs_create_pending_block_groups() may
+	 * try to COW later.
+	 * For similar reasons, we also need to delay flushing pending block
+	 * groups when splitting a leaf or node, from one of those trees, since
+	 * we are holding a write lock on it and its parent or when inserting a
+	 * new root node for one of those trees.
+	 */
+	if (root == fs_info->extent_root ||
+	    root == fs_info->chunk_root ||
+	    root == fs_info->dev_root ||
+	    root == fs_info->free_space_root)
+		trans->can_flush_pending_bgs = false;
+
+	ret = btrfs_alloc_tree_block(trans, root, parent_start,
+				     root->root_key.objectid, disk_key, level,
+				     hint, empty_size);
+	trans->can_flush_pending_bgs = true;
+
+	return ret;
+}
+
 /*
  * does the dirty work in cow of a single block. The parent block (if
  * supplied) is updated to point to the new cow copy. The new buffer is marked
@@ -1015,28 +1057,8 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent)
 		parent_start = parent->start;
 
-	/*
-	 * If we are COWing a node/leaf from the extent, chunk, device or free
-	 * space trees, make sure that we do not finish block group creation of
-	 * pending block groups. We do this to avoid a deadlock.
-	 * COWing can result in allocation of a new chunk, and flushing pending
-	 * block groups (btrfs_create_pending_block_groups()) can be triggered
-	 * when finishing allocation of a new chunk. Creation of a pending block
-	 * group modifies the extent, chunk, device and free space trees,
-	 * therefore we could deadlock with ourselves since we are holding a
-	 * lock on an extent buffer that btrfs_create_pending_block_groups() may
-	 * try to COW later.
-	 */
-	if (root == fs_info->extent_root ||
-	    root == fs_info->chunk_root ||
-	    root == fs_info->dev_root ||
-	    root == fs_info->free_space_root)
-		trans->can_flush_pending_bgs = false;
-
-	cow = btrfs_alloc_tree_block(trans, root, parent_start,
-			root->root_key.objectid, &disk_key, level,
-			search_start, empty_size);
-	trans->can_flush_pending_bgs = true;
+	cow = alloc_tree_block_no_bg_flush(trans, root, parent_start, &disk_key,
+					   level, search_start, empty_size);
 	if (IS_ERR(cow))
 		return PTR_ERR(cow);
 
@@ -3345,8 +3367,8 @@ static noinline int insert_new_root(struct btrfs_trans_handle *trans,
 	else
 		btrfs_node_key(lower, &lower_key, 0);
 
-	c = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid,
-				   &lower_key, level, root->node->start, 0);
+	c = alloc_tree_block_no_bg_flush(trans, root, 0, &lower_key, level,
+					 root->node->start, 0);
 	if (IS_ERR(c))
 		return PTR_ERR(c);
 
@@ -3475,8 +3497,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
 	mid = (c_nritems + 1) / 2;
 	btrfs_node_key(c, &disk_key, mid);
 
-	split = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid,
-			&disk_key, level, c->start, 0);
+	split = alloc_tree_block_no_bg_flush(trans, root, 0, &disk_key, level,
+					     c->start, 0);
 	if (IS_ERR(split))
 		return PTR_ERR(split);
 
@@ -4260,8 +4282,8 @@ static noinline int split_leaf(struct btrfs_trans_handle *trans,
 	else
 		btrfs_item_key(l, &disk_key, mid);
 
-	right = btrfs_alloc_tree_block(trans, root, 0, root->root_key.objectid,
-			&disk_key, 0, l->start, 0);
+	right = alloc_tree_block_no_bg_flush(trans, root, 0, &disk_key, 0,
+					     l->start, 0);
 	if (IS_ERR(right))
 		return PTR_ERR(right);
 
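
The wrapper introduced by this diff follows a simple pattern: clear the flag only for the trees that the flush itself modifies, perform the allocation, then set the flag again. A minimal stand-alone sketch of that pattern follows. It assumes a hypothetical enum in place of the fs_info root pointers and a stub allocator that merely records the flag value it observed. None of these names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tree identifiers standing in for the fs_info root pointers. */
enum model_tree {
	MODEL_EXTENT_TREE,
	MODEL_CHUNK_TREE,
	MODEL_DEV_TREE,
	MODEL_FREE_SPACE_TREE,
	MODEL_FS_TREE,		/* example of a tree the flush does not touch */
};

struct model_trans {
	bool can_flush_pending_bgs;
	bool flush_allowed_during_alloc;	/* what the stub allocator saw */
};

/* True for the four trees that flushing pending block groups modifies. */
static bool modified_by_bg_flush(enum model_tree tree)
{
	switch (tree) {
	case MODEL_EXTENT_TREE:
	case MODEL_CHUNK_TREE:
	case MODEL_DEV_TREE:
	case MODEL_FREE_SPACE_TREE:
		return true;
	default:
		return false;
	}
}

/* Stub allocator: only records whether a flush would have been allowed. */
static void model_alloc(struct model_trans *trans)
{
	trans->flush_allowed_during_alloc = trans->can_flush_pending_bgs;
}

/* Same shape as the wrapper in the diff: gate, allocate, re-enable. */
static void model_alloc_no_bg_flush(struct model_trans *trans,
				    enum model_tree tree)
{
	assert(trans);
	if (modified_by_bg_flush(tree))
		trans->can_flush_pending_bgs = false;
	model_alloc(trans);
	trans->can_flush_pending_bgs = true;
}
```

As in the patch, the flag is set back to true unconditionally after the allocation instead of being restored to whatever value it had before the call.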