Commit d0c138e

Author: CKI KWF Bot
Merge: iomap: sync to upstream v6.17

MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/1634

JIRA: https://issues.redhat.com/browse/RHEL-121230

Sync the iomap subsystem with upstream v6.17. Tested via fstests on
various filesystems and architectures.

Signed-off-by: Brian Foster <bfoster@redhat.com>

Approved-by: Carlos Maiolino <cmaiolino@redhat.com>
Approved-by: Andrey Albershteyn <aalbersh@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>

Parents: c722a94, ff99a2b

27 files changed, 1457 insertions(+), 1110 deletions(-)

Documentation/filesystems/iomap/design.rst

Lines changed: 4 additions & 3 deletions
@@ -167,7 +167,6 @@ structure below:
      struct dax_device *dax_dev;
      void *inline_data;
      void *private;
-     const struct iomap_folio_ops *folio_ops;
      u64 validity_cookie;
  };

@@ -246,6 +245,10 @@ The fields are as follows:
    * **IOMAP_F_PRIVATE**: Starting with this value, the upper bits can
      be set by the filesystem for its own purposes.

+   * **IOMAP_F_ANON_WRITE**: Indicates that (write) I/O does not have a target
+     block assigned to it yet and the file system will do that in the bio
+     submission handler, splitting the I/O as needed.
+
    These flags can be set by iomap itself during file operations.
    The filesystem should supply an ``->iomap_end`` function if it needs
    to observe these flags:
@@ -276,8 +279,6 @@ The fields are as follows:
    <https://lore.kernel.org/all/20180619164137.13720-7-hch@lst.de/>`_.
    This value will be passed unchanged to ``->iomap_end``.

- * ``folio_ops`` will be covered in the section on pagecache operations.
-
  * ``validity_cookie`` is a magic freshness value set by the filesystem
    that should be used to detect stale mappings.
    For pagecache operations this is critical for correct operation

Documentation/filesystems/iomap/operations.rst

Lines changed: 50 additions & 37 deletions
@@ -57,21 +57,19 @@ The following address space operations can be wrapped easily:
  * ``bmap``
  * ``swap_activate``

-``struct iomap_folio_ops``
+``struct iomap_write_ops``
 --------------------------

-The ``->iomap_begin`` function for pagecache operations may set the
-``struct iomap::folio_ops`` field to an ops structure to override
-default behaviors of iomap:
-
 .. code-block:: c

- struct iomap_folio_ops {
+ struct iomap_write_ops {
      struct folio *(*get_folio)(struct iomap_iter *iter, loff_t pos,
                                 unsigned len);
      void (*put_folio)(struct inode *inode, loff_t pos, unsigned copied,
                        struct folio *folio);
      bool (*iomap_valid)(struct inode *inode, const struct iomap *iomap);
+     int (*read_folio_range)(const struct iomap_iter *iter,
+                             struct folio *folio, loff_t pos, size_t len);
  };

 iomap calls these functions:
@@ -127,6 +125,10 @@ iomap calls these functions:
     ``->iomap_valid``, then the iomap should considered stale and the
     validation failed.

+  - ``read_folio_range``: Called to synchronously read in the range that will
+    be written to. If this function is not provided, iomap will default to
+    submitting a bio read request.
+
 These ``struct kiocb`` flags are significant for buffered I/O with iomap:

  * ``IOCB_NOWAIT``: Turns on ``IOMAP_NOWAIT``.
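
The ``read_folio_range`` text added in the hunk above describes an optional hook with a built-in fallback. The diff does not show how iomap dispatches it, so the following is only a userspace sketch of that "optional callback, else default" pattern. Every name in it (``write_ops_sketch``, ``read_range``, ``default_read_range``, ``fill_read_range``) is hypothetical and not a kernel API; the default path merely stands in for the bio read request mentioned in the documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified ops table with a single optional hook. */
struct write_ops_sketch {
    int (*read_range)(void *buf, size_t pos, size_t len);
};

static int default_reads; /* how often the fallback path ran */
static int custom_reads;  /* how often the custom hook ran */

/* Stand-in for the built-in path (the bio read request in the kernel). */
static int default_read_range(void *buf, size_t pos, size_t len)
{
    (void)pos;
    memset(buf, 0, len);
    default_reads++;
    return 0;
}

/* Example hook a caller might supply: fills the range with 'x'. */
static int fill_read_range(void *buf, size_t pos, size_t len)
{
    (void)pos;
    memset(buf, 'x', len);
    custom_reads++;
    return 0;
}

/*
 * Use the caller's hook when one is provided; otherwise fall back to
 * the default. A NULL ops table is treated like an empty one.
 */
static int read_range(const struct write_ops_sketch *ops, void *buf,
                      size_t pos, size_t len)
{
    assert(buf != NULL && len > 0);
    if (ops && ops->read_range)
        return ops->read_range(buf, pos, len);
    return default_read_range(buf, pos, len);
}
```

Passing ``NULL`` for the ops table selects the default path in this sketch. That is presumably also what the extra ``NULL`` argument in the ``blkdev_buffered_write`` hunk of block/fops.c requests, though the diff itself does not confirm this.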
@@ -269,7 +271,7 @@ writeback.
 It does not lock ``i_rwsem`` or ``invalidate_lock``.

 The dirty bit will be cleared for all folios run through the
-``->map_blocks`` machinery described below even if the writeback fails.
+``->writeback_range`` machinery described below even if the writeback fails.
 This is to prevent dirty folio clots when storage devices fail; an
 ``-EIO`` is recorded for userspace to collect via ``fsync``.

@@ -281,15 +283,14 @@ The ``ops`` structure must be specified and is as follows:
 .. code-block:: c

  struct iomap_writeback_ops {
-     int (*map_blocks)(struct iomap_writepage_ctx *wpc, struct inode *inode,
-                       loff_t offset, unsigned len);
-     int (*prepare_ioend)(struct iomap_ioend *ioend, int status);
-     void (*discard_folio)(struct folio *folio, loff_t pos);
+     int (*writeback_range)(struct iomap_writepage_ctx *wpc,
+             struct folio *folio, u64 pos, unsigned int len, u64 end_pos);
+     int (*writeback_submit)(struct iomap_writepage_ctx *wpc, int error);
  };

 The fields are as follows:

-  - ``map_blocks``: Sets ``wpc->iomap`` to the space mapping of the file
+  - ``writeback_range``: Sets ``wpc->iomap`` to the space mapping of the file
     range (in bytes) given by ``offset`` and ``len``.
     iomap calls this function for each dirty fs block in each dirty folio,
     though it will `reuse mappings
@@ -304,28 +305,26 @@ The fields are as follows:
     This revalidation must be open-coded by the filesystem; it is
     unclear if ``iomap::validity_cookie`` can be reused for this
     purpose.
-    This function must be supplied by the filesystem.
-
-  - ``prepare_ioend``: Enables filesystems to transform the writeback
-    ioend or perform any other preparatory work before the writeback I/O
-    is submitted.
-    This might include pre-write space accounting updates, or installing
-    a custom ``->bi_end_io`` function for internal purposes, such as
-    deferring the ioend completion to a workqueue to run metadata update
-    transactions from process context.
-    This function is optional.

-  - ``discard_folio``: iomap calls this function after ``->map_blocks``
-    fails to schedule I/O for any part of a dirty folio.
-    The function should throw away any reservations that may have been
-    made for the write.
+    If this methods fails to schedule I/O for any part of a dirty folio, it
+    should throw away any reservations that may have been made for the write.
     The folio will be marked clean and an ``-EIO`` recorded in the
     pagecache.
     Filesystems can use this callback to `remove
     <https://lore.kernel.org/all/20201029163313.1766967-1-bfoster@redhat.com/>`_
     delalloc reservations to avoid having delalloc reservations for
     clean pagecache.
-    This function is optional.
+    This function must be supplied by the filesystem.
+
+  - ``writeback_submit``: Submit the previous built writeback context.
+    Block based file systems should use the iomap_ioend_writeback_submit
+    helper, other file system can implement their own.
+    File systems can optionall to hook into writeback bio submission.
+    This might include pre-write space accounting updates, or installing
+    a custom ``->bi_end_io`` function for internal purposes, such as
+    deferring the ioend completion to a workqueue to run metadata update
+    transactions from process context before submitting the bio.
+    This function must be supplied by the filesystem.

 Pagecache Writeback Completion
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -339,10 +338,9 @@ If the write failed, it will also set the error bits on the folios and
 the address space.
 This can happen in interrupt or process context, depending on the
 storage device.
-
 Filesystems that need to update internal bookkeeping (e.g. unwritten
-extent conversions) should provide a ``->prepare_ioend`` function to
-set ``struct iomap_end::bio::bi_end_io`` to its own function.
+extent conversions) should set their own bi_end_io on the bios
+submitted by ``->submit_writeback``
 This function should call ``iomap_finish_ioends`` after finishing its
 own work (e.g. unwritten extent conversion).

@@ -515,18 +513,33 @@ IOMAP_WRITE`` with any combination of the following enhancements:

  * ``IOMAP_ATOMIC``: This write is being issued with torn-write
    protection.
-   Only a single bio can be created for the write, and the write must
-   not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
-   set.
+   Torn-write protection may be provided based on HW-offload or by a
+   software mechanism provided by the filesystem.
+
+   For HW-offload based support, only a single bio can be created for the
+   write, and the write must not be split into multiple I/O requests, i.e.
+   flag REQ_ATOMIC must be set.
    The file range to write must be aligned to satisfy the requirements
    of both the filesystem and the underlying block device's atomic
    commit capabilities.
    If filesystem metadata updates are required (e.g. unwritten extent
-   conversion or copy on write), all updates for the entire file range
+   conversion or copy-on-write), all updates for the entire file range
    must be committed atomically as well.
-   Only one space mapping is allowed per untorn write.
-   Untorn writes must be aligned to, and must not be longer than, a
-   single file block.
+   Untorn-writes may be longer than a single file block. In all cases,
+   the mapping start disk block must have at least the same alignment as
+   the write offset.
+   The filesystems must set IOMAP_F_ATOMIC_BIO to inform iomap core of an
+   untorn-write based on HW-offload.
+
+   For untorn-writes based on a software mechanism provided by the
+   filesystem, all the disk block alignment and single bio restrictions
+   which apply for HW-offload based untorn-writes do not apply.
+   The mechanism would typically be used as a fallback for when
+   HW-offload based untorn-writes may not be issued, e.g. the range of the
+   write covers multiple extents, meaning that it is not possible to issue
+   a single bio.
+   All filesystem metadata updates for the entire file range must be
+   committed atomically as well.

 Callers commonly hold ``i_rwsem`` in shared or exclusive mode before
 calling this function.

block/fops.c

Lines changed: 25 additions & 12 deletions
@@ -537,30 +537,42 @@ static void blkdev_readahead(struct readahead_control *rac)
     iomap_readahead(rac, &blkdev_iomap_ops);
 }

-static int blkdev_map_blocks(struct iomap_writepage_ctx *wpc,
-        struct inode *inode, loff_t offset, unsigned int len)
+static ssize_t blkdev_writeback_range(struct iomap_writepage_ctx *wpc,
+        struct folio *folio, u64 offset, unsigned int len, u64 end_pos)
 {
-    loff_t isize = i_size_read(inode);
+    loff_t isize = i_size_read(wpc->inode);

     if (WARN_ON_ONCE(offset >= isize))
         return -EIO;
-    if (offset >= wpc->iomap.offset &&
-        offset < wpc->iomap.offset + wpc->iomap.length)
-        return 0;
-    return blkdev_iomap_begin(inode, offset, isize - offset,
-                  IOMAP_WRITE, &wpc->iomap, NULL);
+
+    if (offset < wpc->iomap.offset ||
+        offset >= wpc->iomap.offset + wpc->iomap.length) {
+        int error;
+
+        error = blkdev_iomap_begin(wpc->inode, offset, isize - offset,
+                IOMAP_WRITE, &wpc->iomap, NULL);
+        if (error)
+            return error;
+    }
+
+    return iomap_add_to_ioend(wpc, folio, offset, end_pos, len);
 }

 static const struct iomap_writeback_ops blkdev_writeback_ops = {
-    .map_blocks = blkdev_map_blocks,
+    .writeback_range = blkdev_writeback_range,
+    .writeback_submit = iomap_ioend_writeback_submit,
 };

 static int blkdev_writepages(struct address_space *mapping,
         struct writeback_control *wbc)
 {
-    struct iomap_writepage_ctx wpc = { };
+    struct iomap_writepage_ctx wpc = {
+        .inode = mapping->host,
+        .wbc = wbc,
+        .ops = &blkdev_writeback_ops
+    };

-    return iomap_writepages(mapping, wbc, &wpc, &blkdev_writeback_ops);
+    return iomap_writepages(&wpc);
 }

 const struct address_space_operations def_blk_aops = {
@@ -711,7 +723,8 @@ blkdev_direct_write(struct kiocb *iocb, struct iov_iter *from)

 static ssize_t blkdev_buffered_write(struct kiocb *iocb, struct iov_iter *from)
 {
-    return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops, NULL);
+    return iomap_file_buffered_write(iocb, from, &blkdev_iomap_ops, NULL,
+            NULL);
 }

 /*