zfs/module
Rob Norris e35088c93f ZIO: add "vdev tracing" facility; use it for ZIL flushing
A problem with zio_flush() is that it issues a flush ZIO to a top-level
vdev, which then recursively issues child flush ZIOs until the real leaf
devices are flushed. As usual, an error in a child ZIO results in the
parent ZIO erroring too, so if a leaf device has failed, it's flush ZIO
will fail, and so will the entire flush operation.

This didn't matter when we used to ignore flush errors, but now that we
propagate them, the flush error propagates into the ZIL write ZIO. This
causes the ZIL to believe its write failed, and fall back to a full txg
wait. This still provides correct behaviour for zil_commit() callers (eg
fsync()) but it ruins performance.

We cannot simply skip flushing failed vdevs, because the associated
write may have succeeded before the vdev failed, which would give the
appearance the write is fully flushed when it is not. Neither can we
issue a "syncing write" to the device (eg SCSI FUA), as this also
degrades performance.

The answer is that we must bind writes and flushes together in a way
such that we only flush the physical devices that we wrote to.

This adds a "vdev tracing" facility to ZIOs, zio_vdev_trace. When
enabled on a ZIO with ZIO_FLAG_VDEV_TRACE, then upon successful
completion (in the _done handler), zio->io_vdev_trace_tree will have a
list of zio_vdev_trace_t objects that each describe a vdev that was
involved in the successful completion of the ZIO.

A companion function, zio_vdev_trace_flush(), is included, that issues a
flush ZIO to the child vdevs on the given trace tree.
zil_lwb_write_done() is updated to use this to bind ZIL writes and
flushes together.

The tracing facility is similar in many ways to the "deferred flushing"
facility inside the ZIL, to the point where it can replace it. Now, if
the flush should be deferred, the trace records from the writing ZIO are
captured and combined with any captured from previous writes. When its
finally time to issue the flush, we issue it to the entire accumulated
set of traced vdevs.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-17 14:43:36 +10:00
..
avl Suppress Clang Static Analyzer false positive in the AVL tree code. 2023-03-08 13:51:21 -08:00
icp icp: remove redundant FreeBSD check 2024-05-31 15:13:59 -07:00
lua LUA: Backport CVE-2020-24370's patch 2024-02-07 11:53:05 -08:00
nvpair xdr: header cleanup 2024-04-03 15:13:27 -07:00
os flush: don't report flush error when disabling flush support 2024-08-17 14:42:45 +10:00
unicode Illumos #15286: do_composition() needs sign awareness 2023-01-05 11:16:21 -08:00
zcommon ddt: add FDT feature and support for legacy and new on-disk formats 2024-08-16 11:58:59 -07:00
zfs ZIO: add "vdev tracing" facility; use it for ZIL flushing 2024-08-17 14:43:36 +10:00
zstd zstd: don't call zstd_mempool_reap if there are no buffers (#16302) 2024-07-15 14:51:37 -07:00
.gitignore FreeBSD: Ignore symlink to i386 includes 2022-08-02 16:34:23 -07:00
Kbuild.in ddt: dedup log 2024-08-16 12:03:35 -07:00
Makefile.bsd ddt: dedup log 2024-08-16 12:03:35 -07:00
Makefile.in check-zstd-symbols: also ignore __pfx_ symbols 2023-09-18 09:08:41 -07:00