zfs/include
Rob Norris e35088c93f ZIO: add "vdev tracing" facility; use it for ZIL flushing
A problem with zio_flush() is that it issues a flush ZIO to a top-level
vdev, which then recursively issues child flush ZIOs until the real leaf
devices are flushed. As usual, an error in a child ZIO results in the
parent ZIO erroring too, so if a leaf device has failed, its flush ZIO
will fail, and so will the entire flush operation.

This didn't matter when we used to ignore flush errors, but now that we
propagate them, the flush error propagates into the ZIL write ZIO. This
causes the ZIL to believe its write failed, and fall back to a full txg
wait. This still provides correct behaviour for zil_commit() callers
(e.g. fsync()), but it ruins performance.

We cannot simply skip flushing failed vdevs, because the associated
write may have succeeded before the vdev failed, which would give the
appearance that the write is fully flushed when it is not. Neither can
we issue a "syncing write" to the device (e.g. SCSI FUA), as this also
degrades performance.

The answer is that we must bind writes and flushes together in a way
such that we only flush the physical devices that we wrote to.

This adds a "vdev tracing" facility to ZIOs, zio_vdev_trace. When
enabled on a ZIO with ZIO_FLAG_VDEV_TRACE, upon successful completion
(in the _done handler), zio->io_vdev_trace_tree will hold
zio_vdev_trace_t objects, each describing a vdev that was involved in
the successful completion of the ZIO.

A companion function, zio_vdev_trace_flush(), is included; it issues a
flush ZIO to the child vdevs on the given trace tree.
zil_lwb_write_done() is updated to use this to bind ZIL writes and
flushes together.
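
The idea of binding a write to its flush can be illustrated with a
small standalone model. This is only a sketch under stated assumptions,
not the code in this change: trace_set_t, trace_set_add(),
traced_write(), trace_set_flush() and count_flush() are hypothetical
names, vdevs are reduced to plain integer ids, and a fixed-size sorted
array stands in for the trace tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	TRACE_MAX	16	/* hypothetical cap, for this sketch only */

/* Hypothetical stand-in for a trace tree: sorted, unique vdev ids. */
typedef struct trace_set {
	uint64_t ts_vdev[TRACE_MAX];
	size_t ts_count;
} trace_set_t;

/* Record one vdev id, keeping the array sorted and free of duplicates. */
static bool
trace_set_add(trace_set_t *ts, uint64_t vdev)
{
	size_t i = 0;

	assert(ts != NULL);
	while (i < ts->ts_count && ts->ts_vdev[i] < vdev)
		i++;
	if (i < ts->ts_count && ts->ts_vdev[i] == vdev)
		return (true);		/* already traced */
	if (ts->ts_count == TRACE_MAX)
		return (false);		/* sketch limitation: set is full */
	for (size_t j = ts->ts_count; j > i; j--)
		ts->ts_vdev[j] = ts->ts_vdev[j - 1];
	ts->ts_vdev[i] = vdev;
	ts->ts_count++;
	return (true);
}

/* Model of a traced write: only children that succeeded are recorded. */
static void
traced_write(trace_set_t *ts, const uint64_t *vdevs, const bool *ok,
    size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (ok[i])
			(void) trace_set_add(ts, vdevs[i]);
	}
}

typedef int (*flush_fn_t)(uint64_t vdev, void *arg);

/* Flush exactly the traced vdevs; return the first error seen, if any. */
static int
trace_set_flush(const trace_set_t *ts, flush_fn_t fn, void *arg)
{
	int err = 0;

	for (size_t i = 0; i < ts->ts_count; i++) {
		int e = fn(ts->ts_vdev[i], arg);
		if (e != 0 && err == 0)
			err = e;
	}
	return (err);
}

/* Example callback: counts flushes instead of touching a device. */
static int
count_flush(uint64_t vdev, void *arg)
{
	(void) vdev;
	(*(int *)arg)++;
	return (0);
}
```

In this model a vdev whose write failed never enters the set, so it is
never flushed and cannot fail the flush; a vdev that was written
successfully is always flushed, even if it fails afterwards.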

The tracing facility is similar in many ways to the "deferred flushing"
facility inside the ZIL, to the point where it can replace it. Now, if
the flush should be deferred, the trace records from the writing ZIO are
captured and combined with any captured from previous writes. When it's
finally time to issue the flush, we issue it to the entire accumulated
set of traced vdevs.
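
Accumulating trace records across deferred writes amounts to a set
union. A minimal sketch, assuming each write's records are available as
a sorted array of unique integer vdev ids; trace_merge() is a
hypothetical helper written for this illustration, not a function from
this change:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Merge two sorted, duplicate-free arrays of vdev ids into out, which
 * must have room for na + nb entries. Returns the number of ids written.
 * A vdev present in both inputs appears once, so it is flushed once.
 */
static size_t
trace_merge(const uint64_t *a, size_t na, const uint64_t *b, size_t nb,
    uint64_t *out)
{
	size_t i = 0, j = 0, n = 0;

	assert(out != NULL);
	while (i < na && j < nb) {
		if (a[i] < b[j]) {
			out[n++] = a[i++];
		} else if (b[j] < a[i]) {
			out[n++] = b[j++];
		} else {
			out[n++] = a[i++];	/* same vdev in both sets */
			j++;
		}
	}
	while (i < na)
		out[n++] = a[i++];
	while (j < nb)
		out[n++] = b[j++];
	return (n);
}
```

The merged array then plays the role of the accumulated set that the
eventual flush is issued to.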

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-17 14:43:36 +10:00
..
os Linux 6.11: add compat macro for page_mapping() 2024-08-13 17:47:18 -07:00
sys ZIO: add "vdev tracing" facility; use it for ZIL flushing 2024-08-17 14:43:36 +10:00
.gitignore OpenZFS restructuring - move platform specific sources 2019-09-06 11:26:26 -07:00
Makefile.am ddt: split internal DDT API into separate header 2024-02-15 11:45:15 -08:00
cityhash.h libzfs: convert to -fvisibility=hidden 2021-06-03 13:17:55 -07:00
libnvpair.h nvpair: Constify string functions 2023-03-14 15:25:50 -07:00
libuutil.h Cleanup: Remove unused uu_pname code 2022-09-19 17:33:52 -07:00
libuutil_common.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
libuutil_impl.h libuutil: deobfuscate internal pointers 2022-11-03 09:57:05 -07:00
libzdb.h libzdb: Initial breakout of libzdb 2024-02-05 10:00:41 -08:00
libzfs.h libzfs.h: Set ZFS_MAXPROPLEN and ZPOOL_MAXPROPLEN to ZAP_MAXVALUELEN 2024-08-08 15:23:40 -07:00
libzfs_core.h ddt: add support for prefetching tables into the ARC 2024-07-26 09:16:18 -07:00
libzfsbootenv.h lib{efi,avl,share,tpool,zfs_core,zfsbootenv,zutil}: -fvisibility=hidden 2021-06-09 17:04:32 -07:00
libzutil.h Parallel pool import 2024-04-22 09:42:38 -07:00
thread_pool.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfeature_common.h ddt: add FDT feature and support for legacy and new on-disk formats 2024-08-16 11:58:59 -07:00
zfs_comutil.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_deleg.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_fletcher.h Drop lying to the compiler in the fletcher4 code 2023-03-24 10:29:19 -07:00
zfs_namecheck.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
zfs_prop.h Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00