Since the beginning, ZFS' "flush" operation has always ignored
errors[1]. Write errors are captured and dealt with, but if a write
succeeds and the subsequent flush fails, the operation as a whole will
still appear to succeed[2].
In the end-of-transaction uberblock+label write+flush ceremony, it's
very difficult for this situation to occur. Since every device is
written to more than once, a failing device will typically accept the
first write, fail its first flush unobserved, and then fail the second
write, at which point the entire transaction is aborted. It's difficult
to imagine a real-world scenario where all the writes in that sequence
succeed even as the flushes are failing (especially given that the OS
would also be seeing the hardware problems and taking devices offline).
In the ZIL, however, it's another story. Since only the write response
is checked, if that write succeeds but the flush then fails, the ZIL
will believe the write succeeded, and zil_commit() (and thus fsync())
will return success rather than falling back into txg_wait_synced()[3],
which would be the "correct" behaviour.
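To make the failure mode concrete, here's a minimal userland sketch
(not ZFS code; all of the fake_* names are invented for illustration)
of a commit path that checks only the write status and throws the
flush result away:

    #include <stdio.h>
    #include <stddef.h>
    #include <errno.h>

    /* Simulated device: the write lands in the cache, the flush fails. */
    static int fake_dev_write(const void *buf, size_t len)
    {
            (void) buf;
            (void) len;
            return (0);
    }

    static int fake_dev_flush(void)
    {
            return (EIO);
    }

    /*
     * Model of the pre-fix commit path: the flush is issued but its
     * result is discarded, so the caller is told the record is durable
     * even though it may only be in the device's volatile cache.
     */
    static int fake_log_commit(const void *rec, size_t len)
    {
            int err = fake_dev_write(rec, len);
            (void) fake_dev_flush();        /* error silently dropped */
            return (err);                   /* 0: "success" */
    }

    int main(void)
    {
            char rec[8] = { 0 };
            printf("commit returned %d\n", fake_log_commit(rec, sizeof (rec)));
            return (0);
    }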
This commit fixes this by adding a simple flag to zio_flush() to
indicate whether or not the caller wants to receive flush errors. This
flag is enabled for ZIL calls. The existing zio chaining inside the ZIL
and the flush handler zil_lwb_flush_vdevs_done() already have all the
necessary support to properly handle a flush failure and fail the
entire zio chain. This causes zil_commit() to correctly fall back to
txg_wait_synced() rather than returning success prematurely.
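As a rough sketch of the shape of that change (not the actual patch;
the fake_* types and names below are invented for illustration), the
flag simply decides whether the flush error is folded into the parent,
so that the chain's completion handler sees the failure:

    #include <stdbool.h>
    #include <stdio.h>
    #include <errno.h>

    struct fake_zio {
            int io_error;   /* first error seen anywhere in the chain */
    };

    /* Simulated device flush that fails. */
    static int fake_vdev_flush(void)
    {
            return (EIO);
    }

    /*
     * Issue a flush on behalf of the parent. With propagate false the
     * error is dropped (the historical behaviour); with propagate true
     * it is folded into the parent, so the completion handler sees the
     * failure and the commit can fall back to a full transaction sync.
     */
    static void fake_zio_flush(struct fake_zio *pio, bool propagate)
    {
            int err = fake_vdev_flush();
            if (propagate && err != 0 && pio->io_error == 0)
                    pio->io_error = err;
    }

    int main(void)
    {
            struct fake_zio old_way = { 0 }, new_way = { 0 };
            fake_zio_flush(&old_way, false);  /* historical: error hidden */
            fake_zio_flush(&new_way, true);   /* ZIL after this change */
            printf("old: io_error=%d\n", old_way.io_error);
            printf("new: io_error=%d\n", new_way.io_error);
            return (0);
    }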
1. The ZFS birth commit (illumos/illumos-gate@fa9e4066f0) had support
for flushing devices with write caches via the DKIOCFLUSHWRITECACHE
ioctl. No errors were checked. The comment in `zil_flush_vdevs()` from
the time shows the thinking:
    /*
     * Wait for all the flushes to complete. Not all devices actually
     * support the DKIOCFLUSHWRITECACHE ioctl, so it's OK if it fails.
     */
2. It's not entirely clear from the code history why this was acceptable
for devices that _do_ have write caches. Our best guess is that this
was an oversight: the combination of hardware, pool topology and
application behaviour required to hit this meant it basically never
came up.
3. Somewhat frustratingly, zil.c contains comments describing this exact
behaviour, and it was discussed further in #12443 (September 2021). It
appears that those involved saw the potential, but were looking at a
different problem and so didn't have the context to recognise it for
what it was.
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>