zfs/module
Richard Yao 3a7c35119e
Handle unexpected errors in zil_lwb_commit() without ASSERT()
We tripped `ASSERT(error == ENOENT || error == EEXIST || error ==
EALREADY)` in `zil_lwb_commit()` at Klara when doing robustness testing
of ZIL against drive power cycles.

That assertion presumably exists because when this code was written, the
only errors expected from here were EIO, ENOENT, EEXIST and EALREADY,
with EIO having its own handling before the assertion. However, upon
doing a manual depth first search traversal of the source tree, it turns
out that a large number of unexpected errors are possible here. In
theory, EINVAL and ENOSPC can come from dnode_hold_impl(). However, most
unexpected errors originate in the block layer and come to us from
zio_wait() in various ways. One way is ->zl_get_data() -> dmu_buf_hold()
-> dbuf_read() -> zio_wait().

From vdev_disk.c on Linux alone, zio_wait() can return the unexpected
errors ENXIO, ENOTSUP, EOPNOTSUPP, ETIMEDOUT, ENOSPC, ENOLINK,
EREMOTEIO, EBADE, ENODATA, EILSEQ and ENOMEM

This was only observed after what have been likely over 1000 test
iterations, so we do not expect to reproduce this again to find out what
the error code was. However, circumstantial evidence suggests that the
error was ENXIO.

When ENXIO or any other unexpected error occurs, the `fsync()` or
equivalent operation that called zil_commit() will return success, when
in fact, dirty data has not been committed to stable storage. This is a
violation of the Single UNIX Specification.

The code should be able to handle this and any other unknown error by
calling `txg_wait_synced()`. In addition to changing the code to call
txg_wait_synced() on unexpected errors instead of returning, we modify
it to print information about unexpected errors to dmesg.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@klarasystems.com>
Sponsored-By: Wasabi Technology, Inc.
Closes #14532
2023-03-01 09:39:41 -08:00
..
avl Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
icp icp: Prevent compilers from optimizing away memset() in gcm_clear_ctx() 2023-02-28 17:28:50 -08:00
lua x86 asm: Replace .align with .balign 2023-01-24 09:04:39 -08:00
nvpair Replace dead opensolaris.org license link 2022-07-11 14:16:13 -07:00
os Prevent incorrect datasets being mounted 2023-02-27 16:49:34 -08:00
unicode Illumos #15286: do_composition() needs sign awareness 2023-01-05 11:16:21 -08:00
zcommon Configure zed's diagnosis engine with vdev properties 2023-01-23 13:14:25 -08:00
zfs Handle unexpected errors in zil_lwb_commit() without ASSERT() 2023-03-01 09:39:41 -08:00
zstd Resolve WS-2021-0184 vulnerability in zstd 2023-02-02 15:12:51 -08:00
.gitignore FreeBSD: Ignore symlink to i386 includes 2022-08-02 16:34:23 -07:00
Kbuild.in Unify Assembler files between Linux and Windows 2023-01-17 11:09:19 -08:00
Makefile.bsd Makefile.bsd: cleanup and sync with FreeBSD 2023-02-28 17:33:59 -08:00
Makefile.in autoconf: use include directives instead of recursing down lib 2022-05-10 10:18:11 -07:00