OpenZFS on Linux and FreeBSD
Go to file
Rob Norris e7966e581a zio_flush: propagate flush errors to the ZIL
Since the beginning, ZFS' "flush" operation has always ignored
errors[1]. Write errors are captured and dealt with, but if a write
succeeds but the subsequent flush fails, the operation as a whole will
appear to succeed[2].

In the end-of-transaction uberblock+label write+flush ceremony, it's
very difficult for this situation to occur. Since all devices are
written to, typically the first write will succeed, the first flush will
fail unobserved, but then the second write will fail, and the entire
transaction is aborted. It's difficult to imagine a real-world scenario
where all the writes in that sequence could succeed even as the flushes
are failing (understanding that the OS is still seeing hardware problems
and taking devices offline).

In the ZIL however, it's another story. Since only the write response is
checked, if that write succeeds but the flush then fails, the ZIL will
believe that it succeeds, and zil_commit() (and thus fsync()) will
return success rather than the "correct" behaviour of falling back into
txg_wait_synced()[3].

This commit fixes this by adding a simple flag to zio_flush() to
indicate whether or not the caller wants to receive flush errors. This
flag is enabled for ZIL calls. The existing zio chaining inside the ZIL
and the flush handler zil_lwb_flush_vdevs_done() already has all the
necessary support to properly handle a flush failure and fail the entire
zio chain. This causes zil_commit() to correct fall back to
txg_wait_synced() rather than returning success prematurely.

1. The ZFS birth commit (illumos/illumos-gate@fa9e4066f0) had support
   for flushing devices with write caches with the DKIOCFLUSHWRITECACHE
   ioctl. No errors are checked. The comment in `zil_flush_vdevs()` from
   from the time shows the thinking:

   /*
    * Wait for all the flushes to complete.  Not all devices actually
    * support the DKIOCFLUSHWRITECACHE ioctl, so it's OK if it fails.
    */

2. It's not entirely clear from the code history why this was acceptable
   for devices that _do_ have write caches. Our best guess is that this
   was an oversight: between the combination of hardware, pool topology
   and application behaviour required to hit this, it basically didn't
   come up.

3. Somewhat frustratingly, zil.c contains comments describing this exact
   behaviour, and further discussion in #12443 (September 2021). It
   appears that those involved saw the potential, but were looking at a
   different problem and so didn't have the context to recognise it for
   what it was.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
2024-08-17 14:42:45 +10:00
.github Github workflow: fix typo in `zloop` artifact 2024-08-09 16:49:19 -07:00
cmd ddt: dedup log 2024-08-16 12:03:35 -07:00
config Linux 6.11: avoid passing "end" sentinel to register_sysctl() 2024-08-13 17:47:22 -07:00
contrib Updating bash completion build file 2024-08-08 15:39:25 -07:00
etc etc/init.d: decide which variant to use at build time. 2024-04-08 16:52:24 -07:00
include zio_flush: propagate flush errors to the ZIL 2024-08-17 14:42:45 +10:00
lib ddt: dedup log 2024-08-16 12:03:35 -07:00
man Enable L2 cache of all (MRU+MFU) metadata but MFU data only 2024-08-16 13:34:07 -07:00
module zio_flush: propagate flush errors to the ZIL 2024-08-17 14:42:45 +10:00
rpm Linux 6.10 compat: fix rpm-kmod and builtin 2024-08-15 14:00:18 -07:00
scripts disable automatic dependency tracking for dkms builds 2024-06-13 18:08:49 -07:00
tests zts: test for correct fsync() response to ZIL flush failure 2024-08-17 14:32:32 +10:00
udev udev: correctly handle partition #16 and later 2024-03-21 16:38:24 -07:00
.cirrus.yml CI: add FreeBSD build with Cirrus CI 2023-10-06 08:50:26 -07:00
.editorconfig Add an .editorconfig; document git whitespace settings 2020-01-27 13:32:52 -08:00
.gitignore Packaging: Auto-generate changelog during configure (#15528) 2023-11-16 08:58:47 -08:00
.gitmodules .gitmodules: link to openzfs github repository 2021-04-12 09:37:23 -07:00
.mailmap AUTHORS: refresh with recent new contributors (#16362) 2024-07-23 11:47:04 -07:00
AUTHORS AUTHORS: refresh with recent new contributors (#16362) 2024-07-23 11:47:04 -07:00
CODE_OF_CONDUCT.md Documentation corrections 2022-12-22 11:34:28 -08:00
COPYRIGHT Fix typos 2020-06-09 21:24:09 -07:00
LICENSE Update build system and packaging 2018-05-29 16:00:33 -07:00
META Linux 6.9 compat: META (#16358) 2024-07-16 16:27:54 -07:00
Makefile.am Process `script` directory for all configs 2022-10-27 16:45:14 -07:00
NEWS Fix NEWS file 2020-08-26 21:44:41 -07:00
NOTICE Update build system and packaging 2018-05-29 16:00:33 -07:00
README.md FreeBSD: remove support for FreeBSD < 13.0-RELEASE (#16372) 2024-08-05 16:56:45 -07:00
RELEASES.md Add RELEASES.md file 2021-04-02 16:33:40 -07:00
TEST Remove CI builder customization from TEST 2020-03-16 10:46:03 -07:00
autogen.sh Ubuntu 22.04 integration: ShellCheck 2022-11-18 11:24:48 -08:00
configure.ac Packaging: Auto-generate changelog during configure (#15528) 2023-11-16 08:58:47 -08:00
copy-builtin copy-builtin: add hooks with sed/>> 2022-05-10 10:17:43 -07:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

README.md

img

OpenZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the OpenZFS community. This repository contains the code for running OpenZFS on Linux and FreeBSD.

codecov coverity

Official Resources

Installation

Full documentation for installing OpenZFS on your favorite operating system can be found at the Getting Started Page.

Contribute & Develop

We have a separate document with contribution guidelines.

We have a Code of Conduct.

Release

OpenZFS is released under a CDDL license. For more details see the NOTICE, LICENSE and COPYRIGHT files; UCRL-CODE-235197

Supported Kernels

  • The META file contains the officially recognized supported Linux kernel versions.
  • Supported FreeBSD versions are any supported branches and releases starting from 13.0-RELEASE.