zfs/include
Prakash Surya 900d09b285 OpenZFS 9962 - zil_commit should omit cache thrash
As a result of the changes made in 8585, it's possible for an excessive
amount of vdev flush commands to be issued under some workloads.

Specifically, when the workload consists of mostly async write activity,
interspersed with some sync write and/or fsync activity, we can end up
issuing more flush commands to the underlying storage than is actually
necessary. As a result of these flush commands, the write latency and
overall throughput of the pool can be poorly impacted (latency
increases, throughput decreases).

Currently, any time an lwb completes, the vdev(s) written to as a result
of that lwb will be issued a flush command. The intenion is so the data
written to that vdev is on stable storage, prior to communicating to any
waiting threads that their data is safe on disk.

The problem with this scheme, is that sometimes an lwb will not have any
threads waiting for it to complete. This can occur when there's async
activity that gets "converted" to sync requests, as a result of calling
the zil_async_to_sync() function via zil_commit_impl(). When this
occurs, the current code may issue many lwbs that don't have waiters
associated with them, resulting in many flush commands, potentially to
the same vdev(s).

For example, given a pool with a single vdev, and a single fsync() call
that results in 10 lwbs being written out (e.g. due to other async
writes), that will result in 10 flush commands to that single vdev (a
flush issued after each lwb write completes). Ideally, we'd only issue a
single flush command to that vdev, after all 10 lwb writes completed.

Further, and most important as it pertains to this change, since the
flush commands are often very impactful to the performance of the pool's
underlying storage, unnecessarily issuing these flush commands can
poorly impact the performance of the lwb writes themselves. Thus, we
need to avoid issuing flush commands when possible, in order to acheive
the best possible performance out of the pool's underlying storage.

This change attempts to address this problem by changing the ZIL's logic
to only issue a vdev flush command when it detects an lwb that has a
thread waiting for it to complete. When an lwb does not have threads
waiting for it, the responsibility of issuing the flush command to the
vdevs involved with that lwb's write is passed on to the "next" lwb.
It's only once a write for an lwb with waiters completes, do we issue
the vdev flush command(s). As a result, now when we issue the flush(s),
we will issue them to the vdevs involved with that specific lwb's write,
but potentially also to vdevs involved with "previous" lwb writes (i.e.
if the previous lwbs did not have waiters associated with them).

Thus, in our prior example with 10 lwbs, it's only once the last lwb
completes (which will be the lwb containing the waiter for the thread
that called fsync) will we issue the vdev flush command; all of the
other lwbs will find they have no waiters, so they'll pass the
responsibility of the flush to the "next" lwb (until reaching the last
lwb that has the waiter).

Porting Notes:
* Reconciled conflicts with the fastwrite feature.

Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9962
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/545190c6
Closes #8188
2018-12-07 11:09:42 -08:00
..
linux Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
spl Linux 4.20 compat: current_kernel_time() 2018-10-31 11:50:42 -05:00
sys OpenZFS 9962 - zil_commit should omit cache thrash 2018-12-07 11:09:42 -08:00
.gitignore Create /proc/sys/kernel/spl/gitrev with git hash 2018-10-08 21:57:02 -07:00
Makefile.am Add libzutil for libzfs or libzpool consumers 2018-11-05 11:22:33 -08:00
libnvpair.h Add JSON output support to channel programs 2018-03-19 12:40:58 -07:00
libuutil.h Correct cppcheck errors 2017-09-19 12:17:29 -07:00
libuutil_common.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
libuutil_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
libzfs.h OpenZFS 8115 - parallel zfs mount 2018-11-15 11:33:58 -08:00
libzfs_core.h Added encryption support for zfs recv -o / -x 2018-08-15 09:48:49 -07:00
libzfs_impl.h OpenZFS 8115 - parallel zfs mount 2018-11-15 11:33:58 -08:00
libzutil.h Fix libudev dependency in libzutil 2018-11-06 17:47:52 -08:00
thread_pool.h Add libtpool (thread pools) 2017-08-09 15:31:08 -07:00
zfeature_common.h Defer new resilvers until the current one ends 2018-10-18 21:06:18 -07:00
zfs_comutil.h OpenZFS 9337 - zfs get all is slow due to uncached metadata 2018-07-12 10:49:27 -07:00
zfs_deleg.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zfs_fletcher.h DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
zfs_namecheck.h OpenZFS 9330 - stack overflow when creating a deeply nested dataset 2018-07-09 13:02:50 -07:00
zfs_prop.h Add zfs module feature and property info to sysfs 2018-09-02 12:09:53 -07:00