zfs/include/sys
Prakash Surya 900d09b285 OpenZFS 9962 - zil_commit should omit cache thrash
As a result of the changes made in 8585, it's possible for an excessive
amount of vdev flush commands to be issued under some workloads.

Specifically, when the workload consists of mostly async write activity,
interspersed with some sync write and/or fsync activity, we can end up
issuing more flush commands to the underlying storage than is actually
necessary. As a result of these flush commands, the write latency and
overall throughput of the pool can be poorly impacted (latency
increases, throughput decreases).

Currently, any time an lwb completes, the vdev(s) written to as a result
of that lwb will be issued a flush command. The intenion is so the data
written to that vdev is on stable storage, prior to communicating to any
waiting threads that their data is safe on disk.

The problem with this scheme, is that sometimes an lwb will not have any
threads waiting for it to complete. This can occur when there's async
activity that gets "converted" to sync requests, as a result of calling
the zil_async_to_sync() function via zil_commit_impl(). When this
occurs, the current code may issue many lwbs that don't have waiters
associated with them, resulting in many flush commands, potentially to
the same vdev(s).

For example, given a pool with a single vdev, and a single fsync() call
that results in 10 lwbs being written out (e.g. due to other async
writes), that will result in 10 flush commands to that single vdev (a
flush issued after each lwb write completes). Ideally, we'd only issue a
single flush command to that vdev, after all 10 lwb writes completed.

Further, and most important as it pertains to this change, since the
flush commands are often very impactful to the performance of the pool's
underlying storage, unnecessarily issuing these flush commands can
poorly impact the performance of the lwb writes themselves. Thus, we
need to avoid issuing flush commands when possible, in order to acheive
the best possible performance out of the pool's underlying storage.

This change attempts to address this problem by changing the ZIL's logic
to only issue a vdev flush command when it detects an lwb that has a
thread waiting for it to complete. When an lwb does not have threads
waiting for it, the responsibility of issuing the flush command to the
vdevs involved with that lwb's write is passed on to the "next" lwb.
It's only once a write for an lwb with waiters completes, do we issue
the vdev flush command(s). As a result, now when we issue the flush(s),
we will issue them to the vdevs involved with that specific lwb's write,
but potentially also to vdevs involved with "previous" lwb writes (i.e.
if the previous lwbs did not have waiters associated with them).

Thus, in our prior example with 10 lwbs, it's only once the last lwb
completes (which will be the lwb containing the waiter for the thread
that called fsync) will we issue the vdev flush command; all of the
other lwbs will find they have no waiters, so they'll pass the
responsibility of the flush to the "next" lwb (until reaching the last
lwb that has the waiter).

Porting Notes:
* Reconciled conflicts with the fastwrite feature.

Authored by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matt Ahrens <matt@delphix.com>
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Ported-by: Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

OpenZFS-issue: https://www.illumos.org/issues/9962
OpenZFS-commit: https://github.com/openzfs/openzfs/commit/545190c6
Closes #8188
2018-12-07 11:09:42 -08:00
..
crypto Add support for selecting encryption backend 2018-08-02 11:59:24 -07:00
fm Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
fs zed: detect and offline physically removed devices 2018-11-09 11:17:24 -08:00
lua Fix coverity defects: zfs channel programs 2018-02-20 11:19:42 -08:00
sysevent OpenZFS 8959 - Add notifications when a scrub is paused or resumed 2018-01-17 10:31:00 -08:00
Makefile.am Refactor dmu_recv into its own file 2018-10-09 14:05:13 -07:00
abd.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
aggsum.h OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
arc.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
arc_impl.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
avl.h Remove dead code from AVL tree 2017-10-05 19:28:00 -07:00
avl_impl.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
blkptr.h OpenZFS 8067 - zdb should be able to dump literal embedded block pointer 2017-07-07 11:28:01 -07:00
bplist.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
bpobj.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
bptree.h Illumos 4914 - zfs on-disk bookmark structure should be named *_phys_t 2014-08-06 14:48:41 -07:00
bqueue.h Illumos 5960, 5925 2016-01-08 15:08:19 -08:00
cityhash.h OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
dataset_kstats.h Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
dbuf.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
ddt.h Incorrect maximum DVA value in DDE_GET_NDVAS() 2018-02-26 14:20:12 -08:00
dmu.h OpenZFS 9689 - zfs range lock code should not be zpl-specific 2018-10-11 10:19:33 -07:00
dmu_impl.h Fix race in dnode_check_slots_free() 2018-04-10 11:15:05 -07:00
dmu_objset.h Pool allocation classes 2018-09-05 18:33:36 -07:00
dmu_recv.h Refactor dmu_recv into its own file 2018-10-09 14:05:13 -07:00
dmu_send.h Refactor dmu_recv into its own file 2018-10-09 14:05:13 -07:00
dmu_traverse.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
dmu_tx.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
dmu_zfetch.h OpenZFS 6322 - ZFS indirect block predictive prefetch 2016-08-30 14:26:55 -07:00
dnode.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
dsl_bookmark.h Refactor dmu_recv into its own file 2018-10-09 14:05:13 -07:00
dsl_crypt.h Refcounted DSL Crypto Key Mappings 2018-10-03 09:47:11 -07:00
dsl_dataset.h Add types to featureflags in zfs 2018-10-16 11:15:04 -07:00
dsl_deadlist.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_deleg.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_destroy.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
dsl_dir.h Remove duplicate macro in dsl_dir.h 2018-10-01 10:40:11 -07:00
dsl_pool.h OpenZFS 9617 - too-frequent TXG sync causes excessive write inflation 2018-10-04 13:13:28 -07:00
dsl_prop.h Illumos 6171 - dsl_prop_unregister() slows down dataset eviction. 2016-01-12 10:53:12 -08:00
dsl_scan.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
dsl_synctask.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
dsl_userhold.h Illumos #3740 2013-11-04 11:17:48 -08:00
edonr.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
efi_partition.h Fix spelling 2017-01-03 11:31:18 -06:00
frame.h Suppress incorrect objtool warnings 2017-12-07 10:28:50 -08:00
hkdf.h Encryption patch follow-up 2017-10-11 16:54:48 -04:00
metaslab.h Pool allocation classes 2018-09-05 18:33:36 -07:00
metaslab_impl.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
mmp.h Record skipped MMP writes in multihost_history 2018-03-06 15:15:15 -08:00
mntent.h Make zfs mount according to relatime config in dataset 2016-04-05 18:55:59 -07:00
multilist.h OpenZFS 7968 - multi-threaded spa_sync() 2017-03-20 18:36:00 -07:00
note.h Update build system and packaging 2018-05-29 16:00:33 -07:00
nvpair.h Add new fnvlist_lookup_* functions 2018-10-03 15:30:55 -07:00
nvpair_impl.h OpenZFS 9580 - Add a hash-table on top of nvlist to speed-up operations 2018-07-30 11:30:03 -07:00
pathname.h Add pn_alloc()/pn_free() functions 2016-04-21 09:49:25 -07:00
policy.h Add `zfs allow` and `zfs unallow` support 2016-06-07 09:16:52 -07:00
range_tree.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
refcount.h Add zfs_refcount_transfer_ownership_many() 2018-10-09 10:05:48 -07:00
rrwlock.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
sa_impl.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
sdt.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
sha2.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
skein.h OpenZFS 4185 - add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R 2016-10-03 14:51:15 -07:00
spa.h Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
spa_boot.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
spa_checkpoint.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
spa_checksum.h Implementation of AVX2 optimized Fletcher-4 2016-06-02 14:30:51 -07:00
spa_impl.h Defer new resilvers until the current one ends 2018-10-18 21:06:18 -07:00
space_map.h OpenZFS 9238 - ZFS Spacemap Encoding V2 2018-07-05 12:02:34 -07:00
space_reftree.h Illumos #4101, #4102, #4103, #4105, #4106 2014-07-22 09:39:16 -07:00
sysevent.h OpenZFS 6939 - add sysevents to zfs core for commands 2017-07-12 21:28:13 -07:00
trace.h Remove duplicate typedefs from trace.h 2015-01-06 16:53:24 -08:00
trace_acl.h Linux 4.16 compat: inode_set_iversion() 2018-02-08 21:25:19 -08:00
trace_arc.h Support re-prioritizing asynchronous prefetches 2017-12-21 09:13:06 -08:00
trace_common.h OpenZFS 6531 - Provide mechanism to artificially limit disk performance 2016-05-26 10:11:51 -07:00
trace_dbgmsg.h Add line info and SET_ERROR() to ZFS debug log 2017-07-25 23:09:48 -07:00
trace_dbuf.h Prefix all refcount functions with zfs_ 2018-10-01 10:42:05 -07:00
trace_dmu.h tx_waited -> tx_dirty_delayed in trace_dmu.h 2018-01-31 16:13:26 -08:00
trace_dnode.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_multilist.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_txg.h Fix build-it compilation regression 2017-01-24 08:50:15 -08:00
trace_vdev.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
trace_zil.h OpenZFS 8585 - improve batching done in zil_commit() 2017-12-05 09:39:16 -08:00
trace_zio.h Use cstyle -cpP in `make cstyle` check 2016-12-12 10:46:26 -08:00
trace_zrlock.h Fix race in trace point in zrl_add_impl 2018-03-12 11:27:02 -07:00
txg.h Small rework of txg_list code 2018-08-27 10:16:01 -07:00
txg_impl.h OpenZFS 9464 - txg_kick() fails to see that we are quiescing 2018-06-04 14:56:06 -07:00
u8_textprep.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
u8_textprep_data.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
uberblock.h Multi-modifier protection (MMP) 2017-07-13 13:54:00 -04:00
uberblock_impl.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
uio_impl.h deadlock between mm_sem and tx assign in zfs_write() and page fault 2018-10-16 11:11:24 -07:00
unique.h Illumos #3742 2013-11-04 10:55:25 -08:00
uuid.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
vdev.h Defer new resilvers until the current one ends 2018-10-18 21:06:18 -07:00
vdev_disk.h Add support for autoexpand property 2018-07-23 15:40:15 -07:00
vdev_file.h Use a dedicated taskq for vdev_file 2016-12-21 10:47:15 -08:00
vdev_impl.h zed: detect and offline physically removed devices 2018-11-09 11:17:24 -08:00
vdev_indirect_births.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_indirect_mapping.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
vdev_raidz.h Use cstyle -cpP in `make cstyle` check 2016-12-12 10:46:26 -08:00
vdev_raidz_impl.h Revert raidz_map and _col structure types 2018-01-09 14:46:52 -08:00
vdev_removal.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
xvattr.h Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zap.h Linux 4.19-rc3+ compat: Remove refcount_t compat 2018-09-26 10:29:26 -07:00
zap_impl.h OpenZFS 7793 - ztest fails assertion in dmu_tx_willuse_space 2017-03-07 09:51:59 -08:00
zap_leaf.h Fix ENOSPC in "Handle zap_add() failures in ..." 2018-04-18 14:19:50 -07:00
zcp.h Add tunables for channel programs 2018-06-15 15:10:42 -07:00
zcp_global.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_iter.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zcp_prop.h OpenZFS 7431 - ZFS Channel Programs 2018-02-08 15:28:18 -08:00
zfeature.h Revert "zhack: Add 'feature disable' command" 2016-05-17 11:52:07 -07:00
zfs_acl.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_context.h Add libzutil for libzfs or libzpool consumers 2018-11-05 11:22:33 -08:00
zfs_ctldir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_debug.h Fix dbgmsg printing in ztest and zdb 2018-10-24 14:36:50 -07:00
zfs_delay.h Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_dir.h Rename zfs_sb_t -> zfsvfs_t 2017-03-10 09:51:33 -08:00
zfs_fuid.h Update build system and packaging 2018-05-29 16:00:33 -07:00
zfs_ioctl.h Add libzutil for libzfs or libzpool consumers 2018-11-05 11:22:33 -08:00
zfs_onexit.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_project.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_ratelimit.h Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_rlock.h OpenZFS 9689 - zfs range lock code should not be zpl-specific 2018-10-11 10:19:33 -07:00
zfs_sa.h Project Quota on ZFS 2018-02-13 14:54:54 -08:00
zfs_stat.h Support custom build directories and move includes 2010-09-08 12:38:56 -07:00
zfs_sysfs.h Fix in-kernel sysfs entries 2018-09-06 21:44:52 -07:00
zfs_vfsops.h Introduce read/write kstats per dataset 2018-08-20 09:52:37 -07:00
zfs_vnops.h RHEL 7.5 compat: FMODE_KABI_ITERATE 2018-05-02 15:01:24 -07:00
zfs_znode.h Linux does not HAVE_SMB_SHARE 2018-10-17 10:31:38 -07:00
zil.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zil_impl.h OpenZFS 9962 - zil_commit should omit cache thrash 2018-12-07 11:09:42 -08:00
zio.h Add zpool status -s (slow I/Os) and -p (parseable) 2018-11-08 16:47:24 -08:00
zio_checksum.h Remove dependency on linear ABD 2017-03-29 12:24:51 -07:00
zio_compress.h DLPX-44812 integrate EP-220 large memory scalability 2016-11-29 14:34:27 -08:00
zio_crypt.h Add support for decryption faults in zinject 2018-05-02 15:36:20 -07:00
zio_impl.h Native Encryption for ZFS on Linux 2017-08-14 10:36:48 -07:00
zio_priority.h OpenZFS 7614, 9064 - zfs device evacuation/removal 2018-04-14 12:16:17 -07:00
zpl.h Linux 4.18 compat: inode timespec -> timespec64 2018-06-19 21:51:18 -07:00
zrlock.h OpenZFS 6328 - Fix cstyle errors in zfs codebase 2017-01-12 09:42:11 -08:00
zthr.h OpenZFS 9166 - zfs storage pool checkpoint 2018-06-26 10:07:42 -07:00
zvol.h Add port of FreeBSD 'volmode' property 2017-07-12 13:05:37 -07:00