Archive-Team/zfs: OpenZFS on Linux and FreeBSD - zfs

OpenZFS on Linux and FreeBSD


Matthew Ahrens d07a8deac8 OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations	2017-06-09 14:05:15 -07:00

Authored by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Don Brady <don.brady@intel.com>
Ported-by: Matt Ahrens <mahrens@delphix.com>

RAID-Z requires that space be allocated in multiples of P+1 sectors, because this is the minimum size block that can have the required amount of parity. Thus blocks on RAIDZ1 must be allocated in a multiple of 2 sectors; on RAIDZ2 multiple of 3; and on RAIDZ3 multiple of 4. A sector is a unit of 2^ashift bytes, typically 512B or 4KB.

To satisfy this constraint, the allocation size is rounded up to the proper multiple, resulting in up to 3 "pad sectors" at the end of some blocks. The contents of these pad sectors are not used, so we do not need to read or write these sectors. However, some storage hardware performs much worse (around 1/2 as fast) on mostly-contiguous writes when there are small gaps of non-overwritten data between the writes. Therefore, ZFS creates "optional" zio's when writing RAID-Z blocks that include pad sectors. If writing a pad sector will fill the gap between two (required) writes, we will issue the optional zio, thus doubling performance. The gap-filling performance improvement was introduced in July 2009.

Writing the optional zio is done by the io aggregation code in vdev_queue.c. The problem is that it is also subject to the limit on the size of aggregate writes, zfs_vdev_aggregation_limit, which is by default 128KB. For a given block, if the amount of data plus padding written to a leaf device exceeds zfs_vdev_aggregation_limit, the optional zio will not be written, resulting in a ~2x performance degradation.

The problem occurs only for certain values of ashift, compressed block size, and RAID-Z configuration (number of parity and data disks). It cannot occur with the default recordsize=128KB. If compression is enabled, all configurations with recordsize=1MB or larger will be impacted to some degree.

The problem notably occurs with recordsize=1MB, compression=off, with 10 disks in a RAIDZ2 or RAIDZ3 group (with 512B or 4KB sectors). Therefore this problem has been known as "the 1MB 10-wide RAIDZ2 (or 3) problem".

The problem also occurs with the following configurations:

With recordsize=512KB or 256KB, compression=off, the problem occurs only in rarely-used configurations:
* 4-wide RAIDZ1 with recordsize=512KB and ashift=12 (4KB sectors)
* 4-wide RAIDZ2 (either recordsize, either ashift)
* 5-wide RAIDZ2 with recordsize=512KB (either ashift)
* 6-wide RAIDZ2 with recordsize=512KB (either ashift)

With recordsize=1MB, compression=off, ashift=9 (512B sectors)
* RAIDZ1 with 4 or 8 disks
* RAIDZ2 with 4, 8, or 10 disks
* RAIDZ3 with 6, 8, 9, or 10 disks

With recordsize=1MB, compression=off, ashift=12 (4KB sectors)
* RAIDZ1 with 7 or 8 disks
* RAIDZ2 with 4, 5, or 10 disks
* RAIDZ3 with 6, 9, or 10 disks

With recordsize=2MB and larger (which can only be selected by changing kernel tunables), many configurations are affected, including with higher numbers of disks (up to 18 disks with recordsize=2MB).

Increase zfs_vdev_aggregation_limit to allow the optional zio to be aggregated, thus eliminating the problem. Setting it to 256KB fixes all commonly-used configurations.

The solution is to aggregate optional zio's regardless of the aggregation size limit.

Analysis sponsored by Intel Corp.

OpenZFS-issue: https://www.illumos.org/issues/8005
OpenZFS-commit: https://github.com/openzfs/openzfs/pull/321
Closes #5931
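
The allocation rule described in the commit message above (round the allocation up to a multiple of P+1 sectors, leaving pad sectors at the end of some blocks) can be illustrated with a small sketch. This is not the ZFS source: the type `raidz_alloc_t` and the function `raidz_alloc_sketch` are hypothetical names introduced here, and the sketch assumes one set of P parity sectors per row of data sectors spread across the data disks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: sector breakdown of one RAID-Z block. */
typedef struct raidz_alloc {
	uint64_t data_sectors;   /* sectors holding block data */
	uint64_t parity_sectors; /* sectors holding parity */
	uint64_t pad_sectors;    /* unused sectors added by rounding */
	uint64_t total_sectors;  /* allocated size, a multiple of P+1 */
} raidz_alloc_t;

/*
 * Sketch of the RAID-Z allocation arithmetic (assumed, not copied from ZFS).
 *   psize   - block size in bytes (after compression, if any)
 *   ashift  - log2 of the sector size (9 = 512B, 12 = 4KB)
 *   ndisks  - total disks in the RAID-Z group
 *   nparity - P, the number of parity disks (1, 2, or 3)
 */
raidz_alloc_t
raidz_alloc_sketch(uint64_t psize, unsigned ashift, uint64_t ndisks,
    uint64_t nparity)
{
	raidz_alloc_t a;
	uint64_t data_cols, rows, unpadded, multiple;

	assert(psize > 0 && nparity >= 1 && nparity <= 3 && ndisks > nparity);

	data_cols = ndisks - nparity;

	/* Number of sectors needed for the data, rounded up. */
	a.data_sectors = ((psize - 1) >> ashift) + 1;

	/* Each (possibly partial) row of data sectors needs P parity sectors. */
	rows = (a.data_sectors + data_cols - 1) / data_cols;
	a.parity_sectors = nparity * rows;

	/* Round the allocation up to a multiple of P+1 sectors. */
	unpadded = a.data_sectors + a.parity_sectors;
	multiple = nparity + 1;
	a.total_sectors = ((unpadded + multiple - 1) / multiple) * multiple;
	a.pad_sectors = a.total_sectors - unpadded;

	return (a);
}
```

Under these assumptions, a 1MB block on a 10-wide RAIDZ2 group with ashift=12 breaks down into 256 data sectors and 64 parity sectors, rounded up from 320 to 321 sectors, which leaves 1 pad sector. With ashift=9 the same block needs 2 pad sectors. The sketch only shows where pad sectors come from; whether the gap-filling write is skipped also depends on zfs_vdev_aggregation_limit, which it does not model.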
cmd	vdev_id: fix failure due to multipath -l bug	2017-06-09 14:05:15 -07:00
config	Linux 4.12 compat: CURRENT_TIME removed	2017-06-09 14:05:15 -07:00
contrib	Init script fixes	2015-09-29 15:27:14 -07:00
etc	Fix zfs-mount.service failure on boot	2017-06-09 14:05:15 -07:00
include	Linux 4.12 compat: fix super_setup_bdi_name() call	2017-06-09 14:05:15 -07:00
lib	Fix import finding spare/l2cache when path changes	2017-06-09 14:05:15 -07:00
man	Add tunable to ignore hole_birth (enabled by default)	2016-09-09 13:20:54 -07:00
module	OpenZFS 8005 - poor performance of 1MB writes on certain RAID-Z configurations	2017-06-09 14:05:15 -07:00
rpm	Prepare to release 0.6.5.9	2017-02-03 13:11:42 -08:00
scripts	Add support for asynchronous zvol minor operations	2016-03-22 18:08:04 -07:00
udev	Support parallel build trees (VPATH builds)	2015-07-17 13:42:51 -07:00
.gitignore	Ignore *.{deb,rpm,tar.gz} files in the top directory.	2013-04-24 16:18:59 -07:00
.gitmodules	Add zimport.sh compatibility test script	2014-02-21 12:10:31 -08:00
AUTHORS	Add a missing > to AUTHORS	2014-09-02 14:18:53 -07:00
COPYRIGHT	Update ZED copyright boilerplate	2015-05-11 15:07:00 -07:00
DISCLAIMER	Fix minor typos and update marketing copy.	2013-03-21 12:51:06 -07:00
META	Prepare to release 0.6.5.9	2017-02-03 13:11:42 -08:00
Makefile.am	Add `make lint` target	2016-09-05 16:07:08 -07:00
OPENSOLARIS.LICENSE	Add CDDL license file	2008-12-01 14:49:34 -08:00
README.markdown	Fix minor typos and update marketing copy.	2013-03-21 12:51:06 -07:00
TEST	Follow 0/-E convention for module load errors	2015-12-23 17:29:35 -08:00
autogen.sh	build: do not call boilerplate ourself	2013-04-02 10:55:20 -07:00
configure.ac	Move dracut directory to contrib	2015-07-09 13:59:37 -07:00
copy-builtin	Fix --enable-linux-builtin	2015-12-23 17:29:34 -08:00
zfs-script-config.sh.in	Initial implementation of zed (ZFS Event Daemon)	2014-04-02 13:10:03 -07:00
zfs.release.in	Move zfs.release generation to configure step	2012-07-12 12:22:51 -07:00

README.markdown

Native ZFS for Linux!

ZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the Illumos community.

ZFS on Linux, which is also known as ZoL, is currently feature complete. It includes fully functional and stable SPA, DMU, ZVOL, and ZPL layers.

Full documentation for installing ZoL on your favorite Linux distribution can be found at: http://zfsonlinux.org