zfs/include
Matthew Ahrens 0dc2f70c5c OpenZFS 9486 - reduce memory used by device removal on fragmented pools
Device removal allocates a new location for each allocated segment on
the disk that's being removed.  Each allocation results in one entry in
the mapping table, which maps from old location + length to new
location.  When a fragmented disk is removed, this can result in a large
number of mapping entries, and thus a large amount of memory consumed by
the mapping table.  In the worst real-world cases, we've seen around 1GB
of RAM per 1TB of storage removed.

We can improve on this situation by allocating larger segments, which
span across both allocated and free regions of the device being removed.
By including free regions in the allocation (and thus mapping), we
reduce the number of mapping entries.  For example, if we have a 4KB
allocation followed by 1KB free and then 4KB allocated, we would allocate
4+1+4 = 9KB, and then move the entire region (including allocated and
free parts).  In this case we used one mapping where previously we would
have used two, but often the ratio is much higher (up to 20:1 in
real-world use).  We then need to mark the regions that were free on the
removing device as free in the new locations, and also obsolete in the
mapping entry.
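
The span-building idea described above can be sketched roughly as
follows.  This is only an illustration, not the code from this change:
the names `Span`, `coalesce` and the `max_gap` threshold are
hypothetical, and the input is assumed to be a list of non-overlapping
(offset, length) allocated segments.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One mapping entry: a contiguous region copied as a unit (hypothetical type)."""
    offset: int
    length: int
    # (offset, length) of regions inside the span that were free on the
    # removing device; these must be freed at the new location and marked
    # obsolete in the mapping entry.
    free_gaps: list = field(default_factory=list)


def coalesce(segments, max_gap):
    """Merge allocated segments whose separating free gap is <= max_gap bytes.

    `segments` is assumed to hold non-overlapping (offset, length) pairs.
    `max_gap` is a hypothetical tunable, not a parameter named in the source.
    """
    spans = []
    for off, length in sorted(segments):
        if spans:
            last = spans[-1]
            end = last.offset + last.length
            gap = off - end
            if 0 <= gap <= max_gap:
                if gap > 0:
                    last.free_gaps.append((end, gap))
                # Extend the span to cover the gap and the next segment.
                last.length = off + length - last.offset
                continue
        spans.append(Span(off, length))
    return spans


# The example from the text: 4KB allocated, 1KB free, 4KB allocated.
spans = coalesce([(0, 4096), (5120, 4096)], max_gap=1024)
print(len(spans), spans[0].length, spans[0].free_gaps)
```

With a gap threshold of at least 1KB, the two allocations collapse into
one 9KB span carrying a single free gap; with a smaller threshold they
stay as two separate mapping entries.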

This method preserves the fragmentation of the removing device, rather
than consolidating its allocated space into a small number of chunks
where possible.  But it results in a drastic reduction of memory used by
the mapping table - around 20x in the most fragmented cases.

In the most fragmented real-world cases, this reduces memory used by the
mapping from ~1GB to ~50MB of RAM per 1TB of storage removed.  Less
fragmented cases will typically also see around 50-100MB of RAM per 1TB
of storage.

Porting notes:

* Add the following as module parameters:
    * zfs_condense_indirect_vdevs_enable
    * zfs_condense_max_obsolete_bytes

* Document the following module parameters:
    * zfs_condense_indirect_vdevs_enable
    * zfs_condense_max_obsolete_bytes
    * zfs_condense_min_mapping_bytes
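
On Linux, module parameters of a loaded zfs module are normally exposed
under /sys/module/zfs/parameters.  A minimal sketch for inspecting the
three parameters named above, assuming that location (the helper
`read_params` is hypothetical and simply returns None for any parameter
that is not present or not readable):

```python
from pathlib import Path

# Assumed sysfs location for parameters of the loaded zfs module.
PARAM_DIR = Path("/sys/module/zfs/parameters")

# Parameter names taken from the porting notes.
NAMES = (
    "zfs_condense_indirect_vdevs_enable",
    "zfs_condense_max_obsolete_bytes",
    "zfs_condense_min_mapping_bytes",
)


def read_params(names=NAMES, param_dir=PARAM_DIR):
    """Return {name: value-as-string}, or None where a parameter is unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = (Path(param_dir) / name).read_text().strip()
        except OSError:
            # Module not loaded, parameter absent, or insufficient permissions.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_params().items():
        print(f"{name} = {value}")
```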

Authored by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Ported-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Tim Chase <tim@chase2k.com>

OpenZFS-issue: https://illumos.org/issues/9486
OpenZFS-commit: https://github.com/ahrens/illumos/commit/07152e142e44c
External-issue: DLPX-57962
Closes #7536
2018-05-24 10:18:07 -07:00
| Name | Last commit | Date |
|------|-------------|------|
| linux | Allow mounting datasets more than once | 2018-04-13 10:44:05 -07:00 |
| sys | OpenZFS 9486 - reduce memory used by device removal on fragmented pools | 2018-05-24 10:18:07 -07:00 |
| Makefile.am | Retire legacy test infrastructure | 2017-08-15 17:26:38 -07:00 |
| libnvpair.h | Add JSON output support to channel programs | 2018-03-19 12:40:58 -07:00 |
| libuutil.h | Correct cppcheck errors | 2017-09-19 12:17:29 -07:00 |
| libuutil_common.h | Support custom build directories and move includes | 2010-09-08 12:38:56 -07:00 |
| libuutil_impl.h | Support custom build directories and move includes | 2010-09-08 12:38:56 -07:00 |
| libzfs.h | OpenZFS 9075 - Improve ZFS pool import/load process and corrupted pool recovery | 2018-05-08 21:35:27 -07:00 |
| libzfs_core.h | OpenZFS 7614, 9064 - zfs device evacuation/removal | 2018-04-14 12:16:17 -07:00 |
| libzfs_impl.h | OpenZFS 7431 - ZFS Channel Programs | 2018-02-08 15:28:18 -08:00 |
| thread_pool.h | Add libtpool (thread pools) | 2017-08-09 15:31:08 -07:00 |
| zfeature_common.h | OpenZFS 7614, 9064 - zfs device evacuation/removal | 2018-04-14 12:16:17 -07:00 |
| zfs_comutil.h | Illumos #2882, #2883, #2900 | 2013-09-04 15:49:00 -07:00 |
| zfs_deleg.h | OpenZFS 7614, 9064 - zfs device evacuation/removal | 2018-04-14 12:16:17 -07:00 |
| zfs_fletcher.h | DLPX-44812 integrate EP-220 large memory scalability | 2016-11-29 14:34:27 -08:00 |
| zfs_namecheck.h | OpenZFS 7386 - zfs get does not work properly with bookmarks | 2017-01-26 14:42:15 -08:00 |
| zfs_prop.h | Native Encryption for ZFS on Linux | 2017-08-14 10:36:48 -07:00 |