OpenZFS on Linux and FreeBSD
Go to file
Matthew Ahrens df4474f92d Illumos #3805 arc shouldn't cache freed blocks
3805 arc shouldn't cache freed blocks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Elling <richard.elling@dey-sys.com>
Reviewed by: Will Andrews <will@firepipe.net>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  illumos/illumos-gate@6e6d5868f5
  https://www.illumos.org/issues/3805

ZFS should proactively evict freed blocks from the cache.

On dcenter, we saw that we were caching ~256GB of metadata, while the
pool only had <4GB of metadata on disk.  We were wasting about half the
system's RAM (252GB) on blocks that have been freed.

Even though these freed blocks will never be used again, and thus will
eventually be evicted, this causes us to use memory inefficiently for 2
reasons:

1. A block that is freed has no chance of being accessed again, but will
be kept in memory preferentially to a block that was accessed before it
(and is thus older) but has not been freed and thus has at least some
chance of being accessed again.

2. We partition the ARC into several buckets:
user data that has been accessed only once (MRU)
metadata that has been accessed only once (MRU)
user data that has been accessed more than once (MFU)
metadata that has been accessed more than once (MFU)

The user data vs metadata split is somewhat arbitrary, and the primary
control on how much memory is used to cache data vs metadata is to
simply try to keep the proportion the same as it has been in the past
(each bucket "evicts against" itself).  The secondary control is to
evict data before evicting metadata.

Because of this bucketing, we may end up with one bucket mostly
containing freed blocks that are very old, while another bucket has more
recently accessed, still-allocated blocks.  Data in the useful bucket
(with still-allocated blocks) may be evicted in preference to data in
the useless bucket (with old, freed blocks).

On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU
data bucket was 27GB and the MRU metadata bucket was 256GB.  However,
the vast majority of data in the MRU metadata bucket (256GB) was freed
blocks, and thus useless.  Meanwhile, the MFU metadata bucket (230MB)
was constantly evicting useful blocks that will be soon needed.

The problem of cache segmentation is a larger problem that needs more
investigation.  However, if we stop caching freed blocks, it should
reduce the impact of this more fundamental issue.

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1503
2013-06-20 09:55:52 -07:00
cmd Illumos #3552, #3564 2013-06-19 16:22:39 -07:00
config Ensure --with-spl-timeout waits for spl_config.h and symvers 2013-05-02 15:40:44 -07:00
dracut Refresh dracut module setup 2013-03-21 12:51:06 -07:00
etc Improve OpenRC init script 2013-06-18 17:03:25 -07:00
include Illumos #3805 arc shouldn't cache freed blocks 2013-06-20 09:55:52 -07:00
lib Illumos #3006 2013-06-19 15:14:10 -07:00
man Adds zpool split to man page 2013-06-18 15:30:07 -07:00
module Illumos #3805 arc shouldn't cache freed blocks 2013-06-20 09:55:52 -07:00
patches Adding grub2 mkconfig support patch 2012-07-30 16:17:23 -07:00
rpm Modified arcstat.py to run on linux 2013-06-18 15:43:15 -07:00
scripts build: resolve orthographic and other grammatical errors 2013-04-02 10:44:52 -07:00
udev Retire zpool_id infrastructure 2013-01-29 12:23:17 -08:00
.gitignore Ignore *.{deb,rpm,tar.gz} files in the top directory. 2013-04-24 16:18:59 -07:00
AUTHORS Fix minor typos and update marketing copy. 2013-03-21 12:51:06 -07:00
COPYRIGHT Refresh links to web site 2013-03-06 15:46:41 -08:00
DISCLAIMER Fix minor typos and update marketing copy. 2013-03-21 12:51:06 -07:00
META Tag zfs-0.6.1 2013-03-26 08:50:29 -07:00
Makefile.am build: do not call boilerplate ourself 2013-04-02 10:55:20 -07:00
OPENSOLARIS.LICENSE Add CDDL license file 2008-12-01 14:49:34 -08:00
README.markdown Fix minor typos and update marketing copy. 2013-03-21 12:51:06 -07:00
autogen.sh build: do not call boilerplate ourself 2013-04-02 10:55:20 -07:00
configure.ac Modified arcstat.py to run on linux 2013-06-18 15:43:15 -07:00
copy-builtin Consistent menuconfig name 2012-08-26 13:49:37 -07:00
zfs-script-config.sh.in Retire zpool_id infrastructure 2013-01-29 12:23:17 -08:00
zfs.release.in Move zfs.release generation to configure step 2012-07-12 12:22:51 -07:00

README.markdown

Native ZFS for Linux!

ZFS is an advanced file system and volume manager which was originally developed for Solaris and is now maintained by the Illumos community.

ZFS on Linux, which is also known as ZoL, is currently feature complete. It includes fully functional and stable SPA, DMU, ZVOL, and ZPL layers.

Full documentation for installing ZoL on your favorite Linux distribution can be found at: http://zfsonlinux.org