Illumos #3805 arc shouldn't cache freed blocks
3805 arc shouldn't cache freed blocks
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Elling <richard.elling@dey-sys.com>
Reviewed by: Will Andrews <will@firepipe.net>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  illumos/illumos-gate@6e6d5868f5
  https://www.illumos.org/issues/3805

ZFS should proactively evict freed blocks from the cache.

On dcenter, we saw that we were caching ~256GB of metadata, while the
pool only had <4GB of metadata on disk. We were wasting about half the
system's RAM (252GB) on blocks that have been freed.

Even though these freed blocks will never be used again, and thus will
eventually be evicted, this causes us to use memory inefficiently for 2
reasons:

1. A block that is freed has no chance of being accessed again, but will
   be kept in memory preferentially to a block that was accessed before
   it (and is thus older) but has not been freed and thus has at least
   some chance of being accessed again.

2. We partition the ARC into several buckets:
     user data that has been accessed only once (MRU)
     metadata that has been accessed only once (MRU)
     user data that has been accessed more than once (MFU)
     metadata that has been accessed more than once (MFU)

   The user data vs metadata split is somewhat arbitrary, and the
   primary control on how much memory is used to cache data vs metadata
   is to simply try to keep the proportion the same as it has been in
   the past (each bucket "evicts against" itself). The secondary control
   is to evict data before evicting metadata.

   Because of this bucketing, we may end up with one bucket mostly
   containing freed blocks that are very old, while another bucket has
   more recently accessed, still-allocated blocks. Data in the useful
   bucket (with still-allocated blocks) may be evicted in preference to
   data in the useless bucket (with old, freed blocks).

On dcenter, we saw that the MFU metadata bucket was 230MB, while the MFU
data bucket was 27GB and the MRU metadata bucket was 256GB. However, the
vast majority of data in the MRU metadata bucket (256GB) was freed
blocks, and thus useless. Meanwhile, the MFU metadata bucket (230MB) was
constantly evicting useful blocks that will be soon needed.

The problem of cache segmentation is a larger problem that needs more
investigation. However, if we stop caching freed blocks, it should
reduce the impact of this more fundamental issue.

Ported-by: Richard Yao <ryao@cs.stonybrook.edu>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1503
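The bucketing problem described in the message can be illustrated with a toy model. The sketch below is not ARC code: `toy_bucket_t`, `toy_insert`, `toy_mark_freed`, and `toy_wasted` are hypothetical names, and each bucket is simplified to a four-slot FIFO. It only aims to show how a bucket that "evicts against" itself can push out live blocks while another bucket stays full of freed ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not ARC code: each bucket is a small FIFO. */
#define	TOY_SLOTS	4

typedef struct toy_block {
	int id;		/* block identity */
	bool valid;	/* slot holds a cached block */
	bool freed;	/* block was freed on disk but is still cached */
} toy_block_t;

typedef struct toy_bucket {
	toy_block_t slots[TOY_SLOTS];
	size_t oldest;	/* index of the next slot to overwrite */
} toy_bucket_t;

/*
 * Cache a block in this bucket.  The bucket "evicts against" itself:
 * only its own oldest entry can be displaced.  Returns the id of the
 * evicted block, or -1 if the slot was empty.
 */
static int
toy_insert(toy_bucket_t *b, int id)
{
	toy_block_t *slot;
	int evicted;

	assert(b != NULL);
	slot = &b->slots[b->oldest];
	evicted = slot->valid ? slot->id : -1;
	slot->id = id;
	slot->valid = true;
	slot->freed = false;
	b->oldest = (b->oldest + 1) % TOY_SLOTS;
	return (evicted);
}

/*
 * Record that a cached block was freed without evicting it.  The flag
 * exists only so the toy can count wasted slots afterwards.
 */
static bool
toy_mark_freed(toy_bucket_t *b, int id)
{
	assert(b != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (b->slots[i].valid && b->slots[i].id == id) {
			b->slots[i].freed = true;
			return (true);
		}
	}
	return (false);
}

/* Count slots occupied by blocks that can never be read again. */
static size_t
toy_wasted(const toy_bucket_t *b)
{
	size_t n = 0;

	assert(b != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (b->slots[i].valid && b->slots[i].freed)
			n++;
	}
	return (n);
}
```

In this model, filling one bucket with blocks 0..3 and a second with blocks 10..13, then freeing all of 0..3, leaves `toy_wasted()` at 4 for the first bucket. Inserting block 14 into the second bucket still evicts live block 10, because the freed blocks in the first bucket are never candidates.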
parent 6822a0d058
commit df4474f92d
@@ -20,6 +20,7 @@
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2013 by Delphix. All rights reserved.
  */

 #ifndef _SYS_ARC_H
@@ -127,6 +128,7 @@ zio_t *arc_write(zio_t *pio, spa_t *spa, uint64_t txg,

 arc_prune_t *arc_add_prune_callback(arc_prune_func_t *func, void *private);
 void arc_remove_prune_callback(arc_prune_t *p);
+void arc_freed(spa_t *spa, const blkptr_t *bp);

 void arc_set_callback(arc_buf_t *buf, arc_evict_func_t *func, void *private);
 int arc_buf_evict(arc_buf_t *buf);
@@ -3240,6 +3240,34 @@ arc_set_callback(arc_buf_t *buf, arc_evict_func_t *func, void *private)
 	buf->b_private = private;
 }

+/*
+ * Notify the arc that a block was freed, and thus will never be used again.
+ */
+void
+arc_freed(spa_t *spa, const blkptr_t *bp)
+{
+	arc_buf_hdr_t *hdr;
+	kmutex_t *hash_lock;
+	uint64_t guid = spa_load_guid(spa);
+
+	hdr = buf_hash_find(guid, BP_IDENTITY(bp), BP_PHYSICAL_BIRTH(bp),
+	    &hash_lock);
+	if (hdr == NULL)
+		return;
+	if (HDR_BUF_AVAILABLE(hdr)) {
+		arc_buf_t *buf = hdr->b_buf;
+		add_reference(hdr, hash_lock, FTAG);
+		hdr->b_flags &= ~ARC_BUF_AVAILABLE;
+		mutex_exit(hash_lock);
+
+		arc_release(buf, FTAG);
+		(void) arc_buf_remove_ref(buf, FTAG);
+	} else {
+		mutex_exit(hash_lock);
+	}
+
+}
+
 /*
  * This is used by the DMU to let the ARC know that a buffer is
  * being evicted, so the ARC should clean up. If this arc buf
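The control flow of `arc_freed()` above (look the block up by identity, return if it is not cached, drop it only if nobody is using the buffer) can be sketched in isolation. The following is a simplified stand-in under stated assumptions, not the ARC implementation: `toy_hdr_t`, `toy_cache`, `toy_find`, `toy_add`, and `toy_freed` are hypothetical, a flat array replaces the hash table, a plain hold count replaces `HDR_BUF_AVAILABLE`, and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the lookup-then-evict idea; not ARC code. */
#define	TOY_HDRS	8

typedef struct toy_hdr {
	uint64_t offset;	/* stand-in for the block's identity */
	uint64_t birth;		/* stand-in for the block's birth txg */
	int holds;		/* active users of the cached buffer */
	bool cached;		/* slot is in use */
} toy_hdr_t;

static toy_hdr_t toy_cache[TOY_HDRS];

/* Linear search in place of a real hash lookup. */
static toy_hdr_t *
toy_find(uint64_t offset, uint64_t birth)
{
	for (size_t i = 0; i < TOY_HDRS; i++) {
		toy_hdr_t *h = &toy_cache[i];

		if (h->cached && h->offset == offset && h->birth == birth)
			return (h);
	}
	return (NULL);
}

/* Cache a block with a given number of holders; false if the cache is full. */
static bool
toy_add(uint64_t offset, uint64_t birth, int holds)
{
	assert(holds >= 0);
	for (size_t i = 0; i < TOY_HDRS; i++) {
		toy_hdr_t *h = &toy_cache[i];

		if (!h->cached) {
			h->offset = offset;
			h->birth = birth;
			h->holds = holds;
			h->cached = true;
			return (true);
		}
	}
	return (false);
}

/*
 * Tell the toy cache that a block was freed.  Returns true only if an
 * idle cached copy was found and dropped.
 */
static bool
toy_freed(uint64_t offset, uint64_t birth)
{
	toy_hdr_t *h = toy_find(offset, birth);

	if (h == NULL)
		return (false);	/* not cached: nothing to do */
	if (h->holds > 0)
		return (false);	/* in use: left for its holders to drop */
	h->cached = false;	/* idle: drop it right away */
	return (true);
}
```

In this sketch a freed block that is still held stays cached, mirroring the `else` branch above that only drops the hash lock. Whether that matches every case the real ARC handles is not something the toy tries to establish.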
@@ -789,6 +789,8 @@ zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg, const blkptr_t *bp,
 	ASSERT(spa_syncing_txg(spa) == txg);
 	ASSERT(spa_sync_pass(spa) < zfs_sync_pass_deferred_free);

+	arc_freed(spa, bp);
+
 	zio = zio_create(pio, spa, txg, bp, NULL, BP_GET_PSIZE(bp),
 	    NULL, NULL, ZIO_TYPE_FREE, ZIO_PRIORITY_FREE, flags,
 	    NULL, 0, NULL, ZIO_STAGE_OPEN, ZIO_FREE_PIPELINE);