Linux: Make zfs_prune() fair on NUMA systems

Previous code evicted nr_to_scan items from each NUMA node.  This
not only multiplied the total eviction by the number of nodes, but
could exhaust the smaller ones, evicting inodes used by the active
workload and requiring their immediate recreation.  This patch
spreads the requested eviction across all NUMA nodes in proportion
to their evictable counts, which should be closer to the expected
LRU behavior.  See the kernel's super_cache_scan() for similar logic.
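The arithmetic behind the change can be sketched in user-space C. The helper names mult_frac_ul(), old_total_request() and proportional_shares() are hypothetical. mult_frac_ul() only mirrors the quotient/remainder approach of the kernel's mult_frac() macro. In this sketch, non-positive counts stand in for the 0 and SHRINK_EMPTY cases that the patch skips.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space stand-in for the kernel's mult_frac():
 * computes x * n / d as (x / d) * n + (x % d) * n / d, which keeps
 * intermediate products smaller than a plain x * n would.
 */
static unsigned long
mult_frac_ul(unsigned long x, unsigned long n, unsigned long d)
{
	unsigned long q = x / d;
	unsigned long r = x % d;

	return (q * n + r * n / d);
}

/* Old behavior: every non-empty node is asked for the full nr_to_scan. */
static unsigned long
old_total_request(const long *counts, size_t nodes, unsigned long nr_to_scan)
{
	unsigned long total = 0;

	for (size_t i = 0; i < nodes; i++) {
		if (counts[i] > 0)
			total += nr_to_scan;
	}
	return (total);
}

/*
 * New behavior (sketch): each non-empty node gets a share proportional
 * to its count, plus one.  tc starts at 1, as in the patch, so the
 * divisor is never zero.  Returns the total number of objects requested.
 */
static unsigned long
proportional_shares(const long *counts, size_t nodes,
    unsigned long nr_to_scan, unsigned long *shares)
{
	long tc = 1;
	unsigned long total = 0;

	assert(counts != NULL && shares != NULL);
	for (size_t i = 0; i < nodes; i++) {
		if (counts[i] > 0)
			tc += counts[i];
	}
	for (size_t i = 0; i < nodes; i++) {
		shares[i] = 0;
		if (counts[i] <= 0)
			continue;
		shares[i] = mult_frac_ul(nr_to_scan, (unsigned long)counts[i],
		    (unsigned long)tc) + 1;
		total += shares[i];
	}
	return (total);
}
```

Take per-node counts of 9000 and 1000 with nr_to_scan of 1000. proportional_shares() requests 900 and 100, which is 1000 in total. The old scheme asks each node for 1000, which is 2000 in total and enough to empty the smaller node.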

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Alexander Motin 2024-07-29 17:08:32 -04:00
parent 62e7d3c89e
commit 3c87835471
1 changed file with 13 additions and 5 deletions

@@ -1264,14 +1264,22 @@ zfs_prune(struct super_block *sb, unsigned long nr_to_scan, int *objects)
 	defined(SHRINK_CONTROL_HAS_NID) && \
 	defined(SHRINKER_NUMA_AWARE)
 	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		long tc = 1;
+		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			tc += c;
+		}
 		*objects = 0;
 		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			if (c > tc)
+				tc = c;
+			sc.nr_to_scan = mult_frac(nr_to_scan, c, tc) + 1;
 			*objects += (*shrinker->scan_objects)(shrinker, &sc);
-			/*
-			 * reset sc.nr_to_scan, modified by
-			 * scan_objects == super_cache_scan
-			 */
-			sc.nr_to_scan = nr_to_scan;
 		}
 	} else {
 		*objects = (*shrinker->scan_objects)(shrinker, &sc);
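The second loop re-reads each node's count, so a count can grow between the two passes. That is why the loop clamps tc before computing a share. The sketch below replaces live count_objects() calls with two hypothetical snapshots, pass1[] and pass2[]. It shows how the clamp keeps each node's request at no more than nr_to_scan + 1. The names two_pass_shares() and mult_frac_ul() are illustrative and are not ZFS or kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mult_frac(): x * n / d. */
static unsigned long
mult_frac_ul(unsigned long x, unsigned long n, unsigned long d)
{
	unsigned long q = x / d;
	unsigned long r = x % d;

	return (q * n + r * n / d);
}

/*
 * Sketch of the patched two-pass loop.  pass1[] and pass2[] are assumed
 * snapshots of the per-node counts seen by the first and second
 * for_each_online_node() loops.  Returns the total requested.
 */
static unsigned long
two_pass_shares(const long *pass1, const long *pass2, size_t nodes,
    unsigned long nr_to_scan, unsigned long *shares)
{
	long tc = 1;
	unsigned long total = 0;

	assert(pass1 != NULL && pass2 != NULL && shares != NULL);
	/* Pass 1: sum the evictable counts; tc starts at 1, never zero. */
	for (size_t i = 0; i < nodes; i++) {
		if (pass1[i] > 0)
			tc += pass1[i];
	}
	/* Pass 2: recount each node and request a proportional share. */
	for (size_t i = 0; i < nodes; i++) {
		long c = pass2[i];

		shares[i] = 0;
		if (c <= 0)
			continue;
		/*
		 * A node that grew past tc would otherwise be asked for more
		 * than nr_to_scan + 1.  As in the patch, the raised tc stays
		 * in effect for the nodes that follow.
		 */
		if (c > tc)
			tc = c;
		shares[i] = mult_frac_ul(nr_to_scan, (unsigned long)c,
		    (unsigned long)tc) + 1;
		total += shares[i];
	}
	return (total);
}
```

Consider pass1 = {0, 10}, pass2 = {50, 10} and nr_to_scan = 100. Node 0 grew past tc = 11, so tc is raised to 50 and the shares become 101 and 21. Without the clamp, node 0 would be asked for mult_frac(100, 50, 11) + 1 = 455. The trailing + 1 ensures every non-empty node is asked to scan at least one object.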