Linux: Make zfs_prune() fair on NUMA systems

The previous code evicted nr_to_scan items from each NUMA node.  This
not only multiplied the eviction by the number of nodes, but could
also exhaust the smaller ones, evicting inodes used by the active
workload and requiring their immediate recreation.  This patch spreads
the requested eviction across all NUMA nodes in proportion to their
evictable counts, which should be closer to the expected LRU logic.
See the kernel's super_cache_scan() for an example of similar logic.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ameer Hamza <ahamza@ixsystems.com>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16397
Alexander Motin 2024-08-08 18:33:36 -04:00 committed by GitHub
parent 5b9f3b7664
commit 3ae05e34e5
1 changed files with 13 additions and 5 deletions

@@ -1264,14 +1264,22 @@ zfs_prune(struct super_block *sb, unsigned long nr_to_scan, int *objects)
 	defined(SHRINK_CONTROL_HAS_NID) && \
 	defined(SHRINKER_NUMA_AWARE)
 	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		long tc = 1;
+		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			tc += c;
+		}
 		*objects = 0;
 		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			if (c > tc)
+				tc = c;
+			sc.nr_to_scan = mult_frac(nr_to_scan, c, tc) + 1;
 			*objects += (*shrinker->scan_objects)(shrinker, &sc);
-			/*
-			 * reset sc.nr_to_scan, modified by
-			 * scan_objects == super_cache_scan
-			 */
-			sc.nr_to_scan = nr_to_scan;
 		}
 	} else {
 		*objects = (*shrinker->scan_objects)(shrinker, &sc);
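
The proportional split described in the commit message can be illustrated with a small userspace sketch. This is not the kernel code: split_scan() and mult_frac_ul() are hypothetical names, the per-node counts are passed in as a fixed array instead of being queried through count_objects(), and a non-positive count stands in for the `c == 0 || c == SHRINK_EMPTY` check. Because the counts cannot change between the two passes here, the `if (c > tc)` guard from the patch is omitted.

```c
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's mult_frac(): computes
 * x * numer / denom while keeping the intermediate product small.
 */
static unsigned long
mult_frac_ul(unsigned long x, unsigned long numer, unsigned long denom)
{
	unsigned long quot = x / denom;
	unsigned long rem = x % denom;

	return (quot * numer + (rem * numer) / denom);
}

/*
 * Hypothetical helper: split nr_to_scan across nodes in proportion to
 * their evictable counts.  Fills targets[] and returns the sum of all
 * per-node targets.
 */
unsigned long
split_scan(unsigned long nr_to_scan, const long *counts, size_t nnodes,
    unsigned long *targets)
{
	long tc = 1;	/* starts at 1 so the divisor is never zero */
	unsigned long total = 0;
	size_t i;

	/* First pass: total evictable count over all non-empty nodes. */
	for (i = 0; i < nnodes; i++) {
		if (counts[i] > 0)
			tc += counts[i];
	}

	/* Second pass: each node gets its proportional share, plus one. */
	for (i = 0; i < nnodes; i++) {
		targets[i] = 0;
		if (counts[i] <= 0)
			continue;	/* empty node: nothing to scan */
		targets[i] = mult_frac_ul(nr_to_scan,
		    (unsigned long)counts[i], (unsigned long)tc) + 1;
		total += targets[i];
	}
	return (total);
}
```

With nr_to_scan = 1000 and per-node counts of 900, 100 and 0, this sketch yields targets of 900, 100 and 0, for a total of 1000. The previous behavior would have requested 1000 from each of the three nodes, which is more than the second node holds.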