kmem_alloc(KM_SLEEP) should use kvmalloc()

`kmem_alloc(size>PAGESIZE, KM_SLEEP)` is backed by `kmalloc()`, which
finds contiguous physical memory.  If there isn't enough contiguous
physical memory available (e.g. due to physical page fragmentation), the
OOM killer will be invoked to make more memory available.  This is not
ideal because processes may be killed when there is still plenty of free
memory (it just happens to be in individual pages, not contiguous runs
of pages).  We have observed this when allocating the ~13KB `zfs_cmd_t`,
for example in `zfsdev_ioctl()`.

This commit changes the behavior of
`kmem_alloc(size>PAGESIZE, KM_SLEEP)` when there are insufficient
contiguous free pages.  In this case we will find individual pages and
stitch them together using virtual memory.  This is accomplished by
using `kvmalloc()`, which implements the described behavior by trying
`kmalloc(__GFP_NORETRY)` and falling back on `vmalloc()`.
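
The fallback described above can be modeled in userspace. The following is a minimal sketch, not kernel code: `sim_memory`, `sim_kvmalloc()`, and the 4 KiB page size are hypothetical names and values chosen for illustration, and the model ignores details such as the page allocator rounding requests up to a power-of-two number of pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed page size for the sketch; real systems may differ. */
#define SIM_PAGE_SIZE 4096u

/* Hypothetical description of free memory; not a kernel structure. */
struct sim_memory {
	size_t free_pages;	/* total number of free pages */
	size_t largest_run;	/* longest physically contiguous free run */
};

enum backing { BACKING_NONE, BACKING_CONTIGUOUS, BACKING_STITCHED };

static size_t
pages_needed(size_t size)
{
	return ((size + SIM_PAGE_SIZE - 1) / SIM_PAGE_SIZE);
}

/*
 * Models the kmalloc(__GFP_NORETRY) attempt: it succeeds only if a
 * contiguous run is already available, and otherwise fails quickly.
 */
static bool
sim_try_contiguous(const struct sim_memory *m, size_t size)
{
	return (pages_needed(size) <= m->largest_run);
}

/* Models the vmalloc() fallback: any free pages will do. */
static bool
sim_try_stitched(const struct sim_memory *m, size_t size)
{
	return (pages_needed(size) <= m->free_pages);
}

/* Try contiguous memory first, then fall back on stitched pages. */
static enum backing
sim_kvmalloc(const struct sim_memory *m, size_t size)
{
	assert(size > 0);
	if (sim_try_contiguous(m, size))
		return (BACKING_CONTIGUOUS);
	if (sim_try_stitched(m, size))
		return (BACKING_STITCHED);
	return (BACKING_NONE);
}
```

In this model, a 13 KiB request against memory with 1000 free pages but no run longer than one page is satisfied with stitched pages, which is the case where the old behavior could have invoked the OOM killer.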

The behavior of `kmem_alloc(KM_NOSLEEP)` is not changed; it continues to
use `kmalloc(GFP_ATOMIC | __GFP_NORETRY)`.  This is because `vmalloc()`
may sleep.
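
The resulting choice between the two allocators can be written as a small pure function. This is a sketch under stated assumptions: the `SKETCH_KM_*` flag values and `choose_allocator()` are hypothetical and are not the SPL definitions, and the `highmem` parameter stands in for the compile-time `CONFIG_HIGHMEM` check in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the real SPL values. */
#define SKETCH_KM_NOSLEEP	0x1u
#define SKETCH_KM_VMEM		0x2u

enum allocator { USE_KMALLOC, USE_KVMALLOC };

/*
 * Mirrors the condition in the patch: KM_VMEM always selects kvmalloc;
 * otherwise kvmalloc is used only when the caller may sleep and the
 * system is not a HIGHMEM (limited kernel VA space) configuration.
 */
static enum allocator
choose_allocator(unsigned int flags, bool highmem)
{
	/* Reject flag bits this sketch does not model. */
	assert((flags & ~(SKETCH_KM_NOSLEEP | SKETCH_KM_VMEM)) == 0);

	bool wants_vmem = (flags & SKETCH_KM_VMEM) != 0;
	bool may_sleep = (flags & SKETCH_KM_NOSLEEP) == 0;

	if (wants_vmem)
		return (USE_KVMALLOC);
	if (may_sleep && !highmem)
		return (USE_KVMALLOC);
	return (USE_KMALLOC);
}
```

With these assumptions, a sleeping allocation (no flags set) selects kvmalloc on a system without HIGHMEM, while a `SKETCH_KM_NOSLEEP` allocation or any non-`SKETCH_KM_VMEM` allocation on a HIGHMEM system stays on kmalloc.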

Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Wilson <gwilson@delphix.com>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes #11461
Matthew Ahrens 2021-04-06 12:44:54 -07:00 committed by GitHub
parent 57a1214e3a
commit bbcec73783
1 changed files with 14 additions and 0 deletions

@@ -245,7 +245,21 @@ spl_kmem_alloc_impl(size_t size, int flags, int node)
				return (NULL);
			}
		} else {
			/*
			 * We use kmalloc when doing kmem_alloc(KM_NOSLEEP),
			 * because kvmalloc/vmalloc may sleep.  We also use
			 * kmalloc on systems with limited kernel VA space (e.g.
			 * 32-bit), which have HIGHMEM.  Otherwise we use
			 * kvmalloc, which tries to get contiguous physical
			 * memory (fast, like kmalloc) and falls back on using
			 * virtual memory to stitch together pages (slow, like
			 * vmalloc).
			 */
#ifdef CONFIG_HIGHMEM
			if (flags & KM_VMEM) {
#else
			if ((flags & KM_VMEM) || !(flags & KM_NOSLEEP)) {
#endif
				ptr = spl_kvmalloc(size, lflags);
			} else {
				ptr = kmalloc_node(size, lflags, node);