Because spl_slab_size() was always returning -ENOSPC for caches of
type KMC_OFFSLAB the cache could never be created. Additionally
the slab size is rounded up to a page which is what kv_alloc()
expects. The kv_alloc() code will minimally allocate a page,
in the KMC_OFFSLAB case this could be reduced.
The basic regression tests kmem:slab_small, kmem:slab_large,
and kmem:slab_align regression were updated to test KMC_OFFSLAB.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ying Zhu <casualfisher@gmail.com>
Closes#266
Update links to refer to the official ZFS on Linux website instead of
@behlendorf's personal fork on github.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Cache aging was implemented because it was part of the default Solaris
kmem_cache behavior. The idea is that per-cpu objects which haven't been
accessed in several seconds should be returned to the cache. On the other
hand Linux slabs never move objects back to the slabs unless there is
memory pressure on the system.
This behavior is now configurable through the 'spl_kmem_cache_expire'
module option. The value is a bit mask with the following meaning.
0x1 - Solaris style cache aging eviction is enabled.
0x2 - Linux style low memory eviction is enabled.
Both methods may be safely enabled simultaneously, but by default
both are disabled. It has never been clear if the kmem cache aging
(which has been around from day one) actually does any good. It has
however been the source of numerous bugs so I wouldn't mind retiring
it entirely.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes zfsonlinux/zfs#1227
Closes#210
Disable this test because it may result in an OOM event on the
system which can result in the test infrastructure being killed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Restructure the the SPLAT headers such that each test only
includes the minimal set of headers it requires.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
After the emergency slab objects were merged I started observing
timeout failures in the kmem:slab_overcommit test. These were
due to the ineffecient way the slab_overcommit reclaim function
was implemented. And due to the additional cost of potentially
allocating ten of thousands of emergency objects and tracking
them on a single list.
This patch addresses the first concern by enhansing the test
case to trace all of the allocations objects as a linked list.
This allows for a cleaner version of the reclaim function to
simply release SPLAT_KMEM_OBJ_RECLAIM objects.
Since this touches some common code all the tests which share
these data structions were also updated. After making these
changes slab_overcommit is reliably passing. However, there
is certainly additional cleanup which could be done here.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Gcc version 4.7.0 reports the delta.tv_sec in the slab reclaim test
as potentially unitialized. In practice this will never occur but
to keep gcc happy we initialize the variable to zero.
Signed-off-by: Brian Behlendorf <behlendo@fedora-17-amd64.(none)>
This test is designed to verify that direct reclaim is functioning as
expected. We allocate a large number of objects thus creating a large
number of slabs. We then apply memory pressure and expect that the
direct reclaim path can easily recover those slabs. The registered
reclaim function will free the objects and the slab shrinker will call
it repeatedly until at least a single slab can be freed.
Note it may not be possible to reclaim every last slab via direct reclaim
without a failure because the shrinker_rwsem may be contended. For this
reason, quickly reclaiming 3/4 of the slabs is considered a success.
This should all be possible within 10 seconds. For reference, on a
system with 2G of memory this test takes roughly 0.2 seconds to run.
It may take longer on larger memory systems but should still easily
complete in the alloted 10 seconds.
Signed-off-by: Prakash Surya <surya1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#107
Remove the kmem_set_warning() hack used by the kmem-splat regression
tests with a per-allocation flag called __GFP_NOWARN. This matches
the lower level linux flag of similar by slightly different function.
The idea is you can then explicitly set this flag on requests where
you know your breaking the max 8k rule but you need/want to do it
anyway.
This is currently used by the regression tests where we intentionally
push things to the limit but don't want the log noise. Additionally,
we are forced to use it in spl_kmem_cache_create() because by default
NR_CPUS is very large and theres no easy way to handle that.
Finally, I've added a stack_dump() call to the warning when it is
trigger to make to clear exactly where the allocation is taking place.
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added
standardized headers to all source file to clearly indicate the
copyright, license, and to give credit where credit is due.
This regression test could crash in splat_kmem_cache_test_reclaim()
due to a race between the slab relclaim and the normal exiting of
the thread. Specifically, the kct structure could be free'd by
the thread performing the allocations while the reclaim function
was also working on that's threads kct structure. The simplest
fix is to extend the kcp->kcp_lock over the reclaim to prevent
the kct from being freed. A better fix would be to ref count
these structures, but since is just a regression this locking
change is enough. Surprisingly this was only observed commonly
under RHEL5.4 but all platform could have hit this.
The big fix here is the removal of kmalloc() in kv_alloc(). It used
to be true in previous kernels that kmallocs over PAGE_SIZE would
always be pages aligned. This is no longer true atleast in 2.6.31
there are no longer any alignment expectations. Since kv_alloc()
requires the resulting address to be page align we no only either
directly allocate pages in the KMC_KMEM case, or directly call
__vmalloc() both of which will always return a page aligned address.
Additionally, to avoid wasting memory size is always a power of two.
As for cleanup several helper functions were introduced to calculate
the aligned sizes of various data structures. This helps ensure no
case is accidentally missed where the alignment needs to be taken in
to account. The helpers now use P2ROUNDUP_TYPE instead of P2ROUNDUP
which is safer since the type will be explict and we no longer count
on the compiler to auto promote types hopefully as we expected.
Always wnforce minimum (SPL_KMEM_CACHE_ALIGN) and maximum (PAGE_SIZE)
alignment restrictions at cache creation time.
Use SPL_KMEM_CACHE_ALIGN in splat alignment test.
Basically everything we need to monitor the global memory state of
the system is now cleanly available via global_page_state(). The
problem is that this interface is still fairly recent, and there
has been one change in the page state enum which we need to handle.
These changes basically boil down to the following:
- If global_page_state() is available we should use it. Several
autoconf checks have been added to detect the correct enum names.
- If global_page_state() is not available check to see if
get_zone_counts() symbol is available and use that.
- If the get_zone_counts() symbol is not exported we have no choice
be to dynamically aquire it at load time. This is an absolute
last resort for old kernel which we don't want to patch to
cleanly export the symbol.
The slab_overcommit test case could hang on a system with fragmented
memory because it was creating a kmem based slab with 256K objects.
To avoid this I've removed the KMC_KMEM flag which allows the slab
to decide if it should be kmem or vmem backed based on the object
side. The slab_lock test shares this code and will also be effected.
But the point of these two tests is to stress cache locking and
memory overcommit, the type of slab is not critical. In fact, allowing
the slab to do the default smart thing is preferable.
- Proper ioctl() 32/64-bit binary compatibility. We need to ensure the
ioctl data itself is always packed the same for 32/64-bit binaries.
Additionally, the correct thing to do is encode this size in bytes
as part of the command using _IOC_SIZE().
- Minor formatting changes to respect the 80 character limit.
- Move all SPLAT_SUBSYSTEM_* defines in to splat-ctl.h.
- Increase SPLAT_SUBSYSTEM_UNKNOWN because we were getting close
to accidentally using it for a real registered subsystem.
In the interests of portability I have added a FC10/i686 box to
my list of development platforms. The hope is this will allow me
to keep current with upstream kernel API changes, and at the same
time ensure I don't accidentally break x86 support. This patch
resolves all remaining issues observed under that environment.
1) SPL_AC_ZONE_STAT_ITEM_FIA autoconf check added. As of 2.6.21
the kernel added a clean API for modules to get the global count
for free, inactive, and active pages. The SPL attempts to detect
if this API is available and directly map spl_global_page_state()
to global_page_state(). If the full API is not available then
spl_global_page_state() is implemented as a thin layer to get
these values via get_zone_counts() if that symbol is available.
2) New kmem:vmem_size regression test added to validate correct
vmem_size() functionality. The test case acquires the current
global vmem state, allocates from the vmem region, then verifies
the allocation is correctly reflected in the vmem_size() stats.
3) Change splat_kmem_cache_thread_test() to always use KMC_KMEM
based memory. On x86 systems with limited virtual address space
failures resulted due to exhaustig the address space. The tests
really need to problem exhausting all memory on the system thus
we need to use the physical address space.
4) Change kmem:slab_lock to cap it's memory usage at availrmem
instead of using the native linux nr_free_pages(). This provides
additional test coverage of the SPL Linux VM integration.
5) Change kmem:slab_overcommit to perform allocation of 256K
instead of 1M. On x86 based systems it is not possible to create
a kmem backed slab with entires of that size. To compensate for
this the number of allocations performed in increased by 4x.
6) Additional autoconf documentation for proposed upstream API
changes to make additional symbols available to modules.
7) Console error messages added when spl_kallsyms_lookup_name()
fails to locate an expected symbol. This causes the module to fail
to load and we need to know exactly which symbol was not available.
This was a false positive the callpath being walked is impossible
because the splat_kmem_cache_test_kcp_alloc() function will ensure
kcp->kcp_kcd[0] is initialized to NULL. However, there is no harm
is making this explicit for the test case so I'm adding a line to
clearly set it to correct the analysis.
- Added slab work queue task which gradually ages and free's slabs
from the cache which have not been used recently.
- Optimized slab packing algorithm to ensure each slab contains the
maximum number of objects without create to large a slab.
- Fix deadlock, we can never call kv_free() under the skc_lock. We
now unlink the objects and slabs from the cache itself and attach
them to a private work list. The contents of the list are then
subsequently freed outside the spin lock.
- Move magazine create/destroy operation on to local cpu.
- Further performace optimizations by minimize the usage of the large
per-cache skc_lock. This includes the addition of KMC_BIT_REAPING
bit mask which is used to prevent concurrent reaping, and to defer
new slab creation when reaping is occuring.
- Add KMC_BIT_DESTROYING bit mask which is set when the cache is being
destroyed, this is used to catch any task accessing the cache while
it is being destroyed.
- Add comments to all the functions and additional comments to try
and make everything as clear as possible.
- Major cleanup and additions to the SPLAT kmem tests to more
rigerously stress the cache implementation and look for any problems.
This includes correctness and performance tests.
- Updated portable work queue interfaces