Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Clemens Fruhwirth	8e99d66b05	Add support for rw semaphore under PREEMPT_RT_FULL The main complication from the RT patch set is that the RW semaphore locks change such that read locks on an rwsem can be taken only by a single thread. All other threads are locked out. This single thread can take a read lock multiple times though. The underlying implementation changes to a mutex with an additional read_depth count. The implementation can be best understood by inspecting the RT patch. rwsem_rt.h and rt.c give the best insight into how RT rwsem works. My implementation for rwsem_tryupgrade is basically an inversion of rt_downgrade_write found in rt.c. Please see the comments in the code. Unfortunately, I have to drop SPLAT rwlock test4 completely as this test tries to take multiple locks from different threads, which RT rwsems do not support. Otherwise SPLAT, zconfig.sh, zpios-sanity.sh and zfs-tests.sh pass on my Debian-testing VM with the kernel linux-image-4.8.0-1-rt-amd64. Tested-by: kernelOfTruth <kerneloftruth@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Clemens Fruhwirth <clemens@endorphin.org> Closes zfsonlinux/zfs#5491 Closes #589 Closes #308	2016-12-19 12:45:24 -08:00
Chunwei Chen	9c9ad845ef	Refactor some splat macro to function Refactor the code by making splat_test_{init,fini}, splat_subsystem_{init,fini} into functions. They don't have reason to be macro and it would be too bloated to inline every call. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2016-12-15 11:30:11 -08:00
Chunwei Chen	71a3c9c45d	Fix splat memleak SPLAT_TEST_FINI didn't call kfree causing memleak. Signed-off-by: Chunwei Chen <david.chen@osnexus.com>	2016-12-15 11:30:11 -08:00
Ubuntu	cbba714667	Add TASKQID_INVALID and TASKQID_INITIAL macros Add the TASKQID_INVALID and TASKQID_INITIAL macros and update the taskq implementation and test cases to use them. This is solely for the purposes of readability and introduces no functional change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-11-02 10:34:19 -07:00
Ubuntu	1b457bcbe5	Fix vmem_size() Add a minimal implementation of vmem_size() which accounts for the virtual memory usage of the SPL's kmem cache. This functionality is only useful on 32-bit systems with a small virtual address space. The following assumptions are made: 1) The major SPL consumer of virtual memory is the kmem cache. 2) Memory allocated with vmem_alloc() is short lived and can be ignored. 3) Allow a 4MB floor as a generous pad given normal consumption. 4) The spl_kmem_cache_sem only contends with cache create/destroy. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2016-11-02 10:34:19 -07:00
Chunwei Chen	ae7eda1dde	Linux 4.9 compat: group_info changes In Linux 4.9, torvalds/linux@81243ea, group_info changed from 2d array via ->blocks to 1d array via ->gid. We change the spl cred functions accordingly. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #581	2016-10-20 09:33:28 -07:00
Chunwei Chen	87063d7dc3	Fix splat-cred.c cred usage No need to crhold current_cred(), fix possible leak in splat_cred_test2 Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Closes #556	2016-10-20 09:33:22 -07:00
Brian Behlendorf	6c2a66bfa8	Fix aarch64 type warning Explicitly cast type in splat-rwlock.c test case to silence the following warning. warning: format ‘%ld’ expects argument of type ‘long int’, but argument N has type ‘int’ Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #574	2016-10-01 18:33:01 -07:00
Chunwei Chen	ea5f1a200b	Fix use-after-free in splat_taskq_test7 This splat_vprint is using tq_arg->name after tq_arg is freed. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #557	2016-05-31 11:58:42 -07:00
Chunwei Chen	f58040c0fc	Implement a proper rw_tryupgrade Current rw_tryupgrade does rw_exit and then rw_tryenter(RW_RWITER), and then does rw_enter(RW_READER) if it fails. This violate the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion. This patch we implement a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer. Signed-off-by: Chunwei Chen <david.chen@osnexus.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes zfsonlinux/zfs#4692 Closes #554	2016-05-31 11:44:15 -07:00
Brian Behlendorf	a6ae97caed	Add rw_tryupgrade() This implementation of rw_tryupgrade() behaves slightly differently from its counterparts on other platforms. It drops the RW_READER lock and then acquires the RW_WRITER lock leaving a small window where no lock is held. On other platforms the lock is never released during the upgrade process. This is necessary under Linux because the kernel does not provide an upgrade function. There are currently no callers in the ZFS code where this change in behavior is a problem. In fact, in most cases the code is already written such that if the upgrade fails the RW_READER lock is dropped and the caller blocks waiting to acquire the lock as RW_WRITER. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Matthew Thode <prometheanfire@gentoo.org> Closes zfsonlinux/zfs#4388 Closes #534	2016-03-10 13:05:25 -08:00
Chunwei Chen	e843553d03	Don't hold mutex until release cv in cv_wait If a thread is holding mutex when doing cv_destroy, it might end up waiting a thread in cv_wait. The waiter would wake up trying to aquire the same mutex and cause deadlock. We solve this by move the mutex_enter to the bottom of cv_wait, so that the waiter will release the cv first, allowing cv_destroy to succeed and have a chance to free the mutex. This would create race condition on the cv_mutex. We use xchg to set and check it to ensure we won't be harmed by the race. This would result in the cv_mutex debugging becomes best-effort. Also, the change reveals a race, which was unlikely before, where we call mutex_destroy while test threads are still holding the mutex. We use kthread_stop to make sure the threads are exit before mutex_destroy. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue zfsonlinux/zfs#4166 Issue zfsonlinux/zfs#4106	2016-01-12 15:18:44 -08:00
Brian Behlendorf	2a552736b7	Fix do_div() types in condvar:timeout The do_div() macro expects unsigned types and this is detected in powerpc implementation of do_div(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #516	2015-12-22 10:32:35 -08:00
Brian Behlendorf	e7b75d9b46	Limit maximum object size in kmem tests Limit the maximum object size to 1/128 of total system memory for the kmem cache tests. Large values can result in out of memory errors for systems with less the 512M of memory. Additionally, use the known number of objects per-slab for calculating the number of objects to use for a test. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-11-16 15:02:24 -08:00
Brian Behlendorf	4fa4cab972	Linux 4.2 compat: misc_deregister() The misc_deregister() function was changed to a void return type. Rather than add compatibility code to detect this change simply ignore the return code on all kernels. It was only used to log an informational error message of no real value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-09-01 09:20:45 -07:00
Brian Behlendorf	ebc2c9374b	Linux 4.2 compat: vfs_rename() Attempting to perform a vfs_rename() on Linux 4.2 and newer kernels results in an EACCES error. Rather than attempting to add and maintain more ugly compatibility code it's best to just retire this interface. As a first step the SPLAT test is disabled for Linux 4.2 and newer kernels. vn_rename: Failed vn_rename /tmp/vn.tmp.1 -> /tmp/vn.tmp.2 (13) Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#3653	2015-08-19 16:03:29 -07:00
Brian Behlendorf	62aa81a577	Add defclsyspri macro Add a new defclsyspri macro which can be used to request the default Linux scheduler priority. Neither the minclsyspri or maxclsyspri map to the default Linux kernel thread priority. This makes it awkward to create taskqs which run with the same priority as the rest of the kernel threads on the system which can lead to performance issues. All SPL callers which previously used minclsyspri or maxclsyspri have been changed to use defclsyspri. The vast majority of callers were part of the test suite which won't have an external impact. The few places where it could impact performance the change was from maxclsyspri to defclsyspri. This makes it more likely the process will be scheduled which may help performance. To facilitate further performance analysis the spl_taskq_thread_priority module option has been added. When disabled (0) all newly created kernel threads will use the default kernel thread priority. When enabled (1) the specified taskq priority will be used. By default this value is enabled (1). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-07-23 13:25:49 -07:00
Turbo Fredriksson	37d7cd94f3	Support parallel build trees (VPATH builds) Build products from an out of tree build should be written relative to the build directory. Sources should be referred to by their locations in the source directory. This is accomplished by adding the 'src' and 'obj' variables for the module Makefile.am, using relative paths to reference source files, and by setting VPATH when source files are not co-located with the Makefile. This enables the following: $ mkdir build $ cd build $ ../configure $ make -s This change also has the advantage of resolving the following warning which is generated by modern versions of automake. Makefile.am:00: warning: source file 'xxx' is in a subdirectory, Makefile.am:00: but option 'subdir-objects' is disabled Signed-off-by: Turbo Fredriksson <turbo@bayour.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#1082	2015-07-17 12:53:11 -07:00
Brian Behlendorf	f7a973d99b	Add TASKQ_DYNAMIC feature Setting the TASKQ_DYNAMIC flag will create a taskq with dynamic semantics. Initially only a single worker thread will be created to service tasks dispatched to the queue. As additional threads are needed they will be dynamically spawned up to the max number specified by 'nthreads'. When the threads are no longer needed, because the taskq is empty, they will automatically terminate. Due to the low cost of creating and destroying threads under Linux by default new threads and spawned and terminated aggressively. There are two modules options which can be tuned to adjust this behavior if needed. * spl_taskq_thread_sequential - The number of sequential tasks, without interruption, which needed to be handled by a worker thread before a new worker thread is spawned. Default 4. * spl_taskq_thread_dynamic - Provides the ability to completely disable the use of dynamic taskqs on the system. This is provided for the purposes of debugging and troubleshooting. Default 1 (enabled). This behavior is fundamentally consistent with the dynamic taskq implementation found in both illumos and FreeBSD. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #458	2015-06-24 15:14:18 -07:00
Chris Dunlop	a876b0305e	Make taskq_wait() block until the queue is empty Under Illumos taskq_wait() returns when there are no more tasks in the queue. This behavior differs from ZoL and FreeBSD where taskq_wait() returns when all the tasks in the queue at the beginning of the taskq_wait() call are complete. New tasks added whilst taskq_wait() is running will be ignored. This difference in semantics makes it possible that new subtle issues could be introduced when porting changes from Illumos. To avoid that possibility the taskq_wait() function is being updated such that it blocks until the queue in empty. The previous behavior remains available through the taskq_wait_outstanding() interface. Note that this function was previously called taskq_wait_all() but has been renamed to avoid confusion. Signed-off-by: Chris Dunlop <chris@onthe.net.au> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #455	2015-06-09 12:20:12 -07:00
Brian Behlendorf	6ab08667a4	Reduce splat_taskq_test2_impl() stack frame size Slightly increasing the size of a kmutex_t has caused us to exceed the stack frame warning size in splat_taskq_test2_impl(). To address this the tq_args have been moved to the heap. cc1: warnings being treated as errors spl-0.6.3/module/splat/splat-taskq.c:358: error: the frame size of 1040 bytes is larger than 1024 bytes Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #435	2015-03-03 10:18:31 -08:00
Brian Behlendorf	d0d5dd7144	Add MUTEX_FSTRANS mutex type There are regions in the ZFS code where it is desirable to be able to be set PF_FSTRANS while a specific mutex is held. The ZFS code could be updated to set/clear this flag in all the correct places, but this is undesirable for a few reasons. 1) It would require changes to a significant amount of the ZFS code. This would complicate applying patches from upstream. 2) It would be easy to accidentally miss a critical region in the initial patch or to have an future change introduce a new one. Both of these concerns can be addressed by adding a new mutex type which is responsible for managing PF_FSTRANS, support for which was added to the SPL in commit `9099312` - Merge branch 'kmem-rework'. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issue #435	2015-03-03 10:18:24 -08:00
Brian Behlendorf	c1bc8e610b	Retire spl_module_init()/spl_module_fini() In the original implementation of the SPL wrappers were provided for module initialization and cleanup. This was done to abstract away any compatibility code which might be needed for the SPL. As it turned out the only significant compatibility issue was that the default pwd during module load differed under Illumos and Linux. Since this is such as minor thing and the wrappers complicate the code they are being retired. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue zfsonlinux/zfs#2985	2015-02-27 13:43:39 -08:00
Brian Behlendorf	3018bffa9b	Refine slab cache sizing This change is designed to improve the memory utilization of slabs by more carefully setting their size. The way the code currently works is problematic for slabs which contain large objects (>1MB). This is due to slabs being unconditionally rounded up to a power of two which may result in unused space at the end of the slab. The reason the existing code rounds up every slab is because it assumes it will backed by the buddy allocator. Since the buddy allocator can only performs power of two allocations this is desirable because it avoids wasting any space. However, this logic breaks down if slab is backed by vmalloc() which operates at a page level granularity. In this case, the optimal thing to do is calculate the minimum required slab size given certain constraints (object size, alignment, objects/slab, etc). Therefore, this patch reworks the spl_slab_size() function so that it sizes KMC_KMEM slabs differently than KMC_VMEM slabs. KMC_KMEM slabs are rounded up to the nearest power of two, and KMC_VMEM slabs are allowed to be the minimum required size. This change also reduces the default number of objects per slab. This reduces how much memory a single cache object can pin, which can result in significant memory saving for highly fragmented caches. But depending on the workload it may result in slabs being allocated and freed more frequently. In practice, this has been shown to be a better default for most workloads. Also the maximum slab size has been reduced to 4MB on 32-bit systems. Due to the limited virtual address space it's critical the we be as frugal as possible. A limit of 4M still lets us reasonably comfortably allocate a limited number of 1MB objects. Finally, the kmem:slab_small and kmem:slab_large SPLAT tests were extended to provide better test coverage of various object sizes and alignments. Caches are created with random parameters and their basic functionality is verified by allocating several slabs worth of objects. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 13:55:09 -08:00
Brian Behlendorf	c3eabc75b1	Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 13:55:09 -08:00
Brian Behlendorf	e5b9b344c7	Refactor existing code This change introduces no functional changes to the memory management interfaces. It only restructures the existing codes by separating the kmem, vmem, and kmem cache implementations in the separate source and header files. Splitting this functionality in to separate files required the addition of spl_vmem_{init,fini}() and spl_kmem_cache_{initi,fini}() functions. Additionally, several minor changes to the #include's were required to accommodate the removal of extraneous header from kmem.h. But again, while large this patch introduces no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 13:55:08 -08:00
Brian Behlendorf	03a783534a	Fix debug object on stack warning When running the SPLAT tests on a kernel with CONFIG_DEBUG_OBJECTS=y enabled the following warning is generated. ODEBUG: object is on stack, but not annotated WARNING: at lib/debugobjects.c:300 __debug_object_init+0x221/0x480() This is caused by the test cases placing a debug object on the stack rather than the heap. This isn't harmful since they are small objects but to make CONFIG_DEBUG_OBJECTS=y happy the objects have been relocated to the heap. This impacted taskq tests 1, 3, and 7. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #424	2015-01-07 13:52:20 -08:00
Ned Bass	52479ecf58	Remove compat includes from sys/types.h Don't include the compatibility code in linux/_compat.h in the public header sys/types.h. This causes problems when an external code base includes the ZFS headers and has its own conflicting compatibility code. Lustre, in particular, defined SHRINK_STOP for compatibility with pre-3.12 kernels in a way that conflicted with the SPL's definition. Because Lustre ZFS OSD includes ZFS headers it fails to build due to a '"SHRINK_STOP" redefined' compiler warning. To avoid such conflicts only include the compat headers from .c files or private headers. Also, for consistency, include sys/.h before linux/*.h then sort by header name. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #411	2014-11-19 10:35:12 -08:00
Brian Behlendorf	8d9a23e82c	Retire legacy debugging infrastructure When the SPL was originally written Linux tracepoints were still in their infancy. Therefore, an entire debugging subsystem was added to facilite tracing which served us well for many years. Now that Linux tracepoints have matured they provide all the functionality of the previous tracing subsystem. Rather than maintain parallel functionality it makes sense to fully adopt tracepoints. Therefore, this patch retires the legacy debugging infrastructure. See zfsonlinux/zfs@bc9f413 for the tracepoint changes. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #408	2014-11-19 10:35:07 -08:00
Brian Behlendorf	917fef2732	Lower minimum objects/slab threshold As long as we can fit a minimum of one object/slab there's no reason to prevent the creation of the cache. This effectively pushes the maximum object size up to 32MB. The splat cache tests were extended accordingly to verify this functionality. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-11-05 10:08:21 -08:00
Tim Chase	802a4a2ad5	Linux 3.12 compat: shrinker semantics The new shrinker API as of Linux 3.12 modifies "struct shrinker" by replacing the @shrink callback with the pair of @count_objects and @scan_objects. It also requires the return value of @count_objects to return the number of objects actually freed whereas the previous @shrink callback returned the number of remaining freeable objects. This patch adds support for the new @scan_objects return value semantics and updates the splat shrinker test case appropriately. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Closes #403	2014-10-28 09:20:13 -07:00
Brian Behlendorf	89a461e70c	Remove shrink_{i,d}node_cache() wrappers This is optional functionality which may or may not be useful to ZFS when using older kernels. It is never a hard requirement. Therefore this functionality is being removed from the SPL and a simpler slimmed down version will be added to ZFS. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	8bbbe46f86	Remove global memory variables Platforms such as Illumos and FreeBSD have historically provided global variables which summerize the memory state of a system. Linux on the otherhand doesn't expose any of this information to kernel modules and uses entirely different mechanisms for memory management. In order to simplify the original ZFS port to Linux these global variables were emulated by the SPL for the benefit of ZFS. As ZoL has matured over the years it has moved steadily away from these interfaces and now no longer depends on them at all. Therefore, this patch completely removes the global variables availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree, and swapfs_reserve. This greatly simplifies the memory management code and eliminates a common area of confusion. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	e1310afae3	Remove get_vmalloc_info() wrapper The get_vmalloc_info() function was used to back the vmem_size() function. This was always problematic and resulted in brittle code because the kernel never provided a clean interface for modules. However, it turns out that the only caller of this function in ZFS uses it to determine the total virtual address space size. This can be determined easily without get_vmalloc_info() so vmem_size() has been updated to take this approach which allows us to shed the get_vmalloc_info() dependency. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	3a92530563	Update code to use misc_register()/misc_deregister() When the SPL was originally written it was designed to use the device_create() and device_destroy() functions. Unfortunately, these functions changed considerably over the years making them difficult to rely on. As it turns out a better choice would have been to use the misc_register()/misc_deregister() functions. This interface for registering character devices has remained stable, is simple, and provides everything we need. Therefore the code has been reworked to use this interface. The higher level ZFS code has always depended on these same interfaces so this is also as a step towards minimizing our kernel dependencies. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
Brian Behlendorf	0cb3dafccd	Update SPLAT to use kmutex_t for portability For consistency throughout the code update the SPLAT infrastructure to use the wrapped mutex interfaces. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
Brian Behlendorf	6203295438	Make license compatibility checks consistent Apply the license specified in the META file to ensure the compatibility checks are all performed consistently. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:07:28 -07:00
Brian Behlendorf	81857a34d1	Fix bug in SPLAT taskq:front While running SPLAT on a kernel with CONFIG_DEBUG_ATOMIC_SLEEP enabled the taskq:front was flagged as a test which might sleep which in an unsafe context. Specifically, the splat_vprint() function which internally takes a mutex was being called under a spin lock. Moving the log function outside the spin lock cleanly solves this issue. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-03 10:42:20 -07:00
Brian Behlendorf	a073aeb060	Add KMC_SLAB cache type For small objects the Linux slab allocator has several advantages over its counterpart in the SPL. These include: 1) It is more memory-efficient and packs objects more tightly. 2) It is continually tuned to maximize performance. Therefore it makes sense to layer the SPLs slab allocator on top of the Linux slab allocator. This allows us to leverage the advantages above while preserving the Illumos semantics we depend on. However, there are some things we need to be careful of: 1) The Linux slab allocator was never designed to work well with large objects. Because the SPL slab must still handle this use case a cut off limit was added to transition from Linux slab backed objects to kmem or vmem backed slabs. spl_kmem_cache_slab_limit - Objects less than or equal to this size in bytes will be backed by the Linux slab. By default this value is zero which disables the Linux slab functionality. Reasonable values for this cut off limit are in the range of 4096-16386 bytes. spl_kmem_cache_kmem_limit - Objects less than or equal to this size in bytes will be backed by a kmem slab. Objects over this size will be vmem backed instead. This value defaults to 1/8 a page, or 512 bytes on an x86_64 architecture. 2) Be aware that using the Linux slab may inadvertently introduce new deadlocks. Care has been taken previously to ensure that all allocations which occur in the write path use GFP_NOIO. However, there may be internal allocations performed in the Linux slab which do not honor these flags. If this is the case a deadlock may occur. The path forward is definitely to start relying on the Linux slab. But for that to happen we need to start building confidence that there aren't any unexpected surprises lurking for us. And ideally need to move completely away from using the SPLs slab for large memory allocations. This patch is a first step. NOTES: 1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab backed caches but it is not supported for kmem/vmem backed caches. 2) Regardless of the spl_kmem_cache_*_limit settings a cache may be explicitly set to a given type by passed the KMC_KMEM, KMC_VMEM, or KMC_SLAB flags during cache creation. 3) The constructors, destructors, and reclaim callbacks are all functional and will be called regardless of the cache type. 4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to the issues involved in presenting correct object accounting. Instead they will appear in /proc/slabinfo under the same names. 5) Several kmem SPLAT tests needed to be fixed because they relied incorrectly on internal kmem slab accounting. With the updated test cases all the SPLAT tests pass as expected. 6) An autoconf test was added to ensure that the __GFP_COMP flag was correctly added to the default flags used when allocating a slab. This is required to ensure all pages in higher order slabs are properly refcounted, see `ae16ed9`. 7) When using the SLUB allocator there is no need to attempt to set the __GFP_COMP flag. This has been the default behavior for the SLUB since Linux 2.6.25. 8) When using the SLUB it may be desirable to set the slub_nomerge kernel parameter to prevent caches from being merged. Original-patch-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #356	2014-05-22 10:28:01 -07:00
Chunwei Chen	545e9ac00a	Add ddi_time_after and friends When comparing times gotten from ddi_get_lbolt, we have to take account of wrap around of jiffies. Therefore, we cannot use 't1 < t2'. Instead we should use 't1 - t2 < 0'. This patch add ddi_time_after and friends to address this issue. They have strict type restriction, clock_t for vanilla and int64_t for 64 version, to prevent type conversion from screwing things. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #335	2014-04-14 09:32:01 -07:00
Tim Chase	17a527cb0f	Support post-3.13 kthread_create() semantics. Provide spl_kthread_create() as a wrapper to the kernel's kthread_create() to provide pre-3.13 semantics. Re-try if the call is interrupted or if it would have returned -ENOMEM. Otherwise return NULL. Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #339	2014-04-08 12:44:42 -07:00
Brian Behlendorf	e19101e08f	splat cred:groupmember: Fix false positives Due to certain assumptions made in the the cred:groupmember test it could result in false positives when run on specific distributions. This was solely a bug in the test case and not in the groupmember() function which the test case was validating. To prevent future false positives the test case has been rewritten to be both more rigerous and to make fewer assumptions about the system. Minor style cleanup was done to cr_groups_search() and groupmember() functions. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-08 12:44:41 -07:00
Brian Behlendorf	668d2a0da5	splat kmem:slab_reclaim: Test cleanup By setting __GFP_NORETRY the kernel memory reclaim logic was allowed to abort early and dump a falled allocation stack to the console. Since this was done in a tight loop to fill memory it could result in a large number of stacks being dumped to the console. This in turn slowed down the test sufficiently so it exceeded the time limit and failed. To resolve this issue the __GFP_NORETRY flag is being removed. This is how it should have been called originally to ensure we're simulating the behavior of most callers which will use the GFP_KERNEL flag. In addition, the reclaim granularity of 1000 objects was far to coarse for this to be a realistic test. For kmem:slab_reclaim there might only be a few thousand objects total in the cache. Therefore, the SPLAT_KMEM_OBJ_RECLAIM constant for these tests was lowered. This will cause the reclaim callback to run more frequently which makes for a better test case. The frequency of the cache reaping in kmem:slab_reap was increased to accommodate the reduced number of objects released during the reclaim. These changes only impact the test cases and were done to remove false positives caused by the test case itself. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-04-08 12:44:41 -07:00
Brian Behlendorf	921a35adeb	Add module versioning Use the standard Linux MODULE_VERSION macro to expose the installed spl and splat module versions. This will also automatically add a checksum of the .c files and headers in "srcversion". See: /sys/module/spl/version /sys/module/spl/srcversion /sys/module/splat/version /sys/module/splat/srcversion Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#1923 Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-12-06 11:03:43 -08:00
Richard Yao	df2c0f1849	Replace current_kernel_time() with getnstimeofday() current_kernel_time() is used by the SPLAT, but it is not meant for performance measurement. We modify the SPLAT to use getnstimeofday(), which is equivalent to the gethrestime() function on Solaris. Additionally, we update gethrestime() to invoke getnstimeofday(). Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #279	2013-10-09 13:28:30 -07:00
Brian Behlendorf	ceb3872825	Fix KMC_OFFSLAB type caches Because spl_slab_size() was always returning -ENOSPC for caches of type KMC_OFFSLAB the cache could never be created. Additionally the slab size is rounded up to a page which is what kv_alloc() expects. The kv_alloc() code will minimally allocate a page, in the KMC_OFFSLAB case this could be reduced. The basic regression tests kmem:slab_small, kmem:slab_large, and kmem:slab_align regression were updated to test KMC_OFFSLAB. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Ying Zhu <casualfisher@gmail.com> Closes #266	2013-07-30 15:39:23 -07:00
Yuxuan Shui	79a7ab2581	Linux 3.10 compat: add missing include of linux/slab.h Linux kernel commit torvalds/linux@0d01ff2 changes some includes we were depending on through linux/proc_fs.h. Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #257	2013-07-08 15:21:28 -07:00
Ned Bass	3d6af2dd6d	Refresh links to web site Update links to refer to the official ZFS on Linux website instead of @behlendorf's personal fork on github. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-04 19:09:34 -08:00
Brian Behlendorf	0936c3449f	Add spl_kmem_cache_expire module option Cache aging was implemented because it was part of the default Solaris kmem_cache behavior. The idea is that per-cpu objects which haven't been accessed in several seconds should be returned to the cache. On the other hand Linux slabs never move objects back to the slabs unless there is memory pressure on the system. This behavior is now configurable through the 'spl_kmem_cache_expire' module option. The value is a bit mask with the following meaning. 0x1 - Solaris style cache aging eviction is enabled. 0x2 - Linux style low memory eviction is enabled. Both methods may be safely enabled simultaneously, but by default both are disabled. It has never been clear if the kmem cache aging (which has been around from day one) actually does any good. It has however been the source of numerous bugs so I wouldn't mind retiring it entirely. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#1227 Closes #210	2013-01-28 09:34:12 -08:00
Ned Bass	8842263bd0	call_usermodehelper() should wait for process As of Linux 3.4 the UMH_WAIT_* constants were renumbered. In particular, the meaning of "1" changed from UMH_WAIT_PROC (wait for process to complete), to UMH_WAIT_EXEC (wait for the exec, but not the process). A number of call sites used the number 1 instead of the constant name, so the behavior was not as expected on kernels with this change. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-01-09 16:54:19 -08:00

1 2 3

132 Commits