Archive-Team/zfs - zfs - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
Tim Chase	200366f23f	Provide kstat for taskqs This patch provides 2 new kstats to display task queues: /proc/spl/taskqs-all - Display all task queues /proc/spl/taskqs - Display only "active" task queues A task queue is considered to be "active" if it currently has active (running) threads or if any of its pending, priority, delay or waitq lists are not empty. If the task queue has running threads, displays each thread function's address (symbolically, if possibly) and its argument. If the task queue has a non-empty list of pending, priority or delayed task queue entries (taskq_ent_t), displays each entry's thread function address and arguemnt. If the task queue has any waiters, displays each waiting task's pid. Note: This patch also updates some comments in taskq.h which referred to "taskq_t" when they should have referred to "taskq_ent_t". Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #491	2015-12-16 09:35:22 -08:00
Brian Behlendorf	9eb361aaa5	Default to --disable-debug-kmem The default kmem debugging (--enable-debug-kmem) can severely impact performance on large-scale NUMA systems due to the atomic operations used in the memory accounting. A 32-thread fio test running on a 40-core 80-thread system and performing 100% cached reads with kmem debugging is: Enabled: READ: io=177071MB, aggrb=2951.2MB/s, minb=2951.2MB/s, maxb=2951.2MB/s, Disabled: READ: io=271454MB, aggrb=4524.4MB/s, minb=4524.4MB/s, maxb=4524.4MB/s, Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Issues #463	2015-07-21 11:47:10 -07:00
Brian Behlendorf	c3eabc75b1	Refactor generic memory allocation interfaces This patch achieves the following goals: 1. It replaces the preprocessor kmem flag to gfp flag mapping with proper translation logic. This eliminates the potential for surprises that were previously possible where kmem flags were mapped to gfp flags. 2. It maps vmem_alloc() allocations to kmem_alloc() for allocations sized less than or equal to the newly-added spl_kmem_alloc_max parameter. This ensures that small allocations will not contend on a single global lock, large allocations can still be handled, and potentially limited virtual address space will not be squandered. This behavior is entirely different than under Illumos due to different memory management strategies employed by the respective kernels. However, this functionally provides the semantics required. 3. The --disable-debug-kmem, --enable-debug-kmem (default), and --enable-debug-kmem-tracking allocators have been unified in to a single spl_kmem_alloc_impl() allocation function. This was done to simplify the code and make it more maintainable. 4. Improve portability by exposing an implementation of the memory allocations functions that can be safely used in the same way they are used on Illumos. Specifically, callers may safely use KM_SLEEP in contexts which perform filesystem IO. This allows us to eliminate an entire class of Linux specific changes which were previously required to avoid deadlocking the system. This change will be largely transparent to existing callers but there are a few caveats: 1. Because the headers were refactored and extraneous includes removed callers may find they need to explicitly add additional #includes. In particular, kmem_cache.h must now be explicitly includes to access the SPL's kmem cache implementation. This behavior is different from Illumos but it was done to avoid always masking the Linux slab functions when kmem.h is included. 2. Callers, like Lustre, which made assumptions about the definitions of KM_SLEEP, KM_NOSLEEP, and KM_PUSHPAGE will need to be updated. Other callers such as ZFS which did not will not require changes. 3. KM_PUSHPAGE is no longer overloaded to imply GFP_NOIO. It retains its original meaning of allowing allocations to access reserved memory. KM_PUSHPAGE callers can be converted back to KM_SLEEP. 4. The KM_NODEBUG flags has been retired and the default warning threshold increased to 32k. 5. The kmem_virt() functions has been removed. For callers which need to distinguish between a physical and virtual address use is_vmalloc_addr(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 13:55:09 -08:00
Brian Behlendorf	e5b9b344c7	Refactor existing code This change introduces no functional changes to the memory management interfaces. It only restructures the existing codes by separating the kmem, vmem, and kmem cache implementations in the separate source and header files. Splitting this functionality in to separate files required the addition of spl_vmem_{init,fini}() and spl_kmem_cache_{initi,fini}() functions. Additionally, several minor changes to the #include's were required to accommodate the removal of extraneous header from kmem.h. But again, while large this patch introduces no functional changes. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2015-01-16 13:55:08 -08:00
Brian Behlendorf	8d9a23e82c	Retire legacy debugging infrastructure When the SPL was originally written Linux tracepoints were still in their infancy. Therefore, an entire debugging subsystem was added to facilite tracing which served us well for many years. Now that Linux tracepoints have matured they provide all the functionality of the previous tracing subsystem. Rather than maintain parallel functionality it makes sense to fully adopt tracepoints. Therefore, this patch retires the legacy debugging infrastructure. See zfsonlinux/zfs@bc9f413 for the tracepoint changes. Signed-off-by: Ned Bass <bass6@llnl.gov> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #408	2014-11-19 10:35:07 -08:00
Brian Behlendorf	0fac9c9e6d	Remove proc_handler() wrapper As of Linux 2.6.32 the proc handlers where updated to expect only five arguments. Therefore there is no longer a need to maintain this compatibility code and this infrastructure can be simplified. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:52 -07:00
Brian Behlendorf	44778f4110	Remove kallsyms_lookup_name() wrapper After the removable of get_vmalloc_info(), the unused global memory variables, and the optional dcache/icache shrinkers there is no longer a need for the kallsyms compatibility code. This allows us to eliminate another brittle area of the code by removing the kernel upcall this functionality depended on for older kernels. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	8bbbe46f86	Remove global memory variables Platforms such as Illumos and FreeBSD have historically provided global variables which summerize the memory state of a system. Linux on the otherhand doesn't expose any of this information to kernel modules and uses entirely different mechanisms for memory management. In order to simplify the original ZFS port to Linux these global variables were emulated by the SPL for the benefit of ZFS. As ZoL has matured over the years it has moved steadily away from these interfaces and now no longer depends on them at all. Therefore, this patch completely removes the global variables availrmem, minfree, desfree, lotsfree, needfree, swapfs_minfree, and swapfs_reserve. This greatly simplifies the memory management code and eliminates a common area of confusion. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:51 -07:00
Brian Behlendorf	9c91800d19	Remove CTL_UNNUMBERED sysctl interface Support for the CTL_UNNUMBERED sysctl interface was removed in Linux 2.6.19. There is no longer any reason to maintain this compatibility code. There also issue any reason to keep around the CTL_NAME macro and helpers so they have been retired. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	b38bf6a4e3	Remove register_sysctl() compatibility code The register_sysctl() interface has been stable since Linux 2.6.21. There is no longer a need to maintain compatibility code. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2014-10-17 15:11:50 -07:00
Brian Behlendorf	a073aeb060	Add KMC_SLAB cache type For small objects the Linux slab allocator has several advantages over its counterpart in the SPL. These include: 1) It is more memory-efficient and packs objects more tightly. 2) It is continually tuned to maximize performance. Therefore it makes sense to layer the SPLs slab allocator on top of the Linux slab allocator. This allows us to leverage the advantages above while preserving the Illumos semantics we depend on. However, there are some things we need to be careful of: 1) The Linux slab allocator was never designed to work well with large objects. Because the SPL slab must still handle this use case a cut off limit was added to transition from Linux slab backed objects to kmem or vmem backed slabs. spl_kmem_cache_slab_limit - Objects less than or equal to this size in bytes will be backed by the Linux slab. By default this value is zero which disables the Linux slab functionality. Reasonable values for this cut off limit are in the range of 4096-16386 bytes. spl_kmem_cache_kmem_limit - Objects less than or equal to this size in bytes will be backed by a kmem slab. Objects over this size will be vmem backed instead. This value defaults to 1/8 a page, or 512 bytes on an x86_64 architecture. 2) Be aware that using the Linux slab may inadvertently introduce new deadlocks. Care has been taken previously to ensure that all allocations which occur in the write path use GFP_NOIO. However, there may be internal allocations performed in the Linux slab which do not honor these flags. If this is the case a deadlock may occur. The path forward is definitely to start relying on the Linux slab. But for that to happen we need to start building confidence that there aren't any unexpected surprises lurking for us. And ideally need to move completely away from using the SPLs slab for large memory allocations. This patch is a first step. NOTES: 1) The KMC_NOMAGAZINE flag was leveraged to support the Linux slab backed caches but it is not supported for kmem/vmem backed caches. 2) Regardless of the spl_kmem_cache_*_limit settings a cache may be explicitly set to a given type by passed the KMC_KMEM, KMC_VMEM, or KMC_SLAB flags during cache creation. 3) The constructors, destructors, and reclaim callbacks are all functional and will be called regardless of the cache type. 4) KMC_SLAB caches will not appear in /proc/spl/kmem/slab due to the issues involved in presenting correct object accounting. Instead they will appear in /proc/slabinfo under the same names. 5) Several kmem SPLAT tests needed to be fixed because they relied incorrectly on internal kmem slab accounting. With the updated test cases all the SPLAT tests pass as expected. 6) An autoconf test was added to ensure that the __GFP_COMP flag was correctly added to the default flags used when allocating a slab. This is required to ensure all pages in higher order slabs are properly refcounted, see `ae16ed9`. 7) When using the SLUB allocator there is no need to attempt to set the __GFP_COMP flag. This has been the default behavior for the SLUB since Linux 2.6.25. 8) When using the SLUB it may be desirable to set the slub_nomerge kernel parameter to prevent caches from being merged. Original-patch-by: DHE <git@dehacked.net> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Tim Chase <tim@chase2k.com> Signed-off-by: DHE <git@dehacked.net> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #356	2014-05-22 10:28:01 -07:00
Richard Yao	acf0ade362	Simplify hostid logic There is plenty of compatibility code for a hw_hostid that isn't used by anything. At the same time, there are apparently issues with the current hostid logic. coredumb in #zfsonlinux on freenode reported that Fedora 17 changes its hostid on every boot, which required force importing his pool. A suggestion by wca was to adopt FreeBSD's behavior, where it treats hostid as zero if /etc/hostid does not exist Adopting FreeBSD's behavior permits us to eliminate plenty of code, including a userland helper that invokes the system's hostid as a fallback. Signed-off-by: Richard Yao <ryao@cs.stonybrook.edu> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #224	2014-04-14 09:04:41 -07:00
Richard Yao	e3c4d44886	PaX/GrSecurity Linux 3.8.y compat: Use __no_const on struct ctl_table The PaX team started constifying `struct ctl_table` as of their Linux 3.8.0 patchset. This lead to zfsonlinux/spl#225 and Gentoo bug #463012. While investigating our options, I learned that there is a preprocessor directive called CONSTIFY_PLUGIN that we can use to detect the presence of the PaX changes and adjust the code accordingly. The PaX Team had suggested adopting ctl_table_no_const, but supporting older kernels required declaring that whenever the CONSTIFY_PLUGIN was set. Future compiler changes could potentially cause that to break in the presence of -Werror, so instead we define our own spl_ctl_table typdef and use that. This should be compatible with all PaX kernels. This introduces a Linux kernel version number check to prevent a build failure on versions of the PaX GCC plugin that existed for kernels before Linux 3.8.0. Affected versions of the PaX plugin will trigger a compiler error when they see no_const cast on a non-constified structure. Ordinarily, we would need an autotools check to catch that. However, it is safe to do a kernel version check instead of an autotools check in this specific instance because the affected versions of the PaX GCC plugin only exist for Linux kernels before 3.8.0 and the constification of `struct ctl_table` by the PaX developers only occurs in Linux 3.8.0 and later. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #225	2013-08-08 09:51:34 -07:00
Richard Yao	f2a745c41d	Linux 3.10 compat: Do not rely on struct proc_dir_entry definition Linux kernel commit torvalds/linux#59d8053f moved the definition of struct proc_dir_entry from include/linux/proc_fs.h to the private header fs/proc/internal.h. The SPL relied on that to map Solaris' kstat to entries in /proc/spl/kstat. Since the proc_dir_entry structure is now private the only safe thing to do is wrap the opaque proc handle with our own structure. This actually ends up simplify the code and is good because it moves us away from depending on implementation details of /proc. Signed-off-by: Richard Yao <ryao@gentoo.org> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue #257	2013-07-08 15:25:18 -07:00
Ned Bass	3d6af2dd6d	Refresh links to web site Update links to refer to the official ZFS on Linux website instead of @behlendorf's personal fork on github. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2013-03-04 19:09:34 -08:00
Brian Behlendorf	034f1b331e	Fix spl_kmem_init_kallsyms_lookup() panic Due to I/O buffering the helper may return successfully before the proc handler has a chance to execute. To catch this case wait up to 1 second to verify spl_kallsyms_lookup_name_fn was updated to a non SYMBOL_POISON value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes zfsonlinux/zfs#699 Closes zfsonlinux/zfs#859	2012-12-19 09:06:35 -08:00
Brian Behlendorf	165f13c33a	Improved vmem cached deadlock detection The entire goal of performing the slab allocations asynchronously is to be able to detect when a vmalloc() deadlocks. In this case, and only this case, do we want to start allocating emergency objects. The trick here is to minimize false positives because the overhead of tracking emergency objects is far higher than normal slab objects. With that goal in mind the code was reworked to be less sensitive to slow allocations by increasing the wait time. Once a cache is is marked deadlocked all subsequent allocations which can not be satisfied with existing cache objects will immediately allocate new emergency objects. This behavior persists until the asynchronous allocation completes and clears the deadlocked flag. The result of these tweaks is that far fewer emergency objects get created which is important because this minimizes the cost of releasing them latter in kmem_cache_free(). Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-11-06 14:54:15 -08:00
Brian Behlendorf	e2dcc6e2b8	Emergency slab objects This patch is designed to resolve a deadlock which can occur with __vmalloc() based slabs. The issue is that the Linux kernel does not honor the flags passed to __vmalloc(). This makes it unsafe to use in a writeback context. Unfortunately, this is a use case ZFS depends on for correct operation. Fixing this issue in the upstream kernel was pursued and patches are available which resolve the issue. https://bugs.gentoo.org/show_bug.cgi?id=416685 However, these changes were rejected because upstream felt that using __vmalloc() in the context of writeback should never be done. Their solution was for us to rewrite parts of ZFS to accomidate the Linux VM. While that is probably the right long term solution, and it is something we want to pursue, it is not a trivial task and will likely destabilize the existing code. This work has been planned for the 0.7.0 release but in the meanwhile we want to improve the SPL slab implementation to accomidate this expected ZFS usage. This is accomplished by performing the __vmalloc() asynchronously in the context of a work queue. This doesn't prevent the posibility of the worker thread from deadlocking. However, the caller can now safely block on a wait queue for the slab allocation to complete. Normally this will occur in a reasonable amount of time and the caller will be woken up when the new slab is available,. The objects will then get cached in the per-cpu magazines and everything will proceed as usual. However, if the __vmalloc() deadlocks for the reasons described above, or is just very slow, then the callers on the wait queues will timeout out. When this rare situation occurs they will attempt to kmalloc() a single minimally sized object using the GFP_NOIO flags. This allocation will not deadlock because kmalloc() will honor the passed flags and the caller will be able to make forward progress. As long as forward progress can be maintained then even if the worker thread is deadlocked the critical thread will make progress. This will eventually allow the deadlocked worker thread to complete and normal operation will resume. These emergency allocations will likely be slow since they require contiguous pages. However, their use should be rare so the impact is expected to be minimal. If that turns out not to be the case in practice further optimizations are possible. One additional concern is if these emergency objects are long lived. Right now they are simply tracked on a list which must be walked when an object is freed. Is they accumulate on a system and the list grows freeing objects will become more expensive. This could be handled relatively easily by using a hash instead of a list, but that optimization (if needed) is left for a follow up patch. Additionally, these emeregency objects could be repacked in to existing slabs as objects are freed if the kmem_cache_set_move() functionality was implemented. See issue https://github.com/zfsonlinux/spl/issues/26 for full details. This work would also help reduce ZFS's memory fragmentation problems. The /proc/spl/kmem/slab file has had two new columns added at the end. The 'emerg' column reports the current number of these emergency objects in use for the cache, and the following 'max' column shows the historical worst case. These value should give us a good idea of how often these objects are needed. Based on these values under real use cases we can tune the default behavior. Lastly, as a side benefit using a single work queue for the slab allocations should reduce cpu contention on the global virtual address space lock. This should manifest itself as reduced cpu usage for the system. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-08-27 12:00:42 -07:00
Brian Behlendorf	4b2220f0b9	Add --enable-debug-log configure option Until now the notion of an internal debug logging infrastructure was conflated with enabling ASSERT()s. This patch clarifies things by cleanly breaking the two subsystem apart. The result of this is the following behavior. --enable-debug - Enable/disable code wrapped in ASSERT()s. --disable-debug ASSERT()s are used to check invariants and are never required for correct operation. They are disabled by default because they may impact performance. --enable-debug-log - Enable/disable the debug log infrastructure. --disable-debug-log This infrastructure allows the spl code and its consumer to log messages to an in-kernel log. The granularity of the logging can be controlled by a debug mask. By default the mask disables most debug messages resulting in a negligible performance impact. Because of this the debug log is enabled by default. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2012-02-02 11:27:54 -08:00
Brian Behlendorf	1114ae6ae7	Prepend spl_ to all init/fini functions This is a bit of cleanup I'd been meaning to get to for a while to reduce the chance of a type conflict. Well that conflict finally occurred with the kstat_init() function which conflicts with a function in the 2.6.32-6-pve kernel. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #56	2011-11-11 09:18:28 -08:00
Darik Horn	c95b308d12	Correct typos in the spl proc handler. Correct a format typo that causes /proc/sys/kernel/spl/hostid to return a decimal number instead of a hexadecimal number.	2011-04-24 20:56:07 -05:00
Darik Horn	fa6f7d8f9d	Import spl_hostid as a module parameter. Provide a call_usermodehelper() alternative by letting the hostid be passed as a module parameter like this: $ modprobe spl spl_hostid=0x12345678 Internally change the spl_hostid variable to unsigned long because that is the type that the coreutils /usr/bin/hostid returns. Move the hostid command into GET_HOSTID_CMD for consistency with the similar GET_KALLSYMS_ADDR_CMD invocation. Use argv[0] instead of sh_path for consistency internally and with other Linux drivers. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>	2011-04-21 09:41:01 -07:00
Brian Behlendorf	3336e29cc2	Add slab usage summeries to /proc One of the most common things you want to know when looking at the slab is how much memory is being used. This information was available in /proc/spl/kmem/slab but only on a per-slab basis. This commit adds the following /proc/sys/kernel/spl/kmem/slab* entries to make total slab usage easily available at a glance. slab_kmem_total - Total kmem slab size slab_kmem_avail - Alloc'd kmem slab size slab_kmem_max - Max observed kmem slab size slab_vmem_total - Total vmem slab size slab_vmem_avail - Alloc'd vmem slab size slab_vmem_max - Max observed vmem slab size NOTE: The slab_*_max values are expected to over report because they show maximum values since boot, not current values.	2011-04-06 20:06:03 -07:00
Brian Behlendorf	d0a1038ff3	Update /proc/spl/kmem/slab output The 'slab_fail', 'slab_create', and 'slab_destroy' columns in the slab output have been removed because they are virtually always zero and not very useful. The much more useful 'size' and 'alloc' columns have been added which show the total slab size and how much of the total size has been allocated to objects. Finally, the formatting has been updated to be much more human readable while still being friendly for tool like awk to parse.	2011-04-06 20:06:03 -07:00
Brian Behlendorf	b17edc10a9	Prefix all SPL debug macros with 'S' To avoid conflicts with symbols defined by dependent packages all debugging symbols have been prefixed with a 'S' for SPL. Any dependent package needing to integrate with the SPL debug should include the spl-debug.h header and use the 'S' prefixed macros. They must also build with DEBUG defined.	2010-07-20 13:30:40 -07:00
Brian Behlendorf	55abb0929e	Split <sys/debug.h> header To avoid symbol conflicts with dependent packages the debug header must be split in to several parts. The <sys/debug.h> header now only contains the Solaris macro's such as ASSERT and VERIFY. The spl-debug.h header contain the spl specific debugging infrastructure and should be included by any package which needs to use the spl logging. Finally the spl-trace.h header contains internal data structures only used for the log facility and should not be included by anythign by spl-debug.c. This way dependent packages can include the standard Solaris headers without picking up any SPL debug macros. However, if the dependant package want to integrate with the SPL debugging subsystem they can then explicitly include spl-debug.h. Along with this change I have dropped the CHECK_STACK macros because the upstream Linux kernel now has much better stack depth checking built in and we don't need this complexity. Additionally SBUG has been replaced with PANIC and provided as part of the Solaris macro set. While the Solaris version is really panic() that conflicts with the Linux kernel so we'll just have to make due to PANIC. It should rarely be called directly, the prefered usage would be an ASSERT or VERIFY. There's lots of change here but this cleanup was overdue.	2010-07-20 13:29:35 -07:00
Brian Behlendorf	79a3bf130b	Linux-2.6.33 compat, .ctl_name removed from struct ctl_table As of linux-2.6.33 the ctl_name member of the ctl_table struct has been entirely removed. The upstream code has been updated to depend entirely on the the procname member. To handle this all references to ctl_name are wrapped in a CTL_NAME macro which simply expands to nothing for newer kernels. Older kernels are supported by having it expand to .ctl_name = X just as before.	2010-06-30 12:49:12 -07:00
Brian Behlendorf	ae4c36adce	Cleanly split Linux proc.h (fs) from conflicting Solaris proc.h (process) Under linux the proc.h header is for the /proc filesystem, and under Solaris the proc/h header if for processes. This patch correctly moves the Linux proc functionality in a linux/proc_compat.h header and leaves the sys/proc.h for use by Solaris. Minor updates were required to all the call sites where it was included of course.	2010-06-11 15:57:25 -07:00
Brian Behlendorf	716154c592	Public Release Prep Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added standardized headers to all source file to clearly indicate the copyright, license, and to give credit where credit is due.	2010-05-17 15:18:00 -07:00
Brian Behlendorf	3977f8370f	Linux 2.6.32 compat, proc_handler() API change As of linux-2.6.32 the 'struct file *filp' argument was dropped from the proc_handle() prototype. It was apparently unused _almost_ everywhere in the kernel and this was simply cleanup. I've added a new SPL_AC_5ARGS_PROC_HANDLER autoconf check for this and the proper compat macros to correctly define the prototypes and some helper functions. It's not pretty but API compat changes rarely are.	2010-03-04 12:14:56 -08:00
Brian Behlendorf	242f539a2e	Add skc_flags and full header to /proc/spl/kmem/slab.	2009-12-11 11:20:08 -08:00
Brian Behlendorf	d04c8a563c	Atomic64 compatibility for 32-bit systems without kernel support. This patch is another step towards updating the code to handle the 32-bit kernels which I have not been regularly testing. This changes do not really impact the common case I'm expected which is the latest kernel running on an x86_64 arch. Until the linux-2.6.31 kernel the x86 arch did not have support for 64-bit atomic operations. Additionally, the new atomic_compat.h support for this case was wrong because it embedded a spinlock in the atomic variable which must always and only be 64-bits total. To handle these 32-bit issues we now simply fall back to the --enable-atomic-spinlock implementation if the kernel does not provide the 64-bit atomic funcs. The second issue this patch addresses is the DEBUG_KMEM assumption that there will always be atomic64 funcs available. On 32-bit archs this may not be true, and actually that's just fine. In that case the kernel will will never be able to allocate more the 32-bits worth anyway. So just check if atomic64 funcs are available, if they are not it means this is a 32-bit machine and we can safely use atomic_t's instead.	2009-12-04 15:54:12 -08:00
Brian Behlendorf	055ffd98cf	Autoconf --enable-debug-* cleanup Cleanup the --enable-debug-* configure options, this has been pending for quite some time and I am glad I finally got to it. To summerize: 1) All SPL_AC_DEBUG_* macros were updated to be a more autoconf friendly. This mainly involved shift to the GNU approved usage of AC_ARG_ENABLE and ensuring AS_IF is used rather than directly using an if [ test ] construct. 2) --enable-debug-kmem=yes by default. This simply enabled keeping a running tally of total memory allocated and freed and reporting a memory leak if there was one at module unload. Additionally, it ensure /proc/spl/kmem/slab will exist by default which is handy. The overhead is low for this and it should not impact performance. 3) --enable-debug-kmem-tracking=no by default. This option was added to provide a configure option to enable to detailed memory allocation tracking. This support was always there but you had to know where to turn it on. By default this support is disabled because it is known to badly hurt performence, however it is invaluable when chasing a memory leak. 4) --enable-debug-kstat removed. After further reflection I can't see why you would ever really want to turn this support off. It is now always on which had the nice side effect of simplifying the proc handling code in spl-proc.c. We can now always assume the top level directory will be there. 5) --enable-debug-callb removed. This never really did anything, it was put in provisionally because it might have been needed. It turns out it was not so I am just removing it to prevent confusion.	2009-10-30 13:58:51 -07:00
Brian Behlendorf	4d54fdee1d	Reimplement mutexs for Linux lock profiling/analysis For a generic explanation of why mutexs needed to be reimplemented to work with the kernel lock profiling see commits: `e811949a57` and `d28db80fd0` The specific changes made to the mutex implemetation are as follows. The Linux mutex structure is now directly embedded in the kmutex_t. This allows a kmutex_t to be directly case to a mutex struct and passed directly to the Linux primative. Just like with the rwlocks it is critical that these functions be implemented as '#defines to ensure the location information is preserved. The preprocessor can then do a direct replacement of the Solaris primative with the linux primative. Just as with the rwlocks we need to track the lock owner. Here things get a little more interesting because depending on your kernel version, and how you've built your kernel Linux may already do this for you. If your running a 2.6.29 or newer kernel on a SMP system the lock owner will be tracked. This was added to Linux to support adaptive mutexs, more on that shortly. Alternately, your kernel might track the lock owner if you've set CONFIG_DEBUG_MUTEXES in the kernel build. If neither of the above things is true for your kernel the kmutex_t type will include and track the lock owner to ensure correct behavior. This is all handled by a new autoconf check called SPL_AC_MUTEX_OWNER. Concerning adaptive mutexs these are a very recent development and they did not make it in to either the latest FC11 of SLES11 kernels. Ideally, I'd love to see this kernel change appear in one of these distros because it does help performance. From Linux kernel commit: 0d66bf6d3514b35eb6897629059443132992dbd7 "Testing with Ingo's test-mutex application... gave a 345% boost for VFS scalability on my testbox" However, if you don't want to backport this change yourself you can still simply export the task_curr() symbol. The kmutex_t implementation will use this symbol when it's available to provide it's own adaptive mutexs. Finally, DEBUG_MUTEX support was removed including the proc handlers. This was done because now that we are cleanly integrated with the kernel profiling all this information and much much more is available in debug kernel builds. This code was now redundant. Update mutexs validated on: - SLES10 (ppc64) - SLES11 (x86_64) - CHAOS4.2 (x86_64) - RHEL5.3 (x86_64) - RHEL6 (x86_64) - FC11 (x86_64)	2009-09-25 14:47:01 -07:00
Brian Behlendorf	96dded3844	SLES10 Fixes (part 2): - Configure check, the div64_64() function was renamed to div64_u64() as of 2.6.26. - Configure check, the global_page_state() fuction was introduced in 2.6.18 kernels. The earlier 2.6.16 based SLES10 must not try and use it, thankfully get_zone_counts() is still available. - To simplify debugging poison all symbols aquired dynamically using spl_kallsyms_lookup_name() with SYMBOL_POISON. - Add console messages when the user mode helpers fail. - spl_kmem_init_globals() use bit shifts instead of division. - When the monotonic clock is unavailable __gethrtime() must perform the HZ division as an 'unsigned long long' because the SPL only implements __udivdi3(), and not __divdi3() for 'long long' division on 32-bit arches.	2009-05-20 10:08:37 -07:00
Ricardo M. Correia	a0b5ae8aca	Fix off-by-1 truncation of hw_serial when converting from integer to string, when writing to /proc/sys/kernel/spl/spl_hostid. Fixes hostid mismatch which leads to assertion failure when the hostid/hw_serial is a 10-character decimal number: $ zpool status pool: lustre state: ONLINE lt-zpool: zpool_main.c:3176: status_callback: Assertion `reason == ZPOOL_STATUS_OK' failed. zsh: 5262 abort zpool status	2009-03-12 15:47:50 -07:00
Brian Behlendorf	d1ff2312b0	Linux VM Integration Cleanup Remove all instances of functions being reimplemented in the SPL. When the prototypes are available in the linux headers but the function address itself is not exported use kallsyms_lookup_name() to find the address. The function name itself can them become a define which calls a function pointer. This is preferable to reimplementing the function in the SPL because it ensures we get the correct version of the function for the running kernel. This is actually pretty safe because the prototype is defined in the headers so we know we are calling the function properly. This patch also includes a rhel5 kernel patch we exports the needed symbols so we don't need to use kallsyms_lookup_name(). There are autoconf checks to detect if the symbol is exported and if so to use it directly. We should add patches for stock upstream kernels as needed if for no other reason than so we can easily track which additional symbols we needed exported. Those patches can also be used by anyone willing to rebuild their kernel, but this should not be a requirement. The rhel5 version of the export-symbols patch has been applied to the chaos kernel. Additional fixes: 1) Implement vmem_size() function using get_vmalloc_info() 2) SPL_CHECK_SYMBOL_EXPORT macro updated to use $LINUX_OBJ instead of $LINUX because Module.symvers is a build product. When $LINUX_OBJ != $LINUX we will not properly detect exported symbols. 3) SPL_LINUX_COMPILE_IFELSE macro updated to add include2 and $LINUX/include search paths to allow proper compilation when the kernel target build directory is not the source directory.	2009-03-04 10:04:15 -08:00
Brian Behlendorf	99639e4a13	Add zone_get_hostid() function Minimal support added for the zone_get_hostid() function. Only global zones are supported therefore this function must be called with a NULL argumment. Additionally, I've added the HW_HOSTID_LEN define and updated all instances where a hard coded magic value of 11 was used; "A good riddance of bad rubbish!"	2009-02-19 11:26:17 -08:00
Brian Behlendorf	4ab13d3b5c	Additional Linux VM integration Added support for Solaris swapfs_minfree, and swapfs_reserve tunables. In additional availrmem is now available and return a reasonable value which is reasonably analogous to the Solaris meaning. On linux we return the sun of free and inactive pages since these are all easily reclaimable. All tunables are available in /proc/sys/kernel/spl/vm/* and they may need a little adjusting once we observe the real behavior. Some of the defaults are mapped to similar linux counterparts, others are straight from the OpenSolaris defaults.	2009-02-05 12:26:34 -08:00
Brian Behlendorf	36b313dacf	Linux VM integration / device special files Support added to provide reasonable values for the global Solaris VM variables: minfree, desfree, lotsfree, needfree. These values are set to the sum of their per-zone linux counterparts which should be close enough for Solaris consumers. When a non-GPL app links against the SPL we cannot use the udev interfaces, which means non of the device special files are created. Because of this I had added a poor mans udev which cause the SPL to invoke an upcall and create the basic devices when a minor is registered. When a minor is unregistered we use the vnode interface to unlink the special file.	2009-02-04 15:15:41 -08:00
Brian Behlendorf	617d5a673c	Rename modules to module and update references	2009-01-15 10:44:54 -08:00

41 Commits