This fix was long overdue. Most of the ground work was laid long
ago to include the exact function and line number in the error message
which there was an issue with a memory allocation call. However,
probably due to lack of time at the moment that informatin never
made it in to the error message. This patch fixes that and trys
to standardize the kmem debug messages as well.
It turns out Solaris incidentally includes kstat.h from kmem.h. As
a side effect of this certain higher level .c files which should
explicitly include kstat.h don't because they happen to get it
via kmem.h. To make like easier for everyone I do the same.
This patch adds three missing Solaris functions: kmem_asprintf(), strfree(),
and strdup(). They are all implemented as a thin layer which just calls
their Linux counterparts. As part of this an autoconf check for kvasprintf
was added because it does not appear in older kernels. If the kernel does
not provide it then spl-generic implements it.
Additionally the dead DEBUG_KMEM_UNIMPLEMENTED code was removed to clean
things up and make the kmem.h a little more readable.
Add the basic xuio structure and typedefs for Solaris style zero copy.
There's a decent chance this will not be the way I handle this on Linux
but providing the basic types simplifies things for now.
Under linux the proc.h header is for the /proc filesystem, and under
Solaris the proc/h header if for processes. This patch correctly
moves the Linux proc functionality in a linux/proc_compat.h header
and leaves the sys/proc.h for use by Solaris. Minor updates were
required to all the call sites where it was included of course.
Running 'zpool create' on a 32-bit machine with an SPL compiled with
gcc 4.4.4 led to a stack overlow. This turned out to be due to some
sort of 'optimization' by gcc:
uint64_t __umoddi3(uint64_t dividend, uint64_t divisor)
{
return dividend - divisor * (dividend / divisor);
}
This code was supposed to be using __udivdi3 to implement /, but gcc
instead implemented it via __umoddi3 itself.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Remove RW_COUNT() from the rwlock implementation. The idea was that it
could be used as a generic wrapper for getting at the internal state
of a rwlock. While a good idea it's proven problematic to keep it
correct for multiple archs and internal implementation changes. In
short it hasn't been worth the trouble.
With that and simplicity in mind things have been updated to use the
rwsem_is_locked() function instead of RW_COUNT for the RW_*_HELD()
functions. As for rw_upgrade() it remains only implemented for
the generic rwsem implemenation. It remains to be determined if its
worth the effort of adding a custom implementation for each arch.
While I may prefer to have the system panic on an SBUG and to get
crash dump for analysis. I suspect most peoples systems are not
configured from crash dump and the best thing to so is to simply
halt the thread and print an error to the console. This way they
have a good chance of actually saving the stack trace and debug log.
Remove the kmem_set_warning() hack used by the kmem-splat regression
tests with a per-allocation flag called __GFP_NOWARN. This matches
the lower level linux flag of similar by slightly different function.
The idea is you can then explicitly set this flag on requests where
you know your breaking the max 8k rule but you need/want to do it
anyway.
This is currently used by the regression tests where we intentionally
push things to the limit but don't want the log noise. Additionally,
we are forced to use it in spl_kmem_cache_create() because by default
NR_CPUS is very large and theres no easy way to handle that.
Finally, I've added a stack_dump() call to the warning when it is
trigger to make to clear exactly where the allocation is taking place.
Using /tmp/ is a preferable default, it can always be overriden
using the module option on a case-by-case basis.
Additionally standardize some log messages based on the same
default log level used by the kernel.
Ensure kdevpkg is defined is srpm case before using it to define
the devel_requires macro. Interestingly this is not an issue for
rpm-4.7.1-4 but it is for rpm-4.4.2.3-18.
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files. Added
standardized headers to all source file to clearly indicate the
copyright, license, and to give credit where credit is due.
While this does incur slightly more overhead we should be using
do_posix_clock_monotonic_gettime() for gethrtime() as described
by the existing comment.
This is a minor extension to the condition variable API to allow
for reasonable signal handling on Linux. The cv_wait() function by
definition must wait unconditionally for cv_signal()/cv_broadcast()
before waking it. This makes it impossible to woken by a signal
such as SIGTERM. The cv_wait_interruptible() function was added
to handle this case. It behaves identically to cv_wait() with the
exception that it waits interruptibly allowing a signal to wake it
up. This means you do need to be careful and check issig() after
waking.
When dumping a debug log first check that it is safe to create
a new thread and block waiting for it. If we are in an atomic
context or irqs and disabled it is not safe to sleep and we
must write out of the debug log from the current process.
During module init spl_setup()->The vn_set_pwd("/") was failing
with -EFAULT because user_path_dir() and __user_walk() both
expect 'filename' to be a user space address and it's not in
this case. To handle this the data segment size is increased
to to ensure strncpy_from_user() does not fail with -EFAULT.
Additionally, I've added a printk() warning to catch this and
log it to the console if it ever reoccurs. I thought everything
was working properly here because there consequences of this
failing are subtle and usually non-critical.
For kernels using the CONFIG_RWSEM_GENERIC_SPINLOCK implementation
nothing has changed. But if your kernel is building with arch
specific rwsems rw_tryupgrade() has been disabled until it can
be implemented correctly. In particular, the x86 implementation
now leverages atomic primatives for serialization rather than
spinlocks. So to get this working again it will need to be
implemented as a cmpxchg for x86 and likely something similiar
for other arches we are interested in. For now it's safest
to simply disable it.
The cleanest way to do this is to set AM_LIBTOOLFLAGS = --silent. However,
AM_LIBTOOLFLAGS is not honored by automake-1.9.6-2.1 which is what I have
been using. To cleanly handle this I am updating to automake-1.11-3 which
is why it looks like there is a lot of churn in the Makefiles.
We need dependent packages to be able to include spl_config.h to
build properly. This was partially solved in commit 0cbaeb1 by using
AH_BOTTOM to #undef common #defines (PACKAGE, VERSION, etc) which
autoconf always adds and cannot be easily removed. This solution
works as long as the spl_config.h is included before your projects
config.h. That turns out to be easier said than done. In particular,
this is a problem when your package includes its config.h using the
-include gcc option which ensures the first thing included is your
config.h.
To handle all cases cleanly I have removed the AH_BOTTOM hack and
replaced it with an AC_CONFIG_HEADERS command. This command runs
immediately after spl_config.h is written and with a little awk-foo
it strips the offending #defines from the file. This eliminates
the problem entirely and makes header safe for inclusion.
Also in this change I have removed the few places in the code where
spl_config.h is included. It is now added to the gcc compile line
to ensure the config results are always available.
Finally, I have also disabled the verbose kernel builds. If you
want them back you can always build with 'make V=1'. Since things
are working now they don't need to be on by default.
Allowing MAX_ORDER-1 sized allocations for kmem based slabs have
been observed to result in deadlocks. To help prvent this limit
max kmem based slab size to MAX_ORDER-3. Just for the record
callers should not be creating slabs like this, but if they do
we should still handle it as safely as we can.
Along with the addition of signed kernel modules in newer kernel
we have a few new build products we need to ignore. LKLM has the
whole thread for those interested: http://lkml.org/lkml/2007/2/14/164
If the distro/installation really is unsupported (i.e. unknown) we should
not make it look like a known distribution (i.e. RHEL) complete with
dependencies on other RPMs and trying to find kenrel source in the RH
standard location.
Additionally add 'k' prefix for kernel requires for consistency.
/lib/modules/$(uname -r)/source. This will likely fail when building
under a mock (http://fedoraproject.org/wiki/Projects/Mock) chroot
environment since `uname -r` will report the running kernel which
likely is not the kernel in your chroot. To cleanly handle this
we fallback to using the first kernel in your chroot.
The kernel-devel package which contains all the kernel headers and
a few build products such as Module.symver{s} is all the is required.
Full source is not needed.
As of linux-2.6.32 the 'struct file *filp' argument was dropped from
the proc_handle() prototype. It was apparently unused _almost_
everywhere in the kernel and this was simply cleanup.
I've added a new SPL_AC_5ARGS_PROC_HANDLER autoconf check for this and
the proper compat macros to correctly define the prototypes and some
helper functions. It's not pretty but API compat changes rarely are.
Fix panic() string, which was being used as a format string, instead of an already-formatted string.
Signed-off-by: Ricardo M. Correia <Ricardo.M.Correia@Sun.COM>
This test case verifies the correct behavior of taskq_wait_id().
In particular it ensure the the following two cases are handled
properly:
1) Task ids larger than the waited for task id can run and
complete as long as there is an available worker thread.
2) All task ids lower than the waited one must complete before
unblocking even if the waited task id itself has completed.
In the initial version of taskq_lowest_id() the entire pending and
work list was locked under the tq->tq_lock to determine the lowest
outstanding taskqid. At the time this done because I was rushed
and wanted to make sure it was right... fast was secondary. Well now
fast is important too so I carefully thought through the pending
and work list management and convinced myself it is safe and correct
to simply check the first entry. I added a large comment to the source
to explain this. But basically as long as we are careful to ensure the
pending and work list stay sorted this is safe and fast.
The motivation for this chance was that I was observing as much as
10% of the total CPU time go to waiting on the tq->tq_lock when the
pending list was long. This resolves that problems and frees up
that CPU time for something useful.
This regression test could crash in splat_kmem_cache_test_reclaim()
due to a race between the slab relclaim and the normal exiting of
the thread. Specifically, the kct structure could be free'd by
the thread performing the allocations while the reclaim function
was also working on that's threads kct structure. The simplest
fix is to extend the kcp->kcp_lock over the reclaim to prevent
the kct from being freed. A better fix would be to ref count
these structures, but since is just a regression this locking
change is enough. Surprisingly this was only observed commonly
under RHEL5.4 but all platform could have hit this.