Commit Graph

669 Commits

Author SHA1 Message Date
Brian Behlendorf 82b8c8fa64 Proposed fix for low memory ZFS deadlocks
Deadlocks in the zvol were observed when one of the ZFS threads
performing IO trys to allocate memory while the system is low
on memory.  The low memory condition causes dirty pages to be
synced to the zvol but this can't progress because the original
thread is blocked waiting on a memory allocation.  Thus we end
up deadlocking.

A proper solution proposed by Wizeman is to change KM_SLEEP from
GFP_KERNEL top GFP_NOFS.  This will prevent the memory allocation
which is trying to allocate memory from forcing a sync to the
zvol in shrink_page_list()->pageout().

The down side to all of this is that we are using a pretty big
hammer by changing KM_SLEEP.  This change means ALL of the zfs
memory allocations will be until to trigger dirty data to be
synced.  The caller still should be able to reclaim memory from
the various slab caches.  We will be totally dependent of other
kernel processes which happen to be running and a small number
of asynchronous reclaim threads to trigger the reclaim of dirty
data pages.  This should be OK but I think we may see some
slightly longer allocation times when under memory pressure.

We shall see.
2010-07-13 21:30:56 -07:00
Brian Behlendorf a4bfd8ea1b Add __divdi3(), remove __udivdi3() kernel dependency
Up until now no SPL consumer attempted to perform signed 64-bit
division so there was no need to support this.  That has now
changed so I adding 64-bit division support for 32-bit platforms.
The signed implementation is based on the unsigned version.

Since the have been several bug reports in the past concerning
correct 64-bit division on 32-bit platforms I added some long
over due regression tests.  Much to my surprise the unsigned
64-bit division regression tests failed.

This was surprising because __udivdi3() was implemented by simply
calling div64_u64() which is provided by the kernel.  This meant
that the linux kernels 64-bit division algorithm on 32-bit platforms
was flawed.  After some investigation this turned out to be exactly
the case.

Because of this I was forced to abandon the kernel helper and
instead to fully implement 64-bit division in the spl.  There are
several published implementation out there on how to do this
properly and I settled on one proposed in the book Hacker's Delight.
Their proposed algoritm is freely available without restriction
and I have just modified it to be linux kernel friendly.

The update implementation now passed all the unsigned and signed
regression tests.  This should be functional, but not fast, which is
good enough for out purposes.  If you want fast too I'd strongly
suggest you upgrade to a 64-bit platform.  I have also reported the
kernel bug and we'll see if we can't get it fixed up stream.
2010-07-13 16:44:02 -07:00
Brian Behlendorf d466208f1e Update config.guess to recognize additional distros
The following distros were added: redhat, fedora, debian,
ubuntu, sles, slackware, and gentoo.
2010-07-02 14:48:27 -07:00
Lars Johannsen dbe561d8ab Allow config/build to work with autoconf-2.65
As of autoconf-2.65 the AC_LANG_SOURCE source macro no longer
includes the confdef.h results when expanded.  To handle this
simply explicitly include confdef.h in conftest.c.  This will
cause two copies to of confdef.h to be added to the test for
earlier autoconf versions but this is not harmful.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-02 14:00:28 -07:00
Brian Behlendorf 1814251453 Require gawk the usermode helper fails with awk
For some reason when awk invoked by the usermode helper the command
always fails.  Interestingly gawk does not suffer from this problem
which is why I never observed this failure since the distro I tested
with all had gawk installed instead of awk.  Anyway, the simplest
thing to do here is to just make gawk mandatory.  I've added a
configure check for gawk specifically and have updated the command
to call gawk not awk.
2010-07-01 16:38:08 -07:00
Brian Behlendorf 7119bf7044 Add configure check for user_path_dir()
I didn't notice at the time but user_path_dir() was not introduced
at the same time as set_fs_pwd() change.  I had lumped the two
together but in fact user_path_dir() was introduced in 2.6.27 and
set_fs_pwd() taking 2 args was introduced in 2.6.25.  This means
builds against 2.6.25-2.6.26 kernels were broken.

To fix this I've added a check for user_path_dir() and no longer
assume that if set_fs_pwd() takes 2 args then user_path_dir() is
also available.
2010-07-01 13:53:26 -07:00
Brian Behlendorf e2d28a3743 Use $target_cpu instead of `arch`
We should not be using arch for a few reasons.  First off it might
not be installed on their system, and secondly they may be trying
to cross-compile.
2010-07-01 13:52:46 -07:00
Brian Behlendorf 8fd4e3af2e Check sourcelink is set before passing to readlink
When no source was found in any of the expected paths treat
this as fatal and provide the user with a hint as to what
they should do.
2010-07-01 13:52:04 -07:00
Ned Bass 55f10ae5e9 Implementation of a regression test for TQ_FRONT.
Use 3 threads and 8 tasks.  Dispatch the final 3 tasks with TQ_FRONT.
The first three tasks keep the worker threads busy while we stuff the
queues.  Use msleep() to force a known execution order, assuming
TQ_FRONT is properly honored.  Verify that the expected completion
order occurs.

The splat_taskq_test5_order() function may be useful in more than
one test.  This commit generalizes it by renaming the function to
splat_taskq_test_order() and adding a name argument instead of
assuming SPLAT_TASKQ_TEST5_NAME as the test name.

The documentation for splat taskq regression test #5 swaps the two required
completion orders in the diagram.  This commit corrects the error.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-01 10:59:52 -07:00
Ned Bass 1a73940d39 Initialize the /dev/splatctl device buffer
On open() and initialize the buffer with the SPL version string.  The
user space splat utility expects to find the SPL version string when
it opens and reads from /dev/splatctl.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-01 10:59:46 -07:00
Ned Bass f0d8bb26b4 Implementation of the TQ_FRONT flag.
Adds a task queue to receive tasks dispatched with TQ_FRONT.  Worker
threads pull tasks from this high priority queue before the default
pending queue.

Executing tasks out of FIFO order potentially breaks taskq_lowest_id()
if we do not preserve the ordering of the work list by taskqid.
Therefore, instead of always appending to the work list, we search for
the appropriate place to insert a task.  The common case is to append
to the list, so we make this operation efficient by searching the work
list in reverse order.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-07-01 10:59:38 -07:00
Brian Behlendorf c2688979a4 Remove AC_DEFINE for DEBUG/NDEBUG
Whoops, I momentarilly forgot I had explicitly set these as CC
options so dependent packages which need to include spl_config.h
would not end up having these defined which can result in
accidentally hanging debug enabled at best, or a build failure
at worst.
2010-07-01 09:40:29 -07:00
Brian Behlendorf c950d1480d Only make compiler warnings fatal with --enable-debug
While in theory I like the idea of compiler warnings always being
fatal.  In practice this causes problems when small harmless errors
cause build failures for end users.  To handle this I've updated
the build system such that -Werror is only used when --enable-debug
is passed to configure.  This is how I always build when developing
so I'll catch all build warnings and end users will not get stuck
by minor issues.
2010-06-30 17:05:36 -07:00
Brian Behlendorf 6801b7154c Linux-2.6.33 compat, O_DSYNC flag added
Prior to linux-2.6.33 only O_DSYNC semantics were implemented and
they used the O_SYNC flag.  As of linux-2.6.33 this behavior was
properly split in to O_SYNC and O_DSYNC respectively.
2010-06-30 12:49:39 -07:00
Brian Behlendorf 79a3bf130b Linux-2.6.33 compat, .ctl_name removed from struct ctl_table
As of linux-2.6.33 the ctl_name member of the ctl_table struct
has been entirely removed.  The upstream code has been updated
to depend entirely on the the procname member.  To handle this
all references to ctl_name are wrapped in a CTL_NAME macro which
simply expands to nothing for newer kernels.  Older kernels are
supported by having it expand to .ctl_name = X just as before.
2010-06-30 12:49:12 -07:00
Brian Behlendorf fd921c2e0c Linux-2.6.33 compat, check <generated/utsrelease.h> for UTS_RELEASE
It seems the upstream community moved the definition of UTS_RELEASE
yet again as of linux-2.6.33.  Update the build system to check in
all three possible locations where your kernel version may be defined.

	$kernelbuild/include/linux/version.h
	$kernelbuild/include/linux/utsrelease.h
	$kernelbuild/include/generated/utsrelease.h
2010-06-30 12:48:18 -07:00
Brian Behlendorf 1e48754059 Add basic README
A simple README with a short summary of the project and a link
directing people to the online documentation.
2010-06-29 14:18:18 -07:00
Brian Behlendorf ede0bdffb6 Treat mutex->owner as volatile
When HAVE_MUTEX_OWNER is defined and we are directly accessing
mutex->owner treat is as volative with the ACCESS_ONCE() helper.
Without this you may get a stale cached value when accessing it
from different cpus.  This can result in incorrect behavior from
mutex_owned() and mutex_owner().  This is not a problem for the
!HAVE_MUTEX_OWNER case because in this case all the accesses
are covered by a spin lock which similarly gaurentees we will
not be accessing stale data.

Secondly, check CONFIG_SMP before allowing access to mutex->owner.
I see that for non-SMP setups the kernel does not track the owner
so we cannot rely on it.

Thirdly, check CONFIG_MUTEX_DEBUG when this is defined and the
HAVE_MUTEX_OWNER is defined surprisingly the mutex->owner will
not be cleared on mutex_exit().  When this is the case the SPL
needs to make sure to do it to ensure MUTEX_HELD() behaves as
expected or you will certainly assert in mutex_destroy().

Finally, improve the mutex regression tests.  For mutex_owned() we
now minimally check that it behaves correctly when checked from the
owner thread or the non-owner thread.  This subtle behaviour has bit
me before and I'd like to catch it early next time if it reappears.

As for mutex_owned() regression test additonally verify that
mutex->owner is always cleared on mutex_exit().
2010-06-28 16:02:57 -07:00
Brian Behlendorf 616df2dd8b Fix subtle race in threads test case
The call to wake_up() must be moved under the spin lock because
once we drop the lock 'tp' may no longer be valid because the
creating thread has exited.  This basic thread implementation
was correct, this was simply a flaw in the test case.
2010-06-28 12:34:20 -07:00
Brian Behlendorf 5be4767ae1 Accept but ignore TASKQ_DC_BATCH and TQ_FRONT
For the moment the SPL accepts the TASKQ_DC_BATCH and TQ_FRONT
flags however they get silently ignored.  This is harmless for
the moment but it does need to be implemented at some point.
2010-06-28 11:39:43 -07:00
Brian Behlendorf e6de04b73c Add kmem_vasprintf function
We might as well have both asprintf() variants.  This allows us
to safely pass a va_list through several levels of the stack
using va_copy() instead of va_start().
2010-06-24 09:41:59 -07:00
Brian Behlendorf 438683c0a9 Revert "Support TQ_FRONT flag used by taskq_dispatch()"
This reverts commit eb12b3782c.
2010-06-21 10:19:44 -07:00
Brian Behlendorf 3cb77549d1 Update warnings in kmem debug code
This fix was long overdue.  Most of the ground work was laid long
ago to include the exact function and line number in the error message
which there was an issue with a memory allocation call.  However,
probably due to lack of time at the moment that informatin never
made it in to the error message.  This patch fixes that and trys
to standardize the kmem debug messages as well.
2010-06-16 16:01:16 -07:00
Brian Behlendorf 8ffef449ef Add missing header util/sscanf.h 2010-06-14 14:20:31 -07:00
Brian Behlendorf def465ad4b Include kstat.h from kmem.h
It turns out Solaris incidentally includes kstat.h from kmem.h.  As
a side effect of this certain higher level .c files which should
explicitly include kstat.h don't because they happen to get it
via kmem.h.  To make like easier for everyone I do the same.
2010-06-14 14:18:48 -07:00
Brian Behlendorf eb12b3782c Support TQ_FRONT flag used by taskq_dispatch()
Allow taskq_dispatch() to insert work items at the head of the
queue instead of just the tail by passing the TQ_FRONT flag.
2010-06-11 15:57:25 -07:00
Brian Behlendorf 32c6147dee Minor cleanup and Solaris API additions.
Minor formatting cleanups.

API additions:
* {U}INT8_{MIN,MAX}, {U}INT16_{MIN,MAX} macros.
* id_t typedef
* ddi_get_lbolt(), ddi_get_lbolt64() functions.
2010-06-11 15:57:25 -07:00
Brian Behlendorf b868e22f05 Add kmem_asprintf(), strfree(), strdup(), and minor cleanup.
This patch adds three missing Solaris functions: kmem_asprintf(), strfree(),
and strdup().  They are all implemented as a thin layer which just calls
their Linux counterparts.  As part of this an autoconf check for kvasprintf
was added because it does not appear in older kernels.  If the kernel does
not provide it then spl-generic implements it.

Additionally the dead DEBUG_KMEM_UNIMPLEMENTED code was removed to clean
things up and make the kmem.h a little more readable.
2010-06-11 15:57:25 -07:00
Brian Behlendorf bb1bb2c4c4 Add xuio_* structures and typedefs.
Add the basic xuio structure and typedefs for Solaris style zero copy.
There's a decent chance this will not be the way I handle this on Linux
but providing the basic types simplifies things for now.
2010-06-11 15:57:25 -07:00
Brian Behlendorf 750a7101f8 Stub out additional missing headers 2010-06-11 15:57:25 -07:00
Brian Behlendorf ae4c36adce Cleanly split Linux proc.h (fs) from conflicting Solaris proc.h (process)
Under linux the proc.h header is for the /proc filesystem, and under
Solaris the proc/h header if for processes.  This patch correctly
moves the Linux proc functionality in a linux/proc_compat.h header
and leaves the sys/proc.h for use by Solaris.  Minor updates were
required to all the call sites where it was included of course.
2010-06-11 15:57:25 -07:00
Brian Behlendorf 71b1242e67 Update META to version 0.5.0 2010-06-11 15:57:25 -07:00
Alex Zhuravlev 1b4ad25e2f Stack overflow on 64-bit modulus operations on 32-bit architectures.
Running 'zpool create' on a 32-bit machine with an SPL compiled with
gcc 4.4.4 led to a stack overlow.  This turned out to be due to some
sort of 'optimization' by gcc:

uint64_t __umoddi3(uint64_t dividend, uint64_t divisor)
{
   return dividend - divisor * (dividend / divisor);
}

This code was supposed to be using __udivdi3 to implement /, but gcc
instead implemented it via __umoddi3 itself.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
2010-06-03 09:06:55 -07:00
Brian Behlendorf 8a1c9a02fb Minor 32-bit fix cast to hrtime_t before the mutliply.
It's important to cast to hrtime_t before doing the multiply because
the ts.tv_sec type is only 32-bits and we need to promote it to 64-bits.
2010-05-23 09:51:17 -07:00
Brian Behlendorf 49638d8388 Refresh autogen.sh products with automake 1.11.1. 2010-05-21 15:52:06 -07:00
Brian Behlendorf 3cca28a785 Re-Prep for 0.4.9 tag with a few more fixes and updated ChangeLog 2010-05-21 14:17:44 -07:00
Brian Behlendorf edbbb609bd Minor spec file cleanup for RHEL6 package dependency. 2010-05-21 11:53:49 -07:00
Brian Behlendorf 32f5faff69 Simplify rwlock implementation.
Remove RW_COUNT() from the rwlock implementation.  The idea was that it
could be used as a generic wrapper for getting at the internal state
of a rwlock.  While a good idea it's proven problematic to keep it
correct for multiple archs and internal implementation changes.  In
short it hasn't been worth the trouble.

With that and simplicity in mind things have been updated to use the
rwsem_is_locked() function instead of RW_COUNT for the RW_*_HELD()
functions.  As for rw_upgrade() it remains only implemented for
the generic rwsem implemenation.  It remains to be determined if its
worth the effort of adding a custom implementation for each arch.
2010-05-20 14:20:34 -07:00
Brian Behlendorf 23d91792ef Use KM_NODEBUG macro in preference to __GFP_NOWARN. 2010-05-20 14:16:59 -07:00
Brian Behlendorf 3626ae6a70 Disable spl_debug_panic_on_bug by default.
While I may prefer to have the system panic on an SBUG and to get
crash dump for analysis.  I suspect most peoples systems are not
configured from crash dump and the best thing to so is to simply
halt the thread and print an error to the console.  This way they
have a good chance of actually saving the stack trace and debug log.
2010-05-20 10:15:51 -07:00
Brian Behlendorf e0dcb22e4e Adjust 'large' object sizes in kmem:slab_large test.
64K objects are large for a kmem based slab (2M slabs)
1M objects are large for a vmem cased slab (32M slabs)
2010-05-20 09:52:37 -07:00
Brian Behlendorf 5198ea0e71 Remove kmem_set_warning() interface replace with __GFP_NOWARN flag.
Remove the kmem_set_warning() hack used by the kmem-splat regression
tests with a per-allocation flag called __GFP_NOWARN.  This matches
the lower level linux flag of similar by slightly different function.
The idea is you can then explicitly set this flag on requests where
you know your breaking the max 8k rule but you need/want to do it
anyway.

This is currently used by the regression tests where we intentionally
push things to the limit but don't want the log noise.  Additionally,
we are forced to use it in spl_kmem_cache_create() because by default
NR_CPUS is very large and theres no easy way to handle that.

Finally, I've added a stack_dump() call to the warning when it is
trigger to make to clear exactly where the allocation is taking place.
2010-05-19 16:53:13 -07:00
Brian Behlendorf 627a74972c Set default debug log patch to /tmp/spl-log.
Using /tmp/ is a preferable default, it can always be overriden
using the module option on a case-by-case basis.

Additionally standardize some log messages based on the same
default log level used by the kernel.
2010-05-19 16:17:06 -07:00
Brian Behlendorf 99879b257c Minor spec file cleanup for srpm case.
Ensure kdevpkg is defined is srpm case before using it to define
the devel_requires macro.  Interestingly this is not an issue for
rpm-4.7.1-4 but it is for rpm-4.4.2.3-18.
2010-05-18 09:18:20 -07:00
Brian Behlendorf de7cc34821 Prep for 0.4.9 tag, updated META and ChangeLog 2010-05-17 15:47:24 -07:00
Brian Behlendorf 716154c592 Public Release Prep
Updated AUTHORS, COPYING, DISCLAIMER, and INSTALL files.  Added
standardized headers to all source file to clearly indicate the
copyright, license, and to give credit where credit is due.
2010-05-17 15:18:00 -07:00
Brian Behlendorf 8e2140b770 Add 3 missing typedefs.
Add processorid_t, pc_t, index_t.
2010-05-14 09:42:53 -07:00
Brian Behlendorf a76df2dc0f Add console_*printf() functions.
Add support for the missing console_vprintf() and console_printf()
functions.
2010-05-14 09:40:52 -07:00
Brian Behlendorf 6020190e8f Use do_posix_clock_monotonic_gettime() as described by comment.
While this does incur slightly more overhead we should be using
do_posix_clock_monotonic_gettime() for gethrtime() as described
by the existing comment.
2010-05-14 09:31:22 -07:00
Brian Behlendorf f752b46eb3 Add cv_wait_interruptible() function.
This is a minor extension to the condition variable API to allow
for reasonable signal handling on Linux.  The cv_wait() function by
definition must wait unconditionally for cv_signal()/cv_broadcast()
before waking it.  This makes it impossible to woken by a signal
such as SIGTERM.  The cv_wait_interruptible() function was added
to handle this case.  It behaves identically to cv_wait() with the
exception that it waits interruptibly allowing a signal to wake it
up.  This means you do need to be careful and check issig() after
waking.
2010-05-14 09:24:51 -07:00