For the case where we have a ZIL to replay, we need to ensure that
zv->zv_objset contains the current objset. Since the caller has
a hold on the object set, it is safe to pass it to zil_replay() as part
of the zv. Call path: zvol_create_minor()->zil_replay()->
zil_parse()->zil_replay_log_record()->zvol_replay_write().
During spa_load() the spa->spa_deferred_bpobj may be opened and closed
multiple times. It's critical that when the object is closed,
bpo->bpo_object is set to zero to indicate the object is closed.
If it is not, then during spa_load_retry() the spa->spa_deferred_bpobj
can be closed twice, resulting in a NULL dereference.
This appears to have been fixed upstream the same way.
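A minimal sketch of the close convention described above, assuming a much-simplified handle. The type and function names (sketch_bpobj_t, sketch_bpobj_open(), sketch_bpobj_close()) are hypothetical stand-ins, not the real bpobj API. Zeroing bpo_object on close is what turns a second close on the retry path into a harmless no-op; in this sketch object number 0 is simply reserved to mean "closed".

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a bpobj handle. */
typedef struct sketch_bpobj {
	uint64_t bpo_object; /* nonzero while open, 0 once closed */
	void *bpo_buf;       /* resource released on close */
} sketch_bpobj_t;

static int sketch_bpobj_open(sketch_bpobj_t *bpo, uint64_t object)
{
	bpo->bpo_buf = malloc(64);
	if (bpo->bpo_buf == NULL)
		return -1;
	bpo->bpo_object = object;
	return 0;
}

static void sketch_bpobj_close(sketch_bpobj_t *bpo)
{
	/* Already closed (e.g. a retried load closing it again): do nothing. */
	if (bpo->bpo_object == 0)
		return;
	free(bpo->bpo_buf);
	bpo->bpo_buf = NULL;
	bpo->bpo_object = 0; /* mark closed so a later close is safe */
}
```

Without the final assignment, the second call would free bpo_buf again and later code could dereference the stale pointer.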
This reverts commit 411dd65af1.
gcc version 4.1.2 does not like having differing prototypes
for zio_execute, one version in the .c with inline and one
version in the .h without. Thus I'm reverting this change
and we'll see how critical this particular stack reduction is.
This commit preserves the recursive function dbuf_hold_impl() but moves
the local variables and function arguments to the heap to minimize
the stack frame size. Enough space is initially allocated on the
stack for 20 levels of recursion. This technique was based on commit
34229a2f2a which reduced stack usage of
traverse_visitbp().
dbuf_hold_impl() is mutually recursive with dbuf_findbp(),
but the latter function is also called from other functions.
Therefore dbuf_findbp() must contain logic to determine how to call
dbuf_hold_impl(). To this end, dbuf_findbp() now takes a
struct dbuf_hold_impl_data pointer as an argument. If that argument
is NULL it calls dbuf_hold_impl() as before, otherwise it calls
__dbuf_hold_impl() with a single dbuf_hold_impl_data pointer argument.
As the name implies, dbuf_hold_impl_data stores the arguments and local
variables for dbuf_hold_impl().
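The technique can be sketched on a toy recursion, assuming hypothetical names throughout: struct hold_data stands in for struct dbuf_hold_impl_data, findbp() for dbuf_findbp(), hold_impl_frame() for __dbuf_hold_impl(), and HOLD_MAX_DEPTH for the 20-level limit. The recursion is preserved, but each level's arguments and locals live in a slot of an array allocated once with calloc(), so each stack frame only carries a pointer. The toy computation (2^level) is purely illustrative.

```c
#include <assert.h>
#include <stdlib.h>

#define HOLD_MAX_DEPTH 20 /* hypothetical: "20 levels of recursion" */

/* Hypothetical stand-in for struct dbuf_hold_impl_data. */
struct hold_data {
	int level;   /* former function argument */
	long result; /* former local variable */
	int depth;   /* index of this slot in the preallocated array */
};

static long hold_impl(int level);
static long hold_impl_frame(struct hold_data *dh);

/*
 * Stand-in for dbuf_findbp(): outside callers pass dh == NULL and get the
 * ordinary entry point; the recursive caller passes its own slot and the
 * next preallocated slot is used instead of a fresh set of locals.
 */
static long findbp(int level, struct hold_data *dh)
{
	struct hold_data *next;

	if (dh == NULL)
		return hold_impl(level);
	if (dh->depth + 1 >= HOLD_MAX_DEPTH)
		return -1; /* out of preallocated slots */
	next = dh + 1;
	next->level = level;
	next->depth = dh->depth + 1;
	return hold_impl_frame(next);
}

/* Stand-in for __dbuf_hold_impl(): all state is read from and written to *dh. */
static long hold_impl_frame(struct hold_data *dh)
{
	long sub;

	assert(dh->depth < HOLD_MAX_DEPTH);
	if (dh->level == 0) {
		dh->result = 1;
		return dh->result;
	}
	sub = findbp(dh->level - 1, dh);
	if (sub < 0)
		return sub; /* propagate the "too deep" error */
	dh->result = 2 * sub;
	return dh->result;
}

/* Public entry point: allocate every slot up front, then recurse. */
static long hold_impl(int level)
{
	struct hold_data *dh = calloc(HOLD_MAX_DEPTH, sizeof (*dh));
	long rc;

	if (dh == NULL)
		return -1;
	dh->level = level;
	dh->depth = 0;
	rc = hold_impl_frame(dh);
	free(dh);
	return rc;
}
```

For example, hold_impl(5) returns 32, while hold_impl(20) needs a 21st slot and returns -1.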
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Github issue 22 reported a stack overrun when the zfs module was
loaded, possibly related to the presence of existing zpools created
under zfs-fuse. The stack trace showed 9 levels of recursion between
dsl_scan_visitbp() and dsl_scan_recurse(). To reduce stack overhead in
that code path, this commit moves the 128-byte blkptr_t data structure
in dsl_scan_visitbp() to the heap.
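A minimal sketch of the pattern on a toy recursive visitor, assuming a hypothetical 128-byte fake_blkptr_t in place of blkptr_t and a hypothetical visit() in place of dsl_scan_visitbp(). The block pointer copy is allocated with malloc(), so each recursion level keeps only a pointer on the stack instead of the full 128 bytes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 128-byte stand-in for blkptr_t. */
typedef struct fake_blkptr {
	uint64_t word[16];
} fake_blkptr_t;

/*
 * Toy recursive visitor. Returns the number of levels visited, or -1 if
 * an allocation fails.
 */
static int visit(const fake_blkptr_t *bp, int levels_left)
{
	fake_blkptr_t *copy;
	int rc;

	copy = malloc(sizeof (*copy)); /* heap, not the stack frame */
	if (copy == NULL)
		return -1;
	memcpy(copy, bp, sizeof (*copy));
	copy->word[0]++; /* pretend to derive the child block pointer */

	if (levels_left == 0) {
		rc = 1;
	} else {
		rc = visit(copy, levels_left - 1);
		if (rc >= 0)
			rc++;
	}
	free(copy);
	return rc;
}
```

The trade-off is one allocation per level in exchange for a much smaller frame, which matters most on deep call chains like the 9-level recursion in the reported trace.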
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Eliminated local variables pointing to members of the zio struct.
Just refer to the struct members directly. This saved about 32 bytes per
call, and since this function can be called recursively up to 19 levels
deep, we potentially save up to 608 bytes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due to
heavy stack use. Inlining zio_execute() helps reduce the stack depth of
the zio_notify_parent() -> zio_execute() -> zio_wait() recursive cycle.
I am no longer seeing ztest segfaults in this code path with this change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due
to heavy stack use. Inlining dbuf_findbp() helps reduce the stack depth
of the dbuf_findbp() -> dbuf_hold_impl() cycle. However, segfaults are
still occurring in this code path, so further reductions are still needed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due
to heavy stack use. Inlining zio_notify_parent() helps reduce the
stack depth of the zio_notify_parent() -> zio_execute() -> zio_done()
recursive cycle. I am no longer seeing ztest segfaults in this code
path with this change combined with the zio_done() stack reduction in
the previous commit.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The spa_load() function may call itself recursively through
the spa_load_impl() function. This call path of spa_load()->
spa_load_impl()->spa_load()->spa_load_impl() takes 640 bytes of
stack. By forcing spa_load_impl() to be inlined as part of
spa_load(), this can be reduced to 448 bytes, for a savings of
192 bytes.
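A minimal sketch of forcing the inline, assuming GCC or Clang: __attribute__((always_inline)) is a compiler extension, and pool_load()/pool_load_impl() are hypothetical stand-ins for spa_load()/spa_load_impl(). Once the helper is folded into its caller, each retry costs one combined frame instead of two separate ones.

```c
static int pool_load(int retries);

/* Forced inline: this body is merged into pool_load()'s frame. */
static inline __attribute__((always_inline)) int pool_load_impl(int retries)
{
	if (retries == 0)
		return 0; /* load succeeded */
	/* Retry path: recurse back through pool_load(). */
	return pool_load(retries - 1) + 1;
}

static int pool_load(int retries)
{
	return pool_load_impl(retries);
}
```

The helper itself is not directly self-recursive, so the compiler can honor the attribute; pool_load() simply becomes self-recursive after inlining.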
The feature branch 'fix-taskq' in Linux's ZFS tree changes the taskq_dispatch()
flag from TQ_SLEEP to TQ_NOSLEEP to avoid sleeping in some circumstances.
However, this has the side effect that taskq_dispatch() may now fail, and since
the return code was not being checked at all, zios could end up never being
scheduled to execute.
I'm fixing this in a simplistic but not very elegant way, by just looping until
taskq_dispatch() succeeds.
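The retry loop can be sketched as follows, assuming (as with the Solaris-style taskq interface) that a returned task id of 0 means the dispatch failed. sketch_dispatch_nosleep(), sketch_failures_left and dispatch_until_success() are hypothetical; the fake dispatcher fails a configurable number of times and then runs the function inline to keep the sketch small.

```c
#include <stdint.h>

/* Stand-in for taskqid_t: 0 signals a failed dispatch. */
typedef uintptr_t sketch_taskqid_t;

/* Hypothetical knob used only to simulate TQ_NOSLEEP allocation failures. */
static int sketch_failures_left;

static sketch_taskqid_t sketch_dispatch_nosleep(void (*func)(void *), void *arg)
{
	if (sketch_failures_left > 0) {
		sketch_failures_left--;
		return 0; /* could not allocate without sleeping */
	}
	func(arg); /* a real taskq would queue func for a worker thread */
	return 1;
}

/* Keep retrying until the dispatch is accepted; returns the attempt count. */
static int dispatch_until_success(void (*func)(void *), void *arg)
{
	int attempts = 0;

	do {
		attempts++;
	} while (sketch_dispatch_nosleep(func, arg) == 0);
	return attempts;
}

static void count_run(void *arg)
{
	(*(int *)arg)++;
}
```

As the text says, this is simple rather than elegant: under sustained memory pressure the loop can spin, but it guarantees the zio is never silently dropped.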
Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The dmu_object_info_t structures which are roughly 60 bytes
have been moved from the stack to the heap. Every little
bit helps and there's absolutely no reason these temporary
working variables need to be on the stack.
During module load we could deadlock because the zvol_init()
call path took the spa_namespace_lock before the zvol_state_lock.
The rest of the zvol code takes the locks in the opposite order.
In particular, I observed a deadlock caused by this lock
inversion.
I've fixed the ordering by creating an unlocked version of
zvol_create_minor() and zvol_remove_minor(). This allows me to
take the zvol_state_lock before the spa_namespace_lock in
zvol_cr_minors_common() and simply call the unlocked version.
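A minimal sketch of the locked/unlocked split using POSIX mutexes. state_lock and namespace_lock are hypothetical stand-ins for zvol_state_lock and spa_namespace_lock, and create_minor(), create_minor_impl() and create_all_minors() are hypothetical. The bulk path takes the locks in the same order as everything else (state first, then namespace) and calls the unlocked variant so the state lock is never re-acquired.

```c
#include <pthread.h>

/* Hypothetical stand-ins for zvol_state_lock and spa_namespace_lock. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t namespace_lock = PTHREAD_MUTEX_INITIALIZER;
static int nminors;

/* Unlocked variant: caller must already hold state_lock. */
static int create_minor_impl(const char *name)
{
	(void) name;
	nminors++;
	return 0;
}

/* Locked variant for callers holding neither lock. */
static int create_minor(const char *name)
{
	int rc;

	pthread_mutex_lock(&state_lock);
	rc = create_minor_impl(name);
	pthread_mutex_unlock(&state_lock);
	return rc;
}

/* Bulk path: state_lock before namespace_lock, matching the global order. */
static int create_all_minors(const char *const *names, int n)
{
	int i, rc = 0;

	pthread_mutex_lock(&state_lock);
	pthread_mutex_lock(&namespace_lock);
	for (i = 0; i < n && rc == 0; i++)
		rc = create_minor_impl(names[i]);
	pthread_mutex_unlock(&namespace_lock);
	pthread_mutex_unlock(&state_lock);
	return rc;
}
```

Because every path acquires the two locks in one fixed order, no pair of threads can each hold one lock while waiting for the other.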
With the update to onnv_141, the way minor devices are created and
removed for ZVOLs was substantially changed. The updated system
is much more tightly integrated with Solaris's /dev/ filesystem.
This is great for Solaris but bad for Linux.
On the kernel side the ZFS_IOC_{CREATE,REMOVE}_MINOR ioctl
entry points have been re-added. They now call directly into
the ZVOL code to create the needed minor node and add the sysfs
entries for udev.
Also as part of this change I've decided it would really be
best if all the zvols were in a /dev/zvol directory like on
Solaris. Organizationally this makes sense, and on the code
side it allows us to know a block device is a zvol simply by
where it is located in /dev/. Unlike Solaris, there is no
./dsk or ./rdsk component in the path.
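Because every zvol lives under /dev/zvol, recognizing one can be a simple prefix check. This is a minimal sketch: ZVOL_DEV_DIR, path_is_zvol() and zvol_dataset_name() are hypothetical names, and symlinks are not resolved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix, per the /dev/zvol layout described above. */
#define ZVOL_DEV_DIR "/dev/zvol/"

/* True if the path lives under /dev/zvol/ (symlinks are not resolved). */
static bool path_is_zvol(const char *path)
{
	return strncmp(path, ZVOL_DEV_DIR, strlen(ZVOL_DEV_DIR)) == 0;
}

/*
 * Return the dataset part ("pool/vol") of a zvol path, or NULL if the
 * path is not a zvol. There is no dsk/ or rdsk/ component to skip.
 */
static const char *zvol_dataset_name(const char *path)
{
	if (!path_is_zvol(path))
		return NULL;
	return path + strlen(ZVOL_DEV_DIR);
}
```

For "/dev/zvol/tank/vol1" this yields "tank/vol1", while "/dev/sda1" is rejected.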
Extend the Makefiles with an uninstall target to cleanly
remove a package which was installed with 'make install'.
Additionally, ensure a 'depmod -a' is run as part of the
install to update the module dependency information.
The dmu_objset_pool() and dmu_objset_name() symbols are needed
by Lustre and should be exported.
Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The ZFS update to onnv_141 brought with it support for a
security label attribute called mlslabel. This feature
depends on zones to work correctly and thus I am disabling
it under Linux. Equivalent functionality could be added
at some point in the future.
We should just make a best effort when removing zvol minors
from the system during destroy. Failure here should never
prevent the pool from being destroyed. For example, if a
pool is so heavily damaged it cannot be opened (part of the
minor removal) we still want to be able to destroy it. The
worst case here is we may orphan a few minors but even that
is unlikely and not particularly harmful.
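The best-effort policy amounts to ignoring the result of minor removal on the destroy path. In this minimal sketch, struct sketch_pool, remove_minors() and pool_destroy() are hypothetical; removing minors is modeled as requiring the pool to be openable, so a badly damaged pool orphans its minors but is still destroyed.

```c
#include <stdbool.h>

/* Hypothetical pool state used only for this sketch. */
struct sketch_pool {
	bool openable;  /* false if the pool is too damaged to open */
	bool destroyed;
	int minors;
};

/* Removing minors needs the pool opened, so it can fail. */
static int remove_minors(struct sketch_pool *p)
{
	if (!p->openable)
		return -1;
	p->minors = 0;
	return 0;
}

static int pool_destroy(struct sketch_pool *p)
{
	/* Best effort: a failure here must never block the destroy. */
	(void) remove_minors(p);
	p->destroyed = true;
	return 0;
}
```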
This was done because there are now lots of resource.fs.zfs.statechange
events being posted but they do not include the state. For the moment
the state must always be healthy but there's no harm in making this
explicit.
Previously I was adding the FM_EREPORT_TIME field when the nvlist
was constructed. However, with the update to onnv_141 these
ereport nvlists are now constructed in several places and it
doesn't make sense for each of them to have to add this common
bit of info. To handle this the FM_EREPORT_TIME is now only
added once when the event is posted.
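A minimal sketch of stamping the time in one place, assuming a much-simplified event struct rather than a real nvlist. struct sketch_ereport, the build_*() helpers and post_ereport() are hypothetical, and posted_at merely stands in for the FM_EREPORT_TIME payload. The construction sites fill in only event-specific fields; the single post path adds the time once.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, much-simplified ereport. */
struct sketch_ereport {
	const char *class;
	bool has_time;
	time_t posted_at; /* stand-in for the FM_EREPORT_TIME member */
};

/* Construction sites: event-specific fields only, no timestamp. */
static void build_checksum_ereport(struct sketch_ereport *ev)
{
	ev->class = "ereport.fs.zfs.checksum";
	ev->has_time = false;
}

static void build_io_ereport(struct sketch_ereport *ev)
{
	ev->class = "ereport.fs.zfs.io";
	ev->has_time = false;
}

/* The single post path is the only place the time is added. */
static int post_ereport(struct sketch_ereport *ev)
{
	if (!ev->has_time) {
		ev->posted_at = time(NULL);
		if (ev->posted_at == (time_t)-1)
			return -1;
		ev->has_time = true;
	}
	/* ...hand the event to the event queue here... */
	return 0;
}
```

New construction sites then get the timestamp for free and cannot forget it.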
Just to be clear, this only indicates whether the ZFS code was built
with or without debugging enabled. It says nothing about
how the SPL was built; they can be built differently by design.
Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
This change updates the ZFS code to use the slightly reworked
SPL debug infrastructure. It also explicitly sets all ZFS
dprintf debugging to use the SS_USER1 subsystem for logging
in the SPL debug log.
This was caught under Debian Lenny builds because they are one of
the few/only current distros based on a 2.6.26 kernel. In one
of the build conditionals I accidentally failed to assign the
return code to rc before returning.
The prototype for an add_range() function was added to the kernel
header include/linux/range.h, which conflicts with the static
add_range() defined in zfs_fm.c. To resolve the conflict, all
range functions in zfs_fm.c have been prefixed with zei, which
is short for the zfs_ecksum_info struct, since all of these
functions operate on that base structure.
Devices were only being created at module load time or when a
dataset was created. Similarly, devices were not always being
removed at all the correct times. This patch updates all the
places where devices should either be created or removed. I'm
reasonably sure I got them all, but if there's a case I missed
we can catch it with a follow-up patch. The cases covered are:
  - module load/unload
  - zfs create/remove
  - zpool import/export
  - zpool destroy
This patch also adds a simple regression test to zconfig.sh
to ensure zpool import/export is basically working properly.
This test specifically checks that devices are created
properly, removed after export, created after import, and
removed as a consequence of a zpool destroy.
With the recent ZVOL update zvol_set_volblocksize() accidentally
lost its mutex_exit(). This was noticed when zvol_create_minor()
blocked on the zvol_state_lock while it was holding the
spa_namespace_lock. This caused everything to get blocked
up and hung the system.
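The bug class is a lock that is not released on every exit path. A minimal sketch of the corrected shape, assuming a pthread mutex in place of the kernel mutex: vol_state_lock, vol_blocksize and set_volblocksize() are hypothetical, and the power-of-two check is illustrative validation only. A single unlock after the if/else covers both the success and error paths, so no later caller (such as a minor-creation path) blocks forever.

```c
#include <pthread.h>

/* Hypothetical stand-in for zvol_state_lock. */
static pthread_mutex_t vol_state_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long vol_blocksize = 8192;

/* Every exit path releases the lock, including the error path. */
static int set_volblocksize(unsigned long bs)
{
	int error = 0;

	pthread_mutex_lock(&vol_state_lock);
	if (bs == 0 || (bs & (bs - 1)) != 0)
		error = -1; /* illustrative: reject non power-of-two sizes */
	else
		vol_blocksize = bs;
	pthread_mutex_unlock(&vol_state_lock); /* the release that was lost */
	return error;
}
```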