As part of commit f162433deb the /zvol/
path component was added for zvol devices. This ensured all zvol
devices would be created by udev in /dev/zvol/<pool>/<dataset>, as
opposed to the previous /dev/<pool>/<dataset> path. Logically, it
was nice to organize them in a directory much like Solaris does.
However, while initial testing showed this to work fine with modern
kernels it does not appear to be supported under RHEL5. The extra
path component triggers a NULL deref in create_dir(). Anyway, to
avoid having different zvol path names based on your kernel version
its more consistent simply to revert to the original naming convention.
If you really want the zvol component you can always add custom
udev rules to do exactly this.
We can revisiting this change again once we are willing to drop
support for RHEL5 and similar older distros.
Interestingly this looks like an upstream bug as well. If for some
reason we are unable to get a zvols statistics, because perhaps the
zpool is hopelessly corrupt, we would trigger the VERIFY. This
commit adds the proper error handling just to propagate the error
back to user space. Now the user space tools still must handle this
properly but in the worst case the tool will crash or perhaps have
some missing output. That's far far better than crashing the host.
Closes#45
Partitions for a zvol device were not appearing in /dev/zvol/<pool>/
at module load time for a couple of reasons.
1) The Linux block layer expects a block device to have a non-zero
capacity during add_disk(). If the capacity is zero it does not
attempt to open the device which means we never trigger a partition
scan. The device capacity was just being set during the first open.
2) Because we expect to be adding a block device to the zvol_state_list
during zvol_create_minor() the zvol_state_lock() is held. This
can result in a deadlock in add_disk() when it attempts to open
the block device via zvol_open() which also takes this same lock.
To avoid this issue special handling has been added to zvol_open()
and zvol_release() to allow the mutex owner to enter these functions
without retaking the lock.
3) In __zvol_create_minor() the call to dmu_objset_disown() must occur
before the call to add_disk(). As mentioned above add_disk() results
in a call to zvol_open() which will attempt to call dmu_objset_own()
again on the objset. If the objset is already open it will fail
resulting in a failed open. This in turn means the kernel will be
unable to read the partition information from the device.
For the case where we have a zil to replay we need to ensure that
zv->zv_objset contains the current objset. Since the caller has
a hold on the object set it is safe to pass to zil_replay as part
of the zv. Call path zvol_create_minor()->zil_replay()->
zil_parse()->zil_replay_log_record()->zvol_replay_write().
During spa_load the spl->spa_deferred_bpobj maybe be opened and closed
multiple times. It's critical that when the object is closed the
bpo->bpo_object is set to zero to indicate the object is closed.
If it's not during spl_load_retry the spl->spa_deferred_bpobj can
be closes twice resulting in a NULL deref.
This appears to have been fixed upstream the same way.
This reverts commit 411dd65af1.
gcc version 4.1.2 does not like having differing prototypes
for zio_execute, one version in the .c with inline and one
version in the .h without. Thus I'm reverting this change
and we'll see how critical this particular stack reduction is.
This commit preserves the recursive function dbuf_hold_impl() but moves
the local variables and function arguments to the heap to minimize
the stack frame size. Enough space is initially allocated on the
stack for 20 levels of recursion. This technique was based on commit
34229a2f2a which reduced stack usage of
traverse_visitbp().
dbuf_hold_impl() is mutually recursive with dbuf_findbp(),
but the latter function is also called from other functions.
Therefore dbuf_findbp() must contain logic to determine how to call
dbuf_hold_impl(). To this end, dbuf_hold_impl() now takes a
struct dbuf_hold_impl_data pointer as an argument. If that argument
is NULL it calls dbuf_hold_impl() as before, otherwise it calls
__debuf_hold_impl() with a single dbuf_hold_impl_data pointer argument.
As the name implies, dbuf_hold_impl_data stores the arguments and local
variables for dbuf_hold_impl().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Github issue 22 reported a stack overrun when the zfs module was
loaded, possibly related to the presence of existing zpools created
under zfs-fuse. The stack trace showed 9 levels of recursion between
dsl_scan_visitbp() and dsl_scan_recurse(). To reduce stack overhead in
that code path, this commit moves the 128 byte blkptr_t data strucutre
in dsl_scan_visitbp() to the heap.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Eliminated local variables pointing to members of the zio struct.
Just refer to the struct members directly. This saved about 32 bytes per
call, but this function can be called recurisvely up to 19 levels deep,
so we potentially save up to 608 bytes.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due to
heavy stack use. Inlining zio_execute() helps reduce the stack depth of
the zio_notify_parent() -> zio_execute() -> zio_wait() recursive cycle.
I am no longer seeing ztest segfaults in this code path with this change.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due
to heavy stack use. Inlining dbuf_findbp() helps reduce the stack depth
of the dbuf_findbp() -> dbuf_hold_impl() cycle. However, segfaults are
still occurring in this code path, so further reductions are still needed.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Deep recursive call chains are contributing to segfaults in ztest due
to heavy stack use. Inlining zio_notify_parent() helps reduce the
stack depth of the zio_notify_parent() -> zio_execute() -> zio_done()
recursive cycle. I am no longer seeing ztest segfaults in this code
path with this change combined with the zio_done() stack reduction in
the previous commit.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The spa_load function may call itself recursively through
the spa_load_impl function. This call path of spa_load->
spa_load_impl->spa_load->spa_load_impl takes 640 bytes of
stack. By forcing spa_load_impl to be inlined as part of
spa_load the can be reduced to 448 bytes, for a savings of
192 bytes,
The feature branch 'fix-taskq' in Linux's ZFS tree changes the taskq_dispatch()
flag from TQ_SLEEP to TQ_NOSLEEP to avoid sleeping in some circumstances.
However, this has the side effect that taskq_dispatch() now may fail, and since
the return code was not even being checked, it could lead to zio's not being
scheduled to execute.
I'm fixing this in a simplistic but not very elegant way, by just looping until
taskq_dispatch() succeeds.
Signed-off-by: Ricardo M. Correia <ricardo.correia@oracle.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
The dmu_object_info_t structures which are roughly 60 bytes
have been moved from the stack to the heap. Every little
bit helps and there's absolutely no reason these temporary
working variables need to be on the stack.