The upstream commit cb code had a few bugs:
1) The arguments of the list_move_tail() call in txg_dispatch_callbacks()
were reversed by mistake. This caused the commit callbacks to not be
called at all.
2) ztest had a bug in ztest_dmu_commit_callbacks() where "error" was not
initialized correctly. This seems to have caused the test to always take
the simulated error code path, which made ztest unable to detect whether
commit cbs were being called for transactions that successfully complete.
3) ztest had another bug in ztest_dmu_commit_callbacks() where the commit
cb threshold was not being compared correctly.
4) The commit cb taskq was using 'max_ncpus * 2' as the maxalloc argument
of taskq_create(), which could have caused unnecessary delays in the txg
sync thread.
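As a rough illustration of bug 1, the sketch below models a move-tail
operation in user space, assuming the destination-first argument order
list_move_tail(dst, src). The sketch_list_t type and the sketch_* helpers
are hypothetical stand-ins, not the kernel list implementation. With the
arguments reversed, the empty local list is appended to the pending list,
so the local list stays empty and nothing is dispatched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and list types standing in for the kernel list_t. */
typedef struct sketch_node {
	struct sketch_node *next;
	int id;
} sketch_node_t;

typedef struct sketch_list {
	sketch_node_t *head;
	sketch_node_t *tail;
	size_t count;
} sketch_list_t;

static void
sketch_list_init(sketch_list_t *l)
{
	l->head = NULL;
	l->tail = NULL;
	l->count = 0;
}

static void
sketch_list_insert_tail(sketch_list_t *l, sketch_node_t *n)
{
	n->next = NULL;
	if (l->tail != NULL)
		l->tail->next = n;
	else
		l->head = n;
	l->tail = n;
	l->count++;
}

/* Append every node of src to the tail of dst, leaving src empty. */
static void
sketch_list_move_tail(sketch_list_t *dst, sketch_list_t *src)
{
	if (src->head == NULL)
		return;		/* nothing to move */
	if (dst->tail != NULL)
		dst->tail->next = src->head;
	else
		dst->head = src->head;
	dst->tail = src->tail;
	dst->count += src->count;
	sketch_list_init(src);
}
```

Calling sketch_list_move_tail(&local, &pending) drains the pending
callbacks into the local list; swapping the two arguments is a no-op
whenever the local list starts out empty.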
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

The mzap_update() function allocates enough memory for a full
dbuf which can be 128K in size. Ideally, this memory should
be allocated from our slab but in the short term it's simplest
just to vmem_alloc() the memory instead.
Closes #48

As part of commit f162433deb the /zvol/
path component was added for zvol devices. This ensured all zvol
devices would be created by udev in /dev/zvol/<pool>/<dataset>, as
opposed to the previous /dev/<pool>/<dataset> path. Logically, it
was nice to organize them in a directory much like Solaris does.
However, while initial testing showed this to work fine with modern
kernels it does not appear to be supported under RHEL5. The extra
path component triggers a NULL deref in create_dir(). Anyway, to
avoid having different zvol path names based on your kernel version
it's more consistent simply to revert to the original naming convention.
If you really want the zvol component you can always add custom
udev rules to do exactly this.
We can revisit this change once we are willing to drop
support for RHEL5 and similar older distros.

I've noticed the TopGit linux-zfs-branch has some linux-kernel-mem
changes which were incorrectly merged. To fix the issue I'm
reverting the changes in the linux-kernel-mem topic branch, then
remerging the revert, and finally reapplying and merging the
change correctly.

Interestingly this looks like an upstream bug as well. If for some
reason we are unable to get a zvol's statistics, perhaps because the
zpool is hopelessly corrupt, we would trigger the VERIFY. This
commit adds the proper error handling just to propagate the error
back to user space. Now the user space tools still must handle this
properly but in the worst case the tool will crash or perhaps have
some missing output. That's far far better than crashing the host.
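The difference between asserting and propagating can be sketched in
user space as follows. All names here (sketch_zvol_t, sketch_get_stats,
sketch_ioctl_stats) are hypothetical and only model the pattern; they
are not the actual zvol code. The caller returns the error code to its
own caller rather than asserting that the lookup succeeded.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a zvol whose pool may be corrupt. */
typedef struct sketch_zvol {
	int corrupt;
	unsigned long long volsize;
} sketch_zvol_t;

/* Returns 0 on success or an errno value when stats are unavailable. */
static int
sketch_get_stats(const sketch_zvol_t *zv, unsigned long long *volsize)
{
	if (zv->corrupt)
		return (EIO);
	*volsize = zv->volsize;
	return (0);
}

/*
 * Before the fix the equivalent of assert(error == 0) sat here and
 * would bring down the host. Now the error is handed back to the
 * caller, which in the real code is ultimately user space.
 */
static int
sketch_ioctl_stats(const sketch_zvol_t *zv, unsigned long long *volsize)
{
	int error = sketch_get_stats(zv, volsize);

	if (error != 0)
		return (error);
	return (0);
}
```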
Closes #45

Partitions for a zvol device were not appearing in /dev/zvol/<pool>/
at module load time for a few reasons.
1) The Linux block layer expects a block device to have a non-zero
capacity during add_disk(). If the capacity is zero it does not
attempt to open the device which means we never trigger a partition
scan. The device capacity was only being set during the first open.
2) Because we expect to be adding a block device to the zvol_state_list
during zvol_create_minor() the zvol_state_lock is held. This
can result in a deadlock in add_disk() when it attempts to open
the block device via zvol_open() which also takes this same lock.
To avoid this issue special handling has been added to zvol_open()
and zvol_release() to allow the mutex owner to enter these functions
without retaking the lock.
3) In __zvol_create_minor() the call to dmu_objset_disown() must occur
before the call to add_disk(). As mentioned above add_disk() results
in a call to zvol_open() which will attempt to call dmu_objset_own()
again on the objset. If the objset is already open it will fail
resulting in a failed open. This in turn means the kernel will be
unable to read the partition information from the device.
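The lock handling described in item 2 can be sketched in user space
with pthreads. This is only a model under stated assumptions: the
thread-local ownership flag and the sketch_* names are hypothetical
and stand in for the kernel mutex ownership check. The open path takes
the lock only when the calling thread does not already hold it, and
drops it only if it took it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical ownership flag: true while this thread holds state_lock. */
static _Thread_local bool state_lock_owned;

static int open_count;	/* protected by state_lock */

static void
state_lock_enter(void)
{
	pthread_mutex_lock(&state_lock);
	state_lock_owned = true;
}

static void
state_lock_exit(void)
{
	state_lock_owned = false;
	pthread_mutex_unlock(&state_lock);
}

/* Models the open path: safe to call with or without the lock held. */
static int
sketch_open(void)
{
	bool drop_lock = false;

	if (!state_lock_owned) {
		state_lock_enter();
		drop_lock = true;
	}

	open_count++;

	if (drop_lock)
		state_lock_exit();
	return (0);
}

/* Models minor creation: holds the lock across the nested open. */
static int
sketch_create_minor(void)
{
	int error;

	state_lock_enter();
	error = sketch_open();	/* stands in for add_disk() -> open */
	state_lock_exit();
	return (error);
}
```

Without the ownership check, the nested sketch_open() call would block
forever on a non-recursive mutex, which is the deadlock described above.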
For the case where we have a zil to replay we need to ensure that
zv->zv_objset contains the current objset. Since the caller has
a hold on the object set it is safe to pass to zil_replay as part
of the zv. Call path zvol_create_minor()->zil_replay()->
zil_parse()->zil_replay_log_record()->zvol_replay_write().

During spa_load the spa->spa_deferred_bpobj may be opened and closed
multiple times. It's critical that when the object is closed the
bpo->bpo_object is set to zero to indicate the object is closed.
If it is not, then during spa_load_retry the spa->spa_deferred_bpobj
can be closed twice, resulting in a NULL deref.
This appears to have been fixed upstream the same way.
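A minimal user-space sketch of the idea, assuming a zero object number
means "closed". The sketch_bpobj_t type and its functions are
hypothetical simplifications, not the real bpobj code. Clearing the
object number on close lets a second close return early rather than
releasing resources that are already gone.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a bpobj handle. */
typedef struct sketch_bpobj {
	uint64_t bpo_object;	/* 0 means the object is closed */
	void *bpo_dbuf;		/* resource held while open */
} sketch_bpobj_t;

static int
sketch_bpobj_open(sketch_bpobj_t *bpo, uint64_t object)
{
	bpo->bpo_dbuf = malloc(16);
	if (bpo->bpo_dbuf == NULL)
		return (-1);
	bpo->bpo_object = object;
	return (0);
}

static void
sketch_bpobj_close(sketch_bpobj_t *bpo)
{
	/* Already closed, or never opened: nothing to release. */
	if (bpo->bpo_object == 0)
		return;

	free(bpo->bpo_dbuf);
	bpo->bpo_dbuf = NULL;
	bpo->bpo_object = 0;	/* mark closed so a second close is a no-op */
}
```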
This reverts commit 411dd65af1.
gcc version 4.1.2 does not like having differing prototypes
for zio_execute, one version in the .c with inline and one
version in the .h without. Thus I'm reverting this change
and we'll see how critical this particular stack reduction is.