The upstream commit callback (cb) code had a few bugs:
1) The arguments of the list_move_tail() call in txg_dispatch_callbacks()
were reversed by mistake (see the sketch after this list). This
caused the commit callbacks to not be called at all.
2) ztest had a bug in ztest_dmu_commit_callbacks() where "error" was not
initialized correctly. This seems to have caused the test to always take
the simulated error code path, which made ztest unable to detect whether
commit cbs were being called for transactions that successfully complete.
3) ztest had another bug in ztest_dmu_commit_callbacks() where the commit
cb threshold was not being compared correctly.
4) The commit cb taskq was using 'max_ncpus * 2' as the maxalloc argument
of taskq_create(), which could have caused unnecessary delays in the txg
sync thread.
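
For reference, a minimal sketch of what fixes 1) and 2) might look
like. It assumes the illumos-style list_move_tail(dst, src)
signature; the surrounding variable names are illustrative, not the
committed code.

    /*
     * 1) txg_dispatch_callbacks(): drain the pending callbacks onto
     *    the local dispatch list.  With the arguments reversed the
     *    local list stayed empty and nothing was ever dispatched.
     */
    list_move_tail(cb_list, &tc->tc_callbacks[g]);

    /*
     * 2) ztest_dmu_commit_callbacks(): initialize error before it is
     *    checked, otherwise garbage on the stack sends every
     *    transaction down the simulated error path.
     */
    int error = 0;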
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Add two additional basic sanity tests to confirm zvol snapshots
and clones work. The snapshot test is basically the same as the
example provided in the wiki. The clone test goes one step farther
and clones the snapshot then modifies it to match the original
modified volume. It then compares them to ensure everything was
modified as expected.
These are just meant to be sanity tests to catch obvious breakage
before tagging a release. They are still not a substitute for a
full regression test suite.
It appears that in earlier kernels the maximum name length of a
kobject was KOBJ_NAME_LEN (20) bytes. This was later extended to
dynamically allocate enough memory if it was over KOBJ_NAME_LEN,
and finally it was always made dynamic. Unfortunately, until this
last step happened it doesn't look like it was always safe to use
names larger than KOBJ_NAME_LEN. For example, under the RHEL5
2.6.18 kernel if the kobject name length exceeds KOBJ_NAME_LEN
a NULL dereference is tripped.
To avoid this issue the build system has been updated to check
to see if KOBJ_NAME_LEN is defined. If it is, we have to assume
the maximum kobject name length is only 20 bytes. This 20-byte
name must minimally include the following components:
<zpool>/<dataset>[@snapshot[partition]]
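
For illustration, the guard ends up looking something like the
following. HAVE_KOBJ_NAME_LEN, ZVOL_NAME_MAX, and MAXNAMELEN are
stand-in names here, not necessarily what the build system defines.

    #include <linux/kobject.h>

    #ifdef HAVE_KOBJ_NAME_LEN
    /*
     * Older kernels (e.g. RHEL5's 2.6.18) store kobject names in a
     * fixed KOBJ_NAME_LEN (20) byte buffer; longer names can trip a
     * NULL dereference.
     */
    #define ZVOL_NAME_MAX  KOBJ_NAME_LEN
    #else
    /* Newer kernels allocate kobject names dynamically. */
    #define ZVOL_NAME_MAX  MAXNAMELEN
    #endif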
While the zfs utilities do block until the expected device appears
they can only do this for full devices, not partitions. This means
that once a device appears it still may take a little bit of time
before the kernel rescans the partition table, updates sysfs, udev
is notified, and the partition devices are created. The test case
itself could block briefly waiting for the partition because it knows
what to expect. But for now the simpler thing to do is just delay.
See the previous commit for details, but the gist is that with the
removal of the zvol path component the regression tests must be
updated to use the correct path name.
As part of commit f162433deb the /zvol/
path component was added for zvol devices. This ensured all zvol
devices would be created by udev in /dev/zvol/<pool>/<dataset>, as
opposed to the previous /dev/<pool>/<dataset> path. Logically, it
was nice to organize them in a directory much like Solaris does.
However, while initial testing showed this to work fine with modern
kernels it does not appear to be supported under RHEL5. The extra
path component triggers a NULL deref in create_dir(). To avoid
having different zvol path names based on your kernel version it's
more consistent to simply revert to the original naming convention.
If you really want the zvol component you can always add custom
udev rules to do exactly this.
We can revisit this change once we are willing to drop
support for RHEL5 and similar older distros.
Interestingly this looks like an upstream bug as well. If for some
reason we are unable to get a zvol's statistics, because perhaps the
zpool is hopelessly corrupt, we would trigger the VERIFY. This
commit adds the proper error handling to propagate the error back
to user space. The user space tools must still handle this properly,
but in the worst case the tool will crash or perhaps have some
missing output. That's far, far better than crashing the host.
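
The shape of the fix is roughly the following; this is sketched from
the description above, not copied from the committed diff.

    if (dmu_objset_type(os) == DMU_OST_ZVOL) {
            error = zvol_get_stats(os, nv);
            if (error)
                    /* was: VERIFY(zvol_get_stats(os, nv) == 0); */
                    return (error);
    }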
Closes #45
Several folks have now remarked that when the regression tests
fail they leave a mess behind. This was done intentionally at
the time to facilitate debugging the wreckage.
However, this also means that you may need to do some manual
cleanup such as removing the loopback devices before re-running
the tests. To simplify this procedure I've added the '-c'
option to zconfig.sh which will attempt to cleanup the mess
from a previous test before starting.
This is somewhat dangerous because it must guess as to which
loopback devices you were using. But this risk is fairly minimal
because devices which are currently still in use cannot be
cleaned up, and because only devices with 'zpool' in the name
are considered for removal. That said, if you're running parallel
copies of, say, zconfig.sh this may cause you some trouble.
Update the zconfig.sh test script to verify not only that volumes,
snapshots, and clones are created and removed properly, but also
that the partition information for each of these types of
devices is properly enumerated by the kernel.
Tests 4 and 5 now also create two partitions on the original volume
and these partitions are expected to also exist on the snapshot and
the clone. Correctness is verified after import/export, module
load/unload, dataset creation, and pool destruction.
Additionally, the code to create a partition table was refactored
into a small helper function to simplify the test cases. And
finally all of the function variables were flagged 'local' to ensure
their scope is limited. This should have been done a while ago.
Partitions for a zvol device were not appearing in /dev/zvol/<pool>/
at module load time for a couple of reasons.
1) The Linux block layer expects a block device to have a non-zero
capacity during add_disk(). If the capacity is zero it does not
attempt to open the device which means we never trigger a partition
scan. The device capacity was just being set during the first open.
2) Because we expect to be adding a block device to the zvol_state_list
during zvol_create_minor() the zvol_state_lock is held. This
can result in a deadlock in add_disk() when it attempts to open
the block device via zvol_open() which also takes this same lock.
To avoid this issue special handling has been added to zvol_open()
and zvol_release() to allow the mutex owner to enter these functions
without retaking the lock, as sketched after this list.
3) In __zvol_create_minor() the call to dmu_objset_disown() must occur
before the call to add_disk(). As mentioned above add_disk() results
in a call to zvol_open() which will attempt to call dmu_objset_own()
again on the objset. If the objset is already owned the call will
fail, resulting in a failed open. This in turn means the kernel will be
unable to read the partition information from the device.
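
A sketch of the special handling described in 2); it illustrates the
approach rather than reproducing the committed code, and the
prototype shown is for newer kernels.

    static int
    zvol_open(struct block_device *bdev, fmode_t flag)
    {
            zvol_state_t *zv = bdev->bd_disk->private_data;
            int error = 0, drop_mutex = 0;

            /*
             * If the caller already holds zvol_state_lock we must be
             * inside zvol_create_minor()->add_disk(), which opens the
             * device to scan its partition table.  Retaking the lock
             * here would deadlock against ourselves.
             */
            if (!mutex_owned(&zvol_state_lock)) {
                    mutex_enter(&zvol_state_lock);
                    drop_mutex = 1;
            }

            /* ... the usual open handling for zv goes here ... */

            if (drop_mutex)
                    mutex_exit(&zvol_state_lock);

            return (error);
    }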
For the case where we have a ZIL to replay we need to ensure that
zv->zv_objset contains the current objset. Since the caller has
a hold on the objset it is safe to pass to zil_replay() as part
of the zv. Call path: zvol_create_minor()->zil_replay()->
zil_parse()->zil_replay_log_record()->zvol_replay_write().
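
In code form the replay setup is roughly as follows, where
zvol_replay_vector is assumed to be the table of replay functions
that includes zvol_replay_write().

    zv->zv_objset = os;    /* the replay callbacks dereference zv */
    zil_replay(os, zv, zvol_replay_vector);
    zv->zv_objset = NULL;  /* the caller still holds the objset */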
During spa_load the spa->spa_deferred_bpobj may be opened and closed
multiple times. It's critical that when the object is closed the
bpo->bpo_object is set to zero to indicate the object is closed.
If it's not, during spa_load_retry the spa->spa_deferred_bpobj can
be closed twice resulting in a NULL deref.
This appears to have been fixed upstream the same way.
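
Simplified, the upstream idiom in bpobj_close() looks like this; the
real function also releases a cached dbuf and destroys the bpobj
mutex.

    void
    bpobj_close(bpobj_t *bpo)
    {
            /* Lazily closed?  A second close is then a harmless no-op. */
            if (bpo->bpo_object == 0)
                    return;

            dmu_buf_rele(bpo->bpo_dbuf, bpo);
            bpo->bpo_dbuf = NULL;
            bpo->bpo_phys = NULL;
            bpo->bpo_object = 0;    /* mark closed for the next caller */
    }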