Introduce four new vdev properties:
checksum_n
checksum_t
io_n
io_t
These properties can be used for configuring the thresholds of zed's
diagnosis engine and are interpeted as <N> events in T <seconds>.
When this property is set to a non-default value on a top-level vdev,
those thresholds will also apply to its leaf vdevs. This behavior can be
overridden by explicitly setting the property on the leaf vdev.
Note that, these properties do not persist across vdev replacement. For
this reason, it is advisable to set the property on the top-level vdev
instead of the leaf vdev.
The default values for zed's diagnosis engine (10 events, 600 seconds)
remains unchanged.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Sponsored-by: Seagate Technology LLC
Closes#13805
Test checksum error reporting to ZED via the call paths
vdev_raidz_io_done_unrecoverable() and zio_checksum_verify().
Sponsored-by: Seagate Technology LLC
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
Closes#14190
We drop /multiple/ seconds off the generation, a dozen off a clean
rebuild, 185 files, and trivialise the distribution,
which can now be trivially generated via the provided snippets
Dist diff:
-zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib
+zfs-2.1.99/tests/zfs-tests/tests/functional/pam/utilities.kshlib.in
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#13316
A simple change, but so many tests break with it,
and those are the majority of this.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#13078
- Replaces use of manual `zpool sync`
- Don't use `log_must sync_pool` as `sync_pool` uses it internally
- Replace many (but not all) uses of `sync` with `sync_pool`
This makes the tests more consistent, and makes searching easier.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Closes#12894
200ms time-out is relatively long, but if we already hit the cap,
then we'll likely be able to spawn multiple new jobs when we wake up
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Closes#11807
events_002 exercises the ZED, ensuring that it neither misses events,
nor reporting events twice.
On slow test hardware, some of the timeouts are insufficient to allow
the ZED to properly settle. Conversely, on fast hardware these same
timeouts are too long, unnecessarily slowing the test run.
Instead of using a fixed timeout, wait for the expected final event
before returning. Additionally, wait with a timeout for unexpected
events to avoid missing them if they show up late.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Antonio Russo <aerusso@aerusso.net>
Closes#11703
The events_001_pos.ksh test case can fail because it's possible,
and correct, for the config_sync event to be posted after the last
"expected" event. To accommodate this the run_and_verify() function
has been updated to wait for all non-history events, not just the
last event. This does not increase the run time of the test as
long as all the events do get generated.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#11147
ZED will log zevents summaries to the syslog, however the log entries
tend to drop event details that can be useful for diagnosis. This is
especially true for ereport events, like io, checksum, and delay.
Update the all-syslog.sh script to log additional event information.
Add an optional config option, ZED_SYSLOG_DISPLAY_GUIDS, to zed.rc
for choosing GUIDs over names for pool and vdev.
Change the default ZED_SYSLOG_SUBCLASS_EXCLUDE to exclude history_event
events. These events tend to be frequent, convey no meaningful info,
and are already logged in the zpool history.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@delphix.com>
Closes#10967
Currently the best way to wait for the completion of a long-running
operation in a pool, like a scrub or device removal, is to poll 'zpool
status' and parse its output, which is neither efficient nor convenient.
This change adds a 'wait' subcommand to the zpool command. When invoked,
'zpool wait' will block until a specified type of background activity
completes. Currently, this subcommand can wait for any of the following:
- Scrubs or resilvers to complete
- Devices to initialized
- Devices to be replaced
- Devices to be removed
- Checkpoints to be discarded
- Background freeing to complete
For example, a scrub that is in progress could be waited for by running
zpool wait -t scrub <pool>
This also adds a -w flag to the attach, checkpoint, initialize, replace,
remove, and scrub subcommands. When used, this flag makes the operations
kicked off by these subcommands synchronous instead of asynchronous.
This functionality is implemented using a new ioctl. The type of
activity to wait for is provided as input to the ioctl, and the ioctl
blocks until all activity of that type has completed. An ioctl was used
over other methods of kernel-userspace communiction primarily for the
sake of portability.
Porting Notes:
This is ported from Delphix OS change DLPX-44432. The following changes
were made while porting:
- Added ZoL-style ioctl input declaration.
- Reorganized error handling in zpool_initialize in libzfs to integrate
better with changes made for TRIM support.
- Fixed check for whether a checkpoint discard is in progress.
Previously it also waited if the pool had a checkpoint, instead of
just if a checkpoint was being discarded.
- Exposed zfs_initialize_chunk_size as a ZoL-style tunable.
- Updated more existing tests to make use of new 'zpool wait'
functionality, tests that don't exist in Delphix OS.
- Used existing ZoL tunable zfs_scan_suspend_progress, together with
zinject, in place of a new tunable zfs_scan_max_blks_per_txg.
- Added support for a non-integral interval argument to zpool wait.
Future work:
ZoL has support for trimming devices, which Delphix OS does not. In the
future, 'zpool wait' could be extended to add the ability to wait for
trim operations to complete.
Reviewed-by: Matt Ahrens <matt@delphix.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: John Gallagher <john.gallagher@delphix.com>
Closes#9162
Removing hardcoded paths in events.cfg
Reviewed-by: Giuseppe Di Natale <guss80@gmail.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: bunder2015 <omfgbunder@gmail.com>
Closes#7805
Fedora 28's RPM build checks warn when executable files don't have a
shebang line. These warnings are caused when we (incorrectly)
include data & config files in the_SCRIPTS automake lines. Files in
_SCRIPTS are marked executable by automake. This patch fixes the
issue by including non-executable scripts in a _DATA line instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#7359Closes#7395
Some usage patterns like send/recv of replication streams can
produce a large number of events. In such a case, the current
all-syslog.sh zedlet will hold up to its name, and flood the
logs with mostly redundant information. Two mitigate this
situation, this changeset introduces to new variables
ZED_SYSLOG_SUBCLASS_INCLUDE and ZED_SYSLOG_SUBCLASS_EXCLUDE
to zed.rc that give more control over which event classes end
up in the syslog.
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Closes#6886Closes#7260
* Add a zed script to kick off a scrub after a resilver. The script is
disabled by default.
* Add a optional $PATH (-P) option to zed to allow it to use a custom
$PATH for its zedlets. This is needed when you're running zed under
the ZTS in a local workspace.
* Update test scripts to not copy in all-debug.sh and all-syslog.sh by
default. They can be optionally copied in as part of zed_setup().
These scripts slow down zed considerably under heavy events loads and
can cause events to be dropped or their delivery delayed. This was
causing some sporadic failures in the 'fault' tests.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Closes#4662Closes#7086
This change adds a make target 'testscheck' which verifies all ksh test
scripts, part of the ZFS Test Suite, have their executable bit set:
this should avoid adding new test scripts that cannot be executed by
the test-runner.py which would result in the following warning message:
Warning: Test removed from TestGroup because it failed verification.
This also verifies both *.cfg and *.kshlib files used in the ZFS Test
Suite are not executable.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: loli10K <ezomori.nozomu@gmail.com>
Closes#7131
Currently, scrubs and resilvers can take an extremely
long time to complete. This is largely due to the fact
that zfs scans process pools in logical order, as
determined by each block's bookmark. This makes sense
from a simplicity perspective, but blocks in zfs are
often scattered randomly across disks, particularly
due to zfs's copy-on-write mechanisms.
This patch improves performance by splitting scrubs
and resilvers into a metadata scanning phase and an IO
issuing phase. The metadata scan reads through the
structure of the pool and gathers an in-memory queue
of I/Os, sorted by size and offset on disk. The issuing
phase will then issue the scrub I/Os as sequentially as
possible, greatly improving performance.
This patch also updates and cleans up some of the scan
code which has not been updated in several years.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Authored-by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Authored-by: Alek Pinchuk <apinchuk@datto.com>
Authored-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes#3625Closes#6256
History commands and events were being suppressed for the
'zpool create' command since the history object did not
yet exist. Create the object earlier so this history
doesn't get lost.
Split the pool_destroy event in to pool_destroy and
pool_export so they may be distinguished.
Updated events_001_pos and events_002_pos test cases. They
now check for the expected history events and were reworked
to be more reliable.
Reviewed-by: Nathaniel Clark <nathaniel.l.clark@intel.com>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6712Closes#6486
Fix spurious events_002_pos failures by waiting longer before
grabbing the log to check for the resilver_finish event. It
would be better to rework this logic to wait only as long as
needed rather than a fixed timeout.
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6651
In zed event test cases, a brief delay was introduced
to allow for events to make it to the zed log. On at least
one buildbot builder, the 1 second delay is not long enough.
Therefore, increasing the delay should ensure the zed has
more than enough time to write to its log.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Closes#6395
* events_001_pos - Verify the expected events are generated when
invoking the various zpool sub-commands. These events must
appear in `zpool event` and be consumed by the ZED.
* events_002_pos - Verify the ZED consumes events which were
generated while it wasn't running when it is started.
Additionally, verify that events are only processed once.
As part of this change the default.cfg used by the test suite
was changed to a default.cfg.in file. This was needed so the
install location of all zed scripts, not only the enabled ones,
could be reliably determined.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes#6128