After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails.
An end-of-txg async task is charged with actually faulting the vdev.
In a single-disk pool, the probe failure will degrade the last disk, and
then suspend the pool. However, vdev_fault_wanted is not cleared. After
the pool returns, the transaction finishes and the async task runs and
faults the vdev, which suspends the pool again.
The fix is simple: when reopening a vdev, clear the async fault flag. If
the vdev is still failed, the startup probe will quickly notice and
degrade/suspend it again. If not, all is well!
Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Don Brady <don.brady@klarasystems.com>