making sure the last quiesced txg is synced

Fixed a potential bug as described in #8233:

Consider this scenario (see [txg.c](06f3fc2a4b/module/zfs/txg.c)):
There is heavy write load when the pool exports.
After the `txg_wait_synced` call in `txg_sync_stop` returns, many more txgs get processed, but right before `txg_sync_stop` gets `tx_sync_lock`, the following happens:

- `txg_sync_thread` begins waiting on `tx_sync_more_cv`.
- `txg_quiesce_thread` gets done with `txg_quiesce(dp, txg)`.
- `txg_sync_stop` gets `tx_sync_lock` first, issues its `cv_broadcast` calls with `tx_exiting == 1`, and waits for the threads to exit.
- `txg_sync_thread` wakes up first and exits.
- Finally, `txg_quiesce_thread` gets `tx_sync_lock` and calls `cv_broadcast(&tx->tx_sync_more_cv)`, but `txg_sync_thread` is already gone, so the txg quiesced by `txg_quiesce(dp, txg)` above never gets synced.
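
The interleaving above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions, not the ZFS code: `model_tx_t`, `model_quiesce_handoff_and_exit`, `model_sync`, and `model_export_race` are hypothetical names, the struct mirrors only the `tx_state_t` fields mentioned in this commit, and locks and condition variables are replaced by a fixed ordering of steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the tx_state_t fields used here. */
typedef struct {
	int exiting;           /* models tx_exiting */
	int threads;           /* models tx_threads (sync + quiesce) */
	uint64_t quiesced_txg; /* models tx_quiesced_txg, 0 = nothing pending */
	uint64_t synced_txg;   /* last txg the model sync thread wrote out */
} model_tx_t;

/* Quiesce thread: hand off the txg it just quiesced, then exit. */
static void
model_quiesce_handoff_and_exit(model_tx_t *tx, uint64_t txg)
{
	tx->quiesced_txg = txg; /* hand-off done under tx_sync_lock */
	assert(tx->exiting);    /* the stop request was already posted */
	tx->threads--;          /* models txg_thread_exit() */
}

/* Sync thread: consume the handed-off txg. */
static void
model_sync(model_tx_t *tx)
{
	tx->synced_txg = tx->quiesced_txg;
	tx->quiesced_txg = 0;
}

/*
 * Replay the scenario for one in-flight txg.
 * Returns the txg left unsynced, or 0 if nothing was lost.
 */
static uint64_t
model_export_race(int patched, uint64_t inflight_txg)
{
	model_tx_t tx = { 0, 2, 0, inflight_txg - 1 };

	/* txg_sync_stop wins tx_sync_lock and requests exit. */
	tx.exiting = 1;

	if (!patched) {
		/* Old behaviour: the sync thread wakes up and exits at once. */
		tx.threads--;
		model_quiesce_handoff_and_exit(&tx, inflight_txg);
	} else {
		/* Patched: wait until the sync thread is the only one left... */
		model_quiesce_handoff_and_exit(&tx, inflight_txg);
		assert(tx.threads == 1);
		/* ...then sync a pending quiesced txg before exiting. */
		if (tx.quiesced_txg != 0)
			model_sync(&tx);
		tx.threads--;
	}
	return (tx.quiesced_txg);
}
```

In this model, `model_export_race(0, 42)` returns 42 (the quiesced txg is stranded), while `model_export_race(1, 42)` returns 0, because the sync thread waits for the quiesce thread to exit and then checks for a pending txg before leaving.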

Signed-off-by: Leap Second <leapsecond@protonmail.com>
seekfirstleapsecond 2019-01-04 18:17:42 -08:00 committed by GitHub
parent 0b8e4418b6
commit 1e05119d5b
1 changed files with 13 additions and 2 deletions

@@ -508,6 +508,7 @@ txg_sync_thread(void *arg)
 	tx_state_t *tx = &dp->dp_tx;
 	callb_cpr_t cpr;
 	clock_t start, delta;
+	boolean_t checked_quiescing = B_FALSE;
 	(void) spl_fstrans_mark();
 	txg_thread_enter(tx, &cpr);
@@ -549,8 +550,18 @@ txg_sync_thread(void *arg)
 			txg_thread_wait(tx, &cpr, &tx->tx_quiesce_done_cv, 0);
 		}
-		if (tx->tx_exiting)
-			txg_thread_exit(tx, &cpr, &tx->tx_sync_thread);
+		if (tx->tx_exiting) {
+			if (checked_quiescing)
+				txg_thread_exit(tx, &cpr, &tx->tx_sync_thread);
+			else {
+				while (tx->tx_threads != 1)
+					txg_thread_wait(tx, &cpr, &tx->tx_exit_cv, 0);
+				if (tx->tx_quiesced_txg)
+					checked_quiescing = B_TRUE;
+				else
+					txg_thread_exit(tx, &cpr, &tx->tx_sync_thread);
+			}
+		}
 		/*
 		 * Consume the quiesced txg which has been handed off to