The previous change added a check to fall back to waiting forever if the
ZIL failed. This check was inverted; it actually caused the waiter to
always enter a timed wait when the ZIL had failed. Combined with the fact
that the last lwb issued would likely have failed quickly and so had a very
small latency, this effectively caused an infinite loop.

I initially fixed the check, but on further study I decided that this
loop doesn't need to exist. The logic that falls out of the original code
in 2.1.5 is: if the lwb is OPENED, wait and then issue it; if not (or after
issuing it), wait forever. The loop will never see more
than two iterations, one for each half of the OPENED check, and it will
stop as soon as the waiter is signaled (zcw_done true), so it can be far
more simply expressed as a linear sequence:

    if (!issued) {
      wait a few
      if (done)
        return
      issue IO
    }
    if (!done)
      wait forever

This still holds when the ZIL fails, because zil_commit_waiter_timeout()
will check for failure under zl_issuer_lock, which zil_fail() will wait
for, and in turn, zil_fail() will wait on zcw_lock and then signal the
waiter before it releases zl_issuer_lock. Taken together, that means
that zil_commit_waiter_timeout() will do all it can under the
circumstances, and waiting forever for the waiter to complete is all we can
do past that point.
(cherry picked from commit c57c2ddd6f803f429da1e2b53abab277d781a5a3)