Fix zrele race in zrele_async that can cause hang
There is a race condition in zfs_zrele_async when we are checking if we would be the one to evict an inode. This can lead to a txg sync deadlock. Instead of calling into iput directly, we attempt to perform the atomic decrement ourselves, unless that would set the i_count value to zero. In that case, we dispatch a call to iput to run later, to prevent a deadlock from occurring. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Matthew Ahrens <mahrens@delphix.com> Signed-off-by: Paul Dagnelie <pcd@delphix.com> Closes #11527 Closes #11530
This commit is contained in:
parent
5bc4c39d70
commit
43eaef6de8
|
@ -87,15 +87,18 @@
|
||||||
* must be checked with ZFS_VERIFY_ZP(zp). Both of these macros
|
* must be checked with ZFS_VERIFY_ZP(zp). Both of these macros
|
||||||
* can return EIO from the calling function.
|
* can return EIO from the calling function.
|
||||||
*
|
*
|
||||||
* (2) zrele() should always be the last thing except for zil_commit()
|
* (2) zrele() should always be the last thing except for zil_commit() (if
|
||||||
* (if necessary) and ZFS_EXIT(). This is for 3 reasons:
|
* necessary) and ZFS_EXIT(). This is for 3 reasons: First, if it's the
|
||||||
* First, if it's the last reference, the vnode/znode
|
* last reference, the vnode/znode can be freed, so the zp may point to
|
||||||
* can be freed, so the zp may point to freed memory. Second, the last
|
* freed memory. Second, the last reference will call zfs_zinactive(),
|
||||||
* reference will call zfs_zinactive(), which may induce a lot of work --
|
* which may induce a lot of work -- pushing cached pages (which acquires
|
||||||
* pushing cached pages (which acquires range locks) and syncing out
|
* range locks) and syncing out cached atime changes. Third,
|
||||||
* cached atime changes. Third, zfs_zinactive() may require a new tx,
|
* zfs_zinactive() may require a new tx, which could deadlock the system
|
||||||
* which could deadlock the system if you were already holding one.
|
* if you were already holding one. This deadlock occurs because the tx
|
||||||
* If you must call zrele() within a tx then use zfs_zrele_async().
|
* currently being operated on prevents a txg from syncing, which
|
||||||
|
* prevents the new tx from progressing, resulting in a deadlock. If you
|
||||||
|
* must call zrele() within a tx, use zfs_zrele_async(). Note that iput()
|
||||||
|
* is a synonym for zrele().
|
||||||
*
|
*
|
||||||
* (3) All range locks must be grabbed before calling dmu_tx_assign(),
|
* (3) All range locks must be grabbed before calling dmu_tx_assign(),
|
||||||
* as they can span dmu_tx_assign() calls.
|
* as they can span dmu_tx_assign() calls.
|
||||||
|
@ -476,11 +479,18 @@ zfs_zrele_async(znode_t *zp)
|
||||||
ASSERT(atomic_read(&ip->i_count) > 0);
|
ASSERT(atomic_read(&ip->i_count) > 0);
|
||||||
ASSERT(os != NULL);
|
ASSERT(os != NULL);
|
||||||
|
|
||||||
if (atomic_read(&ip->i_count) == 1)
|
/*
|
||||||
|
* If decrementing the count would put us at 0, we can't do it inline
|
||||||
|
* here, because that would be synchronous. Instead, dispatch an iput
|
||||||
|
* to run later.
|
||||||
|
*
|
||||||
|
* For more information on the dangers of a synchronous iput, see the
|
||||||
|
* header comment of this file.
|
||||||
|
*/
|
||||||
|
if (!atomic_add_unless(&ip->i_count, -1, 1)) {
|
||||||
VERIFY(taskq_dispatch(dsl_pool_zrele_taskq(dmu_objset_pool(os)),
|
VERIFY(taskq_dispatch(dsl_pool_zrele_taskq(dmu_objset_pool(os)),
|
||||||
(task_func_t *)iput, ip, TQ_SLEEP) != TASKQID_INVALID);
|
(task_func_t *)iput, ip, TQ_SLEEP) != TASKQID_INVALID);
|
||||||
else
|
}
|
||||||
zrele(zp);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
/* ARGSUSED */
|
/* ARGSUSED */
|
||||||
|
|
Loading…
Reference in New Issue