zfs/module/zfs
Serapheim Dimitropoulos 37d5a3e04b Stop ganging due to past vdev write errors
= Problem

While examining a customer's system we noticed unreasonable space
usage from a few snapshots due to gang blocks. Upon further
analysis, we discovered that the pool would create gang blocks because
all its disks had non-zero write error counts and they'd be skipped
for normal metaslab allocations due to the following if-clause in
`metaslab_alloc_dva()`:
```
	/*
	 * Avoid writing single-copy data to a failing,
	 * non-redundant vdev, unless we've already tried all
	 * other vdevs.
	 */
	if ((vd->vdev_stat.vs_write_errors > 0 ||
	    vd->vdev_state < VDEV_STATE_HEALTHY) &&
	    d == 0 && !try_hard && vd->vdev_children == 0) {
		metaslab_trace_add(zal, mg, NULL, psize, d,
		    TRACE_VDEV_ERROR, allocator);
		goto next;
	}
```

= Proposed Solution

Get rid of the predicate in the if-clause that checks the past
write errors of the selected vdev. We still try to allocate from
HEALTHY vdevs anyway by checking vdev_state, so the past write
errors check doesn't seem to help us (quite the opposite - it can
cause issues in long-lived pools like the one from our customer).

= Testing

I first created a pool with 3 vdevs:
```
$ zpool list -v volpool
NAME        SIZE  ALLOC   FREE
volpool    22.5G   117M  22.4G
  xvdb     7.99G  40.2M  7.46G
  xvdc     7.99G  39.1M  7.46G
  xvdd     7.99G  37.8M  7.46G
```

And used `zinject` like so with each one of them:
```
$ sudo zinject -d xvdb -e io -T write -f 0.1 volpool
```

And got the vdevs to the following state:
```
$ zpool status volpool
  pool: volpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.
...<cropped>..
action: Determine if the device needs to be replaced, and clear the
...<cropped>..
config:

	NAME        STATE     READ WRITE CKSUM
	volpool     ONLINE       0     0     0
	  xvdb      ONLINE       0     1     0
	  xvdc      ONLINE       0     1     0
	  xvdd      ONLINE       0     4     0

```

I also double-checked their write error counters with sdb:
```
sdb> spa volpool | vdev | member vdev_stat.vs_write_errors
(uint64_t)0  # <---- this is the root vdev
(uint64_t)2
(uint64_t)1
(uint64_t)1
```

Then I checked that the problem was reproduced in my VM, as the
gang count was growing in zdb as I was writing more data:
```
$ sudo zdb volpool | grep gang
        ganged count:              1384

$ sudo zdb volpool | grep gang
        ganged count:              1393

$ sudo zdb volpool | grep gang
        ganged count:              1402

$ sudo zdb volpool | grep gang
        ganged count:              1414
```

Then I updated my bits with this patch and the gang count stayed the
same.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
Closes #14003
2022-11-01 12:36:25 -07:00
Makefile.in Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
THIRDPARTYLICENSE.cityhash OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
THIRDPARTYLICENSE.cityhash.descrip OpenZFS 8484 - Implement aggregate sum and use for arc counters 2018-06-06 09:35:59 -07:00
abd.c Avoid small buffer copying on write 2022-07-26 10:10:37 -07:00
aggsum.c More aggsum optimizations 2021-06-09 13:05:34 -07:00
arc.c Add Module Parameter Regarding Log Size Limit 2022-09-21 16:12:14 -07:00
blkptr.c Add zstd support to zfs 2020-08-20 10:30:06 -07:00
bplist.c Fast Clone Deletion 2019-07-26 10:54:14 -07:00
bpobj.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
bptree.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
bqueue.c zfs recv hangs if max recordsize is less than received recordsize 2022-09-19 09:39:07 -07:00
btree.c Add zfs_btree_verify_intensity kernel module parameter 2022-09-21 13:15:51 -07:00
dataset_kstats.c Introduce write-mostly sums 2021-06-09 13:05:34 -07:00
dbuf.c Revert "Reduce dbuf_find() lock contention" 2022-09-21 13:15:51 -07:00
dbuf_stats.c Revert "Reduce dbuf_find() lock contention" 2022-09-21 13:15:51 -07:00
ddt.c Tinker with slop space accounting with dedup 2021-09-14 12:38:05 -07:00
ddt_zap.c Refactor dnode dirty context from dbuf_dirty 2020-02-26 16:09:17 -08:00
dmu.c Bring per_txg_dirty_frees_percent back to 30 2022-11-01 12:32:40 -07:00
dmu_diff.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dmu_object.c Introduce CPU_SEQID_UNSTABLE 2020-11-02 11:51:12 -08:00
dmu_objset.c Add options to zfs redundant_metadata property 2022-11-01 12:25:58 -07:00
dmu_recv.c Receive checks should allow unencrypted child datasets 2022-02-16 17:58:55 -08:00
dmu_redact.c Fix incorrect size given to bqueue_enqueue() call in dmu_redact.c 2022-09-21 13:15:51 -07:00
dmu_send.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dmu_traverse.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dmu_tx.c Refactor Log Size Limit 2022-09-26 14:55:27 -07:00
dmu_zfetch.c More speculative prefetcher improvements 2022-07-26 10:10:37 -07:00
dnode.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dnode_sync.c Report dnodes with faulty bonuslen 2022-02-16 17:58:55 -08:00
dsl_bookmark.c Fix -Wattribute-warning in dsl layer 2022-07-27 13:38:56 -07:00
dsl_crypt.c Introduce a flag to skip comparing the local mac when raw sending 2022-02-04 16:14:56 -08:00
dsl_dataset.c Remove unneeded "extern inline" function declarations 2022-02-16 17:58:56 -08:00
dsl_deadlist.c Fix panic in dsl_process_sub_livelist for EINTR 2022-11-01 12:34:08 -07:00
dsl_deleg.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
dsl_destroy.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dsl_dir.c Fix ENOSPC when unlinking multiple files from full pool 2022-03-08 11:46:03 -08:00
dsl_pool.c Refactor Log Size Limit 2022-09-26 14:55:27 -07:00
dsl_prop.c Add options to zfs redundant_metadata property 2022-11-01 12:25:58 -07:00
dsl_scan.c Fix scrub resume from newly created hole. 2022-07-26 10:10:37 -07:00
dsl_synctask.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
dsl_userhold.c Replace sprintf()->snprintf() and strcpy()->strlcpy() 2020-06-07 11:42:12 -07:00
edonr_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
fm.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
gzip.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
hkdf.c Encryption patch follow-up 2017-10-11 16:54:48 -04:00
lz4.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
lzjb.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
metaslab.c Stop ganging due to past vdev write errors 2022-11-01 12:36:25 -07:00
mmp.c Optimize small random numbers generation 2021-09-14 12:10:17 -07:00
multilist.c Optimize small random numbers generation 2021-09-14 12:10:17 -07:00
objlist.c Implement Redacted Send/Receive 2019-06-19 09:48:12 -07:00
pathname.c Replace ZFS on Linux references with OpenZFS 2020-10-08 20:10:13 -07:00
range_tree.c Several sorted scrub optimizations 2022-07-26 10:10:37 -07:00
refcount.c Export minimal zfs_refcount interfaces 2022-04-06 10:29:00 -07:00
rrwlock.c Rename refcount.h to zfs_refcount.h 2020-07-29 16:35:33 -07:00
sa.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
sha256.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
skein_zfs.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa.c Improve log spacemap load time 2022-07-26 10:10:37 -07:00
spa_boot.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
spa_checkpoint.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
spa_config.c Cleaning up uio headers 2021-02-20 20:16:50 -08:00
spa_errlog.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
spa_history.c Annotated dprintf as printf-like 2021-06-24 13:12:36 -07:00
spa_log_spacemap.c Improve log spacemap load time 2022-07-26 10:10:37 -07:00
spa_misc.c Remove refcount from spa_config_*() 2022-07-26 10:10:37 -07:00
spa_stats.c Remove pool io kstats 2021-06-10 10:50:16 -07:00
space_map.c Optimize small random numbers generation 2021-09-14 12:10:17 -07:00
space_reftree.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
txg.c Optimize txg_kick() process (#12274) 2022-09-21 16:12:14 -07:00
uberblock.c MMP interval and fail_intervals in uberblock 2019-03-21 12:47:57 -07:00
unique.c Reduce loaded range tree memory usage 2019-10-09 10:36:03 -07:00
vdev.c Improve too large physical ashift handling 2022-09-21 13:15:15 -07:00
vdev_cache.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_draid.c Improve too large physical ashift handling 2022-09-21 13:15:15 -07:00
vdev_draid_rand.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_indirect.c module/zfs: vdev_indirect: vdev_indirect_repair: remove unused variable 2022-05-02 15:42:58 -07:00
vdev_indirect_births.c Fixes: #8934 Large kmem_alloc 2019-07-10 15:54:49 -07:00
vdev_indirect_mapping.c Replace ASSERTV macro with compiler annotation 2019-12-05 12:37:00 -08:00
vdev_initialize.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
vdev_label.c Use fallthrough macro 2021-11-02 09:50:30 -07:00
vdev_mirror.c Improve too large physical ashift handling 2022-09-21 13:15:15 -07:00
vdev_missing.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
vdev_queue.c Avoid vq_lock drop in vdev_queue_aggregate() 2021-09-14 14:31:22 -07:00
vdev_raidz.c Improve too large physical ashift handling 2022-09-21 13:15:15 -07:00
vdev_raidz_math.c Initialize parity blocks before RAID-Z reconstruction benchmarking 2021-09-14 14:32:16 -07:00
vdev_raidz_math_aarch64_neon.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_aarch64_neon_common.h FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_aarch64_neonx2.c Linux 5.0 compat: SIMD compatibility 2019-07-12 09:31:20 -07:00
vdev_raidz_math_avx2.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_avx512bw.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_raidz_math_avx512f.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_impl.h Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_raidz_math_powerpc_altivec.c Prefix zfs internal endian checks with _ZFS 2020-07-28 13:02:49 -07:00
vdev_raidz_math_powerpc_altivec_common.h FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_scalar.c Use fallthrough macro 2021-11-02 09:50:30 -07:00
vdev_raidz_math_sse2.c FreeBSD: fix the build with Clang 11 2020-08-17 15:40:17 -07:00
vdev_raidz_math_ssse3.c Refactor ccompile.h to not include system headers 2020-07-25 20:09:50 -07:00
vdev_rebuild.c Fix sequential resilver drive failure race condition 2022-10-21 14:05:06 -07:00
vdev_removal.c Improve log spacemap load time 2022-07-26 10:10:37 -07:00
vdev_root.c Distributed Spare (dRAID) Feature 2020-11-13 13:51:51 -08:00
vdev_trim.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zap.c Remove unneeded "extern inline" function declarations 2022-02-16 17:58:56 -08:00
zap_leaf.c Remove unneeded "extern inline" function declarations 2022-02-16 17:58:56 -08:00
zap_micro.c Remove unneeded "extern inline" function declarations 2022-02-16 17:58:56 -08:00
zcp.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zcp_get.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zcp_global.c OpenZFS 8600 - ZFS channel programs - snapshot 2018-02-08 15:29:24 -08:00
zcp_iter.c Fix typos in module/zfs/ 2019-09-02 17:56:41 -07:00
zcp_set.c Support setting user properties in a channel program 2020-02-14 13:41:42 -08:00
zcp_synctask.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zfeature.c Throw const on some strings 2020-10-02 17:44:10 -07:00
zfs_byteswap.c Mark functions as static 2020-06-18 12:20:38 -07:00
zfs_fm.c fm: remove unused variables 2022-05-02 15:42:58 -07:00
zfs_fuid.c Fix regression in POSIX mode behavior 2021-03-19 22:50:46 -07:00
zfs_ioctl.c Delay ZFS_PROP_SHARESMB property to handle it for encrypted raw receive 2022-09-21 13:15:26 -07:00
zfs_log.c Add Module Parameter Regarding Log Size Limit 2022-09-21 16:12:14 -07:00
zfs_onexit.c file reference counts can get corrupted 2021-09-14 12:37:38 -07:00
zfs_quota.c File incorrectly zeroed when receiving incremental stream that toggles -L 2020-06-09 10:41:01 -07:00
zfs_ratelimit.c Change checksum & IO delay ratelimit values 2018-03-04 17:34:51 -08:00
zfs_replay.c Use fallthrough macro 2021-11-02 09:50:30 -07:00
zfs_rlock.c Add a "try" operation for range locks 2020-07-06 11:53:31 -07:00
zfs_sa.c Extending FreeBSD UIO Struct 2021-01-20 21:27:30 -08:00
zfs_vnops.c Revert behavior of 59eab109 on not-Linux 2022-08-02 10:05:14 -07:00
zil.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zio.c Fix scrub resume from newly created hole. 2022-07-26 10:10:37 -07:00
zio_checksum.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zio_compress.c module: zfs: fix unused, remove argsused 2022-02-16 17:58:56 -08:00
zio_inject.c Optimize small random numbers generation 2021-09-14 12:10:17 -07:00
zle.c Add include files for prototypes 2020-06-18 12:21:25 -07:00
zrlock.c Remove dead code 2020-06-18 12:21:18 -07:00
zthr.c Avoid memory allocations in the ARC eviction thread 2022-02-03 15:30:52 -08:00
zvol.c Add Module Parameter Regarding Log Size Limit 2022-09-21 16:12:14 -07:00