Add zfault zpool configurations and tests
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
2010-09-28 23:32:12 +00:00
|
|
|
#!/bin/bash
|
|
|
|
#
|
|
|
|
# ZPOOL fault verification test script.
|
|
|
|
#
|
|
|
|
# The current suite of fault tests should not be thought of an exhaustive
|
|
|
|
# list of failure modes. Rather it is simply an starting point which trys
|
|
|
|
# to cover the bulk the of the 'easy' and hopefully common, failure modes.
|
|
|
|
#
|
|
|
|
# Additional tests should be added but the current suite as new interesting
|
|
|
|
# failures modes are observed. Additional failure modes I'd like to see
|
|
|
|
# tests for include, but are not limited too:
|
|
|
|
#
|
|
|
|
# * Slow but successful IO.
|
|
|
|
# * SCSI sense codes generated as zevents.
|
|
|
|
# * 4k sectors
|
|
|
|
# * noise
|
|
|
|
# * medium error
|
|
|
|
# * recovered error
|
|
|
|
#
|
|
|
|
# The current infrastructure using the 'mdadm' faulty device and the
|
|
|
|
# 'scsi_debug' simulated scsi devices. The idea is to inject the error
|
|
|
|
# below the zfs stack to validate all the error paths. More targeted
|
|
|
|
# failure testing should be added using the 'zinject' command line util.
|
|
|
|
#
|
|
|
|
# Requires the following packages:
|
|
|
|
# * mdadm
|
|
|
|
# * lsscsi
|
|
|
|
# * sg3-utils
|
|
|
|
#
|
|
|
|
|
|
|
|
basedir="$(dirname $0)"
|
|
|
|
|
|
|
|
SCRIPT_COMMON=common.sh
|
|
|
|
if [ -f "${basedir}/${SCRIPT_COMMON}" ]; then
|
|
|
|
. "${basedir}/${SCRIPT_COMMON}"
|
|
|
|
else
|
|
|
|
echo "Missing helper script ${SCRIPT_COMMON}" && exit 1
|
|
|
|
fi
|
|
|
|
|
|
|
|
PROG=zfault.sh
|
|
|
|
|
|
|
|
usage() {
|
|
|
|
cat << EOF
|
|
|
|
USAGE:
|
2010-11-11 18:22:36 +00:00
|
|
|
$0 [hvcts]
|
Add zfault zpool configurations and tests
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
2010-09-28 23:32:12 +00:00
|
|
|
|
|
|
|
DESCRIPTION:
|
|
|
|
ZPOOL fault verification tests
|
|
|
|
|
|
|
|
OPTIONS:
|
|
|
|
-h Show this message
|
|
|
|
-v Verbose
|
|
|
|
-c Cleanup md+lo+file devices at start
|
|
|
|
-t <#> Run listed tests
|
|
|
|
-s <#> Skip listed tests
|
|
|
|
|
|
|
|
EOF
|
|
|
|
}
|
|
|
|
|
|
|
|
while getopts 'hvct:s:?' OPTION; do
|
|
|
|
case $OPTION in
|
|
|
|
h)
|
|
|
|
usage
|
|
|
|
exit 1
|
|
|
|
;;
|
|
|
|
v)
|
|
|
|
VERBOSE=1
|
|
|
|
;;
|
|
|
|
c)
|
|
|
|
CLEANUP=1
|
|
|
|
;;
|
|
|
|
t)
|
|
|
|
TESTS_RUN=($OPTARG)
|
|
|
|
;;
|
|
|
|
s)
|
|
|
|
TESTS_SKIP=($OPTARG)
|
|
|
|
;;
|
|
|
|
?)
|
|
|
|
usage
|
|
|
|
exit
|
|
|
|
;;
|
|
|
|
esac
|
|
|
|
done
|
|
|
|
|
|
|
|
if [ $(id -u) != 0 ]; then
|
|
|
|
die "Must run as root"
|
|
|
|
fi
|
|
|
|
|
2011-10-11 21:36:42 +00:00
|
|
|
# Initialize the test suite
|
|
|
|
init
|
|
|
|
|
Add zfault zpool configurations and tests
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
2010-09-28 23:32:12 +00:00
|
|
|
# Perform pre-cleanup is requested
|
|
|
|
if [ ${CLEANUP} ]; then
|
2010-11-11 18:17:02 +00:00
|
|
|
${ZFS_SH} -u
|
Add zfault zpool configurations and tests
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
2010-09-28 23:32:12 +00:00
|
|
|
cleanup_md_devices
|
|
|
|
cleanup_loop_devices
|
|
|
|
rm -f /tmp/zpool.cache.*
|
|
|
|
fi
|
|
|
|
|
|
|
|
# Check if we need to skip all md based tests.
|
|
|
|
MD_PARTITIONABLE=0
|
|
|
|
check_md_partitionable && MD_PARTITIONABLE=1
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
echo "Skipping tests 1-7 which require partitionable md devices"
|
|
|
|
fi
|
|
|
|
|
|
|
|
# Check if we need to skip all the scsi_debug tests.
|
|
|
|
SCSI_DEBUG=0
|
|
|
|
${INFOMOD} scsi_debug &>/dev/null && SCSI_DEBUG=1
|
|
|
|
if [ ${SCSI_DEBUG} -eq 0 ]; then
|
|
|
|
echo "Skipping tests 8-9 which require the scsi_debug module"
|
|
|
|
fi
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ] || [ ${SCSI_DEBUG} -eq 0 ]; then
|
|
|
|
echo
|
|
|
|
fi
|
|
|
|
|
|
|
|
printf "%40s%s\t%s\t%s\t%s\t%s\n" "" "raid0" "raid10" "raidz" "raidz2" "raidz3"
|
|
|
|
|
|
|
|
pass_nonewline() {
|
|
|
|
echo -n -e "${COLOR_GREEN}Pass${COLOR_RESET}\t"
|
|
|
|
}
|
|
|
|
|
|
|
|
skip_nonewline() {
|
|
|
|
echo -n -e "${COLOR_BROWN}Skip${COLOR_RESET}\t"
|
|
|
|
}
|
|
|
|
|
|
|
|
nth_zpool_vdev() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local DEVICE_TYPE=$2
|
|
|
|
local DEVICE_NTH=$3
|
|
|
|
|
|
|
|
${ZPOOL} status ${POOL_NAME} | grep ${DEVICE_TYPE} ${TMP_STATUS} | \
|
|
|
|
head -n${DEVICE_NTH} | tail -n1 | ${AWK} "{ print \$1 }"
|
|
|
|
}
|
|
|
|
|
|
|
|
vdev_status() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local VDEV_NAME=$2
|
|
|
|
|
|
|
|
${ZPOOL} status ${POOL_NAME} | ${AWK} "/${VDEV_NAME}/ { print \$2 }"
|
|
|
|
}
|
|
|
|
|
|
|
|
# Required format is x.yz[KMGTP]
|
|
|
|
expand_numeric_suffix() {
|
|
|
|
local VALUE=$1
|
|
|
|
|
|
|
|
VALUE=`echo "${VALUE/%K/*1000}"`
|
|
|
|
VALUE=`echo "${VALUE/%M/*1000000}"`
|
|
|
|
VALUE=`echo "${VALUE/%G/*1000000000}"`
|
|
|
|
VALUE=`echo "${VALUE/%T/*1000000000000}"`
|
|
|
|
VALUE=`echo "${VALUE/%P/*1000000000000000}"`
|
|
|
|
VALUE=`echo "${VALUE}" | bc | cut -d'.' -f1`
|
|
|
|
|
|
|
|
echo "${VALUE}"
|
|
|
|
}
|
|
|
|
|
|
|
|
vdev_read_errors() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local VDEV_NAME=$2
|
|
|
|
local VDEV_ERRORS=`${ZPOOL} status ${POOL_NAME} |
|
|
|
|
${AWK} "/${VDEV_NAME}/ { print \\$3 }"`
|
|
|
|
|
|
|
|
expand_numeric_suffix ${VDEV_ERRORS}
|
|
|
|
}
|
|
|
|
|
|
|
|
vdev_write_errors() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local VDEV_NAME=$2
|
|
|
|
local VDEV_ERRORS=`${ZPOOL} status ${POOL_NAME} |
|
|
|
|
${AWK} "/${VDEV_NAME}/ { print \\$4 }"`
|
|
|
|
|
|
|
|
expand_numeric_suffix ${VDEV_ERRORS}
|
|
|
|
}
|
|
|
|
|
|
|
|
vdev_cksum_errors() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local VDEV_NAME=$2
|
|
|
|
local VDEV_ERRORS=`${ZPOOL} status ${POOL_NAME} |
|
|
|
|
${AWK} "/${VDEV_NAME}/ { print \\$5 }"`
|
|
|
|
|
|
|
|
expand_numeric_suffix ${VDEV_ERRORS}
|
|
|
|
}
|
|
|
|
|
|
|
|
zpool_state() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
|
|
|
|
${ZPOOL} status ${POOL_NAME} | ${AWK} "/state/ { print \$2; exit }"
|
|
|
|
}
|
|
|
|
|
|
|
|
zpool_event() {
|
|
|
|
local EVENT_NAME=$1
|
|
|
|
local EVENT_KEY=$2
|
|
|
|
|
|
|
|
SCRIPT1="BEGIN {RS=\"\"; FS=\"\n\"} /${EVENT_NAME}/ { print \$0; exit }"
|
|
|
|
SCRIPT2="BEGIN {FS=\"=\"} /${EVENT_KEY}/ { print \$2; exit }"
|
|
|
|
|
|
|
|
${ZPOOL} events -vH | ${AWK} "${SCRIPT1}" | ${AWK} "${SCRIPT2}"
|
|
|
|
}
|
|
|
|
|
|
|
|
zpool_scan_errors() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
|
|
|
|
${ZPOOL} status ${POOL_NAME} | ${AWK} "/scan: scrub/ { print \$8 }"
|
|
|
|
${ZPOOL} status ${POOL_NAME} | ${AWK} "/scan: resilver/ { print \$7 }"
|
|
|
|
}
|
|
|
|
|
|
|
|
pattern_create() {
|
|
|
|
local PATTERN_BLOCK_SIZE=$1
|
|
|
|
local PATTERN_BLOCK_COUNT=$2
|
|
|
|
local PATTERN_NAME=`mktemp -p /tmp zpool.pattern.XXXXXXXX`
|
|
|
|
|
|
|
|
echo ${PATTERN_NAME}
|
|
|
|
dd if=/dev/urandom of=${PATTERN_NAME} bs=${PATTERN_BLOCK_SIZE} \
|
|
|
|
count=${PATTERN_BLOCK_COUNT} &>/dev/null
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
pattern_write() {
|
|
|
|
local PATTERN_NAME=$1
|
|
|
|
local PATTERN_BLOCK_SIZE=$2
|
|
|
|
local PATTERN_BLOCK_COUNT=$3
|
|
|
|
local DEVICE_NAME=$4
|
|
|
|
|
|
|
|
dd if=${PATTERN_NAME} of=${DEVICE_NAME} bs=${PATTERN_BLOCK_SIZE} \
|
|
|
|
count=${PATTERN_BLOCK_COUNT} oflag=direct &>/dev/null
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
pattern_write_bg() {
|
|
|
|
local PATTERN_NAME=$1
|
|
|
|
local PATTERN_BLOCK_SIZE=$2
|
|
|
|
local PATTERN_BLOCK_COUNT=$3
|
|
|
|
local DEVICE_NAME=$4
|
|
|
|
|
|
|
|
dd if=${PATTERN_NAME} of=${DEVICE_NAME} bs=${PATTERN_BLOCK_SIZE} \
|
|
|
|
count=${PATTERN_BLOCK_COUNT} oflag=direct &>/dev/null &
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
pattern_verify() {
|
|
|
|
local PATTERN_NAME=$1
|
|
|
|
local PATTERN_BLOCK_SIZE=$2
|
|
|
|
local PATTERN_BLOCK_COUNT=$3
|
|
|
|
local DEVICE_NAME=$4
|
|
|
|
local DEVICE_FILE=`mktemp -p /tmp zpool.pattern.XXXXXXXX`
|
|
|
|
|
|
|
|
dd if=${DEVICE_NAME} of=${DEVICE_FILE} bs=${PATTERN_BLOCK_SIZE} \
|
|
|
|
count=${PATTERN_BLOCK_COUNT} iflag=direct &>/dev/null
|
|
|
|
cmp -s ${PATTERN_NAME} ${DEVICE_FILE}
|
|
|
|
RC=$?
|
|
|
|
rm -f ${DEVICE_FILE}
|
|
|
|
|
|
|
|
return ${RC}
|
|
|
|
}
|
|
|
|
|
|
|
|
pattern_remove() {
|
|
|
|
local PATTERN_NAME=$1
|
|
|
|
|
|
|
|
rm -f ${PATTERN_NAME}
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
fault_set_md() {
|
|
|
|
local VDEV_FAULTY=$1
|
|
|
|
local FAULT_TYPE=$2
|
|
|
|
|
|
|
|
${MDADM} /dev/${VDEV_FAULTY} --grow --level=faulty \
|
|
|
|
--layout=${FAULT_TYPE} >/dev/null
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
fault_clear_md() {
|
|
|
|
local VDEV_FAULTY=$1
|
|
|
|
|
|
|
|
# Clear all failure injection.
|
|
|
|
${MDADM} /dev/${VDEV_FAULTY} --grow --level=faulty \
|
|
|
|
--layout=clear >/dev/null || return $?
|
|
|
|
${MDADM} /dev/${VDEV_FAULTY} --grow --level=faulty \
|
|
|
|
--layout=flush >/dev/null || return $?
|
|
|
|
return $?
|
|
|
|
}
|
|
|
|
|
|
|
|
fault_set_sd() {
|
|
|
|
local OPTS=$1
|
|
|
|
local NTH=$2
|
|
|
|
|
|
|
|
echo ${OPTS} >/sys/bus/pseudo/drivers/scsi_debug/opts
|
|
|
|
echo ${NTH} >/sys/bus/pseudo/drivers/scsi_debug/every_nth
|
|
|
|
}
|
|
|
|
|
|
|
|
fault_clear_sd() {
|
|
|
|
echo 0 >/sys/bus/pseudo/drivers/scsi_debug/every_nth
|
|
|
|
echo 0 >/sys/bus/pseudo/drivers/scsi_debug/opts
|
|
|
|
}
|
|
|
|
|
|
|
|
test_setup() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local ZVOL_NAME=$3
|
|
|
|
local TMP_CACHE=$4
|
|
|
|
|
|
|
|
${ZFS_SH} zfs="spa_config_path=${TMP_CACHE}" || fail 1
|
|
|
|
${ZPOOL_CREATE_SH} -p ${POOL_NAME} -c ${POOL_CONFIG} || fail 2
|
|
|
|
${ZFS} create -V 64M ${POOL_NAME}/${ZVOL_NAME} || fail 3
|
|
|
|
|
|
|
|
# Trigger udev and re-read the partition table to ensure all of
|
|
|
|
# this IO is out of the way before we begin injecting failures.
|
|
|
|
udev_trigger || fail 4
|
|
|
|
${BLOCKDEV} --rereadpt /dev/${POOL_NAME}/${ZVOL_NAME} || fail 5
|
|
|
|
}
|
|
|
|
|
|
|
|
test_cleanup() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local ZVOL_NAME=$3
|
|
|
|
local TMP_CACHE=$4
|
|
|
|
|
|
|
|
${ZFS} destroy ${POOL_NAME}/${ZVOL_NAME} || fail 101
|
|
|
|
${ZPOOL_CREATE_SH} -p ${POOL_NAME} -c ${POOL_CONFIG} -d || fail 102
|
|
|
|
${ZFS_SH} -u || fail 103
|
|
|
|
rm -f ${TMP_CACHE} || fail 104
|
|
|
|
}
|
|
|
|
|
|
|
|
test_write_soft() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Set soft write failure for first vdev device.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md 1`
|
|
|
|
fault_set_md ${VDEV_FAULTY} write-transient
|
|
|
|
|
|
|
|
# The application must not observe an error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
|
|
|
|
# Soft errors will not be logged to 'zpool status'
|
|
|
|
local WRITE_ERRORS=`vdev_write_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${WRITE_ERRORS} -eq 0 || fail 13
|
|
|
|
|
|
|
|
# Soft errors will still generate an EIO (5) event.
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 14
|
|
|
|
|
|
|
|
# Verify the known pattern.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 15
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 16
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Soft write error.
|
|
|
|
test_1() {
|
|
|
|
test_write_soft tank lo-faulty-raid0 0
|
|
|
|
test_write_soft tank lo-faulty-raid10 1
|
|
|
|
test_write_soft tank lo-faulty-raidz 1
|
|
|
|
test_write_soft tank lo-faulty-raidz2 1
|
|
|
|
test_write_soft tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 1 "soft write error"
|
|
|
|
|
|
|
|
test_write_hard() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Set hard write failure for first vdev device.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md 1`
|
|
|
|
fault_set_md ${VDEV_FAULTY} write-persistent
|
|
|
|
|
|
|
|
# The application must not observe an error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
|
|
|
|
local WRITE_ERRORS=`vdev_write_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
# For redundant configurations hard errors will not be
|
|
|
|
# logged to 'zpool status' but will generate EIO events.
|
|
|
|
test ${WRITE_ERRORS} -eq 0 || fail 21
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 22
|
|
|
|
else
|
|
|
|
# For non-redundant configurations hard errors will be
|
|
|
|
# logged to 'zpool status' and generate EIO events. They
|
|
|
|
# will also trigger a scrub of the impacted sectors.
|
|
|
|
sleep 10
|
|
|
|
test ${WRITE_ERRORS} -gt 0 || fail 31
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 32
|
|
|
|
test `zpool_event "zfs.resilver.start" "ena"` != "" || fail 33
|
|
|
|
test `zpool_event "zfs.resilver.finish" "ena"` != "" || fail 34
|
|
|
|
test `zpool_scan_errors ${POOL_NAME}` -eq 0 || fail 35
|
|
|
|
fi
|
|
|
|
|
|
|
|
# Verify the known pattern.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 41
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 42
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Hard write error.
|
|
|
|
test_2() {
|
|
|
|
test_write_hard tank lo-faulty-raid0 0
|
|
|
|
test_write_hard tank lo-faulty-raid10 1
|
|
|
|
test_write_hard tank lo-faulty-raidz 1
|
|
|
|
test_write_hard tank lo-faulty-raidz2 1
|
|
|
|
test_write_hard tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 2 "hard write error"
|
|
|
|
|
|
|
|
test_write_all() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Set all write failures for first vdev device.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md 1`
|
|
|
|
fault_set_md ${VDEV_FAULTY} write-all
|
|
|
|
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
# The application must not observe an error.
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
else
|
|
|
|
# The application is expected to hang in the background until
|
|
|
|
# the faulty device is repaired and 'zpool clear' is run.
|
|
|
|
pattern_write_bg ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 13
|
|
|
|
sleep 10
|
|
|
|
fi
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
|
|
|
|
local WRITE_ERRORS=`vdev_write_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
local VDEV_STATUS=`vdev_status ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
local POOL_STATE=`zpool_state ${POOL_NAME}`
|
|
|
|
# For all configurations write errors are logged to 'zpool status',
|
|
|
|
# and EIO events are generated. However, only a redundant config
|
|
|
|
# will cause the vdev to be FAULTED and pool DEGRADED. In a non-
|
|
|
|
# redundant config the IO will hang until 'zpool clear' is run.
|
|
|
|
test ${WRITE_ERRORS} -gt 0 || fail 14
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 15
|
|
|
|
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
test "${VDEV_STATUS}" = "FAULTED" || fail 21
|
|
|
|
test "${POOL_STATE}" = "DEGRADED" || fail 22
|
|
|
|
else
|
|
|
|
BLOCKED=`ps a | grep "${ZVOL_DEVICE}" | grep -c -v "grep"`
|
|
|
|
${ZPOOL} clear ${POOL_NAME} || fail 31
|
|
|
|
test ${BLOCKED} -eq 1 || fail 32
|
|
|
|
wait
|
|
|
|
fi
|
|
|
|
|
|
|
|
# Verify the known pattern.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 41
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 42
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# All write errors.
|
|
|
|
test_3() {
|
|
|
|
test_write_all tank lo-faulty-raid0 0
|
|
|
|
test_write_all tank lo-faulty-raid10 1
|
|
|
|
test_write_all tank lo-faulty-raidz 1
|
|
|
|
test_write_all tank lo-faulty-raidz2 1
|
|
|
|
test_write_all tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 3 "all write errors"
|
|
|
|
|
|
|
|
test_read_soft() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
local READ_ERRORS=0
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Create a pattern to be verified during a read error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
|
|
|
|
# Set soft read failure for all the vdevs to ensure we hit it.
|
|
|
|
for (( i=1; i<=4; i++ )); do
|
|
|
|
fault_set_md `nth_zpool_vdev ${POOL_NAME} md $i` read-transient
|
|
|
|
done
|
|
|
|
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 13
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 14
|
|
|
|
|
|
|
|
# Clear all failure injection and sum read errors.
|
|
|
|
for (( i=1; i<=4; i++ )); do
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md $i`
|
|
|
|
local VDEV_ERRORS=`vdev_read_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
let READ_ERRORS=${READ_ERRORS}+${VDEV_ERRORS}
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
done
|
|
|
|
|
|
|
|
# Soft errors will not be logged to 'zpool status'.
|
|
|
|
test ${READ_ERRORS} -eq 0 || fail 15
|
|
|
|
|
|
|
|
# Soft errors will still generate an EIO (5) event.
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 16
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Soft read error.
|
|
|
|
test_4() {
|
|
|
|
test_read_soft tank lo-faulty-raid0 0
|
|
|
|
test_read_soft tank lo-faulty-raid10 1
|
|
|
|
test_read_soft tank lo-faulty-raidz 1
|
|
|
|
test_read_soft tank lo-faulty-raidz2 1
|
|
|
|
test_read_soft tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 4 "soft read error"
|
|
|
|
|
|
|
|
test_read_hard() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
local READ_ERRORS=0
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Create a pattern to be verified during a read error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
|
|
|
|
# Set hard read failure for the fourth vdev.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md 4`
|
|
|
|
fault_set_md ${VDEV_FAULTY} read-persistent
|
|
|
|
|
|
|
|
# For a redundant pool there must be no IO error, for a non-redundant
|
|
|
|
# pool we expect permanent damage and an IO error during verify, unless
|
|
|
|
# we get exceptionally lucky and have just damaged redundant metadata.
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 21
|
|
|
|
local READ_ERRORS=`vdev_read_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${READ_ERRORS} -eq 0 || fail 22
|
|
|
|
else
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE}
|
|
|
|
${ZPOOL} scrub ${POOL_NAME} || fail 32
|
|
|
|
local READ_ERRORS=`vdev_read_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${READ_ERRORS} -gt 0 || fail 33
|
|
|
|
${ZPOOL} status -v ${POOL_NAME} | \
|
|
|
|
grep -A8 "Permanent errors" | \
|
|
|
|
grep -q "${POOL_NAME}" || fail 34
|
|
|
|
fi
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 41
|
|
|
|
|
|
|
|
# Clear all failure injection and sum read errors.
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
|
|
|
|
# Hard errors will generate an EIO (5) event.
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 42
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Hard read error.
|
|
|
|
test_5() {
|
|
|
|
test_read_hard tank lo-faulty-raid0 0
|
|
|
|
test_read_hard tank lo-faulty-raid10 1
|
|
|
|
test_read_hard tank lo-faulty-raidz 1
|
|
|
|
test_read_hard tank lo-faulty-raidz2 1
|
|
|
|
test_read_hard tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 5 "hard read error"
|
|
|
|
|
|
|
|
# Fixable read error.
|
|
|
|
test_read_fixable() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
local READ_ERRORS=0
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Create a pattern to be verified during a read error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
|
|
|
|
# Set hard read failure for the fourth vdev.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md 4`
|
|
|
|
fault_set_md ${VDEV_FAULTY} read-fixable
|
|
|
|
|
|
|
|
# For a redundant pool there must be no IO error, for a non-redundant
|
|
|
|
# pool we expect permanent damage and an IO error during verify, unless
|
|
|
|
# we get exceptionally lucky and have just damaged redundant metadata.
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 21
|
|
|
|
local READ_ERRORS=`vdev_read_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${READ_ERRORS} -eq 0 || fail 22
|
|
|
|
else
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE}
|
|
|
|
${ZPOOL} scrub ${POOL_NAME} || fail 32
|
|
|
|
local READ_ERRORS=`vdev_read_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${READ_ERRORS} -gt 0 || fail 33
|
|
|
|
${ZPOOL} status -v ${POOL_NAME} | \
|
|
|
|
grep -A8 "Permanent errors" | \
|
|
|
|
grep -q "${POOL_NAME}" || fail 34
|
|
|
|
fi
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 41
|
|
|
|
|
|
|
|
# Clear all failure injection and sum read errors.
|
|
|
|
fault_clear_md ${VDEV_FAULTY}
|
|
|
|
|
|
|
|
# Hard errors will generate an EIO (5) event.
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 42
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Read errors fixable with a write.
|
|
|
|
test_6() {
|
|
|
|
test_read_fixable tank lo-faulty-raid0 0
|
|
|
|
test_read_fixable tank lo-faulty-raid10 1
|
|
|
|
test_read_fixable tank lo-faulty-raidz 1
|
|
|
|
test_read_fixable tank lo-faulty-raidz2 1
|
|
|
|
test_read_fixable tank lo-faulty-raidz3 1
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 6 "fixable read error"
|
|
|
|
|
|
|
|
test_cksum() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local VDEV_DAMAGE="$4"
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
|
|
|
|
if [ ${MD_PARTITIONABLE} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Create a pattern to be verified.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
|
|
|
|
# Verify the pattern and that no vdev has cksum errors.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 13
|
|
|
|
for (( i=1; i<4; i++ )); do
|
|
|
|
VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md ${i}`
|
|
|
|
CKSUM_ERRORS=`vdev_cksum_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${CKSUM_ERRORS} -eq 0 || fail 14
|
|
|
|
done
|
|
|
|
|
|
|
|
# Corrupt the bulk of a vdev with random garbage, we damage as many
|
|
|
|
# vdevs as we have levels of redundancy. For example for a raidz3
|
|
|
|
# configuration we can trash 3 vdevs and still expect correct data.
|
|
|
|
# This improves the odds that we read one of the damaged vdevs.
|
|
|
|
for VDEV in ${VDEV_DAMAGE}; do
|
|
|
|
VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} md $VDEV`
|
|
|
|
pattern_write /dev/urandom 1M 64 /dev/${VDEV_FAULTY}p1
|
|
|
|
done
|
|
|
|
|
|
|
|
# Verify the pattern is still correct. For non-redundant pools
|
|
|
|
# expect failure and for redundant pools success due to resilvering.
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 16
|
|
|
|
else
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} && fail 17
|
|
|
|
fi
|
|
|
|
|
|
|
|
CKSUM_ERRORS=`vdev_cksum_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${CKSUM_ERRORS} -gt 0 || fail 18
|
|
|
|
STATUS=`vdev_status ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test "${STATUS}" = "ONLINE" || fail 19
|
|
|
|
|
|
|
|
# The checksum errors must be logged as an event.
|
|
|
|
local CKSUM_ERRORS=`zpool_event "zfs.checksum" "zio_err"`
|
|
|
|
test ${CKSUM_ERRORS} = "0x34" || test ${CKSUM_ERRORS} = "0x0" || fail 20
|
|
|
|
|
|
|
|
# Verify permant errors for non-redundant pools, and for redundant
|
|
|
|
# pools trigger a scrub and check that all checksums have been fixed.
|
|
|
|
if [ ${POOL_REDUNDANT} -eq 1 ]; then
|
|
|
|
# Scrub the checksum errors and clear the faults.
|
|
|
|
${ZPOOL} scrub ${POOL_NAME} || fail 21
|
|
|
|
sleep 3
|
|
|
|
${ZPOOL} clear ${POOL_NAME} || fail 22
|
|
|
|
|
|
|
|
# Re-verify the pattern for fixed checksums.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 23
|
|
|
|
CKSUM_ERRORS=`vdev_cksum_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${CKSUM_ERRORS} -eq 0 || fail 24
|
|
|
|
|
|
|
|
# Re-verify the entire pool for fixed checksums.
|
|
|
|
${ZPOOL} scrub ${POOL_NAME} || fail 25
|
|
|
|
CKSUM_ERRORS=`vdev_cksum_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${CKSUM_ERRORS} -eq 0 || fail 26
|
|
|
|
else
|
|
|
|
${ZPOOL} status -v ${POOL_NAME} | \
|
|
|
|
grep -A8 "Permanent errors" | \
|
|
|
|
grep -q "${POOL_NAME}/${ZVOL_NAME}" || fail 31
|
|
|
|
${ZPOOL} clear ${POOL_NAME} || fail 32
|
|
|
|
fi
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 41
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
# Silent data corruption
|
|
|
|
test_7() {
|
|
|
|
test_cksum tank lo-faulty-raid0 0 "1"
|
|
|
|
test_cksum tank lo-faulty-raid10 1 "1 3"
|
|
|
|
test_cksum tank lo-faulty-raidz 1 "4"
|
|
|
|
test_cksum tank lo-faulty-raidz2 1 "3 4"
|
|
|
|
test_cksum tank lo-faulty-raidz3 1 "2 3 4"
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 7 "silent data corruption"
|
|
|
|
|
|
|
|
# Soft write timeout at the scsi device layer.
|
|
|
|
test_write_timeout_soft() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local POOL_NTH=$4
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
|
|
|
|
if [ ${SCSI_DEBUG} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
# Set timeout(0x4) for every nth command.
|
|
|
|
fault_set_sd 4 ${POOL_NTH}
|
|
|
|
|
|
|
|
# The application must not observe an error.
|
|
|
|
local TMP_PATTERN=`pattern_create 1M 8` || fail 11
|
|
|
|
pattern_write ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 12
|
|
|
|
fault_clear_sd
|
|
|
|
|
|
|
|
# Intermittent write timeouts even with FAILFAST set may not cause
|
|
|
|
# an EIO (5) event. This is because how FAILFAST is handled depends
|
|
|
|
# a log on the low level driver and the exact nature of the failure.
|
|
|
|
# We will however see a 'zfs.delay' event logged due to the timeout.
|
|
|
|
VDEV_DELAY=`zpool_event "zfs.delay" "zio_delay"`
|
|
|
|
test `printf "%d" ${VDEV_DELAY}` -ge 30000 || fail 13
|
|
|
|
|
|
|
|
# Verify the known pattern.
|
|
|
|
pattern_verify ${TMP_PATTERN} 1M 8 ${ZVOL_DEVICE} || fail 14
|
|
|
|
pattern_remove ${TMP_PATTERN} || fail 15
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
test_8() {
|
|
|
|
test_write_timeout_soft tank scsi_debug-raid0 0 50
|
|
|
|
test_write_timeout_soft tank scsi_debug-raid10 1 100
|
|
|
|
test_write_timeout_soft tank scsi_debug-raidz 1 75
|
|
|
|
test_write_timeout_soft tank scsi_debug-raidz2 1 150
|
|
|
|
test_write_timeout_soft tank scsi_debug-raidz3 1 300
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 8 "soft write timeout"
|
|
|
|
|
|
|
|
# Persistent write timeout at the scsi device layer.
|
|
|
|
test_write_timeout_hard() {
|
|
|
|
local POOL_NAME=$1
|
|
|
|
local POOL_CONFIG=$2
|
|
|
|
local POOL_REDUNDANT=$3
|
|
|
|
local POOL_NTH=$4
|
|
|
|
local ZVOL_NAME="zvol"
|
|
|
|
local ZVOL_DEVICE="/dev/${POOL_NAME}/${ZVOL_NAME}"
|
|
|
|
local RESCAN=1
|
|
|
|
|
|
|
|
if [ ${SCSI_DEBUG} -eq 0 ]; then
|
|
|
|
skip_nonewline
|
|
|
|
return
|
|
|
|
fi
|
|
|
|
|
|
|
|
local TMP_CACHE=`mktemp -p /tmp zpool.cache.XXXXXXXX`
|
|
|
|
test_setup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
|
|
|
|
local TMP_PATTERN1=`pattern_create 1M 8`
|
|
|
|
local TMP_PATTERN2=`pattern_create 1M 8`
|
|
|
|
local TMP_PATTERN3=`pattern_create 1M 8`
|
|
|
|
|
|
|
|
# Create three partitions each one gets a unique pattern. The first
|
|
|
|
# pattern is written before the failure, the second pattern during
|
|
|
|
# the failure, and the third pattern while the vdev is degraded.
|
|
|
|
# All three patterns are verified while the vdev is degraded and
|
|
|
|
# then again once it is brought back online.
|
|
|
|
${PARTED} -s ${ZVOL_DEVICE} mklabel gpt || fail 11
|
|
|
|
${PARTED} -s ${ZVOL_DEVICE} mkpart primary 1M 16M || fail 12
|
|
|
|
${PARTED} -s ${ZVOL_DEVICE} mkpart primary 16M 32M || fail 13
|
|
|
|
${PARTED} -s ${ZVOL_DEVICE} mkpart primary 32M 48M || fail 14
|
|
|
|
|
|
|
|
wait_udev ${ZVOL_DEVICE}1 30
|
|
|
|
wait_udev ${ZVOL_DEVICE}2 30
|
|
|
|
wait_udev ${ZVOL_DEVICE}3 30
|
|
|
|
|
|
|
|
# Before the failure.
|
|
|
|
pattern_write ${TMP_PATTERN1} 1M 8 ${ZVOL_DEVICE}1 || fail 15
|
|
|
|
|
|
|
|
# Get the faulty vdev name.
|
|
|
|
local VDEV_FAULTY=`nth_zpool_vdev ${POOL_NAME} sd 1`
|
|
|
|
|
|
|
|
# Set timeout(0x4) for every nth command.
|
|
|
|
fault_set_sd 4 ${POOL_NTH}
|
|
|
|
|
|
|
|
# During the failure.
|
|
|
|
pattern_write ${TMP_PATTERN2} 1M 8 ${ZVOL_DEVICE}2 || fail 21
|
|
|
|
|
|
|
|
# Expect write errors to be logged to 'zpool status'
|
|
|
|
local WRITE_ERRORS=`vdev_write_errors ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test ${WRITE_ERRORS} -gt 0 || fail 22
|
|
|
|
|
|
|
|
local VDEV_STATUS=`vdev_status ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test "${VDEV_STATUS}" = "UNAVAIL" || fail 23
|
|
|
|
|
|
|
|
# Clear the error and remove it from /dev/.
|
|
|
|
fault_clear_sd
|
|
|
|
rm -f /dev/${VDEV_FAULTY}[0-9]
|
|
|
|
|
|
|
|
# Verify the first two patterns and write out the third.
|
|
|
|
pattern_write ${TMP_PATTERN3} 1M 8 ${ZVOL_DEVICE}3 || fail 31
|
|
|
|
pattern_verify ${TMP_PATTERN1} 1M 8 ${ZVOL_DEVICE}1 || fail 32
|
|
|
|
pattern_verify ${TMP_PATTERN2} 1M 8 ${ZVOL_DEVICE}2 || fail 33
|
|
|
|
pattern_verify ${TMP_PATTERN3} 1M 8 ${ZVOL_DEVICE}3 || fail 34
|
|
|
|
|
|
|
|
# Bring the device back online by rescanning for it. It must appear
|
|
|
|
# in lsscsi and be available to dd before allowing ZFS to bring it
|
|
|
|
# online. This is not required but provides additional sanity.
|
|
|
|
while [ ${RESCAN} -eq 1 ]; do
|
|
|
|
scsi_rescan
|
|
|
|
wait_udev /dev/${VDEV_FAULTY} 30
|
|
|
|
|
|
|
|
if [ `${LSSCSI} | grep -c "/dev/${VDEV_FAULTY}"` -eq 0 ]; then
|
|
|
|
continue
|
|
|
|
fi
|
|
|
|
|
|
|
|
dd if=/dev/${VDEV_FAULTY} of=/dev/null bs=8M count=1 &>/dev/null
|
|
|
|
if [ $? -ne 0 ]; then
|
|
|
|
continue
|
|
|
|
fi
|
|
|
|
|
|
|
|
RESCAN=0
|
|
|
|
done
|
|
|
|
|
|
|
|
# Bring the device back online. We expect it to be automatically
|
|
|
|
# resilvered without error and we should see minimally the zfs.io,
|
|
|
|
# zfs.statechange (VDEV_STATE_HEALTHY (0x7)), and zfs.resilver.*
|
|
|
|
# events posted.
|
2010-11-23 19:19:57 +00:00
|
|
|
${ZPOOL} online ${POOL_NAME} ${VDEV_FAULTY} || fail 51
|
Add zfault zpool configurations and tests
Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type which allow us to simuluate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This device allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important to verifying the IO stack.
The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this appraoch to validate the code it does not address any of the
Linux integration issues which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.
Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 writes tests for soft/transient errors,
hard/permenant errors, and all writes error to the device. There
are 3 matching read tests for soft/transient errors, hard/permenant
errors, and fixable read error with a write. Although for this last
case zfs doesn't do anything special.
The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged and
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly rebuilds the damage.
The next test cases use the scsi_debug configuration to injects error
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. Plus it stresses the device specific
retry, timeout, and error handling outside of zfs's control.
The eighth test case is to verify that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests resulting in a retry either at the block level. The ZFS code
does specify the FAILFAST option but it turns out that for this case
the Linux IO stack with still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does no seem to apply to the simply
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-add when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and the no data corruption
has occured due to the failure.
2010-09-28 23:32:12 +00:00
|
|
|
sleep 3
|
|
|
|
test `zpool_event "zfs.io" "zio_err"` = "0x5" || fail 52
|
|
|
|
test `zpool_event "zfs.statechange" "vdev_state"` = "0x7" || fail 53
|
|
|
|
test `zpool_event "zfs.resilver.start" "ena"` != "" || fail 54
|
|
|
|
test `zpool_event "zfs.resilver.finish" "ena"` != "" || fail 55
|
|
|
|
test `zpool_scan_errors ${POOL_NAME}` -eq 0 || fail 56
|
|
|
|
|
|
|
|
local VDEV_STATUS=`vdev_status ${POOL_NAME} ${VDEV_FAULTY}`
|
|
|
|
test "${VDEV_STATUS}" = "ONLINE" || fail 57
|
|
|
|
|
|
|
|
# Verify the known pattern.
|
|
|
|
pattern_verify ${TMP_PATTERN1} 1M 8 ${ZVOL_DEVICE}1 || fail 61
|
|
|
|
pattern_verify ${TMP_PATTERN2} 1M 8 ${ZVOL_DEVICE}2 || fail 62
|
|
|
|
pattern_verify ${TMP_PATTERN3} 1M 8 ${ZVOL_DEVICE}3 || fail 63
|
|
|
|
pattern_remove ${TMP_PATTERN1} || fail 64
|
|
|
|
pattern_remove ${TMP_PATTERN2} || fail 65
|
|
|
|
pattern_remove ${TMP_PATTERN3} || fail 66
|
|
|
|
|
|
|
|
test_cleanup ${POOL_NAME} ${POOL_CONFIG} ${ZVOL_NAME} ${TMP_CACHE}
|
|
|
|
pass_nonewline
|
|
|
|
}
|
|
|
|
|
|
|
|
test_9() {
|
|
|
|
skip_nonewline # Skip non-redundant config
|
|
|
|
test_write_timeout_hard tank scsi_debug-raid10 1 -50
|
|
|
|
test_write_timeout_hard tank scsi_debug-raidz 1 -50
|
|
|
|
test_write_timeout_hard tank scsi_debug-raidz2 1 -50
|
|
|
|
test_write_timeout_hard tank scsi_debug-raidz3 1 -50
|
|
|
|
echo
|
|
|
|
}
|
|
|
|
run_test 9 "hard write timeout"
|
|
|
|
|
|
|
|
exit 0
|