Add zfault zpool configurations and tests

Eleven new zpool configurations were added to allow testing of various
failure cases. The first 5 zpool configurations leverage the 'faulty'
md device type, which allows us to simulate IO errors at the block layer.
The last 6 zpool configurations leverage the scsi_debug module provided
by modern kernels. This module allows you to create virtual scsi
devices which are backed by a ram disk. With this setup we can verify
the full IO stack by injecting faults at the lowest layer. Both methods
of fault injection are important for verifying the IO stack.
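
As a rough sketch of the scsi_debug approach, the snippet below only assembles and prints the modprobe command line instead of running it, since loading the module needs root. The dev_size_mb, add_host, num_tgts and max_luns parameters are the ones the config script in this change passes; the helper name sd_modprobe_cmd is hypothetical and exists only for illustration.

```shell
#!/bin/bash
# Defaults mirror those used by the zpool config script in this change.
SDSIZE=${SDSIZE:-256}
SDHOSTS=${SDHOSTS:-1}
SDTGTS=${SDTGTS:-1}
SDLUNS=${SDLUNS:-1}

# Hypothetical helper: print the modprobe command that would create
# the ram-backed virtual scsi device(s). Nothing is loaded here.
sd_modprobe_cmd() {
	echo "/sbin/modprobe scsi_debug dev_size_mb=${SDSIZE}" \
	    "add_host=${SDHOSTS} num_tgts=${SDTGTS} max_luns=${SDLUNS}"
}

# Dry run: show the command. Run the printed line as root to load it.
sd_modprobe_cmd
```

Printing the command first makes it easy to check the parameters before touching the running kernel.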

The zfs code itself also provides a mechanism for error injection
via the zinject command line tool. While we should also take advantage
of this approach to validate the code, it does not address any of the
Linux integration issues, which are the most concerning. For the
moment we're trusting that the upstream Solaris guys are running
zinject and would have caught internal zfs logic errors.

Currently, there are 6 r/w test cases layered on top of the 'faulty'
md devices. They include 3 write tests: soft/transient errors,
hard/permanent errors, and all writes to the device failing. There
are 3 matching read tests: soft/transient errors, hard/permanent
errors, and read errors that are fixable with a write, although for
this last case zfs doesn't do anything special.

The seventh test case verifies zfs detects and corrects checksum
errors. In this case one of the drives is extensively damaged
by dd'ing over large sections of it. We then ensure zfs logs the
issue and correctly repairs the damage.
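
The dd step can be illustrated on a scratch file standing in for one vdev. This is only a sketch: the file, sizes and offsets are made up for the example, and how the real test then triggers detection and repair is not shown here.

```shell
#!/bin/bash
# Sketch: damage part of a backing file in place.
# VDEV is a scratch file here, not a real pool member.
VDEV=$(mktemp)

# Create a 4 MiB file of zeros and record its checksum.
dd if=/dev/zero of="${VDEV}" bs=1024k count=4 2>/dev/null
BEFORE=$(cksum <"${VDEV}")

# Overwrite 1 MiB of random data at a 1 MiB offset. conv=notrunc keeps
# the file size unchanged, so only the contents are damaged.
dd if=/dev/urandom of="${VDEV}" bs=1024k seek=1 count=1 \
    conv=notrunc 2>/dev/null
AFTER=$(cksum <"${VDEV}")
SIZE=$(wc -c <"${VDEV}")

if [ "${BEFORE}" != "${AFTER}" ]; then
	echo "contents changed, size still ${SIZE} bytes"
fi

rm -f "${VDEV}"
```

Without conv=notrunc, dd would truncate the file after the written region, which changes the device size as well as its contents.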

The next test cases use the scsi_debug configuration to inject errors
at the bottom of the scsi stack. This ensures we find any flaws in the
scsi midlayer or our usage of it. It also stresses the device-specific
retry, timeout, and error handling outside of zfs's control.

The eighth test case verifies that the system correctly handles an
intermittent device timeout. Here the scsi_debug device drops 1 in N
requests, resulting in a retry at the block level. The ZFS code
does specify the FAILFAST option, but it turns out that for this case
the Linux IO stack will still retry the command. The FAILFAST logic
located in scsi_noretry_cmd() does not seem to apply to the simple
timeout case. It appears to be more targeted to specific device or
transport errors from the lower layers.
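
A minimal sketch of how the "1 in N" behaviour might be configured through the scsi_debug sysfs parameters follows. The every_nth file is the one the config script resets to 0; the use of the opts file with a value of 4 to select timeouts is an assumption about the module and should be checked against the kernel's scsi_debug documentation. The helper name sd_inject and the SD_PARAMS override are hypothetical.

```shell
#!/bin/bash
# SD_PARAMS can be pointed at a scratch directory for a dry run. On a
# live system it needs root and a loaded scsi_debug module.
SD_PARAMS=${SD_PARAMS:-/sys/module/scsi_debug/parameters}

# Hypothetical helper: inject a fault on every Nth command.
#   $1 = N (0 disables injection)
#   $2 = opts bitmask (assumption: 4 selects the timeout behaviour)
sd_inject() {
	local nth=$1 opts=${2:-4}

	echo "${opts}" >"${SD_PARAMS}/opts" || return 1
	echo "${nth}" >"${SD_PARAMS}/every_nth" || return 1
}
```

Under these assumptions, `sd_inject 50` would affect roughly one request in fifty, and `sd_inject 0 0` would turn injection off again before the module is unloaded.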

The ninth test case handles a persistent failure in which the device
is removed from the system by Linux. The test verifies that the failure
is detected, the device is made unavailable, and then can be successfully
re-added when brought back online. Additionally, it ensures that errors
and events are logged to the correct places and that no data corruption
has occurred due to the failure.
2010-09-28 23:32:12 +00:00

#!/bin/bash
#
# 1 scsi_debug device for fault injection and 3 loopback devices
# on top of which is layered raidz.
#

SDSIZE=${SDSIZE:-256}
SDHOSTS=${SDHOSTS:-1}
SDTGTS=${SDTGTS:-1}
SDLUNS=${SDLUNS:-1}
LDMOD=/sbin/modprobe
FILEDIR=${FILEDIR:-/var/tmp}
FILES=${FILES:-"$FILEDIR/file-vdev0 $FILEDIR/file-vdev1 $FILEDIR/file-vdev2"}
DEVICES=""

zpool_create() {
	check_loop_utils
	check_sd_utils

	# Unload any stale scsi_debug instance, disabling fault
	# injection first.
	test `${LSMOD} | grep -c scsi_debug` -gt 0 && \
		(echo 0 >/sys/module/scsi_debug/parameters/every_nth && \
		${RMMOD} scsi_debug || exit 1)
	udev_trigger

	# Load scsi_debug to create the ram-backed virtual scsi device.
	msg "${LDMOD} scsi_debug dev_size_mb=${SDSIZE} " \
		"add_host=${SDHOSTS} num_tgts=${SDTGTS} " \
		"max_luns=${SDLUNS}"
	${LDMOD} scsi_debug \
		dev_size_mb=${SDSIZE} \
		add_host=${SDHOSTS} \
		num_tgts=${SDTGTS} \
		max_luns=${SDLUNS} || \
		die "Error $? creating scsi_debug devices"
	udev_trigger

	# Use the first scsi_debug device reported and label it.
	SDDEVICE=`${LSSCSI} | ${AWK} '/scsi_debug/ { print $6; exit }'`
	msg "${PARTED} -s ${SDDEVICE} mklabel gpt"
	${PARTED} -s ${SDDEVICE} mklabel gpt || \
		(${RMMOD} scsi_debug && die "Error $? creating gpt label")

	for FILE in ${FILES}; do
		LODEVICE=`unused_loop_device`

		# Create a sparse 256M backing file.
		rm -f ${FILE} || exit 1
		dd if=/dev/zero of=${FILE} bs=1024k count=0 seek=256 \
			&>/dev/null || (${RMMOD} scsi_debug && \
			die "Error $? creating ${FILE}")

		# Setup the loopback device on the file.
		msg "Creating ${LODEVICE} using ${FILE}"
		${LOSETUP} ${LODEVICE} ${FILE} || (${RMMOD} scsi_debug && \
			die "Error $? creating ${LODEVICE} using ${FILE}")

		DEVICES="${DEVICES} ${LODEVICE}"
	done

	DEVICES="${DEVICES} ${SDDEVICE}"

	msg "${ZPOOL} create ${ZPOOL_FLAGS} ${ZPOOL_NAME} raidz ${DEVICES}"
	${ZPOOL} create ${ZPOOL_FLAGS} ${ZPOOL_NAME} raidz ${DEVICES} || \
		(${RMMOD} scsi_debug && exit 1)
}

zpool_destroy() {
	msg ${ZPOOL} destroy ${ZPOOL_NAME}
	${ZPOOL} destroy ${ZPOOL_NAME}

	for FILE in ${FILES}; do
		LODEVICE=`${LOSETUP} -a | grep ${FILE} | head -n1|cut -f1 -d:`
		msg "Removing ${LODEVICE} using ${FILE}"
		${LOSETUP} -d ${LODEVICE} ||
			die "Error $? destroying ${LODEVICE} using ${FILE}"
		rm -f ${FILE} || exit 1
	done

	msg "${RMMOD} scsi_debug"
	${RMMOD} scsi_debug || die "Error $? removing scsi_debug devices"
}