added ZFS events section

Richard Elling 2019-06-06 11:32:11 -07:00
parent bd1bebc2ff
commit bf6c2ffe00
1 changed files with 31 additions and 0 deletions

@ -22,4 +22,35 @@ Log files of interest: [Generic Kernel Log](#generic-kernel-log), [ZFS Kernel Mo
Important information: if a kernel thread is stuck, then a backtrace of the stuck thread can be in the logs.
In some cases, the stuck thread is not logged until the deadman timer expires. See also [debug tunables](https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#debug).

## ZFS Events
ZFS uses an event-based messaging interface for communication of important events to
other consumers running on the system. The ZFS Event Daemon (zed) is a userland daemon that
listens for these events and processes them. zed is extensible so you can write shell scripts
or other programs that subscribe to events and take action. For example, the script usually
installed at `/etc/zfs/zed.d/all-syslog.sh` writes a formatted event message to `syslog`.
See the `zed(8)` man page for more information.

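
As a sketch of the idea (the script itself is hypothetical, not one of the shipped zedlets), a minimal zedlet might react to checksum ereports like this. zed runs zedlets with event details exported in `ZEVENT_*` environment variables such as `ZEVENT_CLASS` and `ZEVENT_POOL`:

```shell
#!/bin/sh
# Hypothetical example zedlet: report checksum ereports.
# zed invokes zedlets with the event details available in
# ZEVENT_* environment variables (see zed(8)).

if [ "${ZEVENT_CLASS}" = "ereport.fs.zfs.checksum" ]; then
    echo "checksum error on pool ${ZEVENT_POOL:-unknown}"
fi
```

Scripts placed in `/etc/zfs/zed.d` and made executable are picked up by zed; anything a zedlet writes to stdout ends up in zed's log.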
A history of events is also available via the `zpool events` command. This history begins at
ZFS kernel module load and includes events from any pool. These events are stored in RAM and
limited in count to a value determined by the kernel tunable [zfs_zevent_len_max](https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters#zfs_zevent_len_max).
`zed` has an internal throttling mechanism to prevent overconsumption of system resources
when processing ZFS events.

More detailed information about events is observable using `zpool events -v`.
The contents of the verbose events are subject to change, based on the event and the
information available at the time of the event.

Each event has a class identifier used for filtering event types. Commonly seen events are
those related to pool management, with class `sysevent.fs.zfs.*`, including import, export,
configuration updates, and `zpool history` updates.

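
For example, the event stream can be narrowed to pool-management events by filtering on class; a simple sketch using standard tools:

```shell
# Follow the event stream (`zpool events -f` prints new events as they
# are posted) and keep only pool-management events by class prefix.
zpool events -f | grep 'sysevent\.fs\.zfs\.'
```
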
Events related to errors are reported with class `ereport.*`. These can be invaluable for
troubleshooting. Some faults can cause multiple ereports as various layers of the software
deal with the fault. For example, on a simple pool without parity protection, a faulty
disk could cause an `ereport.io` during a read from the disk that results in an
`ereport.fs.zfs.checksum` at the pool level. These events are also reflected by the error
counters observed in `zpool status`.

If you see checksum or read/write errors in `zpool status`, then there should be one or more
corresponding ereports in the `zpool events` output.

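
To correlate the two views, the per-device error counters can be extracted from `zpool status` output; a rough sketch (the pool name `tank` is an example, and the filter assumes the usual NAME/STATE/READ/WRITE/CKSUM config table layout):

```shell
# Print device name and CKSUM error count from the zpool status config
# table. The filter keeps only 5-column rows whose last field is numeric,
# which is a rough way to skip headers and other status lines.
zpool status tank | awk 'NF == 5 && $5 + 0 == $5 { print $1, $5 }'
```
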
# DRAFT