Update dRAID pool creation syntax.

Parity declustering (the fancy term for shuffling drives) has been an active research topic, and many papers have been published in this area. The [Permutation Development Data Layout](http://www.cse.scu.edu/~tschwarz/TechReports/hpca.pdf) is a good paper to begin with. The dRAID vdev driver uses a shuffling algorithm loosely based on the mechanism described in this paper.
# Using dRAID

First get the code [here](https://github.com/openzfs/zfs/pull/10102), build zfs with _configure --enable-debug_, and install. Then load the zfs kernel module with the following options, which help dRAID rebuild performance; an example of setting them is shown after the list.

Again, it is very important to _configure_ both spl and zfs with _--enable-debug_.

* zfs_vdev_scrub_max_active=10
* zfs_vdev_async_write_min_active=4
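
For example, a minimal sketch of loading the module with both options set (persistent settings would normally go into a file under /etc/modprobe.d/, which is distribution-specific):

```
# modprobe zfs zfs_vdev_scrub_max_active=10 zfs_vdev_async_write_min_active=4
```

If the module is already loaded, the same tunables can usually be changed at runtime by writing to the matching files under /sys/module/zfs/parameters/.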
## Create a dRAID vdev

Similar to a raidz vdev, a dRAID vdev can be created using the `zpool create` command:

```
# zpool create <pool> draid[1,2,3][:<groups>g][:<spares>s][:<data>d][:<iterations>] <vdevs...>
```

Unlike raidz, additional options may be provided as part of the `draid` vdev type to specify an exact dRAID layout; when unspecified, reasonable defaults will be chosen (a worked example follows the notes below).
* groups - Number of redundancy groups (default: 1 group per 12 vdevs)
* spares - Number of distributed hot spares (default: 1)
* data - Number of data devices per group (default: determined by number of groups)
* iterations - Number of iterations to perform when generating a valid dRAID mapping (default: 3).

When the numbers don't match, _zpool create_ will fail, but with a generic error message, which can be confusing.

_Notes_:
* The default values are not set in stone and may change.
* For the majority of common configurations we intend to provide pre-computed balanced dRAID mappings.
* When _data_ is specified, then (draid_children - spares) % (parity + data) == 0 must hold, otherwise the pool creation will fail.
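
As an illustrative sketch (the device names are placeholders), a 17-drive single-parity layout with 2 distributed spares and 4 data devices per group satisfies the constraint above, since (17 - 2) % (1 + 4) == 0, and yields 3 redundancy groups:

```
# zpool create tank draid1:3g:2s:4d sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt
```

Omitting the optional fields (for example, plain `draid1`) lets the defaults listed above pick the layout.
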
Now the dRAID vdev is online and ready for IO:

```
# zpool status
  pool: tank
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        tank                   ONLINE       0     0     0
          draid2:4g:2s-0       ONLINE       0     0     0
            L0                 ONLINE       0     0     0
            L1                 ONLINE       0     0     0
            L2                 ONLINE       0     0     0
            L3                 ONLINE       0     0     0
            ...
            L50                ONLINE       0     0     0
            L51                ONLINE       0     0     0
            L52                ONLINE       0     0     0
        spares
          s0-draid2:4g:2s-0    AVAIL
          s1-draid2:4g:2s-0    AVAIL

errors: No known data errors
```

There are two logical hot spare vdevs shown above at the bottom:
* The names begin with `s<id>-` followed by the name of the parent dRAID vdev.
* These hot spares are logical, made from reserved blocks on all the 53 child drives of the dRAID vdev.
* Unlike traditional hot spares, the distributed spare can only replace a drive in its parent dRAID vdev.

The dRAID vdev behaves just like a raidz vdev of the same parity level: you can do IO to and from it, scrub it, or fail a child drive, and it will operate in degraded mode.
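
For example (an illustrative sketch using the child names from the pool above), a scrub and a simulated drive failure look the same as they would on raidz:

```
# zpool scrub tank
# zpool offline tank L3
# zpool status tank
```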
## Rebuild to distributed spare

When there is a failed or offline child drive, the dRAID vdev supports a completely new mechanism to reconstruct lost data and parity, in addition to resilver. First of all, resilver is still supported: if a failed drive is replaced by another physical drive, the resilver process reconstructs the lost data and parity onto the new replacement drive, just as it would in a raidz vdev.
But if a child drive is replaced with a distributed spare, a new process called rebuild is used instead of resilver.
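
As an illustration only (a sketch; the device and spare names are hypothetical, and the exact steps may differ in the in-review code), attaching a distributed spare to a failed child and letting the rebuild run might look like:

```
# zpool replace tank L3 s0-draid2:4g:2s-0
# zpool status tank
```

Because the distributed spare is made of reserved blocks spread across all the remaining children, the rebuild can write to many drives in parallel rather than funneling everything through a single replacement disk.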
# Troubleshooting
Please report bugs to [the dRAID PR](https://github.com/zfsonlinux/zfs/pull/10102), as long as the code is not merged upstream.
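
When reporting an issue, the pool's event log usually provides useful context; for example (the output file name is arbitrary):

```
# zpool events -v > zpool-events.txt
```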