Parity declustering (the fancy term for shuffling drives) has been an active research topic, and many papers have been published in this area. The [Permutation Development Data Layout](http://www.cse.scu.edu/~tschwarz/TechReports/hpca.pdf) is a good paper to begin. The dRAID vdev driver uses a shuffling algorithm loosely based on the mechanism described in this paper.
# Using dRAID
First get the code [here](https://github.com/openzfs/zfs/pull/10102), build zfs with _configure --enable-debug_, and install. Then load the zfs kernel module with the following options, which help dRAID rebuild performance (an example of setting them follows the list):
* zfs_vdev_scrub_max_active=10
* zfs_vdev_async_write_min_active=4
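For example (a minimal sketch, assuming the zfs module is not already loaded), both options can be passed on the modprobe command line:

```
# modprobe zfs zfs_vdev_scrub_max_active=10 zfs_vdev_async_write_min_active=4
```

If the module is already loaded, the same parameters can be changed at runtime by writing to the matching files under /sys/module/zfs/parameters/.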
## Create a dRAID vdev
Similar to a raidz vdev, a dRAID vdev can be created using the `zpool create` command:
```
# zpool create <pool> draid[1,2,3] <vdevs...>
```
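For instance, a minimal sketch (the pool and drive names are placeholders, not taken from any particular system) that creates a single-parity dRAID vdev from eleven disks and accepts the defaults for everything else:

```
# zpool create tank draid1 sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
```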
Unlike raidz, additional options may be provided as part of the `draid` vdev type to specify an exact dRAID layout. When unspecified, reasonable defaults will be chosen:
```
# zpool create <pool> draid[1,2,3][:<groups>g][:<spares>s][:<data>d][:<iterations>] <vdevs...>
```
* groups - Number of redundancy groups (default: 1 group per 12 vdevs)
* spares - Number of distributed hot spares (default: 1)
* data - Number of data devices per group (default: determined by number of groups)
* iterations - Number of iterations to perform when generating a valid dRAID mapping (default: 3)
_Notes_:
* The default values are not set in stone and may change.
* For the majority of common configurations we intend to provide pre-computed balanced dRAID mappings.
* When _data_ is specified, it must hold that (draid_children - spares) % (parity + data) == 0, otherwise the pool creation will fail.
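As a concrete illustration, the pool shown in the status output below could be created along the following lines. This is a sketch rather than a transcript: it assumes 53 drives that have been given the vdev aliases L0 through L52 in /etc/zfs/vdev_id.conf, and uses bash brace expansion to list them. Since _data_ is not specified, the divisibility rule above does not apply; if it were, a layout such as draid2:2g:2s:10d over 26 drives would satisfy (26 - 2) % (2 + 10) == 0.

```
# zpool create tank draid2:4g:2s /dev/disk/by-vdev/L{0..52}
```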
Now the dRAID vdev is online and ready for IO:
```
# zpool status
  pool: tank
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        tank                   ONLINE       0     0     0
          draid2:4g:2s-0       ONLINE       0     0     0
            L0                 ONLINE       0     0     0
            L1                 ONLINE       0     0     0
            L2                 ONLINE       0     0     0
            L3                 ONLINE       0     0     0
            ...
            L50                ONLINE       0     0     0
            L51                ONLINE       0     0     0
            L52                ONLINE       0     0     0
        spares
          s0-draid2:4g:2s-0    AVAIL
          s1-draid2:4g:2s-0    AVAIL

errors: No known data errors
```
There are two logical hot spare vdevs shown at the bottom of the listing above:
* The names begin with `s<id>-`, followed by the name of the parent dRAID vdev.
* These hot spares are logical, made from reserved blocks on all the 53 child drives of the dRAID vdev.
* Unlike traditional hot spares, the distributed spare can only replace a drive in its parent dRAID vdev.
The dRAID vdev behaves just like a raidz vdev of the same parity level. You can do IO to/from it, scrub it, or fail a child drive, and it will operate in degraded mode.
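For example (reusing the drive names from the pool above), routine operations look exactly as they do for raidz; the scrub verifies the whole pool, and while L3 is offline the dRAID vdev keeps serving IO in degraded mode:

```
# zpool scrub tank
# zpool offline tank L3
# zpool online tank L3
```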
## Rebuild to distributed spare
When there's a failed/offline child drive, the dRAID vdev supports a completely new mechanism to reconstruct lost data/parity, in addition to the resilver. First of all, resilver is still supported - if a failed drive is replaced by another physical drive, the resilver process is used to reconstruct lost data/parity to the new replacement drive, which is the same as a resilver in a raidz vdev.
But if a child drive is replaced with a distributed spare, a new process called rebuild is used instead of resilver.
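As a sketch (reusing the drive and spare names from the example pool above, and assuming child drive L3 has failed), the replacement is requested with an ordinary `zpool replace`, naming the distributed spare as the new device:

```
# zpool replace tank L3 s0-draid2:4g:2s-0
# zpool status tank
```

Afterwards the pool status reports the spare in use and the progress of the rebuild.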
# Troubleshooting
Please report bugs to [the dRAID PR](https://github.com/zfsonlinux/zfs/pull/10102) until the code is merged upstream.