zfs/sys at 0b43696e6676391e5bee8ba49e76e599bac1d89d - zfs

Richard Yao 0b43696e66 random_get_pseudo_bytes() need not provide cryptographic strength entropy

Perf profiling of dd on a zvol revealed that my system spent 3.16% of its time in random_get_pseudo_bytes(). No SPL consumers need cryptographic strength entropy, so we can reduce our overhead by changing the implementation to utilize a fast PRNG.

The Linux kernel did not export a suitable PRNG function until it exported get_random_int() in Linux 3.10. While we could implement an autotools check so that we use it when it is available, or even try to access the symbol on older kernels where it is not exported (using the fact that it is exported on newer ones as justification), we can instead implement our own pseudo-random data generator. For this purpose, I have written one based on a 128-bit pseudo-random number generator proposed in a paper by Sebastiano Vigna that itself was based on work by the late George Marsaglia.

http://vigna.di.unimi.it/ftp/papers/xorshiftplus.pdf

Profiling the same benchmark with an earlier variant of this patch that used a slightly different generator (roughly the same number of instructions) by the same author showed that time spent in random_get_pseudo_bytes() dropped to 0.06%. That is a factor of 50 improvement. This particular generator algorithm is also well known to be fast:

http://xorshift.di.unimi.it/#speed

The benchmark numbers there state that it runs at 1.12ns/64-bits or 7.14 GBps of throughput on an Intel Core i7-4770 in what is presumably a single-threaded context. Using it in `random_get_pseudo_bytes()` in the manner I have will probably not reach that level of performance, but it should be fairly high and many times higher than the Linux `get_random_bytes()` function that we use now, which runs at 16.3 MB/s on my Intel Xeon E3-1276v3 processor when measured by using dd on /dev/urandom.

Also, putting this generator's seed into per-CPU variables allows us to eliminate overhead from both spin locks and CPU memory barriers, which is NUMA friendly.

We could have alternatively modified consumers to use something like `gethrtime() % 3` as suggested by both Matthew Ahrens and Tim Chase, but that has a few potential problems that this approach avoids:

1. Switching to `gethrtime() % 3` in hot code paths today requires diverging from illumos-gate and does nothing about potential future patches from illumos-gate that call our slow `random_get_pseudo_bytes()` in different hot code paths. Reimplementing `random_get_pseudo_bytes()` with a per-CPU PRNG avoids both of those things entirely, which means less work for us in the future.

2. Looking at the code that implements `gethrtime()`, I think it is unlikely to be faster than this per-CPU PRNG implementation of `random_get_pseudo_bytes()`. It would be best to go with something fast now so that there is no point in revisiting this from a performance perspective.

3. `gethrtime() % 3` can vary in behavior from system to system based on kernel version, architecture and clock source. In comparison, this per-CPU PRNG is about 40 lines of code in `random_get_pseudo_bytes()` that should behave consistently across all systems regardless of kernel version, system architecture or machine clock source. It is unlikely that we would ever need to revisit this per-CPU PRNG while the same cannot be said for `gethrtime() % 3`.

4. `gethrtime()` uses CPU memory barriers and maybe atomic instructions depending on the clock source, so replacing `random_get_pseudo_bytes()` with `gethrtime()` in hot code paths could still require a future person working on NUMA scalability to reimplement it anyway, while this per-CPU PRNG would not by virtue of using neither CPU memory barriers nor atomic instructions. Note that I did not check various clock sources for the presence of atomic instructions. There is simply too much code to read and, given the drawbacks versus this per-CPU PRNG, there is no point in being certain.

5. I have heard of instances where poor quality pseudo-random numbers caused problems for HPC code in ways that took more than a year to identify and were remedied by switching to a higher quality source of pseudo-random numbers. While filesystems are different than HPC code, I do not think it is impossible for us to have instances where poor quality pseudo-random numbers can cause problems. Opting for a well studied PRNG algorithm that passes tests for statistical randomness over changing callers to use `gethrtime() % 3` bypasses the need to think about both whether poor quality pseudo-random numbers can cause problems and the statistical quality of numbers from `gethrtime() % 3`.

6. `gethrtime()` calls `getrawmonotonic()`, which uses seqlocks. This is probably not a huge issue, but anyone using kgdb would never be able to step through a seqlock critical section, which is not a problem either now or with the per-CPU PRNG:

https://en.wikipedia.org/wiki/Seqlock

The only downside that I can see is that this code's memory requirement is O(N) where N is NR_CPUS, versus the current code and `gethrtime() % 3`, which are O(1), but that should not be a problem. The seeds will use 64KB of memory at the high end (i.e. `NR_CPUS == 4096`) and 16 bytes of memory at the low end (i.e. `NR_CPUS == 1`). In either case, we should only use a few hundred bytes of code for text, especially since `spl_rand_jump()` should be inlined into `spl_random_init()`, which should be removed during early boot as part of "Freeing unused kernel memory". In either case, the memory requirements are minuscule.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes #372

2016-02-17 09:49:09 -08:00
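For readers unfamiliar with the generator family the commit message refers to, the following is a minimal userspace sketch of a xorshift128+ step and a byte-filling loop built on it. It is not the code from this commit: the names `prng_state_t`, `prng_next()` and `prng_fill()` are hypothetical, the shift constants 23, 18 and 5 are those of one published xorshift128+ variant and are an assumption here, and the kernel per-CPU storage and seeding via `spl_rand_jump()` are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical state type: 128 bits (16 bytes) of seed, matching the
 * per-CPU seed size quoted in the commit message.
 */
typedef struct {
	uint64_t s[2];
} prng_state_t;

/*
 * One xorshift128+ step. The shift constants (23, 18, 5) are an
 * assumption; the commit message does not state which variant it uses.
 */
static uint64_t
prng_next(prng_state_t *st)
{
	uint64_t s1 = st->s[0];
	const uint64_t s0 = st->s[1];

	st->s[0] = s0;
	s1 ^= s1 << 23;
	st->s[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);

	return (st->s[1] + s0);
}

/*
 * Fill buf with len pseudo-random bytes, consuming one 64-bit output
 * per 8 bytes (the final chunk may be shorter).
 */
static void
prng_fill(prng_state_t *st, uint8_t *buf, size_t len)
{
	/* An all-zero state is a fixed point of xorshift; reject it. */
	assert(st->s[0] != 0 || st->s[1] != 0);

	while (len > 0) {
		uint64_t r = prng_next(st);
		size_t n = len < sizeof (r) ? len : sizeof (r);

		memcpy(buf, &r, n);
		buf += n;
		len -= n;
	}
}
```

A caller would seed one `prng_state_t` with any non-zero value and then call `prng_fill(&state, buf, len)`. With one state per CPU, as the commit message describes, no lock or memory barrier is needed around the state update because no other CPU touches it. At 16 bytes per state, 4096 states occupy 4096 × 16 = 65536 bytes, which matches the 64KB figure above.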
..
fm	Kernel header installation should respect --prefix	2014-10-28 09:31:48 -07:00
fs	Kernel header installation should respect --prefix	2014-10-28 09:31:48 -07:00
sysevent	Kernel header installation should respect --prefix	2014-10-28 09:31:48 -07:00
Makefile.am	Additional dkio support for TRIM/Discard	2015-12-02 13:44:35 -08:00
acl.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
acl_impl.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
atomic.h	Remove atomic64_xchg() wrappers	2014-10-17 15:11:50 -07:00
attr.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
bitmap.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
bootconf.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
bootprops.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
buf.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
byteorder.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
callb.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
callo.h	Emulate illumos interface cv_timedwait_hires()	2013-11-04 09:49:24 -08:00
cmn_err.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
compress.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
condvar.h	Rename cv_wait_interruptible() to cv_wait_sig()	2015-06-10 16:36:12 -07:00
conf.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
console.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
cpupart.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
cpuvar.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
crc32.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
cred.h	Remove credential configure checks.	2014-10-17 15:11:51 -07:00
ctype.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
ddi.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
debug.h	Add IMPLY() and EQUIV() macros	2015-06-24 14:44:47 -07:00
dirent.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
disp.h	Add kpreempt() compatibility macro	2013-10-09 13:52:55 -07:00
dkio.h	Additional dkio support for TRIM/Discard	2015-12-02 13:44:35 -08:00
dkioc_free_util.h	Additional dkio support for TRIM/Discard	2015-12-02 13:44:35 -08:00
dklabel.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
dnlc.h	Remove shrink_{i,d}node_cache() wrappers	2014-10-17 15:11:51 -07:00
dumphdr.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
efi_partition.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
errno.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
extdirent.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
fcntl.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
file.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
idmap.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
int_limits.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
int_types.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
inttypes.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
isa_defs.h	_ILP32 is always defined on SPARC	2016-01-08 11:59:38 -08:00
kidmap.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
kmem.h	Turn on both PF_FSTRANS and PF_MEMALLOC_NOIO in spl_fstrans_mark	2016-01-20 11:38:31 -08:00
kmem_cache.h	Fix CPU hotplug	2015-10-13 09:50:40 -07:00
kobj.h	kobj_read_file: Return -1 on vn_rdwr() error	2016-01-23 10:10:44 -08:00
kstat.h	3537 add kstat_waitq_enter and friends	2013-10-25 13:41:52 -07:00
list.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
mkdev.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
mntent.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
modctl.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
mode.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
mount.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
mutex.h	Add new lock types MUTEX_NOLOCKDEP, and RW_NOLOCKDEP	2015-12-11 16:18:54 -08:00
note.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
open.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
param.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
pathname.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
policy.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
pool.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
priv_impl.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
proc.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
processor.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
pset.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
random.h	random_get_pseudo_bytes() need not provide cryptographic strength entropy	2016-02-17 09:49:09 -08:00
refstr.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
resource.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
rwlock.h	Add new lock types MUTEX_NOLOCKDEP, and RW_NOLOCKDEP	2015-12-11 16:18:54 -08:00
sdt.h	Define SET_ERROR()	2013-10-09 14:20:46 -07:00
sid.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
signal.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
stat.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
stropts.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
sunddi.h	Update code to use misc_register()/misc_deregister()	2014-10-17 15:07:28 -07:00
sunldi.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
sysdc.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
sysevent.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
sysmacros.h	sysmacros: Make P2ROUNDUP not trigger int overflow	2015-11-13 15:21:52 -08:00
systeminfo.h	Simplify hostid logic	2014-04-14 09:04:41 -07:00
systm.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
t_lock.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
taskq.h	Allow kicking a taskq to spawn more threads	2016-02-05 14:08:31 -08:00
thread.h	De-inline spl_kthread_create().	2014-04-09 19:17:12 -07:00
time.h	Fix build issue on some configured kernels	2015-12-11 15:27:53 -08:00
timer.h	Add ddi_time_after and friends	2014-04-14 09:32:01 -07:00
tsd.h	Use tsd to store tq for taskq_member	2016-01-20 13:07:45 -08:00
types.h	Linux 4.5 compat: pfn_t typedef	2016-01-20 11:39:18 -08:00
types32.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
u8_textprep.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
uio.h	Restructure uio to accommodate bio_vec	2015-08-24 10:10:21 -07:00
unistd.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
user.h	Fix race between getf() and areleasef()	2015-12-03 15:44:47 -08:00
va_list.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
varargs.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
vfs.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
vfs_opreg.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
vmem.h	Use spl_fstrans_mark instead of memalloc_noio_save	2015-12-18 13:24:52 -08:00
vmsystm.h	Include other sources of freeable memory in the freemem calculation	2015-08-19 09:25:30 -07:00
vnode.h	Implement areleasef()	2015-04-24 13:02:37 -07:00
zmod.h	Refresh links to web site	2013-03-04 19:09:34 -08:00
zone.h	Add crgetzoneid() stub	2015-04-02 09:49:55 -07:00