For all module/library functions ensure so stack frame exceeds 1024
bytes. Ideally this should be set lower to say 512 bytes but there
are still numerous functions which exceed even this limit. For now
this is set to 1024 to ensure we catch the worst offenders.
Additionally, set the limit for ztest to 1024 bytes since the idea
here is to catch stack issues in user space before we find them by
overrunning a kernel stack. This should also be reduced to 512
bytes as soon as all the trouble makes are fixed.
Finally, add -fstack-check to gcc build options when --enable-debug
is specified at configure time. This ensures that each page on the
stack will be touched and we will generate a segfault on stack
overflow.
Over time we can gradually fix the following functions:
536 zfs:dsl_deadlist_regenerate
536 zfs:dsl_load_sets
536 zfs:zil_parse
544 zfs:zfs_ioc_recv
552 zfs:dsl_deadlist_insert_bpobj
552 zfs:vdev_dtl_sync
584 zfs:copy_create_perms
608 zfs:ddt_class_contains
608 zfs:ddt_prefetch
608 zfs:__dprintf
616 zfs:ddt_lookup
648 zfs:dsl_scan_ddt
696 zfs:dsl_deadlist_merge
736 zfs:ddt_zap_walk
744 zfs:dsl_prop_get_all_impl
872 zfs:dnode_evict_dbufs
There was previous discussion of a race with joinable threads but to
be honest I can neither exactly remember the race, or recrease the
issue. I believe it may have had to do with pthread_create() returning
without having set kt->tid since this was done in the created thread.
If that was the race then I've 'fixed' it by ensuring the thread id
is set in the thread AND as the first pthread_create() argument. Why
this wasn't done originally I'm not sure, with luck Ricardo remembers.
Additionally, explicitly set a PAGESIZE guard frame at the end of the
stack to aid in detecting stack overflow. And add some conditional
logic to set STACK_SIZE correctly for Solaris.
Certain function must never be automatically inlined by gcc because
they are stack heavy or called recursively. This patch flags all
such functions I have found as 'noinline' to prevent gcc from making
the optimization.
This is a portability change which removes the dependence of the Solaris
thread library. All locations where Solaris thread API was used before
have been replaced with equivilant Solaris kernel style thread calls.
In user space the kernel style threading API is implemented in term of
the portable pthreads library. This includes all threads, mutexs,
condition variables, reader/writer locks, and taskqs.
While I would rather fix all the instances where something is shadowed
it complicates tracking the OpenSolaris code where they either don't
seem to care or have different conflicts. Anyway, this ends up being
more simply gratutous change than I care for.
In vn_open(), if fstat64() returned an error, the real errno
was being obscured by calling close().
Add error handling for both pwrite64() calls in vn_rdwr().
With this patch applied I get the following failure 100% of the time,
I'd prefer to debug it and keep moving forward but I do not have the
time right now so I'm reverting the patch to the version which worked.
Ricardo please fix.
(gdb) bt
0 ztest_dmu_write_parallel (za=0x2aaaac898960) at
../../cmd/ztest/ztest.c:2566
1 0x0000000000405a79 in ztest_thread (arg=<value optimized out>)
at ../../cmd/ztest/ztest.c:3862
2 0x00002b2e6a7a841d in zk_thread_helper (arg=<value optimized out>)
at ../../lib/libzpool/kernel.c:131
3 0x000000379be06367 in start_thread (arg=<value optimized out>)
at pthread_create.c:297
4 0x000000379b2d30ad in clone () from /lib64/libc.so.6
This resolves previous scalabily concerns about the cost of calling
curthread which previously required a list walk. The kthread address
is now tracked as thread specific data which can be quickly returned.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>