These changes are now taken care of by the fix-stack-traverse_impl
topic branch, which not only solves the uninit problem but also
moves these locals off the stack and onto the heap.
Move the dsl_dataset_t local variable from the stack to the heap.
This reduces the stack usage of this function from 2048 bytes
to 176 bytes for x86_64 arches.
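As a minimal sketch of the pattern, assuming the SPL kmem_alloc()/
kmem_free() interfaces and a hypothetical structure standing in for
dsl_dataset_t:

    #include <sys/kmem.h>       /* kmem_alloc()/kmem_free(), KM_SLEEP */

    /* Hypothetical structure standing in for dsl_dataset_t. */
    typedef struct big_state {
            char    bs_buf[1872];
    } big_state_t;

    /*
     * Instead of declaring the structure as a local, which alone would
     * cost ~1.9K of stack, allocate it from the heap so the frame only
     * holds a pointer.
     */
    int
    do_work(void)
    {
            big_state_t *bs;

            bs = kmem_alloc(sizeof (big_state_t), KM_SLEEP);
            bs->bs_buf[0] = '\0';       /* ... real work on *bs ... */
            kmem_free(bs, sizeof (big_state_t));

            return (0);
    }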
For all module/library functions, ensure no stack frame exceeds 1024
bytes. Ideally this should be set lower, to say 512 bytes, but there
are still numerous functions which exceed even this limit. For now
this is set to 1024 to ensure we catch the worst offenders.
Additionally, set the limit for ztest to 1024 bytes since the idea
here is to catch stack issues in user space before we find them by
overrunning a kernel stack. This should also be reduced to 512
bytes as soon as all the troublemakers are fixed.
Finally, add -fstack-check to gcc build options when --enable-debug
is specified at configure time. This ensures that each page on the
stack will be touched and we will generate a segfault on stack
overflow.
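For illustration, a hypothetical function like the one below would trip
the limit. I'm assuming the per-function check is enforced with gcc's
-Wframe-larger-than= option, since only the byte limit is stated above;
-fstack-check is the flag named for the debug build.

    #include <string.h>

    /*
     * Compiled with -Wframe-larger-than=1024 gcc warns that this frame
     * exceeds the limit; with -fstack-check gcc also emits code that
     * touches each stack page so an overflow faults immediately rather
     * than silently corrupting whatever lies beyond the stack.
     */
    static int
    checksum_block(const char *src, size_t len)
    {
            char scratch[2048];     /* this local alone blows the budget */
            size_t i;
            int sum = 0;

            if (len > sizeof (scratch))
                    len = sizeof (scratch);
            memcpy(scratch, src, len);

            for (i = 0; i < len; i++)
                    sum += scratch[i];

            return (sum);
    }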
Over time we can gradually fix the following functions:
536 zfs:dsl_deadlist_regenerate
536 zfs:dsl_load_sets
536 zfs:zil_parse
544 zfs:zfs_ioc_recv
552 zfs:dsl_deadlist_insert_bpobj
552 zfs:vdev_dtl_sync
584 zfs:copy_create_perms
608 zfs:ddt_class_contains
608 zfs:ddt_prefetch
608 zfs:__dprintf
616 zfs:ddt_lookup
648 zfs:dsl_scan_ddt
696 zfs:dsl_deadlist_merge
736 zfs:ddt_zap_walk
744 zfs:dsl_prop_get_all_impl
872 zfs:dnode_evict_dbufs
There are three fixes in this commit. First, update ztest_run() to store
the thread id and not the address of the kthread_t. This will be freed
on thread exit and is not safe to use. This is pretty close to how
things were done in the original ztest code before I got there.
Second, for extra paranoia update thread_exit() to return a special
TS_MAGIC value via pthread_exit(). This value is then verified in
pthread_join() to ensure the thread exited cleanly. This can be
done cleanly because the kthread doesn't provide a return code
mechanism we need to worry about.
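A compressed sketch of this: the kernel-style thread_exit() shim hands
a sentinel to pthread_exit(), the creator stores the plain thread id,
and pthread_join() verifies the sentinel. The TS_MAGIC value below is
made up for illustration.

    #include <pthread.h>
    #include <assert.h>

    #define TS_MAGIC        ((void *)0x42420101)    /* illustrative sentinel */

    /* Kernel-style thread_exit() shim: hand the sentinel to pthread_exit(). */
    static void
    thread_exit(void)
    {
            pthread_exit(TS_MAGIC);
    }

    static void *
    worker(void *arg)
    {
            (void) arg;
            /* ... thread body ... */
            thread_exit();
            return (NULL);                  /* not reached */
    }

    int
    main(void)
    {
            pthread_t tid;                  /* store the id, not a kthread_t * */
            void *ret;

            assert(pthread_create(&tid, NULL, worker, NULL) == 0);
            assert(pthread_join(tid, &ret) == 0);
            assert(ret == TS_MAGIC);        /* thread exited via thread_exit() */

            return (0);
    }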
Third, replace the ztest deadman thread with a signal handler. We
cannot use the previous approach because the correct behavior for
pthreads is to wait for all threads to exit before terminating the
process. Since the deadman thread won't call exit by design we
end up hanging in kernel_exit(). To avoid this we just set up a
SIGALRM signal handler and register a deadman alarm. IMHO this
is simpler and cleaner anyway.
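Roughly, the signal handler approach looks like the following sketch;
the handler name and timeout are illustrative, not the actual ztest
values.

    #include <signal.h>
    #include <unistd.h>

    /* Async-signal-safe deadman: complain on stderr and bail out. */
    static void
    deadman_alarm(int sig)
    {
            static const char msg[] = "deadman timer expired, aborting\n";

            (void) sig;
            (void) write(STDERR_FILENO, msg, sizeof (msg) - 1);
            _exit(1);
    }

    int
    main(void)
    {
            (void) signal(SIGALRM, deadman_alarm);
            (void) alarm(600);      /* fire if the run exceeds 10 minutes */

            /* ... run the test workload ... */

            (void) alarm(0);        /* cancel the deadman on clean completion */
            return (0);
    }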
Much to my surprise bcopy() under Linux appears to copy the data in
word-sized chunks. It does the right thing, but if your buffer is not
a multiple of the word size you will be reading past the end of your
buffer. Or at least that is what valgrind is reporting. We should
be using memcpy() anyway on Linux, so replace bcopy() with memcpy()
to resolve the issue.
==305== Thread 211:
==305== Invalid read of size 8
==305== at 0x3BCD28357D: _wordcopy_fwd_dest_aligned (in /lib64/libc-2.11.1.so)
==305== by 0x3BCD282B05: bcopy (in /lib64/libc-2.11.1.so)
==305== by 0x58D7FEF: dmu_write (dmu.c:730)
==305== by 0x591C942: spa_history_write (spa_history.c:165)
==305== by 0x591D255: spa_history_log_sync (spa_history.c:277)
==305== by 0x591D545: log_internal (spa_history.c:450)
==305== by 0x591D5EC: spa_history_log_internal (spa_history.c:475)
==305== by 0x5902319: dsl_prop_set_sync (dsl_prop.c:707)
==305== by 0x5906A7D: dsl_sync_task_group_sync (dsl_synctask.c:199)
==305== by 0x58FF4EC: dsl_pool_sync (dsl_pool.c:376)
==305== by 0x591744C: spa_sync (spa.c:5365)
==305== by 0x5922C85: txg_sync_thread (txg.c:414)
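For reference, the swap itself is mechanical (generic names below), but
note that the argument order is reversed between the two calls:

    #include <string.h>

    /* Generic illustration: bcopy(src, dst, n) becomes memcpy(dst, src, n). */
    static void
    copy_record(void *dst, const void *src, size_t len)
    {
            /* bcopy(src, dst, len);          old call */
            memcpy(dst, src, len);         /* note the swapped argument order */
    }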
Accidentally dropped the zeroing of this structure in the
gcc-missing-braces topic branch, which was causing a false positive
space leak in ztest. Ensure the structure is zeroed before use.
This may not strictly be needed but it does keep gcc happy. We
should keep an eye on this, though, in case the extra bcopy
significantly impacts performance. It may.
The following cleanup was missed in the first pass when the ZVOL
implementation was updated. An extra instance of a zvol_state_t
was removed from the stack and the error handling was simplified.
There was previous discussion of a race with joinable threads but to
be honest I can neither exactly remember the race nor recreate the
issue. I believe it may have had to do with pthread_create() returning
without having set kt->tid since this was done in the created thread.
If that was the race then I've 'fixed' it by ensuring the thread id
is set in the thread AND as the first pthread_create() argument. Why
this wasn't done originally I'm not sure; with luck Ricardo remembers.
Additionally, explicitly set a PAGESIZE guard frame at the end of the
stack to aid in detecting stack overflow. And add some conditional
logic to set STACK_SIZE correctly for Solaris.
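A sketch of both points, using a stripped-down kthread_t, assert()-based
error handling, and illustrative sizes; the real emulation layer
obviously carries more state.

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <assert.h>

    #define STACK_SIZE      (8192 * 1024)   /* illustrative value */

    typedef struct kthread {
            pthread_t       t_tid;
    } kthread_t;

    static void *
    thread_body(void *arg)
    {
            kthread_t *kt = arg;

            /* Belt and braces: record the id in the thread as well. */
            kt->t_tid = pthread_self();

            /* ... thread work ... */
            return (NULL);
    }

    kthread_t *
    thread_create_sketch(void)
    {
            kthread_t *kt = calloc(1, sizeof (kthread_t));
            pthread_attr_t attr;

            assert(kt != NULL);
            assert(pthread_attr_init(&attr) == 0);
            assert(pthread_attr_setstacksize(&attr, STACK_SIZE) == 0);

            /* One PAGESIZE guard region so an overflow faults immediately. */
            assert(pthread_attr_setguardsize(&attr,
                (size_t)sysconf(_SC_PAGESIZE)) == 0);

            /* The id is also stored here, before the caller ever sees kt. */
            assert(pthread_create(&kt->t_tid, &attr, thread_body, kt) == 0);
            assert(pthread_attr_destroy(&attr) == 0);

            return (kt);
    }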
There are cases where under Linux it is not safe to sleep in
taskq_dispatch(). Rather than adding Linux-specific code to
detect these cases I opted to keep it simple and just never
allow a sleep here. The impact of this should be minimal.
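For callers that must not lose the work, something like the following
sketch keeps things correct; it uses the Illumos/SPL taskq interfaces,
and the fallback helper itself is hypothetical.

    #include <sys/taskq.h>      /* taskq_dispatch(), TQ_NOSLEEP */

    /*
     * Since taskq_dispatch() will not block waiting for resources, a
     * caller that must not lose the work checks for a failed dispatch
     * (a return of 0) and runs the function in the current context.
     */
    static void
    dispatch_or_run(taskq_t *tq, task_func_t func, void *arg)
    {
            if (taskq_dispatch(tq, func, arg, TQ_NOSLEEP) == 0)
                    func(arg);  /* dispatch failed; run it synchronously */
    }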
I missed an instance of removing the & operator when reducing the
stack usage in this function. This unfortunately doesn't cause
a compile warning but it does cause ztest failures. Anyway,
update the topic branch to correct this mistake.
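The class of bug looks like this (hypothetical names); because the
callee takes a void pointer, gcc accepts both forms without complaint:

    typedef struct state { int s_val; } state_t;

    static void
    consume(void *sp)
    {
            ((state_t *)sp)->s_val = 42;
    }

    static void
    broken(state_t *s)          /* s now lives on the heap, not the stack */
    {
            consume(&s);        /* BUG: leftover '&' passes the pointer's address */
    }

    static void
    fixed(state_t *s)
    {
            consume(s);         /* correct */
    }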
Certain functions must never be automatically inlined by gcc because
they are stack heavy or called recursively. This patch flags all
such functions I have found as 'noinline' to prevent gcc from making
the optimization.
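A sketch of the annotation on a hypothetical recursive, stack-heavy
function; in the tree the attribute would typically sit behind a
noinline macro.

    #include <string.h>

    struct node {
            struct node     *n_left;
            struct node     *n_right;
            int             n_value;
    };

    /*
     * __attribute__((noinline)) stops gcc from inlining this recursive,
     * stack-heavy function into its callers, which would otherwise fold
     * its frame into theirs and defeat per-function frame accounting.
     */
    static int __attribute__((noinline))
    node_sum(const struct node *np)
    {
            char scratch[512];                      /* stack-heavy local */

            if (np == NULL)
                    return (0);

            memset(scratch, 0, sizeof (scratch));   /* ... real work ... */

            return (np->n_value + node_sum(np->n_left) +
                node_sum(np->n_right));
    }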