These changes are now handled by the fix-stack-traverse_impl topic
branch, which not only solves the uninitialized variable problem but
also moves these locals off the stack and onto the heap.
Move the dsl_dataset_t local variable from the stack to the heap.
This reduces the stack usage of this function from 2048 bytes
to 176 bytes on x86_64.
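The pattern is sketched below in user space. The stack keeps only a pointer, and the large object is heap allocated and freed on every exit path. big_obj_t and sketch_name_length() are hypothetical stand-ins, not the real dsl_dataset_t code. In the kernel the calls would be kmem_alloc()/kmem_free(), and malloc()/free() are used here only so the sketch runs anywhere.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical large object, about 2 KiB on LP64 (stand-in only). */
typedef struct big_obj {
	char	bo_name[1024];
	long	bo_fields[128];
} big_obj_t;

/*
 * Before: "big_obj_t tmp;" consumed ~2 KiB of stack.
 * After: only the pointer is on the stack.
 * Returns the stored name length, or -1 if the allocation fails.
 */
int
sketch_name_length(const char *name)
{
	big_obj_t *tmp = malloc(sizeof (big_obj_t));
	int len;

	if (tmp == NULL)
		return (-1);

	strncpy(tmp->bo_name, name, sizeof (tmp->bo_name) - 1);
	tmp->bo_name[sizeof (tmp->bo_name) - 1] = '\0';
	len = (int)strlen(tmp->bo_name);

	free(tmp);	/* every path after the allocation must free it */
	return (len);
}
```

The cost is an allocation per call, and the function gains an out-of-memory path it did not have before.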
Much to my surprise, bcopy() under Linux appears to copy the data in
word-sized chunks. It does the right thing, but if your buffer is not
a multiple of the word size it reads past the end of the buffer; at
least, that is what valgrind is reporting. We should be using memcpy()
on Linux anyway, so replace bcopy() with memcpy() to resolve the
issue.
==305== Thread 211:
==305== Invalid read of size 8
==305== at 0x3BCD28357D: _wordcopy_fwd_dest_aligned (in /lib64/libc-2.11.1.so)
==305== by 0x3BCD282B05: bcopy (in /lib64/libc-2.11.1.so)
==305== by 0x58D7FEF: dmu_write (dmu.c:730)
==305== by 0x591C942: spa_history_write (spa_history.c:165)
==305== by 0x591D255: spa_history_log_sync (spa_history.c:277)
==305== by 0x591D545: log_internal (spa_history.c:450)
==305== by 0x591D5EC: spa_history_log_internal (spa_history.c:475)
==305== by 0x5902319: dsl_prop_set_sync (dsl_prop.c:707)
==305== by 0x5906A7D: dsl_sync_task_group_sync (dsl_synctask.c:199)
==305== by 0x58FF4EC: dsl_pool_sync (dsl_pool.c:376)
==305== by 0x591744C: spa_sync (spa.c:5365)
==305== by 0x5922C85: txg_sync_thread (txg.c:414)
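When making this conversion, note that the argument order differs. bcopy() takes (src, dst, len) and memcpy() takes (dst, src, len). The sketch below uses a hypothetical shim, compat_bcopy(), that swaps the arguments. One assumption to check at each call site: bcopy() tolerates overlapping buffers and memcpy() does not, so any overlapping copy needs memmove() instead.

```c
#include <string.h>

/*
 * Hypothetical shim, not a ZFS macro: same argument order as bcopy(),
 * implemented with memcpy(). Only valid for non-overlapping buffers.
 */
#define	compat_bcopy(src, dst, len)	((void) memcpy((dst), (src), (len)))

void
sketch_copy_record(char *dst, const char *src, size_t len)
{
	compat_bcopy(src, dst, len);
}
```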
On a Linux system, simply use the native asprintf() and vasprintf()
functions respectively. Also update the call points to correctly
use va_copy() or va_start() as appropriate.
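The sketch below shows why the distinction matters. Any routine that walks a va_list twice, once to measure the output and once to format it, must do the first pass on a va_copy(). The variadic front end brackets the list with va_start()/va_end(). The sketch_* names are hypothetical. The sketch uses plain C99 vsnprintf() rather than glibc's vasprintf(), so it builds without _GNU_SOURCE.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a newly malloc()ed string. The va_list is consumed twice,
 * so the measuring pass works on a copy; reusing 'ap' after vsnprintf()
 * has walked it would be undefined behavior.
 */
char *
sketch_vasprintf(const char *fmt, va_list ap)
{
	va_list ap2;
	char *buf;
	int len;

	va_copy(ap2, ap);
	len = vsnprintf(NULL, 0, fmt, ap2);	/* measure only */
	va_end(ap2);
	if (len < 0)
		return (NULL);

	buf = malloc((size_t)len + 1);
	if (buf != NULL)
		(void) vsnprintf(buf, (size_t)len + 1, fmt, ap);
	return (buf);
}

/* Variadic front end: va_start()/va_end() bound the list's lifetime. */
char *
sketch_asprintf(const char *fmt, ...)
{
	va_list ap;
	char *buf;

	va_start(ap, fmt);
	buf = sketch_vasprintf(fmt, ap);
	va_end(ap);
	return (buf);
}
```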
This may not be strictly needed, but it does keep gcc happy. We
should keep an eye on this, though, in case the extra bcopy()
significantly impacts performance. It may.
The following are 3 cases where more than 2 pages are allocated
with kmem_alloc()... but not a lot more. For now we just disable
the warning with KM_NODEBUG; this can be revisited later to see
if it's worth shrinking the allocation or perhaps moving it to a
slab.
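Below is a user-space sketch of the general mechanism: an allocator that complains about requests larger than two pages unless the caller passes an opt-out flag. The flag values, the 4096-byte page size, and the sketch_* names are all assumptions for illustration. They are not the SPL's real KM_NODEBUG definition or its warning code.

```c
#include <stdio.h>
#include <stdlib.h>

#define	SKETCH_PAGE_SIZE	4096	/* assumed page size */
#define	SKETCH_KM_SLEEP		0x0	/* hypothetical flag values, */
#define	SKETCH_KM_NODEBUG	0x1	/* not the SPL's real ones */

/* Returns 1 if an allocation of 'size' bytes should be warned about. */
int
sketch_would_warn(size_t size, int flags)
{
	return (size > 2 * SKETCH_PAGE_SIZE &&
	    !(flags & SKETCH_KM_NODEBUG));
}

void *
sketch_kmem_alloc(size_t size, int flags)
{
	if (sketch_would_warn(size, flags))
		fprintf(stderr, "large kmem_alloc(%zu)\n", size);
	return (malloc(size));
}
```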
The following cleanup was missed in the first pass when the ZVOL
implementation was updated. An extra instance of a zvol_state_t
was removed from the stack and the error handling was simplified.
There are cases under Linux where it is not safe to sleep in
taskq_dispatch(). Rather than adding Linux-specific code to
detect these cases, I opted to keep it simple and never allow
a sleep here. The impact of this should be minimal.
I missed an instance of removing the & operator when reducing the
stack usage in this function. Unfortunately this doesn't cause a
compile warning, but it does cause ztest failures. Anyway, update
the topic branch to correct this mistake.
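The missing warning follows from how the stray '&' reaches a void * parameter. After "obj_t o;" becomes "obj_t *p", a leftover "&p" has type obj_t ** and still converts silently to void *. A typed parameter (obj_t *) would make gcc complain about incompatible pointer types. The names below are hypothetical and only illustrate the hazard.

```c
#include <stdlib.h>
#include <string.h>

typedef struct obj { char o_name[64]; } obj_t;	/* hypothetical */

/* void * accepts any object pointer, including an accidental obj_t **. */
static void
zero_untyped(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/* A typed parameter: passing &p (obj_t **) here draws a gcc warning. */
static void
zero_obj(obj_t *o)
{
	memset(o, 0, sizeof (*o));
}

obj_t *
sketch_make_obj(void)
{
	obj_t *p = malloc(sizeof (obj_t));	/* was: obj_t o; */

	if (p == NULL)
		return (NULL);
	/*
	 * Was zero_untyped(&o, sizeof (o)). Keeping the '&' after the
	 * conversion, zero_untyped(&p, sizeof (obj_t)), compiles cleanly
	 * but scribbles over the pointer variable and nearby stack.
	 */
	zero_untyped(p, sizeof (*p));
	return (p);
}

void
sketch_reset_obj(obj_t *p)
{
	zero_obj(p);	/* zero_obj(&p) would not compile silently */
}
```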
Certain functions must never be automatically inlined by gcc because
they are stack-heavy or called recursively. This patch flags all
such functions I have found as 'noinline' to prevent gcc from making
the optimization.
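In the kernel, the noinline macro expands to gcc's noinline function attribute. The sketch below applies that attribute directly so it builds in user space. The recursive function and its 512-byte buffer are hypothetical, sized only to make the frame noticeable. If gcc inlined such a function into its caller, the caller's frame would carry the buffer as well. gcc's -Wframe-larger-than= option is one way to spot frames like this.

```c
#include <string.h>

/*
 * Recursive and stack-heavy: kept out of line so its 512-byte frame is
 * not folded into callers. The buffer size is illustrative only.
 */
static __attribute__((noinline)) int
sum_digits_recursive(unsigned n)
{
	char scratch[512];	/* deliberately large local */

	memset(scratch, 0, sizeof (scratch));
	scratch[0] = (char)('0' + n % 10);
	if (n < 10)
		return (scratch[0] - '0');
	return ((scratch[0] - '0') + sum_digits_recursive(n / 10));
}

int
sketch_digit_sum(unsigned n)
{
	return (sum_digits_recursive(n));
}
```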
Reduce stack usage in dsl_deleg_get(); gcc flagged it as consuming a
whopping 1040 bytes, or roughly 1/4 of a 4K stack. This patch moves
all the large structures and buffers off the stack and onto the
heap. This includes 2 zap_cursor_t structs of 52 bytes each, 2
zap_attribute_t structs of 280 bytes each, and one 256-byte char
array. The total stack savings is 880 bytes after accounting for
the 5 new pointers added.
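The 880-byte figure works out if pointers are 8 bytes wide, as on x86_64:

```latex
\begin{aligned}
\text{moved off the stack} &= 2 \times 52 + 2 \times 280 + 256 = 920 \text{ bytes} \\
\text{new pointers}        &= 5 \times 8 = 40 \text{ bytes} \\
\text{net savings}         &= 920 - 40 = 880 \text{ bytes}
\end{aligned}
```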
Also, the source buffer length has been increased from MAXNAMELEN
to MAXNAMELEN+strlen(MOS_DIR_NAME)+1, as described by the comment in
dsl_dir_name(). A buffer overrun may have been possible with the
slightly smaller buffer.
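One way to express that size as a compile-time constant relies on a C detail: sizeof applied to a string literal counts the terminating NUL. MAXNAMELEN + sizeof (MOS_DIR_NAME) therefore equals MAXNAMELEN + strlen(MOS_DIR_NAME) + 1. The values 256 and "$MOS" below are assumptions mirroring ZFS. The "parent/$MOS" name layout and the sketch_* names are hypothetical and are used only to show a truncation-checked fill.

```c
#include <stdio.h>
#include <string.h>

#define	SKETCH_MAXNAMELEN	256	/* assumed to match MAXNAMELEN */
#define	SKETCH_MOS_DIR_NAME	"$MOS"	/* assumed to match MOS_DIR_NAME */

/* sizeof of a literal includes its NUL: 256 + 4 + 1 == 261. */
#define	SKETCH_DIR_NAME_BUFLEN	\
	(SKETCH_MAXNAMELEN + sizeof (SKETCH_MOS_DIR_NAME))

/* Hypothetical layout; returns 0 on success, -1 if it would truncate. */
int
sketch_mos_name(char *buf, size_t buflen, const char *parent)
{
	int n = snprintf(buf, buflen, "%s/%s", parent,
	    SKETCH_MOS_DIR_NAME);

	return ((n >= 0 && (size_t)n < buflen) ? 0 : -1);
}
```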
The upstream ZFS code has rightly moved to a faster native sha2
implementation. Unfortunately, under Linux that's going to be a little
problematic, so we revert to the more portable version contained in
earlier ZFS releases. Using the native sha2 implementation in Linux
is possible, but the API differs slightly between kernel and user
space depending on which libraries are used. Ideally, we need a fast
implementation of SHA256 which builds as part of ZFS; this shouldn't
be that hard to do, but it will take some effort.
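For reference, the sketch below is a compact, portable one-shot SHA-256 written directly from the FIPS 180-4 description. It is an illustration of what a self-contained implementation involves and is not the code from earlier ZFS releases. It makes no attempt to be fast: there is no unrolling, no incremental update API, and no CPU-specific paths. The sketch_sha256() name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FIPS 180-4 round constants. */
static const uint32_t sha256_k[64] = {
	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

#define	ROTR32(x, n)	(((x) >> (n)) | ((x) << (32 - (n))))

/* Compress one 64-byte block into the running state H. */
static void
sha256_block(uint32_t H[8], const uint8_t *blk)
{
	uint32_t W[64], a, b, c, d, e, f, g, h, t1, t2;
	int t;

	for (t = 0; t < 16; t++)	/* big-endian message words */
		W[t] = (uint32_t)blk[4 * t] << 24 | (uint32_t)blk[4 * t + 1] << 16 |
		    (uint32_t)blk[4 * t + 2] << 8 | (uint32_t)blk[4 * t + 3];
	for (t = 16; t < 64; t++)	/* message schedule */
		W[t] = (ROTR32(W[t - 2], 17) ^ ROTR32(W[t - 2], 19) ^ (W[t - 2] >> 10)) +
		    W[t - 7] +
		    (ROTR32(W[t - 15], 7) ^ ROTR32(W[t - 15], 18) ^ (W[t - 15] >> 3)) +
		    W[t - 16];

	a = H[0]; b = H[1]; c = H[2]; d = H[3];
	e = H[4]; f = H[5]; g = H[6]; h = H[7];
	for (t = 0; t < 64; t++) {
		t1 = h + (ROTR32(e, 6) ^ ROTR32(e, 11) ^ ROTR32(e, 25)) +
		    ((e & f) ^ (~e & g)) + sha256_k[t] + W[t];
		t2 = (ROTR32(a, 2) ^ ROTR32(a, 13) ^ ROTR32(a, 22)) +
		    ((a & b) ^ (a & c) ^ (b & c));
		h = g; g = f; f = e; e = d + t1;
		d = c; c = b; b = a; a = t1 + t2;
	}
	H[0] += a; H[1] += b; H[2] += c; H[3] += d;
	H[4] += e; H[5] += f; H[6] += g; H[7] += h;
}

/* One-shot SHA-256 of 'len' bytes at 'data' into 'digest'. */
void
sketch_sha256(const void *data, size_t len, uint8_t digest[32])
{
	uint32_t H[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
	    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
	const uint8_t *p = data;
	uint8_t tail[64];
	size_t rem = len;
	uint64_t bits = (uint64_t)len * 8;
	int i;

	for (; rem >= 64; p += 64, rem -= 64)
		sha256_block(H, p);

	/* Pad: 0x80, zeros, then the 64-bit big-endian bit length. */
	memset(tail, 0, sizeof (tail));
	memcpy(tail, p, rem);
	tail[rem] = 0x80;
	if (rem >= 56) {	/* no room for the length; use an extra block */
		sha256_block(H, tail);
		memset(tail, 0, sizeof (tail));
	}
	for (i = 0; i < 8; i++)
		tail[63 - i] = (uint8_t)(bits >> (8 * i));
	sha256_block(H, tail);

	for (i = 0; i < 8; i++) {	/* serialize state big-endian */
		digest[4 * i] = (uint8_t)(H[i] >> 24);
		digest[4 * i + 1] = (uint8_t)(H[i] >> 16);
		digest[4 * i + 2] = (uint8_t)(H[i] >> 8);
		digest[4 * i + 3] = (uint8_t)H[i];
	}
}
```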
Reduce kernel stack usage by lzjb_compress() by moving the uint16
array off the stack and onto the heap. I have not measured the exact
performance implications of this, but we absolutely need to keep
stack usage to a minimum. If/when this becomes an issue, we'll
optimize.
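One detail to preserve when moving such an array is its zero initialization. A "= { 0 }" initializer on a stack array becomes a zeroing allocation: calloc() here, and something like kmem_zalloc() in the kernel. The sketch below uses a hypothetical 1024-entry uint16_t table (about 2 KiB). It stores "position + 1" so that a zero entry means empty, in the spirit of a compressor's match table. The toy hash is not LZJB's.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	SKETCH_TABLE_SIZE	1024	/* assumed; 2 KiB of uint16_t */

/*
 * Count positions whose next 3 bytes repeat an earlier 3-byte sequence.
 * Returns (size_t)-1 if the table cannot be allocated. Input is capped
 * at UINT16_MAX bytes because table entries are 16-bit offsets.
 */
size_t
sketch_count_repeats(const uint8_t *src, size_t len)
{
	/* Was: uint16_t table[SKETCH_TABLE_SIZE] = { 0 }; on the stack. */
	uint16_t *table = calloc(SKETCH_TABLE_SIZE, sizeof (uint16_t));
	size_t i, hits = 0;

	if (table == NULL)
		return ((size_t)-1);
	if (len > UINT16_MAX)
		len = UINT16_MAX;

	for (i = 0; i + 3 <= len; i++) {
		uint32_t h = ((uint32_t)src[i] << 16 |
		    (uint32_t)src[i + 1] << 8 | src[i + 2]) % SKETCH_TABLE_SIZE;

		/* Entries hold position + 1; 0 (from calloc) means empty. */
		if (table[h] != 0 &&
		    memcmp(&src[table[h] - 1], &src[i], 3) == 0)
			hits++;
		table[h] = (uint16_t)(i + 1);
	}

	free(table);
	return (hits);
}
```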
Move the xuio stat structures from a header to the dmu.c source, as
is done with all the other kstat interfaces. This information is
local to dmu.c, which registers the xuio kstat, and should stay that
way.
If you're only going to allow one allocator to be used and it is
selected at compile time, there is no point including the others in
the build. This patch could/should be refined for Linux to make the
metaslab allocator configurable at run time. That might be a bit
tricky, however, since you would need to quiesce all IO. Short of
that, making it configurable as a module load option would be a
reasonable compromise.
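A sketch of build-time selection, where only the chosen allocator is compiled at all. The policy names, the SKETCH_ALLOC_POLICY macro, and the function bodies are hypothetical placeholders, not the metaslab code. The policy would be chosen with something like -DSKETCH_ALLOC_POLICY=SKETCH_POLICY_FIRST_FIT.

```c
/* Hypothetical policy identifiers; pick one at build time. */
#define	SKETCH_POLICY_FIRST_FIT		1
#define	SKETCH_POLICY_DYNAMIC_FIT	2

#ifndef SKETCH_ALLOC_POLICY
#define	SKETCH_ALLOC_POLICY	SKETCH_POLICY_DYNAMIC_FIT
#endif

/* Only one branch is compiled; the other allocator never enters the build. */
#if SKETCH_ALLOC_POLICY == SKETCH_POLICY_FIRST_FIT
const char *
sketch_allocator_name(void)
{
	return ("first-fit");
}
#elif SKETCH_ALLOC_POLICY == SKETCH_POLICY_DYNAMIC_FIT
const char *
sketch_allocator_name(void)
{
	return ("dynamic-fit");
}
#else
#error "unknown SKETCH_ALLOC_POLICY"
#endif
```

A run-time or module-option variant would instead keep every allocator compiled in and select through a pointer, which is exactly the cost this change avoids.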
In the Linux kernel, 'current' is a macro referring to the current
process, so it can never be used as a local variable name in a
function. Simply replace all usage of 'current' with 'curr' in this
function.
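The conflict comes from the preprocessor: every later occurrence of the identifier is rewritten, including a local declaration. The user-space imitation below uses hypothetical types and a sketch_get_current() accessor. It only mimics the kernel, where 'current' typically expands to get_current().

```c
/* User-space imitation of the kernel convention (hypothetical types). */
struct sketch_task { int pid; };
static struct sketch_task sketch_self = { 42 };

static struct sketch_task *
sketch_get_current(void)
{
	return (&sketch_self);
}
#define	current	sketch_get_current()

/*
 * Declaring "int current = 0;" here would become
 * "int sketch_get_current() = 0;" and fail to compile, hence 'curr'.
 */
int
sketch_sum(const int *vals, int n)
{
	int curr = 0;
	int i;

	for (i = 0; i < n; i++)
		curr += vals[i];
	return (curr);
}

int
sketch_self_pid(void)
{
	return (current->pid);	/* the intended use of the macro */
}
```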
Updated fix to detect if we are in an interrupt and only sleep if it
is safe to do so. I guess it must be safe to sleep under Solaris;
this must be handled in a soft interrupt handler there.
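Below is a sketch of the decision: a sleeping request is demoted to a non-sleeping one when the caller is in interrupt context. The flag values are hypothetical stand-ins for TQ_SLEEP/TQ_NOSLEEP. In the kernel the context test would be something like in_interrupt(), but here it is passed in as a parameter so the sketch runs in user space.

```c
/* Hypothetical flag values standing in for TQ_SLEEP / TQ_NOSLEEP. */
#define	SKETCH_TQ_SLEEP		0x0
#define	SKETCH_TQ_NOSLEEP	0x1

/*
 * Return the flags to actually use: unchanged in process context,
 * forced to no-sleep when called from interrupt context.
 */
int
sketch_dispatch_flags(int requested, int in_irq_context)
{
	if (in_irq_context)
		return (requested | SKETCH_TQ_NOSLEEP);
	return (requested);
}
```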
Additional minor memory-related tweak to move certain large allocations
to virtual memory and, in one case, to simply suppress the warning
message since the allocation is not that far over the warning limit.
Upstream modified the ioctl code, so we need to make similar updates,
since we modify the API ourselves to always pass a file pointer
around. This allows us to track per-file-handle state, which is used
by the zevent code.
The ZVOL interfaces changed significantly with the latest update. I've
updated the Linux version of the code to handle this, and it looks like
the net result is a simpler implementation, which is good! Plus,
I'm relatively sure the ZIL integration is right this time, although it
needs some serious crash testing to verify that.
Also minor additions to vdev_disk for the .hold and .rele callbacks.
Currently they do nothing, and I may be able to simply stub them out
with NULLs for Linux, since opening the device in Linux should have
much the same effect. More investigation is needed, though, since
the ZFS interface may make some demands here that I'm overlooking.
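Stubbing callbacks out with NULL only works if every caller treats a NULL entry as "nothing to do". The sketch below shows that contract with a hypothetical ops vector. The struct, field, and function names are illustrative and are not the real vdev_ops_t layout.

```c
#include <stddef.h>

/* Hypothetical ops vector; NULL entries mean "nothing to do". */
typedef struct sketch_vdev_ops {
	int	(*op_open)(void *vd);
	void	(*op_hold)(void *vd);
	void	(*op_rele)(void *vd);
} sketch_vdev_ops_t;

static int
sketch_disk_open(void *vd)
{
	(void) vd;
	return (0);
}

/* Linux disk ops with .hold/.rele stubbed out as NULL. */
const sketch_vdev_ops_t sketch_disk_ops = {
	.op_open = sketch_disk_open,
	.op_hold = NULL,
	.op_rele = NULL,
};

/* Callers tolerate a missing callback; returns 1 if one was invoked. */
int
sketch_vdev_hold(const sketch_vdev_ops_t *ops, void *vd)
{
	if (ops->op_hold == NULL)
		return (0);
	ops->op_hold(vd);
	return (1);
}
```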
After such a large update, many of the symbols which were previously
exported are no longer available, and several new symbols have been
added and are needed. Refresh the topic branch to reflect this.