The fletcher_4_native() and fletcher_4_byteswap() functions may only
safely use the vectorized implementations when the buffer is 128-bit
aligned. This is because both the AVX2 and SSE implementations process
four 32-bit words per iteration. For unaligned buffers, fall back to
the scalar implementation, which processes a single 32-bit word at a
time.
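
The alignment check and fallback described above can be sketched as
follows. This is a minimal illustration, not the code from this change:
is_aligned_128(), fletcher4_sums_t, fletcher4_scalar(),
fletcher4_vector_standin() and fletcher4_dispatch() are hypothetical
names, and the "vector" path is a placeholder that reuses the scalar
loop instead of real SSE/AVX2 intrinsics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical accumulator type holding the four running sums. */
typedef struct {
	uint64_t a, b, c, d;
} fletcher4_sums_t;

/* Hypothetical helper: nonzero when buf is on a 16-byte (128-bit) boundary. */
static int
is_aligned_128(const void *buf)
{
	return (((uintptr_t)buf & 15) == 0);
}

/*
 * Scalar path: consumes one 32-bit word per iteration. memcpy() is used
 * for the load so that a buffer with any alignment is read safely.
 */
static void
fletcher4_scalar(const void *buf, size_t size, fletcher4_sums_t *out)
{
	const unsigned char *p = buf;
	uint64_t a = 0, b = 0, c = 0, d = 0;

	for (size_t i = 0; i + sizeof (uint32_t) <= size;
	    i += sizeof (uint32_t)) {
		uint32_t w;
		memcpy(&w, p + i, sizeof (w));
		a += w;
		b += a;
		c += b;
		d += c;
	}
	out->a = a; out->b = b; out->c = c; out->d = d;
}

/*
 * Placeholder for a vectorized path. A real SSE/AVX2 version would use
 * aligned vector loads, which is why the caller must check alignment.
 */
static void
fletcher4_vector_standin(const void *buf, size_t size, fletcher4_sums_t *out)
{
	fletcher4_scalar(buf, size, out);
}

/* Returns 1 if the vector path was taken, 0 if the scalar fallback ran. */
static int
fletcher4_dispatch(const void *buf, size_t size, fletcher4_sums_t *out)
{
	if (is_aligned_128(buf)) {
		fletcher4_vector_standin(buf, size, out);
		return (1);
	}
	fletcher4_scalar(buf, size, out);
	return (0);
}
```

In this sketch the checksum is the same on either path. Only the choice
of implementation depends on the buffer address.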

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Issue #4330