zfs/config/toolchain-simd.m4

362 lines
8.1 KiB
Plaintext
Raw Normal View History

Support for vectorized algorithms on x86 This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms. For the compilation phase, configure step checks if toolchain supports relevant instruction sets. Each implementation must ensure that the code is not passed to compiler if relevant instruction set is not supported. For this purpose, following new defines are provided if instruction set is supported: - HAVE_SSE, - HAVE_SSE2, - HAVE_SSE3, - HAVE_SSSE3, - HAVE_SSE4_1, - HAVE_SSE4_2, - HAVE_AVX, - HAVE_AVX2. For detecting if an instruction set can be used in runtime, following functions are provided in (include/linux/simd_x86.h): - zfs_sse_available() - zfs_sse2_available() - zfs_sse3_available() - zfs_ssse3_available() - zfs_sse4_1_available() - zfs_sse4_2_available() - zfs_avx_available() - zfs_avx2_available() - zfs_bmi1_available() - zfs_bmi2_available() These function should be called once, on module load, or initialization. They are safe to use from user and kernel space. If an implementation is using more than single instruction set, both compiler and runtime support for all relevant instruction sets should be checked. Kernel fpu methods: - kfpu_begin() - kfpu_end() Use __get_cpuid_max and __cpuid_count from <cpuid.h> Both gcc and clang have support for these. They also handle ebx register in case it is used for PIC code. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4381
2016-02-29 18:42:27 +00:00
dnl #
dnl # Checks if host toolchain supports SIMD instructions
dnl #
AC_DEFUN([ZFS_AC_CONFIG_ALWAYS_TOOLCHAIN_SIMD], [
case "$host_cpu" in
x86_64 | x86 | i686)
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512F
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512CD
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512DQ
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512BW
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512IFMA
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VBMI
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512PF
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512ER
ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VL
Support for vectorized algorithms on x86 This is initial support for x86 vectorized implementations of ZFS parity and checksum algorithms. For the compilation phase, configure step checks if toolchain supports relevant instruction sets. Each implementation must ensure that the code is not passed to compiler if relevant instruction set is not supported. For this purpose, following new defines are provided if instruction set is supported: - HAVE_SSE, - HAVE_SSE2, - HAVE_SSE3, - HAVE_SSSE3, - HAVE_SSE4_1, - HAVE_SSE4_2, - HAVE_AVX, - HAVE_AVX2. For detecting if an instruction set can be used in runtime, following functions are provided in (include/linux/simd_x86.h): - zfs_sse_available() - zfs_sse2_available() - zfs_sse3_available() - zfs_ssse3_available() - zfs_sse4_1_available() - zfs_sse4_2_available() - zfs_avx_available() - zfs_avx2_available() - zfs_bmi1_available() - zfs_bmi2_available() These function should be called once, on module load, or initialization. They are safe to use from user and kernel space. If an implementation is using more than single instruction set, both compiler and runtime support for all relevant instruction sets should be checked. Kernel fpu methods: - kfpu_begin() - kfpu_end() Use __get_cpuid_max and __cpuid_count from <cpuid.h> Both gcc and clang have support for these. They also handle ebx register in case it is used for PIC code. Signed-off-by: Gvozden Neskovic <neskovic@gmail.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Chunwei Chen <tuxoko@gmail.com> Closes #4381
2016-02-29 18:42:27 +00:00
;;
esac
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE], [
AC_MSG_CHECKING([whether host toolchain supports SSE])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("xorps %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE], 1, [Define if host toolchain supports SSE])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE2], [
AC_MSG_CHECKING([whether host toolchain supports SSE2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pxor %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE2], 1, [Define if host toolchain supports SSE2])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
char v[16];
__asm__ __volatile__("lddqu %0,%%xmm0" :: "m"(v[0]));
}
]])], [
AC_DEFINE([HAVE_SSE3], 1, [Define if host toolchain supports SSE3])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSSE3], [
AC_MSG_CHECKING([whether host toolchain supports SSSE3])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pshufb %xmm0,%xmm1");
}
]])], [
AC_DEFINE([HAVE_SSSE3], 1, [Define if host toolchain supports SSSE3])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_1], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.1])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pmaxsb %xmm0,%xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE4_1], 1, [Define if host toolchain supports SSE4.1])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_SSE4_2], [
AC_MSG_CHECKING([whether host toolchain supports SSE4.2])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
__asm__ __volatile__("pcmpgtq %xmm0, %xmm1");
}
]])], [
AC_DEFINE([HAVE_SSE4_2], 1, [Define if host toolchain supports SSE4.2])
AC_MSG_RESULT([yes])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX], [
AC_MSG_CHECKING([whether host toolchain supports AVX])
AC_LINK_IFELSE([AC_LANG_SOURCE([[
void main()
{
char v[32];
__asm__ __volatile__("vmovdqa %0,%%ymm0" :: "m"(v[0]));
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX], 1, [Define if host toolchain supports AVX])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX2], [
AC_MSG_CHECKING([whether host toolchain supports AVX2])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpshufb %ymm0,%ymm1,%ymm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX2], 1, [Define if host toolchain supports AVX2])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512F
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512F], [
AC_MSG_CHECKING([whether host toolchain supports AVX512F])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpandd %zmm0,%zmm1,%zmm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512F], 1, [Define if host toolchain supports AVX512F])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512CD
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512CD], [
AC_MSG_CHECKING([whether host toolchain supports AVX512CD])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vplzcntd %zmm0,%zmm1");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512CD], 1, [Define if host toolchain supports AVX512CD])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512DQ
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512DQ], [
AC_MSG_CHECKING([whether host toolchain supports AVX512DQ])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vandpd %zmm0,%zmm1,%zmm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512DQ], 1, [Define if host toolchain supports AVX512DQ])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512BW
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512BW], [
AC_MSG_CHECKING([whether host toolchain supports AVX512BW])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpshufb %zmm0,%zmm1,%zmm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512BW], 1, [Define if host toolchain supports AVX512BW])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512IFMA
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512IFMA], [
AC_MSG_CHECKING([whether host toolchain supports AVX512IFMA])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpmadd52luq %zmm0,%zmm1,%zmm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512IFMA], 1, [Define if host toolchain supports AVX512IFMA])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VBMI
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VBMI], [
AC_MSG_CHECKING([whether host toolchain supports AVX512VBMI])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpermb %zmm0,%zmm1,%zmm2");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512VBMI], 1, [Define if host toolchain supports AVX512VBMI])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512PF
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512PF], [
AC_MSG_CHECKING([whether host toolchain supports AVX512PF])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vgatherpf0dps (%rsi,%zmm0,4){%k1}");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512PF], 1, [Define if host toolchain supports AVX512PF])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512ER
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512ER], [
AC_MSG_CHECKING([whether host toolchain supports AVX512ER])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vexp2pd %zmm0,%zmm1");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512ER], 1, [Define if host toolchain supports AVX512ER])
], [
AC_MSG_RESULT([no])
])
])
dnl #
dnl # ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VL
dnl #
AC_DEFUN([ZFS_AC_CONFIG_TOOLCHAIN_CAN_BUILD_AVX512VL], [
AC_MSG_CHECKING([whether host toolchain supports AVX512VL])
AC_LINK_IFELSE([AC_LANG_SOURCE([
[
void main()
{
__asm__ __volatile__("vpabsq %zmm0,%zmm1");
}
]])], [
AC_MSG_RESULT([yes])
AC_DEFINE([HAVE_AVX512VL], 1, [Define if host toolchain supports AVX512VL])
], [
AC_MSG_RESULT([no])
])
])