Linker error on _pywrap_tensorflow_internal.so when cross-compiling for AArch64

u91tlkcl posted on 2022-10-29 in Other
Follow (0) | Answers (7) | Views (323)

Issue type

Build/Install

Source

source

TensorFlow version

2.9.1

Custom code

No

OS platform and distribution

Linux

Mobile device

No response

Python version

3.8

Bazel version

5.2.0

GCC/compiler version

7.5.0

CUDA/cuDNN version

10.2/8.2.1

GPU model and memory

NVIDIA Xavier NX

Current behavior?

I'm cross-compiling TensorFlow v2.9.1 on my x86-64 machine for my aarch64 target device, a Xavier NX. My build machine and target device have matching GCC versions (7.5.0) and matching CUDA versions, all installed from NVIDIA JetPack.

Currently, the build fails while linking `_pywrap_tensorflow_internal.so` because some of the static libraries are in x86-64 format rather than aarch64; specifically, the static libraries in the `mlir_generated` directory, such as `libnext_after_gpu_f32_f32_kernel_generator.a` and `libelu_gpu_f16_f16_kernel_generator.a`.

I build with the `--distinct_host_configuration=true` bazel option, and I expect the build to compile those static libraries for the aarch64 target rather than for my x86-64 host.

Standalone code to reproduce the issue

bazel \
--output_base=$(@D)/build \
build --verbose_failures --config=opt --config=cuda \
--define=tflite_with_xnnpack=false \
--config=noaws \
--config=nogcp \
--config=nohdfs \
--config=nonccl \
--config=monolithic \
--config=v2 \
--copt=-ftree-vectorize \
--copt=-funsafe-math-optimizations \
--copt=-ftree-loop-vectorize \
--copt=-fomit-frame-pointer \
--subcommands \
--noincompatible_do_not_split_linking_cmdline \
--crosstool_top=//third_party/toolchains/cpus/aarch64:toolchain \
--distinct_host_configuration=true \
--cpu=aarch64 \
//tensorflow/tools/pip_package:build_pip_package

### Relevant log output

```shell
ERROR: /home/dev/tf/build/tensorflow-v2.9.1/tensorflow/python/BUILD:3248:24: Linking tensorflow/python/_pywrap_tensorflow_internal.so failed: (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
  (cd /home/dev/tf/build/tensorflow-v2.9.1/build/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/home/dev/aarch64-linux-gnu/usr/local/cuda-10.2 \
    CUDNN_INSTALL_PATH=/home/dev/aarch64-linux-gnu/usr \
    GCC_HOST_COMPILER_PATH=/home/dev/aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/home/dev/python3.8 \
    PYTHON_LIB_PATH=/home/dev/python3.8/site-packages \
    TENSORRT_INSTALL_PATH=/home/dev/aarch64-linux-gnu/usr \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.2 \
    TF_CUDA_PATHS=/home/dev/aarch64-linux-gnu/usr/local/cuda-10.2 \
    TF_CUDA_VERSION=10.2 \
    TF_CUDNN_VERSION=8.2.1 \
    TF_TENSORRT_VERSION=8.0.1 \
  third_party/toolchains/cpus/aarch64/crosstool_wrapper_driver_is_not_gcc -shared -o bazel-out/aarch64-opt/bin/tensorflow/python/_pywrap_tensorflow_internal.so bazel-out/aarch64-opt/bin/tensorflow/python/_objs/_pywrap_tensorflow_internal.so/pywrap_tensorflow_internal.o -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/distributed_runtime/rpc/libgrpc_session.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/distributed_runtime/rpc/libgrpc_remote_master.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_nextafter_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libnext_after_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_relu_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libelu_gpu_f16_f16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libelu_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libelu_gpu_f64_f64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_f16_f16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_f64_f64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_i8_i8_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_i16_i16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_i64_i64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_ui8_ui8_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_ui16_ui16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_ui32_ui32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/librelu_gpu_ui64_ui64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libselu_gpu_f16_f16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libselu_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libselu_gpu_f64_f64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftmax_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftmax_op_gpu.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftplus_op.lo 
-Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftplus_op_gpu.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softplus_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftplus_gpu_f16_f16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftplus_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftplus_gpu_f64_f64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftsign_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libsoftsign_op_gpu.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libgpu_softsign_op.lo -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftsign_gpu_f16_f16_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftsign_gpu_f32_f32_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libsoftsign_gpu_f64_f64_kernel_generator.a -Wl,-no-whole-archive -Wl,-whole-archive bazel-out/aarch64-opt/bin/tensorflow/core/kernels/libtopk_op.lo -Wl,-no-whole-archive -Wl,-whole-archive -pthread -ldl -lm -lpthread -lm -lpthread -lm -lm -pthread -pthread -pthread -lstdc++ -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu')

# Configuration: 954605433930adc5a888fb615975817a98f0152151879f679ae1137fe14384fa

# Execution platform: @local_execution_config_platform//:platform

/home/dev/aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.5.0/../../../../aarch64-linux-gnu/bin/ld: bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libnext_after_gpu_f32_f32_kernel_generator.a(next_after_gpu_f32_f32_kernel_generator_kernel.o): Relocations in generic ELF (EM: 62)
/home/dev/aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.5.0/../../../../aarch64-linux-gnu/bin/ld: bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libnext_after_gpu_f32_f32_kernel_generator.a(next_after_gpu_f32_f32_kernel_generator_kernel.o): Relocations in generic ELF (EM: 62)
/home/dev/aarch64-linux-gnu/bin/../lib/gcc/aarch64-linux-gnu/7.5.0/../../../../aarch64-linux-gnu/bin/ld: bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libnext_after_gpu_f32_f32_kernel_generator.a(next_after_gpu_f32_f32_kernel_generator_kernel.o): error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build

# ar x /home/dev/tf/build/tensorflow-v2.9.1/build/execroot/org_tensorflow/bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/libnext_after_gpu_f32_f32_kernel_generator.a

# file next_after_gpu_f32_f32_kernel_generator_kernel.o

next_after_gpu_f32_f32_kernel_generator_kernel.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

SUBCOMMAND: # //tensorflow/core/kernels/mlir_generated:next_after_gpu_f32_f32_kernel_generator [action 'Generating kernel '//tensorflow/core/kernels/mlir_generated:next_after_gpu_f32_f32_kernel_generator'', configuration: 954605433930adc5a888fb615975817a98f0152151879f679ae1137fe14384fa, execution platform: @local_execution_config_platform//:platform]
  (cd /home/dev/tf/build/tensorflow-v2.9.1/build/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/home/dev/aarch64-linux-gnu/usr/local/cuda-10.2 \
    CUDNN_INSTALL_PATH=/home/dev/aarch64-linux-gnu/usr \
    GCC_HOST_COMPILER_PATH=/home/dev/aarch64-linux-gnu/bin/aarch64-linux-gnu-gcc \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/home/dev/python3.8 \
    PYTHON_LIB_PATH=/home/dev/python3.8/site-packages \
    TENSORRT_INSTALL_PATH=/home/dev/aarch64-linux-gnu/usr \
    TF2_BEHAVIOR=1 \
    TF_CUDA_COMPUTE_CAPABILITIES=7.2 \
    TF_CUDA_PATHS=/home/dev/aarch64-linux-gnu/usr/local/cuda-10.2 \
    TF_CUDA_VERSION=10.2 \
    TF_CUDNN_VERSION=8.2.1 \
    TF_TENSORRT_VERSION=8.0.1 \
  bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/compiler/mlir/tools/kernel_gen/tf_to_kernel '--tile_sizes=1024' '--max-supported-rank=5' '--arch=compute_72' '--input=bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/next_after_gpu_f32_f32.mlir' '--output=bazel-out/aarch64-opt/bin/tensorflow/core/kernels/mlir_generated/next_after_gpu_f32_f32_kernel_generator_kernel.o' '--enable_ftz=True' '--cpu_codegen=False' '--jit=False')
```

uinbv5nw1#

Setting the LLVM target to AArch64 seems to cause tf_to_kernel to generate aarch64 object files instead of x86-64 ones.

diff -rupEbwBN a/utils/bazel/llvm-project-overlay/llvm/config.bzl b/utils/bazel/llvm-project-overlay/llvm/config.bzl
--- a/utils/bazel/llvm-project-overlay/llvm/config.bzl	2022-08-02 00:59:22.004383681 +1000
+++ b/utils/bazel/llvm-project-overlay/llvm/config.bzl	2022-08-02 01:03:02.923416911 +1000
@@ -18,6 +18,20 @@ def native_arch_defines(arch, triple):
         r'LLVM_DEFAULT_TARGET_TRIPLE=\"{}\"'.format(triple),
     ]

+def cross_arch_defines(arch, host_triple, target_triple):
+    return [
+        r'LLVM_NATIVE_ARCH=\"{}\"'.format(arch),
+        "LLVM_NATIVE_ASMPARSER=LLVMInitialize{}AsmParser".format(arch),
+        "LLVM_NATIVE_ASMPRINTER=LLVMInitialize{}AsmPrinter".format(arch),
+        "LLVM_NATIVE_DISASSEMBLER=LLVMInitialize{}Disassembler".format(arch),
+        "LLVM_NATIVE_TARGET=LLVMInitialize{}Target".format(arch),
+        "LLVM_NATIVE_TARGETINFO=LLVMInitialize{}TargetInfo".format(arch),
+        "LLVM_NATIVE_TARGETMC=LLVMInitialize{}TargetMC".format(arch),
+        "LLVM_NATIVE_TARGETMCA=LLVMInitialize{}TargetMCA".format(arch),
+        r'LLVM_HOST_TRIPLE=\"{}\"'.format(host_triple),
+        r'LLVM_DEFAULT_TARGET_TRIPLE=\"{}\"'.format(target_triple),
+    ]
+
 posix_defines = [
     "LLVM_ON_UNIX=1",
     "HAVE_BACKTRACE=1",
@@ -86,7 +100,7 @@ llvm_config_defines = os_defines + selec
     "//llvm:macos_arm64": native_arch_defines("AArch64", "arm64-apple-darwin"),
     "@bazel_tools//src/conditions:darwin": native_arch_defines("X86", "x86_64-unknown-darwin"),
     "@bazel_tools//src/conditions:linux_aarch64": native_arch_defines("AArch64", "aarch64-unknown-linux-gnu"),
-    "//conditions:default": native_arch_defines("X86", "x86_64-unknown-linux-gnu"),
+    "//conditions:default": cross_arch_defines("AArch64", "x86_64-unknown-linux-gnu", "aarch64-unknown-linux-gnu"),
 }) + [
     # These shouldn't be needed by the C++11 standard, but are for some
     # platforms (e.g. glibc < 2.18. See

3npbholx2#

I found that I can't just change the default condition to the AArch64 target, because the bazel build also builds for the k8 host.
I'm using the custom toolchain //third_party/toolchains/cpus/aarch64:toolchain.
What is the correct way to detect that the current build target is aarch64, so that cross_arch_defines can be set?
Any help or pointers are much appreciated.
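For reference, the textbook Bazel answer to "detect that the current build targets aarch64" is a config_setting keyed on the --cpu flag that the build command already passes, plus an extra branch in the existing select in config.bzl. This is only a sketch: the setting name aarch64_linux_cross and its placement in the overlay's llvm/BUILD.bazel are made up, and, as later comments in this thread point out, select() is resolved in the configuration of whichever target consumes it, so the LLVM linked into the host/exec-built tf_to_kernel would still take the default (X86) branch. In other words, this only changes targets actually built in the --cpu=aarch64 configuration.

```python
# Hypothetical sketch only.
#
# 1) In the overlay's llvm/BUILD.bazel: a config_setting that matches the
#    --cpu=aarch64 flag from the build command.
config_setting(
    name = "aarch64_linux_cross",
    values = {"cpu": "aarch64"},
)

# 2) In utils/bazel/llvm-project-overlay/llvm/config.bzl: add a branch instead
#    of replacing the default outright (other existing branches elided; if the
#    stock @bazel_tools//src/conditions:linux_aarch64 condition also matches in
#    your setup, keep only one aarch64 branch to avoid an ambiguous select).
llvm_config_defines = os_defines + select({
    "//llvm:aarch64_linux_cross": cross_arch_defines(
        "AArch64",
        "x86_64-unknown-linux-gnu",
        "aarch64-unknown-linux-gnu",
    ),
    "//conditions:default": native_arch_defines("X86", "x86_64-unknown-linux-gnu"),
})
```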


cidc1ykv3#

Looking closer at the build output, it builds two versions of tf_to_kernel, one used to generate object files for the target (aarch64) and another for the host (x86-64):
Linking .../mlir/tools/kernel_gen/tf_to_kernel; 13s local (binary output under bazel-out/k8-opt-exec-50AE0418)
Linking .../mlir/tools/kernel_gen/tf_to_kernel [for host]; 13s local (binary output under bazel-out/host)
I'm new to bazel, so the question is: how do I detect which build it is running (for the host or not) and switch the LLVM arch defines accordingly?
I see external/llvm-project/llvm/BUILD.bazel has this rule. How do I do a select on llvm_config_defines?

cc_library(
    name = "config",
    hdrs = [
        "include/llvm/Config/VersionInfo.h",
        "include/llvm/Config/abi-breaking.h",
        "include/llvm/Config/llvm-config.h",
    ],
    copts = llvm_copts,
    defines = llvm_config_defines,
    includes = ["include"],
    textual_hdrs = [
        "include/llvm/Config/AsmParsers.def",
        "include/llvm/Config/AsmPrinters.def",
        "include/llvm/Config/Disassemblers.def",
        "include/llvm/Config/Targets.def",
        "include/llvm/Config/TargetMCAs.def",
        # Needed for include scanner to find execinfo.h
        "include/llvm/Config/config.h",
    ],
)

xfyts7mz4#

@gadagashwini I can't figure out how to do the select on llvm_config_defines based on whether it is an execution-platform or host-platform build.
As a temporary workaround, would it be easier to build a separate llvm dependency for each of the execution and host platforms? That way I could hard-code llvm_config_defines for each build.
Could you give some guidance on how to fix or work around this?
Thanks in advance.
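For what it's worth, a rough sketch of what that hard-coded workaround could look like, assuming the cross_arch_defines helper from the patch in the first comment and that os_defines / native_arch_defines are loadable from config.bzl (the target names are invented, and the other attributes of the ":config" rule shown above are omitted). The expensive part is not shown: every LLVM target that currently depends on ":config" would need a duplicate that depends on the cross variant, which is what "a separate llvm dependency per platform" really amounts to.

```python
# Hypothetical sketch in the overlay's llvm/BUILD.bazel.
load(":config.bzl", "cross_arch_defines", "native_arch_defines", "os_defines")

# Hard-coded variant for host-side tooling: arch defines fixed to x86-64.
cc_library(
    name = "config_host_x86",
    hdrs = [
        "include/llvm/Config/VersionInfo.h",
        "include/llvm/Config/abi-breaking.h",
        "include/llvm/Config/llvm-config.h",
    ],
    defines = os_defines + native_arch_defines("X86", "x86_64-unknown-linux-gnu"),
    includes = ["include"],
)

# Hard-coded variant for the kernel-generation path: the host triple stays
# x86-64 (the tool still runs on the build machine) but the default target
# triple is aarch64, so the emitted objects can link into the aarch64 build.
cc_library(
    name = "config_cross_aarch64",
    hdrs = [
        "include/llvm/Config/VersionInfo.h",
        "include/llvm/Config/abi-breaking.h",
        "include/llvm/Config/llvm-config.h",
    ],
    defines = os_defines + cross_arch_defines(
        "AArch64",
        "x86_64-unknown-linux-gnu",
        "aarch64-unknown-linux-gnu",
    ),
    includes = ["include"],
)
```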


vlju58qv5#

Hi, please try the TensorFlow SIG Build for AArch64:
Linux aarch64 CPU Nightly. For more information, please refer to this link. Thanks!


mepcadol6#

@gadagashwini Thanks for your reply. The nightly aarch64 builds are CPU-only binaries, aren't they? I want to use TensorFlow with CUDA and TensorRT.
To properly solve the LLVM config defines issue, the ideal approach is to use config_setting and define one setting for each target/host/execution platform. The problem is that I don't see any execution-platform constraints; Bazel doesn't seem to support host/execution platform constraints. We could of course work around the missing host/execution constraints, but that means jumping through a lot of hoops.
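To make the constraint problem concrete, here is a small illustration (not a fix) of what flag-based config_settings can and cannot observe; the names are invented, and as far as I can tell this matches the behavior of Bazel 5.x with --distinct_host_configuration=true.

```python
# Each config_setting is evaluated against the configuration of whichever
# target consumes the select(), not against the top-level command line.
config_setting(
    name = "cpu_aarch64",
    values = {"cpu": "aarch64"},  # true only in the --cpu=aarch64 target configuration
)

config_setting(
    name = "cpu_k8",
    values = {"cpu": "k8"},  # true in the host/exec configuration on an x86-64 build machine
)

# A select() over these can tell a target which configuration it is built in,
# but inside the host/exec configuration (cpu = k8) there is no built-in flag
# or constraint that exposes "the top-level build targets aarch64", which is
# why the host-built tf_to_kernel cannot be switched this way.
```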


pxy2qtax7#

@jasaw Thanks for digging through this, I've run into exactly the same problem.
As you pointed out, TF builds some parts for both the host (k8) and the target (aarch64). I think you've hit the right spot, and I suppose it could be patched in a way that both modes pick the correct one?
As for how bazel does the arch switching, what I did to get the GPU toolchain working was to patch third_party/gpus/crosstool/BUILD.tpl to add something like the following and define :cc-compiler-aarch64-cross:

diff --git a/third_party/gpus/crosstool/BUILD.tpl b/third_party/gpus/crosstool/BUILD.tpl
index 5a78654a90f..d6a1f164d87 100644
--- a/third_party/gpus/crosstool/BUILD.tpl
+++ b/third_party/gpus/crosstool/BUILD.tpl
@@ -29,6 +29,7 @@ cc_toolchain_suite(
         "x64_windows|msvc-cl": ":cc-compiler-windows",
         "x64_windows": ":cc-compiler-windows",
         "arm": ":cc-compiler-local",
+        "aarch64|cross_linux": ":cc-compiler-aarch64-cross",
         "aarch64": ":cc-compiler-local",
         "k8": ":cc-compiler-local",
         "piii": ":cc-compiler-local",
@@ -68,6 +69,40 @@ cc_toolchain_config(
     linker_bin_path = "%{linker_bin_path}",
     builtin_sysroot = "%{builtin_sysroot}",
     cuda_path = "%{cuda_toolkit_path}",
+    compiler = "%{compiler}",
+)
+
+cc_toolchain(
+    name = "cc-compiler-aarch64-cross",
+    all_files = "%{compiler_deps}",
+    compiler_files = "%{compiler_deps}",
+    ar_files = "%{compiler_deps}",
+    as_files = "%{compiler_deps}",
+    dwp_files = ":empty",
+    linker_files = "%{compiler_deps}",
+    objcopy_files = ":empty",
+    strip_files = ":empty",
+    # To support linker flags that need to go to the start of command line
+    # we need the toolchain to support parameter files. Parameter files are
+    # last on the command line and contain all shared libraries to link, so all
+    # regular options will be left of them.
+    supports_param_files = 1,
+    toolchain_identifier = "aarch64-linux-gnu",
+    toolchain_config = ":cc-compiler-aarch64-config",
+)
+
+cc_toolchain_config(
+    name = "cc-compiler-aarch64-config",
+    cpu = "aarch64-linux-gnu",
+    builtin_include_directories = [%{cxx_builtin_include_directories}],
+    extra_no_canonical_prefixes_flags = [%{extra_no_canonical_prefixes_flags}],
+    host_compiler_path = "%{host_compiler_path}",
+    host_compiler_prefix = "%{host_compiler_prefix}",
+    host_compiler_warnings = [%{host_compiler_warnings}],
+    host_unfiltered_compile_flags = [%{unfiltered_compile_flags}],
+    linker_bin_path = "%{linker_bin_path}",
+    builtin_sysroot = "%{builtin_sysroot}",
+    cuda_path = "%{cuda_toolkit_path}"
 )

 cc_toolchain(

Unfortunately, the way the terms host and target are defined in these toolchains is rather confusing. Either I still haven't fully understood Bazel's workflow after 2 years, or there is something odd going on that nobody has bothered to document (publicly?).
