Current environment
PyTorch version: 1.13.1
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
GCC version: (GCC) 8.5.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.17
Python version: 3.9.19 (main, May 6 2024, 19:43:03) [GCC 11.2.0] (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-SXM4-80GB
GPU 1: NVIDIA A100-SXM4-80GB
GPU 2: NVIDIA A100-SXM4-80GB
GPU 3: NVIDIA A100-SXM4-80GB
GPU 4: NVIDIA A100-SXM4-80GB
GPU 5: NVIDIA A100-SXM4-80GB
GPU 6: NVIDIA A100-SXM4-80GB
GPU 7: NVIDIA A100-SXM4-80GB
Nvidia driver version: 535.104.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
Stepping: 6
CPU MHz: 3490.909
BogoMIPS: 5807.31
Virtualization: VT-x
L1d cache: 48K
L1i cache: 32K
L2 cache: 1280K
L3 cache: 49152K
NUMA node0 CPU(s): 0-127
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==1.13.1
[pip3] torchaudio==0.13.1
[pip3] torchvision==0.14.1
[pip3] transformers==4.37.0
[pip3] transformers-stream-generator==0.0.5
[conda] blas 1.0 mkl
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2023.1.0 h213fc3f_46344
[conda] mkl-service 2.4.0 py39h5eee18b_1
[conda] mkl_fft 1.3.8 py39h5eee18b_0
[conda] mkl_random 1.2.4 py39hdb19cb5_0
[conda] numpy 1.26.4 py39h5f9d8c6_0
[conda] numpy-base 1.26.4 py39hb5e798b_0
[conda] pytorch 1.13.1 py3.9_cuda11.6_cudnn8.3.2_0 pytorch
[conda] pytorch-cuda 11.6 h867d48c_1 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchaudio 0.13.1 py39_cu116 pytorch
[conda] torchvision 0.14.1 py39_cu116 pytorch
[conda] transformers 4.37.0 pypi_0 pypi
[conda] transformers-stream-generator 0.0.5 pypi_0 pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 PXB SYS SYS SYS 0-127 N/A N/A
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 PXB SYS SYS SYS 0-127 N/A N/A
GPU2 NV12 NV12 X NV12 NV12 NV12 NV12 NV12 SYS PXB SYS SYS 0-127 N/A N/A
GPU3 NV12 NV12 NV12 X NV12 NV12 NV12 NV12 SYS PXB SYS SYS 0-127 N/A N/A
GPU4 NV12 NV12 NV12 NV12 X NV12 NV12 NV12 SYS SYS PXB SYS 0-127 N/A N/A
GPU5 NV12 NV12 NV12 NV12 NV12 X NV12 NV12 SYS SYS PXB SYS 0-127 N/A N/A
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS SYS PXB 0-127 N/A N/A
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS SYS PXB 0-127 N/A N/A
NIC0 PXB PXB SYS SYS SYS SYS SYS SYS X SYS SYS SYS
NIC1 SYS SYS PXB PXB SYS SYS SYS SYS SYS X SYS SYS
NIC2 SYS SYS SYS SYS PXB PXB SYS SYS SYS SYS X SYS
NIC3 SYS SYS SYS SYS SYS SYS PXB PXB SYS SYS SYS X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_bond_0
NIC1: mlx5_bond_1
NIC2: mlx5_bond_2
NIC3: mlx5_bond_3
🐛 Describe the bug
When I run inference with different batch sizes (1 versus some other number), the responses differ. I'd like to know why this happens and how to keep the responses consistent.
Specifically, when I run inference with batch_size=1 the response is A, but when I run inference with batch_size=2, 10, or 20 (using the same prompt), every response is B. A is different from B.
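A minimal sketch of the kind of comparison that exposes this, assuming a Hugging Face `transformers` causal LM with greedy decoding; the report does not name the model, so `gpt2` below is a hypothetical stand-in:

```python
# Sketch only: compares the same prompt generated alone vs. inside a padded batch.
# Assumption: "gpt2" stands in for the actual model, which the report does not name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # gpt2 has no pad token by default
tokenizer.padding_side = "left"             # left-pad so generation continues the prompt
model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to(device)

prompt = "The capital of France is"

def generate(prompts):
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        # do_sample=False -> greedy decoding, so sampling randomness is ruled out.
        out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    return tokenizer.batch_decode(out, skip_special_tokens=True)

single = generate([prompt])        # batch_size=1 -> response "A"
batched = generate([prompt] * 4)   # batch_size=4 -> may yield response "B"
print(single[0])
print(batched[0])

# Even with greedy decoding the two can differ: a padded batch changes the shapes
# of the underlying matmuls, and CUDA kernels do not guarantee the same
# floating-point reduction order across shapes, so logits shift by tiny amounts
# that can flip the argmax at some decoding step.
```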
1 answer
9udxz4iz:
Same here.