ImportError: /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12 #111469

Closed
qianxifu opened this issue Oct 18, 2023 · 21 comments

Comments

@qianxifu

qianxifu commented Oct 18, 2023

🐛 Describe the bug

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/torch/__init__.py", line 235, in <module>
    from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12
>>> 

Versions

Collecting environment information...
PyTorch version: N/A
Is debug build: N/A
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.10.13 (main, Aug 25 2023, 13:20:03) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-162-generic-x86_64-with-glibc2.31
Is CUDA available: N/A
CUDA runtime version: 12.0.140
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA A10
Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.9.1
/usr/local/cuda-12.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.9.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8369B CPU @ 2.90GHz
Stepping: 6
CPU MHz: 2900.000
BogoMIPS: 5800.00
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 384 KiB
L1i cache: 256 KiB
L2 cache: 10 MiB
L3 cache: 48 MiB
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq monitor ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.1
[pip3] torch==2.1.0
[pip3] torchaudio==2.1.0
[pip3] torchvision==0.16.0
[pip3] triton==2.1.0
[conda] Could not collect

--------------------------------nvidia-smi---------------------------------------------
1697625105315

--------------------------------cuda version---------------------------------------------
1697625161049

--------------------------------install torch command---------------------------------------------
pip3 install torch torchvision torchaudio

--------------------------------python lib---------------------------------------------
certifi 2019.11.28
chardet 3.0.4
command-not-found 0.3
dbus-python 1.2.16
distro 1.4.0
distro-info 0.23+ubuntu1.1
filelock 3.12.4
fsspec 2023.9.2
idna 2.8
Jinja2 3.1.2
language-selector 0.1
MarkupSafe 2.1.3
mpmath 1.3.0
netifaces 0.10.4
networkx 3.1
numpy 1.26.1
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.2.140
nvidia-nvtx-cu12 12.1.105
Pillow 10.1.0
pip 23.3
PyGObject 3.36.0
pymacaroons 0.13.0
PyNaCl 1.3.0
python-apt 2.0.1+ubuntu0.20.4.1
PyYAML 5.3.1
requests 2.22.0
requests-unixsocket 0.2.0
setuptools 45.2.0
six 1.14.0
ssh-import-id 5.10
sympy 1.12
torch 2.1.0
torchaudio 2.1.0
torchvision 0.16.0
triton 2.1.0
typing_extensions 4.8.0
ubuntu-advantage-tools 8001
ufw 0.36
unattended-upgrades 0.1
urllib3 1.25.8
wheel 0.34.2

@ptrblck
Collaborator

ptrblck commented Oct 18, 2023

Not reproducible using pip install torch torchvision torchaudio:

pip install torch torchvision torchaudio
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting torch
  Downloading torch-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (25 kB)
Requirement already satisfied: torchvision in /usr/local/lib/python3.10/dist-packages (0.16.0a0)
Collecting torchaudio
  Downloading torchaudio-2.1.0-cp310-cp310-manylinux1_x86_64.whl.metadata (5.7 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch) (3.12.4)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch) (4.7.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch) (3.1.2)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch) (2023.6.0)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 27.2 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
...
# python -c "import torch; print(torch.__version__); print(torch.__path__)"
2.1.0+cu121
['/usr/local/lib/python3.10/dist-packages/torch']
# find /usr/ -name libnvJit*
/usr/local/lib/python3.10/dist-packages/nvidia/nvjitlink/lib/libnvJitLink.so.12

@qianxifu
Author

thanks for your help.

@lee101

lee101 commented Oct 19, 2023

Note that @ptrblck is on CUDA 12.1, while we are seeing this issue on CUDA 12.0.
Not sure if that's related, but updating the CUDA version is worth a try.

@lee101

lee101 commented Oct 19, 2023

LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64

I managed to make this error go away by pointing this env var back at a previous CUDA version (11.7 in my case, instead of 12.0). I'm not sure what was going on. It also works if I just unset the variable, so I'm not sure whether it needs to be set at all with CUDA 12.0.
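
One possible reading of this observation is that a directory on LD_LIBRARY_PATH holds an older libnvJitLink.so.12, which the loader finds before the copy shipped with the pip wheels. That is an assumption, not a confirmed root cause. The sketch below uses only the standard library to list the LD_LIBRARY_PATH entries that contain a file with that name, in search order. The helper name `dirs_providing` is hypothetical.

```python
import os
from pathlib import Path


def dirs_providing(lib_name, search_path):
    """Return the entries of a colon-separated search path that contain lib_name.

    Hypothetical helper: order is preserved, so the first hit is the one
    that would be tried first among these directories.
    """
    hits = []
    for entry in search_path.split(os.pathsep):
        # Empty entries (e.g. from a trailing ':') are skipped in this sketch.
        if entry and (Path(entry) / lib_name).exists():
            hits.append(entry)
    return hits


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory in dirs_providing("libnvJitLink.so.12", current):
        print(directory)
```

If a system CUDA directory shows up ahead of (or instead of) the site-packages nvidia/nvjitlink/lib directory, that would be consistent with unsetting the variable making the error disappear.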

-Lee https://text-generator.io

@giovannibonisoli

giovannibonisoli commented Oct 19, 2023

I ran into the same problem described in this issue:

  File "/home/user/.conda/envs/myenv/bin/torchrun", line 5, in <module>
    from torch.distributed.run import main
  File "/home/user/.conda/envs/myenv/lib/python3.8/site-packages/torch/__init__.py", line 235, in <module>
    from torch._C import *  # noqa: F403
ImportError: /home/user/.conda/envs/myenv/lib/python3.8/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12

I tried the solution suggested by @ptrblck, but the error is still not fixed.

@panpan0000

panpan0000 commented Oct 20, 2023

Quoting @giovannibonisoli: "I tried the solution suggested by @ptrblck and the error is not fixed, yet!"

May I ask which solution you mean? I see the same error in the trial below (Python 3.10).

Here is my environment setup and the error:

pip3 install torch torchvision torchaudio

pip list |grep torch
torch                     2.1.0
torchaudio                2.1.0
torchvision               0.16.0

cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.105.17  Tue Mar 28 18:02:59 UTC 2023
GCC version:  gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

python3 -c "import torch;"

from torch._C import *  # noqa: F403
ImportError: /usr/local/lib/python3.10/dist-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkAddData_12_1, version libnvJitLink.so.12

Workaround:

Downgrading torch to 2.0.1 (pip3 install torch==2.0.1) makes the issue go away.

@kk19990709

I don't think this issue should be closed. It hasn't been solved yet. I see the same error with torch==2.1.0.

@kk19990709

Quoting @panpan0000: "When downgrade torch to 2.0.1 (pip3 install torch==2.0.1), issue gone."

This works, but some packages, such as xformers, require torch==2.1.0.

@upenn-hughmac

Same issue. In case it's useful for others, fixed for me by either:

export LD_LIBRARY_PATH=$HOME/path/to/my/venv3115/lib64/python3.11/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH

or uninstalling 2.1.0 (stable) and installing the nightly dev preview:

python -m pip uninstall torch torchvision torchaudio
python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
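
For the first option, note that LD_LIBRARY_PATH generally has to be set before the Python process starts, because the dynamic loader reads it at process startup. The following is a minimal sketch of building such an environment and trying the import in a child interpreter. `env_with_lib_dir` and `try_import_torch` are hypothetical helper names, and the directory passed in is an assumption about where your wheels are installed.

```python
import os
import subprocess
import sys


def env_with_lib_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing ':' when the variable was previously unset or empty.
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{current}" if current else lib_dir
    return env


def try_import_torch(lib_dir):
    """Run `import torch` in a child interpreter with the adjusted path.

    Returns True if the child process exits successfully.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import torch"],
        env=env_with_lib_dir(lib_dir),
        capture_output=True,
    )
    return result.returncode == 0
```

Calling `try_import_torch` with the nvidia/nvjitlink/lib directory of your environment is a quick way to check whether the export would help before adding it to a shell profile.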

@conan1024hao

conan1024hao commented Nov 7, 2023

Quoting @panpan0000: "When downgrade torch to 2.0.1 (pip3 install torch==2.0.1), issue gone."

torch wasn't the problem for me. Downgrading torchaudio to 2.0.1 made the issue go away.

@krodio

krodio commented Nov 8, 2023

Replying to @upenn-hughmac's fix above (exporting LD_LIBRARY_PATH, or installing the nightly preview):

It did work. Replace xxxx with your real Python interpreter path:
export LD_LIBRARY_PATH=$HOME/xxxx/python3.11/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH

@NintendoLink

It works, thanks! @upenn-hughmac

@surmount1

Replying to @upenn-hughmac's fix above (exporting LD_LIBRARY_PATH, or installing the nightly preview):

Nice! I also solved this problem using this method:
export LD_LIBRARY_PATH=/data/home/user/anaconda3/envs/vllm/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH

@jianbo27

I tried downgrading to Python 3.9, which works for me in a conda virtual environment.

@wangbluo

wangbluo commented Feb 2, 2024

Same issue on torch 2.2. I have tried all of the solutions above, but none of them worked. This issue shouldn't be closed at all.

python version: python 3.10
PyTorch-cuda:12.1
$CUDA_HOME: /home/share/spack/opt/spack/linux-ubuntu20.04-icelake/gcc-9.4.0/cuda-12.1.1-uxo2fr2s3d6ge4m6bo46jslallfbluei

@osma

osma commented Feb 9, 2024

I've also had this problem. In my case, it was apparently due to a compatibility issue w.r.t. CUDA 12.0.0 that I was using.

It appears that PyTorch 2.1.x and 2.2.0 have been compiled against CUDA 12.1.0 and they use new symbols introduced in 12.1 so they won't work with CUDA 12.0.0. Installing either CUDA 12.1.0 or the older version 11.8.0 fixes the problem for me. Downgrading to PyTorch 2.0.1 also works, as it's compatible with CUDA 12.0.0.
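
The rule of thumb reported in this comment can be written down as a small check. This is only a sketch of the observation above (CUDA 12.0 fails with PyTorch 2.1.x and 2.2.0, while 11.8 and 12.1 work), not an official compatibility matrix. The function name is hypothetical, and results for versions other than those three are extrapolated.

```python
def cuda_ok_for_cu121_wheels(cuda_version):
    """Apply the thread's rule of thumb to a version string like '12.0.140'.

    Assumption (from this comment only): CUDA 11.x and 12.1+ work with
    wheels built against CUDA 12.1, while 12.0 lacks the 12.1 symbols.
    """
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if major < 12:
        return True
    return (major, minor) >= (12, 1)
```

For the environment in the original report (CUDA runtime version 12.0.140), this returns False.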

@yangfansun

Installing PyTorch with the official CUDA 11.8 setup recommended by PyTorch can fix this problem.

@richardp4

Hi,
I have the same problem.
My setup is below.

  • OS: Ubuntu 22.04
  • CUDA: 12.0
  • cuDNN: 8.8
  • Python: 3.9 (Anaconda env)
  • PyTorch: 2.2.2 -> 2.0.1 (downgraded)

After downgrading PyTorch from 2.2.2 to 2.0.1, import torch works.
However, another error then occurred: "ModuleNotFoundError: No module named 'torch._custom_ops'"

Any tips on how to solve it would be appreciated.

@osma

osma commented Apr 17, 2024

@richardp4 CUDA version 12.0 is your problem. Either upgrade it to 12.1+ or downgrade to 11.8.

@M3Dade

M3Dade commented Apr 17, 2024

Quoting @osma: "downgrade to 11.8."

@osma I ran into the problem after downloading flash_attn-2.5.7.tar.gz. It worked once I downgraded CUDA from 12.0 to 11.8. Thanks for your suggestion.

@weifengpy
Contributor

Replying to @upenn-hughmac's fix above (exporting LD_LIBRARY_PATH, or installing the nightly preview):

That answer works for me. In case people are curious about how to find the site-packages path:
step 1: python3 -m pip list -v: this shows the full path 'envs/pytorch-3.10/lib/python3.10/site-packages'
step 2: export LD_LIBRARY_PATH=full/path/envs/pytorch-3.10/lib/python3.10/site-packages/nvidia/nvjitlink/lib:$LD_LIBRARY_PATH
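
As an alternative to reading the path from pip's output, the interpreter can report its own site-packages directory. This is a minimal sketch, assuming the wheels were installed into the running interpreter's default site-packages. The helper names are hypothetical, and the nvidia/nvjitlink/lib layout is taken from the paths quoted in this thread.

```python
import sysconfig
from pathlib import Path


def candidate_site_packages():
    """Return the interpreter's site-packages directories, without duplicates."""
    paths = sysconfig.get_paths()
    # purelib and platlib are usually the same directory; keep order, drop repeats.
    return list(dict.fromkeys([paths["purelib"], paths["platlib"]]))


def nvjitlink_lib_dir(site_packages):
    """Return the expected nvjitlink library directory under site-packages."""
    return Path(site_packages) / "nvidia" / "nvjitlink" / "lib"


def export_line(lib_dir):
    """Format the shell command used in this thread for a given directory."""
    return f"export LD_LIBRARY_PATH={lib_dir}:$LD_LIBRARY_PATH"


if __name__ == "__main__":
    for site_packages in candidate_site_packages():
        lib_dir = nvjitlink_lib_dir(site_packages)
        if lib_dir.is_dir():
            print(export_line(lib_dir))
        else:
            print(f"not found: {lib_dir}")
```

Run it with the same interpreter that fails to import torch, so the reported directory belongs to the affected environment.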
