Failed to offload to two GPUs on WSL2
Posted: Tue Aug 26, 2025 4:19 am
by Zhiyuan Yin
Hi,
I'm running VASP 6.5.1 (built with makefile.include.nvhpc_omp) on my test workstation under WSL2. VASP runs fine with single-GPU offloading (-np 1), but when I try -np 2 I consistently get the following error during initialization:
Orbital orthonormalization failed in the inversion of matrix
unknown argument kpoint: 0 spin: 0
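For reference, the two cases were launched roughly like this (a simplified sketch; the exact binary path and environment settings are omitted):
$ mpirun -np 1 vasp_gam    # single GPU: runs fine
$ mpirun -np 2 vasp_gam    # one MPI rank per GPU: fails with the error above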
Is this a known limitation or bug of the VASP 6.5.1 GPU offload when using multiple GPUs on WSL2?
Could this be related to the NVHPC OpenMPI/UCX configuration under WSL2?
Regards,
Zhiyuan
Re: Failed to offload to two GPUs on WSL2
Posted: Wed Aug 27, 2025 1:14 pm
by andreas.singraber
Hello Zhiyuan,
in principle the OpenACC GPU port should allow offloading to multiple GPUs (https://vasp.at/wiki/OpenACC_GPU_port_of_VASP) with one MPI rank per GPU, just as you intended. However, the error message is quite puzzling: a look into the source code reveals that the "unknown argument" string should never appear under normal circumstances (it is a fallback in the error message that is never used anywhere in the code). That hints at a rather severe problem in the initialization of the GPUs or of MPI. Personally, I have no experience with multi-GPU runs under WSL2, but I will discuss this with my colleagues. In the meantime, could you please upload a minimal reproducible example (with all relevant input/output files for -np 1 and -np 2, see the forum posting guidelines) so I can try to reproduce the problem? If available, please also provide the makefile.include you used for building VASP. It would also help to know your system configuration; perhaps you can run nvidia-smi inside WSL2 and post the output?
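As a quick additional check, it might help to verify that each MPI rank actually sees both GPUs. A minimal sketch, assuming the Open MPI mpirun from the NVHPC SDK is in your PATH (adjust to your setup):
$ mpirun -np 2 bash -c 'echo "rank ${OMPI_COMM_WORLD_RANK}:"; nvidia-smi -L'
Each of the two ranks should list both devices; if one rank sees no GPU at all, the problem lies in the MPI/GPU initialization rather than in VASP itself.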
Thank you!
All the best,
Andreas Singraber
Re: Failed to offload to two GPUs on WSL2
Posted: Fri Aug 29, 2025 9:01 am
by Zhiyuan Yin
Dear Andreas,
Here is my nvidia-smi and topo output:
$ nvidia-smi
Fri Aug 29 01:34:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.76.07 Driver Version: 581.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 On | 00000000:01:00.0 On | N/A |
| 40% 33C P8 18W / 600W | 139MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 5090 On | 00000000:03:00.0 On | N/A |
| 40% 34C P8 18W / 600W | 1187MiB / 32607MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
zhyin@DESKTOP-BBSMTIC:~$ nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS                                     N/A
GPU1    SYS      X                                      N/A
$ ldd vasp_gam
linux-vdso.so.1 (0x00007ffc7b7c5000)
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/extras/qd/lib/libqdmod.so.0 (0x0000120d4b800000)
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/extras/qd/lib/libqd.so.0 (0x0000120d4b400000)
liblapack_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/liblapack_lp64.so.0 (0x0000120d4a600000)
libblas_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libblas_lp64.so.0 (0x0000120d48600000)
libfftw3.so.3 => /lib/x86_64-linux-gnu/libfftw3.so.3 (0x0000120d48200000)
libfftw3_omp.so.3 => /lib/x86_64-linux-gnu/libfftw3_omp.so.3 (0x0000120d4ba6a000)
libmpi_usempif08.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_usempif08.so.40 (0x0000120d47e00000)
libmpi_usempi_ignore_tkr.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_usempi_ignore_tkr.so.40 (0x0000120d47a00000)
libmpi_mpifh.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_mpifh.so.40 (0x0000120d47600000)
libmpi.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi.so.40 (0x0000120d47200000)
libscalapack_lp64.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libscalapack_lp64.so.2 (0x0000120d46a00000)
libnvhpcwrapcufft.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvhpcwrapcufft.so (0x0000120d46600000)
libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x0000120d35400000)
libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x0000120d26a00000)
libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x0000120d20e00000)
libcudaforwrapnccl.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapnccl.so (0x0000120d20a00000)
libnccl.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/nccl/lib/libnccl.so.2 (0x0000120d09a00000)
libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x0000120d02800000)
libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x0000120cd0400000)
libcudaforwrapblas.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapblas.so (0x0000120cd0000000)
libcudaforwrapblas117.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapblas117.so (0x0000120ccfc00000)
libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x0000120ccf800000)
libcudafor_128.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor_128.so (0x0000120ccd000000)
libcudafor.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor.so (0x0000120cccc00000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libacchost.so (0x0000120ccc800000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libaccdevaux.so (0x0000120ccc400000)
libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libaccdevice.so (0x0000120ccc000000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudadevice.so (0x0000120ccbc00000)
libcudafor2.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor2.so (0x0000120ccb800000)
libnvf.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvf.so (0x0000120ccb000000)
libnvhpcatm.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvhpcatm.so (0x0000120ccac00000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000120cca800000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvomp.so (0x0000120cc9600000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvcpumath.so (0x0000120cc9000000)
libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvc.so (0x0000120cc8c00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000120cc8800000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000120d4ba30000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000120d4b717000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000120d4ba2b000)
libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x0000120d4b70c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000120d4ba24000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000120d4b707000)
libopen-rte.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libopen-rte.so.40 (0x0000120cc8400000)
libopen-pal.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libopen-pal.so.40 (0x0000120cc8000000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x0000120d4b700000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000120d4b6e4000)
/lib64/ld-linux-x86-64.so.2 (0x0000120d4ba86000)
libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x0000120cb0c00000)
P.S.: I'm aware that gaming cards are not suitable for production calculations; this machine is intended for testing purposes only.
Anyway, I have attached the input files and makefile.include.
Thank you again for your support.
Regards,
Zhiyuan
Re: Failed to offload to two GPUs on WSL2
Posted: Thu Sep 11, 2025 12:57 pm
by andreas.singraber
Dear Zhiyuan,
sorry for the late reply... I was able to successfully run your calculation on 2x P100 GPUs and got the same result as with a single GPU. My tests were run with VASP 6.4.2, because you mention VASP 6.5.1 in your messages but all your output files indicate 6.4.2. Could you please verify which version we are actually discussing? Also, I used Rocky Linux 8 as the operating system because I have no WSL2 system available. So in principle this should work fine, and unfortunately I cannot offer much more guidance because I cannot reproduce the behavior. Anyway, at least two more things come to mind:
1. Make sure you did not mix makefile.include files from different VASP versions. In particular, for the OpenACC port some preprocessor variables (CPP_OPTIONS) changed from 6.4.2 to 6.5.1, and missing some of them may cause strange errors.
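A quick way to spot leftover or missing flags is to diff the makefile.include you built with against the template shipped with the release you are actually compiling (a sketch; the template name below is the usual one in the VASP source tree and may differ from the one you started from):
$ diff makefile.include arch/makefile.include.nvhpc_omp_acc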
2. I saw that you used NVHPC 25.3 for compiling, while your machine has a very new driver installed (CUDA version 13.0). As far as I know, CUDA 13.0 was not yet available when NVHPC 25.3 was released... then again, you used CUDA 12.8 in the makefile.include anyway. Maybe updating to a newer NVHPC release would help? For my tests I used 25.7.
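To double-check which CUDA toolkit the compiler targets versus what the driver reports, something along these lines should work (standard NVHPC and driver utilities; the exact output format varies between versions):
$ nvfortran --version
$ nvaccelinfo | grep -i cuda
$ nvidia-smi | grep -i "cuda version"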
All the best,
Andreas Singraber