Failed to offload to two GPUs on WSL2
Posted: Tue Aug 26, 2025 4:19 am
by Zhiyuan Yin
Hi,
I'm running VASP 6.5.1 (built with makefile.include.nvhpc_omp) on my test workstation under WSL2. VASP runs fine with single-GPU offloading (-np 1), but when I try -np 2 I consistently get the following error during initialization:
Orbital orthonormalization failed in the inversion of matrix
unknown argument kpoint: 0 spin: 0
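For reference, the two cases were launched roughly like this (a simplified sketch; the exact binary path and environment settings are omitted):
$ mpirun -np 1 vasp_gam    # single GPU: runs fine
$ mpirun -np 2 vasp_gam    # one MPI rank per GPU: fails with the error above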
Is this a known limitation or bug of the VASP 6.5.1 GPU offload when using multiple GPUs on WSL2?
Could this be related to the NVHPC OpenMPI/UCX configuration under WSL2?
Regards,
Zhiyuan
Re: Failed to offload to two GPUs on WSL2
Posted: Wed Aug 27, 2025 1:14 pm
by andreas.singraber
Hello Zhiyuan,
in principle the OpenACC GPU port should allow offloading to multiple GPUs (https://vasp.at/wiki/OpenACC_GPU_port_of_VASP) with one MPI rank per GPU, just as you intended. However, the error message is quite puzzling: a look into the source code reveals that the "unknown argument" string should never appear under normal circumstances (it is a fallback in the error message that is never used anywhere in the code). That hints at a rather severe problem in the initialization of the GPUs or of MPI. Personally, I have no experience with multi-GPU runs under WSL2, but I will discuss this with my colleagues. In the meantime, could you please upload a minimal reproducible example (with all relevant input/output files for -np 1 and -np 2, see the forum posting guidelines) so I can try to reproduce the problem? If available, please also provide the makefile.include you used for building VASP. It would also help to know your system configuration; perhaps you can run nvidia-smi inside WSL2 and post the output?
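As a quick additional check, it might help to verify that each MPI rank actually sees both GPUs. A minimal sketch, assuming the Open MPI mpirun from the NVHPC SDK is in your PATH (adjust to your setup):
$ mpirun -np 2 bash -c 'echo "rank ${OMPI_COMM_WORLD_RANK}:"; nvidia-smi -L'
Each of the two ranks should list both devices; if one rank sees no GPU at all, the problem lies in the MPI/GPU initialization rather than in VASP itself.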
Thank you!
All the best,
Andreas Singraber
Re: Failed to offload to two GPUs on WSL2
Posted: Fri Aug 29, 2025 9:01 am
by Zhiyuan Yin
Dear Andreas,
Here is my nvidia-smi and topo output:
$ nvidia-smi
Fri Aug 29 01:34:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.76.07 Driver Version: 581.08 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 5090 On | 00000000:01:00.0 On | N/A |
| 40% 33C P8 18W / 600W | 139MiB / 32607MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 5090 On | 00000000:03:00.0 On | N/A |
| 40% 34C P8 18W / 600W | 1187MiB / 32607MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
zhyin@DESKTOP-BBSMTIC:~$ nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      SYS                                     N/A
GPU1    SYS      X                                      N/A
$ ldd vasp_gam
linux-vdso.so.1 (0x00007ffc7b7c5000)
libqdmod.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/extras/qd/lib/libqdmod.so.0 (0x0000120d4b800000)
libqd.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/extras/qd/lib/libqd.so.0 (0x0000120d4b400000)
liblapack_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/liblapack_lp64.so.0 (0x0000120d4a600000)
libblas_lp64.so.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libblas_lp64.so.0 (0x0000120d48600000)
libfftw3.so.3 => /lib/x86_64-linux-gnu/libfftw3.so.3 (0x0000120d48200000)
libfftw3_omp.so.3 => /lib/x86_64-linux-gnu/libfftw3_omp.so.3 (0x0000120d4ba6a000)
libmpi_usempif08.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_usempif08.so.40 (0x0000120d47e00000)
libmpi_usempi_ignore_tkr.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_usempi_ignore_tkr.so.40 (0x0000120d47a00000)
libmpi_mpifh.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi_mpifh.so.40 (0x0000120d47600000)
libmpi.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libmpi.so.40 (0x0000120d47200000)
libscalapack_lp64.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libscalapack_lp64.so.2 (0x0000120d46a00000)
libnvhpcwrapcufft.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvhpcwrapcufft.so (0x0000120d46600000)
libcufft.so.11 => /usr/local/cuda/lib64/libcufft.so.11 (0x0000120d35400000)
libcusolver.so.11 => /usr/local/cuda/lib64/libcusolver.so.11 (0x0000120d26a00000)
libnvJitLink.so.12 => /usr/local/cuda/lib64/libnvJitLink.so.12 (0x0000120d20e00000)
libcudaforwrapnccl.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapnccl.so (0x0000120d20a00000)
libnccl.so.2 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/nccl/lib/libnccl.so.2 (0x0000120d09a00000)
libcublas.so.12 => /usr/local/cuda/lib64/libcublas.so.12 (0x0000120d02800000)
libcublasLt.so.12 => /usr/local/cuda/lib64/libcublasLt.so.12 (0x0000120cd0400000)
libcudaforwrapblas.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapblas.so (0x0000120cd0000000)
libcudaforwrapblas117.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudaforwrapblas117.so (0x0000120ccfc00000)
libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x0000120ccf800000)
libcudafor_128.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor_128.so (0x0000120ccd000000)
libcudafor.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor.so (0x0000120cccc00000)
libacchost.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libacchost.so (0x0000120ccc800000)
libaccdevaux.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libaccdevaux.so (0x0000120ccc400000)
libaccdevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libaccdevice.so (0x0000120ccc000000)
libcudadevice.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudadevice.so (0x0000120ccbc00000)
libcudafor2.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libcudafor2.so (0x0000120ccb800000)
libnvf.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvf.so (0x0000120ccb000000)
libnvhpcatm.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvhpcatm.so (0x0000120ccac00000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x0000120cca800000)
libnvomp.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvomp.so (0x0000120cc9600000)
libnvcpumath.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvcpumath.so (0x0000120cc9000000)
libnvc.so => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/compilers/lib/libnvc.so (0x0000120cc8c00000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000120cc8800000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000120d4ba30000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000120d4b717000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000120d4ba2b000)
libatomic.so.1 => /lib/x86_64-linux-gnu/libatomic.so.1 (0x0000120d4b70c000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000120d4ba24000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000120d4b707000)
libopen-rte.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libopen-rte.so.40 (0x0000120cc8400000)
libopen-pal.so.40 => /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/comm_libs/12.8/hpcx/hpcx-2.22.1/ompi/lib/libopen-pal.so.40 (0x0000120cc8000000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x0000120d4b700000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x0000120d4b6e4000)
/lib64/ld-linux-x86-64.so.2 (0x0000120d4ba86000)
libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x0000120cb0c00000)
P.S.: I'm aware that gaming cards are not suitable for production calculations; this machine is intended for testing purposes only.
Anyway, I have attached the input files and makefile.include.
Thank you again for your support.
Regards,
Zhiyuan
Re: Failed to offload to two GPUs on WSL2
Posted: Thu Sep 11, 2025 12:57 pm
by andreas.singraber
Dear Zhiyuan,
sorry for the late reply... I was able to successfully run your calculation on 2x P100 GPUs and got the same result as with a single GPU. My tests were run with VASP 6.4.2, because you mention VASP 6.5.1 in your messages but all your output files indicate 6.4.2. Could you please verify which version we are actually discussing? Also, I used Rocky Linux 8 as the operating system because I have no WSL2 system available. So in principle this should work fine, and unfortunately I cannot offer much more guidance because I cannot reproduce the behavior. Anyway, at least two more things come to mind:
1. Make sure you did not mix makefile.include files from different VASP versions. In particular, for the OpenACC port some preprocessor variables (CPP_OPTIONS) changed from 6.4.2 to 6.5.1, and missing some of them may cause strange errors.
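A quick way to spot leftover or missing flags is to diff the makefile.include you built with against the template shipped with the release you are actually compiling (a sketch; the template name below is the usual one in the VASP source tree and may differ from the one you started from):
$ diff makefile.include arch/makefile.include.nvhpc_omp_acc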
2. I saw that you used NVHPC 25.3 for compiling, while your machine has a very new driver installed (CUDA version 13.0). As far as I know, CUDA 13.0 was not yet available when NVHPC 25.3 was released... then again, you used CUDA 12.8 in the makefile.include anyway. Maybe updating to a newer NVHPC release would help? For my tests I used 25.7.
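To double-check which CUDA toolkit the compiler targets versus what the driver reports, something along these lines should work (standard NVHPC and driver utilities; the exact output format varies between versions):
$ nvfortran --version
$ nvaccelinfo | grep -i cuda
$ nvidia-smi | grep -i "cuda version"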
All the best,
Andreas Singraber