Possible Bugs in GPU-Vasp 6.4.1 using SOC and IVDW 20/21/202

Posted: Wed Oct 18, 2023 6:51 pm
by guorong_weng
I am getting different dispersion energies from CPU and GPU runs with the same version of VASP (6.4.1) and the same system.
The results are shown in the attached screenshot: I tested IVDW 10, 11, 12, 20, 21, and 202, with LSORBIT = .TRUE. (spin-orbit coupling).
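
For anyone who wants to set up the same test matrix, here is a minimal Python sketch that prints the two INCAR tags that vary between runs. LSORBIT and IVDW are the tags named above; the method labels follow the usual meaning of each IVDW value in VASP. The helper name `incar_tags` is hypothetical, and the rest of the INCAR is not reproduced here, so these lines are meant to be merged into the attached input by hand.

```python
# IVDW values from the test above, labelled with the method each one selects.
IVDW_METHODS = {
    10: "DFT-D2",
    11: "DFT-D3 (zero damping)",
    12: "DFT-D3 (Becke-Johnson damping)",
    20: "Tkatchenko-Scheffler",
    21: "Tkatchenko-Scheffler with iterative Hirshfeld partitioning",
    202: "many-body dispersion",
}


def incar_tags(ivdw: int) -> str:
    """Return the INCAR lines that differ between the test runs.

    Hypothetical helper: it only covers the two tags discussed in this
    thread, not a complete INCAR.
    """
    if ivdw not in IVDW_METHODS:
        raise ValueError(f"IVDW = {ivdw} is not part of this test matrix")
    return f"LSORBIT = .TRUE.\nIVDW = {ivdw}  ! {IVDW_METHODS[ivdw]}\n"


if __name__ == "__main__":
    # Print one tag fragment per run in the matrix.
    for value in IVDW_METHODS:
        print(incar_tags(value))
```

Keeping the list in one place makes it easy to run the CPU and GPU builds over exactly the same set of IVDW values.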

For the DFT-Dx corrections, the GPU and CPU runs give the same answers. However, for the Tkatchenko-Scheffler and many-body dispersion corrections, the GPU results deviate from the CPU ones, or the run even crashes. My colleague tested version 6.4.0 on another GPU machine, and the result agreed with the CPU one (IVDW = 20). As far as I can tell, this issue does not appear if LSORBIT = .FALSE.
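
A rough way to check the CPU/GPU agreement automatically is sketched below in Python. It assumes the comparison is made on the last `free  energy   TOTEN` line of each OUTCAR, which should include the dispersion contribution. The function names and the tolerance of 1e-5 eV are assumptions for illustration, not values from the VASP team.

```python
import re
import sys
from pathlib import Path

# Matches OUTCAR lines of the form
#   free  energy   TOTEN  =        -1.23456789 eV
# (the number in this comment is a placeholder, not a real result).
TOTEN_RE = re.compile(
    r"free\s+energy\s+TOTEN\s*=\s*(-?\d*\.\d+(?:[eE][-+]?\d+)?)\s*eV"
)


def final_toten(outcar_text: str) -> float:
    """Return the last TOTEN value (in eV) found in the OUTCAR text."""
    matches = TOTEN_RE.findall(outcar_text)
    if not matches:
        raise ValueError("no 'free energy TOTEN' line found")
    return float(matches[-1])


def energies_agree(cpu_text: str, gpu_text: str, tol_ev: float = 1e-5) -> bool:
    """True if the final energies differ by at most tol_ev (assumed tolerance)."""
    return abs(final_toten(cpu_text) - final_toten(gpu_text)) <= tol_ev


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_toten.py OUTCAR.cpu OUTCAR.gpu
    cpu, gpu = (Path(p).read_text() for p in sys.argv[1:3])
    print("CPU:", final_toten(cpu), "eV")
    print("GPU:", final_toten(gpu), "eV")
    print("agree" if energies_agree(cpu, gpu) else "DEVIATE")
```

Running this once per IVDW value would show at a glance which corrections deviate on the GPU build; a crashed run shows up as the `ValueError` for a missing TOTEN line.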

The four input files for my test system are attached here, as well as the makefile.include file that I used to compile the code on GPU. Could the VASP team, or anyone else who is interested, try running them and see if the same problem occurs? Thanks.

# Default precompiler options
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Duse_bse_te \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -D_OPENMP \
              -D_OPENACC \
              -DUSENCCL -DUSENCCLP2P

CPP         = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX)  > $*$(SUFFIX)

# N.B.: you might need to change the cuda-version here
#       to one that comes with your NVIDIA-HPC SDK
FC          = mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -mp
FCL         = mpif90 -acc -gpu=cc60,cc70,cc80,cuda12.2 -mp -c++libs

FREE        = -Mfree

FFLAGS      = -Mbackslash -Mlarge_arrays

OFLAG       = -fast

DEBUG       = -Mfree -O0 -traceback

OBJECTS     = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

LLIBS       = -cudalib=cublas,cusolver,cufft,nccl -cuda

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o minimax_dependence.o
SOURCE_O2  := pead.o

# For what used to be vasp.5.lib
CPP_LIB     = $(CPP)
FC_LIB      = nvfortran
CC_LIB      = nvc -w
CFLAGS_LIB  = -O
FFLAGS_LIB  = -O1 -Mfixed
FREE_LIB    = $(FREE)

OBJECTS_LIB = linpack_double.o

# For the parser library
CXX_PARS    = nvc++ --no_warnings

##
## Customize as of this point! Of course you may change the preceding
## part of this file as well if you like, but it should rarely be
## necessary ...
##
# When compiling on the target machine itself, change this to the
# relevant target when cross-compiling for another architecture
VASP_TARGET_CPU ?= -tp host
FFLAGS     += $(VASP_TARGET_CPU)

# Specify your NV HPC-SDK installation (mandatory)
#... first try to set it automatically
NVROOT      = /home/libraries/nvhpc_2023_237_Linux_x86_64_cuda_12.2/hpc_sdk/Linux_x86_64/23.7

# If the above fails, then NVROOT needs to be set manually
#NVHPC      ?= /opt/nvidia/hpc_sdk
#NVVERSION   = 21.11
#NVROOT      = $(NVHPC)/Linux_x86_64/$(NVVERSION)

## Improves performance when using NV HPC-SDK >=21.11 and CUDA >11.2
OFLAG_IN   = -fast -Mwarperf
SOURCE_IN  := nonlr.o

# Software emulation of quadruple precision (mandatory)
QD         ?= $(NVROOT)/compilers/extras/qd
LLIBS      += -L$(QD)/lib -lqdmod -lqd
INCS       += -I$(QD)/include/qd

# BLAS (mandatory)
BLAS        = -lblas

# LAPACK (mandatory)
LAPACK      = -llapack

# scaLAPACK (mandatory)
SCALAPACK   = -Mscalapack

LLIBS      += $(SCALAPACK) $(LAPACK) $(BLAS)

# FFTW (mandatory)
FFTW_ROOT  ?= /home/libraries/fftw/gpu/fftw_install
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS       += -I$(FFTW_ROOT)/include

# HDF5-support (optional but strongly recommended)
CPP_OPTIONS+= -DVASP_HDF5
HDF5_ROOT  ?= /home/libraries/hdf5/gpu/hdf5_install
LLIBS      += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS       += -I$(HDF5_ROOT)/include

# For the VASP-2-Wannier90 interface (optional)
CPP_OPTIONS    += -DVASP2WANNIER90
WANNIER90_ROOT ?= /home/ibraries/wannier90-3.1.0/gpu
LLIBS          += -L$(WANNIER90_ROOT)/lib -lwannier


Re: Possible Bugs in GPU-Vasp 6.4.1 using SOC and IVDW 20/21/202

Posted: Fri Oct 20, 2023 6:35 am
by martin.schlipf
I ran the input files you provided on an NVIDIA Quadro RTX 5000 with both VASP 6.4.1 and the current master. In both cases I obtain -22.16304336, which is consistent with the correct values.

Which compiler did you use? I have nvfortran 22.11 installed.

Re: Possible Bugs in GPU-Vasp 6.4.1 using SOC and IVDW 20/21/202

Posted: Fri Oct 20, 2023 6:57 pm
by guorong_weng
Hi Martin. The problematic data I reported were generated with VASP 6.4.1 compiled using NVHPC 23.7.
I just switched to 22.11, and after recompiling I get the correct results on GPU with IVDW 20, 21, and 202.
Thank you for pointing me in the right direction to solve this problem.