
CUDA tests fail for VASP61+OpenMPI+sequential MKL+CUDA-11.0

Posted: Tue Jul 28, 2020 3:16 pm
by ungur.liviu
Dear VASP developers,

I am trying to compile VASP on my Ubuntu workstation. I use the following configuration:

Linux Precision-7820 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
mpirun (Open MPI) 4.0.1
Intel MKL (sequential)
CUDA-11.0
NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0

The workstation has one GPU card Nvidia Quadro P5000 (16GB).

Compilation worked fine using the attached makefile.include. Verification of vasp_std, vasp_gam, and vasp_ncl passed OK.
Verification of vasp_gpu and vasp_gpu_ncl fails for most of the tests; I include the log and error output below.
I run the tests as:

./runtest cuda.conf > test_cuda.log 2> test_cuda.err

with the only modification being that I run on a single MPI rank, as shown below.
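
For reference, the only change is to the executable lines in cuda.conf; the values below are exactly what the testsuite echoes in its log header:

Code:

VASP_TESTSUITE_EXE_STD="mpirun -np 1 /home/liviu/source/VASP61/vasp.6.1.0/testsuite/../bin/vasp_gpu"
VASP_TESTSUITE_EXE_NCL="mpirun -np 1 /home/liviu/source/VASP61/vasp.6.1.0/testsuite/../bin/vasp_gpu_ncl"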

Please advise on possible solutions. I would like to make full use of this GPU card.

Thank you in advance,

Liviu Ungur

Re: CUDA tests fail for VASP61+OpenMPI+sequential MKL+CUDA-11.0

Posted: Wed Jul 29, 2020 10:55 am
by ungur.liviu
Here are the makefile.include, the test log file, and the error output I obtained.

Code:

# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxGNU\" \
             -DMPI -DMPI_BLOCK=8000 -Duse_collective \
             -DCACHE_SIZE=4000 \
             -Davoidalloc \
             -Dvasp6 \
             -Duse_bse_te \
             -Dtbdyn \
             -Dfock_dblbuf \
             -DVASP2WANNIER90v2 \
             -Dlibbeef

CPP        = gcc -E -P -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)

FC         = /opt/openmpi401_gcc730/bin/mpif90 -m64 -I/opt/intel/mkl/include -I/opt/intel/mkl/include/intel64/lp64
FCL        = /opt/openmpi401_gcc730/bin/mpif90 -m64 -I/opt/intel/mkl/include -I/opt/intel/mkl/include/intel64/lp64

FREE       = -ffree-form -ffree-line-length-none 

FFLAGS     = -w -m64
OFLAG      = -O2 -march=native -m64
OFLAG_IN   = $(OFLAG)
DEBUG      = -O0

LLIBS     += /opt/wannier90-2.1.0_gnu/libwannier.a
LLIBS     += /opt/libbeef_gnu/lib/libbeef.a


MKLROOT    = /opt/intel/mkl
MKL_PATH   = /opt/intel/mkl/lib/intel64
BLAS       = $(MKL_PATH)/libmkl_blas95_lp64.a $(MKL_PATH)/libmkl_lapack95_lp64.a \
             -Wl,--start-group $(MKL_PATH)/libmkl_gf_lp64.a \
             $(MKL_PATH)/libmkl_sequential.a $(MKL_PATH)/libmkl_core.a -Wl,--end-group \
             -lpthread -lm -ldl
BLACS      =
SCALAPACK  =
LLIBS      += $(SCALAPACK) $(LAPACK) $(BLAS)


OBJECTS    = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d.o
INCS       = -I/opt/intel/mkl/include/intel64/lp64 -I/opt/intel/mkl/include  -I/opt/intel/mkl/include/fftw

OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB     = $(FC)
CC_LIB     = gcc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB   = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS   = g++
LIBS       += parser
LLIBS      += -Lparser -lparser -lstdc++

# Normally no need to change this
SRCDIR     = ../../src
BINDIR     = ../../bin

#================================================
# GPU Stuff

CPP_GPU    = -DCUDA_GPU -DRPROMU_CPROJ_OVERLAP -DUSE_PINNED_MEMORY -DCUFFT_MIN=28 -UscaLAPACK -Ufock_dblbuf

OBJECTS_GPU= fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d_gpu.o fftmpiw_gpu.o

CC         = gcc
CXX        = g++
CFLAGS     = -fPIC -DADD_ -Wall -fopenmp -DMAGMA_WITH_MKL -DMAGMA_SETAFFINITY -DGPUSHMEM=300 -DHAVE_CUBLAS

CUDA_ROOT  ?= /usr/local/cuda-11.0
NVCC       := $(CUDA_ROOT)/bin/nvcc -ccbin=g++

CUDA_LIB   := -L$(CUDA_ROOT)/lib64 -L$(CUDA_ROOT)/lib64/stubs \
              -lnvToolsExt -lcudart -lcuda -lcufft -lcublas \
              -I$(CUDA_ROOT)/include -lpthread -lm -ldl

GENCODE_ARCH    := -gencode=arch=compute_61,code=\"sm_61,compute_61\" 
MPI_INC    = /opt/openmpi401_gcc730/include/
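
The GENCODE_ARCH setting targets compute capability 6.1, which matches the Quadro P5000 (Pascal). As a sanity check, the card's compute capability can be confirmed with the deviceQuery sample that ships with the CUDA 11.0 toolkit (a quick sketch, assuming the samples were installed along with the toolkit and the directory is writable):

Code:

# Build and run deviceQuery to confirm the card reports CUDA Capability 6.1,
# matching -gencode=arch=compute_61,code=\"sm_61,compute_61\"
cd /usr/local/cuda-11.0/samples/1_Utilities/deviceQuery
make
./deviceQuery | grep -i capability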

A chunk from the test_cuda.log:

Code:

==================================================================
VASP TESTSUITE SHA:

Reference files have been generated with 4 MPI ranks.
Note that tests might fail if an other number of ranks is used!

Executables and additional INCAR tags used for this test:

VASP_TESTSUITE_EXE_STD="mpirun -np 1 /home/liviu/source/VASP61/vasp.6.1.0/testsuite/../bin/vasp_gpu"
VASP_TESTSUITE_EXE_NCL="mpirun -np 1 /home/liviu/source/VASP61/vasp.6.1.0/testsuite/../bin/vasp_gpu_ncl"

Executed at: 23_09_07/28/20
==================================================================

------------------------------------------------------------------
CASE: bulk_AlNC_RPR
------------------------------------------------------------------
bulk_AlNC_RPR step STD
------------------------------------------------------------------
Using device 0 (rank 0, local rank 0, local size 1) : Quadro P5000
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
  
 *******************************************************************************
  You are running the GPU port of VASP! When publishing results obtained with
  this version, please cite:
   - M. Hacene et al., http://dx.doi.org/10.1002/jcc.23096
   - M. Hutchinson and M. Widom, http://dx.doi.org/10.1016/j.cpc.2012.02.017
  
  in addition to the usual required citations (see manual).
  
  GPU developers: A. Anciaux-Sedrakian, C. Angerer, and M. Hutchinson.
 *******************************************************************************
  
 vasp.6.1.0 28Jan20 (build Jul 28 2020 22:33:10) complex                         
 POSCAR found :  2 types and       4 ions
 -----------------------------------------------------------------------------
|                                                                             |
|               ----> ADVICE to this user running VASP <----                  |
|                                                                             |
|     You have a (more or less) 'small supercell' and for smaller cells       |
|     it is recommended to use the reciprocal-space projection scheme!        |
|     The real-space optimization is not efficient for small cells and it     |
|     is also less accurate ...                                               |
|     Therefore, set LREAL=.FALSE. in the INCAR file.                         |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for Ceperly-Alder, standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
creating 32 CUDA streams...
creating 32 CUFFT plans with grid size 16 x 16 x 16...
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)

ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------


CASE: bulk_Alortho_RPR
------------------------------------------------------------------
bulk_Alortho_RPR step STD
------------------------------------------------------------------
Using device 0 (rank 0, local rank 0, local size 1) : Quadro P5000
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
  
 *******************************************************************************
  You are running the GPU port of VASP! When publishing results obtained with
  this version, please cite:
   - M. Hacene et al., http://dx.doi.org/10.1002/jcc.23096
   - M. Hutchinson and M. Widom, http://dx.doi.org/10.1016/j.cpc.2012.02.017
  
  in addition to the usual required citations (see manual).
  
  GPU developers: A. Anciaux-Sedrakian, C. Angerer, and M. Hutchinson.
 *******************************************************************************
  
 vasp.6.1.0 28Jan20 (build Jul 28 2020 22:33:10) complex                         
 POSCAR found :  2 types and       4 ions
 LDA part: xc-table for Ceperly-Alder, standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
creating 32 CUDA streams...
creating 32 CUFFT plans with grid size 18 x 16 x 16...
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)

ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------


CASE: bulk_AlPAWNC_RPR
------------------------------------------------------------------
bulk_AlPAWNC_RPR step STD
------------------------------------------------------------------
Using device 0 (rank 0, local rank 0, local size 1) : Quadro P5000
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
  
 *******************************************************************************
  You are running the GPU port of VASP! When publishing results obtained with
  this version, please cite:
   - M. Hacene et al., http://dx.doi.org/10.1002/jcc.23096
   - M. Hutchinson and M. Widom, http://dx.doi.org/10.1016/j.cpc.2012.02.017
  
  in addition to the usual required citations (see manual).
  
  GPU developers: A. Anciaux-Sedrakian, C. Angerer, and M. Hutchinson.
 *******************************************************************************
  
 vasp.6.1.0 28Jan20 (build Jul 28 2020 22:33:10) complex                         
 POSCAR found :  2 types and       4 ions
LDA part: xc-table for Ceperly-Alder, standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
creating 32 CUDA streams...
creating 32 CUFFT plans with grid size 16 x 16 x 16...
 FFT: planning ...
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)

CUDA Error in cuda_mem.cu, line 68: pointer does not correspond to a registered memory region
 Failed to unregister pinned memory!
 
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files energy_outcar and energy_outcar.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the test yields different results for the forces, please check
---------------------------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files force and force.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

ERROR: the stress tensor is different, please check
---------------------------------------------------
 ---------------------------------------------------------------------------
 WARNING: Number of rows and/or columns in files stress and stress.ref disagree.
 Please check! Continuing using the smaller number of columns and/or rows.
 ---------------------------------------------------------------------------

A chunk from the test_cuda.err:

Code:


CASE: bulk_AlNC_RPR
entering run_recipe bulk_AlNC_RPR
bulk_AlNC_RPR step STD
entering run_vasp

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fdd7c65e2ed in ???
#1  0x7fdd7c65d503 in ???
#2  0x7fdd6ac60fcf in ???
#3  0x562d98ff9434 in ???
#4  0x562d990306cb in ???
#5  0x562d994803e0 in ???
#6  0x562d9946608f in ???
#7  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Precision-7820 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
exiting run_vasp
exiting run_recipe bulk_AlNC_RPR
CASE: bulk_Alortho_RPR
entering run_recipe bulk_Alortho_RPR
bulk_Alortho_RPR step STD
entering run_vasp

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f9aace012ed in ???
#1  0x7f9aace00503 in ???
#2  0x7f9a9b403fcf in ???
#3  0x562449917434 in ???
#4  0x56244994e6cb in ???
#5  0x562449d9e3e0 in ???
#6  0x562449d8408f in ???
#7  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Precision-7820 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
exiting run_vasp
exiting run_recipe bulk_Alortho_RPR
CASE: bulk_AlPAWNC_RPR
entering run_recipe bulk_AlPAWNC_RPR
bulk_AlPAWNC_RPR step STD
entering run_vasp

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f17562082ed in ???
#1  0x7f1756207503 in ???
#2  0x7f174480afcf in ???
#3  0x55e41efbf0a7 in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_globals.h:61
#4  0x55e41efbf16e in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:77
#5  0x55e41efbf16e in nvpinnedfree_C
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:68
#6  0x55e41a28ff0e in ???
#7  0x55e41a63c734 in ???
#8  0x55e41a6756cb in ???
#9  0x55e41aac53e0 in ???
#10  0x55e41aaab08f in ???
#11  0x55e41aac5b19 in ???
#12  0x7f17447edb96 in ???
#13  0x55e41a1ca259 in ???
#14  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Precision-7820 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
exiting run_vasp
exiting run_recipe bulk_AlPAWNC_RPR
CASE: bulk_AlPAW_RPR
entering run_recipe bulk_AlPAW_RPR
bulk_AlPAW_RPR step STD
entering run_vasp

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f6f021bc2ed in ???
#1  0x7f6f021bb503 in ???
#2  0x7f6ef07befcf in ???
#3  0x558da8cbf0a7 in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_globals.h:61
#4  0x558da8cbf16e in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:77
#5  0x558da8cbf16e in nvpinnedfree_C
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:68
#6  0x558da3f8ff0e in ???
#7  0x558da433c760 in ???
#8  0x558da43756cb in ???
#9  0x558da47c53e0 in ???
#10  0x558da47ab08f in ???
#11  0x558da47c5b19 in ???
#12  0x7f6ef07a1b96 in ???
#13  0x558da3eca259 in ???
#14  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Precision-7820 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
exiting run_vasp
exiting run_recipe bulk_AlPAW_RPR
CASE: bulk_AlPAWUS_RPR
entering run_recipe bulk_AlPAWUS_RPR
bulk_AlPAWUS_RPR step STD
entering run_vasp

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fd5ce36d2ed in ???
#1  0x7fd5ce36c503 in ???
#2  0x7fd5bc96ffcf in ???
#3  0x55838c76f0a7 in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_globals.h:61
#4  0x55838c76f16e in _Z12__cuda_error9cudaErrorPKciS1_
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:77
#5  0x55838c76f16e in nvpinnedfree_C
	at /home/liviu/source/VASP61/vasp.6.1.0/build/gpu/CUDA/cuda_mem.cu:68
#6  0x558387a3ff0e in ???
#7  0x558387dec734 in ???
#8  0x558387e256cb in ???
#9  0x5583882753e0 in ???
#10  0x55838825b08f in ???
#11  0x558388275b19 in ???
#12  0x7fd5bc952b96 in ???
#13  0x55838797a259 in ???
#14  0xffffffffffffffff in ???
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node Precision-7820 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
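
The backtraces consistently point at nvpinnedfree_C (cuda_mem.cu, line 68): unregistering pinned host memory fails with "pointer does not correspond to a registered memory region", and the error path then segfaults. To localize this further, I can rerun a single failing case by hand under cuda-memcheck, which reports the first failing CUDA API call directly (a sketch; I am assuming the test inputs live under testsuite/tests/bulk_AlNC_RPR):

Code:

# Run one failing test case under cuda-memcheck (shipped with CUDA 11.0)
# to see which CUDA API call fails first.
cd /home/liviu/source/VASP61/vasp.6.1.0/testsuite/tests/bulk_AlNC_RPR
mpirun -np 1 cuda-memcheck --report-api-errors all \
    /home/liviu/source/VASP61/vasp.6.1.0/bin/vasp_gpu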
I attempted to attach a zip file with these files to this message, but it failed with: "Sorry, the board attachment quota has been reached."
I am happy to provide additional information about this error if needed.