I am running VASP 6.5.1 on a 4xA100 GPU node on Perlmutter (https://docs.nersc.gov/systems/perlmutter/architecture) and frequently run into an error that is specific to this machine when ALGO = All.
The error seen in the stderr is:
 -----------------------------------------------------------------------------
|                                                                             |
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: rot.F at line: 822                                   |
|                                                                             |
|     EDWAV: internal error, the gradient is not orthogonal 1 2 6.446e-4      |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------
I recognize that this is a problem that the VASP team has not been able to independently reproduce in the past, but I am reporting another instance of it in case it helps.
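For anyone triaging several runs, the values reported in the EDWAV line could be collected from stderr files with something like the sketch below. It assumes the message format shown in the banner above, with two integers and one floating-point number after "the gradient is not orthogonal". The function name `find_edwav_errors` and the regular expression are illustrative only and are not part of VASP.

```python
import re
import sys

# Pattern for the EDWAV message as it appears in the banner above.
# Assumption: the three trailing fields are two integers followed by one
# floating-point value; their precise meaning is not established here.
EDWAV_RE = re.compile(
    r"EDWAV: internal error, the gradient is not orthogonal\s+"
    r"(\d+)\s+(\d+)\s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)


def find_edwav_errors(text):
    """Return a list of (first_index, second_index, value) tuples found in text."""
    return [(int(a), int(b), float(v)) for a, b, v in EDWAV_RE.findall(text)]


if __name__ == "__main__":
    # Usage: python find_edwav.py stderr_file [stderr_file ...]
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            for hit in find_edwav_errors(fh.read()):
                print(path, *hit)
```

Running this over the stderr files of the failing jobs would show whether the same pair of indices and a similar magnitude recur across runs, which may help narrow down a reproducer.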
You can find one of many examples attached. I compiled VASP with the Cray-specific PrgEnv-nvidia module, and I believe this error is very specific to how VASP is compiled. In the past, I have seen it depend on the optimization flag used when compiling VASP. This time, however, changing OFLAG from -fast to -O2 made no difference.
My makefile.include is also attached:
# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxNV_CrayMPICH\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dqd_emulate \
-Dfock_dblbuf \
-D_OPENMP \
-DACC_OFFLOAD \
-DNVCUDA \
-DUSENCCL \
-DPROFILING \
-DVASP_HDF5 \
-DVASP2WANNIER90 \
-Dlibvaspml \
-DVASPML_USE_CBLAS
# -DDFTD4
### Disabled for GPU build:
# -Dsysv \
# -DUSELIBXC \
# -Dlibbeef \
CPP = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)
CC = nvc -mp -acc -gpu=cc80 #,cuda11.8
FC = ftn -mp -acc=gpu -gpu=cc80
FCL = ftn -v -mp -acc=gpu -gpu=cc80 -c++libs
FREE = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG = -fast
DEBUG = -Mfree -O0 -traceback
#OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o
LLIBS = -cudalib=cublas,cusolver,cufft,nccl -cuda
# Redefine the standard list of O1 and O2 objects
SOURCE_O1 := pade_fit.o minimax_dependence.o wave_window.o
SOURCE_O2 := pead.o
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = $(CC)
CFLAGS_LIB = -O -w
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB = $(FREE)
OBJECTS_LIB = linpack_double.o
# For the parser library
CXX_PARS = nvc++ --no_warnings
#######
# Specify your NV HPC-SDK installation (mandatory)
NVROOT =$(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
# Software emulation of quadruple precision (mandatory)
QD ?= $(NVROOT)/compilers/extras/qd
LLIBS += -L$(QD)/lib -lqdmod -lqd -Wl,-rpath=$(QD)/lib
INCS += -I$(QD)/include/qd
LLIBS += -L$(NVROOT)/math_libs/lib64 -Wl,-rpath=$(NVROOT)/math_libs/lib64
# mandatory
BLAS = -lblas
LAPACK = -llapack
SCALAPACK = -L/global/cfs/cdirs/omp/local/scalapack-2.1.0/nvidia-22.5/milan -lscalapack
LLIBS += $(SCALAPACK) $(LAPACK) $(BLAS)
# use cray-fftw module for FFTs
# optional packages:
# Use cusolvermp (not on Perlmutter as does not work with SS11/libfabric)
# supported as of NVHPC-SDK 24.1 (and needs CUDA-11.8)
#CPP_OPTIONS+= -DCUSOLVERMP
#LLIBS += -cudalib=cusolvermp
#CFLAGS_LIB += -cudalib=cusolvermp
#OBJECTS_LIB+= cal_mpi.o
# NCCL (GPU builds only)
LLIBS += -L$(NCCL_DIR)/lib -Wl,-rpath=$(NCCL_DIR)/lib
INCS += -I$(NCCL_DIR)/include
# HDF5 (vasp >6.2.0 only)
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# fftlib
#FCL += fftlib.o
#CXX_FFTLIB = nvc++ -mp --no_warnings -std=c++11 -DFFTLIB_THREADSAFE
#INCS_FFTLIB = -I./include -I$(FFTW_ROOT)/include
#LIBS += fftlib
#LLIBS += -ldl
# For machine learning library vaspml (experimental)
CXX_ML = CC
CXXFLAGS_ML = -O3 --c++17 --no_warnings
# PATH FOR PLUGIN BUILDS
EXTLIBDIR = $(CFS)/omp/local
# wannier90
WANNIER90_ROOT = /global/common/software/nersc9/vasp/dependencies/wannier90/nvidia-23.9/3.1.0/lib
LLIBS += -L$(WANNIER90_ROOT) -lwannier

