

Posted: Thu Feb 25, 2016 5:52 pm
by kachme
Hello dear VASP team,

last week I compiled the GPU version of VASP with this Makefile:


# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"Lichteb-5.41-gpu-half\" -DIFC \
             -DNGXhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
             -DMPI_BLOCK=8000 -Duse_collective \
             -DnoAugXCmeta -Duse_bse_te \
             -Duse_shmem -Dkind8

CPP        = fpp -f_com=no -free -w0  $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

FC         = mpiifort
FCL        = mpiifort -mkl -lstdc++

FREE       = -free -names lowercase

FFLAGS     = -assume byterecl
OFLAG      = -O2
DEBUG      = -O0

MKL_PATH   = $(MKLROOT)/lib/intel64
BLAS       =
LAPACK     =
BLACS      = -lmkl_blacs_intelmpi_lp64
SCALAPACK  = $(MKL_PATH)/libmkl_scalapack_lp64.a $(BLACS)

OBJECTS    = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d.o

INCS       =-I$(MKLROOT)/include/fftw


#OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB     = $(FC)
CC_LIB     = icc

OBJECTS_LIB= linpack_double.o getshmem.o

# Normally no need to change this
SRCDIR     = ../../src
BINDIR     = ../../bin

# GPU Stuff


OBJECTS_GPU = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d_gpu.o fftmpiw_gpu.o

CUDA_ROOT  := /shared/apps/cuda/7.5
NVCC       := $(CUDA_ROOT)/bin/nvcc
CUDA_LIB   := -L$(CUDA_ROOT)/lib64 -L$(CUDA_ROOT)/lib64/stubs -lnvToolsExt -lcudart -lcuda -lcufft -lcublas

GENCODE_ARCH    := -gencode=arch=compute_30,code=\"sm_30,compute_30\" -gencode=arch=compute_35,code=\"sm_35,compute_35\"

MPI_INC    =/shared/apps/intel/2016u2/compilers_and_libraries_2016.2.181/linux/mpi/intel64/include
However, when I try to run a calculation on one node (16 cores, 32 GByte main memory, 2x "NVIDIA Tesla K20X"), I get the following error message for a system of 318 atoms.


creating 32 CUFFT plans with grid size 126 x 126 x 126...

 Failed to create CUFFT plan!
which seems to point to some kind of memory problem (although the CPU version runs without problems). Yet the host memory usage is small (from the LSF output file):


Exited with exit code 255.

Resource usage summary:

    CPU time :               35.24 sec.
    Max Memory :             195 MB
    Average Memory :         195.00 MB
    Total Requested Memory : 28016.00 MB
    Delta Memory :           27821.00 MB
    (Delta: the difference between total requested memory and actual max usage.)
    Max Processes :          8
    Max Threads :            9
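For reference, a back-of-the-envelope sketch of the device memory those plans need (assuming double-complex data at 16 bytes per grid point; cuFFT's own work areas come on top of this):

```shell
# Rough estimate of device memory just to hold 32 grids of 126x126x126
# double-complex points (16 bytes each); cuFFT work areas add more on top.
plans=32; n=126
bytes=$((plans * n * n * n * 16))
echo "$((bytes / 1024 / 1024)) MB"   # ~976 MB before work areas
```

Together with work areas and the wavefunction arrays, this can plausibly exhaust the ~6 GB on a K20X even while host memory usage stays small.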

If I reduce the system size (40 atoms, 2x2x2 k-points), it runs without errors, but very slowly: ~10 times slower than the CPU version. I even aborted the 4x4x4 KPOINTS run because it was just too slow. Playing around with NSIM doesn't seem to change much.

My guess is that it has something to do with the compilation. I would like to experiment with the values given in the first part of the Makefile, the CPP_OPTIONS (-DCACHE_SIZE=4000, -DMPI_BLOCK=8000), but I have no idea which values to plug in.

Help would be very much appreciated, thank you,
Kai Meyer


Posted: Thu Dec 21, 2017 3:29 am
by zhouych
Hi Kai,

I have encountered the same issue. Have you solved it? Thanks.

Yecheng Zhou


Posted: Wed Jun 13, 2018 4:19 pm
by jperaltac
Hello, I have the same issue with VASP 5.4.4 and CUDA 9.1.

Is there any hint or solution to this issue?

Thanks in advance


Posted: Thu Jul 11, 2019 12:35 am
by hacret
You can try recompiling the source after adding the following to the GPU stuff section:

#GPU Stuff
OBJECTS_GPU = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d_gpu.o fftmpiw_gpu.o
CC = icc
CXX = icpc

CUDA_ROOT := /usr/cuda
NVCC := $(CUDA_ROOT)/bin/nvcc -ccbin=icc -std=c++11
CUDA_LIB := -L$(CUDA_ROOT)/lib64 -lnvToolsExt -lcudart -lcuda -lcufft -lcublas
GENCODE_ARCH := -gencode=arch=compute_30,code=\"sm_30,compute_30\"
MPI_INC = $(I_MPI_ROOT)/intel64/include


Posted: Mon Nov 25, 2019 5:10 pm
by guiyang_huang1
The CPP_OPTIONS (-DCACHE_SIZE=4000, -DMPI_BLOCK=8000) should be irrelevant to your problem.

You can set any value and recompile to check the effect on performance, for example 1024 or 2048.
According to my tests, they have only a small influence on performance.
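For example, a hypothetical sketch of what such an edit to CPP_OPTIONS would look like (the option string here is shortened and the values are illustrative only; a full rebuild is needed so every object file picks up the new define):

```shell
# Hypothetical, shortened excerpt of CPP_OPTIONS; try e.g. CACHE_SIZE=2048,
# then rebuild from scratch so all objects see the new preprocessor define.
CPP_OPTIONS='-DMPI -DCACHE_SIZE=4000 -DMPI_BLOCK=8000'
CPP_OPTIONS=$(echo "$CPP_OPTIONS" | sed 's/CACHE_SIZE=4000/CACHE_SIZE=2048/')
echo "$CPP_OPTIONS"
```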

There are many makefile.include examples on the website for Intel CPUs and GPUs. You can find one there.

I cannot access a system with Intel CPUs and GPUs; I can only access IBM POWER9 systems with V100 GPUs.

According to my tests, MAGMA has a very large influence on the speed of GPU VASP. I recommend installing it.

Your problem seems to be related to insufficient memory, since a smaller system runs without error.

If the compilation is fine, you can reduce the number of MPI ranks per GPU.
Fewer MPI ranks require less memory.
For example, you can use the same number of CPU ranks as GPUs, i.e. only one MPI rank per GPU.
If memory is still insufficient, KPAR can be decreased to 1.

If MAGMA is used, a multithreading setting can improve performance significantly.
If MAGMA is not used, multithreading may have negative effects.

NSIM can also be tuned; if memory is insufficient, a smaller NSIM can be used. I set NSIM=14.
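Putting these suggestions together, a job setup for a 2-GPU node might look like this (a sketch only; the binary name, paths, and values are placeholders for your own installation):

```shell
# Hypothetical job-script fragment: one MPI rank per GPU (2 ranks for 2 GPUs),
# with the memory-saving INCAR settings discussed above:
#   INCAR:  KPAR = 1
#           NSIM = 14
mpirun -np 2 vasp_gpu
```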