VASP GPU Performance


#1 Post by **dominic_varghese** » Tue Jun 30, 2026 1:31 pm

Hi everyone,

I am running VASP/6.5.1-nvhpc-gpu on V100 GPUs to speed up my AIMD calculations on a metal with a dense k-grid. The following is my submit script:


#SBATCH --gpus=v100-32:8           
#SBATCH --ntasks=8                  
#SBATCH --cpus-per-task=4            

ulimit -s unlimited

module purge
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/packages/hdf5/hdf5-2.0.0/nvhpc/lib
module use -a /opt/packages/nvhpc/v25.5/modulefiles
module load nvhpc-hpcx-cuda12/25.5 intel-mkl/2023.2.0 VASP/6.5.1-nvhpc-gpu
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

mpirun -np 8 vasp_std > log

and the INCAR for the NPT run at 300K on a 2x2x2 supercell with 72 atoms:


ISTART = 0

# Hardware & Performance
NCORE   = 8
ALGO    = Normal
PREC    = Normal
NSIM = 16

# Electronic Optimization (Matched to paper)
ENCUT   = 450
EDIFF   = 1.0e-8
NELM    = 100
NELMIN  = 4
GGA     = PS

# Smearing 
ISMEAR  = 0
SIGMA   = 0.01

LREAL  = A      # (Projection operators: automatic)

ML_ISTART = 0     # Start from scratch

ISYM    = 0      # Essential for MD

# --- MD & NPT Settings ---
IBRION  = 0
NSW     = 1000
POTIM   = 1.0
TEBEG   = 300
TEEND   = 300

MDALGO  = 3      # Langevin
ISIF    = 3      # Variable cell (NPT)

# Friction coefficients 
LANGEVIN_GAMMA   = 10.0 10.0 10.0 
LANGEVIN_GAMMA_L = 10.0
PMASS            = 100

ML_LMLFF = .TRUE. # Enable Machine Learning Force Field
ML_MODE  = TRAIN  # Train on the fly

Is this the best way to get the maximum performance and speed-up from running the code on GPU compared to CPU?

Do you have any suggestions, or am I making any mistakes?

Thanks
Dominic

#2 Post by **michael_wolloch** » Tue Jun 30, 2026 2:32 pm

Dear Dominic,

Unfortunately, performance optimization is no easy task, and you will have to do some benchmark calculations to test out your settings.

Make sure you read our guide on Optimizing the parallelization.

To benchmark, I would use the same system you are trying to run MD on, but only do around 15-20 SCF steps and no ionic updates. Make sure that you only tweak one setting at a time (e.g., NSIM, or OMP_NUM_THREADS) and systematically go through combinations.
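A minimal sketch of such a benchmark sweep, assuming the production inputs sit in one directory and that NSIM is the setting being varied. The helper names (`set_tag`, `make_variant`, `mean_loop_time`) and the `bench_nsim_*` directory names are hypothetical. Switching off the on-the-fly ML force field and setting IBRION = -1 for the static run are assumptions made here, not part of the advice above, and the timing helper assumes OUTCAR reports each SCF step on a line starting with `LOOP:` that ends with the real time in seconds.

```bash
#!/usr/bin/env bash
# Sketch: build one short benchmark directory per NSIM value from an existing INCAR.

# set_tag FILE TAG VALUE: drop any existing "TAG = ..." line, then append the new one.
set_tag() {
    local file=$1 tag=$2 value=$3
    grep -v -E "^[[:space:]]*${tag}[[:space:]]*=" "$file" > "$file.tmp" || true
    mv "$file.tmp" "$file"
    printf '%s = %s\n' "$tag" "$value" >> "$file"
}

# make_variant SRC_INCAR DIR NSIM: copy the inputs and apply the benchmark settings.
make_variant() {
    local src=$1 dir=$2 nsim=$3 srcdir f
    srcdir=$(dirname "$src")
    mkdir -p "$dir"
    cp "$src" "$dir/INCAR"
    # Copy the remaining inputs if they exist next to the source INCAR.
    for f in POSCAR KPOINTS POTCAR; do
        if [ -f "$srcdir/$f" ]; then cp "$srcdir/$f" "$dir/"; fi
    done
    set_tag "$dir/INCAR" NSW 0              # no ionic updates
    set_tag "$dir/INCAR" IBRION -1          # assumption: plain static calculation
    set_tag "$dir/INCAR" NELM 20            # at most 20 SCF steps
    set_tag "$dir/INCAR" ML_LMLFF .FALSE.   # assumption: time the DFT part only
    set_tag "$dir/INCAR" NSIM "$nsim"       # the one setting varied in this sweep
}

# mean_loop_time OUTCAR: average real time per SCF step (assumed "LOOP:" line format).
mean_loop_time() {
    awk '$1 == "LOOP:" { sum += $NF; n++ } END { if (n) printf "%.3f\n", sum / n }' "$1"
}

# Only runs when an INCAR is present in the current directory.
if [ -f INCAR ]; then
    for nsim in 4 8 16 32; do
        make_variant INCAR "bench_nsim_$nsim" "$nsim"
    done
fi
```

After submitting each directory with the usual job script, `mean_loop_time bench_nsim_16/OUTCAR` gives one number per variant to compare. The same pattern works for other settings: change exactly one `set_tag` line per sweep and keep everything else fixed.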

Some general points:

KPAR is for sure your best friend in increasing performance if you have more than 1 kpoint. 72 atoms is a pretty small system, so you might not run on the Gamma point only (please provide all input files as a zipped archive in the future as per the posting guidelines). So, if you have N kpoints, run on N GPUs (if possible) and use KPAR=N. Note that your VRAM usage will increase if you increase KPAR, so on 32GB V100 GPUs, this could become an issue at some point.
ALGO=Fast is usually better than ALGO=Normal for performance on GPU, especially if more GPUs are used. You will have to test for your system, however.
Try fewer GPUs if you don't have as many kpoints as GPUs. VASP generally has a hard time utilizing the compute of large GPUs in most calculations and is memory bandwidth bound.
NCORE will be set to 1 for all GPU runs internally. Don't bother with it.
The number of MPI ranks should always be equal to the number of GPUs you are running on.
Compile with NCCL if you have not already done so.
Your SIGMA is pretty small; you might need fewer SCF steps per ionic step if you increase it. But be sure to check if your entropy term is still reasonable.
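The KPAR and rank-count points above can be turned into a first guess for a job layout. The sketch below is only a starting point for benchmarking, not a tuned recipe. The helper names `count_kpoints` and `suggest_layout` are hypothetical, and the k-point count is read from the second line of an IBZKPT file written by an earlier run on the same system.

```bash
#!/usr/bin/env bash
# Sketch: derive a starting GPU/rank/KPAR layout from the number of k-points.

# count_kpoints IBZKPT: the second line of IBZKPT holds the number of irreducible k-points.
count_kpoints() {
    awk 'NR == 2 { print $1; exit }' "$1"
}

# suggest_layout NKPTS GPUS_AVAILABLE:
# use at most one GPU per k-point, one MPI rank per GPU, and KPAR equal to the GPU count.
suggest_layout() {
    local nk=$1 avail=$2 gpus
    if [ "$nk" -lt "$avail" ]; then
        gpus=$nk        # fewer k-points than GPUs: request fewer GPUs
    else
        gpus=$avail     # otherwise use every available GPU
    fi
    printf 'GPUS=%d RANKS=%d KPAR=%d\n' "$gpus" "$gpus" "$gpus"
}

# Only runs when an IBZKPT file is present; 8 is the GPU count from the submit script.
if [ -f IBZKPT ]; then
    suggest_layout "$(count_kpoints IBZKPT)" 8
fi
```

For example, `suggest_layout 4 8` prints `GPUS=4 RANKS=4 KPAR=4`. If the k-point count is not a multiple of the suggested KPAR, some k-point groups get more work than others, and a larger KPAR also raises VRAM use per GPU, so compare the suggestion against a smaller KPAR in the benchmarks.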

Let me know if you have more questions,
Cheers, Michael
