VASP test run: Inconsistency in the energy output (TOTEN)

mokhotjwa_dhalamini
Newbie

#1 Post by mokhotjwa_dhalamini » Tue Aug 26, 2025 10:00 am

Dear colleague,

We recently purchased VASP licenses, and VASP has since been installed on our HPC clusters.
I decided to test it with a simple job on a 64-atom Si supercell and to record the final total energy (TOTEN). My observations are as follows. With 10 CPUs, I obtained this final total energy:

CPU_10: 970.5012073 eV
However, when I ran the same job three more times with exactly the same set of parameters, I obtained three different results:

CPU_10_run1: 980.21186019 eV
CPU_10_run2: 977.75296928 eV
CPU_10_run3: 966.45036876 eV

Even more bizarre, when I simply increased the number of CPUs for the same job to 20, 30 and 40, I obtained a different TOTEN each time:

CPU_20: 963.50386533 eV
CPU_30: 972.72262986 eV
CPU_40: 972.36611149 eV
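
To put a number on the scatter, here is a minimal Python sketch that uses only the seven TOTEN values quoted in this post. The dictionary labels, the helper name `spread`, and the comparison against EDIFF are illustrative choices, not part of any VASP tooling.

```python
# TOTEN values (eV) exactly as quoted in this post; the labels are illustrative.
totens = {
    "CPU_10": 970.5012073,
    "CPU_10_run1": 980.21186019,
    "CPU_10_run2": 977.75296928,
    "CPU_10_run3": 966.45036876,
    "CPU_20": 963.50386533,
    "CPU_30": 972.72262986,
    "CPU_40": 972.36611149,
}

N_ATOMS = 64   # size of the Si supercell described in this post
EDIFF = 1e-3   # electronic convergence criterion (eV) from the INCAR in this post


def spread(values):
    """Return max - min of an iterable of energies."""
    values = list(values)
    return max(values) - min(values)


total_spread = spread(totens.values())

print(f"spread over all runs: {total_spread:.4f} eV")
print(f"spread per atom:      {total_spread / N_ATOMS:.4f} eV/atom")
print(f"spread / EDIFF:       {total_spread / EDIFF:.0f}")
```

The spread is several orders of magnitude larger than EDIFF, so the differences cannot be attributed to ordinary convergence noise.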

I emphasize that exactly the same set of parameters was used in every case above. The only differences are the number of CPUs and the time at which each job was run. The INCAR file used for all the jobs is given below:

ENCUT = 520
GGA = PE
####IVDW = 0
ALGO = All
EDIFF = 1e-3
ISMEAR = 0
LASPH = .TRUE.
PREC = Normal
SIGMA = 0.2
EDIFFG = -0.01
IBRION = 2
ISIF = 3
ISYM = 1
NSW = 1000

Thank you.


andreas.singraber
Global Moderator

#2 Post by andreas.singraber » Wed Aug 27, 2025 12:48 pm

Hello!

It seems the structure optimization you are running (IBRION = 2) is not converging with the given settings (maybe because of EDIFF, EDIFFG or ISIF...). However, without further information I can only speculate about the origin of the issue. Please upload all required input (e.g. INCAR, POTCAR, KPOINTS, POSCAR, submit script,...) and useful output files (OUTCAR, OSZICAR, screen output,...) according to the forum posting guidelines so I can have a closer look. Thank you!

All the best,
Andreas Singraber


mokhotjwa_dhalamini
Newbie

#3 Post by mokhotjwa_dhalamini » Thu Aug 28, 2025 10:10 am

Dear VASP team,

Thank you for your response.
The attached .zip archive contains the input and output files for the VASP jobs described earlier. The job submission script, 'job_script_vasp', is also included in the archive.

Thank you.


andreas.singraber
Global Moderator

#4 Post by andreas.singraber » Fri Aug 29, 2025 10:15 pm

Hello!

Unfortunately, I could not reproduce the problem you described. I was able to use a very similar compiler (Intel oneAPI 2022) with both Intel MPI and OpenMPI. I also tried various numbers of MPI tasks, including 10, 20 and 30, and I used only the input files you sent previously. So everything seems very similar, and indeed the OUTCAR files are almost identical... except for one contribution labelled "-V(xc)+E(xc) XCENC", which already differs in the first iteration (see also the attached OUTCAR file).

[Attached image: outcar-comparison.png]

The huge difference continues in all iterations and prevents convergence in your case. In my tests there was no noticeable difference in energies between different executions. All trials converged within two steps to the same TOTEN energy.

Currently I have no clue where this discrepancy could come from... my only guess is that there could be an issue with the optimization level used during compilation. Usually we pick -O2 as the default optimization flag, but for some source files we even need to resort to the less aggressive -O1. Maybe your combination of compiler and hardware requires further restrictions on the optimization level. I will discuss this possibility with my colleagues, but we would need some more information from your side. Could you please provide the makefile.include used for the VASP build? Also, please mention which hardware you are using. Were you able to run the testsuite successfully after building VASP?

Thank you very much!

Best regards,
Andreas Singraber


mokhotjwa_dhalamini
Newbie

#5 Post by mokhotjwa_dhalamini » Fri Sep 12, 2025 4:32 pm

Thanks,

The hardware we are using is the following:

HPE DL360 Gen9 servers
2 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz with 14 cores on each socket.
256GB memory.

The most important change we made to the installation is that we recompiled with the -O1 optimization level. With this build, we ran the testsuite successfully. We also obtained consistent results (with varying numbers of CPUs) for the 64-atom silicon relaxation job. You had tested the same job, and I can confirm that your result and ours now look similar. So, by all indications, the -O1 build works correctly. Recall that the -O2 optimization level gave erratic and inconsistent results (with changing numbers of CPUs); moreover, the VASP testsuite jobs failed with that optimization level.

However, we now face a different problem with the -O1 build: the jobs take longer to complete than the reference VASP testsuite jobs. Do you have any suggestions as to why this is the case, or how we can achieve similarly fast completion times?

I have attached the makefile.include (“makefile.include_-O1-level”) and the testsuite log file (“testsuite.log_-O1-level”) for the -O1 build.

I have also attached the makefile.include (“makefile.include_-O2-level”) and the testsuite log file (“testsuite.log_-O2-level”) for the -O2 build that gave the erratic results reported earlier. With this optimization level, the tests could not be completed successfully, as you will see in the log.

Best regards.


andreas.singraber
Global Moderator

#6 Post by andreas.singraber » Sat Sep 13, 2025 10:11 pm

Hello!

Thank you for uploading the files, this helped a lot! Yes, of course compiling the whole code with -O1 will significantly deteriorate the performance. The idea was to selectively lower the optimization level only for the individual source files that cause the problem...

Anyway, this is not necessary here, luckily I have found a better solution :)! I was able to reproduce the issue on a similar machine (Intel E5-2697A v4 16-core CPU) with the same or very similar compiler (Intel oneAPI 2022.0.1) using your makefile.include. The crucial point here is that the Intel oneAPI kit at that time offered two distinct Fortran compilers:

  1. The Intel C/C++ and Fortran Compiler Classic with the compiler binaries icc, icpc, ifort and the MPI wrappers mpiicc, mpiicpc, mpiifort.
  2. The Intel oneAPI C/C++ and Fortran Compiler with the compiler binaries icx, icpx, ifx and the MPI wrappers mpiicx, mpiicpx, mpiifx. In oneAPI 2022 this compiler was relatively new, and the MPI wrappers only became available later. In the beginning one had to use command line arguments like this: mpiifort -fc=ifx (see also https://www.vasp.at/forum/viewtopic.php ... ifx#p30437).

In my previous tests I was using the classic compiler, and only now did I see in your makefile.include that you were using the newer ifx compiler. So I also tried the newer one, and indeed I got the same wrong results as you showed before. Hence, it seems that the issue is present only with the ifx compiler and not with ifort. Then I tried the next oneAPI kit I had available, which is oneAPI 2023.2.1. There, the ifx compiler also gave me the correct results! Hence, my conclusion is as follows: I do not know the exact origin of the wrong results, but all the symptoms point to an issue with the ifx compiler in Intel's oneAPI 2022 which seems to be resolved in oneAPI 2023.

So you have two options available:

  1. If you do not have a newer compiler available, switch to the classic compiler, i.e., use the older ifort, icc and icpc compilers (makefile.include.intel instead of makefile.include.oneapi).
  2. Or, just switch to a newer Intel oneAPI where the Fortran compiler should not produce these issues.
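
Before rebuilding, it can help to check which Fortran compiler a given makefile.include actually selects. The following is a minimal Python sketch under stated assumptions: it assumes the compiler is set on a line of the form `FC = ...`, and it classifies the value using only the command forms mentioned in this thread (mpiifort, mpiifx, mpiifort -fc=ifx). The function name `detect_fortran_compiler` is hypothetical, and the sample strings are illustrative, not copied from the attached files.

```python
import re

# Matches "FC = ...", "FC := ...", "FC ?= ..." but not "FCL = ...".
FC_RE = re.compile(r"\s*FC\s*[:+?]?=\s*(.+)")


def detect_fortran_compiler(makefile_text):
    """Classify the first FC assignment as 'ifx', 'ifort' or 'unknown'."""
    for line in makefile_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = FC_RE.match(line)
        if not match:
            continue
        value = match.group(1)
        # Check ifx first: "mpiifort -fc=ifx" mentions both names but uses ifx.
        if "ifx" in value:
            return "ifx"
        if "ifort" in value:
            return "ifort"
        return "unknown"
    return "unknown"


# Illustrative samples (assumed forms, not the attached makefile.include files).
print(detect_fortran_compiler("FC = mpiifort -fc=ifx\nFCL = mpiifort -fc=ifx"))
print(detect_fortran_compiler("FC          = mpiifort\nFCL         = mpiifort"))
print(detect_fortran_compiler("FC = mpiifx"))
```

To check a real build, read the contents of the makefile.include in the VASP source directory and pass them to the function.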

Hope this works for you as well!

All the best,
Andreas Singraber

