
VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Fri Feb 06, 2026 11:31 am
by pablog._lustemberg1

Hello,

I am running vibrational frequencies with HSE06 in VASP 6.5.1 (Intel/IMPI). The job is killed almost immediately (errors in job.*err), while the same system and workflow run fine with VASP 5.4.4 on the same machine.

In VASP 6.5.1, Slurm reports multiple oom_kill events and srun: Out Of Memory, and the job step is cancelled.

I can reproduce the same OOM-kill behavior in VASP 6.5.1 using 112, 224, 336, and 448 MPI ranks (tested on the same system and input). The job is terminated by Slurm with oom_kill / Out Of Memory messages.

Key environment/modules:
module load mkl impi intel hdf5/1.10.11 ucx vasp/6.5.1 and srun .../vasp_std.
Job layout: 2 nodes, 224 MPI tasks.
System: 42 atoms (Mg/O/C), k-mesh 1×2×1 (Gamma-centered).
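
For reference, the job layout above corresponds to a batch script along these lines (illustrative sketch only; partition, account, and the executable path are elided as in the module line above):

Code: Select all

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=224
#SBATCH --ntasks-per-node=112

module load mkl impi intel hdf5/1.10.11 ucx vasp/6.5.1
srun .../vasp_std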
Main INCAR:
ISTART = 0
ICHARG = 2
ALGO = All
NELMDL = -15
EDIFF = 1E-8
ENCUT = 600
EDIFFG = -0.01
IBRION = 5 ! finite differences: displace each ion to build the Hessian
ISIF = 2
NFREE = 2
NELMIN = 8 ! minimum number of electronic SCF steps
NSW = 1
POTIM = 0.015
ISPIN = 1
#MAGMOM = 60*0.6
LORBIT = 11
NEDOS = 10001
PREC = Accurate
ISMEAR = 0
SIGMA = 0.05
LREAL = .TRUE.
# Hybrid functional calculations:
LHFCALC = .TRUE. ! use a hybrid XC functional
TIME = 0.40 ! trial time step for IALGO=5X
LMAXFOCK = 4 ! may need to be increased if the system contains f-electrons
HFSCREEN = 0.207 ! screened exchange: switches from PBE0 to HSE06

NPAR = 224

# VASP outputs:
NWRITE = 2
LCHARG = .TRUE.
LWAVE = .TRUE.

Parallelization: I tested NPAR=224 (current), and also tried NPAR=1 / other values: VASP 6.5.1 still gets OOM-killed.

job.*err excerpt (summary):
slurmstepd: error: Detected ... oom_kill events ...
srun: error: ... tasks ... Out Of Memory
I will attach a minimal reproducer as a zip archive.

Specific questions
1. Is the OOM behavior in VASP 6.5.1 expected for hybrid + frequencies (IBRION=5)?
I am surprised because the same setup runs with VASP 5.4.4, and the system is small (42 atoms, 1×2×1 k-points).

2. Is my parallelization setting inappropriate for hybrid calculations in 6.5.1?
In particular: should NPAR be removed (or forced to 1) and instead use NCORE?
Could large MPI task counts (224 ranks) trigger excessive replicated-memory overhead in 6.5.1 for exact exchange?

3. Is there any recommended low-memory setting for HSE06 in VASP 6.5.1 for force-constant/frequency runs?
e.g., recommended combinations of MPI ranks vs. OMP threads, or specific tags that reduce Fock-memory footprint.


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Fri Feb 06, 2026 2:47 pm
by alex

Hi Pablo,

my guess is that if your machines are not well equipped with memory you'll never get through.

A workaround is to use fewer tasks per physical box, e.g. only half or even a quarter, to allow each process to allocate more memory.

Sorry for the bad news,

alex


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Fri Feb 06, 2026 6:39 pm
by pablog._lustemberg1

Hi Alex,

thanks for the quick reply.

I agree that an OOM suggests “not enough memory per MPI rank”, but I am not fully convinced this is simply “the machine has too little memory”: on MareNostrum 5 GPP each node has 256 GB RAM for 112 cores, and I can reproduce the same OOM-kill behavior with 112, 224, 336, and 448 MPI ranks (i.e., from ~ 1 node up to 4 nodes). So I did try reducing tasks per job substantially already.

Given that the same input runs fine with VASP 5.4.4 on the same system, my working hypothesis is that VASP 6.5.1 has a higher memory footprint for HSE06+frequencies (IBRION=5), or that the parallelization defaults in 6.5.1 (MPI distribution / exact-exchange data structures) lead to more replicated allocations per rank.

Could you point me to a specific VASP 6.x recommendation for reducing memory for hybrid + forces/frequencies, beyond “use fewer ranks”? For example:

  • Should I remove NPAR entirely and use NCORE (or set NCORE=1/2/…) in VASP 6.5.1?
  • Is there any known memory-sensitive behavior for IBRION=5 with hybrids in 6.5.1 (e.g., force-constant bookkeeping / Fock allocations) compared to 5.4.4?
  • Would you recommend running with fewer MPI ranks + OpenMP threads (e.g., 28–56 MPI ranks per node with OMP_NUM_THREADS=2–4) as the preferred low-memory setup for hybrids in 6.5.1?

If you think this is expected, do you know whether there is a version note / change explaining the increased memory demand from 5.4.4 to 6.5.1 for exact exchange / hybrid workflows?

Best regards,
Pablo


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Sun Feb 08, 2026 3:45 pm
by alex

Hello Pablo,

honestly, I didn't dig that deep into it.
I'm sorry, but I'm not of any further help here.

Best regards,

alex


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 11:30 am
by andreas.singraber

Hello Pablo!

Sorry for the late reply! I did some testing on our systems with the setup you provided. I can confirm your finding that the memory footprint increased going from VASP 5.4.4 to 6.5.1 for your particular type of calculation. Unfortunately, I do not know the reason for this increased demand given identical settings; probably there has been a change in how CPU vs. memory is balanced (towards being more "greedy" for memory). However, I can suggest a minor change which should avoid the OOM errors on your machines. For my tests I did not have the same machine available (2x Intel 8480+, 56 cores each, 256 GB) but an AMD one with 2x EPYC 7773X, 64 cores each, 1 TB. Although I was able to run the job (utilising the entire machine with 128 cores), the memory demand was high at roughly 350 GB. That already exceeds the amount of memory you have available and thus explains the "out-of-memory" error you observed.

As you already mentioned, setting NPAR is discouraged in favor of the NCORE tag. Since you did not set KPAR (which defaults to 1) and set NPAR = 224 on 2 nodes you effectively selected NCORE = 1, i.e., the default value. However, as mentioned on the INCAR tag's Wiki page this comes with a high memory footprint. So I removed NPAR and replaced it with NCORE = 16. That brings down the memory consumption to approximately 48 GB in total!
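
In INCAR terms, the change is simply the following (the value 16 matches my AMD test machine; adjust it to your NUMA layout):

Code: Select all

! NPAR = 224   <- removed
NCORE = 16     ! cores per NUMA/L3 domain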

Before you try this out immediately: setting NCORE = 16 makes sense on my machine because each AMD CPU has 64 cores arranged in 4 NUMA domains (which also overlap with the L3 cache domains), so at most 64 / 4 = 16 cores share a fast connection to their adjacent memory and L3 cache. It is therefore a good starting point on my machine, and I could further investigate whether a smaller (or even larger) value improves performance. Your Intel CPUs are similar in size with 56 cores each and are most likely also set up in such a way that 4 NUMA domains exist (you can check this with numactl -H or lscpu). Hence, I would suggest you start with NCORE = 14 on your machines!
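
To inspect the NUMA layout on your own nodes, plain lscpu (from util-linux) is enough; the output format varies a bit by distribution:

```shell
# Show socket and NUMA topology; a reasonable NCORE starting point is
# cores per socket divided by NUMA domains per socket.
lscpu | grep -Ei 'socket|numa'
```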

Another option you could try is an OpenMP/MPI-hybrid parallelization. I tested this on the AMD machine and ended up with even further reduced memory consumption to roughly 17 to 20 GB:

Code: Select all

 MPI ranks | OpenMP threads | NCORE | memory [GB]
-----------|----------------|-------|------------
    128    |       1        |   1   |    350
    128    |       1        |  16   |     48
     16    |       8        |   1   |     20
      8    |      16        |   1   |     17

Please note that hybrid parallelization requires careful pinning of MPI ranks and OMP threads! The details are explained on the Wiki page linked above. Following the guidelines there I used this mpirun command for the 8-MPI-ranks/16-OMP-threads job:

Code: Select all

mpirun -np 8 -genv I_MPI_PIN_DOMAIN=omp -genv I_MPI_PIN=yes -genv OMP_NUM_THREADS=16 -genv OMP_STACKSIZE=512m -genv OMP_PLACES=cores -genv OMP_PROC_BIND=close -genv I_MPI_DEBUG=4 /path/to/vasp/executable
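
Since you are launching with srun on Slurm, the same layout can presumably also be requested through Slurm's own binding options instead of mpirun (illustrative sketch only; please check the pinning documentation of your site):

Code: Select all

export OMP_NUM_THREADS=16
export OMP_STACKSIZE=512m
export OMP_PLACES=cores
export OMP_PROC_BIND=close
srun --ntasks=8 --cpus-per-task=16 --cpu-bind=cores /path/to/vasp/executable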

Hope this helps you getting the jobs to run on your HPC setup!

All the best,
Andreas Singraber


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 1:19 pm
by pablog._lustemberg1

Hi Andreas,

many thanks for the thorough testing and the very clear explanation — this is extremely helpful.

I followed your suggestion on MareNostrum 5 GPP (Intel Sapphire Rapids, 2×56 cores/node, 256 GB) and replaced NPAR with NCORE = 14 (KPAR = 1). With this change, the HSE06 frequency run (IBRION = 5) that was previously killed by Slurm OOM in VASP 6.5.1 is now running successfully on our nodes.

This strongly supports your conclusion that the memory footprint increased from 5.4.4 to 6.5.1 for this workflow and that the default NCORE = 1 behavior can be very memory-hungry in 6.5.1.

Thanks also for the OpenMP/MPI hybrid option and the pinning notes — I will test that next for performance once the production runs finish.

Best regards,
Pablo


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 2:05 pm
by pablog._lustemberg1

Hi Andreas,

quick update: after switching to NCORE=14 (and not setting NPAR anywhere in my INCAR), the job now aborts with the following VASP error:

Code: Select all

 gam= 0.585 trial= 0.389  step=  0.3285 mean=  0.3891
 final diagonalization occupied
   1 F= -.28757424E+03 E0= -.28757424E+03  d E =-.807786E-11
 Finite differences POTIM= 0.01500 DOF=  18
 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     VASP internal routines have requested a change of the k-point set.      |
|     Unfortunately, this is only possible if NPAR=number of nodes.           |
|     Please remove the tag NPAR from the INCAR file and restart the          |
|     calculation.                                                            |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |

So the confusing part is that NPAR is not present in my INCAR anymore, and I am already running with KPAR=1 (default) and NCORE=14.

This makes me suspect that, in our VASP 6.5.1 module/build, NPAR might be enforced via a wrapper, an environment/default, or there is a build-specific behavior/patch that triggers this check even when NPAR is not set by the user.

I will therefore contact the MN5 administrators to request a fresh VASP 6.5.x build (or an updated module) that avoids this issue.

If you have seen this behavior before in 6.5.1 (k-point set change requiring NPAR=number of nodes), I’d appreciate any hint on what typically triggers it.

Best regards,
Pablo


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 2:55 pm
by andreas.singraber

Hello Pablo!

Hmm, I did not observe this error on my side. However, I may not have run long enough for it to occur. I will try again, continue the simulation, and let you know. The message is probably very old and misleadingly uses the term NPAR even though only NCORE was set in the INCAR. There may still be some kind of incompatibility that enforces a specific value of NCORE. Technically it is possible that the source code was modified (I doubt it), but I would rule out that a default value automatically added to the INCAR file triggers this behavior, because then NPAR would just take precedence and you would not encounter this error.

Did you run your simulations from a clean directory or was there any existing VASP data (WAVECAR, CHGCAR)?

All the best,
Andreas Singraber


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 4:45 pm
by andreas.singraber

Hello again!

Ok, I got the same error message as you when I repeated the NCORE = 16 run and actually waited for the first ionic step to complete (the memory-consumption numbers above are still correct). I have to discuss this with my colleagues, but it seems that what the error actually wants to tell us is that we cannot keep the default ISYM = 3 (for LHFCALC = .TRUE.) together with IBRION = 5. So basically you have two options to avoid this:

  1. Disable symmetry by setting ISYM = 0 if you intend to use NCORE > 1.

  2. Use hybrid parallelization as mentioned before which automatically sets NCORE = 1 and seems to work with ISYM = 3.
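
For option 1, the INCAR change is just the following sketch (keep the remaining settings unchanged):

Code: Select all

NCORE = 14   ! or whatever matches your NUMA layout
ISYM  = 0    ! disable symmetry so NCORE > 1 works with LHFCALC + IBRION = 5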

I will try to confirm that option 1. is really the only way to go forward besides hybrid parallelization and whether the error message should be changed/moved to disallow the combination we encountered here.

All the best,
Andreas Singraber


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Mon Feb 16, 2026 7:20 pm
by pablog._lustemberg1

Hi Andreas,

thanks a lot — that explanation makes perfect sense.

I tested option (1) on our MN5 setup (ISYM = 0 with NCORE > 1) and the calculation is now running smoothly. It has already completed 7/36 displacements without issues.

Many thanks for your dedication and for taking the time to reproduce and diagnose this — you (and your colleagues) are amazing.

Best regards,
Pablo


Re: VASP 6.5.1: HSE06 vibrational frequencies (IBRION=5) killed by Slurm OOM (works in 5.4.4)

Posted: Thu Feb 19, 2026 4:35 pm
by andreas.singraber

Hey Pablo,

I am glad it works now! Thanks for your kind words, we really appreciate that! I discussed this issue with my colleagues and we agreed that there should be a mechanism that catches the problematic combination of ISYM, IBRION, and NCORE, stops the code, and gives a helpful warning message before even running the first iteration. I cannot promise it for the upcoming release but it is definitely on our to-do list.

All the best,
Andreas Singraber