
LRF_COMMUTATOR internal error

Posted: Sun Sep 28, 2025 12:41 am
by jasius

This is a complex error I am seeing for the first time. I am trying to calculate frequencies and Raman intensities using LEPSILON = .TRUE.:

1 F= -.25195713E+04 E0= -.25195713E+04 d E =-.827521E-13 mag= 5.0000
Linear response reoptimize wavefunctions to high precision
DAV: 1 -0.248668774626E+04 0.32884E+02 -0.18179E-07 5600 0.288E-03
DAV: 2 -0.250114746980E+04 -0.14460E+02 -0.88217E-01 5536 0.180E-02
DAV: 3 -0.362254671999E+04 -0.11214E+04 -0.56507E+01 5168 0.601E-01
Linear response G [H, r] |phi>, progress :
Direction: 1
N E dE d eps ncg rms
LRF_COMMUTATOR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)> (423.055018455741,-1.475325195343657E-009)
LRF_COMMUTATOR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthog


Re: LRF_COMMUTATOR internal error

Posted: Tue Sep 30, 2025 8:08 am
by alexey.tal

Dear jasius,

Thank you for your question.

I ran this job with VASP 5.4.4 on one of our machines and I wasn't able to reproduce the issue.
Although, I had to reduce the number of MPI ranks to 16 as I don't have access to a large computer to run it on 256 MPI-ranks as you did.
Could you try to run this job on 16 ranks and see if the error persists?

What seems to be the problem is this:

Linear response reoptimize wavefunctions to high precision
DAV: 1 -0.248668774626E+04 0.32884E+02 -0.18179E-07 5600 0.288E-03
DAV: 2 -0.250114746980E+04 -0.14460E+02 -0.88217E-01 5536 0.180E-02
DAV: 3 -0.362254671999E+04 -0.11214E+04 -0.56507E+01 5168 0.601E-01

In this step, the wavefunctions should be optimized further after convergence was already achieved. However, in the second iteration of your calculation something fails and the energy change is much too large. In my calculation I get a smooth further optimization down to <1E-8.
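As a quick way to spot this in your own logs, here is a minimal sketch (not part of VASP; the column layout is an assumption based on the DAV lines quoted above) that scans OSZICAR-style DAV lines and flags when the magnitude of the energy change grows instead of shrinking:

```python
# Hypothetical helper: flag a diverging DAV reoptimization from OSZICAR-style lines.
# Assumes lines of the form "DAV: <n> <E> <dE> <d eps> <ncg> <rms>".
def dav_diverges(lines):
    """Return True if |dE| grows between consecutive DAV steps."""
    changes = []
    for line in lines:
        parts = line.split()
        if parts and parts[0] == "DAV:":
            changes.append(abs(float(parts[3])))  # dE column
    return any(later > earlier for earlier, later in zip(changes, changes[1:]))

log = [
    "DAV: 1 -0.248668774626E+04  0.32884E+02 -0.18179E-07 5600 0.288E-03",
    "DAV: 2 -0.250114746980E+04 -0.14460E+02 -0.88217E-01 5536 0.180E-02",
    "DAV: 3 -0.362254671999E+04 -0.11214E+04 -0.56507E+01 5168 0.601E-01",
]
print(dav_diverges(log))  # |dE| blows up from 1.4E+1 to 1.1E+3 at step 3
```

In a healthy reoptimization, |dE| should shrink monotonically toward the convergence threshold instead.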

What compilers and libraries do you use?
Also, VASP 5.4.4 is really old at this point. Do you have access to a more recent version?

Best wishes,
Alexey


Re: LRF_COMMUTATOR internal error

Posted: Wed Oct 01, 2025 2:39 am
by jasius

Hi Alexey, that is what I did: I have 16 MPI on the 256-core node. It handles all my VASP jobs, but this error is completely new. I also ran some other LEPSILON = .TRUE. jobs on the same node with no problem. Is there anything else I need to specify in INCAR besides NCORE? I think these are the nodes below:

One node: 128-core AMD Epyc 9745, 256 cores, 2,304 GB DDR5 ECC, 3 TB SSD, AVX/AVX2/AVX512; tags: AMD, EPYC9745, TURIN, CERES25

#-------------------
#parallel stuff
NCORE = 16

#!/bin/bash
#SBATCH -C CERES25
#SBATCH --nodes=1
#SBATCH --tasks-per-node=256
#SBATCH -t 300:00:00
#SBATCH -o vasp-%j.out
#SBATCH -e vasp-%j.err
#SBATCH -p ceres
#SBATCH -A urea_kinetics
#SBATCH --mail-type=ALL
#SBATCH --mail-user=job314@lehigh.edu
#SBATCH -J /90daydata/urea_kinetics/struvite/Fedoped/VASP/supercell2x2x2_FeOH/fulloptg/NUPDOWN5_lowest/fulloptg/freq_400eV_normal_gamma_LREALfalse/intens

module unload intel
module load vasp
exe=`which vasp_std`
processors=$(( $SLURM_NNODES * $SLURM_NTASKS_PER_NODE ))

ulimit -s unlimited # remove limit on stack size

export VASP_RAMAN_RUN='mpirun -np 256 vasp_std &> job.out'
export VASP_RAMAN_PARAMS='01_463_2_0.01'

python /90daydata/urea_kinetics/struvite/Fedoped/VASP/supercell2x2x2_FeOH/fulloptg/NUPDOWN5_lowest/fulloptg/freq_400eV_normal_gamma_LREALfalse/intens/vasp_raman_frozen2.py > vasp_raman.out


Re: LRF_COMMUTATOR internal error

Posted: Wed Oct 01, 2025 9:50 am
by alexey.tal

So you are running this job on 256 MPI ranks, right?

export VASP_RAMAN_RUN='mpirun -np 256 vasp_std &> job.out'

To run it on 16 MPI ranks you would need to change this line to:

export VASP_RAMAN_RUN='mpirun -np 16 vasp_std &> job.out'

NCORE determines the internal parallelization scheme, but it doesn't change the total number of MPI ranks that perform the calculation.
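To illustrate the distinction (a sketch, assuming VASP's standard band parallelization, where NCORE ranks cooperate on one band), the total rank count is fixed by mpirun, while NCORE only decides how those ranks are grouped:

```python
# Sketch of the rank/NCORE relationship (assumption: standard VASP band grouping).
def band_groups(total_ranks, ncore):
    # NCORE ranks cooperate on one band; "mpirun -np" fixes total_ranks.
    return total_ranks // ncore

print(band_groups(256, 16))  # 16 band groups of 16 ranks each
print(band_groups(16, 16))   # 1 group, i.e. "one band on 16 cores, 1 groups"
```

So with NCORE = 16 unchanged, going from -np 256 to -np 16 really does reduce the calculation to 16 ranks in a single band group.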


Re: LRF_COMMUTATOR internal error

Posted: Wed Oct 01, 2025 1:19 pm
by jasius

I submitted it, but does that mean it will only run on 16 cores out of my 256?

export TMPDIR=/tmp
export TMOUT=5400
export SINGULARITY_TMPDIR=/tmp
running on 16 total cores
distrk: each k-point on 16 cores, 1 groups
distr: one band on 16 cores, 1 groups
using from now: INCAR
vasp.5.4.4.18Apr17-6-g9f103f2a35 (build May 01 2020 14:29:13) complex

POSCAR found : 6 types and 463 ions
scaLAPACK will be used


Re: LRF_COMMUTATOR internal error

Posted: Wed Oct 01, 2025 3:22 pm
by alexey.tal

Yes, but it is just a test to make sure that the issue is reproducible with a different number of ranks.
Furthermore, there is no guarantee that a job will run faster on 256 MPI ranks than on 16. It depends on the hardware, the job size, and the parallelization settings.


Re: LRF_COMMUTATOR internal error

Posted: Thu Oct 02, 2025 8:07 pm
by jasius

Interesting thing: I switched to two older nodes with 96 cores each and the problem went away. It is still there on the brand-new 256-core node.

#!/bin/bash
#SBATCH -C CERES20
#SBATCH --nodes=2
#SBATCH --tasks-per-node=96
#SBATCH -t 264:00:00
#SBATCH -o vasp-%j.out
#SBATCH -e vasp-%j.err
#SBATCH -p ceres
#SBATCH -A urea_kinetics
#SBATCH --mail-type=ALL
#SBATCH --mail-user=job314@lehigh.edu
#SBATCH -J /90daydata/urea_kinetics/struvite/Fedoped/VASP/supercell2x2x2_FeOH/fulloptg/NUPDOWN5_lowest/fulloptg/freq_400eV_normal_gamma_LREALfalse/intens

module unload intel
module load vasp
exe=`which vasp_std`
processors=$(( $SLURM_NNODES * $SLURM_NTASKS_PER_NODE ))

ulimit -s unlimited # remove limit on stack size

export VASP_RAMAN_RUN='mpirun -np 192 vasp_std &> job.out'
export VASP_RAMAN_PARAMS='01_463_2_0.01'

python /90daydata/urea_kinetics/struvite/Fedoped/VASP/supercell2x2x2_FeOH/fulloptg/NUPDOWN5_lowest/fulloptg/freq_400eV_normal_gamma_LREALfalse/intens/vasp_raman_frozen2.py > vasp_raman.out


Re: LRF_COMMUTATOR internal error

Posted: Fri Oct 03, 2025 8:58 am
by alexey.tal

Great! So you changed machines and used fewer MPI ranks, and now it works. I think reducing the number of MPI ranks did the trick. Unfortunately, I can't think of a more general solution to this problem, as I can't reproduce it on our machines.