Page 1 of 1

Rot.F: EDWAV not orthogonal

Posted: Mon Mar 02, 2026 1:18 pm
by Nick_Gerrits

Dear VASP developers,

As you are well aware, users have been getting errors when using ALGO=Conjugate, getting the message "internal error in: rot.F at line: 822; EDWAV: internal error, the gradient is not orthogonal". Here I am providing my experiences with this issue, as well as an example where I have encountered this, since as far as I understood from the other topics in this forum it is still an unclear issue.

Generally speaking, I work with slabs and molecules (on/near the surface or in the gas phase) in the same calculation, which typically do not exhibit this error when using Conjugate. However, I encountered this issue really a lot when performing calculations on small bulk systems. I am not sure if there is even a specific pattern in those calculations when it fails. I also don't believe ISEARCH has an influence on this, other than that the "randomness" of when it exactly fails changes.

I have included several examples (Li, Na, V computed with M06L) with complete inputs and outputs (except of course the POTCAR), as well as some of the scripts and job files to run the calculations. These sets of calculations have been used to compute lattice constants for fcc and bcc solids through EOS fitting of single points, after first roughly determining the equilibrium constant through cell optimization (ISIF=3). In some cases, the issues became so severe that I had to drop the prior equilibration entirely and simply do only the single points for the EOS fit while hoping enough single points would run fine, because convergence is impossible for ALGO=Normal and Conjugate exhibits the aforementioned issue. I have used VASP 6.5.1. Note that the setup of these calculations might not be perfect (e.g., ISIF=4 for cubic systems) or sometimes even down right dirty, but they have done the trick for a large amount of calculations regardless.

PS. I know that M06L is a terrible DF to use for these kind of calculations (see the included examples). But that is exactly the point of running those specific calculations. Nevertheless, I have this problem with all kinds of functionals and their flavours (GGA, mGGA, with(out) D3, with(out) vdW-DF correlation).

PS2. I also managed to get the "internal error in: us.F at line: 1367; internal ERROR SETYLM_AUG" often enough. The way I can "reliably" reproduce this error is to perform a geometry optimization on a molecule present in S22 with a numerically hard to converge DF (like M06L, which might be one of the worst) and ALGO=Normal, with standard mixing parameters. In those cases, you typically cannot converge the SCF at all (like many orders of magnitude difference in energy going from one SCF cycle to another), causing the geometry optimization to work with terrible forces and therefore probably also at some point generating bad geometries that cause this specific error. One could argue it is a user/input error (the calculation doesn't converge at all, so what do you expect?), but in the error message it clearly states "If you are not a developer, you should not encounter this problem.; Please submit a bug report.". Therefore, I have also included an example for this case as well.


Re: Rot.F: EDWAV not orthogonal

Posted: Mon Mar 02, 2026 4:55 pm
by ahampel

Hi Nick,

Thank you for reaching out to us on the official VASP forum. I believe you forgot to attach the examples. If you upload them I am happy to take a look at them and see if we can do something about these issues.

Best,
Alex


Re: Rot.F: EDWAV not orthogonal

Posted: Tue Mar 03, 2026 7:26 am
by Nick_Gerrits

My apologies. I did attach them, but it seems something went wrong. Here are the files.

rot.F_error.tar.gz
us.F_error.tar.gz

Re: Rot.F: EDWAV not orthogonal

Posted: Tue Mar 03, 2026 10:01 am
by ahampel

Hi Nick,

I tried a two of these calculations now. Let me provide some feedback. I understand that this problem is very annoying but we do not have a simple fix for doing CGA calculations for metaGGAs. Especially the M06L is prone to very rough energy surfaces and it is hard to properly converge this, so using ALGO=Conjugate / All is the best solution often. Usually just changing a bit input parameters should fix this issue. If there is one that is very stubborn I can have a look. However, currently I could not reproduce the ZHEEVX failure for the final diagonalization step you saw. But I have some advice on how to make convergence better and maybe circumvent these problems all together.

  • please turn of LMIXTAU=.FALSE. , this is error prone and might not always work as expected

  • I would recommend starting the metaGGA calculation from a PBE calculation (see Formamide_dimer example) this improves convergence usually drastically. With this the dimer converged actually relatively quickly and consistent

  • you can also try the libxc implementation by using:
    METAGGA = LIBXC
    LIBXC1 = MGGA_X_M06_L
    LIBXC2 = MGGA_C_M06_L
    sometimes this helps/

For the Na calculation I did not encounter your problem, which LAPACK libraries did you use and compiler optimizations? Can you provide me with the exact INCAR, POSCAR, KPOINTS (I had a hard time figuring out which input files to use, because I will not just run your python script for security reasons.).

Please have a look at my attachments. For the Dimer I just did a pbe calculation and then used the CHGCAR and WAVECAR to start the metaGGA calculation which worked pretty well.

Best,
Alex

Na_attempt.zip
Formamide_dimer_my_attempt.zip

Re: Rot.F: EDWAV not orthogonal

Posted: Tue Mar 03, 2026 11:53 am
by Nick_Gerrits

Hi Alex,

Thanks for checking the calculations and replying so quickly.

Yes, I am fully aware that restarting mGGA or other difficult to converge calculations from converged densities like PBE can drastically improve things. Especially in the case of a DF like M06L, which is really really bad numerically. My point was not so much about how to get these calculations to work, but rather about the fact that users are getting error messages that state we should not be getting them. Like I previously said, the provided setup is not always great (of course you can do a much better job!), but it seems to highlight the EDWAV error. So perhaps there might be a connection between ALGO=A and having bad/less than ideal solutions? It would also explain why I haven't seen these issues (much) with molecules on surfaces, because I normally use much better behaved DFs and setups.

One point that is still confusing to me is the statement of using both the WAVECAR and CHGCAR as initial input. I have seen this remark on the wiki as well. However, if you use the WAVECAR, why would you use the CHGCAR? To the best of my knowledge, when you use the wave functions, you compute the density from those. Similarly, if you start with a density, you compute the potential from that, from which you then obtain the wave functions. The key issue here is that we are dealing with a loop (wave function -> density -> potential -> wave function) that needs to become self-consistent. However, that also means you need to enter that loop somewhere, therefore making a choice whether you start with the wave functions or the densities, not both. The OUTCAR you provided with the Formadide calculation suggests exactly the same, stating, e.g., "charge density for first step will be calculated from the start-wavefunctions" . Although, your INCAR does not include ISTART or ICHARG, so it defaults to ISTART=1 (if the WAVECAR is present) and ICHARG=0 anyway (see also your OUTCAR). So what am I missing here or mistaken about?

I completely forgot about the "PZHEEVX" error. That one I have also seen sometimes, but I don't remember the pattern behind that one.

About LMIXTAU: Then the wiki should be updated. To the best of my knowledge, this is not generally known. Even more troubling, it is suggested to turn it on when having convergence issues with density mixing, possibly leading people to believe it might be even the better option.

What I find really problematic is your statement regarding LibXC. Regardless of which code is used to compute the exchange-correlation energy, it should give you the same answer or you might end up being as problematic as Gaussian. Effectively what you are saying is that LibXC and VASP do not necessarily give the same answer. Moreover, I have also used the LibXC implementation, but, as far as I am aware, it does not really matter in the grand scheme of things. Many calculations still fail with this error.

About the requested files, they are in rot.F.tar.gz (see also Job, which is the job file for slurm and thus the calculation):
For KPOINTS: KPOINTS_large corresponds to the one used in the single point calculations for the EOS. KPOINTS_small was used for the rough prior cell relaxation. KPOINTS_atomic speaks for itself.
For INCAR: The same holds true. INCAR_initial_relax is the rough calculation. INCAR_fixedvolume_relax is the single point calculations. INCAR_atomic is of course the free atom in a box.
POSCAR: Again, the names are self explanatory. Nevertheless, POSCAR_bcc is the POSCAR used for these specific bulk calculations, POSCAR_atomic for the free atom. In POSCAR_bcc, you need to replace _latticeconstant_ with the lattice constant you want VASP to compute in the single points (see also the relevant loop in Job as well as the example OUTCARs) or start with during the initial relaxation.
The only python script included is FitEOS.py. You can forgo this completely as it is only used for the second to last calculation, and only to obtain the ideal lattice constant from the EOS fit. The script makes use of the results of the single point calculations performed for the fit. So this part is not interesting in the context of the errors we are discussing, since it is only used for the cohesive energy.

About Na: The issue is that you performed the calculation that did run fine (the first supposedly initial equilibration, but in that specific case I set IBRION=-1 and NSW=0, so not terribly relevant). slurm-369385.out gives you the stdout of the latest run using all the files provided. data_fixed_volume.dat tells you which single point lattice constant calculations failed. With these details, you can exactly repeat my calculations without needing to run a python script (except the ideal bulk one, but you can still do it by hand if you would really want to).

KR,
Nick


Re: Rot.F: EDWAV not orthogonal

Posted: Tue Mar 03, 2026 12:24 pm
by ahampel

Hi Nick,

Let's try this to not get too heated. I am myself a full novice when it comes to any metaGGA but I am very well aware of the different electronic minimizers. So maybe if you help me to guide through this I can see if I see from my "new" perspective any potential problems. You write:

About the requested files, they are in rot.F.tar.gz (see also Job, which is the job file for slurm and thus the calculation):
For KPOINTS: KPOINTS_large corresponds to the one used in the single point calculations for the EOS. KPOINTS_small was used for the rough prior cell relaxation. KPOINTS_atomic speaks for itself.
For INCAR: The same holds true. INCAR_initial_relax is the rough calculation. INCAR_fixedvolume_relax is the single point calculations. INCAR_atomic is of course the free atom in a box.
POSCAR: Again, the names are self explanatory. Nevertheless, POSCAR_bcc is the POSCAR used for these specific bulk calculations, POSCAR_atomic for the free atom. In POSCAR_bcc, you need to replace latticeconstant with the lattice constant you want VASP to compute in the single points (see also the relevant loop in Job as well as the example OUTCARs) or start with during the initial relaxation.
The only python script included is FitEOS.py. You can forgo this completely as it is only used for the second to last calculation, and only to obtain the ideal lattice constant from the EOS fit. The script makes use of the results of the single point calculations performed for the fit. So this part is not interesting in the context of the errors we are discussing, since it is only used for the cohesive energy.

About Na: The issue is that you performed the calculation that did run fine (the first supposedly initial equilibration, but in that specific case I set IBRION=-1 and NSW=0, so not terribly relevant). slurm-369385.out gives you the stdout of the latest run using all the files provided. data_fixed_volume.dat tells you which single point lattice constant calculations failed. With these details, you can exactly repeat my calculations without needing to run a python script (except the ideal bulk one, but you can still do it by hand if you would really want to).

This is a lot. Can you please give me one INCAR, POSCAR, POTCAR, KPOINTS file that produces the error you are talking about? I am still not sure I understand which slurm output file I should look at to see which of your Job steps have been run to produce which error. Fact is that when using the wavefunctions from a PBE run I solved convergence issues. I think this is a very viable solution and you can easily adopt this. Anyway let's go back to errors that you should not see, so please tell me which one you mean. I grepped for EDWAV in all .out files and could not find what you mean. But by providing me one set of input files with an example output with the error we can make sure we talk about the same problem. Please do that.

One point that is still confusing to me is the statement of using both the WAVECAR and CHGCAR as initial input. I have seen this remark on the wiki as well. However, if you use the WAVECAR, why would you use the CHGCAR? To the best of my knowledge, when you use the wave functions, you compute the density from those. Similarly, if you start with a density, you compute the potential from that, from which you then obtain the wave functions. The key issue here is that we are dealing with a loop (wave function -> density -> potential -> wave function) that needs to become self-consistent. However, that also means you need to enter that loop somewhere, therefore making a choice whether you start with the wave functions or the densities, not both. The OUTCAR you provided with the Formadide calculation suggests exactly the same, stating, e.g., "charge density for first step will be calculated from the start-wavefunctions" . Although, your INCAR does not include ISTART or ICHARG, so it defaults to ISTART=1 (if the WAVECAR is present) and ICHARG=0 anyway (see also your OUTCAR). So what am I missing here or mistaken about?

Sry I am not exact here. I always do this because of laziness, but you are right only the WAVECAR is needed in my workflow. That should be always sufficient. If you see anything else let me know.

About LMIXTAU: Then the wiki should be updated. To the best of my knowledge, this is not generally known. Even more troubling, it is suggested to turn it on when having convergence issues with density mixing, possibly leading people to believe it might be even the better option.

That is a very very recent development. We will update the wiki once we understood this better and know what is going on.

What I find really problematic is your statement regarding LibXC. Regardless of which code is used to compute the exchange-correlation energy, it should give you the same answer or you might end up being as problematic as Gaussian. Effectively what you are saying is that LibXC and VASP do not necessarily give the same answer. Moreover, I have also used the LibXC implementation, but, as far as I am aware, it does not really matter in the grand scheme of things. Many calculations still fail with this error.

Well, I agree. But if tiny changes in the way things are calculated; definitions of constants, library calls, the resulting search path will be different.

So perhaps there might be a connection between ALGO=A and having bad/less than ideal solutions? It would also explain why I haven't seen these issues (much) with molecules on surfaces, because I normally use much better behaved DFs and setups.

This gets a bit into the details on how to find the ground state charge density. I think it would be good to have a look here: https://doi.org/10.1103/PhysRevB.54.11169 specifically it is important to understand the searching for the ground state density by actually minimizing the functional (ALGO=All/conjugate) is usually less efficient than simple forward iteration with broyden mixing. See figures in the paper. This is why VASP by default uses forward-iteration with advanced mixing in favor of actually calculating a gradient of the total energy and minimizing it. Other user should free to jump in here. However, in certain cases (when the initial guess is very broken, or the energy surface is very rough) it might be beneficial to use gradient algorithms on the total energy directly. Enter mGGAs or molecules etc. Hence, the recommendation of the wiki to not use these unless converge is not achieved otherwise.

I hope I could tackle a few of your questions. let's get back to this example you are referring. give me one set of input files and I will try to understand better what is going on.

Best,
Alex


Re: Rot.F: EDWAV not orthogonal

Posted: Tue Mar 03, 2026 2:21 pm
by Nick_Gerrits

Hi Alex,

My apologies if I came across heated, this was not my intention. Lets blame this on my Dutch directness and desire to make things clear. Another apology: It seems I mixed up which file and calculation led to which exact error. Like you suggested, I am including here a "clean" example of a calculation for bulk V. Unfortunately they are 2 calculations, since the second calculation (the one that gave me the error in this case) relied on the CHGCAR of the first calculation. Without that CHGCAR it ran fine. This indeed points again to differences in the electronic structure causing issues. The difference between the two calculations is ENCUT and KPOINTS, as well as the exact lattice constant being computed (each of these might be the culprit or combination). Hopefully this example is much clearer and faster to run.

I understand that the electronic minimization methods can work well. Unfortunately, I have had to deal many times over with calculations that needed ALGO=A due to a lack of convergence (whether due to the DF, spin polarization, bad luck, etc.). There are indeed alternative strategies to get things to work with ALGO=Normal or Fast, but they also don't always work. For example, the wiki (used to?) states for a reason for spin polarization as the final tip to pray, after doing things like you mentioned. In my experience, ALGO=A was often the more robust option to use, but of course like you already indicate, it does not necessarily need to be. Regardless, I want to emphasize that I opened this topic due to the fact that VASP tells me I should not see this message.

Nick

test_vasp_EDWAV.tar.gz

Re: Rot.F: EDWAV not orthogonal

Posted: Wed Mar 04, 2026 9:16 am
by ahampel

Hi Nick,

thank you for explicit example. Funny enough this one is not throwing the error (I am using an intel toolchain 2025.3.2 with intel mpi 2021.17.2 and also tried a GNU toolchain with openmpi: GCC 12.3 / mkl-2023.2.0 / ompi-4.1.6). I followed your exact steps and it went through. I also added a little print in the code to check the value of DORT (the abort criteria) in src/rot.F (around l870 or so):

Code: Select all

      DORT=WZDOTC(WDES1%NPL,W1(NP)%CW(1),1,CF(1,NP),1)
GPACC !$ACC WAIT(ACC_ASYNC_Q) IF(OFFLOAD_ON)
      CALLMPI( M_sum_d(WDES%COMM_INB, DORT, 1))
      IF (IONODE==1) THEN
        print *, "DORT before orthogonalization ", DORT
      ENDIF
      IF (ABS(DORT)>1E-4) THEN
         CALL vtutor%bug("EDWAV: internal error, the gradient is not orthogonal " // str(NK) // &
            " " // str(NP) // " " // str(DORT), __FILE__, __LINE__)
      ENDIF

for me the largest value that I see is of the order of 1e-6 . What compilers and libraries are you using? Maybe you have a version that is plagued by some MPI problem? For sure this will go away if you use a slightly different build. However, I see a potential problem here in the design of the abort criteria. I will discuss this internally and will get back to you. For now could you try the following for testing: can you remove in your src/rot.F this IF (ABS(DORT)>1E-4) THEN ... ENDIF part and try the exact same calculation again. Since I cannot reproduce this I want to test my idea. It could be that your calculation is very well within the physical regime and will just converge fine. Please let me know if this makes your calculation converge well or crash. Obviously, just do this for testing. The test is there for a good reason. :-)

edit: now I was able to trigger it by randomly changing the number cores and setting NCORE=1 . So don't worry about MPI problems. I can now reproduce it.

Best,
Alex


Re: Rot.F: EDWAV not orthogonal

Posted: Thu Mar 05, 2026 8:44 am
by Nick_Gerrits

Yes, that is indeed funny. Seems numerics are quite the influence. Removing that if statement indeed leads to calculations that are fine. Perhaps dealing with errors in VASP could be dealt more often with some kind of fallback option, instead of fatally crashing. As a user I have seen a considerable amount of different errors and, in my humble opinion, plenty of times they could/might have been dealt with a fallback option instead of fatally exiting the program. In this particular case, I am wondering whether you can't reorthonormalize if the threshold is violated?

I have used intel/oneapi/mkl/2022.1.0 (see also next few lines). I basically used makefile.include.intel and modified it with the following flags obtained from the Intel linker tool to compile it statically (has to do with the setup of our clusters):
FCL += -qmkl=sequential
LLIBS += ${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_sequential.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_openmpi_lp64.a -Wl,--end-group -lpthread -lm -ldl
INCS =-I$(MKLROOT)/include/fftw -I"${MKLROOT}/include"

However, I noticed another thing. Apparently I compiled it with -Ofast. I have now tried it with -O2 (and of course with the if statement) and the calculation also runs fine. Both yield the same energy up until convergence threshold. Numerically the two are different, e.g., different amount of SCF, but that is not surprising considering the different compilation and inclusion/exclusion of that if statement. The Ofast variant did take 2 SCF cycles longer, so perhaps the numerics are indeed slightly worse off, but N=1 is not really statistically relevant... I am not sure if that means I won't get that error anymore, or that numerics simply changed and now other calculations are affected by chance. But I hope these things can give you some hints on what has been happening to users.


Re: Rot.F: EDWAV not orthogonal

Posted: Wed Apr 15, 2026 3:40 pm
by ahampel

Hi Nick,

I now have a waiting fix for this on our internal git server. If you like to fix this in your code as well this should help to mitigate the problem: add the following additional orthonormalization of the wave functions after the delay phase in electron_all.F after line 475:

Code: Select all

...
         BTRIAL=INFO%TIME

         ! Re-orthonormalize wavefunctions once at the transition out of the
         ! EDDAV delay phase, before the first EDWAV call. Just to fix any
         ! possible loss of orthogonality due to the EDDAV steps
         IF (N == ABS(INFO%NELMDL)+1) THEN
            CALL ORTHCH(WDES,W, INFO%LOVERL, LMDIM,CQIJ)
         ENDIF

GPACC !$ACC UPDATE DEVICE(W%FERTOT) IF(OFFLOAD_ON)
GPOMP !$OMP TARGET UPDATE TO(W%FERTOT) IF(OFFLOAD_ON)
         CALL EDWAV(HAMILTONIAN,KINEDEN, &
...

this should help. The problem is a bit obscure because it only happens for specific metagga terms that produce non physical V_XC contributions if the kinetic-energy density is 0. The Davidson algorithm used in the delay phase in the beginning of VASP is guaranteed to produce orthonormal wavefunctions if the Hamiltonian is hermitian (and it does I checked), but for some metagga's specificially I tested M06L, the orthonormality in each DAV step is lost a bit. This is due to the fact that mu calculated from tau is non physical. To make very certain that the first gradient is in edwav (ALGO=ALL) is build with orthonormal wave function the ORTCH call above guarantees orthonormal states on entry. Internally it will always use ORTCH to keep the states orthonormal.

We working on a further improvement for the delay phase in cases in which the kinetic-energy density is 0 to prevent this from occurring in the first place. If you like you can test the fix above and I would appreciate any feedback. Thank you.

Best,
Alex


Re: Rot.F: EDWAV not orthogonal

Posted: Tue Apr 21, 2026 11:08 am
by Nick_Gerrits

Hi Alex,

Thanks for getting back to the topic and it looks like an elegant solution. I might have calculations running soon that can test this.

I am a bit confused about your statement regarding mu: Which quantity are you talking about? Because to me, mu means the second-order gradient expansion coefficient, which is a constant. Do you mean the various iso-orbital indicators used in MGGA DFs? Regularization of the iso-orbital indicator (e.g., r2SCAN and rMS-RPBEl, see also https://doi.org/10.1103/PhysRevB.99.041119) tends to help in those cases, but one could argue that you are then also slightly altering the DF itself. If you only do this for the delay phase, it might be worthwhile. However, in principle, could this error not also take place after the switch between DAV and CGA/SDA has already happened? Or is orthonormality somehow guaranteed in that part?

Also, M06L can be viewed as an entirely unphysical DF anyway. Just take a look at the exchange enhancement factor plotted against the reduced gradient for alpha=0 and alpha=1. Personally, I would simply avoid using it all...


Re: Rot.F: EDWAV not orthogonal

Posted: Tue Apr 21, 2026 2:15 pm
by ahampel

Hi Nick,

Let me try to recap different: I noticed is that in the job that crashes the calculation starts from CHGCAR but not with TAUCAR. This means that internally VASP creates a Hamiltonian from the supplied charge density and makes 5-6 (NELMDL) steps to diagonalize this Hamiltonian with fixed charge density. Without supplied TAUCAR VASP sets in this stage τ=0 and then evaluates μ=∂V_xc/∂τ (what I refereed to as mu). This evaluation for τ=0, as you point out, in M06L giving very badly conditioned Hamiltonians. The blocked DAV chokes somehow on this (I am trying to understand exactly why currently). But the aforementioned fix will correct this problem by orthonormalizing the wavefunctions before the first gradient ist constructed after the NELMDL steps are done.

And you are right this can happen at any point in the calculation but ALGO=ALL keeps at the end of its routine the wavefunction orthonormal. So the error should be really pronounced when starting VASP.

I hope this got a bit clearer.

Best,
Alex