Problem in electronic optimisation when calculating LEPSILON with SOC

jeongyoungchoi
Newbie
Posts: 2
Joined: Wed May 15, 2024 1:20 pm

Problem in electronic optimisation when calculating LEPSILON with SOC

#1 Post by jeongyoungchoi » Wed Dec 10, 2025 4:11 pm

Dear VASP developers,

I would like to preface this by saying I have searched the VASP forum for a similar issue (most relevant: https://www.vasp.at/forum/viewtopic.php?t=19828) and have tried the solutions to no avail, so I am posting this here in hopes of getting some advice.

For context, I am running DFPT calculations to obtain the Born effective charges with IBRION = 8 (I have also tried 7 and got the same issue), and when I run it with SOC and ISYM = -1, I encounter the following error in the SLURM output file:

HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>

I have tried the provisional fix of using the Conjugate Gradient algorithm (ALGO = Conjugate / All), but the same issue still occurs.
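
For reference, the settings described above boil down to roughly the following tags (a minimal sketch written as a job-script snippet; this is not the full INCAR from the attachment, and cutoff, k-points, etc. are omitted):

Code:

# minimal sketch of the tags mentioned above, not the attached INCAR
cat > INCAR << 'EOF'
LEPSILON = .TRUE.    ! DFPT linear response (dielectric tensor / Born effective charges)
IBRION   = 8         ! DFPT; IBRION = 7 was also tried and gives the same error
LSORBIT  = .TRUE.    ! spin-orbit coupling (needs the noncollinear vasp_ncl build)
ISYM     = -1        ! symmetry switched off
! ALGO = Conjugate   ! provisional fix that was tried (also ALGO = All); same error
EOF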

I have attached the relevant inputs and parts of the output (truncated due to size).

DFPT+SOC.zip

Thank you in advance!

Best regards,
Jeong Young CHOI

Note: VASP version 6.5.1 (the issue also occurs with version 6.4.2); the system was relaxed with SOC and ISYM = -1.


jonathan_lahnsteiner2
Global Moderator
Posts: 300
Joined: Fri Jul 01, 2022 2:17 pm

Re: Problem in electronic optimisation when calculating LEPSILON with SOC

#2 Post by jonathan_lahnsteiner2 » Fri Dec 12, 2025 6:49 am

Dear jeongyoungchoi,

I was not able to reproduce your issue. For me, the calculation with VASP 6.5.1 and a GNU 11.2 toolchain works fine.
Could you tell me which toolchain you were using?

All the Best Jonathan


jeongyoungchoi
Newbie
Posts: 2
Joined: Wed May 15, 2024 1:20 pm

Re: Problem in electronic optimisation when calculating LEPSILON with SOC

#3 Post by jeongyoungchoi » Fri Dec 12, 2025 1:23 pm

Hi Jonathan,

Thank you for taking the time to test if the issue is reproducible.

I usually run the precompiled version of VASP on the CSCS (Daint) cluster and have tried to find out which toolchain they use, but without success.

Alternatively, the same issue pops up for the following toolchain on the Euler cluster:
intel-oneapi-compilers/2023.2.0 intel-oneapi-mpi/2021.10.0 intel-oneapi-mkl/2023.2.0 hdf5/1.14.3 vasp/6.4.2

From a few other runs, I have also noticed that it runs fine while calculating the "Linear response to external field (no local field effect)", but the issue starts when it reaches the following:

Code:

 Linear response to external field, progress :
  Direction:   1
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1    -0.810532869822E+01   -0.81053E+01   -0.34112E+01140592   0.255E+00
RMM:   2    -0.913545355605E+01   -0.10301E+01   -0.34461E-01119401   0.974E-01    0.566E+00
RMM:   3    -0.845703764293E+01    0.67842E+00   -0.10195E+00153329   0.105E+00    0.293E+00
RMM:   4    -0.839597460187E+01    0.61063E-01   -0.82142E-01193508   0.906E-01    0.132E+00
RMM:   5    -0.843818502961E+01   -0.42210E-01   -0.42138E-02188500   0.267E-01    0.685E-01
RMM:   6    -0.845900090702E+01   -0.20816E-01   -0.16324E-02177956   0.183E-01    0.301E+00
RMM:   7    -0.845957064279E+01   -0.56974E-03   -0.50259E-03179275   0.104E-01    0.285E+00
RMM:   8    -0.847886852478E+01   -0.19298E-01   -0.90552E-03180603   0.127E-01    0.893E-01
RMM:   9    -0.848977247912E+01   -0.10904E-01   -0.66347E-03180918   0.122E-01    0.540E-01
 HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>
 HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>
             1            1   3.9515046498775717E-003
 HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>
             1            1   1.4082541343490948E-002
 HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>
 HAMIL_LR internal error: the vector H(1)-e(1) S(1) |phi(0)> is not orthogonal to |phi(0)>
             1            1   1.9859739043422029E-003

Based on a deeper dive into a related issue (https://www.vasp.at/forum/viewtopic.php?t=20398), I also see a large change in energy (in the third step), which makes me wonder whether this is an issue with the parallelisation setup (e.g. the number of MPI ranks).

Could you let me know the settings you used for the calculation, so I can check whether this is the case? I will also post updates if I have any findings.

Again, I greatly appreciate your help and thank you in advance! :)


jonathan_lahnsteiner2
Global Moderator
Posts: 300
Joined: Fri Jul 01, 2022 2:17 pm

Re: Problem in electronic optimisation when calculating LEPSILON with SOC

#4 Post by jonathan_lahnsteiner2 » Mon Dec 22, 2025 8:30 am

Dear jeongyoungchoi,

I checked your calculations with Intel toolchains. I used 2023.2.1_mkl-2023.2.0_impi-2021.10.0, which is the same one you report, and I was not able to reproduce the problem.
I used the same INCAR file you sent in one of your previous posts and ran the calculation on an AMD EPYC 7713 with 64 MPI ranks.
On how many nodes have you been running the job, and on which computer architecture?

All the Best Jonathan


jeongyoungchoi
Newbie
Posts: 2
Joined: Wed May 15, 2024 1:20 pm

Re: Problem in electronic optimisation when calculating LEPSILON with SOC

#5 Post by jeongyoungchoi » Mon Dec 22, 2025 1:03 pm

Dear Jonathan,

Thank you for taking the time to check the issue again.

On the Intel toolchain I was using 16 MPI ranks with 16 threads per rank (4 nodes) when the issue popped up.
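
In SLURM terms, that hybrid launch corresponds to roughly the following (just a sketch to show the layout; the directives and the binary name are placeholders, not my exact job script):

Code:

#!/bin/bash
#SBATCH --nodes=4            # 4 nodes
#SBATCH --ntasks=16          # 16 MPI ranks in total (4 per node)
#SBATCH --cpus-per-task=16   # 16 OpenMP threads per rank

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}   # match thread count to the allocation

srun --cpus-per-task=${SLURM_CPUS_PER_TASK} vasp_ncl   # noncollinear build, since SOC is on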

On this HPC, nodes are assigned semi-arbitrarily; their specifications usually get printed during runs with other software, but I cannot find them in the VASP outputs.
My best guess would be AMD EPYC 9654 (2 sockets, 96 cores, 2 threads per core), followed by AMD EPYC 7742 (2 sockets, 64 cores, 2 threads per core).

That said, I have been able to run for 12+ hours without said error using 32 MPI ranks and 8 threads; however, I need to rerun it due to the time limit, and the HPC is currently down.

As for the other toolchain on CSCS Daint that produces the issue, the following are used: "...MPI, OpenMP, OpenACC, HDF5 (and Wannier90 support is available)".

Here, I have been running with 1 node, 4 MPI ranks, and 16 threads (+ 1 GPU per MPI rank), and I have yet to find a setting that works on this cluster.
Specifications: NVIDIA Grace Hopper, 4 sockets, 72 ARM cores and one H100 GPU per socket
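
The launch on this machine looks roughly like the following (again only a sketch to illustrate the 4-rank / 16-thread / 1-GPU-per-rank layout; the Daint-specific environment setup is omitted), and the run then dies with the banner below:

Code:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4            # one MPI rank per GPU
#SBATCH --cpus-per-task=16    # 16 OpenMP threads per rank
#SBATCH --gpus-per-task=1     # 1 GPU per MPI rank

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}

srun --cpus-per-task=${SLURM_CPUS_PER_TASK} vasp_ncl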

Code:

 -----------------------------------------------------------------------------
|                     _     ____    _    _    _____     _                     |
|                    | |   |  _ \  | |  | |  / ____|   | |                    |
|                    | |   | |_) | | |  | | | |  __    | |                    |
|                    |_|   |  _ <  | |  | | | | |_ |   |_|                    |
|                     _    | |_) | | |__| | | |__| |    _                     |
|                    (_)   |____/   \____/   \_____|   (_)                    |
|                                                                             |
|     internal error in: hamil_lrf.F  at line: 815                            |
|                                                                             |
|     LRF_HAMIL internal error: the vector H(1)-e(1) S(1) |phi(0)> is not     |
|     orthogonal to |phi(0)> -0.114 0.066                                     |
|                                                                             |
|     If you are not a developer, you should not encounter this problem.      |
|     Please submit a bug report.                                             |
|                                                                             |
 -----------------------------------------------------------------------------

Thank you for looking into the issue.

I will see if I can get it to run successfully with either 32 / 64 MPI ranks once the cluster is back (Intel toolchain).


jonathan_lahnsteiner2
Global Moderator
Posts: 300
Joined: Fri Jul 01, 2022 2:17 pm

Re: Problem in electronic optimisation when calculating LEPSILON with SOC

#6 Post by jonathan_lahnsteiner2 » Mon Dec 22, 2025 2:50 pm

Dear jeongyoungchoi,

I was not aware that you are using hybrid parallelization, so I will have to do some tests again. It might be that there is an error in the code for the OpenMP offloading, since a lot of development was done there.

Maybe it is helpful to take a look at the page Combining MPI and OpenMP in the VASP wiki.
Otherwise, for now you could try to run your calculation without OpenMP, relying only on MPI parallelization, since this should work for your calculation.
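
A pure-MPI launch would look something like this (just a sketch; adjust the node and rank counts, and the binary name, to your machine):

Code:

#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16   # MPI ranks only
#SBATCH --cpus-per-task=1      # no OpenMP threading

export OMP_NUM_THREADS=1       # make sure an OpenMP-enabled build runs single-threaded

srun vasp_ncl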

I will do some tests again and hopefully will be able to reproduce your error.

All the Best Jonathan

