Crash when computing MMN with NCORE>1


ashwin_r
Newbie
Posts: 5
Joined: Sat Nov 16, 2019 8:58 pm

#1 Post by ashwin_r » Wed Jul 21, 2021 2:50 pm

I am running into a persistent error (possibly a memory leak?) when calculating Wannier projections with VASP 6.2.1, the latest version, and Wannier90-3.1.0. The error occurs only when NCORE>1; with NCORE=1, everything works as expected.

I am attaching a bug report here, using Si as an example. The bug is reproducible with either vasp_gam or vasp_std for a 1x1x1 k-point mesh, and with vasp_std for larger k-point meshes. The attached tests were built with the Intel parallel_studio_xe 2020.2 compilers, but I can also reproduce the error with GCC/GFortran 9.3.0.

henrique_miranda
Global Moderator
Posts: 414
Joined: Mon Nov 04, 2019 12:41 pm

Re: Crash when computing MMN with NCORE>1

#2 Post by henrique_miranda » Thu Jul 22, 2021 1:41 pm

Thanks for reporting the issue.

This is a known limitation that has been present since older versions of VASP (I checked 5.4.4).
For the moment, the solution is to always use NCORE=1.

The issue appears because, when NCORE/=1 is used, the components of each wavefunction (WF) are distributed among different MPI ranks (each band is stored on NCORE MPI ranks). To compute the MMN and AMN matrices for Wannier90, we need to generate the WFs in the full Brillouin zone. That means rotating them from one k-point to another, which in turn requires transferring data among the CPUs that treat the same band. This data transfer is not implemented yet.

Note that with the default data distribution in VASP, i.e. NCORE=1, each band is treated on a single CPU, so the code works without problems.
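As a rough illustration of the distribution described above, the following Python sketch groups MPI ranks into sets of NCORE that share one band. The function name `band_groups` and the choice to group consecutive ranks are assumptions made for illustration only; they do not describe VASP's actual communicator layout.

```python
def band_groups(n_ranks, ncore):
    """Split ranks 0..n_ranks-1 into groups of `ncore` ranks that share a band.

    Hypothetical helper: grouping consecutive ranks is an illustrative
    assumption, not VASP's real rank-to-band mapping.
    """
    if ncore < 1 or n_ranks % ncore != 0:
        raise ValueError("n_ranks must be a positive multiple of ncore")
    return [list(range(start, start + ncore))
            for start in range(0, n_ranks, ncore)]


# NCORE=1: every band lives on a single rank, so rotating a WF is rank-local.
print(band_groups(4, 1))
# NCORE=2: each band is split over two ranks, so a rotation would need
# data exchange inside each group.
print(band_groups(4, 2))
```

With NCORE=1 every group contains exactly one rank, which is why no communication within a band is needed in that case.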

ashwin_r
Newbie
Posts: 5
Joined: Sat Nov 16, 2019 8:58 pm

Re: Crash when computing MMN with NCORE>1

#3 Post by ashwin_r » Thu Jul 22, 2021 2:30 pm

Thanks, Henrique. It might be useful to add a one-line check that terminates the calculation at the very outset if LWANNIER90 = .TRUE. and NCORE/=1, so that the calculation does not crash after wasting compute time on self-consistency cycles.
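Until such a check exists in VASP itself, a user-side pre-flight script could catch the combination before a job is submitted. The Python sketch below is one possible approach, not part of VASP: the function names are hypothetical, the INCAR parsing is deliberately simplified (`#` and `!` comments, `;`-separated assignments), and it does not account for NCORE being set implicitly through NPAR.

```python
import re


def parse_incar(text):
    """Return a dict mapping upper-cased INCAR tags to raw string values."""
    tags = {}
    for line in text.splitlines():
        # Drop trailing comments introduced by '#' or '!'.
        line = re.split(r"[#!]", line, maxsplit=1)[0]
        # Several assignments may share one line, separated by ';'.
        for stmt in line.split(";"):
            if "=" not in stmt:
                continue
            key, value = stmt.split("=", 1)
            tags[key.strip().upper()] = value.strip()
    return tags


def is_true(value):
    """Interpret a Fortran-style logical such as .TRUE., T, or .T."""
    return value.strip().lstrip(".").upper().startswith("T")


def check_wannier_ncore(text):
    """Return an error message if LWANNIER90 is on and NCORE != 1, else None."""
    tags = parse_incar(text)
    wannier = is_true(tags.get("LWANNIER90", ".FALSE."))
    ncore = int(tags.get("NCORE", "1").split()[0])
    if wannier and ncore != 1:
        return f"LWANNIER90 = .TRUE. requires NCORE = 1, found NCORE = {ncore}"
    return None


if __name__ == "__main__":
    example = "LWANNIER90 = .TRUE.\nNCORE = 4  ! split each band over 4 ranks\n"
    print(check_wannier_ncore(example))
```

In a job script, this could be run on the INCAR contents before launching VASP, aborting the submission whenever the function returns a message.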

henrique_miranda
Global Moderator
Posts: 414
Joined: Mon Nov 04, 2019 12:41 pm

Re: Crash when computing MMN with NCORE>1

#4 Post by henrique_miranda » Wed Jul 28, 2021 5:39 pm

Yes, this is a good point.
We will make this change in a future release.
