Crash when computing MMN with NCORE>1

Posted: Wed Jul 21, 2021 2:50 pm
by ashwin_r
I am running into a persistent error (possibly a memory leak?) when calculating Wannier projections with the latest VASP release, 6.2.1, and Wannier90-3.1.0. The error occurs only when NCORE>1; with NCORE=1, everything works as expected.

I am attaching a bug report here, using Si as an example. The bug is reproducible with either vasp_gam or vasp_std for a 1x1x1 k-point mesh, and with vasp_std for larger k-point meshes. I am using the Intel parallel_studio_xe 2020.2 compilers for the tests attached here, but I can also reproduce the error with GCC/GFortran 9.3.0.

Re: Crash when computing MMN with NCORE>1

Posted: Thu Jul 22, 2021 1:41 pm
by henrique_miranda
Thanks for reporting the issue.

This is a known limitation that has been present since older versions of VASP (I checked 5.4.4).
The workaround for the moment is to always use NCORE=1.

This issue appears because, when NCORE/=1 is used, the wavefunction (WF) components are distributed among different MPI ranks (each band is spread over NCORE MPI ranks). To compute the MMN and AMN matrices for Wannier90, we need to generate the WFs in the full Brillouin zone. This implies rotating them from one k-point to another, which in turn requires transferring data among the CPUs that are treating the same band. This is not implemented yet.

Note that the default data distribution in VASP, i.e. NCORE=1, means that each band is treated on one CPU, so the code works without problems.
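
To make the distribution described above concrete, here is a toy model in Python. It is only an illustrative sketch: the function name band_owners, the contiguous grouping of ranks, and the round-robin assignment of bands to groups are assumptions made for the example, not a description of VASP's actual internal layout. The only property taken from the explanation above is that each band lives on NCORE MPI ranks.

```python
def band_owners(n_ranks, ncore, n_bands):
    """Toy model: map each band index to the ranks that hold its coefficients.

    Hypothetical helper for illustration only; not part of VASP.
    """
    if n_ranks % ncore != 0:
        raise ValueError("n_ranks must be divisible by ncore")
    n_groups = n_ranks // ncore
    # Assumption: ranks are grouped contiguously, NCORE ranks per group.
    groups = [list(range(g * ncore, (g + 1) * ncore)) for g in range(n_groups)]
    # Assumption: bands are dealt to the groups round-robin.
    return {band: groups[band % n_groups] for band in range(n_bands)}


def needs_intra_band_transfer(n_ranks, ncore, n_bands):
    """True if any band is split over more than one rank."""
    owners = band_owners(n_ranks, ncore, n_bands)
    return any(len(ranks) > 1 for ranks in owners.values())
```

In this model, 4 ranks with ncore=1 give every band exactly one owner, so rotating a WF to another k-point needs no communication within a band. With ncore=2, every band is shared by two ranks, which is the case that would need the data transfer that is not implemented.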

Re: Crash when computing MMN with NCORE>1

Posted: Thu Jul 22, 2021 2:30 pm
by ashwin_r
Thanks, Henrique. It might be useful to add a one-line check to terminate the calculation at the very outset if LWANNIER90 = .TRUE. and NCORE/=1 so that the calculation does not crash after wasting compute time on self-consistency cycles.
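
Until such a check exists in the code itself, something similar could be done on the user side before submitting a job. The following Python sketch is a hypothetical pre-flight script, not part of VASP: the function names are made up for illustration, the INCAR parsing is deliberately simplified (one or more TAG = value statements per line, separated by ";", with "#" or "!" comments), and it only inspects NCORE, so a distribution set through other tags such as NPAR is not detected.

```python
import re


def parse_incar(text):
    """Parse a simplified INCAR into a dict of upper-case tag -> raw value string."""
    tags = {}
    for raw_line in text.splitlines():
        # Drop everything after a comment marker ('#' or '!').
        line = re.split(r"[#!]", raw_line, maxsplit=1)[0]
        # Several statements may share a line, separated by ';'.
        for statement in line.split(";"):
            if "=" not in statement:
                continue
            key, value = statement.split("=", 1)
            tags[key.strip().upper()] = value.strip()
    return tags


def is_true(value):
    """Interpret Fortran-style logicals such as .TRUE., T or .t. as True."""
    return value.strip().upper().lstrip(".").startswith("T")


def check_wannier_ncore(text):
    """Return an error message if LWANNIER90 is on and NCORE /= 1, else None."""
    tags = parse_incar(text)
    wannier = is_true(tags.get("LWANNIER90", ".FALSE."))
    # Assumption: NCORE defaults to 1 when the tag is absent.
    ncore = int(tags.get("NCORE", "1").split()[0])
    if wannier and ncore != 1:
        return f"LWANNIER90 = .TRUE. requires NCORE = 1, found NCORE = {ncore}"
    return None
```

A job script could read the INCAR file, pass its contents to check_wannier_ncore, and abort if a message is returned, so that no compute time is spent on the self-consistency cycles of a run that cannot finish.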

Re: Crash when computing MMN with NCORE>1

Posted: Wed Jul 28, 2021 5:39 pm
by henrique_miranda
Yes, this is a good point.
We will make this change in a future release.