Parallel Wannier Projections

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Parallel Wannier Projections

#1 Post by jbackman » Fri Feb 26, 2021 2:02 pm

Dear VASP developers,

I understand that this might be outside the scope of the forum, since it has been asked before:
https://www.vasp.at/forum/viewtopic.php?f=4&t=17273

However, since the latest release (6.2) includes some updates to the Wannier90 interface, I wanted to check whether the Wannier projections are now implemented in parallel, or whether there are any plans to do so.

Best,
Jonathan

merzuk.kaltak
Administrator
Posts: 122
Joined: Mon Sep 24, 2018 9:39 am

Re: Parallel Wannier Projections

#2 Post by merzuk.kaltak » Mon Mar 01, 2021 9:19 am

Dear Jonathan,

VASP calls the wannier90 library in serial mode, even though w90 as a stand-alone software package is parallelized.
To the best of my knowledge, there is no general-purpose parallel interface to the w90 library.

Best,
Merzuk

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Re: Parallel Wannier Projections

#3 Post by jbackman » Wed Apr 14, 2021 9:05 pm

Dear Merzuk,

When using VASP to interface with wannier90, the overlap calculation (wannier90.mmn) seems to be implemented in parallel, since it scales with the number of cores used when running VASP with the LWANNIER90 = .TRUE. flag. However, this is not the case for the projections (wannier90.amn). So it looks like only part of the code is parallelized.

This is why I asked whether the projections have also been addressed in the latest update.

Best,
Jonathan

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#4 Post by henrique_miranda » Thu Apr 22, 2021 8:31 pm

Dear Jonathan,

We did not change the parallelization scheme for the computation of the initial projections for wannier90 in the last update.
From my experience, the computation of the overlaps (wannier90.mmn) is the bottleneck.
But we might address this in the future; thanks for pointing it out.

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#5 Post by henrique_miranda » Thu Apr 22, 2021 9:15 pm

A small addendum to my previous post:
the projections are already computed in parallel (they were also computed in parallel before 6.2).
Is your computation not scaling with the number of cores?
Could you give more information about which system you are looking at and the timings?

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Re: Parallel Wannier Projections

#6 Post by jbackman » Mon Apr 26, 2021 1:00 pm

Dear Henrique,
thanks for your answer.

Yes, for me it does not seem to scale with the number of cores. It is really only a problem when I work with very large systems (500+ projections), but here I use a small 2D MoS2 system to test the scaling.

Using the WAVECAR from a previous SCF calculation, I use the following INCAR to generate the necessary wannier90 files.

INCAR:
"
ENCUT = 500 eV
ALGO = None
ISMEAR = 0
SIGMA = 0.1
NELM = 0
EDIFF = 1E-10
GGA = PE
NBANDS = 18
LPLANE = .FALSE.
PREC = Accurate
ADDGRID = .TRUE.
LWAVE = .FALSE.
LCHARG = .FALSE.
LWANNIER90 = .TRUE.
LWRITE_MMN_AMN = .TRUE.
NUM_WANN = 11
WANNIER90_WIN = "
begin projections
Mo:l=2
S:l=1
end projections
search_shells = 70
"
"

KPOINTS:
"
K-Points
0
Monkhorst Pack
17 17 1
0 0 0
"
POSCAR:
"
MoS2 monolayer
3.18300000000000
0.8660254037800000 -0.5000000000000000 0.0000000000000000
0.8660254037800000 0.5000000000000000 0.0000000000000000
0.0000000000000000 0.0000000000000000 6.3291139240499996
Mo S
1 2
Direct
-0.0000000000000000 -0.0000000000000000 0.5000000000000000
0.3333333333352613 0.3333333333352613 0.5776104589503532
0.3333333333352613 0.3333333333352613 0.4223895410496468
"

I measure the time to calculate MMN and AMN by timing the MLWF_WANNIER90_MMN and MLWF_WANNIER90_AMN function calls in mlwf.F.

The time of each projection is measured as the time for each iteration of the funcs: DO IFNC=1,SIZE(LPRJ_functions) loop in the LPRJ_PROALL function, defined in locproj.F.

I get the following timings when increasing the number of cores.
Cores   MMN (s)   AMN (s)   Proj (s)
1       72.9      20.3      1.85
2       40.4      19.1      1.73
9       25.5      19.5      1.76


To me, it looks like the MMN calculation is scaling but the AMN calculation is not. This is also my experience with a large system, where the AMN calculation becomes very slow. The calculations are done with VASP 6.2.

I'm thankful for any input you have.

Best,
Jonathan

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#7 Post by henrique_miranda » Tue Apr 27, 2021 7:39 am

Dear Jonathan,

I ran this system with the same numbers of cores that you used.
Here are the timings I get (compiled with a GNU toolchain):

Cores   MMN (s)     AMN (s)
1    132.439279   70.092866
2     69.650388   51.882086
9     26.695566   44.132432
While admittedly the scaling is not very good, at least it does scale to some extent.
I would like to understand what is going on, i.e., why you don't observe any scaling while my testing shows some.
What does the `Proj` column mean?

To collect the timings above I needed to add profiling statements in the routines that compute AMN and MMN (they will be included in a future release of VASP).
How did you collect the timings you report?

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Re: Parallel Wannier Projections

#8 Post by jbackman » Tue Apr 27, 2021 9:56 am

Dear Henrique,

Yes, it seems like you observe some scaling for both the MMN and AMN calculations. However, as you can see, I don't see this for AMN.

As I said, I measure the time to run the MLWF_WANNIER90_MMN and MLWF_WANNIER90_AMN function calls in mlwf.F.
I do this using the cpu_time() intrinsic.

The reported Proj time is the time of each iteration of the funcs: DO IFNC=1,SIZE(LPRJ_functions) loop in the LPRJ_PROALL function, defined in locproj.F, also measured with cpu_time().

If I sum up the individual time for each projection it adds up to that of the full AMN calculation.
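For reference, this is roughly how I wrap a region with cpu_time(); it is only a minimal, self-contained sketch and not the actual VASP code (the routine and the work inside it are placeholders):

program time_region
  implicit none
  real :: t_start, t_end

  call cpu_time(t_start)
  call expensive_region()            ! placeholder for e.g. the AMN call
  call cpu_time(t_end)
  write(*,'(A,F10.3,A)') 'region took ', t_end - t_start, ' s of CPU time'

contains

  subroutine expensive_region()
    ! dummy workload standing in for the routine being timed
    integer :: i
    real    :: s
    s = 0.0
    do i = 1, 50000000
      s = s + sin(real(i))
    end do
    if (s > 1.0e30) print *, s       ! keeps the compiler from optimizing the loop away
  end subroutine expensive_region

end program time_region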

My code is compiled with the Intel toolchain (intel/19.1.1.217). This is my makefile.include file:

# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxIFC\"\
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-Dvasp6 \
-Duse_bse_te \
-Dtbdyn \
-Dfock_dblbuf \
-Duse_shmem \
-DVASP2WANNIER90

CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

FC = ftn
FCL = ftn -mkl=sequential

FREE = -free -names lowercase

FFLAGS = -assume byterecl -w -xHOST
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0

MKL_PATH = $(MKLROOT)/lib/intel64
BLAS = $(MKL_PATH)/libmkl_blas95_lp64.a
LAPACK = $(MKL_PATH)/libmkl_lapack95_lp64.a
BLACS = -lmkl_blacs_intelmpi_lp64
SCALAPACK = -L$(MKL_PATH) -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64

WANNIER = /users/jbackman/wannier90/wannier90-3.1.0/libwannier.a

OBJECTS = fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d.o

INCS = -I$(MKLROOT)/include/fftw

LLIBS = $(SCALAPACK) $(BLACS) $(LAPACK) $(BLAS) $(WANNIER)

OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = cc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS = CC
LLIBS += -lstdc++

# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin

MPI_INC = $(MPICH_DIR)/include

Best,
Jonathan

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#9 Post by henrique_miranda » Wed Apr 28, 2021 1:46 pm

Dear Jonathan,

I was able to obtain timings similar to yours when compiling the code with intel/19.1.2.254 and mkl/2020.2.254.

Cores   MMN (s)     AMN (s)
1     58.068596   23.260825
2     33.101691   22.814278
9     20.505906   25.768041
I don't yet have an explanation for this, i.e., why the AMN computation scales when compiled with the GNU toolchain but not with the Intel one. In principle, both should scale.
In any case, the Intel build is much faster than the GNU one; on Intel hardware one should, if possible, use the Intel compiler and MKL for optimal performance.
I will investigate further the reason for these differences and get back to you.

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Re: Parallel Wannier Projections

#10 Post by jbackman » Mon May 10, 2021 11:05 am

Dear Henrique,

Thank you for looking into the issue. Any news on the cause of the problem and a possible solution?

Best,
Jonathan

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#11 Post by henrique_miranda » Tue May 11, 2021 10:25 am

Dear Jonathan,

Yes, we've looked into this issue.
The reason the code does not scale is that the distributed part of the computation (the dot-product between the wavefunctions and the projection functions) is not necessarily the most expensive one. With the Intel toolchain this part is so fast that you don't see a difference in the final timing.
There are other possible parallelization schemes we are considering to distribute the computation of the projection WFs.
This becomes more important for systems with a lot of atoms and few k-points (or gamma-only).

For this particular example (few atoms and a lot of k-points), a k-point parallelization (KPAR=N) is better.
Unfortunately, KPAR is currently not used to distribute the computation of the projections.
Only a minor modification of the code is required, so we will try to include it in a future release of VASP.
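For reference, k-point parallelization is requested with a single INCAR tag; the value below is only an example for a run on 9 cores (the number of MPI ranks should be divisible by KPAR):

"
KPAR = 3   ! illustrative value: 9 ranks are split into 3 groups, each working on a subset of the k-points
"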

Kind regards,
Henrique Miranda

jbackman
Newbie
Posts: 9
Joined: Thu Nov 26, 2020 10:27 am

Re: Parallel Wannier Projections

#12 Post by jbackman » Tue May 11, 2021 2:17 pm

Dear Henrique,

Again, thank you for looking into the issue.

I agree with your conclusions for the posted example, where k-point parallelization could be a good option. However, I think the bigger problem is for large systems with many atoms and few k-points. What do you think is the best option in this case, where we don't have many k-points? Projection parallelization? Do you have any estimate of when such a new release would be available?

In your testing, what is the most expensive part of the projection calculation with the intel toolchain?

Best regards,
Jonathan

henrique_miranda
Global Moderator
Posts: 194
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallel Wannier Projections

#13 Post by henrique_miranda » Wed May 12, 2021 12:48 pm

Dear Jonathan,

From my testing, the slowest part is the computation of the local wavefunctions (CONSTRUCT_RYLM routine).
These wavefunctions are currently generated on all MPI ranks; only the evaluation of the dot-product with the Bloch orbitals is distributed.
We are looking into ways to improve the speed and scaling of this part of the code.
There are many possibilities (distributing the computation of the local wavefunctions being one of them), but they often involve a trade-off between computation and communication, so it is hard to say what the best strategy is without implementing and testing.
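To make the structure clearer, here is a rough, self-contained illustration (not VASP code; all names, array sizes and data are invented): every rank constructs the full set of local functions, and only the dot-products with the (here fake) Bloch orbitals are split over the ranks and summed at the end.

program projection_sketch
  use mpi
  implicit none
  integer, parameter :: dp = kind(1.d0)
  integer, parameter :: nbands = 64, nproj = 11, npw = 1000
  complex(dp) :: bloch(npw, nbands)     ! Bloch orbitals (made-up data)
  complex(dp) :: locfun(npw, nproj)     ! local/projection functions (made-up data)
  complex(dp) :: amn_part(nbands, nproj), amn(nbands, nproj)
  integer :: rank, nranks, ierr, ib, ip

  call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, rank, ierr)
  call mpi_comm_size(mpi_comm_world, nranks, ierr)

  ! every rank builds the full set of local functions (the non-distributed step)
  bloch  = (1.0_dp, 0.0_dp)
  locfun = (0.5_dp, 0.0_dp)

  ! only the dot-products are distributed: each rank takes a subset of bands
  amn_part = (0.0_dp, 0.0_dp)
  do ib = rank + 1, nbands, nranks
     do ip = 1, nproj
        amn_part(ib, ip) = dot_product(bloch(:, ib), locfun(:, ip))
     end do
  end do

  ! sum the partial results so every rank holds the full A_mn matrix
  call mpi_allreduce(amn_part, amn, nbands*nproj, mpi_double_complex, &
                     mpi_sum, mpi_comm_world, ierr)

  if (rank == 0) write(*,*) 'A(1,1) =', amn(1,1)
  call mpi_finalize(ierr)
end program projection_sketch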

For this particular case you showed (MoS2), I find that the KPAR parallelization is adequate.
We are testing other strategies for systems with many atoms and a few k-points.

I cannot yet say when this improvement will be implemented and released.
While the current code is not the fastest possible, it is not so slow either.
We have already used it to compute projections for systems with ~1000 atoms.
Do you have some applications in mind where the current implementation is the limiting factor?

Kind regards,
Henrique Miranda
