
Can a GW run be restarted from where it stopped?

Posted: Sun Aug 15, 2021 8:48 am
by SKM
Hi,
I am doing a GW band-structure run for a semiconductor material.
It is taking a long time, and I think the allotted wall time will not be sufficient. In that case, can I restart the run from where it stopped once the wall time has elapsed?

Also, it does not allow me to use the NPAR tag (I thought adding this tag might speed up the run).

Regards

Re: Can a GW run be restarted from where it stopped?

Posted: Mon Aug 16, 2021 1:18 am
by marie-therese.huebsch
Dear SKM,

Let me investigate which tags you can use to speed up your calculation.

For now, you might try checkpointing tools. Perhaps MPI-agnostic, network-agnostic (MANA) transparent checkpointing can help you safely stop and restart your GW calculation. It is implemented as a plugin in DMTCP: Distributed MultiThreaded CheckPointing.

Please consider sharing your experience here afterward!

Best regards,
Marie-Therese

Re: Can a GW run be restarted from where it stopped?

Posted: Mon Aug 16, 2021 1:49 am
by SKM
Thanks, Marie, for the quick reply. I will check the DMTCP link and ask our tech admin whether it is available on our supercomputer systems.

1. I have just finished the GW0 run. After posting my question in the forum, I tested KPAR = number of nodes and KPAR = number of cores per node, and compared the time taken for one NQ step. The observations below are for my system, a semiconductor with three element types and 12 atoms in total. Resources: 288 CPUs with 48 cores per node, so 6 nodes were used.

a) Without KPAR (default), each NQ step took around 1 h. I let it run for 7 NQ steps, since I thought it would be a waste to stop it before completion. Meanwhile, I first tested (b) below.
b) With KPAR = 6 (number of nodes), each NQ step took approximately 26 min, i.e., about 2 steps per hour. While this was running, I tested (c) below.
c) With KPAR = 48 (number of cores per node), each NQ step took almost the same time, about 26 min, so there was no improvement over (b). I therefore stopped runs (a) and (b) and let (c) run to completion.
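As background for these tests: VASP splits the MPI ranks into KPAR groups that work on different k-points, so KPAR should divide the total number of ranks evenly. The minimal sketch below, assuming one MPI rank per core (6 nodes × 48 cores = 288 ranks, as in the runs above), just shows how many ranks each group gets for the tested values; the helper name `ranks_per_kgroup` is hypothetical and the sketch says nothing about actual performance.

```python
def ranks_per_kgroup(total_ranks: int, kpar: int) -> int:
    """Return the number of MPI ranks in each k-point group.

    Assumes KPAR must divide the total rank count evenly; raises otherwise.
    """
    if kpar < 1 or total_ranks % kpar != 0:
        raise ValueError(f"KPAR={kpar} does not divide {total_ranks} ranks evenly")
    return total_ranks // kpar


if __name__ == "__main__":
    total = 6 * 48  # 6 nodes x 48 cores, assuming one MPI rank per core
    for kpar in (1, 6, 48):  # default, runs (b) and (c)
        print(f"KPAR={kpar:>2}: {ranks_per_kgroup(total, kpar)} ranks per group")
```

With KPAR = 6 each group has 48 ranks (one node), while KPAR = 48 leaves only 6 ranks per group.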

Run (c) took 8 h 12 min to finish all 19 NQ steps.
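Since the original question was whether the wall time would suffice, the timings above allow a rough estimate: total time ≈ number of NQ steps × time per step, and 19 × 26 min = 494 min ≈ 8 h 14 min, close to the measured 8 h 12 min. The sketch below uses the approximate per-step times reported in this post; the function names are hypothetical, and it assumes every NQ step costs about the same.

```python
def estimate_walltime(n_steps: int, minutes_per_step: float) -> float:
    """Total minutes, assuming each NQ step takes roughly the same time."""
    return n_steps * minutes_per_step


def steps_that_fit(walltime_minutes: float, minutes_per_step: float) -> int:
    """Number of complete NQ steps that fit into a given wall-time limit."""
    return int(walltime_minutes // minutes_per_step)


def fmt_minutes(minutes: float) -> str:
    """Format minutes as 'H h M min'."""
    h, m = divmod(round(minutes), 60)
    return f"{h} h {m} min"


if __name__ == "__main__":
    n_steps = 19  # total NQ steps reported for this system
    print("run (a), ~60 min/step:", fmt_minutes(estimate_walltime(n_steps, 60)))
    print("run (c), ~26 min/step:", fmt_minutes(estimate_walltime(n_steps, 26)))
    # Example: how many steps would fit into a hypothetical 6 h limit?
    print("steps in 6 h at ~26 min/step:", steps_that_fit(6 * 60, 26))
```

At about 1 h per step, run (a) would have needed roughly 19 h; a 6 h limit would hold only 13 complete steps at 26 min each.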

N.B.: I now understand that to get the GW band structure, I should use the Wannier90 tag in the INCAR. Has my run gone to waste?
I asked a similar question using the Si_GW example from the VASP tutorial. My goal now is to get the GW gap and band structure, and then the BSE optical absorption spectrum. Can I still use the above run? If needed, I will ask this question separately.

Regards

Re: Can a GW run be restarted from where it stopped?

Posted: Thu Aug 26, 2021 2:05 am
by marie-therese.huebsch
Dear SKM,

Good news, your run did not go to waste!

You can do selective postprocessing using ALGO=None. For your case, set the following in the INCAR file:

ALGO = None          ! no electronic changes
NELM = 1

! set these as appropriate for your system
ISMEAR =
SIGMA =

! set this as in your previous run
NBANDS =

! Wannier90
LWANNIER90 = .TRUE.
NUM_WANN =           ! number of Wannier orbitals

! do not overwrite the existing files
LWAVE  = .FALSE.     ! WAVECAR
LCHARG = .FALSE.     ! CHGCAR

Do not forget to also provide basic information in the wannier90.win file. In particular, the projections block must be supplied; the k points can be added by VASP automatically.
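For illustration, a minimal wannier90.win needs the number of Wannier functions and a projections block; the k-point block is left out here because VASP can add it. The sketch below writes such a file's text, assuming the standard Wannier90 `num_wann` keyword and `begin projections` / `end projections` block. The example values (bulk Si with two atoms and `Si:sp3` projections, giving 8 Wannier functions) are illustrative only and must be replaced with projections suited to your own material; `num_wann` should match NUM_WANN in the INCAR.

```python
def wannier90_win(num_wann: int, projections: list[str]) -> str:
    """Return minimal wannier90.win text: num_wann plus a projections block.

    The k-point block is omitted, assuming VASP adds it automatically.
    """
    if num_wann < 1:
        raise ValueError("num_wann must be positive")
    if not projections:
        raise ValueError("the projections block must not be empty")
    lines = [f"num_wann = {num_wann}", "", "begin projections", *projections, "end projections"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: bulk Si (2 atoms), sp3 on each atom -> 8 Wannier functions.
    # Redirect this output into wannier90.win next to the INCAR.
    print(wannier90_win(8, ["Si:sp3"]), end="")
```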

Regarding restarting and speeding up GW calculations: testing on your own infrastructure is really the best you can do 👍 Did MANA work?

Best regards,
Marie-Therese

Re: Can a GW run be restarted from where it stopped?

Posted: Sat Sep 04, 2021 11:43 am
by SKM
Thank you, Marie, for the reply.
I could not try MANA. I asked our system team to test it; they said it will take some time.
Maybe once I am more familiar with the computations, I will try it on my own.
For this thread, everything is fine, but I ran into another issue with this same run for the GW/BSE combination.
I will start another thread so as not to mix up topics.
Regards