NPAR or NCORE setting for NEB

Posted: Tue May 04, 2021 5:11 am
by hlzhan
Hi, I would like to run a nudged elastic band (NEB) simulation with 14 images, and I wonder whether the following settings are efficient.

If I use 84 cores (3 nodes * 28 cores-per-node), each image gets 6 cores.

According to the VASP Wiki (wiki/index.php/NPAR), I should set NPAR=2 [i.e., approximately sqrt(number of cores)] on a cluster with a fast network and NPAR=6 on a cluster with a slow network. With 6 cores per image, this means NCORE=3 and NCORE=1, respectively.

However, I found the following warning/advice in the log file, suggesting that the value of NCORE should be between 4 and sqrt(6)≈2.

So I guess my question is, which NCORE value should I choose if I have a fast network? NCORE=3 according to VASPwiki, or NCORE=4 (the minimum number in the advice), or NCORE=2 (the upper bound value in the advice)? Thanks a lot.

-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| For optimal performance we recommend to set |
| NCORE = 4 - approx SQRT(number of cores). |
| NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE). |
| This setting can greatly improve the performance of VASP for DFT. |
| The default, NCORE=1 might be grossly inefficient on modern |
| multi-core architectures or massively parallel machines. Do your |
| own testing!!!! |
| Unfortunately you need to use the default for GW and RPA |
| calculations (for HF NCORE is supported but not extensively tested |
| yet). |
| |
-----------------------------------------------------------------------------

Re: NPAR or NCORE setting for NEB

Posted: Fri May 07, 2021 12:59 pm
by andreas.singraber
Hello!

First, you may want to consider using fewer images for your NEB simulation. Here is a quote from the VASP Wiki (wiki/index.php/IMAGES):
"The fewer images are used, the faster to convergence to the groundstate is. Often, it is advisable to start with a single image between the two endpoints, and to increase the number of images, once this first run has converged."
14 images is already a very high number; your NEB simulation may well work with far fewer. However, sticking to your example with 14 images, note that you may see unexpected parallel performance because not all of your images can rely on intra-node communication. Have a look at this schematic view of your parallel setup, where each "*" denotes one core and "|" marks a node boundary:

Code:

nodes/cores:  |****************************|****************************|****************************|
images:       |(--1-)(--2-)(--3-)(--4-)(--5|-)(--6-)(--7-)(--8-)(--9-)(-|10-)(-11-)(-12-)(-13-)(-14-)|
In the second line I aligned the images (6 cores each). You can easily see that images 5 and 10 require communication across node boundaries, which (depending on your network) may slow down the entire NEB simulation. This should be avoided, so I suggest choosing a combination of number of images and number of nodes that lets every image fit within a single node.
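
The same check can be scripted before submitting a job. This is only a minimal sketch: the function name images_crossing_nodes is made up for illustration, and it assumes that cores are handed out to images in contiguous blocks that fill the nodes in order, exactly as drawn in the schematic above. The real placement depends on your MPI launcher and batch system.

```python
def images_crossing_nodes(n_images, cores_per_image, cores_per_node):
    """Return the 1-based indices of images whose cores span two or more nodes.

    Assumption: image i gets the contiguous cores
    [i * cores_per_image, (i + 1) * cores_per_image - 1], and nodes are
    filled in order, as in the schematic.
    """
    crossing = []
    for image in range(n_images):
        first_core = image * cores_per_image
        last_core = first_core + cores_per_image - 1
        # Integer division by the node size gives the node index of a core.
        if first_core // cores_per_node != last_core // cores_per_node:
            crossing.append(image + 1)
    return crossing


# Setup from this thread: 14 images, 6 cores each, 28 cores per node.
print(images_crossing_nodes(14, 6, 28))

# Alternative: 4 cores per image (56 cores in total), 7 images per node.
print(images_crossing_nodes(14, 4, 28))
```

For the setup in this thread the first call reports images 5 and 10, while the second layout gives an empty list because 28 is a multiple of 4.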

Another point: I found that the recommendation from your warning message

Code:

NCORE = 4 - approx SQRT(number of cores)
is not up to date with the current source code and the Wiki page (this changed about 4 months ago). The current recommendation, in accordance with the Wiki, is:

Code:

NCORE = 2 up to number-of-cores-per-socket
Hence, if we stick to your example with 6 cores available per image, the recommendation means that NCORE=2, 3, or 6 can work, corresponding to NPAR=3, 2, and 1, respectively. I recommend testing these values in a smaller setting without the 14 images: take the configuration of a single image, do a normal VASP test run with each NCORE value, and then use the best-performing value for your large NEB simulation.
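
To list the candidates for other core counts, the arithmetic can be sketched in a few lines. This is only an illustration: the helper name ncore_candidates is hypothetical, it assumes the total number of cores divides evenly among the images, and it uses the relation NPAR = cores / NCORE from the warning message, applied to the cores of one image. It only enumerates values worth benchmarking and does not predict which one is fastest.

```python
def ncore_candidates(total_cores, n_images, min_ncore=2):
    """List (NCORE, NPAR) pairs for one NEB image.

    Uses NPAR = cores_per_image / NCORE and keeps only NCORE values that
    divide the cores of one image evenly.
    """
    if total_cores % n_images != 0:
        raise ValueError("total_cores must be divisible by the number of images")
    cores_per_image = total_cores // n_images
    return [
        (ncore, cores_per_image // ncore)
        for ncore in range(min_ncore, cores_per_image + 1)
        if cores_per_image % ncore == 0
    ]


# Example from this thread: 84 cores shared by 14 images, i.e. 6 cores per image.
for ncore, npar in ncore_candidates(84, 14):
    print(f"NCORE = {ncore}  ->  NPAR = {npar}")
```

For 84 cores and 14 images this yields the three pairs mentioned above (NCORE=2, 3, 6 with NPAR=3, 2, 1).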

Another hint from the NCORE Wiki page (wiki/index.php/NCORE) may also be useful in your case:
"The best value NCORE depends somewhat on the number of atoms in the unit cell. Values around 4 are usually ideal for 100 atoms in the unit cell. For very large unit cells (more than 400 atoms) values around 12-16 are often optimal."