NPAR: Difference between revisions

From VASP Wiki
{{TAGDEF|NPAR|[integer]|number of cores}}
{{TAGDEF|NPAR|[integer]|available ranks}}


Description: {{TAG|NPAR}} determines the number of bands that are treated in parallel.  
Description: {{TAG|NPAR}} determines the number of bands that are treated in parallel. This is a legacy tag; use {{TAG|NCORE}} instead.


----
----
VASP currently offers parallelization and data distribution over bands and/or over plane wave coefficients, and as of VASP.5.3.2, parallelization over '''k'''-points (no data distribution, see {{TAG|KPAR}}).
To obtain high efficiency on massively parallel systems or modern multi-core machines, it is strongly recommended to use all of these at the same time. Most algorithms work with any data distribution (except for the single-band conjugate gradient algorithm, which is considered obsolete).


{{TAG|NPAR}} determines how many bands are treated in parallel. The current default is {{TAG|NPAR}}=''number of cores'', meaning that one orbital is treated by one core. {{TAG|NCORE}} is then set to 1. If {{TAG|NPAR}}=1, {{TAG|NCORE}} is set to the number of cores. This implies data distribution over plane wave coefficients only: all cores will work together on every individual band, i.e., the plane wave coefficients of each band are distributed over all cores. This is usually very slow and should be avoided.
{{NB|warning|{{TAG|NCORE}} is the recommended tag for controlling band-level parallelization and has been available since VASP.5.2.13. It is more intuitive and directly expresses the size of each band group. Only use {{TAG|NPAR}} if you have a specific reason to prefer it. If both {{TAG|NPAR}} and {{TAG|NCORE}} are specified in the {{FILE|INCAR}} file, {{TAG|NPAR}} takes precedence.}}


{{TAG|NPAR}}=''number of cores'' is the optimal setting for platforms with a small communication bandwidth and is a good choice for up to 8 cores, as well as for machines with a single core per node and a Gigabit network. However, this mode substantially increases the memory requirements, because the non-local projector functions must be stored entirely on each core. In addition, substantial all-to-all communication is required to orthogonalize the bands. On massively parallel systems and modern multi-core machines we strongly recommend setting
== Relationship to NCORE ==


:<math>\textrm{NPAR}\approx\sqrt{\textrm{\#of}\; \textrm{cores}}</math>
VASP distributes the available MPI ranks into band groups that each work on one band, parallelizing the [[Energy_cutoff_and_FFT_meshes#FFT_mesh|FFTs]] for that band. For the common case that {{TAG|IMAGES|1}} and no other algorithm-dependent parallelization (e.g., {{TAG|NOMEGAPAR}}) is active:


or
:<math>\text{available ranks} = \frac{\text{total MPI ranks}}{\text{KPAR}}</math>


:<math>\textrm{NCORE}=\textrm{\#of}\;\textrm{cores}\;\textrm{per}\;\textrm{compute}\;\textrm{node}</math>
{{TAG|NPAR}} sets the number of band groups; {{TAG|NCORE}} sets the size of each band group. They are strict inverses:


:<math>\text{NPAR} \times \text{NCORE} = \text{available ranks}</math>


In selected cases, we found that this improves the performance by a factor of up to four compared to the default, and it also significantly improves the stability of the code due to reduced memory requirements.
The default ({{TAG|NPAR}} = available ranks) is equivalent to {{TAG|NCORE|1}}: each band is handled by a single rank.


{{TAG|NCORE}} is available from VASP.5.2.13 onward and is more convenient than the previous parameter {{TAG|NPAR}}.
{{NB|warning|Setting {{TAG|NPAR|1}} means all available ranks collaborate on a single band (plane-wave coefficient distribution only). No band parallelization occurs. This is almost always very slow and should be avoided.}}
The user should specify either {{TAG|NCORE}} or {{TAG|NPAR}}; if both are given, {{TAG|NPAR}} takes precedence.
{{NB|tip|See the [[optimizing the parallelization]] page for a step-by-step guide to finding the best parallelization setup for your system, and {{TAG|NCORE}} for information on how to parallelize over [[Energy_cutoff_and_FFT_meshes#FFT_mesh|FFTs]] in particular.}}
The relation between both parameters is


:<math>\textrm{NCORE}=\textrm{\#of}\; \textrm{cores}/\textrm{NPAR}</math>
== Related tags and articles ==
{{TAG|NCORE}},
{{TAG|KPAR}},
{{TAG|LPLANE}},
{{TAG|LSCALU}},
{{TAG|NSIM}},
{{TAG|LSCALAPACK}},
{{TAG|LSCAAWARE}},
[[GPU ports of VASP]],
[[Combining MPI and OpenMP]],
[[Optimizing the parallelization]],
[[:Category:Parallelization]],
[[Energy cutoff and FFT meshes]]


{{sc|NPAR|HowTo|Workflows that use this tag}}
----


The optimum settings for {{TAG|NPAR}} and {{TAG|LPLANE}} depend strongly on the type of machine you are using.
[[Category:INCAR tag]][[Category:Performance]][[Category:Parallelization]]
Some recommended setups:
 
*LINUX cluster linked by Infiniband, modern multicore machines:  
:On a LINUX cluster with multicore machines linked by a fast network, we recommend setting
<pre>
LPLANE = .TRUE.
NCORE  = number of cores per node (e.g. 4 or 8)
LSCALU = .FALSE.
NSIM  = 4
</pre>
:If very many nodes are used, it might be necessary to set {{TAG|LPLANE}}=.FALSE., but usually this offers very little advantage. For long runs (e.g., molecular dynamics), we recommend optimizing {{TAG|NPAR}} by trying short runs with different settings.
 
*LINUX cluster linked by 1 Gbit Ethernet, and LINUX clusters with single cores:
:On a LINUX cluster linked by a relatively slow network, {{TAG|LPLANE}} must be set to .TRUE., and the {{TAG|NPAR}} flag should be equal to the number of cores:
<pre>
LPLANE = .TRUE.
NCORE  = 1
LSCALU = .FALSE.
NSIM  = 4
</pre>
:Note that you need at least a 100 Mbit full-duplex network, with a fast switch offering at least 2 Gbit switch capacity, to obtain useful speedups. Multi-core machines should always be linked by Infiniband, since Gbit Ethernet is too slow for multi-core machines.
 
*Massively parallel machines (Cray, Blue Gene):
:On many massively parallel machines one is forced to use a huge number of cores. In this case, load-balancing problems and problems with the communication bandwidth are likely to be experienced. In addition, the local memory is fairly small on some massively parallel machines; too small to keep the real-space projectors in the cache with any setting. Therefore, we recommend setting {{TAG|NPAR}} on these machines to &radic;''# of cores'' (explicit timing can be helpful to find the optimum value). The use of {{TAG|LPLANE}}=.TRUE. is only recommended if the number of nodes is significantly smaller than {{TAG|NGX}}, {{TAG|NGY}} and {{TAG|NGZ}}.
 
:In summary, the following setting is recommended:
<pre>
LPLANE = .FALSE.
NPAR  = sqrt(number of cores)
NSIM  = 1
</pre>


== Related tags and articles ==
== Related tags and articles ==
{{TAG|NCORE}},
{{TAG|NCORE}},
{{TAG|KPAR}},
{{TAG|LPLANE}},
{{TAG|LPLANE}},
{{TAG|LSCALU}},
{{TAG|LSCALU}},
{{TAG|NSIM}},
{{TAG|NSIM}},
{{TAG|KPAR}},
[[Optimizing the parallelization]],
{{TAG|LSCALAPACK}},
[[:Category:Parallelization]]
{{TAG|LSCAAWARE}}


{{sc|NPAR|Examples|Examples that use this tag}}
{{sc|NPAR|Examples|Examples that use this tag}}
----
----


[[Category:INCAR tag]][[Category:parallelization]]
[[Category:INCAR tag]][[Category:Parallelization]]

Revision as of 09:19, 18 March 2026

NPAR = [integer]
Default: NPAR = available ranks 

Description: NPAR determines the number of bands that are treated in parallel. This is a legacy tag; use NCORE instead.



Warning: NCORE is the recommended tag for controlling band-level parallelization and has been available since VASP.5.2.13. It is more intuitive and directly expresses the size of each band group. Only use NPAR if you have a specific reason to prefer it. If both NPAR and NCORE are specified in the INCAR file, NPAR takes precedence.

Relationship to NCORE

VASP distributes the available MPI ranks into band groups that each work on one band, parallelizing the FFTs for that band. For the common case that IMAGES = 1 and no other algorithm-dependent parallelization (e.g., NOMEGAPAR) is active:

:<math>\text{available ranks} = \frac{\text{total MPI ranks}}{\text{KPAR}}</math>

NPAR sets the number of band groups; NCORE sets the size of each band group. They are strict inverses:

:<math>\text{NPAR} \times \text{NCORE} = \text{available ranks}</math>

The default (NPAR = available ranks) is equivalent to NCORE = 1: each band is handled by a single rank.
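
The relations above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not VASP's internal logic: the function name `resolve_band_groups` and its arguments are hypothetical, and it assumes IMAGES = 1, no other algorithm-dependent parallelization, and that KPAR and NPAR (or NCORE) divide the rank counts evenly. How VASP itself reacts to non-divisible values is not modelled here; the sketch simply raises an error.

```python
def resolve_band_groups(total_ranks, kpar=1, npar=None, ncore=None):
    """Return (available_ranks, NPAR, NCORE) for a given MPI layout.

    Hypothetical helper: mirrors the documented relations only.
    Assumes IMAGES = 1 and no other algorithm-dependent parallelization.
    """
    if total_ranks % kpar != 0:
        raise ValueError("assumption: KPAR must divide the total MPI ranks")
    available = total_ranks // kpar  # available ranks = total MPI ranks / KPAR

    if npar is None and ncore is None:
        npar = available             # default: NPAR = available ranks (NCORE = 1)
    elif npar is None:
        if available % ncore != 0:
            raise ValueError("assumption: NCORE must divide the available ranks")
        npar = available // ncore    # only NCORE given: derive NPAR from it
    # If NPAR is given it takes precedence; any NCORE value is ignored.

    if available % npar != 0:
        raise ValueError("assumption: NPAR must divide the available ranks")
    return available, npar, available // npar  # NPAR x NCORE = available ranks


# Example: 128 MPI ranks, KPAR = 4, NPAR = 4
print(resolve_band_groups(128, kpar=4, npar=4))
```

With 128 ranks and KPAR = 4, each k-point group has 32 available ranks, so NPAR = 4 corresponds to NCORE = 8. Leaving both tags unset reproduces the default of NPAR = 32 and NCORE = 1.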


Warning: Setting NPAR = 1 means all available ranks collaborate on a single band (plane-wave coefficient distribution only). No band parallelization occurs. This is almost always very slow and should be avoided.
Tip: See the Optimizing the parallelization page for a step-by-step guide to finding the best parallelization setup for your system, and NCORE for information on how to parallelize over FFTs in particular.
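
When preparing short timing runs, it can help to list the (NPAR, NCORE) pairs that satisfy NPAR × NCORE = available ranks. The following is a minimal sketch under the assumption that both values must be integer divisors of the available ranks; the function name `candidate_settings` and its `include_single_group` flag are hypothetical and not part of VASP. By default it omits NPAR = 1, following the warning above.

```python
def candidate_settings(available_ranks, include_single_group=False):
    """List (NPAR, NCORE) pairs with NPAR * NCORE == available_ranks.

    Hypothetical helper for planning benchmark runs; it does not
    predict which pair is fastest on a given machine.
    """
    pairs = []
    for npar in range(1, available_ranks + 1):
        if available_ranks % npar != 0:
            continue  # assumption: only exact divisors are valid choices
        if npar == 1 and not include_single_group:
            continue  # NPAR = 1 is almost always very slow
        pairs.append((npar, available_ranks // npar))
    return pairs


# Example: 32 available ranks per k-point group
for npar, ncore in candidate_settings(32):
    print(f"NPAR = {npar:2d}  <->  NCORE = {ncore:2d}")
```

Each printed pair describes the same layout from both sides, so only one of the two tags needs to be set in the INCAR file. Which pair performs best depends on the machine and the system and has to be determined by timing.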

Related tags and articles

NCORE, KPAR, LPLANE, LSCALU, NSIM, LSCALAPACK, LSCAAWARE, GPU ports of VASP, Combining MPI and OpenMP, Optimizing the parallelization, Category:Parallelization, Energy cutoff and FFT meshes

Workflows that use this tag


Related tags and articles

NCORE, KPAR, LPLANE, LSCALU, NSIM, Optimizing the parallelization, Category:Parallelization

Examples that use this tag