NBLOCK FOCK: Difference between revisions

From VASP Wiki
(Created page with "{{TAGDEF|NBLOCK_FOCK|[integer]|64}} Description: {{TAG|NBLOCK_FOCK}} sets the number of orbitals that are processed simultaneously when computing the action of the Fock potential. Tuning this parameter can significantly affect both the performance and memory consumption of hybrid functional calculations. Especially on GPUs, {{TAG|NBLOCK_FOCK}} should be tuned carefully to achieve optimal performance. ---- Instead of computing the action of the Fock potential on one orbi...")
 
No edit summary
Line 1: Line 1:
{{TAGDEF|NBLOCK_FOCK|[integer]|64}}
{{TAGDEF|NBLOCK_FOCK|[integer]}}
{{DEF|NBLOCK_FOCK|64|'''CPU build'''|32|'''GPU build''' (OpenACC/OpenMP offload)|<code>OMP_NUM_THREADS</code>|'''CPU build''' compiled with [[Precompiler_options#-D_OPENMP|OpenMP support]] when threading is active (<code>OMP_NUM_THREADS</code>&nbsp;>&nbsp;1)}}
{{DISPLAYTITLE:NBLOCK_FOCK}}
Description: Sets the number of orbitals that are processed simultaneously when computing the action of the Fock potential.
----


Description: {{TAG|NBLOCK_FOCK}} sets the number of orbitals that are processed simultaneously when computing the action of the Fock potential. Tuning this parameter can significantly affect both the performance and memory consumption of hybrid functional calculations. Especially on GPUs, {{TAG|NBLOCK_FOCK}} should be tuned carefully to achieve optimal performance.
----
Instead of computing the action of the Fock potential on one orbital at a time, up to {{TAG|NBLOCK_FOCK}} orbitals are gathered and processed at once. This enables the use of matrix-matrix operations rather than matrix-vector operations, which is beneficial for performance on modern hardware.


The default values depend on the build type:
Tuning {{TAG|NBLOCK_FOCK}} can significantly affect both the performance and memory consumption of hybrid functional calculations. Especially on GPUs, {{TAG|NBLOCK_FOCK}} should be tuned carefully to achieve optimal performance.
* '''CPU build''': {{TAG|NBLOCK_FOCK}} = 64. When VASP is compiled with [[Precompiler_options#-D_OPENMP|OpenMP support]] and threading is active (<code>OMP_NUM_THREADS</code>&nbsp;>&nbsp;1), the default is automatically set to 2&times;<code>OMP_NUM_THREADS</code>.
* '''GPU build''' (OpenACC/OpenMP offload): {{TAG|NBLOCK_FOCK}} = 32.
 
{{NB|tip|On GPU architectures, the optimal value of {{TAG|NBLOCK_FOCK}} depends strongly on the specific hardware and the number of bands. It is recommended to experiment with values in the range 16-64.}}


Line 21: Line 20:
{{TAG|PRECFOCK}}


{{sc|NBLOCK_FOCK|Examples|Examples that use this tag}}
{{sc|NBLOCK_FOCK|HowTo|Workflows that use this tag}}
----


[[Category:INCAR tag]][[Category:Performance]][[Category:Hybrid functionals]][[Category:parallelization]]

Revision as of 08:07, 31 March 2026

NBLOCK_FOCK = [integer]

Default: NBLOCK_FOCK = 64 CPU build
= 32 GPU build (OpenACC/OpenMP offload)
= OMP_NUM_THREADS CPU build compiled with OpenMP support when threading is active (OMP_NUM_THREADS > 1)

Description: Sets the number of orbitals that are processed simultaneously when computing the action of the Fock potential.


Instead of computing the action of the Fock potential on one orbital at a time, up to NBLOCK_FOCK orbitals are gathered and processed at once. This enables the use of matrix-matrix operations rather than matrix-vector operations, which is beneficial for performance on modern hardware.
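
The batching described above can be pictured with a minimal Python sketch. It is not VASP code: the function name `orbital_blocks` and the orbital counts are hypothetical, and the sketch only illustrates how a set of orbitals might be split into groups of at most NBLOCK_FOCK members, each of which could then be handled in one matrix-matrix operation instead of many matrix-vector operations.

```python
def orbital_blocks(n_orbitals, nblock_fock):
    """Split orbital indices 0..n_orbitals-1 into consecutive blocks.

    Hypothetical helper for illustration only. Each block holds at most
    nblock_fock orbitals, mirroring the "up to NBLOCK_FOCK" wording above.
    """
    if n_orbitals < 0 or nblock_fock < 1:
        raise ValueError("need n_orbitals >= 0 and nblock_fock >= 1")
    return [
        range(start, min(start + nblock_fock, n_orbitals))
        for start in range(0, n_orbitals, nblock_fock)
    ]


# Example: 150 orbitals with a block size of 64 give blocks of 64, 64 and 22.
sizes = [len(block) for block in orbital_blocks(150, 64)]
print(sizes)
```

In this picture, a larger block size means fewer but larger batches, which is consistent with the trade-off between performance and memory consumption mentioned on this page.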

Tuning NBLOCK_FOCK can significantly affect both the performance and memory consumption of hybrid functional calculations. Especially on GPUs, NBLOCK_FOCK should be tuned carefully to achieve optimal performance.

Tip: On GPU architectures, the optimal value of NBLOCK_FOCK depends strongly on the specific hardware and the number of bands. It is recommended to experiment with values in the range 16-64.
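
One way to organize such an experiment is sketched below in Python. The helper names `nblock_fock_candidates` and `incar_line` are hypothetical, and the step of 16 between trial values is an assumption rather than a recommendation from this page; only the range 16-64 comes from the tip above.

```python
def nblock_fock_candidates(low=16, high=64, step=16):
    """Return trial NBLOCK_FOCK values from low to high (inclusive).

    The default step of 16 is an illustrative assumption.
    """
    if low < 1 or high < low or step < 1:
        raise ValueError("need 1 <= low <= high and step >= 1")
    return list(range(low, high + 1, step))


def incar_line(value):
    """Format one setting in the INCAR 'TAG = value' form."""
    return f"NBLOCK_FOCK = {value}"


# Print one candidate setting per trial run.
for value in nblock_fock_candidates():
    print(incar_line(value))
```

Each printed line would go into the INCAR of a separate, otherwise identical test run, so that timings and memory use can be compared on the target hardware.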

Related tags and articles

NSIM, LHFCALC, AEXX, HFSCREEN, NCORE, NPAR, KPAR, PRECFOCK

Workflows that use this tag