NCSHMEM

NCSHMEM = [integer]
Default: NCSHMEM = 1 

Description: NCSHMEM determines the number of compute cores that share memory via shared-memory MPI in the non-cubic-scaling GW routines.
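
For example, a minimal INCAR fragment that enables shared-memory MPI with automatic detection might look as follows (only the tags discussed on this page are shown; other settings of a GW calculation are omitted):

 ALGO    = EVGW0   ! any of the non-cubic-scaling GW algorithms: EVGW, EVGW0, QPGW, QPGW0
 NCSHMEM = -1      ! let VASP detect how many cores physically share memory (e.g. 48 or 128 per node)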


By default, shared memory is not used in the non-cubic-scaling GW routines (ALGO=EVGW, EVGW0, QPGW and QPGW0). To use shared-memory MPI in the non-cubic-scaling GW routines, set NCSHMEM to -1 or a positive integer. If NCSHMEM is set to -1, VASP automatically determines how many cores physically share memory and sets NCSHMEM to that number. For instance, if the machine has 48 or 128 cores per node, NCSHMEM defaults to 48 or 128, respectively.

Note that larger values can degrade performance, particularly on NUMA architectures and multi-socket machines. In that case, it is often necessary to manually decrease NCSHMEM to a value between 8 and 32 to strike a balance between memory savings and performance. Most EPYC HPC clusters are configured in a mode that divides the memory into four NUMA domains per socket. On a machine with 64 cores per socket, each NUMA domain then consists of 16 cores, and NCSHMEM=16 yields the best performance, since memory is then not shared between NUMA domains. The performance penalty can be particularly significant if memory is shared between sockets.
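
As a sketch of the NUMA-aware choice discussed above, assuming a node with 64-core EPYC sockets split into four NUMA domains per socket (i.e. 16 cores per domain), the manual setting would be:

 ALGO    = EVGW0
 NCSHMEM = 16      ! 64 cores per socket / 4 NUMA domains = 16 cores per NUMA domain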

Related tags and articles

Shared memory, ALGO, Practical guide to GW calculations