ML NCSHMEM: Difference between revisions

From VASP Wiki

Revision as of 13:40, 2 February 2026

ML_NCSHMEM = [integer]
Default: ML_NCSHMEM = Number of available ranks per computational node 

Description: Specifies the number of MPI ranks that share a single shared memory segment.


The number of shared memory segments created on each node equals the number of cores per node divided by ML_NCSHMEM. All segments have the same size, so total memory consumption grows in proportion to the number of segments. With the default value, all ranks on a node share a single segment.
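
The relation between ML_NCSHMEM, the segment count, and memory consumption can be illustrated with a small Python sketch. The function name, the 128-core node, and the 2 GiB segment size are hypothetical values chosen for illustration, and the requirement that ML_NCSHMEM divides the core count evenly is a simplification of this sketch, not documented VASP behavior.

```python
def shared_memory_layout(cores_per_node: int, ml_ncshmem: int,
                         segment_size_gib: float) -> tuple[int, float]:
    """Return (segments per node, total shared memory per node in GiB).

    Hypothetical helper: applies "segments = cores per node / ML_NCSHMEM"
    and assumes every segment has the same size.
    """
    if ml_ncshmem <= 0 or cores_per_node % ml_ncshmem != 0:
        # Simplifying assumption of this sketch: only even divisions are handled.
        raise ValueError("ML_NCSHMEM must be a positive divisor of cores_per_node")
    segments = cores_per_node // ml_ncshmem
    return segments, segments * segment_size_gib


# Illustrative values only: 128 cores per node, 2 GiB per segment.
for ncshmem in (128, 32, 16):
    segments, total_gib = shared_memory_layout(128, ncshmem, 2.0)
    print(f"ML_NCSHMEM = {ncshmem:3d}: {segments} segment(s), {total_gib:.1f} GiB per node")
```

Under these assumed numbers, lowering ML_NCSHMEM from 128 to 16 raises the segment count from 1 to 8 and the shared memory per node by the same factor.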

However, on systems with multiple NUMA domains, performance can degrade significantly during machine-learned force field inference if all domains access the same memory segment. For optimal performance, each NUMA domain should have its own dedicated shared memory segment.
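
As a rough sketch of how a value giving one segment per NUMA domain might be chosen, the following Python snippet divides the ranks per node by the number of NUMA domains. The function name and the example node (128 ranks, 8 NUMA domains) are hypothetical. The sketch also assumes one MPI rank per core, an even distribution of ranks over the domains, and rank pinning such that ranks sharing a segment reside in the same domain; none of these are guaranteed by the tag itself.

```python
def ncshmem_per_numa_domain(ranks_per_node: int, numa_domains: int) -> int:
    """Return an ML_NCSHMEM value that yields one segment per NUMA domain.

    Hypothetical helper: assumes one rank per core and that ranks are
    spread evenly across the NUMA domains of the node.
    """
    if numa_domains <= 0 or ranks_per_node % numa_domains != 0:
        raise ValueError("ranks_per_node must be a positive multiple of numa_domains")
    return ranks_per_node // numa_domains


# Illustrative node: 128 ranks spread over 8 NUMA domains.
value = ncshmem_per_numa_domain(128, 8)
print(f"ML_NCSHMEM = {value}")  # line that could be placed in the INCAR file
```

The number of NUMA domains on a node can be read from tools such as `lscpu` or `numactl --hardware`. Whether the resulting value actually improves inference performance depends on the machine and on how the MPI launcher binds ranks, so it is worth benchmarking against the default.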

Related tags and articles

ML_LMLFF, ML_MODE, Shared memory