ML_NCSHMEM

From VASP Wiki
Revision as of 08:01, 17 March 2026

ML_NCSHMEM = [integer] 

Default: ML_NCSHMEM = see below (VASP.6.6.0 or higher)
                    = number of available ranks per computational node (otherwise)

Description: Specifies the number of MPI ranks that share a single shared memory segment.


Default: The default starts from the number of ranks per node and halves it (at most three times) until the value is lower than or equal to 24 or becomes odd and can no longer be halved. For example, 128 becomes 16, 48 becomes 24, 80 becomes 20, 86 becomes 43, and 16 stays 16.
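The halving rule above can be sketched as follows. This is an illustrative reimplementation of the described heuristic, not VASP's actual source code:

```python
def default_ml_ncshmem(ranks_per_node: int) -> int:
    """Illustrative sketch of the ML_NCSHMEM default heuristic:
    halve the rank count at most three times, stopping as soon as
    the value is <= 24 or has become odd."""
    value = ranks_per_node
    halvings = 0
    while value > 24 and value % 2 == 0 and halvings < 3:
        value //= 2
        halvings += 1
    return value

# Examples from the text:
# 128 -> 16, 48 -> 24, 80 -> 20, 86 -> 43, 16 -> 16
```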

The total number of memory segments created equals the number of cores per node divided by ML_NCSHMEM. All memory segments have identical sizes, so a larger number of segments results in higher total memory consumption.

However, on systems with multiple NUMA domains, performance can degrade significantly during machine-learned force field inference if all domains access the same memory segment. For optimal performance, each NUMA domain should have its own dedicated shared memory segment. For more details, we refer to NCSHMEM.
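As a hypothetical illustration of this guidance, consider a node with 128 MPI ranks spread over 4 NUMA domains of 32 ranks each; the node size and domain count here are assumed values, not defaults. Setting ML_NCSHMEM to the ranks per NUMA domain gives each domain its own segment:

```
# Hypothetical node: 128 ranks, 4 NUMA domains of 32 ranks each.
# One shared memory segment per NUMA domain:
ML_LMLFF = .TRUE.
ML_MODE = run
ML_NCSHMEM = 32
```

With this setting, 128 / 32 = 4 segments are created, one per NUMA domain, so no rank accesses a segment outside its own domain.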

Related tags and articles

ML_LMLFF, ML_MODE, NCSHMEM, Shared memory