
memory leak: AIMD with openmpi 4.1.4 on GPUs

Posted: Sat Nov 19, 2022 2:12 pm
by liu_jiyuan
Hi all,

I am Liu Jiyuan; I asked about this memory leak problem in the Q&A session of the VASP workshop.

This job was run on 2x A30 GPUs attached to two Xeon Gold 6326 sockets with 256 GB of memory. The memory usage exceeded the total available memory once the calculation reached 6700+ ionic steps. VASP was compiled with NVHPC 22.7 and CUDA 11.7, with VTST included. Open MPI 4.1.4 (ompi414) was compiled with nvc and nvfortran, with CUDA awareness enabled.
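To put the reported numbers in perspective, here is a rough back-of-the-envelope bound on the average memory growth per ionic step. This is a sketch under stated assumptions: "256 G" is taken to mean 256 GiB, the growth is treated as roughly linear, and the baseline footprint of the run is ignored, so the real per-step leak is smaller than this bound. The function name is hypothetical.

```python
# Rough upper bound on average per-step memory growth.
# Assumptions (hypothetical): "256 G" means 256 GiB, growth is roughly linear,
# and the baseline memory footprint is ignored (so the true leak is smaller).
TOTAL_MEM_GIB = 256   # node memory reported in the post
STEPS_AT_OOM = 6700   # "6700+" ionic steps; the real count is slightly larger


def max_growth_per_step_mib(total_gib: float, steps: int) -> float:
    """Upper bound on MiB gained per ionic step before memory ran out."""
    return total_gib * 1024 / steps


if __name__ == "__main__":
    bound = max_growth_per_step_mib(TOTAL_MEM_GIB, STEPS_AT_OOM)
    print(f"<= {bound:.1f} MiB per ionic step")
```

Because both the step count ("6700+") and the ignored baseline push in the same direction, the printed figure (about 39 MiB per step) can only overestimate the actual leak rate.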

Thanks!

Re: memory leak: AIMD with openmpi 4.1.4 on GPUs

Posted: Mon Nov 21, 2022 5:06 pm
by henrique_miranda
Hi Liu,

Could you try running the same calculation with OMP_NUM_THREADS=1 and check whether the problem persists?
We recently received a report of a similar issue in this thread:
https://www.vasp.at/forum/viewtopic.php?f=3&t=18493
We are still looking into it, but knowing whether setting OMP_NUM_THREADS=1 alleviates the issue would greatly help us narrow down the possible causes.

Re: memory leak: AIMD with openmpi 4.1.4 on GPUs

Posted: Thu Nov 24, 2022 1:02 am
by liu_jiyuan
Hi Henrique,

OMP_NUM_THREADS=1 works! The memory usage is greatly reduced.

OUTCAR for OMP_NUM_THREADS=16, ionic steps 0-6000:
Total CPU time used (sec): 96743.969
User time (sec): 94038.925
System time (sec): 2705.043
Elapsed time (sec): 70288.818

Maximum memory used (kb): 131238960.
Average memory used (kb): N/A

Minor page faults: 102568499
Major page faults: 5194
Voluntary context switches: 59524318

OUTCAR for OMP_NUM_THREADS=1, ionic steps 6001-12000 (continuation run):
Total CPU time used (sec): 83684.477
User time (sec): 83524.712
System time (sec): 159.766
Elapsed time (sec): 83831.101

Maximum memory used (kb): 16510832.
Average memory used (kb): N/A

Minor page faults: 18466879
Major page faults: 4622
Voluntary context switches: 846170
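The two OUTCAR footers can be compared directly. Below is a minimal Python sketch that parses the numeric "label (unit): value" lines quoted above and reports peak memory, CPU-to-wall-time ratio, and system-time fraction for each run. The helper names are hypothetical, only the lines relevant to these metrics are copied in, and "kb" is assumed to mean 1024 bytes. Note that this only reflects the peak that VASP recorded, not the true usage.

```python
from __future__ import annotations

import re

# Footer excerpts copied from the OUTCARs quoted above.
FOOTER_OMP16 = """\
Total CPU time used (sec): 96743.969
User time (sec): 94038.925
System time (sec): 2705.043
Elapsed time (sec): 70288.818
Maximum memory used (kb): 131238960.
Average memory used (kb): N/A
"""

FOOTER_OMP1 = """\
Total CPU time used (sec): 83684.477
User time (sec): 83524.712
System time (sec): 159.766
Elapsed time (sec): 83831.101
Maximum memory used (kb): 16510832.
Average memory used (kb): N/A
"""

# Matches lines like "Elapsed time (sec): 70288.818"; "N/A" values do not match.
LINE = re.compile(r"^[ \t]*(.+?)[ \t]*\((sec|kb)\):[ \t]*([0-9.]+)[ \t]*$", re.MULTILINE)


def parse_footer(text: str) -> dict[str, float]:
    """Return {label: value} for every numeric 'label (unit): value' line."""
    return {m.group(1): float(m.group(3)) for m in LINE.finditer(text)}


def summarize(text: str) -> dict[str, float]:
    d = parse_footer(text)
    return {
        "max_mem_gib": d["Maximum memory used"] / 1024**2,  # kB -> GiB (assumed 1 kb = 1024 B)
        "cpu_per_wall": d["Total CPU time used"] / d["Elapsed time"],
        "sys_fraction": d["System time"] / d["Total CPU time used"],
    }


if __name__ == "__main__":
    s16 = summarize(FOOTER_OMP16)
    s1 = summarize(FOOTER_OMP1)
    print("OMP_NUM_THREADS=16:", s16)
    print("OMP_NUM_THREADS=1: ", s1)
    print(f"peak-memory ratio (16 vs 1): {s16['max_mem_gib'] / s1['max_mem_gib']:.2f}")
```

With these numbers the recorded peak is roughly 8x higher with 16 threads, and the system-time share also drops sharply with a single thread.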

The actual memory usage is much higher than the recorded value, but the relative magnitudes make sense.

Thanks.