MD on-the-fly: out of memory

gniding
Newbie
Posts: 11
Joined: Sun Nov 17, 2019 4:55 am

#1 Post by gniding » Sat Nov 26, 2022 8:46 am

Dear everyone,
I'm running an MD simulation with the on-the-fly method, but it always runs out of memory after a few steps. The error messages are shown below:

slurmstepd: error: Detected 2 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: cpu04: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 3 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 2 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=272349.0. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: First task exited 30s ago
srun: StepId=272349.0 tasks 1-38,40-90,92-112,114-156,158-182,184-191: running
srun: StepId=272349.0 tasks 0,39,91,113,157,183: exited abnormally
srun: launch/slurm: _step_signal: Terminating StepId=272349.0
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 272349.0 ON cpu04 CANCELLED AT 2022-11-26T13:18:22 ***

If you have any ideas, please let me know.
Thank you!

merzuk.kaltak
Administrator
Posts: 277
Joined: Mon Sep 24, 2018 9:39 am

#2 Post by merzuk.kaltak » Mon Nov 28, 2022 7:27 am

Please upload an error report (INCAR, POSCAR, POTCAR, KPOINTS, OUTCAR, OSZICAR).

#3 Post by gniding » Mon Nov 28, 2022 12:57 pm

Thank you

ferenc_karsai
Global Moderator
Posts: 422
Joined: Mon Nov 04, 2019 12:44 pm

#4 Post by ferenc_karsai » Mon Nov 28, 2022 1:29 pm

Please also post your ML_LOGFILE and your ML_AB file.

Have you compiled with shared memory MPI (-Duse_shmem)?

#5 Post by gniding » Tue Nov 29, 2022 2:17 am

Good morning!
No, the VASP 6 build I am using was not compiled with -Duse_shmem.

#6 Post by ferenc_karsai » Fri Dec 02, 2022 12:28 pm

I think you are simply running out of memory.

The beginning of the ML_LOGFILE shows an estimate of the memory required per core. The estimate is written before any allocation takes place, so it is printed even if the calculation later crashes due to insufficient memory.
In your case it is 12 GB per core. Do you really have that much memory available?
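
As a rough cross-check of that figure against the job in post #1, the sketch below counts the MPI ranks listed in the srun output and multiplies by the 12 GB estimate. It assumes one MPI rank per core and that the estimate applies equally to every rank; the helper name `expand` is made up for this illustration.

```python
# Task lists copied from the srun output in post #1.
RUNNING = "1-38,40-90,92-112,114-156,158-182,184-191"
EXITED = "0,39,91,113,157,183"

# Per-core estimate quoted from the ML_LOGFILE (assumed identical for every rank).
GB_PER_RANK = 12


def expand(task_list: str) -> list[int]:
    """Expand a Slurm-style list such as '1-3,7' into [1, 2, 3, 7]."""
    ids = []
    for part in task_list.split(","):
        first, _, last = part.partition("-")
        # A single id like '7' has no '-' part, so fall back to 'first'.
        ids.extend(range(int(first), int(last or first) + 1))
    return ids


ranks = sorted(expand(RUNNING) + expand(EXITED))
n_ranks = len(ranks)
total_gb = n_ranks * GB_PER_RANK

print(f"MPI ranks in the job step: {n_ranks}")
print(f"Estimated total memory:    {total_gb} GB")
```

For this job the script reports 192 ranks and about 2304 GB in total. That number has to be compared with the combined memory of the allocated nodes (or the cgroup limit set by Slurm), which is not stated in this thread.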

How to bring the memory consumption down:
-) First of all, compile with shared-memory MPI (-Duse_shmem). This will reduce the required memory enormously.
-) You can also use more cores. Many of the large arrays are distributed over the cores, so the memory needed per core drops almost linearly with the number of cores.
-) You have set ML_MB=13000. That is a very high number; I have never run a calculation of that size. You can set a smaller maximum value for ML_MB and then set "ML_LBASIS_DISCARD=.TRUE.". You will probably have to retrain from scratch and use the same settings in the earlier calculations as well. Please also read our best practices page, where ML_LBASIS_DISCARD is explained further:
https://www.vasp.at/wiki/index.php/Best ... rce_fields
