ML_MB and MCONF getting too large, exceeding memory

Poonam_Chauhan
Newbie
Posts: 10
Joined: Wed Jul 12, 2023 9:21 am

#1 Post by Poonam_Chauhan » Mon Jun 22, 2026 7:28 am

Hi all,
I'm trying to train a force field for an interface between two solids at different temperatures. I first trained the force field separately on the bulk of each of the two solids, and then on the interface between them. Training on the bulk systems seemed to run fine at different temperatures, but whenever I train on the interface, the Bayesian error is very high, which leads to a very large ML_MB that eventually exceeds my system memory. I'm using an HPC cluster with up to 700 GB of RAM and 48 cores per node. I would like to solve this memory issue without compromising accuracy too much.
My input file is as follows:

SYSTEM=interface
#Start parameter
ISTART = 0
ICHARG = 2
ISMEAR = 0
ISYM = 0
SIGMA = 0.04
ENCUT = 500
PREC =Normal
LREAL = Auto
ALGO= Fast
EDIFF = 1E-6
IVDW = 12
LASPH = .TRUE.
#MD SETTINGS
IBRION = 0
ISIF = 3
NSW = 20000
POTIM = 2
NCORE = 4
#THERMOSTAT
MDALGO = 3
TEBEG = 300
TEEND = 300
LANGEVIN_GAMMA = 10 10 10 10

#MACHINE LEARNING
ML_ISTART = 1
ML_LMLFF = .TRUE.
ML_MODE = TRAIN
ML_RCUT1 = 6.0
ML_RCUT2 = 5.0
ML_EPS_LOW = 1E-7
ML_ICRITERIA = 1
ML_MCONF = 6000
#ML_CX = -0.1
ML_MB = 4000


michael_wolloch
Global Moderator
Posts: 212
Joined: Tue Oct 17, 2023 10:17 am

#2 Post by michael_wolloch » Mon Jun 22, 2026 9:37 am

Dear Poonam Chauhan,

Please provide more input and output files so that we can create a minimal reproducible example.
The ML_LOGFILE is especially important, but please also include all input files (including the ML_AB files from the bulk trainings if they are not too large). You can upload them to the forum as a compressed tarball (*.tar.gz).
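
For anyone preparing such an upload, the packing step could be sketched in Python roughly as follows. This is only an illustration: the helper name bundle_files, the archive name interface_mlff.tar.gz, and the exact file list are assumptions, not something prescribed in this thread, so adjust them to the files actually present in your run directory.

```python
import tarfile
from pathlib import Path


def bundle_files(run_dir, archive_name, names):
    """Pack the files from `names` that exist in `run_dir` into a .tar.gz archive.

    Returns two lists: the names that were added and the names that were missing.
    """
    run_dir = Path(run_dir)
    added, missing = [], []
    with tarfile.open(archive_name, "w:gz") as tar:
        for name in names:
            path = run_dir / name
            if path.is_file():
                # Store the file under its bare name so the archive unpacks flat.
                tar.add(path, arcname=name)
                added.append(name)
            else:
                missing.append(name)
    return added, missing


if __name__ == "__main__":
    # Hypothetical file list (an assumption); extend it with whatever was requested.
    files = ["INCAR", "POSCAR", "KPOINTS", "ML_LOGFILE", "ML_AB"]
    added, missing = bundle_files(".", "interface_mlff.tar.gz", files)
    print("added:", added)
    print("missing:", missing)
```

Files reported as missing are simply skipped, so the printed list is a quick check of what still needs to be collected before uploading.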

It would also be very helpful if you could provide the exact steps you took to train the separate bulk systems and how you combined them for the interface.

Thanks, Michael

