
ML_MB and MCONF getting too large, exceeding memory

Posted: Mon Jun 22, 2026 7:28 am
by Poonam_Chauhan

Hi all,
I'm trying to train a force field for an interface between two solids at different temperatures. I first trained the force field separately on the bulk of each solid and then on the interface between them. The training for the bulk systems seemed to run fine at different temperatures, but whenever I train on the interface, the Bayesian error is very high, so the required ML_MB grows large and eventually exceeds my system memory. I'm using an HPC cluster with up to 700 GB of RAM and 48 cores per node. I want to solve this memory issue without compromising too much on accuracy.
My input file is as follows:

SYSTEM = interface
#Start parameter
ISTART = 0
ICHARG = 2
ISMEAR = 0
ISYM = 0
SIGMA = 0.04
ENCUT = 500
PREC = Normal
LREAL = Auto
ALGO = Fast
EDIFF = 1E-6
IVDW = 12
LASPH = .TRUE.
#MD SETTINGS
IBRION = 0
ISIF = 3
NSW = 20000
POTIM = 2
NCORE = 4
#THERMOSTAT
MDALGO = 3
TEBEG = 300
TEEND = 300
LANGEVIN_GAMMA = 10 10 10 10

#MACHINE LEARNING
ML_ISTART = 1
ML_LMLFF = .TRUE.
ML_MODE = TRAIN
ML_RCUT1 = 6.0
ML_RCUT2 = 5.0
ML_EPS_LOW = 1E-7
ML_ICRITERIA = 1
ML_MCONF = 6000
#ML_CX = -0.1
ML_MB = 4000
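
In case it helps when comparing these settings against the bulk runs, here is a minimal Python sketch that pulls the active ML_ tags out of an INCAR-style text. The function names `parse_incar` and `ml_tags` are hypothetical helpers, not part of VASP, and the parser only assumes the simple `TAG = value` layout used above, with `#` or `!` starting a comment and `;` separating statements on one line.

```python
def parse_incar(text):
    """Parse INCAR-style text into a dict of TAG -> value strings.

    Assumes simple 'TAG = value' statements; comments start with '#' or '!'.
    """
    tags = {}
    for raw_line in text.splitlines():
        # Drop everything after a comment marker.
        line = raw_line.split("#", 1)[0].split("!", 1)[0]
        # Several statements may share a line, separated by ';'.
        for statement in line.split(";"):
            if "=" not in statement:
                continue
            key, value = statement.split("=", 1)
            key = key.strip().upper()
            if key:
                tags[key] = value.strip()
    return tags


def ml_tags(tags):
    """Return only the machine-learning tags (those starting with 'ML_')."""
    return {k: v for k, v in tags.items() if k.startswith("ML_")}
```

Calling `ml_tags(parse_incar(open("INCAR").read()))` on each run directory gives dictionaries that can be compared directly; commented-out tags such as `#ML_CX` are skipped.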


Re: ML_MB and MCONF getting too large, exceeding memory

Posted: Mon Jun 22, 2026 9:37 am
by michael_wolloch

Dear Poonam Chauhan,

Please provide more input and output files so that we can create a minimal reproducible example. The ML_LOGFILE is especially important, but please also include all input files (including the ML_AB files from the bulk trainings, if they are not too large). You can upload them to the forum as a compressed tarball (*.tar.gz).
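
If it helps, a compressed tarball can be built with a few lines of Python. This is only a sketch using the standard-library `tarfile` module; the helper name `bundle_files` is hypothetical, and which files to include depends on your run directory.

```python
import tarfile
from pathlib import Path


def bundle_files(run_dir, names, archive_path):
    """Pack the listed files from run_dir into a gzip-compressed tarball.

    Files that do not exist are skipped. Returns the names actually added.
    """
    run_dir = Path(run_dir)
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            path = run_dir / name
            if path.is_file():
                # arcname keeps the path inside the archive relative.
                tar.add(path, arcname=name)
                added.append(name)
    return added
```

For example, `bundle_files(".", ["INCAR", "POSCAR", "KPOINTS", "ML_LOGFILE", "ML_AB"], "interface_training.tar.gz")` would pack whichever of those files are present in the current directory; both the file list and the archive name here are assumptions for illustration.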

It would also be very helpful if you could provide the exact steps you took to train the separate bulk systems and how you combined them for the interface.

Thanks, Michael