Continuation run when retraining MLFF

maxime_legrand
Newbie
Posts: 11
Joined: Wed Oct 23, 2024 11:17 am

Continuation run when retraining MLFF

#1 Post by maxime_legrand » Tue Feb 10, 2026 9:39 am

Dear developers,

I am currently re-training MLFFs, varying the maximum number of local reference configurations, and the calculations exceed the wall time of my cluster.

I was wondering if there is a way to safely restart a continuation run, for example by discarding unused structures from the initial ML_AB file (assuming they are not shuffled during selection).

Also, I increased ML_MCONF_NEW to reduce the computation time, but I am struggling to see its concrete impact on the resulting force field.

Could anyone help me?

Thanks in advance!


ferenc_karsai
Global Moderator
Posts: 581
Joined: Mon Nov 04, 2019 12:44 pm

Re: Continuation run when retraining MLFF

#2 Post by ferenc_karsai » Thu Feb 12, 2026 10:24 am

There is no built-in way to discard structures that do not contribute local reference configurations. If you want to discard them, you have to write a script that deletes them from the ML_AB file: simply remove those configuration blocks and reduce the total number of training structures given at the beginning of the file, and that should do the job. But you should ask yourself why you want to do this. Additional training structures need more resources, but they also help to stabilize the force field, so I would rather keep them. We have never deleted training structures ourselves, hence there is also no option for that.
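
For illustration, a minimal Python sketch of such a script could look like the one below. It assumes the usual ML_AB layout (a "The number of configurations" field in the header with its value two lines below, and one "Configuration num." block per training structure); the file names, the helper name and the kept indices are placeholders, not anything VASP provides. Keep a backup of the original ML_AB and check the pruned file before reusing it.

Code: Select all

#!/usr/bin/env python3
# Prune configurations from an ML_AB file (sketch, assumptions as above).

def prune_mlab(infile, outfile, keep):
    """Keep only the configurations whose 1-based index is in `keep`."""
    with open(infile) as f:
        lines = f.readlines()

    # Start line of every configuration block.
    starts = [i for i, line in enumerate(lines) if "Configuration num." in line]
    assert starts, "no 'Configuration num.' blocks found - is this an ML_AB file?"
    header = lines[:starts[0]]
    bounds = starts + [len(lines)]
    blocks = [lines[bounds[i]:bounds[i + 1]] for i in range(len(starts))]

    # Drop unwanted blocks and renumber the survivors consecutively.
    kept = [b for n, b in enumerate(blocks, start=1) if n in keep]
    for new_num, block in enumerate(kept, start=1):
        # Adjust the field widths here if your ML_AB uses different spacing.
        block[0] = "     Configuration num. %6d\n" % new_num

    # Update the total number of training structures in the header
    # (assumed to sit two lines below the field name, after a dashed line).
    for i, line in enumerate(header):
        if "The number of configurations" in line:
            header[i + 2] = "%11d\n" % len(kept)
            break

    with open(outfile, "w") as f:
        f.writelines(header + [l for b in kept for l in b])

# Example: keep only the first 300 training structures.
prune_mlab("ML_AB", "ML_AB_pruned", set(range(1, 301)))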

ML_MCONF_NEW sets the number of structures that are added to the set of training structures (with local reference configurations chosen from them) before the force field is refitted during on-the-fly training (and before the local reference configurations are reselected). Increasing ML_MCONF_NEW makes the fitting more efficient, since fewer fits have to be done overall. On the other hand, memory consumption goes up, since more new candidate structures have to be stored (especially problematic if the structure contains many atoms), and the stability of the force field suffers, since it takes longer before the force field is updated and improved. This can result in force fields that try to sample more training structures during on-the-fly learning, leading to no real gain in performance overall.

During reselection (ML_MODE=select), increasing ML_MCONF_NEW can greatly reduce the computation time, but the final results can become different from (and possibly worse than) those obtained with a small ML_MCONF_NEW. This has to be tested individually in that mode.
To sum up: ML_MCONF_NEW should rather not be changed from the default in ML_MODE=train. In ML_MODE=select it can improve the efficiency of the calculation, but the results might worsen. In ML_MODE=refit this variable should have no impact.
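
For reference, a reselection run along the lines above could be set up like this in the INCAR (the ML_MCONF_NEW value is only an illustrative choice, not a recommendation, and should be benchmarked for your system):

Code: Select all

ML_LMLFF     = .TRUE.   ! enable machine-learned force fields
ML_MODE      = select   ! reselect local reference configurations from ML_AB
ML_MCONF_NEW = 20       ! default is 5; larger values speed up the reselection
                        ! but may change (possibly worsen) the final force field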

