Continuation run when retraining MLFF

maxime_legrand
Newbie
Posts: 11
Joined: Wed Oct 23, 2024 11:17 am

Continuation run when retraining MLFF

#1 Post by maxime_legrand » Tue Feb 10, 2026 9:39 am

Dear developers,

I am currently re-training MLFFs, varying the maximum number of local reference configurations, and the calculations exceed the wall time of my cluster.

I was wondering if there is a way to safely restart a continuation run, for example by discarding unused structures from the initial ML_AB file (assuming they are not shuffled during selection).

Also, I increased ML_MCONF_NEW to reduce the computation time, but I am struggling to see its concrete impact on the resulting force field.

Could anyone help me?

Thanks in advance!


ferenc_karsai
Global Moderator
Posts: 581
Joined: Mon Nov 04, 2019 12:44 pm

Re: Continuation run when retraining MLFF

#2 Post by ferenc_karsai » Thu Feb 12, 2026 10:24 am

There is no built-in way to discard structures that do not contribute local reference configurations. If you want to discard them, you have to write a script that deletes them from the ML_AB file: simply remove those configuration blocks and reduce the total number of training structures given at the beginning of the file, and that should do the job. But you should ask yourself why you want to do this. Additional training structures need more resources, but they also help to stabilize the force field, so I would rather keep them. We have never deleted training structures ourselves, hence there is also no option for that.
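
For illustration, a minimal Python sketch of such a script could look like the one below. It assumes the usual ML_AB layout (a "The number of configurations" field in the header with its value two lines below, and one "Configuration num." block per training structure); the file names, the helper name and the kept indices are placeholders, not anything VASP provides. Keep a backup of the original ML_AB and check the pruned file before reusing it.

Code: Select all

#!/usr/bin/env python3
# Prune configurations from an ML_AB file (sketch, assumptions as above).

def prune_mlab(infile, outfile, keep):
    """Keep only the configurations whose 1-based index is in `keep`."""
    with open(infile) as f:
        lines = f.readlines()

    # Start line of every configuration block.
    starts = [i for i, line in enumerate(lines) if "Configuration num." in line]
    assert starts, "no 'Configuration num.' blocks found - is this an ML_AB file?"
    header = lines[:starts[0]]
    bounds = starts + [len(lines)]
    blocks = [lines[bounds[i]:bounds[i + 1]] for i in range(len(starts))]

    # Drop unwanted blocks and renumber the survivors consecutively.
    kept = [b for n, b in enumerate(blocks, start=1) if n in keep]
    for new_num, block in enumerate(kept, start=1):
        # Adjust the field widths here if your ML_AB uses different spacing.
        block[0] = "     Configuration num. %6d\n" % new_num

    # Update the total number of training structures in the header
    # (assumed to sit two lines below the field name, after a dashed line).
    for i, line in enumerate(header):
        if "The number of configurations" in line:
            header[i + 2] = "%11d\n" % len(kept)
            break

    with open(outfile, "w") as f:
        f.writelines(header + [l for b in kept for l in b])

# Example: keep only the first 300 training structures.
prune_mlab("ML_AB", "ML_AB_pruned", set(range(1, 301)))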

ML_MCONF_NEW sets the number of structures that are added to the set of training structures (with local reference configurations chosen from them) before the force field is refitted during on-the-fly training (and before the local reference configurations are reselected). Increasing ML_MCONF_NEW makes the fitting more efficient, since fewer fits have to be done overall. On the other hand, memory consumption goes up, since more new candidate structures have to be stored (especially problematic if the structure contains many atoms), and the stability of the force field suffers, since it takes longer before the force field is updated and improved. This can result in force fields that try to sample more training structures during on-the-fly learning, leading to no real gain in performance overall.

During reselection (ML_MODE=select), increasing ML_MCONF_NEW can greatly reduce the computation time, but the final results can become different from (and possibly worse than) those obtained with a small ML_MCONF_NEW. This has to be tested individually in that mode.
To sum up: ML_MCONF_NEW should rather not be changed from the default in ML_MODE=train. In ML_MODE=select it can improve the efficiency of the calculation, but the results might worsen. In ML_MODE=refit this variable should have no impact.
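
For reference, a reselection run along the lines above could be set up like this in the INCAR (the ML_MCONF_NEW value is only an illustrative choice, not a recommendation, and should be benchmarked for your system):

Code: Select all

ML_LMLFF     = .TRUE.   ! enable machine-learned force fields
ML_MODE      = select   ! reselect local reference configurations from ML_AB
ML_MCONF_NEW = 20       ! default is 5; larger values speed up the reselection
                        ! but may change (possibly worsen) the final force field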

