Large deviation in DFT and ML energies during on the fly training of MLFF
Posted: Mon Oct 13, 2025 12:19 pm
by ccchiu
Dear All,
I am seeking advice on an issue I'm encountering while training an on-the-fly MLFF for a H/Ru surface system using VASP version 6.4.2. I'm observing a large energy jump when the simulation switches from DFT to the MLFF, even when the step is considered "accurate."
My system consists of 2 H atoms on a Ru slab with 72 Ru atoms. During the simulation, when the MLFF is judged as "accurate" based on the Bayesian error and the DFT calculation is skipped, I see a sudden jump in the total energy in the OSZICAR file. This energy difference between the DFT-calculated steps and the MLFF-predicted steps is consistently close to 100 eV.
My Question:
I understand that the ML and DFT energies can differ during training. However, why would such a large discrepancy (~100 eV) occur during a step that the algorithm has high confidence in? This seems too large to be a simple prediction error and might point to a more fundamental issue in my setup or understanding.
The relevant calculation files can be found in this archive:
https://www.dropbox.com/scl/fi/counsai5 ... 2ccrv&dl=0
Any insights into this behavior would be greatly appreciated.
Thank you for your time.
Best wishes,
Cheng-chau
Re: Large deviation in DFT and ML energies during on the fly training of MLFF
Posted: Mon Oct 13, 2025 5:48 pm
by martin.schlipf
I had a look at the structures in your ML_AB file and found some with very large positive energies.
grep -A3 'Config\|Total' ML_AB | awk 'NR % 10 == 1 || NR % 10 == 6 || NR % 10 == 8' | grep -B2 '^\s*[0-9]'
Configuration num. 149
Total energy (eV)
849.9462008339254
--
Configuration num. 152
Total energy (eV)
10390.08370774051
--
Configuration num. 164
Total energy (eV)
26150.00619765239
--
Configuration num. 215
Total energy (eV)
849.9462008339254
--
Configuration num. 218
Total energy (eV)
10390.08370774051
--
Configuration num. 230
Total energy (eV)
26150.00619765239
It also seems that the energy of configuration 149 is reproduced exactly in configuration 215, and likewise for the pairs (152, 218) and (164, 230).
Perhaps you tried some setups in the initial training run that did not work and you did not reset the ML_AB file?
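The same check can be sketched in Python, which makes it easier to list both the positive-energy configurations and the exact duplicates. This is only a rough sketch: it assumes the ML_AB layout implied by the output above (a "Configuration num." line, later a "Total energy (eV)" header, and the value on the first following line that parses as a number). The function names are made up for illustration and are not part of VASP.

```python
import re
from collections import defaultdict


def read_energies(lines):
    """Map configuration number -> total energy (eV) from ML_AB-style lines."""
    energies = {}
    current = None          # configuration number currently being read
    expect_energy = False   # True after a "Total energy (eV)" header
    for line in lines:
        match = re.search(r"Configuration num\.\s*(\d+)", line)
        if match:
            current = int(match.group(1))
            expect_energy = False
            continue
        if "Total energy (eV)" in line:
            expect_energy = True
            continue
        if expect_energy and current is not None:
            try:
                energies[current] = float(line.split()[0])
                expect_energy = False
            except (ValueError, IndexError):
                pass  # separator or blank line, keep looking for the value
    return energies


def positive_energy_configs(energies, threshold=0.0):
    """Configuration numbers whose total energy exceeds the threshold."""
    return sorted(n for n, e in energies.items() if e > threshold)


def duplicate_energy_groups(energies):
    """Groups of configurations that share exactly the same total energy."""
    groups = defaultdict(list)
    for number, energy in energies.items():
        groups[energy].append(number)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Possible usage (assumes an ML_AB file in the working directory):
#     with open("ML_AB") as handle:
#         energies = read_energies(handle)
#     print(positive_energy_configs(energies))
#     print(duplicate_energy_groups(energies))
```

A threshold of 0 eV is an arbitrary choice that happens to fit this case; for other systems a comparison against the typical energy of the remaining configurations may be more meaningful.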
Re: Large deviation in DFT and ML energies during on the fly training of MLFF
Posted: Tue Oct 14, 2025 7:09 am
by ccchiu
Dear Martin,
thank you for pointing out that issue. We have indeed been merging different ML_AB files, and we also started the training with one system and continued it with another system (changing the number of atoms, etc.). It is quite possible that something went wrong during one of these calculations. In that case, is there any way to remove the configurations with positive energy from the ML_AB file?
best,
ccc
Re: Large deviation in DFT and ML energies during on the fly training of MLFF
Posted: Tue Oct 14, 2025 8:25 am
by martin.schlipf
At the moment, we do not have an automatic tool to modify the ML_AB file. There are two approaches you can take:
First, you could extract the structures from the ML_AB file and run with ML_MODE = train only on these structures. You can modify the structure VASP uses with IBRION = 11 or IBRION = 12 (the latter requires a Python plugin).
Secondly, you can manually edit the ML_AB file. This requires a bit of knowledge about the structure of the file. In the wiki, it is documented how to do so in principle.
If you decide to go this route, please make sure that all the remaining structures are reasonable. You may want to check the initial runs to see if the electronic SCF converged.
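One rough way to screen the initial runs for SCF problems is to count the electronic iterations per ionic step in OSZICAR and flag every step that used up the full NELM budget. The sketch below is a heuristic under stated assumptions, not an official tool: it assumes the usual OSZICAR layout, where electronic iterations start with an algorithm label such as "DAV:" or "RMM:" followed by the iteration number, and each ionic step ends with a line containing "F=". The function name and the default of 60 (the NELM default) are illustrative; pass the NELM actually used in the run.

```python
import re

# Electronic iteration lines look like "DAV:   1 ..." or "RMM:  12 ..."
ELECTRONIC_STEP = re.compile(r"^\s*[A-Za-z]+\s*:\s*\d+\b")


def unconverged_ionic_steps(lines, nelm=60):
    """Return the indices (1-based, in order of appearance) of ionic steps
    whose electronic loop ran for at least `nelm` iterations."""
    flagged = []
    n_electronic = 0   # electronic iterations seen since the last ionic step
    ionic_index = 0    # counts lines containing "F=", not the printed step number
    for line in lines:
        if ELECTRONIC_STEP.match(line):
            n_electronic += 1
        elif "F=" in line:
            ionic_index += 1
            if n_electronic >= nelm:
                flagged.append(ionic_index)
            n_electronic = 0
    return flagged


# Possible usage (assumes an OSZICAR file in the working directory):
#     with open("OSZICAR") as handle:
#         print(unconverged_ionic_steps(handle, nelm=60))
```

A step that reaches NELM is only a candidate: the loop may have converged on its very last iteration, so flagged steps are worth confirming in the OUTCAR file.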
Re: Large deviation in DFT and ML energies during on the fly training of MLFF
Posted: Fri Oct 17, 2025 9:22 am
by martin.schlipf
You can also look at ML_MODE = select. This is a part of the second approach.
In any case, please make sure that all the structures you use were computed with a consistent setup (ENCUT, PREC, xc functional, ...). You cannot mix results that were obtained with different setups.
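A quick consistency check could compare the relevant tags across the INCAR files of the runs that contributed to the training data. The following is a minimal sketch with hypothetical function names; it only compares tags that are written explicitly in each INCAR, so a missing tag shows up as None even though VASP would fall back to a default (for ENCUT, one that depends on the POTCAR). The tag list is illustrative and does not cover everything that matters, for example the POTCAR files and the k-point mesh.

```python
import re


def _normalize(value):
    """Compare numbers numerically and keywords case-insensitively."""
    try:
        return float(value)
    except ValueError:
        return value.upper()


def parse_incar(text):
    """Parse 'TAG = value' statements; '#' and '!' start a comment,
    ';' separates several statements on one line."""
    tags = {}
    for raw in text.splitlines():
        line = re.split(r"[#!]", raw, maxsplit=1)[0]
        for statement in line.split(";"):
            if "=" in statement:
                key, value = statement.split("=", 1)
                tags[key.strip().upper()] = _normalize(value.strip())
    return tags


def inconsistent_tags(setups, tags=("ENCUT", "PREC", "GGA", "METAGGA")):
    """Return {tag: {run name: value}} for every tag that differs between runs.
    `setups` maps a run name to the dict returned by parse_incar."""
    report = {}
    for tag in tags:
        values = {name: setup.get(tag) for name, setup in setups.items()}
        if len(set(values.values())) > 1:
            report[tag] = values
    return report


# Possible usage (the directory names are placeholders):
#     setups = {}
#     for run in ("run1", "run2"):
#         with open(f"{run}/INCAR") as handle:
#             setups[run] = parse_incar(handle.read())
#     print(inconsistent_tags(setups))
```

An empty result only means that the listed tags agree where they are set explicitly; it does not prove that the setups are identical.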