MLFF Error Analysis: DFT and ML energies don't match

#1 Post by **audrey_thiessen** » Wed Sep 10, 2025 1:04 am

Hello,

I am trying to use MLFF to look at the volume of liquid copper. I have completed training and am now checking the test set errors. My analysis shows the DFT and ML forces in agreement, but the energies are not.

Unknown-2.png

It looks as if the energies are shifted (they look strangely grouped and linear). I have tried changing some of the hyperparameters (ML_WTOTEN, ML_LMAX2, ML_MRB1, ML_SION), but none of them made any significant difference.

For the error calculation runs, I am selecting 50 random structures from the ML production run and performing static ML and DFT calculations on them.
I have been using the total free energy “F” from the OSZICAR to compare ML and DFT energies. I wrote my own code to compare the forces component-wise using the formula on the wiki page.
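
For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two error measures. It assumes the usual root-mean-square definition (energies normalised per atom, forces averaged over every Cartesian component); the function names and the input layout are hypothetical and are not taken from the original poster's code or from VASP itself.

```python
import math

def energy_rmse_per_atom(e_dft, e_ml, n_atoms):
    """RMSE of total energies, normalised per atom (eV/atom)."""
    if len(e_dft) != len(e_ml):
        raise ValueError("energy lists must have the same length")
    squared = [((ml - dft) / n_atoms) ** 2 for dft, ml in zip(e_dft, e_ml)]
    return math.sqrt(sum(squared) / len(squared))

def force_rmse(f_dft, f_ml):
    """Component-wise RMSE over all structures, atoms and x/y/z components.

    f_dft, f_ml: list of structures, each a list of (fx, fy, fz) tuples.
    """
    squared_sum = 0.0
    count = 0
    for struct_dft, struct_ml in zip(f_dft, f_ml):
        for atom_dft, atom_ml in zip(struct_dft, struct_ml):
            for c_dft, c_ml in zip(atom_dft, atom_ml):
                squared_sum += (c_ml - c_dft) ** 2
                count += 1
    return math.sqrt(squared_sum / count)
```

Both sets of calculations must use identical physical settings for these numbers to be meaningful; the error measures themselves cannot reveal a mismatch in the reference setup.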

Files for Refit, Production, and Error Analysis:
https://merced-my.sharepoint.com/:u:/g/ ... A?e=jFk9NN

#2 Post by **max_liebetreu** » Wed Sep 10, 2025 8:36 am

Hello Audrey,

Welcome to the VASP Forum, and thank you for reaching out to us!
The shift you observed is indeed curious - with Machine Learning, a number of things may have gone wrong. Thank you for providing all the relevant files from your run, that will certainly help us narrow it down! I will confer with my ML colleagues and get back to you.

In the meantime, could you please tell us which version of VASP you are using?

Best regards,

#3 Post by **max_liebetreu** » Wed Sep 10, 2025 10:39 am

Hello Audrey,

After conferring with my ML colleagues, we suspect the issue might lie with the training - especially due to the low training set errors but high test set errors. Could you perhaps provide us with the files from your training run, minus vasprun.xml & ML_FFN to keep the folder size manageable?

What seems odd is the high diffusion of atoms in your system out of the simulation box. Looking at your runs (beginning and end of the run):

Bild.png

Bild (1).png

This, too, might point at a training issue.
Another thing to try is to reduce the MD timestep (POTIM) to 2.0 or even 1.0 and check whether that changes anything.

The VASP version you are using might also still be relevant to know.

Looking forward to hearing back from you!

#4 Post by **andreas.singraber** » Wed Sep 10, 2025 1:40 pm

Hello Audrey,

After having a closer look at the data you provided, we found that there is a significant difference in the energy range of the training and test sets. The training set consists of roughly 2000 configurations, mostly in the interval [-430, -390] eV. Only very few configurations were sampled at higher energies, up to approx. -270 eV. The test set, on the other hand, contains mostly structures in the narrow regime of [-366, -358] eV, where there is almost no corresponding data in the training set. A histogram makes this much clearer:

histogram.png

So, for the given test set the force field is probably only extrapolating and thus gives very wrong results. At this point one would need to clarify which regime is actually interesting for your application. Two possibilities here:

The training data actually covers the application regime but the test set was accidentally created differently (e.g. different temperature). Then, you would just need to create a matching test set and rerun your comparison.
The training and test data regime (and maybe everything in between) should also be covered by the force field. In this case you would need to enhance the training data set to cover more of the desired areas. I suspect that in your initial training run you heated up the system very fast and thus sampled only very few configurations at lower temperatures. Maybe you want to repeat the heat-up phase at a lower pace to get more coverage.
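
The overlap check described above can also be done numerically. The following is a rough sketch, not part of VASP: it bins the training energies and reports which fraction of the test energies falls into bins that contain a minimum number of training configurations. The function names, the bin width and the threshold are arbitrary assumptions chosen for illustration.

```python
import math
from collections import Counter

def bin_index(energy, bin_width):
    """Index of the histogram bin that contains the given energy."""
    return math.floor(energy / bin_width)

def coverage_fraction(train_energies, test_energies, bin_width=2.0, min_count=5):
    """Fraction of test energies lying in bins with >= min_count training energies."""
    counts = Counter(bin_index(e, bin_width) for e in train_energies)
    covered = sum(
        1 for e in test_energies
        if counts[bin_index(e, bin_width)] >= min_count
    )
    return covered / len(test_energies)
```

A value close to zero would indicate that the force field is mostly extrapolating on the test set. Note that total energy is only a coarse proxy for coverage of configuration space, so a high value is necessary rather than sufficient.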

As already mentioned, it would be very helpful if you could also provide the output of your training runs. Thank you!

All the best,
Andreas Singraber

#5 Post by **audrey_thiessen** » Wed Sep 10, 2025 9:15 pm

Thank you for your quick and thoughtful responses,

I am using VASP 6.4.2

For training data, I started with a big temperature jump (2000 K) to melt the structure, and then did lots of ramping runs at different target temperatures to get a larger number of training structures in the ML_AB (2000 K - 1000 K and back up).

I am using a larger value for POTIM (4.0), which is more of an artifact from running expensive AIMD calculations.

Here is my final training run:
https://merced-my.sharepoint.com/:u:/g/ ... A?e=g4GIAD

#6 Post by **max_liebetreu** » Thu Sep 11, 2025 11:15 am

Hello Audrey,

That sounds reasonable, but the final training run is insufficient to discern the issue. Could we ask you for more data from your training, if you have it, so that we can investigate the training and test sets in their full context?

All the best,
Max

#7 Post by **audrey_thiessen** » Thu Sep 18, 2025 3:21 am

Hello,

I do not have all of my training data. However, I have been retraining with some of your suggestions (POTIM = 2.0) and higher temperatures. It appears that I have captured more of the energies that I am seeing in the test sets. However, the MLFF-predicted energies are still quite far off from the DFT energies when I look at the test set error.

Unknown-8.png

Here is the new training set: https://merced-my.sharepoint.com/:u:/g/ ... Q?e=LguojY

(The training set is a bit small, but there is still the weird correlation going on)

#8 Post by **andreas.singraber** » Thu Sep 18, 2025 3:21 pm

Hello!

Thanks for uploading a larger batch of your training runs! We finally (most likely) found the origin of the discrepancies between your DFT reference data and ML predictions. In your training runs you use Grimme DFT-D2 van der Waals corrections (INCAR tag IVDW = 10). These corrections are computed whenever an ab initio calculation is performed during training and you can find the corresponding energy contributions in sections like this in the OUTCAR file:

  Number of pair interactions contributing to vdW energy: 2057320
  Edisp (eV):  -44.21481
    FORVDW:  cpu time      0.1362: real time      0.1367
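
A quick way to check whether a given OUTCAR contains such a section is to search it for the `Edisp` line. The snippet below is a minimal sketch that assumes the line format shown in the excerpt above; the function name is hypothetical, and other VASP versions or vdW methods may print this information differently.

```python
import re

# Matches lines such as "  Edisp (eV):  -44.21481" (format taken from the excerpt above).
EDISP_PATTERN = re.compile(r"Edisp \(eV\):\s*(-?\d+\.\d+)")

def extract_edisp(outcar_text):
    """Return all dispersion energies (eV) found in the OUTCAR text, in order."""
    return [float(value) for value in EDISP_PATTERN.findall(outcar_text)]
```

An empty list for a reference calculation would suggest that no such dispersion section was written for that run.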

The vdW-corrections are automatically added to energy, forces and stress before the data is fed into the machine learning routines. Hence, the ML force field trains on the potential energy landscape which includes the vdW-corrections. As a consequence, the predictions automatically include them as well. However, we found that you seem to have forgotten them when you created your test set. In the directories in "Error Analysis/dfterror/run_data" (from your first upload) you performed individual ab initio calculations for the 50 test structures. Unlike in all training runs, the tag

IVDW = 10

is missing from the INCAR files (the OUTCAR files summarize the INCAR contents). Therefore, there is now a mismatch between the settings for reference and predicted data:

ab initio reference data without vdW-corrections
ML predictions with vdW-corrections (implicitly added because enabled during training)
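
This kind of mismatch can be caught early by comparing the INCAR tags of a training run with those of the reference calculations. The following is a simplified sketch with hypothetical function names; it only handles plain `TAG = value` statements with `#` or `!` comments and `;` separators, and is not a complete INCAR parser.

```python
def parse_incar(text):
    """Parse simple 'TAG = value' statements into a dict with upper-case tag names."""
    tags = {}
    for raw_line in text.splitlines():
        # Drop trailing comments introduced by '#' or '!'.
        line = raw_line.split("#", 1)[0].split("!", 1)[0]
        for statement in line.split(";"):
            if "=" in statement:
                key, value = statement.split("=", 1)
                tags[key.strip().upper()] = value.strip()
    return tags

def diff_incar(train_text, test_text, ignore=()):
    """Return {tag: (train_value, test_value)} for tags that differ or are missing."""
    train, test = parse_incar(train_text), parse_incar(test_text)
    skipped = {tag.upper() for tag in ignore}
    return {
        tag: (train.get(tag), test.get(tag))
        for tag in sorted(set(train) | set(test))
        if tag not in skipped and train.get(tag) != test.get(tag)
    }
```

Tags that are expected to differ between an MD training run and a static reference calculation can be passed via `ignore`, so that only differences affecting the potential energy surface, such as a missing IVDW, remain in the output.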

You will need to recompute the ab initio data for your test set. Since the shift of energies due to vdW-corrections is roughly -44 eV you should find that the reference energies will shift the points in your energy plot to the correct position:

energy-shift.png
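
A nearly constant shift of this kind can be distinguished from genuine model error by separating the mean offset from the remaining scatter. Below is a minimal sketch with a hypothetical function name; any numbers fed into it for illustration are not taken from the uploaded data.

```python
import math

def offset_analysis(e_ref, e_ml):
    """Return (mean offset, raw RMSE, RMSE after removing the mean offset) in eV."""
    diffs = [ml - ref for ref, ml in zip(e_ref, e_ml)]
    n = len(diffs)
    mean_offset = sum(diffs) / n
    rmse_raw = math.sqrt(sum(d * d for d in diffs) / n)
    rmse_shifted = math.sqrt(sum((d - mean_offset) ** 2 for d in diffs) / n)
    return mean_offset, rmse_raw, rmse_shifted
```

A large mean offset combined with a small residual RMSE points to a systematic contribution that is present in one data set but not in the other, rather than to a poorly trained force field. This is only a diagnostic; the proper fix remains recomputing the reference data with consistent settings.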

We hope that this will fix the main issue for you!

While looking through your data we found two more things worth mentioning:

In your POSCARs you have defined an orthorhombic lattice. However, in your ICONST file you use a setup for fixing the shape of a cubic cell instead. This corresponds to item (3) in this list here while you should actually be using number (4). To be honest, we are not sure whether this would really cause problems (your lattice seems to change normally along the trajectory) but to avoid any issues please try to use the recipe for orthorhombic lattices instead. Or, alternatively, use a cubic lattice.
Your ML-prediction-only runs (ML_MODE = run, ML_ISTART = 2) were executed on 240 to 480 cores, just like the ab initio calculations. However, since the ML force field method is computationally much less demanding, this is a huge waste of CPU hours. Please perform a separate benchmark for the ML-only runs, starting from a single core (you may want to set ML_OUTBLOCK and ML_OUTPUT_MODE = 0). You will probably see no more decrease in execution time with more than 10 to 20 cores. Using even more cores will not speed up your simulations and will just increase the communication overhead and consume extra energy and CPU hours.
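
The benchmark suggested in the last point can be evaluated with a few lines of code. This is a rough sketch with hypothetical function names and an arbitrarily chosen efficiency threshold; the wall times have to come from your own benchmark runs.

```python
def scaling_table(timings):
    """timings: dict mapping core count -> wall time in seconds.

    Returns a list of (cores, speedup, parallel efficiency) relative to the
    smallest core count in the benchmark.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows

def largest_efficient_core_count(timings, min_efficiency=0.7):
    """Largest core count whose parallel efficiency is still >= min_efficiency."""
    efficient = [c for c, _, eff in scaling_table(timings) if eff >= min_efficiency]
    return max(efficient)
```

For example, with made-up timings of `{1: 100, 2: 52, 4: 28, 8: 20, 16: 18}` seconds the efficiency drops below 0.7 beyond 4 cores, so adding more cores would mostly burn CPU hours.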

All the best,
Andreas Singraber

#9 Post by **audrey_thiessen** » Thu Sep 18, 2025 4:27 pm

Thank you for all of your suggestions! I will keep in touch if I see anything else unexpected.
