Validation of ML-FF

#1 Post by **akretschmer** » Wed Jul 30, 2025 12:56 pm

Hi,

I have trained an ML-FF and want to validate the force field against ab initio data. For that I ran an FF-only calculation, extracted structures at certain intervals, and ran a single ionic DFT step for each of these structures. For this I set IBRION = 2, so no MD. Is this correct, or does it even matter?

Now I want to compare the energies, forces, and stresses, but I am unsure which output I need to use for that.

For the energies, I look at the difference of E0 in the OSZICAR files, with the unit in eV. This error should be below 1 meV/atom.
For the forces, I look at the column below

Code: Select all

TOTAL-FORCE (eV/Angst) (ML)

with the force in eV/Angstrom per atom and direction. If I understood correctly, the average of the difference between these force components of the ML-FF and DFT calculation should be below 30 meV/Angstrom. Is this correct?
For the stress, I am rather clueless. There is this line in the OUTCAR:

Code: Select all

ML FORCE on cell =-STRESS in cart. coord. units (eV/cell)
  Direction    XX          YY          ZZ          XY          YZ          ZX

below which the stress is given in two units. Is this what I need to compare? And what is a reasonable threshold that I should look out for? I could not find anything in the wiki or the forum.

Alternatively, there is this script in the tutorial and the wiki to do the heavy lifting, and I would love to use it:

Code: Select all

from py4vasp import MLFFErrorAnalysis
from py4vasp import plot
import numpy as np
# Compute the errors
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e01_error_analysis/MLFF_data/*.h5"
)
energy_error = mlff_error_analysis.get_energy_error_per_atom()
force_error = mlff_error_analysis.get_force_rmse()
stress_error = mlff_error_analysis.get_stress_rmse()
x = np.arange(len(energy_error))

but the code produces an error:

Code: Select all

Traceback (most recent call last):
  File "/mnt/c/Andreas/DFT/Water/Bulk/Validation/MLFF-validation.py", line 5, in <module>
    mlff_error_analysis = MLFFErrorAnalysis.from_files(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 102, in from_files
    set_appropriate_attrs(mlff_error_analysis)
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 179, in set_appropriate_attrs
    set_energies(cls)
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 277, in set_energies
    cls.mlff.energies = _dict_to_array(energies_data["mlff_data"], tag)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 282, in _dict_to_array
    return np.array([_data[key] for _data in data])
                     ~~~~~^^^^^
KeyError: 'free energy    TOTEN'

I have the latest version of py4vasp installed and pass the correct directories for the ML-FF and DFT calculations. In the tutorial it seems that the ML-FF results are calculated for structures from earlier DFT simulations, which is the reverse of what the wiki describes. The tutorial also provides a single h5 file for every step, while I have all ML-FF steps in one h5 file (with a corresponding DFT calculation for every frame that I output). Is this causing the issue? Is there a way to split the h5 file into one file per frame? If py4vasp needs the data separated like this, it would contradict the workflow outlined in the wiki (https://www.vasp.at/wiki/index.php/Best ... est_errors), as one would have to run an ML-FF simulation, extract structures, and then calculate both DFT and ML-FF again for these structures.

There is unfortunately no documentation on these functions, so I don't know what they are doing. This is maybe not the correct place for this kind of support, but I would be grateful for any information on how to get help.

#2 Post by **ahampel** » Thu Jul 31, 2025 12:29 pm

Hi,

thank you for reaching out to us on the forum. I will try to answer your points one by one.

For this I set IBRION = 2, so no MD. Is this correct, or does it even matter?

I think it is advisable to set IBRION = -1 so that no ionic relaxation is performed. Otherwise you would compare different structures.

ML-FF and DFT calculation should be below 30 meV/Angstrom. Is this correct?

Correct, this is also my understanding of the general guideline from the wiki tutorials.
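As a rough illustration of how such a comparison could be scripted, here is a minimal Python sketch. It assumes the E0 values and the TOTAL-FORCE rows for the same structure have already been parsed from both runs. The function names and the toy numbers are hypothetical and are not part of VASP or py4vasp.

```python
import math


def energy_error_per_atom_mev(e_dft_ev, e_mlff_ev, n_atoms):
    """Absolute energy difference per atom in meV for one structure."""
    return abs(e_mlff_ev - e_dft_ev) / n_atoms * 1000.0


def force_errors_mev_per_angstrom(f_dft, f_mlff):
    """Return (MAE, RMSE) over all Cartesian force components in meV/Angstrom.

    f_dft and f_mlff are lists of [fx, fy, fz] rows in eV/Angstrom,
    one row per atom, in the same atom order.
    """
    if len(f_dft) != len(f_mlff):
        raise ValueError("DFT and ML-FF force lists differ in length")
    # Flatten to per-component differences and convert eV -> meV.
    diffs = [
        (m - d) * 1000.0
        for row_d, row_m in zip(f_dft, f_mlff)
        for d, m in zip(row_d, row_m)
    ]
    mae = sum(abs(x) for x in diffs) / len(diffs)
    rmse = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return mae, rmse


# Toy numbers, made up purely for illustration (two atoms).
forces_dft = [[0.10, -0.20, 0.05], [-0.10, 0.20, -0.05]]
forces_mlff = [[0.12, -0.21, 0.05], [-0.09, 0.18, -0.06]]
mae, rmse = force_errors_mev_per_angstrom(forces_dft, forces_mlff)
print(f"force MAE = {mae:.1f} meV/A, RMSE = {rmse:.1f} meV/A")
```

The mean absolute error corresponds to the "average of the difference" mentioned above, while the RMSE weights large outliers more strongly. Which of the two a given guideline refers to should be checked against the wiki.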

For the stress, I am rather clueless. There is this line in the OUTCAR:

You can take a look here: https://www.vasp.at/wiki/index.php/Volume_relaxation. The important line for the stress on the cell is the one starting with external pressure = in the OUTCAR file. Apart from problems with Pulay stress (https://www.vasp.at/wiki/index.php/Pulay_stress), this number should go to 0 when optimizing the cell volume or shape. What to look out for depends somewhat on the size of the cell, but for a small cell the stress should probably be below 1 kB.
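If you want to compare the individual stress components of the two runs rather than only the pressure, one possible approach is sketched below. It assumes the six components (XX YY ZZ XY YZ ZX) have been read from both OUTCAR files, and that the first unit is eV per cell, as the header states. Under that assumption, dividing by the cell volume in Å^3 and multiplying by 1602.1766 gives kB, since 1 eV/Å^3 = 1602.1766 kbar. The function names are hypothetical.

```python
import math

# 1 eV/Angstrom^3 = 160.21766 GPa = 1602.1766 kbar
EV_PER_A3_TO_KBAR = 1602.1766


def stress_ev_to_kbar(stress_ev_per_cell, volume_a3):
    """Convert stress components from eV per cell to kbar."""
    return [s / volume_a3 * EV_PER_A3_TO_KBAR for s in stress_ev_per_cell]


def stress_rmse_kbar(stress_dft_kbar, stress_mlff_kbar):
    """RMSE over the stress components of one structure, in kbar."""
    if len(stress_dft_kbar) != len(stress_mlff_kbar):
        raise ValueError("stress component lists differ in length")
    squared = [
        (m - d) ** 2 for d, m in zip(stress_dft_kbar, stress_mlff_kbar)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

Both runs must of course use the same cell, so the same volume applies to the DFT and the ML-FF values of a given structure.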

Now regarding your py4vasp problem. You call this function for a specific h5 archive here:

Code: Select all

dft_data="./test_set/DFTdata/*.h5",
mlff_data="./e01_error_analysis/MLFF_data/*.h5"

And you replaced *.h5 by your h5 file path+name. Can you check this dft_data h5 file for the energy outputs? These should be stored here:

Code: Select all

h5ls -d vaspout.h5/intermediate/ion_dynamics/energies

The error you encounter points to py4vasp not being able to load this exact path in the h5 archive.

Best regards,
Alex

#3 Post by **akretschmer** » Thu Jul 31, 2025 12:50 pm

Thank you for the reply.

I also set NSW = 0, so there should be no ionic movement. Does IBRION then make a difference?

And I used the command that you gave me and this is the output:

Code: Select all

energies                 Dataset {1/Inf, 3}
    Data:
         -938.669943520003, -938.669943519971, -938.669943519987

I have 10 different files in the directory that all produce an output like this.

In the ML-FF h5 file I get this:

Code: Select all

energies                 Dataset {10/Inf, 7}
    Data:
         -938.908755518695, 7.14671228353675, 0.128296453960564, 288.623784630884, 0, 0, -931.633746781197,
         -939.640979899825, 7.22263611001397, 0.031240333789259, 287.7853962792, 0, 0, -932.387103456022,
         -938.46308591373, 7.43776275884493, 0.036215763577261, 296.517577535381, 0, 0, -930.989107391308,
         -940.086794108592, 6.9337533479122, 0.0857972983221489, 278.488912801127, 0, 0, -933.067243462357,
         -939.533055620857, 7.82520987444488, 0.026515010178255, 311.504031658389, 0, 0, -931.681330736234,
         -940.022794337502, 7.41257588819382, 0.028125062480558, 295.197345622576, 0, 0, -932.582093386828,
         -940.182576027461, 7.87696440571742, 0.0147927461934436, 313.092244801335, 0, 0, -932.29081887555,
         -939.322252142698, 7.43619414656062, 0.0506179636370815, 297.026728631663, 0, 0, -931.8354400325,
         -939.875894560062, 6.97715710770437, 0.0514550882598305, 278.848414606832, 0, 0, -932.847282364098,
         -939.788050157104, 7.3999273698299, 0.0243638876812949, 294.546318535697, 0, 0, -932.363758899592

So the data is there. Is the problem maybe caused by having the DFT data in separate files? If yes, is it possible to stitch the h5 files together somehow?

#4 Post by **ahampel** » Thu Jul 31, 2025 1:33 pm

Yes, NSW=0 should give the same result.

Okay, interesting. And you are sure you passed the right path? Does something like this work?

Code: Select all

import py4vasp
calc = py4vasp.Calculation.from_file('path/to/vaspout.h5')

I think I would then have to try your set of input files myself. Could you bundle the input files so that I can try them?

Best,
Alex

#6 Post by **akretschmer** » Thu Jul 31, 2025 2:10 pm

Here it is. One folder contains the h5 for ML-FF, one the DFT files.

Validation.zip

#7 Post by **ahampel** » Fri Aug 01, 2025 9:06 am

I understand now what is going on. Sorry, my initial post was misleading; I am not so experienced with the MLFF part.

What the tutorial suggests is absolutely correct:

Code: Select all

# Compute the errors
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./DFT/*.h5",
    mlff_data="./ML-FF/*.h5"
)

where the function explicitly accepts wildcards to find all h5 files in each folder. Note that both folders should contain the same number of h5 files. For each structure you need to run one more DFT and one more MLFF calculation, and those will then be compared. So your initial question is valid and the answer is: yes, you have to run the MLFF again for each structure you want to use for the error analysis. Make sure to set IBRION=-1 or NSW=0 for those calculations.

Unfortunately, we cannot automatically locate the images used in DFT/*.h5 within the full MD run of the MLFF, hence you have to run both manually again. This is maybe not very clear in the tutorial; I will change the text to make it clearer. Additionally, I agree that the error message of py4vasp is misleading. py4vasp should check that the provided DFT and MLFF h5 files match (count and structure) before processing any further, which would have produced a much clearer error message. We will add such a check in the next py4vasp version.
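Until such a check exists, a small pre-flight test along these lines may help. This is only a sketch using the Python standard library. The function name check_h5_counts is hypothetical, and it only verifies that both wildcard patterns match the same non-zero number of files, not that the structures inside agree.

```python
import glob


def check_h5_counts(dft_pattern, mlff_pattern):
    """Return the sorted file lists matched by both patterns.

    Raises FileNotFoundError if a pattern matches nothing and
    ValueError if the two patterns match different numbers of files.
    """
    dft_files = sorted(glob.glob(dft_pattern))
    mlff_files = sorted(glob.glob(mlff_pattern))
    if not dft_files:
        raise FileNotFoundError(f"no files match {dft_pattern!r}")
    if not mlff_files:
        raise FileNotFoundError(f"no files match {mlff_pattern!r}")
    if len(dft_files) != len(mlff_files):
        raise ValueError(
            f"{len(dft_files)} DFT files but {len(mlff_files)} ML-FF files; "
            "one single-point calculation of each kind is needed per structure"
        )
    return dft_files, mlff_files
```

It would be called with the same patterns that are passed to from_files, for example check_h5_counts("./DFT/*.h5", "./ML-FF/*.h5"), before starting the error analysis.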

Best regards,
Alex
