Hi,
I have trained an ML-FF and want to validate the force field against ab initio data. For that I ran a force-field-only calculation, extracted structures at certain intervals, and ran a single ionic DFT step for each of these structures. For this I set IBRION = 2, so no MD. Is this correct, or does it even matter?
Now I want to compare the energies, forces, and stresses, but I am unsure which output I need to use for that.
For the energies, I look at the difference in E0 between the OSZICAR files of the ML-FF and DFT runs, in eV. Divided by the number of atoms, this error should be below 1 meV/atom.
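To make the energy comparison concrete, here is a minimal sketch of what I compute. The function name and the numbers are made up for illustration; the two E0 values would come from the ML-FF and DFT OSZICAR files for the same structure.

```python
def energy_error_mev_per_atom(e0_mlff_ev, e0_dft_ev, n_atoms):
    """Absolute ML-FF vs. DFT energy difference per atom, in meV."""
    return abs(e0_mlff_ev - e0_dft_ev) / n_atoms * 1000.0


# Hypothetical E0 values (eV) for one frame of a 192-atom cell
err = energy_error_mev_per_atom(-900.123, -900.200, 192)
print(f"{err:.3f} meV/atom")
```

I would then check this value for every extracted frame against the 1 meV/atom target.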
For the forces, I look at the column below
Code: Select all
TOTAL-FORCE (eV/Angst) (ML)
with the force in eV/Angstrom per atom and direction. If I understood correctly, the average of the difference between these force components of the ML-FF and DFT calculation should be below 30 meV/Angstrom. Is this correct?
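In code, the force comparison I have in mind looks roughly like the sketch below. It assumes the force columns of one frame have already been read into nested lists of shape (atoms, 3); the function name and the numbers are invented for illustration. I compute both the mean absolute error and the RMSE, since I am not sure which of the two the 30 meV/Angstrom refers to.

```python
import math


def force_errors_mev(f_mlff, f_dft):
    """Return (MAE, RMSE) over all force components, in meV/Angstrom."""
    # Flatten to one difference per atom and direction
    diffs = [
        a - b
        for row_mlff, row_dft in zip(f_mlff, f_dft)
        for a, b in zip(row_mlff, row_dft)
    ]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return 1000.0 * mae, 1000.0 * rmse


# Hypothetical forces (eV/Angstrom) for a two-atom frame
f_mlff = [[0.10, -0.20, 0.05], [-0.10, 0.20, -0.05]]
f_dft = [[0.12, -0.23, 0.05], [-0.09, 0.18, -0.05]]
mae, rmse = force_errors_mev(f_mlff, f_dft)
print(f"MAE = {mae:.2f} meV/A, RMSE = {rmse:.2f} meV/A")
```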
For the stress, I am rather clueless. There is this line in the OUTCAR:
Code: Select all
ML FORCE on cell =-STRESS in cart. coord. units (eV/cell)
Direction XX YY ZZ XY YZ ZX
below which the stress is given in two units. Is this what I need to compare? And what is a reasonable threshold that I should look out for? I could not find anything in the wiki or the forum.
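My current guess at how to compare the stresses is sketched below. It assumes that the two rows differ only by unit, i.e. that the eV row divided by the cell volume gives the kB row (using 1 eV/Angstrom^3 = 1602.1766208 kbar); please correct me if that assumption is wrong. The function name, volume, and stress values are made up.

```python
# Unit conversion: 1 eV/Angstrom^3 = 160.21766208 GPa = 1602.1766208 kbar
EV_PER_A3_TO_KBAR = 1602.1766208


def stress_ev_to_kbar(stress_ev, volume_a3):
    """Convert stress components from eV per cell to kbar (assumed relation)."""
    return [s / volume_a3 * EV_PER_A3_TO_KBAR for s in stress_ev]


# Hypothetical XX YY ZZ XY YZ ZX components (eV) and cell volume (Angstrom^3)
volume = 1000.0
mlff_kb = stress_ev_to_kbar([1.0, 2.0, 0.5, 0.0, 0.1, -0.1], volume)
dft_kb = stress_ev_to_kbar([1.1, 1.9, 0.5, 0.0, 0.1, -0.1], volume)

# Component-wise ML-FF minus DFT difference in kbar
diff_kb = [m - d for m, d in zip(mlff_kb, dft_kb)]
print([round(d, 3) for d in diff_kb])
```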
Alternatively, there is this script in the tutorial and the wiki to do the heavy lifting, and I would love to use it:
Code: Select all
from py4vasp import MLFFErrorAnalysis
from py4vasp import plot
import numpy as np
# Compute the errors
mlff_error_analysis = MLFFErrorAnalysis.from_files(
    dft_data="./test_set/DFTdata/*.h5",
    mlff_data="./e01_error_analysis/MLFF_data/*.h5"
)
energy_error = mlff_error_analysis.get_energy_error_per_atom()
force_error = mlff_error_analysis.get_force_rmse()
stress_error = mlff_error_analysis.get_stress_rmse()
x = np.arange(len(energy_error))
but the code produces an error:
Code: Select all
Traceback (most recent call last):
  File "/mnt/c/Andreas/DFT/Water/Bulk/Validation/MLFF-validation.py", line 5, in <module>
    mlff_error_analysis = MLFFErrorAnalysis.from_files(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 102, in from_files
    set_appropriate_attrs(mlff_error_analysis)
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 179, in set_appropriate_attrs
    set_energies(cls)
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 277, in set_energies
    cls.mlff.energies = _dict_to_array(energies_data["mlff_data"], tag)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/andreas/miniconda3/lib/python3.12/site-packages/py4vasp/_analysis/mlff.py", line 282, in _dict_to_array
    return np.array([_data[key] for _data in data])
                     ~~~~~^^^^^
KeyError: 'free energy TOTEN'
I have the latest version of py4vasp installed and gave the correct directories for the MLFF and DFT calculations. From the tutorial it seems that the ML-FF data is calculated from previously run DFT simulations, which is the reverse of what is described in the wiki. The tutorial provides a single h5 file for every step, while I have all ML-FF steps in one h5 file (I have a corresponding DFT calculation for every frame that I output). Is this causing the issue? Is there a way to split the h5 file into one file per frame? If py4vasp needs the data separated like this, it would contradict the workflow outlined in the wiki (https://www.vasp.at/wiki/index.php/Best ... est_errors), as one would have to run an ML-FF simulation, extract structures, and then calculate both DFT and ML-FF again for these structures.
There is unfortunately no documentation on these functions, so I don't know what they do. This may not be the right place for this kind of support, but I would be grateful for any pointers on where to get help.

