MLFF volume explosion during production run

#1 Post by **henry_ezeaku** » Mon Jun 29, 2026 3:33 am

Hi VASP team, good day!
I hope you are doing well.

I trained an MLFF for the eutectic NaCl-MgCl2 salt using the NPT approach from 750 K to 1150 K for 100 ps. My production temperature is 900 K. This is the content of my INCAR file:
# ---------- Electronic / accuracy ----------
ALGO = Normal # SCF algorithm (robust choice for MD runs)
PREC = Accurate # Precision; 'Accurate' improves force quality (important for ML training)
NELM = 100 # Max number of electronic SCF steps per ionic move
ISMEAR = 0 # Gaussian smearing (good for insulators / ionic liquids)
SIGMA = 0.1 # Smearing width in eV
ENCUT = 550 # Plane-wave energy cutoff (basis set size)
EDIFF = 1E-05 # Electronic SCF convergence criterion (tighter = more accurate forces)
LASPH = .TRUE. # Include non-spherical corrections inside PAW spheres (improves accuracy)
LREAL = Auto # Projection operators evaluated in real space automatically (faster)
LWAVE = .FALSE. # Don't write WAVECAR (saves disk during MD/ML runs)
LCHARG = .FALSE. # Don't write CHGCAR (saves disk during MD/ML runs)

# ---------- MD (NpT) ----------
IBRION = 0 # Molecular dynamics mode (no ionic relaxation)
ISIF = 3 # Change positions AND cell shape/volume dynamically (True NpT)
ISYM = 0 # Turn symmetry off (important for MD + MLFF)
ISPIN = 1 # Non-spin-polarized run

MDALGO = 3 # Langevin thermostat for NPT sampling
LANGEVIN_GAMMA = 10 10 10 # Thermal friction per species (ps^-1)
LANGEVIN_GAMMA_L = 5 # Lattice friction coefficient (ps^-1) for barostat
PMASS = 1000 # Fictitious lattice mass (amu) - paper used 1000
PSTRESS = 0.001 # Target pressure in kbar (0.001 kbar = 1 bar)

TEBEG = 750 # Starting temperature (K)
TEEND = 1150 # Target temperature at end of run (linear ramp if not equal to TEBEG)
NSW = 100000 # Number of MD steps
POTIM = 1.0 # Time step (in fs)

NCORE = 2 # Parallelization control

# ---------- (optional) Dispersion corrections ----------
GGA = ML # vdW-DF2 functional base (when using ML-type GGA = "ML")
AGGAC = 0.0 # Fraction of GGA gradient correction to correlation (0 for vdW-DF2)
LUSE_VDW = .TRUE. # Enable nonlocal vdW-DF2 correction
ZAB_VDW = -1.8867 # Kernel parameter (specific to vdW functional)

# ---------- MLFF: enable on-the-fly training ----------
ML_LMLFF = .TRUE. # Switch on the MLFF framework
ML_MODE = train # Training mode: ab-initio + ML hybrid MD

After training on a 58-atom simulation cell and refitting, I get good training errors:
# ERR nstep rmse_energy rmse_force rmse_stress
# ERR 2 3 4 5
# ERR ######################################################################
ERR 0 1.09769321E-03 2.83920615E-02 2.34328322E-01

I also get good test-set errors.

Now, when I scale my system from 58 atoms to 1210 atoms and then run a prediction-only NPT simulation at 900 K (for 400 ps) to obtain the system's equilibrium volume, the volume keeps increasing and doesn't seem to converge (goes up to 200000 Å^3). The temperature and energy do converge, and the total pressure of the system fluctuates around zero.

Please, what might be causing this volume behavior? Looking forward to your response. Thanks!
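A quick sanity check for a size-scaling problem like this is to compare the volume per atom between the small and the large cell, since that quantity should not depend on system size. The sketch below is only an illustration: the function names are hypothetical, the 10 Å cube is a placeholder cell and not taken from the thread, and the only numbers from the post are the 1210 atoms and the roughly 200000 Å^3 volume.

```python
def cell_volume(lattice):
    """Cell volume from three lattice vectors (rows), via the scalar triple product."""
    a, b, c = lattice
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2])


def volume_per_atom(volume, n_atoms):
    """Volume per atom, a size-independent quantity for comparing cells."""
    return volume / n_atoms


# Placeholder cubic cell (NOT from the thread), only to exercise cell_volume.
placeholder_lattice = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
print(cell_volume(placeholder_lattice))  # prints 1000.0

# Figures quoted in the post: 1210 atoms, volume reaching about 200000 Å^3.
print(f"{volume_per_atom(200000.0, 1210):.1f} Å^3 per atom")  # prints "165.3 Å^3 per atom"
```

Feeding in the lattice of the 58-atom training cell (from its POSCAR or CONTCAR) and comparing the two per-atom values would show directly how far the large cell has drifted from the density seen during training.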

#2 Post by **michael_wolloch** » Wed Jul 01, 2026 7:59 am

Dear Henry,

I guess your salt system is liquid? NPT is very tricky for liquids, since the cell can deform a lot.

If your system is indeed liquid, I would suggest constraining it. You can do this either by using LATTICE_CONSTRAINTS to freeze two lattice vectors and let the volume change by letting the third one move freely. Or, maybe more elegant, but slightly more work, use an ICONST file and constrain the angles of your cell, so that any volume change is isotropic.

Doing this, the simulations of liquids are usually much better controlled.
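To confirm that an angle constraint is actually in effect, one option is to compute the cell angles from the lattice vectors of two snapshots and check that they have not moved. The following is a minimal sketch under assumptions: the helper names and the 0.01 degree tolerance are hypothetical, and the lattices are placeholder values rather than data from this thread.

```python
import math


def cell_angles(lattice):
    """Return (alpha, beta, gamma) in degrees for lattice vectors a, b, c (rows).

    alpha is the angle between b and c, beta between a and c, gamma between a and b.
    """
    def angle(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
        return math.degrees(math.acos(cos))

    a, b, c = lattice
    return angle(b, c), angle(a, c), angle(a, b)


def angles_drifted(first, later, tol_deg=0.01):
    """True if any cell angle differs by more than tol_deg between two snapshots."""
    return any(
        abs(x - y) > tol_deg for x, y in zip(cell_angles(first), cell_angles(later))
    )


# Placeholder lattices: a cube, the same cube expanded uniformly, and a sheared cell.
cube = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
expanded = [[11.0, 0.0, 0.0], [0.0, 11.0, 0.0], [0.0, 0.0, 11.0]]
sheared = [[10.0, 0.0, 0.0], [1.0, 10.0, 0.0], [0.0, 0.0, 10.0]]

print(angles_drifted(cube, expanded))  # prints False: volume changed, shape did not
print(angles_drifted(cube, sheared))   # prints True: gamma is no longer 90 degrees
```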

If this does not help, or you don't even have a liquid, please post your input and relevant output files, so the forum can help you more efficiently.

Cheers, Michael

#3 Post by **henry_ezeaku** » Thu Jul 02, 2026 8:32 pm

Hi Michael, good day.
Thank you for responding.

Yes, my system is liquid. I also used an ICONST file for both the NPT training run and the production run. The volume is stable during the NPT training. The explosion only happens when I scale up the system (from the 58 atoms used for training to ~1000 atoms) and run NPT in prediction-only mode. Interestingly, if I generate a different 58-atom structure and use the trained MLFF to run the same NPT simulation with it, the volume does converge; when I scale up the system size, however, the volume keeps increasing.

I attached the volume plot - showing the increase in volume during the NPT run at the target temperature (900 K).

Also, I attached the training (Training.tar.gz) and production run (Production_1000_atoms.tar.gz) files, respectively. Thank you so much.

#4 Post by **michael_wolloch** » Fri Jul 03, 2026 9:42 am

Dear Henry,

Thanks for sharing the files. I will look at them more closely today and also speak with our MLFF experts if necessary.
But right from the start, I see a problem with your 1000-atom cell: The distances between atoms seem fine, but the bounding box is wrong, and you don't get the correct periodic boundary conditions. Instead, there is vacuum between your replicas:

[Attached image: POSCAR_1000_2x2x1.png]

For the training system, the periodic boundary conditions are correct:

[Attached image: POSCAR_57_2x2x1.png]

Please correct the larger system and make sure the periodic boundary conditions are correct, and report back if that fixed your issue.
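One way to avoid this kind of box mismatch is to build the larger cell by tiling the small periodic cell, scaling the lattice vectors together with the atom copies, and then to look for unusually wide empty stretches along each lattice vector. The sketch below illustrates both steps. It is not a tool from the thread: the function names and the two-atom cubic cell are hypothetical, and the gap measure is a rough heuristic that is only an exact slab thickness for orthogonal cells.

```python
import math


def make_supercell(lattice, frac_coords, reps):
    """Tile a periodic cell reps = (na, nb, nc) times.

    Each lattice vector is multiplied by its repetition count and every atom is
    copied into each image, so the density is preserved and no vacuum appears.
    """
    na, nb, nc = reps
    new_lattice = [[x * n for x in vec] for vec, n in zip(lattice, reps)]
    new_frac = []
    for i in range(na):
        for j in range(nb):
            for k in range(nc):
                for fa, fb, fc in frac_coords:
                    new_frac.append(((fa + i) / na, (fb + j) / nb, (fc + k) / nc))
    return new_lattice, new_frac


def largest_gaps(lattice, frac_coords):
    """Widest empty stretch along each lattice vector, including the periodic wrap.

    Returned in the length units of the lattice (e.g. Å).
    """
    gaps = []
    for axis in range(3):
        length = math.sqrt(sum(x * x for x in lattice[axis]))
        vals = sorted(f[axis] % 1.0 for f in frac_coords)
        inner = [b - a for a, b in zip(vals, vals[1:])]
        wrap = vals[0] + 1.0 - vals[-1]  # gap across the periodic boundary
        gaps.append(max(inner + [wrap]) * length)
    return gaps


# Hypothetical two-atom cubic cell with a 4 Å edge (NOT the salt from the thread).
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
frac = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]

good_lattice, good_frac = make_supercell(lattice, frac, (2, 2, 2))
print(largest_gaps(good_lattice, good_frac))  # about 2 Å along every axis

# Same Cartesian positions placed in an oversized 12 Å box: vacuum shows up.
bad_lattice = [[12.0, 0.0, 0.0], [0.0, 12.0, 0.0], [0.0, 0.0, 12.0]]
bad_frac = [tuple(f * 8.0 / 12.0 for f in atom) for atom in good_frac]
print(largest_gaps(bad_lattice, bad_frac))  # about 6 Å along every axis
```

A largest gap several times the typical interatomic spacing is a hint that the box is larger than the atoms it holds, which is what vacuum between replicas looks like in a viewer.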

Cheers, Michael

#5 Post by **michael_wolloch** » Fri Jul 03, 2026 1:56 pm

Dear Henry,

I talked to one of our ML experts, and he has additional suggestions for checks once you have fixed your production structure.

Run a short MD with the MLFF and set ML_IERR=1 (or, if you change ML_OUTBLOCK, set ML_IERR=ML_OUTBLOCK).
This will print the spilling factor. Ideally, it will be around 10E-3 to 10E-2, but it can be fine if it spikes from time to time. If it is larger than around 0.3 (other than a quick spike) in your simulation, the force field you trained is not good. It might be that it is not transferable to the larger system. Since your compounds are quite polar, it is possible that the small cell you are training on is not able to capture longer-range dipole-dipole interactions that might be important in the larger system. Or something else goes wrong.
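The distinction between a quick spike and a sustained excursion above 0.3 can be made concrete with a small helper. This is a hedged sketch only: it assumes the spilling-factor values have already been extracted into a Python list (no log format is assumed here), the function name is hypothetical, and the choice of how many consecutive samples still count as a "quick spike" (`max_spike_len`) is an assumption rather than a VASP recommendation. The series below are made-up illustrative numbers.

```python
def sustained_exceedance(values, threshold=0.3, max_spike_len=2):
    """True if values stay above threshold for more than max_spike_len samples in a row.

    threshold=0.3 follows the rule of thumb in the post; max_spike_len is an
    assumed cutoff for what still counts as a short spike.
    """
    run = 0
    for v in values:
        if v > threshold:
            run += 1
            if run > max_spike_len:
                return True
        else:
            run = 0  # the excursion ended, start counting again
    return False


# Illustrative series (made up): one isolated spike versus a persistent excursion.
isolated_spike = [0.01, 0.02, 0.45, 0.02, 0.01, 0.03]
persistent = [0.01, 0.35, 0.40, 0.42, 0.50, 0.48]

print(sustained_exceedance(isolated_spike))  # prints False
print(sustained_exceedance(persistent))      # prints True
```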

So, to make it short:

1. Fix your production simulation cell and ensure periodic boundary conditions.
2. Set ML_IERR and monitor the spilling factor in a short MD.
3. Run your production MD if the spilling factor is OK.
4. Report back here with your findings.

Cheers, Michael

#6 Post by **henry_ezeaku** » Fri Jul 03, 2026 2:47 pm

Hi Michael,
Thanks for the insightful suggestions.

I will try them out and report back to you. Thanks.

- Henry
