Getting a robust ML_FF with (hollow) perovskites

#1 Post by **max_senno** » Fri Nov 29, 2024 12:56 pm

Dear VASP developers,

I am trying to generate two machine-learned force fields (FFs): one for a "normal" hybrid organic-inorganic perovskite (CH3NH3PbI3 or MAPI) and one for a "hollow" perovskite structure, in which two methylammonium (MA) cations are replaced with one ethylenediammonium (EDA) cation and a PbI2 vacancy is created (that's why they are called "hollow perovskites").

To do that, my starting point was this article by the VASP developers (https://doi.org/10.1103/PhysRevLett.122.225701), so I am working with a 2x2x2 supercell (~100 atoms), aiming to run the simulation in an NVT ensemble. First of all, I tried to reproduce the MAPI calculation reported in that article, and I got a fairly robust FF that I can use. There are still a couple of things that I need to adjust, but the production run goes well.

However, when I tried to make an FF for the "hollow" perovskite, I encountered many issues. Based on trial and error and on the guidelines you wrote, I managed to create an FF that worked well just once (I mean, the structure was still intact after 400k steps of production run) in a 2x2x2 supercell. Every other attempt I made (for example, changing the lattice parameter or running a bigger 4x4x4 supercell) failed. In particular, I have found that during the production runs two methylammonium cations collide, i.e. come too close to each other, and the H atoms fly away.

As I am only interested in the structure at room temperature, I have only trained the system at 300 K. I am currently working with a cubic structure as the literature says that the addition of EDAI forces the perovskite to adopt a cubic phase at room temperature.

This is what I have tried so far, with some improvements in the final production run:
1) I treated the C, N and H atoms of EDA and MA as separate species, since I read in this forum that this helps the algorithm train the FF better (see the POSCAR file).
2) I refitted the ML_FF after the training. This reduced the production-run time (as it activated the fast-prediction mode) and gave me at least one successful production run.
3) I increased the training time to capture more configurations (up to 33k steps, POTIM=3, ~100 ps, as reported for perovskite structures in the Supporting Information of the article I mentioned before).
4) I increased the H mass by up to a factor of 8.
5) I reduced the timestep (POTIM) down to 0.5, because it is suggested (Best practices..) to use a POTIM value no larger than 0.7 when the system has H atoms.
I have my doubts about the last two points, because I think I have to choose between working with the increased H mass and larger POTIM values, or using the normal H mass and shorter timesteps.
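The trade-off between points 4 and 5 can be estimated with a harmonic-oscillator argument: the period of an X-H stretch scales with the square root of its reduced mass, so a heavier H tolerates a proportionally longer timestep. The following is only a rough sketch under that assumption. The function names, the choice of carbon as the bonding partner and the 0.7 reference timestep for normal-mass H are illustrative inputs, not values produced by VASP.

```python
import math

def reduced_mass(m1: float, m2: float) -> float:
    """Reduced mass of a two-body oscillator (same units as the inputs)."""
    return m1 * m2 / (m1 + m2)

def scaled_timestep(base_potim_fs: float, mass_factor: float,
                    m_h: float = 1.008, m_partner: float = 12.011) -> float:
    """Estimate the timestep that keeps the same number of MD steps per
    X-H vibration after multiplying the H mass by mass_factor.

    Assumption: a harmonic X-H stretch whose period scales as
    sqrt(reduced mass). The default partner mass is carbon; this is a
    heuristic, not a VASP rule.
    """
    if base_potim_fs <= 0 or mass_factor <= 0:
        raise ValueError("timestep and mass factor must be positive")
    mu_before = reduced_mass(m_h, m_partner)
    mu_after = reduced_mass(m_h * mass_factor, m_partner)
    return base_potim_fs * math.sqrt(mu_after / mu_before)

# Hypothetical inputs: 0.7 fs for normal-mass H, H mass increased 8 times.
estimate = scaled_timestep(0.7, 8.0)
print(f"Estimated comparable POTIM: {estimate:.2f} fs")
```

Because the reduced mass grows more slowly than the H mass itself, the gain is smaller than the naive factor sqrt(8). Under this estimate, combining an 8 times heavier H with POTIM=0.5 is more conservative than either measure alone would require.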

Now I'm training a new FF with these corrections:
6) Heating the structure from 50 K to 400 K (30% above the desired working temperature), as is suggested in the "Best practices for ML" page.
7) I further increased the NSW parameter (up to 100k steps, POTIM=1) to include more configurations.
8) I started using the Andersen Thermostat (ISIF=2, MDALGO=3), as suggested in the "Best practices for ML".
9) I also changed the lattice parameter of the structure, based on new data found in the literature.
10) I also realized that I had been running the training step and the production run with ISIF=3 while using the Nosé-Hoover thermostat, which is only available in the NVT ensemble. Maybe this is the main problem, so I corrected this in the new training step.
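For reference, the trajectory lengths implied by the step counts and timesteps quoted in points 3 and 7 can be checked with a short calculation. This is a minimal sketch; the helper name is hypothetical, and POTIM is taken to be in femtoseconds.

```python
def trajectory_length_ps(nsw: int, potim_fs: float) -> float:
    """Total simulated time in picoseconds for nsw steps of potim_fs each."""
    return nsw * potim_fs / 1000.0

# (NSW, POTIM) pairs as quoted in points 3 and 7 of this post.
runs = {
    "previous training (point 3)": (33_000, 3.0),
    "new training (point 7)": (100_000, 1.0),
}

for label, (nsw, potim_fs) in runs.items():
    print(f"{label}: {trajectory_length_ps(nsw, potim_fs):.0f} ps")
```

Both settings cover roughly the same simulated time, so the new run samples that window with a finer timestep rather than extending it.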

I checked the errors, but I don't see anything unusual, as they are of the same order of magnitude as those reported in the ML article I mentioned before. But I am no expert in this kind of calculation. I am pretty sure that I am making mistakes, so I would like to kindly ask you for suggestions on how to improve the training procedure. From your experience, what should I check or correct before taking further steps?

In the attached zip file I included the INCAR from the previous training (INCAR_1) and the INCAR file from the current training (INCAR_2). I also included the KPOINTS, POTCAR and POSCAR files, as well as the ML_LOGFILE from the previous run (ML_LOGFILE_1) and the current ML_LOGFILE so far (ML_LOGFILE_2). I also included a png image with the "errors in force" plot from the training. If I need to submit any other file, please let me know.

Thanks in advance,
M.S.

#2 Post by **ferenc_karsai** » Mon Dec 02, 2024 7:46 am

Can you please refit (ML_MODE=refit) with ML_DESC_TYPE=1 (https://www.vasp.at/wiki/index.php/ML_DESC_TYPE)? That is the reduced descriptor; it should be 5-20 percent less accurate and 3-4 times faster for solid MAPbI3. Most importantly, we have sometimes seen that it can be more stable on long trajectories than the "normal" descriptor with default parameters.

So please check the stability after refitting the force field with this descriptor.
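As a minimal sketch, the tags for such a refit could be assembled as follows. The helper functions are hypothetical; only the tag names ML_LMLFF, ML_MODE and ML_DESC_TYPE are VASP tags, and the snippet assumes the training data from the previous run is already available to VASP as ML_AB.

```python
def refit_incar_tags(desc_type: int = 1) -> dict[str, str]:
    """Tags for refitting an existing training set with a given descriptor.

    Assumption: the remaining INCAR settings are unchanged from the
    training run and are not repeated here.
    """
    return {
        "ML_LMLFF": ".TRUE.",            # switch on the machine-learned FF
        "ML_MODE": "refit",              # refit from existing training data
        "ML_DESC_TYPE": str(desc_type),  # 1 = reduced descriptor
    }

def format_incar(tags: dict[str, str]) -> str:
    """Render tags as 'NAME = value' lines, one tag per line."""
    return "\n".join(f"{name} = {value}" for name, value in tags.items())

print(format_incar(refit_incar_tags()))
```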

#3 Post by **ferenc_karsai** » Mon Dec 02, 2024 8:14 am

You got some important points in the list of things that you will try now. Possibly the most important one is increasing the temperature range to train on. If you run at 300 K and only train up to 300 K, that usually does not lead to a stable force field.
You mentioned that you started using the Andersen thermostat (point 8) from our ML best practices page. I think you misread that, because we never suggested using it.
We would suggest training in NpT mode with a Langevin thermostat (ISIF=3, MDALGO=3, LANGEVIN_GAMMA=5 and LANGEVIN_GAMMA_L=5). That would probably solve a couple of issues, as you may also have realized. The volume fluctuations also increase the stability of the force field. If you run NpT simulations with on-the-fly training, always increase ENCUT by at least 30%. This is needed because, if the volume expands during your NpT simulation, the basis set might be too small for the DFT calculations on the enlarged lattice.
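A minimal sketch of how these training settings could be collected is given below. The helper name, the species count and the baseline cutoff are hypothetical placeholders; the 50 K to 400 K ramp is taken from point 6 of the first post. LANGEVIN_GAMMA expects one value per species in the POSCAR, so the friction coefficient is repeated accordingly.

```python
def npt_training_tags(n_species: int, base_encut_ev: float,
                      tebeg_k: float = 50.0, teend_k: float = 400.0,
                      gamma: float = 5.0, gamma_l: float = 5.0,
                      encut_factor: float = 1.3) -> dict[str, str]:
    """Tags for on-the-fly training in NpT with a Langevin thermostat.

    Assumptions: base_encut_ev is the cutoff one would use at fixed
    volume, and encut_factor=1.3 implements 'at least 30% higher'.
    """
    if n_species < 1:
        raise ValueError("need at least one species")
    return {
        "ML_LMLFF": ".TRUE.",
        "ML_MODE": "train",
        "ISIF": "3",      # let the cell volume and shape fluctuate
        "MDALGO": "3",    # Langevin dynamics
        "LANGEVIN_GAMMA": " ".join([f"{gamma:g}"] * n_species),
        "LANGEVIN_GAMMA_L": f"{gamma_l:g}",
        "TEBEG": f"{tebeg_k:g}",
        "TEEND": f"{teend_k:g}",
        "ENCUT": str(round(encut_factor * base_encut_ev)),
    }

# Hypothetical example: 8 species in the POSCAR, 400 eV baseline cutoff.
for name, value in npt_training_tags(n_species=8, base_encut_ev=400.0).items():
    print(f"{name} = {value}")
```

Other tags that an NpT run needs (for example the remaining MD and electronic settings) are deliberately left out, since they are not discussed here.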

Also, very importantly, don't run production calculations at volumes that you have not trained on. A different volume is equivalent to a different pressure, and different pressures can affect your configurations even more strongly than temperature.
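One simple way to guard against this is to compare the production cell volume with the range of volumes visited during training. The sketch below assumes the training volumes have already been collected into a list (for example from the training run output); the function name and the numbers are purely illustrative.

```python
def volume_is_covered(production_volume: float,
                      trained_volumes: list[float]) -> bool:
    """Return True if the production volume lies within the range of
    cell volumes sampled during training (same units for all values)."""
    if not trained_volumes:
        raise ValueError("no training volumes supplied")
    return min(trained_volumes) <= production_volume <= max(trained_volumes)

# Illustrative numbers only, in cubic angstroms.
sampled = [990.0, 1000.0, 1025.0]
print(volume_is_covered(1010.0, sampled))  # inside the sampled range
print(volume_is_covered(1100.0, sampled))  # outside the sampled range
```

For a supercell of a different size, the comparison would have to be made per formula unit rather than on the total cell volume.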

#4 Post by **max_senno** » Mon Dec 02, 2024 9:40 am

Dear Dr. Ferenc Karsai:

Thank you very much for your kind and detailed reply. You're right, I wrote it wrong: it was the Langevin thermostat (at least I wrote MDALGO=3 right). Sorry for that.
I am refitting the FF using the reduced descriptor right now, and after checking the stability I'll get back to you. Then I will launch a MLFF training with the suggestions you have made.
