
ML-FF refit with all basis functions

Posted: Mon Apr 15, 2024 10:38 am
by julien_steffen
Dear VASP community,

We are working on heuristic methods to select basis functions (local reference configurations) from a given ML_AB file. The motivation is that we are training liquid metal alloy interface systems that require a very large number of configurations (ML_MCONF, typically more than 6000) to yield stable ML-FFs. Because of memory limits during learning, we restrict the number of basis functions (ML_MB) to rather small values such as 3000 or 4000. After learning, we want to increase this number to values such as 8000 or 10000. For this, we tried ML_MODE = select calculations, but they took far too long to finish within our cluster's walltime limit of one day (and they cannot be restarted). Furthermore, we want to combine training sets from different ML_AB files (e.g., from different phases of the system) into one large ML_AB file.
We therefore wrote a program that selects a desired number of basis functions from the ML_AB file and writes a new ML_AB file containing the longer list of basis functions, with the objective of including as diverse a set of local reference environments as possible.
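One common way to pursue an "as diverse as possible" objective is greedy farthest-point sampling over per-environment descriptor vectors. The Python sketch below is not the program described above; it assumes you have already extracted a descriptor array (one row per local environment) from the ML_AB file, and `farthest_point_selection` and its arguments are hypothetical names. At each step it picks the environment farthest from everything selected so far.

```python
import numpy as np


def farthest_point_selection(descriptors: np.ndarray, n_select: int,
                             seed_index: int = 0) -> list[int]:
    """Greedy farthest-point sampling (hypothetical helper, Euclidean distance).

    descriptors: (n_envs, n_features) array, one row per local environment.
    Returns the indices of the selected environments in selection order.
    """
    n_envs = descriptors.shape[0]
    if not 0 < n_select <= n_envs:
        raise ValueError("n_select must be between 1 and the number of environments")
    selected = [seed_index]
    # Distance from every environment to its nearest already-selected one.
    min_dist = np.linalg.norm(descriptors - descriptors[seed_index], axis=1)
    min_dist[seed_index] = -np.inf  # never pick the same environment twice
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))  # the least-covered environment so far
        selected.append(nxt)
        dist_to_new = np.linalg.norm(descriptors - descriptors[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[nxt] = -np.inf
    return selected


# Four 1-D "descriptors" at 0, 1, 2 and 10, starting from index 0.
print(farthest_point_selection(np.array([[0.0], [1.0], [2.0], [10.0]]), 3))
```

On the toy input the selection order is [0, 3, 2]: the outlier at 10 is taken first because it is least covered. Euclidean distance is an assumption here; a distance derived from the force field's own similarity kernel may match the ML-FF setting more closely.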
To obtain an ML_FF file (preferably for the fast force-field mode), we then still need to run an ML_MODE = refit calculation, which works.
However, we noticed that this calculation not only generates an ML_FF file but also significantly shortens our list of basis functions, as seen in the ML_ABN file (often by 20-30%).
Is there a way to prevent the ML_MODE = refit calculation from removing basis functions from the given ML_AB file? Since many of our sampling runs are still unstable, we want to benchmark several selection methods, in the hope that some of them increase stability at the cost of slightly more expensive calculations due to the larger number of basis functions (for example, by including all atoms from the ML_AB file with large gradient norms in the basis). This is of course impossible if our list of basis functions is significantly shortened before the actual ML_FF is generated.
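The "large gradient norm" criterion can be sketched as a simple threshold on per-atom force norms (the force is the negative energy gradient), merged with a diversity-based selection. This is a minimal illustration under assumptions: the forces are taken to be an (n_atoms, 3) array already gathered from the training structures, the threshold is a user-chosen value, and `high_force_atoms` and `merge_selections` are hypothetical helpers, not part of VASP.

```python
import numpy as np


def high_force_atoms(forces: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of atoms whose force norm exceeds `threshold`.

    forces: (n_atoms, 3) Cartesian forces, e.g. in eV/Angstrom (assumed input).
    """
    norms = np.linalg.norm(forces, axis=1)
    return np.flatnonzero(norms > threshold)


def merge_selections(forced: np.ndarray, diverse: list[int]) -> list[int]:
    """Union of forced and diversity-based picks, forced ones first, no duplicates."""
    seen: set[int] = set()
    merged: list[int] = []
    for i in list(forced) + list(diverse):
        i = int(i)
        if i not in seen:
            seen.add(i)
            merged.append(i)
    return merged


forces = np.array([[0.0, 0.0, 0.1], [3.0, 4.0, 0.0], [0.0, 0.0, 2.0]])
forced = high_force_atoms(forces, threshold=1.0)  # norms 0.1, 5.0, 2.0
print(merge_selections(forced, [0, 2, 5]))
```

Putting the forced atoms first means that, if the merged list is later truncated to a fixed size, the high-force environments are the last to be dropped.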

Best wishes,
Julien

Re: ML-FF refit with all basis functions

Posted: Tue Apr 16, 2024 8:02 am
by michael_wolloch
Dear Julien,

We suspect that the reference configurations selected by your script are not as diverse as you hope, which is why the refit removes 20-30% of your basis functions. You can try lowering ML_EPS_LOW to keep more reference configurations during refitting, but it is somewhat doubtful whether the resulting force field will be better. Don't go lower than 1E-14, however, since you will otherwise run into numerical problems.
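To illustrate why lowering a sparsification threshold keeps more reference configurations, here is a simplified Python sketch of threshold-based sparsification. It assumes a precomputed kernel (similarity) matrix between local reference configurations and uses a greedy pivoted-Cholesky criterion; this is not VASP's actual sparsification algorithm, and `sparsify` and `eps` are hypothetical stand-ins for the role ML_EPS_LOW plays. A candidate is kept only if the part of it not already described by the kept configurations is larger than `eps`.

```python
import numpy as np


def sparsify(kernel: np.ndarray, eps: float) -> list[int]:
    """Greedy linear-independence filter on a kernel (Gram) matrix.

    Keeps candidate i only if the squared norm of its feature-space component
    orthogonal to the already-kept candidates exceeds eps. Simplified stand-in,
    not VASP's implementation.
    """
    n = kernel.shape[0]
    kept: list[int] = []
    # Row j holds the coordinates of candidate j in an orthonormal basis
    # spanned by the kept candidates (incremental Cholesky factor).
    L = np.zeros((n, 0))
    for i in range(n):
        residual = kernel[i, i] - L[i] @ L[i]
        if residual > eps:
            col = (kernel[:, i] - L @ L[i]) / np.sqrt(residual)
            L = np.column_stack([L, col])
            kept.append(i)
    return kept


# Two nearly parallel descriptors with a linear kernel K = X X^T.
X = np.array([[1.0, 0.0], [1.0, 1e-3]])
K = X @ X.T
print(sparsify(K, 1e-4), sparsify(K, 1e-10))
```

The second configuration has a residual of about 1e-6, so it is dropped at `eps = 1e-4` but kept at `eps = 1e-10`. When the threshold approaches double-precision round-off, residuals of near-duplicates are dominated by rounding error, which is one plausible reason for a practical lower bound like the 1E-14 mentioned above.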

In principle, ML_MODE = select, as you tried, is the right tool for this problem. Set ML_CDOUB to 100 or even 1000 to avoid frequent refitting; this should speed up your calculation considerably. For further speedup, you can also try slowly increasing ML_MCONF_NEW from its default of 5, but if your configurations contain many atoms of a specific species, this quickly becomes very memory-intensive. You could also ask your cluster administrators whether longer jobs are possible.
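For reference, the settings suggested above can be collected in an INCAR for the select run. The sketch renders the INCAR lines from a Python dict so they can be scripted across several runs; `build_incar` is a hypothetical helper, not part of VASP or any VASP tooling. ML_LMLFF, ML_MODE, ML_MB, ML_CDOUB and ML_MCONF_NEW are VASP tags, but the numeric values are examples only and must be adapted to your system.

```python
def build_incar(tags: dict[str, object]) -> str:
    """Render INCAR-style 'TAG = value' lines from a dict (hypothetical helper)."""
    def fmt(value: object) -> str:
        # INCAR uses Fortran-style logicals.
        if isinstance(value, bool):
            return ".TRUE." if value else ".FALSE."
        return str(value)

    return "\n".join(f"{key} = {fmt(val)}" for key, val in tags.items()) + "\n"


# Example tags for a select run on an existing ML_AB file (values are examples).
select_tags = {
    "ML_LMLFF": True,
    "ML_MODE": "select",
    "ML_MB": 10000,     # example target size of the enlarged basis set
    "ML_CDOUB": 100,    # 100 or even 1000 to refit less often
    "ML_MCONF_NEW": 5,  # default; raise slowly and watch memory
}

print(build_incar(select_tags), end="")
```

Running it prints one `TAG = value` line per entry, for example `ML_CDOUB = 100`, which can be pasted into the INCAR of the select run.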

Please try this approach, and if you are still unsuccessful, provide a complete set of inputs and outputs so we can try to figure out what is going on in more detail.

Cheers, Michael

(Note: This answer was updated after a discussion with @ferenc.karsai)

Re: ML-FF refit with all basis functions

Posted: Tue May 07, 2024 8:37 am
by julien_steffen
Dear Michael,

Thank you very much for the answer; that helped a lot! The refit calculations are now indeed much faster, and almost all basis functions are kept if desired.

Best wishes,
Julien

Re: ML-FF refit with all basis functions

Posted: Tue May 07, 2024 9:49 am
by michael_wolloch
Dear Julien,

I am pleased to hear this, but all praise belongs to Ferenc Karsai, one of our ML experts.

If you have no further questions on this topic, please let me know so I can lock this thread.
Cheers, Michael