ML on system with H2O
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: ML on system with H2O
OK, I have now tested this, and several things, mainly in the DFT/MD settings, were not OK.
There are some input parameters that need to be improved:
1) Do not run machine learning with ISIF=0. We do a multivariate fit of energies, forces and stress. With ISIF=0 the stress is not calculated, so the fitting won't work. This prevented the MLFF calculation from learning properly. I will implement an automatic stop in the next release that prevents people from using machine learning with ISIF=0.
2) Also consider running something like the D3 method for van der Waals (IVDW=11) together with GGA=RP. The dDsC method (IVDW=4) that you set is not guaranteed to give proper forces for machine learning. Also, the D3 method works most reliably with GGA=RP.
3) Increase the mass of hydrogen in the POTCAR file to 8 instead of 1. Then you can set POTIM=1.5. Otherwise the hydrogen atoms will move too fast and become uncontrolled.
4) Use ENCUT=700 to minimize the error in the stress tensor. This matters most for single molecules, but since we have very few molecules here, it could still matter.
5) Use LASPH = True.
I hope everything will run fine for you now. After this you can also try the graphene with H2O.
We are also working on a how-to guide for proper training in machine learning.
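As a rough summary of the five points above, the tag values can be collected and rendered as INCAR-style lines. This is only a sketch: the helper `render_incar` is a made-up name, and ISIF = 3 is an assumption (the post only rules out ISIF = 0; any setting that computes the stress tensor should serve the same purpose). The hydrogen mass change in point 3 is made in the POTCAR file and is not covered here.

```python
# Sketch only: collects the settings recommended above into INCAR-style lines.
# ISIF = 3 is an assumption (the post only rules out ISIF = 0).

recommended = {
    "ISIF": 3,         # assumption: a setting that computes the stress tensor
    "IVDW": 11,        # D3 van der Waals correction
    "GGA": "RP",       # functional recommended together with D3
    "POTIM": 1.5,      # time step in fs, assuming the hydrogen mass was raised to 8
    "ENCUT": 700,      # plane-wave cutoff in eV
    "LASPH": ".TRUE.",
}


def render_incar(tags):
    """Return INCAR-style 'TAG = value' lines, one per tag, with aligned '='."""
    width = max(len(name) for name in tags)
    return "\n".join(f"{name:<{width}} = {value}" for name, value in tags.items())


incar_text = render_incar(recommended)
print(incar_text)
```

The output is meant to be pasted into an existing INCAR next to the machine-learning and MD tags already in use, not to replace the file.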
- paulfons
- Jr. Member
- Posts: 85
- Joined: Sun Nov 04, 2012 2:40 am
- License Nr.: 5-1405
- Location: Yokohama, Japan
- Contact:
Re: ML on system with H2O
Dear Ferenc,
Thank you for your advice. I have already started a new training run. I am curious about the logic behind some of your recommendations. Most of the ideas, such as ENCUT=700 and ISIF=3, are more or less obvious in retrospect, but I am curious about the others. For instance, changing the mass of H in the POTCAR file to 8: does this affect the accuracy of the calculated trajectories significantly? Also, why is GGA=RP a better choice than PE (or some other functional), and in what way are the forces better for IVDW=11? I don't doubt you are correct; I just wish to have some insight into the choices so I can be proactive in choosing the right settings for future simulations. For instance, in the past, for relaxation calculations on a 2D chalcogenide system, I did a systematic comparison of lattice constants after relaxation with all of the IVDW options (in VASP 5.4.3), and IVDW=4 gave better results. I suspect that the details of the vdW force algorithm are relevant here, so a few words on the reasoning would be helpful for doing the right thing in the future.
It might also be a good idea to add the ISIF=3 advice to the VASP wiki on machine learning (for instance, why MAXMIX should not be set is explained very well there).
Thanks a lot for your help!
-
- Global Moderator
Re: ML on system with H2O
OK, the next thing you have to be very careful about:
If you run with ISIF=3, then you should constrain your lattice. In liquids I have almost always observed that the cell deforms monoclinically until it becomes like a thin rod. At that point the cell is irreversibly destroyed. In our case we have in principle a gas, but I fear the same would happen.
You can constrain the angles and lattice constant ratios by using the 3rd ICONST file from here:
https://www.vasp.at/wiki/index.php/ICONST
Alternatively, since you are not using a cubic cell, you could use ISIF=2 in your case. Then no volume changes are allowed.
We would also strongly recommend using a Langevin thermostat, MDALGO=3. For the NpT ensemble (ISIF=3) that is the only available thermostat anyway.
Due to the small mass of hydrogen, very small time steps (<1 fs) need to be used, otherwise the simulations become unstable. An alternative is to use a larger mass for hydrogen. This way a larger time step can still be used. This is useful for on-the-fly learning, where we want to collect snapshots along a longer trajectory as fast as possible. If you need strict trajectories (depending on the observables you need), switch back and use small time steps.
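To get a feeling for how much the heavier hydrogen helps, here is a back-of-the-envelope estimate. It rests on assumptions that are not from this thread: the O-H stretch is treated as an isolated harmonic diatomic oscillator, and a stretch wavenumber of about 3700 cm^-1 is taken for ordinary hydrogen. For a harmonic oscillator the period scales with the square root of the reduced mass.

```python
import math

# Assumptions for this estimate (not from the thread): the O-H stretch is
# treated as an isolated harmonic diatomic oscillator, with a stretch
# wavenumber of about 3700 cm^-1 for ordinary hydrogen.
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s
M_O = 15.999                 # oxygen mass in amu
OH_STRETCH_CM1 = 3700.0      # approximate O-H stretch wavenumber


def reduced_mass(m1, m2):
    """Reduced mass of a two-body oscillator."""
    return m1 * m2 / (m1 + m2)


def period_fs(wavenumber_cm1):
    """Vibrational period in fs for a given wavenumber in cm^-1."""
    return 1.0e15 / (C_CM_PER_S * wavenumber_cm1)


def slowdown(m_h_light=1.008, m_h_heavy=8.0):
    """Factor by which the stretch period grows when hydrogen is made heavier."""
    return math.sqrt(reduced_mass(m_h_heavy, M_O) / reduced_mass(m_h_light, M_O))


light_period = period_fs(OH_STRETCH_CM1)
heavy_period = light_period * slowdown()
print(f"period with m_H = 1.008: {light_period:.1f} fs")
print(f"period with m_H = 8:     {heavy_period:.1f} fs")
print(f"steps per period at POTIM = 1.5 fs: {heavy_period / 1.5:.1f}")
```

Under these assumptions the fastest vibration slows from a period of about 9 fs to about 21 fs, which is why a 1.5 fs step becomes plausible with the heavier mass. The estimate says nothing about whether the dynamics themselves remain physically meaningful.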
We advise you to use RP+D3 for water. It is known from the literature that RP is good for water, and we also used this combination in our paper on thermodynamic integration with machine learning for water, where it worked fine.
- paulfons
Problems with machine learning and H2O
I tried training an ML force field for water using VASP 6.3.1. As suggested, I changed the mass of the hydrogen atom to 8 amu and set ISIF=3, with cell constraints in ICONST. Since the O-H bonds in the H2O molecule are rather short, I used a value of 0.30 for ML_SION1 and ML_SION2. The temperature is set to ramp from 200 to 500 K over 5000 steps with an integration step of 1.5 fs. I have two questions. The ML run terminated due to insufficient storage space because ML_MB = 5000, so only 300 steps or so were executed. The temperature of the run exceeded 600 K at the final step (308). Continuing the run using the ML_ABN and ML_FFN files with a larger ML_MB setting of 10,000 resulted in the run hanging, with the temperature shooting up and eventually overflowing the temperature field. I have seen the same behavior in various similar scenarios: for H2O, the temperature becomes unstable and shoots up until it overflows the field in OSZICAR, and the run subsequently hangs. I am not sure what to try next.
I am also curious to understand what a reasonable goal for the size of the training set is, namely the number of configurations and training structures. I don't completely understand the details of ML_LOGFILE, but I can grep LCONF in it. The last few lines of the grep output for the 308-step run are below. I interpret this as there being slightly fewer than 5000 reference configurations and a total of 65 training structures (grep -c LCONF ML_LOGFILE gives the number 65). What is a desirable number of configurations and training structures? I have already started a second run with 300 K as the temperature and a fixed (ICONST) cell to avoid the problem with the diverging temperature. My plan is that, if this run completes with a couple of hundred training structures and 5000-8000 reference configurations, I will then introduce a cell with a graphene layer in addition to the water molecules (again with a fixed cell) and add the additional interactions to the force field. Does this sound reasonable? Are there any other issues or diagnostics I should be concerned with in monitoring the progress of the ML FF training?
The input files for the above results are attached below.
Code: Select all
LCONF 275 H 4700 4739 O 2382 2403
LCONF 285 H 4739 4784 O 2402 2426
LCONF 292 H 4783 4832 O 2425 2451
LCONF 300 H 4832 4881 O 2450 2476
LCONF 307 H 4880 4925 O 2476 2500
LCONF 314 H 4924 4971 O 2498 2522
LCONF 319 H 4971 5012 O 2520 2544
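The LCONF lines above can be read programmatically. The sketch below assumes the column layout given in the ML_LOGFILE header quoted later in this thread (nstep, then el, nlrc_old, nlrc_new for each element); the function names are made up for illustration. It checks the last line against the ML_MB limit of 5000 mentioned in the post.

```python
def parse_lconf(line):
    """Parse one 'LCONF nstep el nlrc_old nlrc_new ...' data line.

    Returns the step counter and the current (nlrc_new) count per element.
    """
    fields = line.split()
    if not fields or fields[0] != "LCONF":
        raise ValueError("not an LCONF data line")
    nstep = int(fields[1])
    counts = {}
    # After the step counter, each element contributes three fields.
    for i in range(2, len(fields), 3):
        element = fields[i]
        counts[element] = int(fields[i + 2])  # nlrc_new
    return nstep, counts


def elements_over_limit(line, ml_mb):
    """Elements whose local reference configuration count exceeds ml_mb."""
    _, counts = parse_lconf(line)
    return [el for el, n in counts.items() if n > ml_mb]


last_line = "LCONF 319 H 4971 5012 O 2520 2544"
print(parse_lconf(last_line))
print(elements_over_limit(last_line, ml_mb=5000))
```

For the last line shown, only hydrogen has passed 5000, which fits the report that the run stopped for lack of storage with ML_MB = 5000.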
- paulfons
Error: RATTLE_vel algorithm did not converge! err= NaN
I tried a second (new) run with both TEBEG and TEEND set to 300K, ISIF=3 to compute the stress tensor while using a Langevin thermostat with MDALGO=3. I have used ICONST to fix the cell dimensions and a time step of 1.5 fs after changing the mass of H to 8 amu in the POTCAR file. I have attached the INCAR file and other associated files for reference. During the run the temperature more or less monotonically increased from 300K to about 1600K upon which the run terminated due to the RATTLE_vel algorithm not converging. The settings for this run were from what I gather exactly what you (Ferenc) suggested. Can you offer some insight as to what should be changed?
I fixed the volume and shape of the simulation cell using the ICONST file below. On a careful rereading of your earlier message, I noticed that you suggested using the third example from the ICONST entry in the VASP wiki. Unless I am mistaken, this allows for a variable volume but fixes the cell shape. I don't understand why this is a better option than fixing the cell volume (but I am doing another run using this option to check). Can you elaborate on the logic behind the variable cell volume / fixed shape option?
I know I am getting ahead of myself, but I am still optimistic that the training errors will be resolved and would like to set up a plan for carrying out training on the H2O/graphene system.
The initial suggestion was to train with H2O and then, after an initial training session, introduce a graphene sheet with the water molecules and train the system for interactions with C. If the problems can be solved, I am curious about how to decide when training is sufficient. You stated earlier "Please expect around 1000-2000 training structures (ML_MCONF) and several thousand local reference configurations (ML_MB)."
From the (very helpful) comments in ML_LOGFILE, it would seem that grepping the tag SPRSC, which reports on sparsification, offers these values in the form of "nstr_spar ... Number of reference structures after sparsification" and "nlrc_spar ... Number of local reference configurations after sparsification for this element". Is this correct? If so, I assume I should be looking for nstr_spar values of 1000-2000 and several thousand local reference configurations via the quantity nlrc_spar. Is this correct? If all goes well and I proceed to training graphene and H2O together, what should the values of these quantities be for a reasonable training set?
# SPRSC nstr_prev ... Number of reference structures before sparsification
# SPRSC nstr_spar ... Number of reference structures after sparsification
# SPRSC el .......... Element symbol
# SPRSC nlrc_prev ... Number of local reference configurations before sparsification for this element
# SPRSC nlrc_spar ... Number of local reference configurations after sparsification for this element
Code: Select all
LR 1 0
LR 2 0
LR 3 0
LA 1 2 0
LA 1 3 0
LA 2 3 0
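To make explicit what the six lines above fix, the following sketch lists the constrained lattice terms. It assumes my reading of the ICONST format: LR refers to the length of a lattice vector, LA to the angle between two lattice vectors, and a trailing status of 0 marks the coordinate as held fixed. The function name is made up; please check the ICONST wiki page before relying on this interpretation.

```python
def constrained_lattice_terms(iconst_text):
    """List lattice lengths (LR) and angles (LA) whose status flag is 0.

    Assumes the last field on each line is an integer status flag.
    """
    lengths, angles = [], []
    for raw in iconst_text.splitlines():
        fields = raw.split()
        if not fields:
            continue
        kind, status = fields[0], int(fields[-1])
        if status != 0:
            continue  # only status 0 is treated as "held fixed" in this sketch
        if kind == "LR":
            lengths.append(int(fields[1]))
        elif kind == "LA":
            angles.append((int(fields[1]), int(fields[2])))
    return lengths, angles


iconst = """LR 1 0
LR 2 0
LR 3 0
LA 1 2 0
LA 1 3 0
LA 2 3 0"""

lengths, angles = constrained_lattice_terms(iconst)
print("fixed lengths:", lengths)
print("fixed angles: ", angles)
```

With all three lengths and all three angles fixed, neither the volume nor the shape of the cell can change, which matches the description in the post.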
Code: Select all
240 T= 1581. E= -.43275244E+03 F= -.45359345E+03 E0= -.45357421E+03 EK= 0.20841E+02 SP= 0.00E+00 SK= 0.00E+00
bond charge predicted
N E dE d eps ncg rms rms(c)
DAV: 1 -0.451863271227E+03 0.55226E+00 -0.29361E+02 376 0.394E+01 0.164E+01
RMM: 2 -0.459586864411E+03 -0.77236E+01 -0.36956E+01 371 0.697E+00 0.180E+01
RMM: 3 -0.458468649332E+03 0.11182E+01 -0.58793E+00 371 0.274E+00 0.213E+01
RMM: 4 -0.453537754950E+03 0.49309E+01 -0.36499E+00 389 0.163E+00 0.104E+01
RMM: 5 -0.453696974782E+03 -0.15922E+00 -0.63704E-01 313 0.714E-01 0.955E+00
RMM: 6 -0.452580704164E+03 0.11163E+01 -0.60897E-01 282 0.580E-01 0.138E+01
RMM: 7 -0.452022759660E+03 0.55794E+00 -0.18440E+00 337 0.971E-01 0.758E+00
RMM: 8 -0.451830711252E+03 0.19205E+00 -0.18537E-01 284 0.426E-01 0.870E+00
RMM: 9 -0.451763943923E+03 0.66767E-01 -0.21265E-02 247 0.218E-01 0.712E+00
RMM: 10 -0.452053172556E+03 -0.28923E+00 -0.72738E-02 289 0.334E-01 0.726E+00
RMM: 11 -0.452092287215E+03 -0.39115E-01 -0.19354E-01 316 0.464E-01 0.138E+01
RMM: 12 -0.451989254863E+03 0.10303E+00 -0.73097E-03 249 0.220E-01 0.136E+01
RMM: 13 -0.451720915878E+03 0.26834E+00 -0.78246E-03 225 0.103E-01 0.136E+01
RMM: 14 -0.451773841071E+03 -0.52925E-01 -0.30987E-04 147 0.408E-02 0.137E+01
RMM: 15 -0.451434606854E+03 0.33923E+00 -0.12759E-02 189 0.895E-02 0.473E+00
RMM: 16 -0.451576414548E+03 -0.14181E+00 -0.15747E-02 259 0.225E-01 0.684E+00
RMM: 17 -0.451634495264E+03 -0.58081E-01 -0.19473E-02 243 0.183E-01 0.680E+00
RMM: 18 -0.451533125181E+03 0.10137E+00 -0.16496E-02 217 0.130E-01 0.636E+00
RMM: 19 -0.451487383314E+03 0.45742E-01 -0.52534E-03 202 0.952E-02 0.522E+00
RMM: 20 -0.451445379399E+03 0.42004E-01 -0.35656E-03 205 0.941E-02 0.147E+00
RMM: 21 -0.451479791045E+03 -0.34412E-01 -0.28801E-03 177 0.605E-02 0.534E+00
RMM: 22 -0.451476075435E+03 0.37156E-02 -0.10349E-03 158 0.580E-02
241 T= 1556. E= -.43213539E+03 F= -.45265027E+03 E0= -.45262914E+03 EK= 0.20515E+02 SP= 0.00E+00 SK= 0.00E+00
242 T= 1590. E= -.44109121E+03 F= -.46205846E+03 E0= -.46205846E+03 EK= 0.20967E+02 SP= 0.00E+00 SK= 0.00E+00
243 T= 1559. E= -.44075037E+03 F= -.46130621E+03 E0= -.46130621E+03 EK= 0.20556E+02 SP= 0.00E+00 SK= 0.00E+00
244 T= 1546. E= -.44065648E+03 F= -.46103677E+03 E0= -.46103677E+03 EK= 0.20380E+02 SP= 0.00E+00 SK= 0.00E+00
245 T= 1625. E= -.44100014E+03 F= -.46242998E+03 E0= -.46242998E+03 EK= 0.21430E+02 SP= 0.00E+00 SK= 0.00E+00
246 T= 1594. E= -.44072307E+03 F= -.46173465E+03 E0= -.46173465E+03 EK= 0.21012E+02 SP= 0.00E+00 SK= 0.00E+00
Error: RATTLE_vel algorithm did not converge! err= NaN
-----------------------------------------------------------------------------
| |
| EEEEEEE RRRRRR RRRRRR OOOOOOO RRRRRR ### ### ### |
| E R R R R O O R R ### ### ### |
| E R R R R O O R R ### ### ### |
| EEEEE RRRRRR RRRRRR O O RRRRRR # # # |
| E R R R R O O R R |
| E R R R R O O R R ### ### ### |
| EEEEEEE R R R R OOOOOOO R R ### ### ### |
| |
| Error too large, I have to terminate this calculation! |
| |
| ----> I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <---- |
| |
-----------------------------------------------------------------------------
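A temperature drift like the one in this output is easy to miss until the run dies, so it may help to monitor it while the job is running. The sketch below pulls the step number and temperature from the MD summary lines of an OSZICAR-style text, assuming they look like the "240 T= 1581. E= ..." lines above. The function names and the factor of 2 used as an alarm threshold are arbitrary choices for illustration.

```python
import re

# Matches MD summary lines such as "240 T= 1581. E= ..."; electronic-step
# lines (DAV:, RMM:) do not start with a bare integer followed by "T=".
MD_LINE = re.compile(r"^\s*(\d+)\s+T=\s*([0-9.]+)")


def temperatures(oszicar_text):
    """Return (step, temperature in K) pairs from OSZICAR-style text."""
    pairs = []
    for line in oszicar_text.splitlines():
        match = MD_LINE.match(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


def first_step_above(temps, target_k, factor=2.0):
    """First step whose temperature exceeds factor * target_k, or None."""
    for step, temp in temps:
        if temp > factor * target_k:
            return step
    return None


sample = """240 T= 1581. E= -.43275244E+03 F= -.45359345E+03 E0= -.45357421E+03
DAV: 1 -0.451863271227E+03 0.55226E+00 -0.29361E+02 376 0.394E+01 0.164E+01
241 T= 1556. E= -.43213539E+03 F= -.45265027E+03 E0= -.45262914E+03
242 T= 1590. E= -.44109121E+03 F= -.46205846E+03 E0= -.46205846E+03"""

temps = temperatures(sample)
print(temps)
print(first_step_above(temps, target_k=300.0))
```

Lines in which the temperature has overflowed its field will not match the pattern and are silently skipped, so a sudden end of the returned list is itself a warning sign.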
Code: Select all
# LCONF ###############################################################
# LCONF This line shows the number of local configurations
# LCONF which were sampled from ab initio reference calculations.
# LCONF
# LCONF nstep ...... MD time step or input structure counter
# LCONF el ......... Element symbol
# LCONF nlrc_old ... Previous number of local reference configurations for this element
# LCONF nlrc_new ... Current number of local reference configurations for this element
# LCONF ###############################################################
# LCONF nstep el nlrc_old nlrc_new el nlrc_old nlrc_new
# LCONF 2 3 4 5 6 7 8
# LCONF ###############################################################
LCONF 1 H 0 66 O 0 33
LCONF 2 H 65 131 O 32 65
LCONF 3 H 131 197 O 65 98
LCONF 4 H 197 263 O 98 131
LCONF 5 H 259 325 O 129 162
LCONF 6 H 317 383 O 158 191
LCONF 7 H 374 440 O 186 219
Multiple deleted lines ....
LCONF 230 H 3911 3942 O 2010 2028
LCONF 232 H 3941 3969 O 2027 2044
LCONF 235 H 3969 3998 O 2044 2060
LCONF 237 H 3998 4027 O 2060 2076
LCONF 238 H 4026 4052 O 2076 2090
LCONF 240 H 4052 4074 O 2090 2104
LCONF 241 H 4074 4097 O 2104 2117
- paulfons
ML training and Error: RATTLE_vel algorithm did not converge! err= NaN
I am training a force field using ML for H2O in a box. This training is being done using VASP 6.3.1. The job terminated with the error below.
I tried a second (new) run with both TEBEG and TEEND set to 300K, ISIF=3 to compute the stress tensor while using a Langevin thermostat with MDALGO=3. I have used ICONST to fix the cell dimensions and a time step of 1.5 fs after changing the mass of H to 8 amu in the POTCAR file. I have attached the INCAR file and other associated files for reference. During the run the temperature more or less monotonically increased from 300K to about 1600K upon which the run terminated due to the RATTLE_vel algorithm not converging. The settings for this run were from what I gather exactly what you (Ferenc) suggested. Can you offer some insight as to what should be changed?
Another question relates to diagnostics in ML_LOGFILE. If I grep the sparsification entries, I can see that a typical SPRSC entry looks like the one below. For this line, am I correct in assuming that there are 114 reference structures (training structures?), and that there are 4096 local reference configurations for H and 2116 for O?
I fixed the volume and shape of the simulation cell using the ICONST file below. On a careful rereading of your earlier message, I noticed that you suggested using the third example from the ICONST entry in the VASP wiki. Unless I am mistaken, this allows for a variable volume but fixes the cell shape. I don't understand why this is a better option than fixing the cell volume (but I am doing another run using this option to check). Can you elaborate on the logic behind the variable cell volume / fixed shape option?
I know I am getting ahead of myself, but I am still optimistic that the training errors will be resolved and would like to set up a plan for carrying out training on the H2O/graphene system.
The initial suggestion was to train with H2O and then, after an initial training session, introduce a graphene sheet with the water molecules and train the system for interactions with C. If the problems can be solved, I am curious about how to decide when training is sufficient. You stated earlier "Please expect around 1000-2000 training structures (ML_MCONF) and several thousand local reference configurations (ML_MB)."
From the (very helpful) comments in ML_LOGFILE, it would seem that grepping the tag SPRSC, which reports on sparsification, offers these values in the form of "nstr_spar ... Number of reference structures after sparsification" and "nlrc_spar ... Number of local reference configurations after sparsification for this element". Is this correct? If so, I assume I should be looking for nstr_spar values of 1000-2000 and several thousand local reference configurations via the quantity nlrc_spar. Is this correct? If all goes well and I proceed to training graphene and H2O together, what should the values of these quantities be for a reasonable training set?
Code: Select all
SPRSC 241 114 114 H 4097 4096 O 2117 2116
Code: Select all
LR 1 0
LR 2 0
LR 3 0
LA 1 2 0
LA 1 3 0
LA 2 3 0
Code: Select all
# SPRSC #######################################################################################################
# SPRSC This line shows the results of sparsification regarding the number
# SPRSC of reference structures and local reference configurations.
# SPRSC
# SPRSC nstep ....... MD time step or input structure counter
# SPRSC nstr_prev ... Number of reference structures before sparsification
# SPRSC nstr_spar ... Number of reference structures after sparsification
# SPRSC el .......... Element symbol
# SPRSC nlrc_prev ... Number of local reference configurations before sparsification for this element
# SPRSC nlrc_spar ... Number of local reference configurations after sparsification for this element
# SPRSC #######################################################################################################
# SPRSC nstep nstr_prev nstr_spar el nlrc_prev nlrc_spar nstr_prev nstr_spar el nlrc_prev nlrc_spar
# SPRSC 2 3 4 5 6 7 8 9 10 11 12
# SPRSC #######################################################################################################
SPRSC 1 1 1 H 66 65 O 33 32
SPRSC 2 2 2 H 131 131 O 65 65
SPRSC 3 3 3 H 197 197 O 98 98
SPRSC 4 4 4 H 263 259 O 131 129
SKIPPED LINES
SPRSC 230 108 108 H 3942 3941 O 2028 2027
SPRSC 232 109 109 H 3969 3969 O 2044 2044
SPRSC 235 110 110 H 3998 3998 O 2060 2060
SPRSC 237 111 111 H 4027 4026 O 2076 2076
SPRSC 238 112 112 H 4052 4052 O 2090 2090
SPRSC 240 113 113 H 4074 4074 O 2104 2104
-
- Global Moderator
- Posts: 542
- Joined: Fri Nov 08, 2019 7:18 am
Re: ML on system with H2O
I merged the topics, as the discussions all seem to concern the same system.
Martin Schlipf
VASP developer
- paulfons
Re: ML on system with H2O
Thank you for merging the topics. I initially tried to add to my original post, but when I submitted it, the forum responded with a file permission error, so I made a new post. I am still curious about the answers to my questions. I have tried once more to train the FF for H2O with the temperature fixed at 300 K, but over the course of 400 steps the temperature has crept up to about 2000 K. As ISIF=3, I am using a Langevin thermostat and a time step of 1.5 fs. Is my time step too big, even though I increased the mass of H to 8 amu to avoid problems with overly large time steps? Any advice is welcome!
- paulfons
Re: ML on system with H2O
I have attempted a new ML learning session with H2O. In this run I again used the Langevin thermostat, but I did not ramp the temperature and left it set at 300 K. I used a time step of 1.5 fs (POTIM) and set the mass of H to 8 amu to allow sampling of a larger subset of phase space without running into integration problems. The problem, again, is that the temperature drifted up from 300 K to about 5500 K before the run crashed with the following error. I am at a loss as to how to train the ML FF.
Although the INCAR file is attached, I will mention that I used the following machine learning parameters.
For example, since the bond length in H2O is about an Angstrom, I changed ML_SION1 and ML_SION2 to 0.3 Angstroms and increased the number of basis functions (ML_MRB1) to 12. Any suggestions as to what to try next are most welcome.
Code: Select all
#ML_SION1 = 0.30
#ML_SION2 = 0.30
ML_MB = 10000
#ML_IALGO_LINREG=1
#ML_MRB1=12
Code: Select all
MM: 37 -0.420936430149E+03 -0.39800E-02 -0.19248E-03 201 0.131E-01
Abort(537501702) on node 31 (rank 31 in comm 0): Fatal error in PMPI_Irecv: Invalid rank, error stack:
PMPI_Irecv(167): MPI_Irecv(buf=0x7ffe991ca270, count=1, MPI_DOUBLE, src=-25, tag=15, comm=0xc4000010, request=0x7ffe991c9da0) failed
PMPI_Irecv(95).: Invalid rank has value -25 but must be nonnegative and less than 32
- paulfons
Re: ML on system with H2O
A small update. Even though I had changed the atomic mass of H to 8 amu, I was still suspicious that the time step of 1.5 fs was too large. I changed the time step to 0.5 fs (still with H = 8 amu) and restarted the run. The run uses the Langevin thermostat with fixed cell size and shape and a temperature ramp from 300 to 500 K. The temperature has crept upwards as before, but it seems to have stabilized at slightly less than 800 K for the last few hundred steps. As the electronic structure is converging, I am still hoping this is providing good training data for the force field.
I now have a few hundred training steps with several thousand configurations for both H and O. What would be a good place to stop at?
Code: Select all
SPRSC 1073 152 152 H 4493 4475 O 1993 1969
SPRSC 1110 156 156 H 4590 4576 O 2039 2020
SPRSC 1128 158 158 H 4638 4627 O 2054 2040
SPRSC 1146 160 160 H 4694 4685 O 2075 2063
SPRSC 1169 163 163 H 4802 4791 O 2122 2105
SPRSC 1219 168 168 H 4976 4962 O 2196 2175
SPRSC 1232 170 170 H 5016 5007 O 2206 2189
SPRSC 1255 173 173 H 5094 5084 O 2237 2220
SPRSC 1305 178 178 H 5231 5218 O 2304 2282
SPRSC 1355 183 183 H 5356 5343 O 2345 2323
SPRSC 1374 185 185 H 5390 5380 O 2350 2335
SPRSC 1424 190 190 H 5522 5510 O 2407 2390
SPRSC 1474 195 195 H 5640 5626 O 2466 2446
SPRSC 1507 199 199 H 5723 5713 O 2501 2483
SPRSC 1557 204 204 H 5844 5832 O 2545 2527
SPRSC 1607 209 209 H 5937 5927 O 2578 2562
SPRSC 1644 213 213 H 6041 6027 O 2618 2602
SPRSC 1683 217 217 H 6170 6156 O 2671 2652
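One way to judge how the training is progressing is to track how quickly the number of stored local reference configurations is still growing. The sketch below parses SPRSC lines in the layout shown above. The column interpretation is an assumption, not something stated in this thread: MD step, number of structures before and after sparsification, then per element the number of local reference configurations before and after sparsification. The function names are hypothetical.

```python
SAMPLE = """\
SPRSC 1073 152 152 H 4493 4475 O 1993 1969
SPRSC 1683 217 217 H 6170 6156 O 2671 2652
"""


def parse_sprsc(lines):
    """Return a list of (md_step, n_structures, {element: n_configs}).

    ASSUMPTION: fields after the first four come in triples of
    (element, count before sparsification, count after sparsification);
    the 'after' values are the ones kept.
    """
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) < 4 or parts[0] != "SPRSC":
            continue  # skips header/comment lines and unrelated output
        step, n_struct = int(parts[1]), int(parts[3])
        rest = parts[4:]
        counts = {rest[i]: int(rest[i + 2]) for i in range(0, len(rest) - 2, 3)}
        rows.append((step, n_struct, counts))
    return rows


def growth_per_step(rows, element):
    """Average number of new reference configurations per MD step."""
    (s0, _, c0), (s1, _, c1) = rows[0], rows[-1]
    return (c1[element] - c0[element]) / (s1 - s0)


rows = parse_sprsc(SAMPLE.splitlines())
for element in ("H", "O"):
    print(element, f"{growth_per_step(rows, element):.2f} new configurations per step")
```

To use it on a real run, one would pass the lines of the log file instead of SAMPLE. A growth rate that is still high suggests the force field keeps meeting new local environments; it is only a rough indicator and not a stopping criterion by itself.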
- paulfons
- Jr. Member
- Posts: 85
- Joined: Sun Nov 04, 2012 2:40 am
- License Nr.: 5-1405
- Location: Yokohama, Japan
- Contact:
Langevin thermostat
I think I may have answered my own question about the cause of the temperature instability, namely the parameters for the Langevin thermostat required by MDALGO=3 using the method of Parrinello and Rahman.
The parameters are:
1. LANGEVIN_GAMMA: the friction coefficients γ (in ps^-1) for each atom type
2. LANGEVIN_GAMMA_L: the friction coefficient (in ps^-1) for the lattice degrees of freedom
3. PMASS: the mass for the lattice degrees of freedom
Can anyone offer insight on what values I should use for a system containing H2O molecules in a big box?
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: ML on system with H2O
In none of your calculations can I see that you have changed the mass of hydrogen.
I just grepped for POMASS in your latest calculation and got the following:
POMASS = 1.00 16.00
So H still has a mass of 1.
You need to modify POMASS for H in the POTCAR file or set the values in the INCAR file:
https://www.vasp.at/wiki/index.php/POMASS
You have also set a far too loose convergence criterion for the electronic energy: EDIFF = 0.00428
A good practice is to set as few tags as possible and keep the INCAR file ordered; the defaults usually work well. Here, the ISIF and EDIFF tags were both set badly.
I have made an INCAR for you now, please try it:
ENCUT = 700
GGA = RP
ALGO = Fast
IBRION = 0
ISIF = 3
MDALGO=3
LANGEVIN_GAMMA = 10.0 10.0
LANGEVIN_GAMMA_L = 3.0
PMASS = 100
ISMEAR = 0
ISYM = 0
LASPH = True
LCHARG = False
LREAL = Auto
ML_ISTART = 0
ML_LMLFF = True
ML_MB = 10000
NCORE = 2
NSW = 10000
POTIM = 1.5
PREC = Normal
TEBEG = 300
TEEND = 500
IVDW = 11
POMASS = 8.0 16.0
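The problems pointed out in this thread (ISIF=0 together with machine learning, and a very loose EDIFF) are easy to miss in a long INCAR. The following is a minimal sketch of a pre-flight check, not a VASP tool: the parser is simplified (one tag per line, `#` or `!` comments), the function names are hypothetical, and using the default EDIFF of 1E-4 as the warning threshold is an assumption.

```python
def parse_incar(text):
    """Parse 'TAG = value' lines into a dict (simplified: one tag per line)."""
    tags = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split("!", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        tags[key.strip().upper()] = value.strip()
    return tags


def check_mlff_incar(tags, ediff_limit=1e-4):
    """Return a list of warnings for the issues raised in this thread."""
    warnings = []
    ml_on = tags.get("ML_LMLFF", "").upper().lstrip(".").startswith("T")
    if ml_on and tags.get("ISIF", "").strip() == "0":
        warnings.append("ISIF = 0: no stress is calculated, so the fit "
                        "to energy, forces and stress cannot work")
    if "EDIFF" in tags and float(tags["EDIFF"]) > ediff_limit:
        warnings.append(f"EDIFF = {tags['EDIFF']} is looser than {ediff_limit:g}")
    return warnings


bad = "ISIF = 0\nML_LMLFF = True\nEDIFF = 0.00428  # too loose\n"
good = "ISIF = 3\nML_LMLFF = True\nPOMASS = 8.0 16.0\n"

for name, text in (("bad", bad), ("good", good)):
    print(name, check_mlff_incar(parse_incar(text)))
```

The check reads only the INCAR text, so it cannot tell whether a POMASS value in the POTCAR has been edited; grepping the OUTCAR for POMASS, as done above, remains the direct way to see which masses were actually used.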
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: ML on system with H2O
Concerning the Langevin thermostat (as for all thermostats), the results should be independent of its parameters.
Of course, if the damping is too large your system will not move enough or, in the worst case, won't be ergodic.
If the value is too large you can get huge uncontrollable fluctuations and deformations.
In my experience the calculation is not very sensitive to the values of LANGEVIN_GAMMA, PMASS and LANGEVIN_GAMMA_L, so I just carry the values over from one calculation to the next.
Of course, to be really safe you need to check the effect of the thermostat/barostat parameters for each new system by varying each parameter and comparing the change in the quantity of interest.
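The check described above, varying one parameter at a time, can be set up with a small helper. This is only a sketch: the baseline values are the ones from the INCAR posted in this thread, while the scaling factors 0.5 and 2 and the function names are arbitrary choices made for illustration.

```python
# Baseline thermostat/barostat settings taken from the INCAR in this thread.
BASELINE = {
    "LANGEVIN_GAMMA": (10.0, 10.0),   # one value per atom type, in ps^-1
    "LANGEVIN_GAMMA_L": 3.0,          # lattice friction, in ps^-1
    "PMASS": 100.0,                   # lattice mass
}
FACTORS = (0.5, 2.0)  # ASSUMPTION: halve and double each parameter


def one_at_a_time(baseline, factors):
    """Return (label, params) pairs with exactly one parameter rescaled."""
    variants = []
    for name, value in baseline.items():
        for f in factors:
            changed = dict(baseline)
            if isinstance(value, tuple):
                changed[name] = tuple(v * f for v in value)
            else:
                changed[name] = value * f
            variants.append((f"{name}_x{f:g}", changed))
    return variants


def to_incar_lines(params):
    """Format a parameter dict as INCAR lines."""
    lines = []
    for name, value in params.items():
        if isinstance(value, tuple):
            text = " ".join(f"{v:g}" for v in value)
        else:
            text = f"{value:g}"
        lines.append(f"{name} = {text}")
    return lines


for label, params in one_at_a_time(BASELINE, FACTORS):
    print(label)
    for line in to_incar_lines(params):
        print("   ", line)
```

Each variant would be run as a separate short MD calculation, and the quantity of interest (for example the average temperature or volume) compared against the baseline run.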
- paulfons
- Jr. Member
- Posts: 85
- Joined: Sun Nov 04, 2012 2:40 am
- License Nr.: 5-1405
- Location: Yokohama, Japan
- Contact:
Re: ML on system with H2O
Dear Ferenc,
I seem to have solved the problem with LANGEVIN_GAMMA, and the system has remained reasonably close to the 300 K setting I used.
I am curious to know what constitutes a sufficient number of reference configurations. Currently I have ML_MB = 10000, and the run terminated due to insufficient memory for the number of configurations stored.
I can see that when it terminated there were 9683 configurations for O and 4092 configurations for H stored, along with a total of 192 samples. Is this a reasonable number of configurations? Should I increase ML_MB further and continue with H2O before adding carbon? Thanks for your help (and patience!)
Code: Select all
SPRSC 1152 162 162 H 8170 8139 O 3566 3505
SPRSC 1202 167 167 H 8460 8430 O 3670 3603
SPRSC 1252 172 172 H 8751 8723 O 3766 3694
SPRSC 1302 177 177 H 9044 9015 O 3859 3778
SPRSC 1352 182 182 H 9336 9307 O 3943 3858
SPRSC 1402 187 187 H 9621 9592 O 4023 3930
SPRSC 1452 192 192 H 9896 9863 O 4092 4000
-
- Global Moderator
- Posts: 460
- Joined: Mon Nov 04, 2019 12:44 pm
Re: ML on system with H2O
Please post the input and output files of your last calculation (POSCAR, KPOINTS, POTCAR, INCAR, ML_LOGFILE, OSZICAR)