ML_FF is not compatible with IBRION = 5 or 6

soungminbae
Newbie
Posts: 5
Joined: Wed Apr 27, 2022 5:44 am

ML_FF is not compatible with IBRION = 5 or 6

#1 Post by soungminbae » Thu May 25, 2023 7:54 am

Dear developers,

I have tested Gamma-point phonon calculations with IBRION = 5 or 6 using a pre-trained ML_FF.
VASP crashes when it starts to calculate the displaced structures.

Code:

 POSCAR, INCAR and KPOINTS ok, starting setup
 entering main loop
   1 F= -.68821541E+03 E0= -.68821541E+03  d E =0.000000E+00
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     Your timestep is larger than 0.1 Angst.                                 |
|     For finite differences, this really does not make sense. I will         |
|     reset POTIM to 0.015. I recommend to use 0.01 to 0.02 for finite        |
|     differences.                                                            |
|                                                                             |
 -----------------------------------------------------------------------------

 Found   336 degrees of freedom:
 Finite differences POTIM= 0.01500 DOF= 336
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libpthread-2.17.s  00002B1C170B55D0  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000001F0C087  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000000655722  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000064E144  Unknown               Unknown  Unknown
vasp-6.4.1.x       00000000006CDCD7  Unknown               Unknown  Unknown
vasp-6.4.1.x       00000000006C37E1  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000074B507  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000113B8A3  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000001E06E82  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000040E85D  Unknown               Unknown  Unknown
libc-2.17.so       00002B1C175E63D5  __libc_start_main     Unknown  Unknown
vasp-6.4.1.x       000000000040E776  Unknown               Unknown  Unknown


In the standard output, the energy and force of the first step (i.e., the undisplaced POSCAR) are given, but the calculation crashes right after printing the number of degrees of freedom (DOF).
Is this a bug, or is IBRION = 5 or 6 not compatible with ML_FF?

Thank you for your help!

SB

martin.schlipf
Global Moderator
Posts: 458
Joined: Fri Nov 08, 2019 7:18 am

Re: ML_FF is not compatible with IBRION = 5 or 6

#2 Post by martin.schlipf » Thu May 25, 2023 2:28 pm

Using MLFF with phonons should work with the finite difference approach (IBRION = 5 and 6). Can you produce a small example to illustrate the issue you observe?

soungminbae
Newbie
Posts: 5
Joined: Wed Apr 27, 2022 5:44 am

Re: ML_FF is not compatible with IBRION = 5 or 6

#3 Post by soungminbae » Fri May 26, 2023 5:01 am

Thank you for the reply!
I found that the bug I asked about (i.e., the ML_FF calculation terminating after printing the DOF) does not seem to occur in general, but is specific to my own case. I also tested the IBRION = 5 + ML_FF combination on another system, and that calculation completed successfully.
I think it is quite difficult to tell why this termination happens, but I am sharing my input files.

POSCAR

Code:

unknown system
   1.00000000000000
    10.0431600000000003    0.0000000000000000    0.0000000000000000
     0.0000000000000000    4.3488100000000003    0.0000000000000000
     0.0000000000000000    0.0000000000000000   34.7949919547301363
   P    B    N
    48    32    32
Direct
  0.0000000479704525  0.5024172302218768  0.9829901259105888
  0.1666404148590806  0.1586785216372398  0.9829109776706766
  0.3332410049743103  0.5013226545054458  0.9829722312545602
  0.4999999602109294  0.1579264134683900  0.9828484952336063
  0.6667589122970512  0.5013231068842177  0.9829722647063603
  0.8333596357649804  0.1586779962933235  0.9829110106337643
  0.0000000966521290  0.6568851672941729  0.0451421782603636
  0.1666651999519846 -0.0000942854833427  0.0448988467162209
  0.3334297085187903  0.6570324504359637  0.0450844977737194
  0.4999998940148024  0.0002380067200743  0.0448628540203238
  0.6665701658663598  0.6570317258325401  0.0450845381947704
  0.8333348566847183 -0.0000934642069198  0.0448988932618396
  0.0000000115067490  0.0572101524804199  0.1364283041400120
  0.1666741117582839  0.4001816892898144  0.1358354760333592
  0.3333305137178604  0.0571632030230954  0.1364207103380511
  0.4999999819922730  0.4001554259151572  0.1358258886713993
  0.6666695338789785  0.0571632726503430  0.1364207246029787
  0.8333258182959278  0.4001816270104787  0.1358354873828342
 -0.0000000214872880  0.8988842603198575  0.1986054882246096
  0.1666738446566494  0.5558491366926843  0.1980800265209932
  0.3333349234582820  0.8989434105266239  0.1986047107411991
  0.5000000293512319  0.5558843807026168  0.1980727955179495
  0.6666650380076778  0.8989433547818926  0.1986046994806499
  0.8333261745295195  0.5558491431470232  0.1980800118401050
  0.0000000130954311  0.4311817691051696  0.2887621499029351
  0.1666646244181119  0.0879927075752353  0.2888002174853139
  0.3333267494645972  0.4312277092198860  0.2887569451395237
  0.4999999988372724  0.0880377056376928  0.2888002015700850
  0.6666732498389993  0.4312277037911273  0.2887569628067588
  0.8333353621551315  0.0879927128681067  0.2888002256388837
 -0.0000000079975959  0.5870461023461482  0.3510105274213041
  0.1666691763745840  0.9302106484828939  0.3509919825897128
  0.3333229161831341  0.5870028403244807  0.3510067931900736
  0.5000000207717736  0.9301733862926959  0.3509896430581534
  0.6666770956512588  0.5870028993081915  0.3510067694389507
  0.8333308311520016  0.9302105732960629  0.3509919680779670
  0.0000001207498109  0.0014827564404178  0.4421685573508389
  0.1665857545527171  0.3445377776603269  0.4416546607076480
  0.3333592875880079  0.0018119098522435  0.4421428640790219
  0.4999998424331478  0.3446782553242987  0.4416117194694667
  0.6666405632054166  0.0018108290076843  0.4421427817092846
  0.8334144661577452  0.3445387042323458  0.4416545728313952
  0.0000000086811350  0.8441159391044999  0.5042539644660159
  0.1667599594598428  0.5007517304170989  0.5038199905877139
  0.3333547652223657  0.8433531180878536  0.5041958497747018
  0.5000000698024034  0.4996749772315339  0.5038140628514327
  0.6666451102389939  0.8433537014641527  0.5041958002355972
  0.8332401154233873  0.5007510600983550  0.5038199365134397
 -0.0000017221627683  0.3344237676810776  0.6017447651913934
  0.1249779519471058  0.8353457355616614  0.6017333100155745
  0.2499175308007087  0.3349158789808500  0.6016448879529076
  0.3749816688881143  0.8357102681608836  0.6015920708488587
  0.5000018931590097  0.3354432736343464  0.6015270878854218
  0.6250237602954128  0.8357044544181554  0.6015920309499286
  0.7500827751030371  0.3349371054544923  0.6016448202759475
  0.8750171408708652  0.8353402590262956  0.6017332646776798
  0.0000034931018406  0.6788011941775469  0.6961481706227399
  0.1250064256764779  0.1790616719762557  0.6961361780659491
  0.2500380353019690  0.6789408600271669  0.6961039924837559
  0.3750090713058075  0.1787894344843516  0.6960732159211250
  0.4999972149075260  0.6790420747936593  0.6960588648292801
  0.6249893370827092  0.1788020056553781  0.6960732298653208
  0.7499619772730861  0.6789321959966670  0.6961039126114782
  0.8749960530840140  0.1790760999740355  0.6961361929210441
 -0.0000020089850368  0.3233590794022165  0.7906779819963394
  0.1249546347984885  0.8231401875292894  0.7906640407626008
  0.2499489748711586  0.3232002729675522  0.7906284684705709
  0.3749492421978882  0.8232173616575705  0.7905955979907729
  0.5000019212258763  0.3230046363452347  0.7905776060499575
  0.6250525621500173  0.8232227486211046  0.7905956231760878
  0.7500509958445754  0.3231945214693389  0.7906284626118930
  0.8750432812052481  0.8231460589455777  0.7906640544414961
  0.0000008674400243  0.6705873192733220  0.8853534171533231
  0.1249915132835089  0.1704443668398578  0.8852771789563695
  0.2500459948274679  0.6712134395027555  0.8852394403438482
  0.3749993740503551  0.1707683519386726  0.8851561883912028
  0.4999984996671081  0.6717570934593251  0.8851421390313935
  0.6249994220049151  0.1707697177329970  0.8851561725309132
  0.7499537451199971  0.6712095048233380  0.8852395272941075
  0.8750091802840395  0.1704447423541346  0.8852771863916959
  0.0000023098916645  0.6677921998300472  0.6016557790997666
  0.1250310141842520  0.1680415296527433  0.6018644141518965
  0.2500462164881654  0.6683757321294845  0.6016099269260904
  0.3750148650400890  0.1687460735965728  0.6017650279950917
  0.4999978718465458  0.6689740390421491  0.6015316333634627
  0.6249809785953323  0.1687424031018545  0.6017651798001630
  0.7499536881846941  0.6683735397302809  0.6016101356383534
  0.8749738405561361  0.1680376313961198  0.6018645957831142
 -0.0000042703272570  0.3456486451065082  0.6961628485564336
  0.1249496079623166  0.8457642688906337  0.6961361722260275
  0.2498662831092858  0.3457089794478185  0.6961121078992479
  0.3749411844141756  0.8456516819174877  0.6960743168612646
  0.5000032366333320  0.3457785561122383  0.6960656497035292
  0.6250616789006387  0.8456521490176619  0.6960743929536352
  0.7501339990206778  0.3457044423951191  0.6961121872939889
  0.8750466387458309  0.8457642207255252  0.6961362446659051
  0.0000017192843778  0.6565047156256175  0.7906758111336150
  0.1250619834940994  0.1564051641501307  0.7906681541191491
  0.2500458781934027  0.6564341486937699  0.7906236842129285
  0.3750727003290278  0.1564709372316634  0.7905997961885904
  0.4999980075773310  0.6563806482972437  0.7905725304111455
  0.6249260693007518  0.1564737233326111  0.7905997025301166
  0.7499537076412253  0.6564323493827225  0.7906235455460060
  0.8749385232227525  0.1564079491145711  0.7906680578156953
  0.0000002149105192  0.3370534437563116  0.8853355144178576
  0.1249864939748610  0.8374689902739335  0.8851236162645627
  0.2498854826489557  0.3377327340113212  0.8852790726165661
  0.3749520074459789  0.8379928871932684  0.8850270489998693
  0.5000003252244177  0.3384215658278193  0.8852524582602606
  0.6250480537860201  0.8379954356285886  0.8850270232597163
  0.7501144647786125  0.3377338598459103  0.8852790528191520
  0.8750137474297580  0.8374714153656402  0.8851235937486719

INCAR

Code:

ISTART = 0
ICHARG = 2
ENCUT = 500

PREC = Normal
ISYM = -1

ISMEAR = 0
SIGMA = 0.05

IBRION = 5

LWAVE = F
LCHARG = F

GGA      = BO
PARAM1   = 0.1833333333
PARAM2   = 0.22
AGGAC    = 0.0
LUSE_VDW = .TRUE.
LASPH    = .TRUE.

### Machine Learning part
### Major tags for machine learning
ML_LMLFF = .TRUE.
ML_MODE = run
ML_WTIFOR = 100
ML_MB = 3000

LWAVE = F
LCHARG = F

ML_FF header

Code:

ML_FF 0.2.1 binary { "date" : "2023-05-10T16:46:52.305", "ML_LFAST" : True, "ML_DESC_TYPE" :   0, "types" : [ "B", "N", "P" ], "training_structures" : 598, "local_reference_cfgs" : [ 347, 352, 805 ], "descriptors" : [ 1053, 1053, 1053 ], "ML_IALGO_LINREG" : 4, "ML_RCUT1" :  8.0000E+00, "ML_RCUT2" :  5.0000E+00, "ML_W1" :  1.0000E-01, "ML_SION1" :  5.0000E-01, "ML_SION2" :  5.0000E-01, "ML_LMAX2" : 3, "ML_MRB1" : 12, "ML_MRB2" : 8, "ML_IWEIGHT" : 3, "ML_WTOTEN" :  1.0000E+00, "ML_WTIFOR" :  1.0000E+02, "ML_WTSIF" :  1.0000E+00 }

Standard output

Code:

 running  128 mpi-ranks, on    4 nodes
 distrk:  each k-point on  128 cores,    1 groups
 distr:  one band on    1 cores,  128 groups
 vasp.6.4.1 05Apr23 (build May 03 2023 07:59:44) complex

 POSCAR found type information on POSCAR P B N
 POSCAR found :  3 types and     112 ions
 scaLAPACK will be used
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     For optimal performance we recommend to set                             |
|       NCORE = 2 up to number-of-cores-per-socket                            |
|     NCORE specifies how many cores store one orbital (NPAR=cpu/NCORE).      |
|     This setting can greatly improve the performance of VASP for DFT.       |
|     The default, NCORE=1 might be grossly inefficient on modern             |
|     multi-core architectures or massively parallel machines. Do your        |
|     own testing! More info at https://www.vasp.at/wiki/index.php/NCORE      |
|     Unfortunately you need to use the default for GW and RPA                |
|     calculations (for HF NCORE is supported but not extensively tested      |
|     yet).                                                                   |
|                                                                             |
 -----------------------------------------------------------------------------

 -----------------------------------------------------------------------------
|                                                                             |
|               ----> ADVICE to this user running VASP <----                  |
|                                                                             |
|     You have a (more or less) 'large supercell' and for larger cells it     |
|     might be more efficient to use real-space projection operators.         |
|     Therefore, try LREAL= Auto in the INCAR file.                           |
|     Mind: For very accurate calculation, you might also keep the            |
|     reciprocal projection scheme (i.e. LREAL=.FALSE.).                      |
|                                                                             |
 -----------------------------------------------------------------------------

 LDA part: xc-table for Pade appr. of Perdew
 Machine learning selected
 Setting communicators for machine learning
 Initializing machine learning
 The following ML algorithm is executed for production run: FAST.
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     Your FFT grids (NGX,NGY,NGZ) are not sufficient for an accurate         |
|     calculation. Thus, the results might be wrong. Good settings for        |
|     NGX, NGY and NGZ are 2, 2 and 10, respectively.                         |
|     Mind: This setting results in a small but reasonable wrap-around        |
|     error. It is also necessary to adjust these values to the FFT           |
|     routines you use.                                                       |
|                                                                             |
 -----------------------------------------------------------------------------

 POSCAR, INCAR and KPOINTS ok, starting setup
 entering main loop
   1 F= -.68821541E+03 E0= -.68821541E+03  d E =0.000000E+00
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     Your timestep is larger than 0.1 Angst.                                 |
|     For finite differences, this really does not make sense. I will         |
|     reset POTIM to 0.015. I recommend to use 0.01 to 0.02 for finite        |
|     differences.                                                            |
|                                                                             |
 -----------------------------------------------------------------------------

 Finite differences POTIM= 0.01500 DOF= 336
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libpthread-2.17.s  00002AE1927545D0  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000001F0C087  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000000655722  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000064E144  Unknown               Unknown  Unknown
vasp-6.4.1.x       00000000006CDCD7  Unknown               Unknown  Unknown
vasp-6.4.1.x       00000000006C37E1  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000074B507  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000113B8A3  Unknown               Unknown  Unknown
vasp-6.4.1.x       0000000001E06E82  Unknown               Unknown  Unknown
vasp-6.4.1.x       000000000040E85D  Unknown               Unknown  Unknown
libc-2.17.so       00002AE192C853D5  __libc_start_main     Unknown  Unknown
vasp-6.4.1.x       000000000040E776  Unknown               Unknown  Unknown
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
libpthread-2.17.s  00002AF7DB4975D0  Unknown               Unknown  Unknown

Stack trace terminated abnormally.

Note that I got the same termination even after I turned off the vdW-related tags, so they are apparently not related to this issue.

SB

martin.schlipf
Global Moderator
Posts: 458
Joined: Fri Nov 08, 2019 7:18 am

Re: ML_FF is not compatible with IBRION = 5 or 6

#4 Post by martin.schlipf » Fri May 26, 2023 6:18 am

Can you make the ML_FF available to us so that we can reproduce the issue? If uploading here does not work for you because the file is too big or you don't want to share it publicly, please send me a direct message.

Can you also tell me which POTCARs you use? I assume Gamma-point only; otherwise please also provide the KPOINTS file.

martin.schlipf
Global Moderator
Posts: 458
Joined: Fri Nov 08, 2019 7:18 am

Re: ML_FF is not compatible with IBRION = 5 or 6

#5 Post by martin.schlipf » Fri May 26, 2023 9:46 am

I could reproduce the issue with your input. We will now investigate possible causes and get back to you as soon as we know more.

ferenc_karsai
Global Moderator
Posts: 422
Joined: Mon Nov 04, 2019 12:44 pm

Re: ML_FF is not compatible with IBRION = 5 or 6

#6 Post by ferenc_karsai » Wed May 31, 2023 2:30 pm

So I have figured out the problem.
The fast code uses NSW to determine whether it is at the end of the run. If it is at the end, it deallocates important variables.
Unfortunately, with finite differences the value of NSW is completely wrong, and it is not used by the finite-difference method. For the machine learning, however, this can lead to early deallocations, which cause the errors you have encountered.
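
The failure mode described here can be mimicked with a small toy model. This is only a sketch in Python, not VASP code: the class `ToyForceField`, the function `run_finite_differences`, and the `always_keep` switch are hypothetical names made up for illustration, and the sketch assumes that the step limit seen by the force field is 0 (the INCAR above sets no NSW). A Python exception stands in for the segmentation fault.

```python
class ToyForceField:
    """Toy stand-in for the force-field state; hypothetical, not VASP code."""

    def __init__(self, nsw, always_keep=False):
        self.nsw = nsw                  # step limit the force field believes in
        self.always_keep = always_keep  # mimics forcing DO_DEALLOCATE = .FALSE.
        self.weights = [0.1, 0.2, 0.3]  # stands in for the allocated arrays
        self.nstep = 0

    def evaluate(self, positions):
        if self.weights is None:
            # In compiled code this access would be a segmentation fault.
            raise RuntimeError("force-field arrays already deallocated")
        self.nstep += 1
        energy = sum(w * x for w, x in zip(self.weights, positions))
        # The check from the thread: "last step" is inferred from NSW.
        do_deallocate = (self.nstep >= self.nsw) and not self.always_keep
        if do_deallocate:
            self.weights = None
        return energy


def run_finite_differences(ff, n_displacements):
    """Evaluate the undisplaced structure plus n_displacements displaced ones.

    Returns the number of evaluations that completed before a failure.
    """
    completed = 0
    try:
        for i in range(n_displacements + 1):
            ff.evaluate([1.0 + 0.015 * i, 0.0, 0.0])
            completed += 1
    except RuntimeError:
        pass
    return completed


if __name__ == "__main__":
    print(run_finite_differences(ToyForceField(nsw=0), 4))
    print(run_finite_differences(ToyForceField(nsw=0, always_keep=True), 4))
```

Under these assumptions, only the first (undisplaced) evaluation completes in the toy before the arrays are released, which matches the pattern in the log above: one energy line is printed, then the crash occurs once the displaced structures are started.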

The best workaround until an official fix is available is to modify the code:
Please delete lines 568, 569, 570 and 572 in the file ml_ff_ff2.F.
That is, change:
   IF (FF%NSTEP.GE.FF%NSW) THEN
      DO_DEALLOCATE = .TRUE.
   ELSE
      DO_DEALLOCATE = .FALSE.
   ENDIF
to:
   DO_DEALLOCATE = .FALSE.

This way no explicit deallocation is done at the end of the run, and the arrays are deallocated when they go out of scope.
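
For anyone who prefers not to edit by line number, the same change could be applied by matching the quoted block instead. The following is a minimal sketch in Python, assuming the block appears in the source exactly as quoted above apart from whitespace and letter case; `patch_source` and `patch_file` are hypothetical helper names, not part of VASP, and the file location passed to `patch_file` is up to you.

```python
import re
from pathlib import Path

# Matches the five-line IF/ELSE block quoted in the post, tolerating
# differences in indentation, spacing and letter case (an assumption).
PATTERN = re.compile(
    r"^[ \t]*IF\s*\(\s*FF%NSTEP\s*\.GE\.\s*FF%NSW\s*\)\s*THEN[ \t]*\n"
    r"[ \t]*DO_DEALLOCATE\s*=\s*\.TRUE\.[ \t]*\n"
    r"[ \t]*ELSE[ \t]*\n"
    r"(?P<indent>[ \t]*)DO_DEALLOCATE\s*=\s*\.FALSE\.[ \t]*\n"
    r"[ \t]*END\s*IF[ \t]*\n",
    re.MULTILINE | re.IGNORECASE,
)


def patch_source(text):
    """Return (patched_text, number_of_blocks_replaced)."""
    return PATTERN.subn(
        lambda m: m.group("indent") + "DO_DEALLOCATE = .FALSE.\n", text
    )


def patch_file(path):
    """Patch the file in place, keeping a copy with an added .orig suffix."""
    path = Path(path)
    original = path.read_text()
    patched, count = patch_source(original)
    if count != 1:
        # Refuse to touch the file if the block is missing or ambiguous.
        raise ValueError(f"expected exactly one match in {path}, found {count}")
    path.with_name(path.name + ".orig").write_text(original)
    path.write_text(patched)
```

Calling `patch_file` on your copy of ml_ff_ff2.F and then recompiling should be equivalent to the manual edit. The script raises an error rather than guessing if it does not find exactly one matching block, for example in a version where the code has already changed.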

Thank you very much for reporting the bug!!!

soungminbae
Newbie
Posts: 5
Joined: Wed Apr 27, 2022 5:44 am

Re: ML_FF is not compatible with IBRION = 5 or 6

#7 Post by soungminbae » Fri Jun 02, 2023 1:09 am

Dear VASP team,
Thank you very much for your advice.
I have just compiled VASP-6.4.1 with the suggested modification, and I can confirm that VASP now runs without any problem for ML_FF + IBRION = 5!

Thank you!
SB
