
Strange results from Nvidia compiler: vasp 6.4.2

Posted: Fri Sep 22, 2023 1:19 pm
by sergey_lisenkov1
Hello all,

We have two compiler suites on our IBM POWER9 machine: GNU (11.2.1) and the NVIDIA HPC SDK (23.7 and earlier versions). I found that the VASP executable built with the NVIDIA compilers crashes during the first ionic step for many well-known structures, while the GNU-compiled executable works fine. This is the CPU version of VASP.

nvhpc executable:

POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.173264571665E+04    0.17326E+04   -0.11788E+05  1920   0.130E+03
DAV:   2    -0.230358811178E+03   -0.19630E+04   -0.19301E+04  2360   0.312E+02
DAV:   3    -0.413055714576E+03   -0.18270E+03   -0.18196E+03  2112   0.105E+02
DAV:   4    -0.418346594421E+03   -0.52909E+01   -0.52754E+01  2304   0.184E+01
DAV:   5    -0.418504365022E+03   -0.15777E+00   -0.15760E+00  2368   0.314E+00
DAV:   6    -0.418510569381E+03   -0.62044E-02   -0.62019E-02  2384   0.609E-01
DAV:   7    -0.418510821741E+03   -0.25236E-03   -0.25230E-03  2104   0.119E-01
DAV:   8    -0.418510833627E+03   -0.11886E-04   -0.11878E-04  1352   0.277E-02
DAV:   9    -0.418510835590E+03   -0.19633E-05   -0.19605E-05  1248   0.114E-02
DAV:  10    -0.418510836385E+03   -0.79468E-06   -0.79410E-06  1248   0.694E-03    0.391E+01
DAV:  11    -0.409127465421E+03    0.93834E+01   -0.47181E+00  2496   0.598E+00    0.341E+01
DAV:  12    -0.600727414656E+04   -0.55981E+04   -0.25396E+04  2232   0.427E+02    0.753E+02
DAV:  13    -0.206374649373E+05   -0.14630E+05   -0.16714E+05  3616   0.617E+02    0.942E+02
DAV:  14    -0.199576022174E+04    0.18642E+05   -0.63744E+04  2680   0.473E+02    0.630E+02
DAV:  15    -0.240481014127E+04   -0.40905E+03   -0.32877E+04  3152   0.372E+02    0.254E+02
DAV:  16    -0.949437102878E+03    0.14554E+04   -0.12875E+04  2992   0.197E+02    0.256E+02
DAV:  17    -0.130140804480E+04   -0.35197E+03   -0.32329E+03  2296   0.226E+02    0.190E+02
DAV:  18    -0.802601320115E+03    0.49881E+03   -0.17270E+03  2664   0.133E+02    0.175E+02
DAV:  19    -0.110661176686E+05   -0.10264E+05   -0.38819E+04  3280   0.446E+02    0.135E+03
DAV:  20    -0.673739963064E+04    0.43287E+04   -0.34403E+04  2800   0.387E+02    0.703E+02
DAV:  21    -0.822818573836E+05   -0.75544E+05   -0.14020E+05  3792   0.129E+03    0.135E+03
 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     ERROR FEXCP: supplied Exchange-correletion table                        |
|      is too small, maximal index : 5237                                     |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |
 -----------------------------------------------------------------------------
gnu executable:

POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.173243421018E+04    0.17324E+04   -0.11787E+05  1920   0.130E+03
DAV:   2    -0.230370982792E+03   -0.19628E+04   -0.19299E+04  2360   0.312E+02
DAV:   3    -0.413047841306E+03   -0.18268E+03   -0.18194E+03  2112   0.105E+02
DAV:   4    -0.418338752370E+03   -0.52909E+01   -0.52755E+01  2304   0.184E+01
DAV:   5    -0.418496586483E+03   -0.15783E+00   -0.15766E+00  2368   0.314E+00
DAV:   6    -0.418502797412E+03   -0.62109E-02   -0.62085E-02  2384   0.609E-01
DAV:   7    -0.418503050125E+03   -0.25271E-03   -0.25265E-03  2104   0.119E-01
DAV:   8    -0.418503062029E+03   -0.11905E-04   -0.11897E-04  1352   0.277E-02
DAV:   9    -0.418503063997E+03   -0.19678E-05   -0.19648E-05  1248   0.114E-02
DAV:  10    -0.418503064795E+03   -0.79736E-06   -0.79617E-06  1248   0.695E-03    0.391E+01
DAV:  11    -0.409122240809E+03    0.93808E+01   -0.47174E+00  2496   0.598E+00    0.341E+01
DAV:  12    -0.393017308166E+03    0.16105E+02   -0.78579E+01  2288   0.241E+01    0.196E+01
DAV:  13    -0.392867053009E+03    0.15026E+00   -0.99992E+00  2112   0.885E+00    0.118E+01
DAV:  14    -0.392232402868E+03    0.63465E+00   -0.10163E+00  2336   0.289E+00    0.779E+00
DAV:  15    -0.391914142726E+03    0.31826E+00   -0.17821E+00  2128   0.290E+00    0.298E+00
DAV:  16    -0.391920664614E+03   -0.65219E-02   -0.15550E-01  2192   0.123E+00    0.146E+00
DAV:  17    -0.391913112667E+03    0.75519E-02   -0.11686E-01  2144   0.888E-01    0.119E+00
DAV:  18    -0.391901673271E+03    0.11439E-01   -0.41159E-02  2120   0.535E-01    0.471E-01
DAV:  19    -0.391902290698E+03   -0.61743E-03   -0.17945E-02  2256   0.305E-01    0.367E-01
DAV:  20    -0.391901664399E+03    0.62630E-03   -0.87993E-03  2128   0.221E-01    0.341E-01
DAV:  21    -0.391901405796E+03    0.25860E-03   -0.37255E-03  2224   0.169E-01    0.132E-01
DAV:  22    -0.391901429823E+03   -0.24027E-04   -0.12711E-03  2064   0.103E-01    0.990E-02
DAV:  23    -0.391901353266E+03    0.76557E-04   -0.28209E-04  1328   0.549E-02
   1 F= -.39959478E+03 E0= -.39959478E+03  d E =-.399595E+03
 curvature:   0.00 expect dE= 0.000E+00 dE for cont linesearch  0.000E+00
 trial: gam= 0.00000 g(F)=  0.611E-02 g(S)=  0.000E+00 ort = 0.000E+00 (trialstep = 0.100E+01)
 search vector abs. value=  0.611E-02
 reached required accuracy - stopping structural energy minimisation
 writing wavefunctions

I have tried everything I can think of with the NVIDIA compilers (no optimization, different libraries), but nothing helps. What could be the issue?

Thanks,
Sergey
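The two logs agree to within a fraction of an eV up to DAV step 11 and then diverge sharply at step 12. A minimal Python sketch for locating that point automatically, assuming the DAV lines have been saved as plain text (e.g. copied from stdout or OSZICAR); the function names and the 1 eV threshold are illustrative choices, not part of VASP:

```python
from __future__ import annotations

import re

# Matches a DAV line of the electronic-minimisation log, e.g.
# "DAV:  12    -0.600727414656E+04   -0.55981E+04 ..."
# and captures the step number and the total energy E.
DAV_RE = re.compile(r"^DAV:\s+(\d+)\s+(\S+)")


def parse_dav(log: str) -> dict[int, float]:
    """Map SCF step number -> total energy E (eV) for every DAV line in the log."""
    energies: dict[int, float] = {}
    for line in log.splitlines():
        m = DAV_RE.match(line.strip())
        if m:
            energies[int(m.group(1))] = float(m.group(2))
    return energies


def first_divergence(log_a: str, log_b: str, tol: float = 1.0) -> tuple[int, float, float] | None:
    """Return (step, E_a, E_b) for the first common SCF step whose energies
    differ by more than tol eV, or None if the runs agree on all common steps."""
    a, b = parse_dav(log_a), parse_dav(log_b)
    for step in sorted(a.keys() & b.keys()):
        if abs(a[step] - b[step]) > tol:
            return step, a[step], b[step]
    return None
```

Applied to the two logs in this thread, it would report step 12, where the nvhpc run jumps to about -6007 eV while the GNU run stays near -393 eV.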

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Mon Sep 25, 2023 8:36 am
by merzuk.kaltak
Dear Sergey,

please provide us with some input and output files (preferably for a small system).
We would like to reproduce your problem.
Also, please let us know which libraries (and versions) you compile and link VASP against.
If possible, please also upload the makefile.include you used.

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Tue Oct 03, 2023 6:57 pm
by sergey_lisenkov1
Good evening,

I apologize for the late reply.

Please find attached the input files and the makefile.include I used. The test system is not small, but it only takes 5 minutes to reach this error.
I used the LAPACK/BLAS/ScaLAPACK shipped with the NVIDIA HPC SDK, and FFTW 3.3.10.

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Wed Oct 04, 2023 9:36 am
by merzuk.kaltak
Dear Sergey,

Please also upload the input files, including the OUTCAR, of the successful run.

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Mon Oct 09, 2023 12:43 pm
by merzuk.kaltak
Dear Sergey,

Inspecting your makefile.include, I found a few points that might be responsible for the issue.
You set a global -O2 optimization flag in addition to -fast:

OFLAG       = -O2 -fast -Mcache_align

The recommended makefile.include templates for nvhpc typically contain only

OFLAG = -fast

If possible, please use the latter.

Furthermore, it seems you have compiled your own ScaLAPACK:

# BLAS (mandatory)
BLAS        = -lblas

# LAPACK (mandatory)
LAPACK      = -llapack

# scaLAPACK (mandatory)
SCALAPACK   =  -L$(HOME)/arch/nvhpc/scalapack-2.2.0-spectrum_mpi/lib/ -lscalapack -llapack  -lblas

We usually recommend using the ScaLAPACK shipped with the HPC SDK.
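For reference, a minimal sketch of what this section might look like when using the ScaLAPACK bundled with the HPC SDK. It assumes nvfortran's `-Mscalapack` link option, as used in the nvhpc templates distributed with VASP, and that the SDK's ScaLAPACK is compatible with the MPI library you link against; please check it against the template of your VASP version:

```makefile
# BLAS (mandatory)
BLAS        = -lblas

# LAPACK (mandatory)
LAPACK      = -llapack

# scaLAPACK (mandatory): link the ScaLAPACK bundled with the NVIDIA HPC SDK
# instead of a self-compiled copy (assumes nvfortran's -Mscalapack option)
SCALAPACK   = -Mscalapack
```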

Last but not least, and maybe most importantly: it seems you compiled FFTW with the HPC SDK and linked your VASP executable against this library:

# FFTW (mandatory)
FFTW_ROOT  ?= $(HOME)/arch/nvhpc/fftw-3.3.10-nv23.7/

If this is correct, please try compiling FFTW with gcc and linking VASP against that build instead.
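A minimal sketch of the corresponding makefile.include change, assuming FFTW 3.3.10 has been rebuilt with gcc and installed under the hypothetical prefix `$(HOME)/arch/gnu/fftw-3.3.10/`, and that the FFTW section follows the `FFTW_ROOT`/`LLIBS`/`INCS` pattern of the nvhpc template shipped with VASP:

```makefile
# FFTW (mandatory): point to an FFTW build compiled with gcc, not with nvhpc
# (hypothetical install prefix; adjust to your own installation)
FFTW_ROOT  ?= $(HOME)/arch/gnu/fftw-3.3.10/
LLIBS      += -L$(FFTW_ROOT)/lib -lfftw3
INCS       += -I$(FFTW_ROOT)/include
```

Because of `?=`, an `FFTW_ROOT` variable set in the environment still takes precedence, so make sure it does not point to the old nvhpc-compiled build when running make.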

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Tue Oct 10, 2023 12:32 pm
by sergey_lisenkov1
Here is the output of a good run: VASP compiled with GNU 11.2.1 and linked against OpenMPI, OpenBLAS and FFTW, all also compiled with GNU.

Re: Strange results from Nvidia compiler: vasp 6.4.2

Posted: Tue Oct 10, 2023 1:43 pm
by sergey_lisenkov1
I think your suggestion is correct: the problem was caused by the NVIDIA-compiled FFTW. When I use a GNU-compiled FFTW, the error disappears.

Thanks for your help!

Sergey