VASP 6.3.0 crashes several nodes on lonestar 6-Update

nicholas_dimakis1
Newbie
Posts: 17
Joined: Tue Sep 15, 2020 3:36 pm

VASP 6.3.0 crashes several nodes on lonestar 6-Update

#1 Post by nicholas_dimakis1 » Mon Jun 06, 2022 8:18 pm

Hello

I am running a job on the TACC Lonestar6 supercomputer, and VASP suddenly seems to use 150 GB of RAM per node, up from 50 GB before, which crashes some of the nodes. All files are attached. The structure contains 142 atoms and the run is a band structure calculation.

I am using VASP 6.3.0 on 20 nodes with 640 CPUs in total.

Thank you, Nick

martin.schlipf
Global Moderator
Posts: 456
Joined: Fri Nov 08, 2019 7:18 am

Re: VASP 6.3.0 crashes several nodes on lonestar 6-Update

#2 Post by martin.schlipf » Tue Jun 07, 2022 11:45 am

I think the 150 GB is expected. The OUTCAR file reports ~5 GB per core, and you use 32 cores per node, which comes to roughly 160 GB per node. If you want to reduce the memory, you can reduce the number of k-points by splitting the band structure calculation into separate runs for the different lines. KPAR also increases the memory demand, so you may want to use NCORE instead.

Code:

 total amount of memory used by VASP MPI-rank0  5551890. kBytes
=======================================================================

   base      :      30000. kBytes
   nonl-proj :    4339312. kBytes
   fftplans  :      28532. kBytes
   grid      :     250149. kBytes
   one-center:       4416. kBytes
   wavefun   :     899481. kBytes
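
The per-node estimate can be checked with a few lines of Python. This is only a rough sketch: it assumes every rank on a node uses about as much memory as rank 0 (which the OUTCAR does not guarantee) and takes 1 kByte as 1000 bytes. The helper names `rank_memory_kbytes` and `node_memory_gb` are made up for illustration and are not part of VASP.

```python
import re

# Summary line copied from the OUTCAR excerpt above.
OUTCAR_LINE = " total amount of memory used by VASP MPI-rank0  5551890. kBytes"


def rank_memory_kbytes(line: str) -> float:
    """Extract the kBytes figure from the rank-0 memory summary line."""
    match = re.search(
        r"total amount of memory used by VASP MPI-rank0\s+([\d.]+)\s+kBytes", line
    )
    if match is None:
        raise ValueError("no rank-0 memory summary found in this line")
    return float(match.group(1))


def node_memory_gb(kbytes_per_rank: float, ranks_per_node: int) -> float:
    """Scale the rank-0 figure to a whole node.

    Assumption: all ranks on the node need about the same memory as rank 0,
    and 1 kByte = 1000 bytes.
    """
    return kbytes_per_rank * ranks_per_node / 1e6


per_rank = rank_memory_kbytes(OUTCAR_LINE)
# 640 CPUs on 20 nodes gives 32 ranks per node.
print(f"{node_memory_gb(per_rank, ranks_per_node=32):.0f} GB per node (rough estimate)")
```

Most of the per-rank total comes from the nonl-proj entry, so that is the number to watch when changing the k-point set or the parallelization tags.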

Also, I found two issues in your INCAR file: you use a tab instead of spaces after LSCALAPACK, so this tag is ignored, and you give the ICHARG tag twice. Note that VASP prints warnings about both of these issues to the output.
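
Both kinds of problem can be caught before submitting a job. The following is a minimal sketch of such a check, not a VASP tool: the function name `check_incar` and the sample INCAR text are hypothetical (the attached files are not shown here), and the parsing is deliberately simplified to `TAG = value` statements, `;` separators, and `#` or `!` comments.

```python
def check_incar(text: str) -> list[str]:
    """Report tab characters and repeated tags in INCAR-style text."""
    problems = []
    first_seen = {}  # tag name -> line number where it was first set
    for lineno, raw in enumerate(text.splitlines(), start=1):
        # Drop trailing comments introduced by '#' or '!'.
        line = raw.split("#", 1)[0].split("!", 1)[0]
        if "\t" in line:
            problems.append(f"line {lineno}: tab character, use spaces instead")
        # Several statements may share a line, separated by ';'.
        for statement in line.split(";"):
            if "=" not in statement:
                continue
            tag = statement.split("=", 1)[0].strip().upper()
            if not tag:
                continue
            if tag in first_seen:
                problems.append(
                    f"line {lineno}: {tag} already set on line {first_seen[tag]}"
                )
            else:
                first_seen[tag] = lineno
    return problems


# Hypothetical INCAR fragment that reproduces the two issues described above.
sample = "ICHARG = 11\nLSCALAPACK\t= .FALSE.\nISPIN = 2\nICHARG = 11\n"
for problem in check_incar(sample):
    print(problem)
```

The warnings that VASP itself prints remain the authoritative check; a script like this only helps to spot the issues earlier.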

One last thing: do you expect the system to be magnetic? If not, you should not set ISPIN = 2; without it, the calculation will be twice as efficient.
