Unknown VASP Error before entering SCF loop

#1 Post by **burakgurlek** » Mon Jun 30, 2025 3:55 pm

Hi,

I am simulating a rather small system with very demanding settings. The job terminated with Slurm exit code 125, as you can see in the attachment. I tried different parallelization schemes, a reduced number of bands, etc., but the job never ran for more than 6 minutes. Memory is likely the issue, but reducing the number of bands did not alleviate the problem.

Files.7z

I would appreciate any help.

Regards,
Burak

#2 Post by **andreas.singraber** » Wed Jul 02, 2025 2:37 pm

Hello Burak,

I could reproduce this behavior, and I am pretty sure that there is simply not enough memory for what needs to be allocated after an initialization period (6 min in your case). Did you try to measure the memory usage for a simpler setup (fewer k-points, smaller ENCUT, ...)?

Up until now I only performed rudimentary tests and I will need some more time to have a closer look. In the meantime, could you explain how you ended up with the current settings, in particular,

Code:

PREC = High
ALGO = Exact
ENCUT = 1000
NBANDS = 1160

and also the k-points

Code:

Automatic mesh
0              ! number of k-points = 0 ->automatic generation scheme
Gamma          ! generate a Gamma centered grid
24  4  6        ! subdivisions N_1, N_2 and N_3 along recipr. l. vectors
0. 0. 0.       ! shift of the mesh

Are you sure 24x4x6 is the right grid for the given structure? From the lattice vectors, I would rather choose something based on 4x2x1, i.e., the second number should be higher than the third one, right? Anyway, this may be unrelated to your problem.
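
As a rough illustration of the point about grid proportions: for an orthorhombic cell, the reciprocal lattice vector lengths scale as 1/a_i, so a mesh of roughly uniform density needs fewer subdivisions along longer cell edges. The Python sketch below works under that assumption only. The function name `suggest_mesh`, the `target` parameter, and the example edge lengths are hypothetical and are not taken from the attached files.

```python
def suggest_mesh(lengths, target):
    """Return k-mesh subdivisions roughly proportional to 1/length.

    lengths: real-space cell edge lengths in Angstrom (orthorhombic cell assumed).
    target:  hypothetical density parameter; each subdivision is about target / length.
    """
    return tuple(max(1, round(target / a)) for a in lengths)


# Hypothetical cell edges with ratio 1 : 2 : 4 (NOT the values from this thread).
print(suggest_mesh((5.0, 10.0, 20.0), target=20.0))
```

With these made-up edge lengths the sketch prints `(4, 2, 1)`: the longest edge gets the fewest subdivisions.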

All the best,
Andreas Singraber

#3 Post by **andreas.singraber** » Thu Jul 03, 2025 11:58 am

Hello again!

I performed a few more tests to give you an impression of how the memory scales with the number of k-points and the energy cutoff in your case. For this, I used a machine with 1 TB of memory and started 32 MPI ranks. The setup is similar to yours, but I used different k-grids and energy cutoffs. For each data point in the plot, I ran VASP only until the memory consumption equilibrated. The k-point grids were

6 k-points: 4 x 2 x 1
36 k-points: 8 x 4 x 2
260 k-points: 16 x 8 x 4

mem-vs-encut-kpoints.png

As you can see, there is no chance to run VASP with your current setup, even on this 1 TB machine. Maybe you can run similar tests on your machine and see whether some reduced settings are still acceptable?
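
One possible way to collect such numbers from a series of test runs is to read the memory summary that VASP writes to its output, a line of the form `total amount of memory used by VASP MPI-rank0 ... kBytes`. The Python sketch below is only an illustration: the helper name `rank0_memory_gb` is hypothetical, 1 kByte is assumed to be 1000 bytes, and the value covers rank 0 only, so it is not the same as a system-level measurement of the whole job.

```python
import re

# Pattern for the rank-0 memory summary line printed by VASP.
_MEM_LINE = re.compile(
    r"total amount of memory used by VASP MPI-rank0\s+([0-9.]+)\s*kBytes"
)


def rank0_memory_gb(output_text):
    """Return the rank-0 memory reported by VASP in GB (10^9 bytes), or None."""
    match = _MEM_LINE.search(output_text)
    if match is None:
        return None  # the run may have stopped before the summary was written
    kbytes = float(match.group(1))
    return kbytes * 1e3 / 1e9  # assumption: 1 kByte = 1000 bytes


sample = " total amount of memory used by VASP MPI-rank0 29394053. kBytes"
print(rank0_memory_gb(sample))
```

Returning `None` when the line is missing keeps a scan over many test directories from failing on runs that crashed before the summary was printed.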

All the best,
Andreas Singraber

#4 Post by **burakgurlek** » Fri Jul 04, 2025 9:45 am

Dear Andreas,

Thanks for looking into this in detail. I see your point, and I also ran a few tests on my side. Just to clarify, I would like to ask a few questions:

1- In your plot, is the memory consumption given per core/task? Otherwise, none of the scenarios reaches 1 TB. You had 32 MPI tasks, so you have approximately 30 GB per task.

2- Previously, I was estimating the memory requirement from the VASP manual: https://www.vasp.at/wiki/index.php/Memory_requirements
However, the number I get from it differs from what VASP reports (assuming KPAR = 1, to remove extra load). For example, for a 12x4x4 k-point grid and ENCUT = 1000, I have

Code:

 Dimension of arrays:
   k-points           NKPTS =    100   k-points in BZ     NKDIM =    100   number of bands    NBANDS=   1160
   number of dos      NEDOS =   2000   number of ions     NIONS =     16
   non local maximal  LDIM  =     11   non local SUM 2l+1 LMDIM =     41
   total plane-waves  NPLWV = 190512
   max r-space proj   IRMAX =      1   max aug-charges    IRDMAX=  23165
   dimension x,y,z NGX =    28 NGY =   54 NGZ =  126
   dimension x,y,z NGXF=    56 NGYF=  108 NGZF=  252
   support grid    NGXF=    56 NGYF=  108 NGZF=  252

This amounts to

Code:

NKDIM*NBANDS*NRPLWV*16

= 354 GB. I assume this is the estimated total memory (the memory requirement for NGXF etc. is much smaller). However, VASP prints

Code:

--------------------------------------- Iteration      1(   1)  ---------------------------------------


    POTLOK:  cpu time      0.0253: real time      0.0253
    SETDIJ:  cpu time      0.0053: real time      0.0053

 total amount of memory used by VASP MPI-rank0 29394053. kBytes
=======================================================================

   base      :      30000. kBytes
   nonl-proj :     323591. kBytes
   fftplans  :       2529. kBytes
   grid      :       4596. kBytes
   one-center:        161. kBytes
   wavefun   :   29033176. kBytes

How should I understand the difference?

3- For the estimate you did, what do you mean by "until the memory consumption equilibrated"? Did you read the memory consumption from the system or from the VASP output? In most cases, I cannot even see the estimate from VASP. VASP prints one estimate before the SCF loop starts and another just after it.

4- Can you comment on the effect of parallelizing this over many nodes? The memory consumption is due to wavefunction storage. Does it get reduced when the job is spread over many nodes? For example, if my estimate were 3 TB, could I run this on four 1 TB nodes, assuming KPAR = 1? I think KPAR ≠ 1 will bring extra scaling.
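
For reference, the arithmetic behind question 2 can be reproduced with the short Python sketch below. It plugs the printed values into the wiki formula and compares the result with the `wavefun` entry of the rank-0 summary. Two assumptions should be flagged: the 354 GB figure is obtained by using NPLWV = 190512 (which equals NGX*NGY*NGZ = 28*54*126) in place of NRPLWV, which the quoted output does not list, and 1 kByte is taken to be 1000 bytes. The function name is illustrative only.

```python
def wavefunction_bytes(nkdim, nbands, nplw, bytes_per_coeff=16):
    """Wiki-style estimate: NKDIM * NBANDS * (plane waves) * 16 bytes."""
    return nkdim * nbands * nplw * bytes_per_coeff


# Values copied from the "Dimension of arrays" printout above.
NKDIM, NBANDS, NPLWV = 100, 1160, 190512
assert NPLWV == 28 * 54 * 126  # NPLWV is the product NGX*NGY*NGZ

total_gb = wavefunction_bytes(NKDIM, NBANDS, NPLWV) / 1e9
# "wavefun" line of the rank-0 summary, assuming 1 kByte = 1000 bytes.
rank0_wavefun_gb = 29033176 * 1e3 / 1e9

print(f"estimate with NPLWV: {total_gb:.1f} GB")
print(f"rank-0 wavefun:      {rank0_wavefun_gb:.1f} GB")
print(f"ratio:               {total_gb / rank0_wavefun_gb:.1f}")
```

The ratio comes out at about 12. If the wavefunction storage were split evenly over the MPI ranks, this ratio would be close to the number of ranks used; whether that is the case here is not confirmed in this thread.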

Best Regards,
Burak
