Dear Andreas,
Thanks for looking into this in detail. I see your point, and I also ran a few tests on my side. To clarify, I have a few questions:
1- In your plot, is the memory consumption per core/task? Otherwise, none of the scenarios reach 1 TB. You had 32 MPI tasks, so per task you have approximately 30 GB.
2- Previously, I was estimating the memory requirement from the VASP manual: https://www.vasp.at/wiki/index.php/Memory_requirements
However, the number I get there is different from what VASP estimates (assuming KPAR = 1, to remove extra load). For example, with a 12x4x4 k-point mesh and ENCUT = 1000, I have
Dimension of arrays:
k-points NKPTS = 100 k-points in BZ NKDIM = 100 number of bands NBANDS= 1160
number of dos NEDOS = 2000 number of ions NIONS = 16
non local maximal LDIM = 11 non local SUM 2l+1 LMDIM = 41
total plane-waves NPLWV = 190512
max r-space proj IRMAX = 1 max aug-charges IRDMAX= 23165
dimension x,y,z NGX = 28 NGY = 54 NGZ = 126
dimension x,y,z NGXF= 56 NGYF= 108 NGZF= 252
support grid NGXF= 56 NGYF= 108 NGZF= 252
This amounts to
NKDIM × NBANDS × NPLWV × 16 bytes = 100 × 1160 × 190512 × 16 bytes ≈ 354 GB. I assume this is the estimated total memory (the memory requirement for NGXF etc. is much smaller). However, VASP prints
--------------------------------------- Iteration 1( 1) ---------------------------------------
POTLOK: cpu time 0.0253: real time 0.0253
SETDIJ: cpu time 0.0053: real time 0.0053
total amount of memory used by VASP MPI-rank0 29394053. kBytes
=======================================================================
base : 30000. kBytes
nonl-proj : 323591. kBytes
fftplans : 2529. kBytes
grid : 4596. kBytes
one-center: 161. kBytes
wavefun : 29033176. kBytes
How should I understand the difference?
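To make the comparison concrete, here is a minimal Python sketch of the arithmetic. It assumes the estimate is NKDIM × NBANDS × NPLWV × 16 bytes (my reading of the wiki formula, using NPLWV from the output above) and that 1 kByte in the VASP output means 1000 bytes. The function names are my own and are not part of VASP.

```python
# Assumption: 16 bytes per double-precision complex coefficient.
BYTES_PER_COEFF = 16

# Values copied from the "Dimension of arrays" excerpt above.
NKDIM = 100
NBANDS = 1160
NPLWV = 190512

# Memory breakdown for MPI rank 0, copied from the VASP output above.
MEMORY_BLOCK = """\
base : 30000. kBytes
nonl-proj : 323591. kBytes
fftplans : 2529. kBytes
grid : 4596. kBytes
one-center: 161. kBytes
wavefun : 29033176. kBytes
"""


def wavefunction_estimate_bytes(nkdim, nbands, nplwv, bytes_per_coeff=BYTES_PER_COEFF):
    """Estimate assumed here: k-points x bands x plane waves x bytes per coefficient."""
    return nkdim * nbands * nplwv * bytes_per_coeff


def parse_memory_block(text):
    """Return {entry name: kBytes} for each 'name : value kBytes' line."""
    usage = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'name : value' pair
        usage[name.strip()] = float(rest.split()[0])
    return usage


estimate_gb = wavefunction_estimate_bytes(NKDIM, NBANDS, NPLWV) / 1e9
usage_kb = parse_memory_block(MEMORY_BLOCK)
rank0_total_kb = sum(usage_kb.values())
# Assumption: 1 kByte = 1000 bytes when converting to GB.
wavefun_rank0_gb = usage_kb["wavefun"] * 1e3 / 1e9

print(f"manual estimate       : {estimate_gb:.1f} GB")
print(f"rank 0 total          : {rank0_total_kb:.0f} kBytes")
print(f"rank 0 wavefun        : {wavefun_rank0_gb:.1f} GB")
print(f"estimate / rank0 share: {estimate_gb / wavefun_rank0_gb:.1f}")
```

The individual entries add up to the rank 0 total that VASP prints, and almost all of it is the wavefun entry, so that is the number I am comparing against my estimate.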
3- For the estimate you did, what do you mean by "until the memory consumption equilibrated"? Did you read the memory consumption from the system or from the VASP output? In most cases, I cannot even see the estimate from VASP. VASP prints one estimate before the SCF starts and one just after it.
4- Can you comment on the effect of parallelizing this over many nodes? The memory consumption is due to wavefunction storage. Does this get reduced over many nodes? For example, if my estimate were 3 TB, could I run this on four 1 TB nodes, assuming KPAR = 1? I think KPAR ≠ 1 will bring extra scaling.
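To spell out what I have in mind with the node count, here is a small Python sketch. It rests on an idealized assumption that I am not sure holds for VASP: the total memory is split perfectly evenly over the nodes, with nothing replicated per rank. The 10% headroom and the function names are my own illustrative choices.

```python
def per_node_share_tb(total_tb, n_nodes):
    """Memory per node under the assumption of a perfectly even split."""
    return total_tb / n_nodes


def fits_on_nodes(total_tb, n_nodes, node_capacity_tb, headroom_fraction=0.1):
    """True if the even share fits after reserving some headroom per node.

    headroom_fraction is a hypothetical safety margin for the OS and for
    anything that is not distributed; it is not a VASP parameter.
    """
    usable_tb = node_capacity_tb * (1.0 - headroom_fraction)
    return per_node_share_tb(total_tb, n_nodes) <= usable_tb


# The case from my question: 3 TB estimated, four nodes with 1 TB each.
print(per_node_share_tb(3.0, 4))
print(fits_on_nodes(3.0, 4, 1.0))
```

If the even split does not hold, for example because part of the storage is duplicated on every rank, this simple check would be too optimistic, which is exactly what I would like to understand.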
Best Regards,
Burak