GW0: problems with CALCULATE_XI_REAL and memory insufficiency

pascal_boulet1
Newbie
Posts: 26
Joined: Thu Nov 14, 2019 7:38 pm

GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#1 Post by pascal_boulet1 » Thu Apr 18, 2024 12:57 pm

Dear all,

I am using VASP 6.4.2, and I am trying to calculate the dielectric function using GW0. I don’t think my system is too big for this kind of calculation: 79 occupied orbitals and 10 k-points (Gamma-centered 4x4x1 grid).

I have performed an exact diagonalization of the Hamiltonian with 128 bands. Now I am trying to calculate the dielectric function, following the tutorial on Si.

I am facing two problems: one is the message “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.”; the other is insufficient memory.

The INCAR is the following:

Code: Select all

ALGO = EVGW0R
LREAL = auto
LOPTICS = .TRUE.
LSPECTRAL = .TRUE.
LPEAD = .TRUE.
NOMEGA = 12
NBANDS = 128
NELMGW = 4
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1e-8
KPAR = 8

With this input I get “CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.” and the job stops. Although I understand the message, I am not sure which keyword it relates to. Could you please help me with this?

Now, if I switch to ALGO = EVGW0, the crash disappears. However, with ALGO = EVGW0 I instead run into a memory shortage.

I am using an HPC supercomputer with 8 nodes, 2 processors per node, 64 cores per processor, and 256 GB RAM per node. So that is 2 GB/core, and I should have over 2 TB of RAM in total for the job.
I am using MPI+OpenMP: 256 MPI ranks with 4 threads per rank. Using pure MPI leads to the same result.

In the OUTCAR I have the following information:

Code: Select all

running 256 mpi-ranks, with 4 threads/rank, on 8 nodes
distrk: each k-point on 32 cores, 8 groups
distr: one band on NCORE= 1 cores, 32 groups

total amount of memory used by VASP MPI-rank0 72479. kBytes
available memory per node: 6.58 GB, setting MAXMEM to 6742
...
OPTICS: cpu time 18.2553: real time 4.9331
...
files read and symmetry switched off, memory is now:
total amount of memory used by VASP MPI-rank0 116900. kBytes
...
| This job will probably crash, due to insufficient memory available. |
| Available memory per mpi rank: 6742 MB, required memory: 6841 MB. |

min. memory requirement per mpi rank 6841.5 MB, per node 218927.6 MB

Nothing more.

Note that the command “cat /proc/meminfo | grep MemAvailable” gives “252465220 kB”, but in the log file I get:

available memory per node: 6.58 GB, setting MAXMEM to 6742

The figures in the OUTCAR and the log file look contradictory to what I get from meminfo.

Is there something wrong in my settings, or something I misunderstand about the memory management?

Thank you for your help and time,
Pascal

michael_wolloch
Global Moderator
Posts: 58
Joined: Tue Oct 17, 2023 10:17 am

Re: GW0: problems with CALCULATE_XI_REAL and memory insufficiency

#2 Post by michael_wolloch » Thu Apr 18, 2024 1:53 pm

Hi Pascal,

Please provide a minimal reproducible example of your problem when you post on the forum.

From what I can see from your post, you are having trouble with KPAR.
You set

Code: Select all

KPAR = 8
and the error you receive warns you that KPAR>1 is not implemented:

Code: Select all

CALCULATE_XI_REAL: KPAR>1 not implemented, sorry.
The solution is to set KPAR = 1, which is also the default. However, the cubic-scaling GW routines do require more memory than the old GW routines, so while the error you get will disappear, the memory problem may persist. In that case, you should switch back to the old GW routines (ALGO = EVGW0), but still get rid of KPAR, or at least set it to a lower value (4 or 2).
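
For reference, a minimal sketch of a corrected INCAR, based on the input you posted (only the KPAR line changes; switch ALGO back to EVGW0 if memory remains a problem):

Code: Select all

ALGO = EVGW0R    ! or EVGW0 if the cubic-scaling routines need too much memory
LREAL = auto
LOPTICS = .TRUE.
LSPECTRAL = .TRUE.
LPEAD = .TRUE.
NOMEGA = 12
NBANDS = 128
NELMGW = 4
ISMEAR = 0
SIGMA = 0.05
EDIFF = 1e-8
KPAR = 1         ! the default; you can also simply remove this line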

If you use KPAR, memory requirements increase. In your case you set KPAR=8, so e.g. 16 k-points would be split into 8 groups of cores that work on 2 k-points each. However, all of those groups work on all orbitals and have to keep a copy of all orbitals in memory, so your memory requirements increase by roughly a factor of KPAR! By setting KPAR=8, you effectively negate the additional memory you gain from adding compute nodes, because you have to store 8 times as much data (see the schematic estimate below).
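
Schematically, assuming the orbitals dominate the memory footprint and a full set of orbitals occupies M bytes (a rough estimate, not VASP's exact accounting):

Code: Select all

KPAR = 1:  total orbital storage across the machine ~ 1 x M   (one distributed copy)
KPAR = 8:  total orbital storage across the machine ~ 8 x M   (one copy per k-point group)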

If your problem is memory per core, you can increase it by decreasing the number of cores you use, e.g. fill every node with only 64 MPI ranks, but make sure that they are evenly distributed over both sockets. Now you have 4 GB per rank instead of 2. This will also increase the memory bandwidth per core, which is often a bottleneck for VASP on machines with very high core counts. A possible job script is sketched below.
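
For example, with a Slurm batch system (a sketch only; the exact flags depend on your scheduler and MPI library, and vasp_std stands in for whatever your VASP binary is called):

Code: Select all

#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=64   # 64 MPI ranks per node: 256 GB / 64 = 4 GB per rank
#SBATCH --cpus-per-task=2      # 2 OpenMP threads per rank uses all 128 cores per node

export OMP_NUM_THREADS=2
export OMP_PLACES=cores        # pin threads to cores
export OMP_PROC_BIND=close     # keep each rank's threads together, so ranks spread over both sockets

srun vasp_std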

Make sure to read up on GW calculations here!

Let me know if this helps,
Michael
