Hello,
Thanks for uploading the files.
I noticed that the POTCAR in your Interface directory does not match the POSCAR.
It contains:
Na, P, S, and S pseudopotentials (matching your first bulk system with the two sulfurs with different oxidation states.)
However, your POSCAR has:
Na, P, S, Si (this should probably be Na, P, S, S8, Si, to also match the ML_AB file you are using).
So I would append the Si POTCAR, and make sure that the POSCAR you are using for the interface is set up to also split the sulfur atoms with respect to oxidation state!
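A quick way to catch this kind of mismatch before submitting a job is to compare the species line of the POSCAR with the dataset headers in the POTCAR. The following is only a minimal sketch: the helper names are hypothetical, it assumes a VASP 5 style POSCAR with the species names on line 6, it assumes custom labels such as S8 start with the element symbol, and the example file contents are illustrative placeholders, not your actual files.

```python
import re

def potcar_elements(potcar_text):
    """Return element symbols in the order their datasets appear in a POTCAR."""
    elements = []
    for line in potcar_text.splitlines():
        # Each dataset has a header line such as: TITEL  = PAW_PBE Na_pv 19Sep2006
        if "TITEL" in line:
            label = line.split("=", 1)[1].split()[1]  # e.g. "Na_pv"
            elements.append(label.split("_")[0])      # e.g. "Na"
    return elements

def poscar_elements(poscar_text):
    """Return element symbols from the species line (line 6) of a VASP 5 POSCAR."""
    names = poscar_text.splitlines()[5].split()
    # Assumption: a custom label such as "S8" starts with the element symbol.
    return [re.match(r"[A-Z][a-z]?", name).group(0) for name in names]

def find_mismatches(poscar_text, potcar_text):
    """List differences between the POSCAR species and the POTCAR datasets."""
    pos = poscar_elements(poscar_text)
    pot = potcar_elements(potcar_text)
    problems = []
    if len(pos) != len(pot):
        problems.append(f"POSCAR has {len(pos)} types, POTCAR has {len(pot)}")
    for i, (a, b) in enumerate(zip(pos, pot), start=1):
        if a != b:
            problems.append(f"type {i}: POSCAR {a}, POTCAR {b}")
    return problems

# Illustrative placeholder inputs (not the real files from this thread).
EXAMPLE_POSCAR = "\n".join([
    "interface (illustrative)",
    "1.0",
    "10.0 0.0 0.0",
    "0.0 10.0 0.0",
    "0.0 0.0 10.0",
    "Na P S Si",
    "1 1 1 1",
    "Direct",
])
EXAMPLE_POTCAR = "\n".join([
    "   TITEL  = PAW_PBE Na_pv 19Sep2006",
    "   TITEL  = PAW_PBE P 06Sep2000",
    "   TITEL  = PAW_PBE S 06Sep2000",
    "   TITEL  = PAW_PBE S 06Sep2000",
])

print(find_mismatches(EXAMPLE_POSCAR, EXAMPLE_POTCAR))
```

With real files you would read POSCAR and POTCAR from disk and pass their contents to find_mismatches; an empty list means the order and number of species agree.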
When I run your calculation with the input you gave me, it reports in the stdout:
WARNING: type information on POSCAR and POTCAR are incompatible
POTCAR overwrites the type information in POSCAR
typ 4 type information: Si S
So this fix is important for getting correct results, but it is not the source of the crash: even after fixing it, I run out of memory immediately with more than 16 MPI ranks on a node with 500 GB of RAM.
A couple of other notes:
- ML_ISTART is deprecated and replaced by ML_MODE. You are using both, which is unnecessary and a bit confusing. Get rid of ML_ISTART!
- ENCUT=500 seems a bit high when the highest ENMAX in your POTCAR is 258. 350 should be enough. Not a mistake, but probably inefficient.
- Your smearing with SIGMA is very small. This can be bad for electronic convergence. I would increase it to 0.1 unless you have a specific reason for keeping it so low, although the tests I ran converge reasonably well with your setting.
- You set ML_MCONF = 6000 in your INCAR file. This increases memory consumption and is part of your problem. Without it, the default would be the value from ML_AB + 1500 = 3703 + 1500 = 5203. Since the memory demand grows linearly with ML_MCONF, this should be avoided unless you have a very specific reason for increasing the parameter. Since you are running at the same temperature, it is doubtful that you will need so many additional configurations.
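To put the ML_MCONF numbers above into perspective, here is the arithmetic as a small sketch. It only uses the values quoted in this post, and the resulting factor applies only to the arrays whose size scales with ML_MCONF, not to the total memory of the job.

```python
# Values quoted in the post above.
n_configs_in_ml_ab = 3703   # configurations already stored in ML_AB
default_extra = 1500        # added on top of ML_AB when ML_MCONF is not set
requested_mconf = 6000      # value set explicitly in the INCAR

default_mconf = n_configs_in_ml_ab + default_extra

# Linear scaling: the ML_MCONF-dependent arrays grow by this factor.
growth_factor = requested_mconf / default_mconf

print(default_mconf)             # 5203
print(round(growth_factor, 3))   # 1.153
```

So the explicit setting makes the ML_MCONF-dependent arrays roughly 15 % larger than they would be with the default.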
Now for the main issue, the unexpectedly large memory consumption:
Your ML_LOGFILE gives you an estimate for total memory consumption of about 10.5 GB, so it should fit comfortably on your machine with 700 GB RAM. However, this number is only a rough estimate, AND it is only for the master MPI rank (rank 0). Since VASP is parallelized mainly over MPI, a lot of the arrays are allocated separately for each MPI rank, and memory consumption scales with the number of MPI ranks.
This can be circumvented in part by compiling the code with shared memory support. In your case, the most relevant option is the one that reduces the memory footprint for MLFF and GW calculations.
If you have not already, it also helps to use the _gam version of VASP, since you are only dealing with the Gamma point in your system.
I was able to run your system for several hundred steps without issues on 16 MPI ranks with the standard executable, using a maximum of 490 GB of RAM. After recompiling with shared memory support and using the gamma-only version, I was able to run on 64 MPI ranks using only 441 GB of RAM. So I am confident that you will be able to run it easily on your machine with 48 MPI ranks once you enable shared memory support.
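As a rough cross-check of these numbers, you can extrapolate the measured totals to 48 ranks. This is only a back-of-the-envelope sketch with a hypothetical helper: it assumes memory scales strictly linearly with the number of MPI ranks, which is a crude simplification, and it uses the 700 GB mentioned above as the size of your machine.

```python
def extrapolate_memory_gb(measured_gb, measured_ranks, target_ranks):
    """Linearly scale a measured total memory usage to another rank count.

    Assumption: every rank uses the same amount of memory, so the total
    is proportional to the number of MPI ranks.
    """
    return measured_gb / measured_ranks * target_ranks

machine_ram_gb = 700   # RAM of the target machine, as quoted in the post
target_ranks = 48

# Standard executable: 490 GB measured on 16 ranks.
standard_gb = extrapolate_memory_gb(490, 16, target_ranks)

# Shared memory build, gamma-only version: 441 GB measured on 64 ranks.
shared_gb = extrapolate_memory_gb(441, 64, target_ranks)

print(standard_gb, standard_gb <= machine_ram_gb)   # 1470.0 False
print(shared_gb, shared_gb <= machine_ram_gb)       # 330.75 True
```

Under this simple model the standard executable would need roughly twice the available RAM at 48 ranks, while the shared memory build stays well below the limit.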
Let me know if this resolves your problems.
cheers, Michael