MLFF memory allocation issue when refitting
Posted: Mon Nov 24, 2025 11:05 am
by hao_tang1
Hello,
I've got the following message: "ERROR in BLEA_MB: Allocation of helping array for design matrix (FMAT_TRANS_HELP) did not work."
when I tried to refit an ML_AB for a large system with ML_MCONF=3000 and ML_MB=5540.
The total memory consumption reported in ML_LOGFILE is 26615.9.
I submitted the job with 80 cores across 20 nodes of an HPC cluster with 196 GB of RAM each.
Thanks
files.tgz
Re: MLFF memory allocation issue when refitting
Posted: Mon Nov 24, 2025 4:37 pm
by ferenc_karsai
Did you compile with -DscaLAPACK and -Duse_shmem?
If not, please do so; each of them brings down the memory requirement significantly.
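For reference, both flags go into the CPP_OPTIONS variable of makefile.include (a sketch only, not a complete file; your makefile.include will contain many more entries, and a full recompile is needed afterwards):

```makefile
# Sketch of the relevant part of makefile.include — not a complete file.
# After adding the flags, rebuild from scratch: make veryclean && make all
CPP_OPTIONS = -DMPI \
              -DscaLAPACK \
              -Duse_shmem
```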
If you compiled with both, then please share all relevant files (ML_AB, INCAR, POSCAR, POTCAR) according to the forum guidelines.
Re: MLFF memory allocation issue when refitting
Posted: Mon Nov 24, 2025 5:07 pm
by hao_tang1
Thank you Ferenc for your quick answer.
Yes, I compiled with both -DscaLAPACK and -Duse_shmem.
Because of their size (about 120 MB), the files (ML_AB, INCAR, POSCAR, POTCAR, as well as the makefile.include I used) can be downloaded at https://transfert.free.fr/7sBHsld
Many thanks
Re: MLFF memory allocation issue when refitting
Posted: Thu Nov 27, 2025 11:28 am
by andreas.singraber
Hello!
Thank you for providing the requested files. We had a closer look and concluded that the memory estimate looks reasonable. Compared to common use cases the memory demand is rather high because there are many atoms in your configurations (up to 637), so the calculated memory consumption for the design matrix (~20 GB per core) makes sense. Hence, the calculation should in principle fit on your machines (4 MPI ranks per node means ~80 GB per node for the design matrix, roughly 108 GB if everything else is included). There are two possible explanations we can think of:
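The back-of-the-envelope check above can be reproduced quickly (a sketch in Python; all numbers are taken from this thread):

```python
# Rough memory check using the figures quoted in this thread.
ranks_total = 80          # MPI ranks requested for the job
nodes = 20                # nodes reserved
node_ram_gb = 196         # RAM per node (first post)
design_per_rank_gb = 20   # ~20 GB per rank for the design matrix
total_per_node_gb = 108   # estimate with everything else included

ranks_per_node = ranks_total // nodes                     # 4 ranks per node
design_per_node_gb = ranks_per_node * design_per_rank_gb  # 80 GB per node

print(ranks_per_node, design_per_node_gb)  # 4 80
print(total_per_node_gb < node_ram_gb)     # True -> should fit in principle
```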
- Maybe the distribution of MPI ranks is incorrect: for example, if every node has more than 4 cores and they got filled up with ranks, the memory limit would be exceeded (some nodes would stay empty in that case).
- Another explanation would be additional users or programs running on the same nodes that also require substantial amounts of memory.
Could you please rule out both of these possibilities? Thank you very much!
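If the rank placement turns out to be the issue, it can usually be enforced explicitly in the Slurm batch header (a sketch only; the executable name is an assumption, and any partition/account lines your cluster requires are omitted):

```shell
#!/bin/bash
# Hypothetical Slurm header pinning exactly 4 MPI ranks per node
# across the 20 reserved nodes.
#SBATCH --nodes=20
#SBATCH --ntasks-per-node=4
#SBATCH --exclusive

srun vasp_std
```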
All the best,
Andreas Singraber
Re: MLFF memory allocation issue when refitting
Posted: Mon Dec 01, 2025 11:30 am
by hao_tang1
Hello Andreas,
Thank you very much.
I've carefully checked the MPI rank distribution. There are only 4 tasks on each node.
In addition, these nodes are exclusively reserved, so there are no additional users.
Attached you can find the Slurm script I used for job submission, as well as the corresponding slurm.out file.
slurm-files.tgz
Many thanks
Hao T.