BSE in supercells
Posted: Wed Jan 29, 2025 11:05 am
by kaynat_kalvi
Hi, I have a cubic system with 216 identical atoms, and I want to calculate its dielectric function using DFT+mBSE. I am keeping the number of k-points low to reduce memory and calculation time, and I am using 2 nodes with 128 CPUs per node. As BSE doesn't support KPAR, I want to optimize my calculation and need suggestions for that. Please guide me through this.
Re: BSE in supercells
Posted: Wed Jan 29, 2025 11:39 am
by alexey.tal
There are a number of things that one can do to optimize the performance of BSE calculations for large cells. We have also optimized our BSE algorithm in VASP 6.5.0. Do you have access to VASP 6.5?
Could you please provide the input files for your system?
Re: BSE in supercells
Posted: Tue Feb 04, 2025 1:53 pm
by alexey.tal
Below I will list the recommendations for optimizing the BSE performance for large supercells in VASP 6.5.
There are two steps in the BSE calculation: setup of the Hamiltonian and the diagonalization.
- The most efficient way to compute the spectrum is the Lanczos iterative algorithm (IBSE=3).
- If VASP is compiled with the flag
, the Hamiltonian is stored and solved in single precision, which usually provides sufficient accuracy but requires half of the memory and compute time.
- It is possible to use OMEGAMAX in BSE to exclude the transitions beyond a given energy range, thus minimizing the rank of the Hamiltonian.
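To get a feeling for why the rank and the precision matter, here is a rough back-of-the-envelope sketch in Python. It assumes that the Hamiltonian is stored as a dense complex matrix of rank NKPTS x (occupied bands) x (unoccupied bands), i.e., within the Tamm-Dancoff approximation, and it ignores all other arrays. The function name and the band counts are made up for illustration; they are not VASP output.

```python
def bse_matrix_memory_gib(nkpts, n_occ, n_unocc, single_precision=False):
    """Rough memory estimate for a dense BSE Hamiltonian.

    Assumptions (not taken from VASP): Tamm-Dancoff approximation,
    one complex number per matrix element, nothing else is counted.
    """
    rank = nkpts * n_occ * n_unocc
    # complex single = 8 bytes, complex double = 16 bytes
    bytes_per_element = 8 if single_precision else 16
    return rank, rank * rank * bytes_per_element / 2**30


# Hypothetical example: 8 k-points, 20 occupied and 20 unoccupied bands
rank, mem_double = bse_matrix_memory_gib(8, 20, 20)
_, mem_single = bse_matrix_memory_gib(8, 20, 20, single_precision=True)
print(f"rank = {rank}, double: {mem_double:.3f} GiB, single: {mem_single:.3f} GiB")
```

Since the memory grows with the square of the rank, removing transitions (fewer bands, or a smaller OMEGAMAX if the spectrum stays converged) pays off much more than the factor of two gained from single precision.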
Setting up the matrix:
- The calculation of the BSE Hamiltonian can scale up to NKPTS*(NKPTS+1)/2 CPU cores; beyond that, the computational load cannot be distributed evenly. However, in the case of large supercells, the number of k-points might be quite small. For example, with a 2x2x2 k-mesh (NKPTS=8), the maximum number of cores over which this calculation can be distributed efficiently is 36. Thus, it is quite important to use the tags NBSEBLOCKO and/or NBSEBLOCKV to divide the bands into groups that can be calculated in parallel. For example, if the blocking factor divides the occupied bands into two blocks, the computation can be evenly divided over NKPTS*2*(NKPTS*2+1)/2, or 136, CPU cores. We do not recommend blocking the bands if the number of CPU cores does not exceed NKPTS*(NKPTS+1)/2.
- KPAR can and should be used with IBSE=1, 2, or 3. The best efficiency is achieved when KPAR equals the number of MPI ranks, i.e., when a copy of all orbitals is stored on every MPI rank. This, however, dramatically increases the memory requirements for large supercells, so KPAR often has to be limited to a small value or 1.
- It is often sufficient to use a lower-accuracy setting for PRECFOCK, which can significantly reduce the compute time for the FFTs, the most demanding part of calculations with large cells.
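The core-count arithmetic for the Hamiltonian setup can be sketched in a few lines of Python. This is only an illustration of the NKPTS*(NKPTS+1)/2 formula above. The function names and the `n_blocks` argument are hypothetical: `n_blocks` stands for the number of band blocks and is not necessarily the value one would assign to NBSEBLOCKO or NBSEBLOCKV.

```python
def max_balanced_cores(nkpts, n_blocks=1):
    """Upper limit of CPU cores for an even load: n*(n+1)/2,
    with n = NKPTS * (number of band blocks)."""
    n = nkpts * n_blocks
    return n * (n + 1) // 2


def blocking_worthwhile(n_cores, nkpts):
    """Blocking is only suggested when the number of cores exceeds
    the unblocked limit NKPTS*(NKPTS+1)/2."""
    return n_cores > max_balanced_cores(nkpts)


print(max_balanced_cores(8))         # 2x2x2 mesh, no blocking -> 36
print(max_balanced_cores(8, 2))      # occupied bands in two blocks -> 136
print(blocking_worthwhile(256, 8))   # e.g. 2 nodes x 128 cores -> True
```

For a 2x2x2 mesh on 2 nodes with 128 cores each, the unblocked limit of 36 cores is far below the 256 cores available, so this is the situation in which blocking the bands is expected to help.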
Furthermore, our BSE code is ported to Nvidia GPUs, so BSE calculations for large cells can be run very efficiently on GPUs. The Lanczos algorithm currently doesn't support GPU offloading, but the time-evolution BSE (IBSE=1) and the exact diagonalization algorithm (IBSE=2) can be run fully on GPUs. The time-evolution algorithm is somewhat slower than the Lanczos algorithm, but it nevertheless outperforms exact diagonalization for large BSE matrices.
Let me know if something is unclear or if you have further questions on BSE.
Re: BSE in supercells
Posted: Mon Feb 10, 2025 10:16 am
by kaynat_kalvi
Currently, I have access to VASP 5. Could the DFT+mBSE calculation be done in VASP 5 for the 216-atom supercell if I restrict it to the 2 eV regime, e.g., by setting OMEGAMAX=2 or 1.5?
Right now, I don't have access to GPUs either.
Do you have suggestions for running on VASP 5 with 128 CPU cores?
Re: BSE in supercells
Posted: Mon Feb 10, 2025 11:36 am
by kaynat_kalvi
When I try DFT+mBSE on an 8-atom supercell with NCORE=4, the error file says that LPEAD doesn't support NCORE != 1. Does that mean these calculations don't support parallelization and I can't use NPAR, NCORE, or KPAR here?
I think DFT+mBSE is more suited for my system instead of GW+BSE, as GW for 216 atom supercell would be very costly, please comment on this as well.
My last confusion is about the two different mBSE methods published by VASP, one ( ... c_function) and a second one ( ... lculations). The first one works well for a unit cell, so I was using that. But the second one contains many tags that are not present in the first example for Si. However, I feel that the example of improving the dielectric function of Si is the version needed for all calculations. Is that right?
Re: BSE in supercells
Posted: Tue Feb 11, 2025 8:39 am
by alexey.tal
Currently, I have access to VASP 5. Could the DFT+mBSE calculation be done in VASP 5 for the 216-atom supercell if I restrict it to the 2 eV regime, e.g., by setting OMEGAMAX=2 or 1.5?
I would recommend using VASP 6.5.0 where the performance of the BSE driver has been largely optimized. In VASP 5, the parallelization over bands is less efficient and tags NBSEBLOCKO and NBSEBLOCKV cannot be used. Furthermore, the most efficient diagonalization algorithm IBSE=3 is not available in VASP 5.
You can use OMEGAMAX, but keep in mind that OMEGAMAX excludes transitions beyond the given range, so you need to check the convergence with respect to this parameter. It is likely that OMEGAMAX=2 would not be enough to get a converged spectrum up to 2 eV, and one needs to increase OMEGAMAX beyond 2 eV.
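One way to organize such a convergence check is sketched below in Python. It assumes that you have already extracted the spectrum (e.g., the imaginary part of the dielectric function) from several runs with increasing OMEGAMAX onto a common energy grid; how you read the data from the VASP output is left out. The function names and the numbers in the example are placeholders for illustration only, not results.

```python
def max_deviation(spec_a, spec_b, energies, e_window):
    """Largest absolute difference between two spectra on the same
    energy grid, restricted to energies <= e_window."""
    return max(
        (abs(a - b) for a, b, e in zip(spec_a, spec_b, energies) if e <= e_window),
        default=0.0,
    )


def first_converged(runs, energies, e_window, tol):
    """runs: list of (omegamax, spectrum), sorted by increasing omegamax.

    Returns the smallest omegamax whose spectrum differs from the next
    larger run by less than tol inside the window, or None."""
    for (om_a, spec_a), (_, spec_b) in zip(runs, runs[1:]):
        if max_deviation(spec_a, spec_b, energies, e_window) < tol:
            return om_a
    return None


# Placeholder data (made up): energy grid in eV and three toy spectra
energies = [0.5, 1.0, 1.5, 2.0, 2.5]
runs = [
    (2.0, [0.1, 0.4, 0.9, 1.2, 0.0]),
    (3.0, [0.1, 0.5, 1.1, 1.6, 0.7]),
    (4.0, [0.1, 0.5, 1.1, 1.61, 0.9]),
]
print(first_converged(runs, energies, e_window=2.0, tol=0.05))
```

In this toy data the spectrum below 2 eV still changes noticeably between OMEGAMAX=2 and 3, and only stabilizes between 3 and 4, which is the kind of behavior to look for in the real calculation.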
When I try DFT+mBSE on an 8-atom supercell with NCORE=4, the error file says that LPEAD doesn't support NCORE != 1. Does that mean these calculations don't support parallelization and I can't use NPAR, NCORE, or KPAR here?
In VASP 5, the BSE algorithm parallelizes over k-points or over bands, depending on the supercell size. The tags NPAR, NCORE, and KPAR should not be used; they don't affect the parallelization in BSE.
I think DFT+mBSE is more suited for my system instead of GW+BSE, as GW for 216 atom supercell would be very costly, please comment on this as well.
The most reliable and accurate approach is GW+BSE; mBSE can fail in some cases, for example for anisotropic systems. However, the GW calculation can be quite costly for large supercells. You can try our low-scaling GW algorithm, which should be much faster for large cells than the standard one. It is, however, only available as of VASP 6.
My last confusion is about the two different mBSE methods published by VASP, one ( ... c_function) and a second one ( ... lculations). The first one works well for a unit cell, so I was using that. But the second one contains many tags that are not present in the first example for Si. However, I feel that the example of improving the dielectric function of Si is the version needed for all calculations. Is that right?
On the second page, we list some of the tags that are set by default in the first example. If there is a specific tag that you don't understand how to use, I can explain it in more detail.