
VASP 6.6.0 AMD GPU offload: CRAY_ACC_ERROR - Host region overlaps present region

Posted: Fri Apr 03, 2026 8:36 am
by aturner-epcc

I was asked to post this issue here by the HPE Cray support team at our site (EPCC, Edinburgh).

We have just compiled VASP 6.6.0 for our small number of AMD MI210 GPUs using the AMD GPU offload "makefile.include" supplied with VASP 6.6.0. At runtime, on a single node with 4 MI210s, the calculations for a couple of benchmark cases we regularly use consistently fail with the following error:

Code:

ACC: libcrayacc/acc_present.c:762 CRAY_ACC_ERROR - Host region (7ffcb137c540 to 7ffcb20fc540) overlaps present region (7ffcae79c540 to 7ffcb1d9c540 index 237) but is not contained for 'cr(:)' from fft_base.f90:652

I have included the full error stack below.

Benchmarks available at:

https://github.com/aturner-epcc/2026-01 ... erformance

Modules loaded at compile/run time:

Code:

libfabric/1.12.1.2.2.0.0
craype-network-ofi
perftools-base/25.09.0
xpmem/0.2.119-1.3_0_gnoinfo
cce/20.0.0
craype/2.7.35
cray-dsmml/0.3.1
cray-mpich/9.0.1
cray-libsci/25.09.0
PrgEnv-cray/8.6.0
rocm/6.3.4
craype-accel-amd-gfx90a
craype-x86-milan
cray-fftw/3.3.10.11

HPE Cray support say:

This indicates that you're trying to map a variable into device memory when it's already partially there. Both OpenMP and OpenACC disallow this. Off the top of my head, I can think of a couple of situations where this can occur.

The first, as noted above, is when you transfer something like X(10:20) and then try to map X(1:15). You may also see this if you have implicit maps on compute regions. I believe OpenMP addresses this and allows certain cases, and I believe we handle those, but it's possible you're hitting a bug. Is the error occurring on a compute region or a data region?

The second is an issue with stack-based arrays. If you don't unmap a stack-based array before it goes out of scope, you end up with a stale present-table entry. When the stack is reused, it's possible to trigger this type of error.
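
A minimal Fortran sketch of this stack-array case (a hypothetical illustration; the routine and array names are not taken from VASP):

Code:

subroutine stack_map_leak(n)
  integer, intent(in) :: n
  real :: work(n)                          ! automatic array on the stack
  !$omp target enter data map(to: work)    ! adds work to the present table
  ! ... device compute using work ...
  ! missing: !$omp target exit data map(delete: work)
end subroutine
! After returning, the stack memory of work is reused by later calls, but
! the present table still references the old host address; a subsequent map
! of an overlapping stack region can then trigger exactly this overlap error.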

If you generate the runtime debug output with CRAY_ACC_DEBUG=3, it will report whenever something is added to the present table. Since the error says index 237, I would search backwards from the error for something like "add to present table index 237". That would allow you to find which allocation it is conflicting with.
A simple example of this error might be:

Code:

!$omp target enter data map(to: x(2:n-1))

!$omp target map(x) ! -> x(:) and mapped x(2:n-1) overlap -> error
x = x + 1
!$omp end target
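
For contrast, a sketch of the non-overlapping variant (assuming the compute region really needs all of x): map the full array once up front, and the map on the compute region is then fully contained in the present entry.

Code:

!$omp target enter data map(to: x)   ! map the full array up front

!$omp target map(x)                  ! x(:) is fully contained -> no overlap
x = x + 1
!$omp end target

!$omp target exit data map(from: x)  ! copy back and remove the present entry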

Full error stack:

Code:

ACC: libcrayacc/acc_present.c:762 CRAY_ACC_ERROR - Host region (7ffd827192c0 to 7ffd902d32c0) overlaps present region (7ffd827192c0 to 7ffd85e07ac0 index 177) but is not contained for 'cr(:)' from fft_base.f90:666
ACC: libcrayacc/acc_present.c:762 CRAY_ACC_ERROR - Host region (7ffe83083e80 to 7ffe90c3de80) overlaps present region (7ffe83083e80 to 7ffe86772680 index 177) but is not contained for 'cr(:)' from fft_base.f90:666
ACC: libcrayacc/acc_present.c:762 CRAY_ACC_ERROR - Host region (7ffef14a0300 to 7ffeff05a300) overlaps present region (7ffef14a0300 to 7ffef4b8eb00 index 177) but is not contained for 'cr(:)' from fft_base.f90:666
ACC: libcrayacc/acc_present.c:762 CRAY_ACC_ERROR - Host region (7ffc271b0e00 to 7ffc34d6ae00) overlaps present region (7ffc271b0e00 to 7ffc2a89f600 index 177) but is not contained for 'cr(:)' from fft_base.f90:666
 running    4 mpi-ranks, with    1 threads/rank, on    1 nodes
 distrk:  each k-point on    4 cores,    1 groups
 distr:  one band on    4 cores,    1 groups
 Offloading initialized ...    4 GPUs detected
 RCCL MPI communication initialized ...
 vasp.6.6.0 06Mar2026 (build Mar 30 2026 15:06:08) gamma-only                    
 POSCAR found :  2 types and    1080 ions
 scaLAPACK will be used
 LDA part: xc-table for (Slater(with rela. corr.)+CA(PZ)), standard interpolation
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
srun: error: nid200004: tasks 0-3: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=13093507.0

Re: VASP 6.6.0 AMD GPU offload: CRAY_ACC_ERROR - Host region overlaps present region

Posted: Fri Apr 03, 2026 9:29 am
by aturner-epcc

BTW, I am going to rerun with CRAY_ACC_DEBUG=3 set. I suspect this will produce a lot of output, but I will see if I can extract the relevant parts and post them here.


Re: VASP 6.6.0 AMD GPU offload: CRAY_ACC_ERROR - Host region overlaps present region

Posted: Fri Apr 03, 2026 9:51 am
by ahampel

Hi @aturner,

Thank you for posting on our VASP forum. I just realized that this problem is similar to what is reported here: https://www.vasp.at/forum/viewtopic.php ... 314#p33314 (not the issue about the GPU pinning, but the other part). It seems to be correlated with using only one OMP thread. Can you try using more than one thread per MPI rank? This is advised in any case, since the ROCm runtime needs at least 1-2 extra CPU cores for FFT and LAPACK helpers.

However, I tried your TiO2 input set and cannot exactly reproduce the problem:

Code:

running    4 mpi-ranks, with    1 threads/rank, on    1 nodes
distrk:  each k-point on    4 cores,    1 groups
distr:  one band on    1 cores,    4 groups
Offloading initialized ...    4 GPUs detected
RCCL MPI communication initialized ...
vasp.6.6.0 06Mar2026 (build Mar 31 2026 06:46:23) gamma-only                    
POSCAR found :  2 types and    1080 ions
Reading from existing POTCAR
scaLAPACK will be used
Reading from existing POTCAR
WARNING: PSMAXN for non-local potential too small
LDA part: xc-table for (Slater(with rela. corr.)+CA(PZ)), standard interpolation
POSCAR, INCAR and KPOINTS ok, starting setup
FFT: planning ... GRIDC
FFT: planning ... GRID_SOFT
FFT: planning ... GRID
WAVECAR not read
entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
RMM:   1     0.103620767768E+06    0.10362E+06   -0.29874E+06  3456   0.196E+03
RMM:   2     0.493893299990E+05   -0.54231E+05   -0.68696E+05  3456   0.460E+02
RMM:   3     0.196602855985E+05   -0.29729E+05   -0.33369E+05  3456   0.224E+02
RMM:   4     0.103909436920E+05   -0.92693E+04   -0.14589E+05  3456   0.234E+02
RMM:   5     0.158606808037E+04   -0.88049E+04   -0.75296E+04  3456   0.148E+02
RMM:   6    -0.502639253360E+04   -0.66125E+04   -0.60374E+04  3456   0.167E+02
RMM:   7    -0.741555659943E+04   -0.23892E+04   -0.20228E+04  3456   0.101E+02
RMM:   8    -0.106669380187E+05   -0.32514E+04   -0.21123E+04  3456   0.944E+01
RMM:   9    -0.124433506827E+05   -0.17764E+04   -0.12896E+04  9780   0.566E+01
RMM:  10    -0.126482353939E+05   -0.20488E+03   -0.19790E+03 10200   0.170E+01
srun: error: x9000c1s0b0n0: task 0: Segmentation fault (core dumped)

But I am also greeted with a segfault in the end, so something seems to be corrupted in memory. My best guess is that this is not related to cr directly: cr just allocates a huge amount of memory, and the runtime probably trips over already-corrupted memory when that allocation is requested. I will try to figure out why I see a segfault. Can you try the following:

1) use more than one OMP thread per MPI rank
2) use the std version instead of the gamma-only one

Right now I still see some performance issues with the gamma-only version. We should document this better.

Best,
Alex


Re: VASP 6.6.0 AMD GPU offload: CRAY_ACC_ERROR - Host region overlaps present region

Posted: Fri Apr 17, 2026 10:00 am
by aturner-epcc

Hi Alex,

Thanks for looking into this. I have confirmed that increasing the number of OMP threads allows the calculation to run. I also see the same segfault at the end on our system.


Re: VASP 6.6.0 AMD GPU offload: CRAY_ACC_ERROR - Host region overlaps present region

Posted: Tue May 05, 2026 6:48 pm
by ahampel

Hi @aturner,

Good news: I fixed the segfault, both for one thread and for multiple threads, and the gamma-only (GAM) version also works now. Bad news: I think it will not be easy to share a fix for this as a patch.

I fixed several things around our RCCL / NCCL interface, plus some other small issues that could lead to memory corruption / undefined behavior, and now the segfault is gone for me. These changes will be part of the next release, and I will update you here as soon as there is news. Unfortunately, the changes are spread across the code, so I cannot provide a simple patch here. Thank you again for reporting!

If you encounter the segfault more often, please reach out to us via the HPE Cray team; in that case we might be able to hand out a pre-release version with the fix for testing. I would be curious whether I fixed all the problems.

Best regards,
Alex