VASP 6.3 ACC OMP problem

questions related to VASP with GPU support (vasp.5.4.1, version released in Feb 2016)

Moderators: Global Moderator, Moderator

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

VASP 6.3 ACC OMP problem

#1 Post by Dankomaister » Thu Feb 03, 2022 7:50 am

Hi,

I have compiled VASP 6.3 using the NVHPC compilers and this makefile:

makefile.include.nvhpc_ompi_mkl_omp_acc
It runs fine if I disable OpenMP:

export OMP_NUM_THREADS=1
but with OpenMP enabled, for example

export OMP_NUM_THREADS=2
VASP freezes after "entering main loop".

I have verified this problem on two different machines with different versions of NVHPC (21.11 and 22.1).
Compiling with the makefile I used for VASP 6.2.1 results in the same problem (that makefile worked fine for VASP 6.2.1).
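
For anyone who wants to reproduce the comparison above without watching the terminal, here is a minimal sketch. It wraps a launch command in `timeout` from GNU coreutils so that a run which never returns is reported as a possible hang. The function name `run_with_threads`, the 600 second limit, and the `mpirun -np 1 vasp_std` launch line are assumptions for illustration, not something taken from the VASP documentation.

```shell
#!/usr/bin/env bash
# Run a command with a given OpenMP thread count and a wall-clock limit.
# Usage: run_with_threads <threads> <limit_seconds> <command> [args...]
run_with_threads() {
    local threads=$1 limit=$2
    shift 2
    # timeout(1) exits with status 124 when the time limit is reached.
    OMP_NUM_THREADS="$threads" timeout "$limit" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "OMP_NUM_THREADS=$threads: no exit after ${limit}s (possible hang)"
    else
        echo "OMP_NUM_THREADS=$threads: exited with status $rc"
    fi
    return "$rc"
}

# Hypothetical usage; adjust the launch line and the limit to your site:
#   run_with_threads 1 600 mpirun -np 1 vasp_std
#   run_with_threads 2 600 mpirun -np 1 vasp_std
```

A time limit only suggests a hang; a calculation that is merely slow would trip it too, so the limit should be well above the runtime observed with one thread.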

This seems to be a bug introduced in VASP 6.3, can you confirm this?

/Daniel

alexey.tal
Global Moderator
Posts: 228
Joined: Mon Sep 13, 2021 12:45 pm

Re: VASP 6.3 ACC OMP problem

#2 Post by alexey.tal » Mon Feb 07, 2022 10:23 am

Hi,

Could you please provide the versions of the libraries and the input files, so that we can try to reproduce this issue?

Best regards,
Alexey

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

Re: VASP 6.3 ACC OMP problem

#3 Post by Dankomaister » Wed Feb 09, 2022 12:41 am

Sure!
The VASP input does not seem to matter, since I get this problem on all the systems I have tested.
But I will attach the input files for one of them.
The versions of the libraries used are:

CUDA/11.4.3
NVHPC/22.1
imkl/2021.4.0
OpenMPI/4.1.1
PMIx/4.1.0
UCX/1.11.2
UCX-CUDA/1.11.2

all installed using EasyBuild
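
A list like the one above can also be collected from the shell of the build environment. The sketch below assumes the tools are on `PATH` after loading the modules; `report_versions` is a hypothetical helper name, and the tool list in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Print the version banner of each tool given as an argument,
# or note that the tool is not available in the current environment.
report_versions() {
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            case "$tool" in
                ucx_info) "$tool" -v ;;         # ucx_info prints its version with -v
                *)        "$tool" --version ;;  # compilers and mpirun accept --version
            esac
        else
            echo "== $tool: not found in PATH =="
        fi
    done
}

# Example usage (tool names are assumptions about a typical NVHPC + Open MPI setup):
#   report_versions nvfortran nvcc mpirun ucx_info
```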

/Daniel

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

Re: VASP 6.3 ACC OMP problem

#4 Post by Dankomaister » Tue Feb 15, 2022 12:13 am

Any update on fixing this?

/Daniel

alexey.tal
Global Moderator
Posts: 228
Joined: Mon Sep 13, 2021 12:45 pm

Re: VASP 6.3 ACC OMP problem

#5 Post by alexey.tal » Tue Feb 15, 2022 2:27 pm

Hi Daniel,

Thank you for sending the files and the library versions.

I tested your calculation on our machines and I wasn't able to reproduce this issue. However, there is a somewhat similar report on the forum. People from NVIDIA confirmed that on their machines VASP 6.3 does run with OpenMP threads without any problems.

These types of issues are likely specific to your computing environment rather than bugs in VASP. Unfortunately, it is very hard to tell what the underlying problem might be in such cases. I would suggest that you check with the administrators of your computer whether your job is correctly set up for GPUs and whether all the modules and variables are correctly loaded.
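
As a rough starting point for such a check, the sketch below prints a few things from inside the job environment: two relevant environment variables, the GPUs that `nvidia-smi -L` can see, and whether Open MPI reports CUDA support. The helper name `check_gpu_env` is hypothetical, and this is only a partial sanity check under the assumption of an Open MPI based setup, not a complete diagnosis.

```shell
#!/usr/bin/env bash
# Print basic information about the GPU-related job environment.
check_gpu_env() {
    echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-<unset>}"

    # List the GPUs visible to the driver, if the tool is available.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -L
    else
        echo "nvidia-smi: not found in PATH"
    fi

    # Ask Open MPI whether it was built with CUDA support.
    if command -v ompi_info >/dev/null 2>&1; then
        ompi_info --parsable --all | grep mpi_built_with_cuda_support:value \
            || echo "ompi_info: no CUDA support entry found"
    else
        echo "ompi_info: not found in PATH"
    fi
}

# Example usage, run inside the batch job or interactive allocation:
#   check_gpu_env
```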

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

Re: VASP 6.3 ACC OMP problem

#6 Post by Dankomaister » Wed Feb 16, 2022 1:20 am

Hi Alexey,

I saw that post on the forum yesterday, and I also believe it describes the same problem I have.
I also noticed that the user compiled VASP using compilers/libraries in an EasyBuild environment.

When you tested this on your machine, did you also use EasyBuild?
If not, then I would suggest that you set up an EasyBuild environment and try to reproduce my results.
Since I am the administrator of our HPC I can provide the necessary files/instructions.
It is anyway a good idea for you to have an EasyBuild environment available, since EasyBuild is one of the most common (and practical) ways of reliably reproducing installations of scientific software on HPC clusters.

Regarding your suggestion to check that our HPC is set up for GPUs: since I am the administrator of our HPC, I do believe it is set up correctly :) and in any case I tested on a different HPC cluster and was able to reproduce my problem. As I also mentioned, VASP 6.2.1 works fine using the same makefile, compilers and libraries. So this is most likely a bug that was introduced in VASP 6.3.

/Daniel

alexey.tal
Global Moderator
Posts: 228
Joined: Mon Sep 13, 2021 12:45 pm

Re: VASP 6.3 ACC OMP problem

#7 Post by alexey.tal » Thu Feb 17, 2022 8:33 am

I asked Martijn Marsman about this issue and he told me that he has also encountered this problem with NVHPC SDK 21.11. I see that you have tried 21.11 and the newer 22.1; it is quite likely that this problem was introduced in 21.11 and has not yet been fixed in 22.1.
I tested your job with 21.2 and it worked fine.

Could you please check if you can reproduce this issue with an older version of NVHPC SDK?

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

Re: VASP 6.3 ACC OMP problem

#8 Post by Dankomaister » Thu Feb 17, 2022 12:10 pm

Okay, sounds promising!
I will try a different version of the NVIDIA HPC SDK and see what the results are!

/Daniel

Dankomaister
Newbie
Posts: 36
Joined: Sat Feb 13, 2016 4:39 pm
License Nr.: 20-0400 5-1605

Re: VASP 6.3 ACC OMP problem

#9 Post by Dankomaister » Wed Feb 23, 2022 6:32 am

Okay! I have tried compiling VASP 6.3 with NVHPC version 22.2, but the problem still persists.
However, version 21.2 is working! Perhaps it is a good idea to mention this in the wiki?
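
To summarize the NVHPC versions reported in this thread for VASP 6.3 with OpenMP threads, here is a small lookup sketch. The function name `nvhpc_omp_status` is hypothetical, and the table only reflects what was reported here (21.2 working; 21.11, 22.1 and 22.2 hanging); it says nothing about other versions.

```shell
#!/usr/bin/env bash
# Map an NVHPC SDK version string to the behaviour reported in this thread
# for VASP 6.3 built with OpenACC + OpenMP and run with OMP_NUM_THREADS > 1.
nvhpc_omp_status() {
    case "$1" in
        21.2)            echo "reported working" ;;
        21.11|22.1|22.2) echo "reported hanging" ;;
        *)               echo "not reported in this thread" ;;
    esac
}

# Example usage:
#   nvhpc_omp_status 22.1   # prints "reported hanging"
#   nvhpc_omp_status 21.2   # prints "reported working"
```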

/Daniel
