VASP 6.1.2 Test Suite MPI Errors

Posted: Sun Jan 17, 2021 12:52 am
by jglazar
Hi VASP forum,

I'm trying to install VASP 6.1.2 on my group's computing cluster, which is configured with Intel OpenMPI on Scientific Linux 7.7. When I run the tests, some of the results come back with an `MPI_Comm_Rank` error. If I `make veryclean` before `make all`, there are fewer test errors and the `MPI_Comm_Rank` errors go away; however, I get `MPI_Bcast` errors instead.

Has anyone else run into these issues? Any ideas for workarounds? I noticed a lot of the GW calculations fail in case that's any help for troubleshooting. I've been working with the system administrators and they're a bit stuck at present. I'd appreciate any ideas!

Best,
James Glazar

Contents of folder in tests.tar.gz:
makefile.include -- my current build file, based on the included makefile.include.linux_intel and my group's old VASP 5.4.4 makefile.include
tests_short.out -- truncated output of the `make test` test suite. It shows the `MPI_Bcast` error.
tests_mpi_comm.out -- truncated output of the `make test` test suite, without having run `make veryclean` beforehand. It shows the `MPI_Comm_Rank` error.
tests_all.out -- full output of the `make test_all` test suite, from a build with a slightly different makefile.include. It shows the `MPI_Bcast` errors.

Re: VASP 6.1.2 Test Suite MPI Errors

Posted: Mon Jan 18, 2021 12:38 pm
by henrique_miranda
I don't have access to your configurations so it is difficult for me to reproduce this issue.
Looking briefly at your makefile.include I see that you are using mpif90. Is this intended? Or should you be using mpiifort instead?
If I am not mistaken, mpif90 links with the GNU Fortran compiler instead of the Intel one.
You can check this by doing:

$ mpif90 -v
$ mpiifort -v
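If the `-v` output is ambiguous, a small defensive check like the sketch below reports which wrappers are on your PATH and what each one identifies itself as (this assumes an OpenMPI- or Intel-style wrapper that understands `--version`; adjust for your cluster's module setup):

```shell
# Report which MPI compiler wrappers are available and what they identify as.
# A GNU-backed wrapper prints "GNU Fortran ..."; an Intel-backed one prints "ifort ...".
for w in mpif90 mpiifort; do
  if command -v "$w" >/dev/null 2>&1; then
    echo "== $w =="
    "$w" --version 2>&1 | head -n 1   # first line names the underlying compiler
  else
    echo "$w: not found on PATH"
  fi
done
```

On an OpenMPI system, `mpif90 --showme` additionally prints the full compile/link command line the wrapper would run, which makes the underlying compiler unambiguous.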

Re: VASP 6.1.2 Test Suite MPI Errors

Posted: Tue Jan 19, 2021 12:50 am
by jglazar
Hi Henrique,

Thank you for the quick reply! The system I'm using has mpif90 equivalent to mpiifort, though the `man mpif90` command does mention that mpif90 is deprecated and will disappear soon. `mpif90 -v` and `mpiifort -v` both yield `ifort version 17.0.3`.

Best,
James

Re: VASP 6.1.2 Test Suite MPI Errors

Posted: Mon Jan 25, 2021 3:37 pm
by henrique_miranda
Hi James,

I have a little bit more information regarding this issue.
We have also encountered issues related to `MPI_Bcast` on some of our machines.
Here are a few suggestions that might help to solve it:
1. You can try compiling your own OpenMPI version locally (preferably one listed here: https://www.vasp.at/wiki/index.php/Toolchains) and then compiling VASP against that version.
2. You can try compiling with a different MPI implementation, such as MPICH.
3. If you are compiling VASP 6.1.2, you can try changing the macro `#define MPI_bcast_with_barrier` in `mpi.F`.
4. In VASP 6.2.0 (recently released), you can try changing the macro `#define MPI_avoid_bcast` to avoid `MPI_Bcast` altogether.
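For suggestion 1, a local OpenMPI build usually follows the standard configure/make recipe sketched below. The version number, install prefix, and compiler choices here are examples only -- pick a release from the toolchains page above and the compilers that match your makefile.include:

```shell
# Sketch: build a private OpenMPI with the Intel compilers, then put it
# on PATH before compiling VASP. Paths and versions are placeholders.
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
tar xzf openmpi-4.1.0.tar.gz
cd openmpi-4.1.0
./configure CC=icc CXX=icpc FC=ifort --prefix="$HOME/openmpi-4.1.0"
make -j 4 && make install

# Before building VASP, make the new wrappers and libraries visible:
export PATH="$HOME/openmpi-4.1.0/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/openmpi-4.1.0/lib:$LD_LIBRARY_PATH"
```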

Please let us know if any of these suggestions work for you as this information might be useful for other users :)

Re: VASP 6.1.2 Test Suite MPI Errors

Posted: Mon Feb 15, 2021 9:41 pm
by jglazar
Hi Henrique,

After working with the staff who maintain the computing cluster, I figured out that the errors stemmed from using OpenMPI 2.1.1. After compiling VASP 6.2.0 using OpenMPI 4.1.0, the errors went away!

Thanks for all your help. Next up, I'm going to try compiling the GPU version of VASP.

Best regards,
James

Re: VASP 6.1.2 Test Suite MPI Errors

Posted: Tue Feb 16, 2021 9:49 am
by henrique_miranda
Reading back your answer:
`mpif90 -v` and `mpiifort -v` both yield `ifort version 17.0.3`.
I see now that the information you posted might have been incomplete.
`ifort version 17.0.3` is the version of the compiler that was used to build the MPI library, but it does not say which version of OpenMPI you are using. I read `ifort version 17.0.3` and assumed you were using Intel MPI.
The complete output of `mpif90 -v` and `mpiifort -v` should also include the OpenMPI version.
OpenMPI 2.1.1 is just too old.
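For future reference, the MPI library version (as opposed to the compiler version) can be read off directly from the launcher, as in this small check (guarded in case no MPI is on the PATH):

```shell
# Print the MPI library name and version, not just the compiler version.
# `mpirun --version` works for OpenMPI, MPICH, and Intel MPI alike;
# on OpenMPI, `ompi_info` additionally lists the full build configuration.
if command -v mpirun >/dev/null 2>&1; then
  mpirun --version | head -n 1
else
  echo "mpirun: not found on PATH"
fi
```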

Anyway, I am glad you were able to figure it out :)