
Use more than 1 node with GPUs

Posted: Thu Feb 25, 2021 3:35 pm
by david_keller
Is the use of more than one node with GPU(s) supported?

When I try a run with two nodes, each having a GPU, I get:

 Data for JOB [41425,1] offset 0 Total slots allocated 40

 Mapper requested: NULL  Last mapper: round_robin  Mapping policy: BYCORE:NOOVERSUBSCRIBE  Ranking policy: SLOT
 Binding policy: CORE:IF-SUPPORTED  Cpu set: NULL  PPR: NULL  Cpus-per-rank: 0
        Num new daemons: 0      New daemon starting vpid INVALID
        Num nodes: 1

 Data for node: gpu001  State: 3        Flags: 11
        Daemon: [[41425,0],0]   Daemon launched: True
        Num slots: 40   Slots in use: 2 Oversubscribed: FALSE
        Num slots allocated: 40 Max slots: 0
        Num procs: 2    Next node_rank: 2
        Data for proc: [[41425,1],0]
                Pid: 0  Local rank: 0   Node rank: 0    App rank: 0
                State: INITIALIZED      App_context: 0
                Locale:  [B/././././././././././././././././././.][./././././././././././././././././././.]
                Binding: [B/././././././././././././././././././.][./././././././././././././././././././.]
        Data for proc: [[41425,1],1]
                Pid: 0  Local rank: 1   Node rank: 1    App rank: 1
                State: INITIALIZED      App_context: 0
                Locale:  [./B/./././././././././././././././././.][./././././././././././././././././././.]
                Binding: [./B/./././././././././././././././././.][./././././././././././././././././././.]
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
  O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    2 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    1 groups
 distr:  one band on    1 cores,    2 groups
 OpenACC runtime initialized ...    1 GPUs detected
 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     M_init_nccl: Error in ncclCommInitRank                                  |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |

Re: Use more than 1 node with GPUs

Posted: Tue Mar 02, 2021 8:35 am
by merzuk.kaltak
Dear David,

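The OpenACC port allows MPI+GPU usage, but it is restricted to one MPI rank per GPU.
Running 2 MPI ranks on 1 GPU, as in your case, is not supported: your mapping report shows both ranks placed on gpu001 ("Num nodes: 1", "Num procs: 2"), and only 1 GPU is detected.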
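A minimal sketch of that check, assuming the Open MPI mapping report has the layout shown in the first post and that every node has the same number of GPUs. The function names `procs_per_node` and `check_one_rank_per_gpu` are hypothetical helpers, not part of VASP or Open MPI. The script counts the ranks placed on each node and flags any node whose rank count differs from its GPU count; applied to the report above, it flags gpu001.

```python
import re
from typing import Dict, List

# Excerpt of the mapping report from the first post, trimmed to the lines the parser uses.
sample_report = """\
        Num nodes: 1
 Data for node: gpu001  State: 3        Flags: 11
        Num slots: 40   Slots in use: 2 Oversubscribed: FALSE
        Num procs: 2    Next node_rank: 2
"""


def procs_per_node(mapping_report: str) -> Dict[str, int]:
    """Map each node name to its 'Num procs' value in an Open MPI mapping report."""
    counts: Dict[str, int] = {}
    current_node = None
    for line in mapping_report.splitlines():
        node_match = re.search(r"Data for node:\s*(\S+)", line)
        if node_match:
            current_node = node_match.group(1)
            continue
        procs_match = re.search(r"Num procs:\s*(\d+)", line)
        if procs_match and current_node is not None:
            counts[current_node] = int(procs_match.group(1))
    return counts


def check_one_rank_per_gpu(counts: Dict[str, int], gpus_per_node: int) -> List[str]:
    """Return a message for every node whose rank count differs from its GPU count.

    Assumption: all nodes have the same number of GPUs (gpus_per_node).
    """
    return [
        f"{node}: {ranks} ranks but {gpus_per_node} GPU(s)"
        for node, ranks in counts.items()
        if ranks != gpus_per_node
    ]


if __name__ == "__main__":
    counts = procs_per_node(sample_report)
    print(counts)
    print(check_one_rank_per_gpu(counts, gpus_per_node=1))
```

Run on the excerpt, this reports two ranks on gpu001 against one GPU, which matches the situation the OpenACC port rejects.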

Re: Use more than 1 node with GPUs

Posted: Tue Mar 02, 2021 7:20 pm
by david_keller
But I was asking whether you can have multiple nodes, each with one GPU and one CPU, working on a VASP 6.2.0 run?

Re: Use more than 1 node with GPUs

Posted: Thu Mar 04, 2021 1:04 pm
by merzuk.kaltak
This should work, provided the MPI ranks are distributed across the nodes so that the number of MPI ranks on each node equals the number of GPUs detected on that node.
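As a sketch of one way to arrange this with Open MPI, assuming its `--host node:slots` syntax and the VASP executable name `vasp_std` (the helper `mpirun_command` and the second node name `gpu002` are hypothetical), the launch line can give each host exactly as many slots as it has GPUs and set `-np` to the total GPU count. Unless oversubscription is enabled, no node then receives more ranks than it has GPUs. Under a batch scheduler the per-node task count is usually set in the job script instead.

```python
import shlex
from typing import Dict, List


def mpirun_command(gpus_per_node: Dict[str, int], executable: str = "vasp_std") -> List[str]:
    """Build an mpirun argument list with one MPI rank per GPU on every node.

    Each host gets as many slots as it has GPUs, and -np is the total number
    of GPUs, so every slot is filled exactly once.
    """
    if any(n < 1 for n in gpus_per_node.values()):
        raise ValueError("every node must have at least one GPU")
    total_ranks = sum(gpus_per_node.values())
    host_list = ",".join(f"{node}:{n}" for node, n in gpus_per_node.items())
    return ["mpirun", "-np", str(total_ranks), "--host", host_list, executable]


if __name__ == "__main__":
    # Hypothetical two-node setup: gpu001 from the log above plus an assumed gpu002.
    cmd = mpirun_command({"gpu001": 1, "gpu002": 1})
    print(shlex.join(cmd))
```

For the two single-GPU nodes in this thread, the script prints `mpirun -np 2 --host gpu001:1,gpu002:1 vasp_std`, which places one rank on each node.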