This is a continuation of the discussion in the question here, as quoted below:

michael_wolloch wrote: Thu Apr 11, 2024 7:03 am
Dear Zhao,
this post has gotten a bit far from the original question for my taste. If you want to discuss benchmarking and the intricacies of process pinning, I would suggest making a new post in the "users for users" section.
You are mixing openMPI and intelMPI command line arguments here. Without going into detail, it is important to know where the processes end up. Use -genv I_MPI_DEBUG=4 for intelMPI and --report-bindings for OpenMPI to check.

Regarding Michael's comments above, what confuses me is: why does -bind-to core not lead to a significant reduction in computational efficiency compared to -genv I_MPI_PIN 1 -genv I_MPI_PIN_DOMAIN core? My observations are as follows:
1. Intel MPI also has the -bind-to option, as shown below:
$ mpirun --version
Intel(R) MPI Library for Linux* OS, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
Copyright 2003-2023, Intel Corporation.
$ mpirun --help | grep -- -bind-to
-bind-to process binding
2. I checked the process pinning produced by the above two options with Intel MPI as follows:
werner@X10DAi:~/Public/hpc/vasp/benchmark/amd/Cr72_3x3x3K_350eV_10DAV$ mpirun -genv I_MPI_DEBUG=4 -bind-to core -np 4 vasp_std
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/2023.2.0/mpi/2021.10.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 41239 X10DAi {0,1,2,3,4,5,6,7,8,9,10,44,45,46,47,48,49,50,51,52,53,54}
[0] MPI startup(): 1 41240 X10DAi {11,12,13,14,15,16,17,18,19,20,21,55,56,57,58,59,60,61,62,63,64,65}
[0] MPI startup(): 2 41241 X10DAi {22,23,24,25,26,27,28,29,30,31,32,66,67,68,69,70,71,72,73,74,75,76}
[0] MPI startup(): 3 41242 X10DAi {33,34,35,36,37,38,39,40,41,42,43,77,78,79,80,81,82,83,84,85,86,87}
running 4 mpi-ranks, on 1 nodes
distrk: each k-point on 4 cores, 1 groups
distr: one band on 4 cores, 1 groups
vasp.6.4.2 20Jul23 (build Feb 29 2024 20:51:29) complex
werner@X10DAi:~/Public/hpc/vasp/benchmark/amd/Cr72_3x3x3K_350eV_10DAV$ mpirun -genv I_MPI_DEBUG=4 -genv I_MPI_PIN 1 -genv I_MPI_PIN_DOMAIN core -np 4 vasp_std
[0] MPI startup(): Intel(R) MPI Library, Version 2021.10 Build 20230619 (id: c2e19c2f3e)
[0] MPI startup(): Copyright (C) 2003-2023 Intel Corporation. All rights reserved.
[0] MPI startup(): library kind: release
[0] MPI startup(): libfabric version: 1.18.0-impi
[0] MPI startup(): libfabric provider: tcp
[0] MPI startup(): File "" not found
[0] MPI startup(): Load tuning file: "/opt/intel/oneapi/2023.2.0/mpi/2021.10.0/etc/tuning_skx_shm-ofi.dat"
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 42573 X10DAi {0,44}
[0] MPI startup(): 1 42574 X10DAi {1,45}
[0] MPI startup(): 2 42575 X10DAi {22,66}
[0] MPI startup(): 3 42576 X10DAi {23,67}
running 4 mpi-ranks, on 1 nodes
distrk: each k-point on 4 cores, 1 groups
distr: one band on 4 cores, 1 groups
vasp.6.4.2 20Jul23 (build Feb 29 2024 20:51:29) complex
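To make the two pin tables above easier to compare, here is a minimal Python sketch that extracts the rank-to-CPU mapping from the I_MPI_DEBUG=4 output and reports how many logical CPUs each rank may run on. It is not part of VASP or Intel MPI. The regular expression and the helper names expand_cpu_list and parse_pinning are my own assumptions based on the output format shown in this post, and other Intel MPI versions may print the table differently (the range handling such as "4-6" is also only an assumption).

```python
import re

# Assumed row format of the pin table at I_MPI_DEBUG=4, e.g.
# "[0] MPI startup(): 0 42573 X10DAi {0,44}"
PIN_ROW = re.compile(
    r"MPI startup\(\):\s+(\d+)\s+(\d+)\s+(\S+)\s+\{([\d,\s-]*)\}"
)


def expand_cpu_list(text):
    """Expand a CPU list such as "0,1,4-6" into [0, 1, 4, 5, 6]."""
    cpus = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Ranges are an assumption; the logs in this post list every CPU.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_pinning(log_text):
    """Return {rank: sorted list of logical CPUs} found in the debug output."""
    pinning = {}
    for line in log_text.splitlines():
        match = PIN_ROW.search(line)
        if match:
            rank = int(match.group(1))
            pinning[rank] = sorted(expand_cpu_list(match.group(4)))
    return pinning


# Rows copied from the second run in this post (I_MPI_PIN_DOMAIN core).
sample = """\
[0] MPI startup(): Rank Pid Node name Pin cpu
[0] MPI startup(): 0 42573 X10DAi {0,44}
[0] MPI startup(): 1 42574 X10DAi {1,45}
[0] MPI startup(): 2 42575 X10DAi {22,66}
[0] MPI startup(): 3 42576 X10DAi {23,67}
"""

for rank, cpus in parse_pinning(sample).items():
    print(f"rank {rank}: {len(cpus)} logical CPUs -> {cpus}")
```

Applied to the two runs above, this reports 22 logical CPUs per rank for -bind-to core and 2 logical CPUs per rank for I_MPI_PIN_DOMAIN core.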
Regards,
Zhao