Trouble linking libbeef with CRAYHIP port of VASP 6.6.0 (on AMD Instinct MI300A GPUs)
Dear VASP devs and Admins,
I have been building the CRAYHIP port of VASP 6.6.0 for AMD Instinct MI300A GPUs using the Cray compilers (CCE 19.0.0) on a Cray EX4000 machine. As part of my build, I'd like to have support for the BEEF functional and have been following the instructions at https://www.vasp.at/wiki/Makefile.inclu ... (optional) to build libbeef and link VASP against it. However, I'm running into unexpected behavior when attempting to link VASP against libbeef.
I built libbeef using the same Cray compiler version (19.0.0), Cray MPICH version, and ROCm version as I used to build VASP (modules cray-mpich, rocm/6.3.0, cray-hdf5/1.14.3.5, craype-accel-amd-gfx942, and cray-fftw/3.3.10.10). The libbeef package was configured using only a prefix:
Code:
# Configure build.
./configure --prefix="$PREFIX"
# Build.
make
# Install to prefix.
make install
I left out the --enable-optmax flag to minimize compatibility issues between the login-node and GPU-node CPUs. The build succeeded.
My VASP makefile.include looks like this:
Code:
# Precompiler options
CPP_OPTIONS = -DHOST=\"LinuxFTN\" \
-DMPI -DMPI_BLOCK=8000 -Duse_collective \
-DscaLAPACK \
-DCACHE_SIZE=4000 \
-Davoidalloc \
-DMPI_INPLACE \
-Dvasp6 \
-Dtbdyn \
-Dfock_dblbuf
# activate OpenMP and gpu offloading
CPP_OPTIONS += -D_OPENMP \
-DOMP_OFFLOAD \
-DCRAYHIP
CPP = cpp --traditional -E -P -Wno-endif-labels $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)
FC = ftn -hnoacc -homp
FCL = $(FC)
FREE = -ffree -N 1023
FFLAGS = -dC -rmo -emEb
# lower the ipa level for inlining to 0 to avoid compiler problems
FFLAGS += -hipa0
# suppress warnings
FFLAGS += -m 4
# O2 recommended for optimal GPU performance, O1 significantly slower in certain
# GPU kernels
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0
# fine grain control over lapack, by default ftn will link libsci with the
# appropriate configuration
# LAPACK = -L${CRAY_LIBSCI_PREFIX_DIR}/lib -lsci_cray_mpi
# LLIBS = $(LAPACK)
# FFTW_ROOT ?= /opt/cray/pe/fftw/3.3.8.11/x86_rome
LLIBS += -L$(FFTW_ROOT)/lib -lfftw3 -lfftw3_omp
INCS = -I$(FFTW_ROOT)/include
# HIP
CLANG = cc
# ROCM_PATH ?= /opt/rocm
HIPCC ?= ${ROCM_PATH}/bin/hipcc
ROCM_INCS = -I${ROCM_PATH}/include -I${ROCM_PATH}/include/hip -I${ROCM_PATH}/include/rocblas -I${ROCM_PATH}/include/rocsolver -I${ROCM_PATH}/include/rocfft
ROCM_LIBS = -L${ROCM_PATH}/hip/lib -lamdhip64 \
-L${ROCM_PATH}/lib -lrocblas -lrocfft -lrocsolver -lcraymp
# using RCCL aka NCCL for direct multi-GPU communication, recommended for best
# performance
CPP_OPTIONS += -DUSENCCL
ROCM_LIBS += -lrccl
LLIBS += $(ROCM_LIBS)
LIBS += HIP
LLIBS += -LHIP -lHipInterface
#
# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = cc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)
OBJECTS_LIB= linpack_double.o getshmem.o
# For the parser library
CXX_PARS = CC
LLIBS += -lstdc++
# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin
# HDF5-support (optional but strongly recommended, and mandatory for some
# features)
CPP_OPTIONS+= -DVASP_HDF5
# HDF5_ROOT ?= /path/to/your/hdf5/installation
LLIBS += -L$(HDF5_ROOT)/lib -lhdf5_fortran
INCS += -I$(HDF5_ROOT)/include
# Link against source distribution of fftlib because we are not using Intel MKL
# for FFTs. See
# https://www.vasp.at/wiki/Makefile.include#fftlib_(recommended_when_using_OpenMP).
CPP_OPTIONS+= -Dsysv
FCL += fftlib.o
CXX_FFTLIB = $(CC_LIB) -fopenmp -std=c++11 -DFFTLIB_THREADSAFE
INCS_FFTLIB = -I./include -I$(FFTW_ROOT)/include
LIBS += fftlib
LLIBS += -ldl
# Enable the VASP-2-Wannier90 interface and link against the Wannier90 library.
# Note that my Wannier90 module (wannier90/3.1.0/gpu-cray-offload-cce19) defines
# the WANNIER90_ROOT variable.
CPP_OPTIONS += -DVASP2WANNIER90
LLIBS += -L$(WANNIER90_ROOT)/lib -lwannier
# Enable support for BEEF XC functionals. Note that my custom
# libbeef module already defines $LIBBEEF_ROOT.
CPP_OPTIONS += -Dlibbeef
# LIBBEEF_ROOT ?= /path/to/your/libbeef/installation
LLIBS += -L$(LIBBEEF_ROOT)/lib -lbeef
# Get major version of crayftn
CRAYFTNVER=$(shell crayftn --version 2>/dev/null | grep "Version" | sed -n 's/.*Version \([0-9]\+\)\..*/\1/p')
CPP_OPTIONS += -D__DCRAYFTN_VERSION=$(CRAYFTNVER)
### special cray workarounds cce v19.0.0, remove for cce20
# error Unsupported OpenMP construct Calls -- _cray_dv_broadcast : W_G%CPTWFP=0
OBJECTS_O2 += rot.o
# fexcg has to be higher optimization level for kernel not to spill
OBJECTS_O2 += fexcg.o mbj.o ldalib.o ggalib.o mggalib.o
# error: unexpected type in TYPE_DEREF l818 (copyin_wavefun1_array)
OBJECTS_O1 += openmp.o
# error: unexpected type in TYPE_DEREF l724 (twoelectron4o_acc)
OBJECTS_O1 += twoelectron4o.o
# error: unexpected type in TYPE_DEREF l377 (calculate_local_field_fock)
OBJECTS_O1 += local_field.o
# for the next problem we use OBJECTS_O3 to remove omp
FFLAGS_3 += -hnoomp
# error: Found inner_ref/inner_def object without Fortran internal procedure l5515
OBJECTS_O3 += bse.o
# error: Found inner_ref/inner_def object without Fortran internal procedure l1644
OBJECTS_O3 += GG_base.o
# MLFF problems with ISTART=2
OBJECTS_O1 += ml_ff_math.o ml_ff_ff2.o
#################
During the VASP build, all object files compile successfully, but at the final link step the linker reports that it cannot find three functions from libbeef:
Code:
ftn -hnoacc -homp fftlib.o -o vasp c2f_interface.o simd.o base.o string.o vhdf5_struct.o tutor.o version.o build_info.o command_line.o openmp_struct.o openacc_struct.o offload_struct.o debug_struct.o debug.o mpi.o vhdf5_base.o incar_reader.o reader_base.o mpi_shmem.o main_mpi.o license.o mathtools.o profiling.o findiff_struct.o bse_struct.o mgrid_struct.o pot_struct.o setex_struct.o hamil_struct.o radial_struct.o pseudo_struct.o wave_struct.o nl_struct.o mkpoints_struct.o bandgap_struct.o poscar_struct.o esf_struct.o afqmc_struct.o minimax_struct.o locproj_struct.o msdgw_struct.o fock_glb.o chi_glb.o smart_allocate.o xml.o ini.o constant.o ml_ff_c2f_interface.o ml_ff_prec.o ml_ff_string.o ml_ff_tutor.o ml_ff_constant.o ml_ff_mpi_help.o ml_ff_neighbor.o ml_ff_taglist.o ml_ff_struct.o ml_ff_mpi_shmem.o vdwforcefield_glb.o jacobi.o scala_struct.o scala.o nvcuda.o crayhip.o intelmkl.o openmp.o openacc.o offload.o scalapack_wrappers.o blas_wrappers.o lapack_wrappers.o asa.o lattice.o plugins.o poscar.o fft_comm.o fftw.o fft_wrappers.o fft_base.o mgrid.o ml_asa2.o ml_ff_mpi.o ml_ff_helper.o ml_ff_logfile.o ml_ff_math.o ml_ff_neighbor_help.o ml_ff_emp_pot.o ml_ff_iohandle.o ml_ff_memory.o ml_ff_abinitio.o ml_ff_ff2.o ml_ff_ff3.o ml_ff_ff.o ml_ff_mlff.o vaspml.o ldalib.o wpbe.o ggalib.o mbj.o mggalib.o vdw_nl.o xc_driver.o setex.o pseudo.o radial.o gridq.o coulomb_cutoff.o ebs.o symlib.o gauss_quad.o m_unirnk.o mkpoints.o random.o wave.o wave_mpi.o wave_high.o bext.o spinsym.o dipol_struct.o symmetry.o lattlib.o nonl.o nonlr.o nonl_high.o dfast.o choleski2.o mix.o hamil.o constrmag.o cl_shift.o relativistic.o LDApU.o paw_base.o tau_mu.o fexcg.o egrad.o pawsym.o pawfock.o pawlhf.o diis.o rhfatm.o hyperfine.o fock_ace.o mkpoints_full.o charge.o us.o extpot.o paw.o Lebedev-Laikov.o stockholder.o pot_electrostat.o dipol.o solvation.o scpc.o fermi_energy.o tet.o dos.o elf.o hamil_rot.o chain.o dyna.o vhdf5.o checkpointing.o fileio.o bandgap_tools.o pot.o sphpro.o core_rel.o 
aedens.o wavpre.o wavpre_noio.o broyden.o dynbr.o reader.o writer.o xml_writer.o brent.o stufak.o opergrid.o stepver.o fast_aug.o fock_multipole.o fock.o fock_dbl.o fock_frc.o supercell.o mkpoints_change.o subrot_cluster.o sym_grad.o mymath.o npt_dynamics.o subdftd3.o subdft_sd3_d4.o libmbd.o internals.o dynconstr.o dimer_heyden.o dvvtrajectory.o vdwforcefield.o nmr.o pead.o k-proj.o subrot.o subrot_scf.o paircorrection.o rpa_force.o ml_reader.o ml_interface_writer.o ml_interface.o coulomb_cutoff_gradients.o force.o pwlhf.o gw_model.o optreal.o steep.o rmm-diis.o davidson.o david_full.o david_inner.o root_find.o lcao_bare.o locproj.o electron_common.o electron.o rot.o electron_all.o shm.o pardens.o optics.o constr_cell_relax.o stm.o elpol.o hamil_lr.o rmm-diis_lr.o subrot_lr.o lr_helper.o hamil_lrf.o elinear_response.o ilinear_response.o linear_optics.o setlocalpp.o wannier.o electron_OEP.o electron_lhf.o twoelectron4o.o minimax_ini.o minimax_dependence.o minimax_functions1D.o minimax_functions2D.o minimax_varpro.o minimax.o msdgw.o umco.o mlwf.o ratpol.o pade_fit.o screened_2e.o wave_cacher.o crpa.o chi_base.o wpot.o local_field.o bse_base.o ump2.o ump2kpar.o fcidump.o ump2no.o core_con_mat.o bse_te.o bse_lanczos.o bse.o bse_driver.o time_propagation.o esf.o acfdt.o afqmc.o rpax.o chi.o dmft.o GG_base.o acfdt_GG.o greens_orbital.o lt_mp2.o rnd_orb_mp2.o greens_real_space.o chi_GG.o chi_super.o sydmat.o rmm-diis_mlr.o linear_response_NMR.o converse.o wannier_interpol.o wave_interpolate.o wave_rotator.o wave_window.o wap.o elphon_potential_struct.o elphon_base.o elphon_triplets.o elphon_potential.o elphon_accumulators.o elphon_accumulators_ph.o elphon_accumulators_high.o elphon_kgrid.o transport.o elphon_common.o elphon_mels_pw.o elphon_mels_wann.o elphon_mels.o elphon_superconductivity.o elphon_driver.o phonon.o elphon_derivative.o wannier_mats.o elphon.o finite_diff.o linear_response.o auger.o dmatrix.o embed.o rpa_high.o main.o -Llib -ldmy -Lparser -lparser 
-Wl,--verbose -L/opt/cray/pe/fftw/3.3.10.10/x86_genoa/lib -lfftw3 -lfftw3_omp -L/software/rocm/rocm-6.3.0/hip/lib -lamdhip64 -L/software/rocm/rocm-6.3.0/lib -lrocblas -lrocfft -lrocsolver -lcraymp -lrccl -LHIP -lHipInterface -lstdc++ -ldl -L/projectdir/projects/research/modulized-software/wannier90/3.1.0/gpu-cray-offload-cce19/lib -lwannier -L/projectdir/projects/research/modulized-software/libbeef/0.1.3/gpu-cray-offload-cce19/lib -lbeef
warning: linking module flags 'Debug Info Version': IDs have conflicting values ('i32 4' from /tmp/samueldy/cooltmpdir-iaEVeh/mpi-cce-openmp-amdgcn-amd-amdhsa.amdgpu with 'i32 3' from llvm-link)
warning: linking module flags 'Debug Info Version': IDs have conflicting values ('i32 4' from /tmp/samueldy/cooltmpdir-iaEVeh/mathtools-cce-openmp-amdgcn-amd-amdhsa.amdgpu with 'i32 3' from llvm-link)
warning: linking module flags 'Debug Info Version': IDs have conflicting values ('i32 4' from /tmp/samueldy/cooltmpdir-iaEVeh/setex_struct-cce-openmp-amdgcn-amd-amdhsa.amdgpu with 'i32 3' from llvm-link)
<...many more warning lines like this...>
lld: error: undefined symbol: beefx_
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced 9 more times
lld: error: undefined symbol: beeflocalcorrspin_
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced 1 more times
lld: error: undefined symbol: beeflocalcorr_
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced by /tmp/samueldy/cooltmpdir-iaEVeh/vasp-cce-openmp__llc.amdgpu:(fexcg_$ck_L726_58)
>>> referenced 1 more times
make[2]: *** [makefile:153: vasp] Error 1
make[2]: Leaving directory '/home/samueldy/Downloads/vasp-6.6.0-hpcmp/build/gam'
cp: cannot stat 'vasp': No such file or directory
make[1]: *** [makefile:150: all] Error 1
make[1]: Leaving directory '/home/samueldy/Downloads/vasp-6.6.0-hpcmp/build/gam'
make: *** [makefile:17: gam] Error 2
I also see a large number of warnings about conflicting 'Debug Info Version' ID values when linking module flags, but my understanding is that these warnings can be safely ignored (see the "linker warning" box at https://docs.nersc.gov/development/prog ... compiling/).
If I inspect my build of libbeef.a with readelf -Ws --dyn-syms, I clearly see the three symbols in question:
Code:
$ readelf -Ws --dyn-syms libbeef.a | grep beef
File: libbeef.a(beefun.o)
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS beefun.c
10: 0000000000000000 4 OBJECT LOCAL DEFAULT 6 beeftype
11: 0000000000000000 4 OBJECT LOCAL DEFAULT 7 beeforder
45: 0000000000000200 7688 OBJECT LOCAL DEFAULT 7 beefmat
691: 0000000000000020 1058 FUNC GLOBAL DEFAULT 2 beefx_
695: 0000000000000450 466 FUNC GLOBAL DEFAULT 2 beeflocalcorr_
697: 0000000000000630 600 FUNC GLOBAL DEFAULT 2 beefxpot_
698: 0000000000000890 302 FUNC GLOBAL DEFAULT 2 beeflocalcorrpot_
699: 00000000000009c0 556 FUNC GLOBAL DEFAULT 2 beeflocalcorrspin_
701: 0000000000000bf0 332 FUNC GLOBAL DEFAULT 2 beeflocalcorrpotspin_
702: 0000000000000d40 9 FUNC GLOBAL DEFAULT 2 beefsetmode_
703: 0000000000000d50 7 FUNC GLOBAL DEFAULT 2 beefrandinit_
705: 0000000000000d60 10 FUNC GLOBAL DEFAULT 2 beefrandinitdef_
706: 0000000000000d70 431 FUNC GLOBAL DEFAULT 2 beefensemble_
711: 0000000000000f20 107 FUNC GLOBAL DEFAULT 2 beef_set_type_
File: libbeef.a(pbecor.o)
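A scripted version of this check can make it easier to repeat after each rebuild. The following is a minimal sketch, assuming GNU nm is available and that $LIBBEEF_ROOT points at the libbeef install prefix; missing_syms is a hypothetical helper name, not something shipped with libbeef or VASP. It prints any requested symbol that lacks a global text (T) definition in the nm-style listing it reads from standard input.

```shell
#!/bin/sh
# missing_syms SYMBOL...
# Hypothetical helper: reads an nm-style listing on standard input and
# prints every requested symbol that has no global text ("T") definition.
missing_syms() {
    listing=$(cat)
    for sym in "$@"; do
        # Match lines such as "0000000000000020 T beefx_" exactly,
        # so beeflocalcorr_ does not match beeflocalcorrpot_.
        printf '%s\n' "$listing" \
            | grep -Eq "[[:space:]]T[[:space:]]+${sym}\$" \
            || printf '%s\n' "$sym"
    done
}

# Usage sketch: only runs when the archive is actually present.
# LIBBEEF_ROOT is assumed to be set by the libbeef module.
archive="${LIBBEEF_ROOT:-}/lib/libbeef.a"
if [ -f "$archive" ]; then
    nm -g --defined-only "$archive" \
        | missing_syms beefx_ beeflocalcorr_ beeflocalcorrspin_
fi
```

For the archive shown above, the helper would be expected to print nothing, since readelf lists all three symbols as global functions.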
I'm not sure I understand what's going on. The beefx_, beeflocalcorr_, and beeflocalcorrspin_ symbols are reported as being referenced from fexcg (specifically fexcg_$ck_L726_58 in vasp-cce-openmp__llc.amdgpu), but in the source code the function calls appear only in xc_driver.F.
If I turn on verbose linker messages by adding -Wl,--verbose to LLIBS, and keep the LLIBS += -L$(LIBBEEF_ROOT)/lib -lbeef line in makefile.include but omit the -Dlibbeef preprocessor flag (so that VASP does not reference any libbeef symbols), I can see the linker successfully locate and load the libbeef.a library:
Code:
<...more linker debug lines...>
attempt to open lib/libbeef.so failed
attempt to open lib/libbeef.a failed
attempt to open parser/libbeef.so failed
attempt to open parser/libbeef.a failed
attempt to open /opt/cray/pe/fftw/3.3.10.10/x86_genoa/lib/libbeef.so failed
attempt to open /opt/cray/pe/fftw/3.3.10.10/x86_genoa/lib/libbeef.a failed
attempt to open /software/rocm/rocm-6.3.0/hip/lib/libbeef.so failed
attempt to open /software/rocm/rocm-6.3.0/hip/lib/libbeef.a failed
attempt to open /software/rocm/rocm-6.3.0/lib/libbeef.so failed
attempt to open /software/rocm/rocm-6.3.0/lib/libbeef.a failed
attempt to open HIP/libbeef.so failed
attempt to open HIP/libbeef.a failed
attempt to open /projectdir/projects/research/modulized-software/wannier90/3.1.0/gpu-cray-offload-cce19/lib/libbeef.so failed
attempt to open /projectdir/projects/research/modulized-software/wannier90/3.1.0/gpu-cray-offload-cce19/lib/libbeef.a failed
attempt to open /projectdir/projects/research/modulized-software/libbeef/0.1.3/gpu-cray-offload-cce19/lib/libbeef.so failed
attempt to open /projectdir/projects/research/modulized-software/libbeef/0.1.3/gpu-cray-offload-cce19/lib/libbeef.a succeeded
/projectdir/projects/research/modulized-software/libbeef/0.1.3/gpu-cray-offload-cce19/lib/libbeef.a
<...more linker debug lines...>
However, as soon as I enable the -Dlibbeef preprocessor flag, I get the aforementioned errors about not being able to locate the beefx_, beeflocalcorrspin_, and beeflocalcorr_ symbols. Strangely, I also no longer see any of the -Wl,--verbose debug output about searching directories for libraries. It is almost as if the linker did not attempt to search for libraries at all.
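To make the comparison between the two link runs less error-prone, the verbose output can be saved to a file and the library probes counted. This is only a sketch: count_lib_probes is a hypothetical helper, and link.log (overridable through the assumed LINK_LOG variable) is an assumed file name for the captured output of the link step, for example via 2>&1 | tee link.log. It counts the "attempt to open" lines, in the format shown above, for a given library name.

```shell
#!/bin/sh
# count_lib_probes LIBNAME
# Hypothetical helper: reads saved verbose linker output on standard input
# and prints how many times the linker tried to open libLIBNAME.so or
# libLIBNAME.a, whether the attempt failed or succeeded.
count_lib_probes() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # helper usable in scripts that abort on errors.
    grep -Ec "attempt to open .*/lib${1}\.(so|a) (failed|succeeded)" || true
}

# Usage sketch: only runs when a captured log is actually present.
log="${LINK_LOG:-link.log}"
if [ -f "$log" ]; then
    count_lib_probes beef < "$log"
fi
```

A count of zero for the run with -Dlibbeef, against a non-zero count for the run without it, would put a number on the difference described above.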
The libbeef library I produced appears valid. For example, I am able to build a small C program (using cc from CCE 19.0.0) and link it against libbeef.a. This program successfully loads and calls the beefx_ function from libbeef. I'm not very familiar with how C foreign function interfaces work in Fortran, but I don't see anywhere in the VASP 6.6.0 sources where an INTERFACE is defined to map the uppercase BEEFX, BEEFLOCALCORRSPIN, and BEEFLOCALCORR names to the lowercase, underscored beefx_, beeflocalcorrspin_, and beeflocalcorr_ symbols.
In the past with Intel compilers, I've simply built libbeef (sometimes with the --enable-optmax flag), added the relevant libbeef-related lines to my makefile.include, and everything just worked. I'm not sure whether something has changed in the libbeef API or in the calling VASP Fortran code, or whether this is just a quirk of the Cray compilers.
Any ideas on what to try next?
Thanks in advance!