We are finding the OpenACC GPU version to be beneficial for run times, with the exception of the time spent in the "CORREC" (corrector) and "EDDIAG" sections.
-------------------------------------- Iteration 1( 11) ---------------------------------------
GPU
POTLOK: cpu time 0.0137: real time 0.0137
SETDIJ: cpu time 0.7244: real time 0.7261
TRIAL : cpu time 1.5328: real time 1.5365
CORREC: cpu time 1.4981: real time 1.5017
EDDIAG: cpu time 0.3957: real time 0.3967
CHARGE: cpu time 0.3385: real time 0.3393
CPU
POTLOK: cpu time 0.0149: real time 0.0150
SETDIJ: cpu time 0.0562: real time 0.0563
TRIAL : cpu time 1.6162: real time 1.6201
CORREC: cpu time 0.9915: real time 0.9940
EDDIAG: cpu time 0.7746: real time 0.7817
CHARGE: cpu time 0.1655: real time 0.1660
Perhaps these sections do not run on the GPU yet?
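To compare the two runs routine by routine, here is a minimal sketch of a script that parses the timing lines quoted above and reports the CPU/GPU real-time ratio. The helper names (parse_timings, cpu_over_gpu) are hypothetical and not part of VASP, and the regular expression assumes the line layout is exactly as in the excerpt.

```python
import re

# Matches lines such as "TRIAL : cpu time 1.5328: real time 1.5365".
# Assumption: the layout is the same as in the log excerpt above.
TIMING = re.compile(
    r"^\s*(\w+)\s*:\s*cpu time\s+([\d.]+)\s*:\s*real time\s+([\d.]+)"
)

GPU_LOG = """\
POTLOK: cpu time 0.0137: real time 0.0137
SETDIJ: cpu time 0.7244: real time 0.7261
TRIAL : cpu time 1.5328: real time 1.5365
CORREC: cpu time 1.4981: real time 1.5017
EDDIAG: cpu time 0.3957: real time 0.3967
CHARGE: cpu time 0.3385: real time 0.3393
"""

CPU_LOG = """\
POTLOK: cpu time 0.0149: real time 0.0150
SETDIJ: cpu time 0.0562: real time 0.0563
TRIAL : cpu time 1.6162: real time 1.6201
CORREC: cpu time 0.9915: real time 0.9940
EDDIAG: cpu time 0.7746: real time 0.7817
CHARGE: cpu time 0.1655: real time 0.1660
"""


def parse_timings(text):
    """Return {routine: real_time_in_seconds} for every timing line in text."""
    timings = {}
    for line in text.splitlines():
        match = TIMING.match(line)
        if match:
            timings[match.group(1)] = float(match.group(3))
    return timings


def cpu_over_gpu(gpu_text, cpu_text):
    """Return {routine: cpu_real / gpu_real} for routines present in both logs."""
    gpu = parse_timings(gpu_text)
    cpu = parse_timings(cpu_text)
    return {
        name: cpu[name] / gpu[name]
        for name in gpu
        if name in cpu and gpu[name] > 0
    }


if __name__ == "__main__":
    for name, ratio in cpu_over_gpu(GPU_LOG, CPU_LOG).items():
        print(f"{name:8s} CPU/GPU real time = {ratio:.2f}")
```

A ratio above 1 means the GPU run spent less real time in that routine, and a ratio below 1 means it spent more. These numbers come from a single iteration, so averaging over several iterations would give a fairer picture.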