
Frequent problem

Posted: Thu Jun 28, 2007 10:30 pm
by midair77
Hi, all. We get this segfault error quite frequently, for a variety of job sizes. For this particular job, we ran on 4 nodes with 16 processors, and each node has 8 GB of RAM.

running on 16 nodes
each image running on 2 nodes
distr: one band on 1 nodes, 2 groups
vasp.4.6.19 08Dec03 complex
01/POSCAR found : 4 types and 15 ions

-----------------------------------------------------------------------------
| |
| ADVICE TO THIS USER RUNNING 'VASP/VAMP' (HEAR YOUR MASTER'S VOICE ...): |
| |
| You enforced a specific xc-type in the INCAR file, |
| a different type was found on the POTCAR file |
| I HOPE YOU KNOW, WHAT YOU ARE DOING |
| |
-----------------------------------------------------------------------------

LDA part: xc-table for Pade appr. of Perdew
00/POSCAR found : 4 types and 15 ions
09/POSCAR found : 4 types and 15 ions
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ...
reading WAVECAR
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
entering main loop
N E dE d eps ncg rms rms(c)
RMM: 1 0.167129221272E+04 0.16713E+04 -0.38097E+04 780 0.105E+03
*******
1 F= -.62961368E+03 E0= -.62951081E+03 d E =-.629614E+03
rm_l_2_10283: p4_error: interrupt SIGx: 15
bm_list_10389: p4_error: listener select: -1
rm_l_4_10285: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_4_10285: p4_error: net_send write: -1
rm_l_3_10284: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_3_10284: p4_error: net_send write: -1
rm_l_4_10285: (972.683594) net_send: could not write to fd=5, errno = 104
rm_l_6_10015: (972.699219) net_send: could not write to fd=6, errno = 9
rm_l_6_10015: p4_error: net_send write: -1
rm_l_7_10017: (972.699219) net_send: could not write to fd=7, errno = 9
rm_l_7_10017: p4_error: net_send write: -1
rm_l_8_10272: (972.687500) net_send: could not write to fd=6, errno = 9
rm_l_8_10272: p4_error: net_send write: -1
rm_l_9_10275: (972.687500) net_send: could not write to fd=8, errno = 9
rm_l_9_10275: p4_error: net_send write: -1
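To see at a glance which processes hit which socket error, the net_send lines can be grouped by errno. The sketch below assumes the nodes run Linux, where errno 9 is EBADF (bad file descriptor) and errno 104 is ECONNRESET (connection reset by peer). Other operating systems number these differently. The regex is written against the exact line format shown above; the function name and mapping table are hypothetical helpers, not part of VASP or MPICH.

```python
import re
from collections import defaultdict

# Assumption: Linux errno numbering (9 = EBADF, 104 = ECONNRESET).
LINUX_ERRNO = {
    9: "EBADF (bad file descriptor)",
    104: "ECONNRESET (connection reset by peer)",
}

# Matches lines such as:
# rm_l_4_10285: (972.683594) net_send: could not write to fd=6, errno = 9
NET_SEND = re.compile(
    r"^(rm_l_\d+_\d+): \([\d.]+\) net_send: could not write to fd=(\d+), errno = (\d+)"
)


def summarize_net_send(log_text):
    """Group failed net_send writes by errno -> sorted list of process tags."""
    by_errno = defaultdict(set)
    for line in log_text.splitlines():
        m = NET_SEND.match(line.strip())
        if m:
            tag, _fd, err = m.groups()
            by_errno[int(err)].add(tag)
    return {err: sorted(tags) for err, tags in by_errno.items()}


sample = """\
rm_l_4_10285: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_3_10284: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_4_10285: (972.683594) net_send: could not write to fd=5, errno = 104
"""

for err, tags in sorted(summarize_net_send(sample).items()):
    print(err, LINUX_ERRNO.get(err, "unknown"), tags)
```

Read this way, the surviving ranks are failing to write to sockets that are already closed or reset. That is what one would expect to see after other ranks have been killed, although the log alone does not show which failure came first.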

In the corresponding error file:
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
forrtl: error (78): process killed (SIGTERM)
mpiexec: Warning: tasks 0-9 died with signal 11 (Segmentation fault).

I also found the following in the output of the dmesg command (and in /var/log/messages):

vaspmpitst[10388]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10386]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10387]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10371]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
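The "error 6" in these kernel messages is the x86 page-fault error code. Bit 0 set means a protection violation (clear means the page was not present). Bit 1 set means a write (clear means a read). Bit 2 set means the fault happened in user mode. The sketch below decodes one of the lines above and also computes how far the faulting address lies from the stack pointer (rsp). The regex and the decode() helper are hypothetical and written for the message format shown here.

```python
import re

# x86 page-fault error-code bits, as printed in "segfault at ... error N".
PF_PROT = 0x1   # 0: page not present, 1: protection violation
PF_WRITE = 0x2  # 0: read access,      1: write access
PF_USER = 0x4   # 0: kernel mode,      1: user mode

SEGV = re.compile(
    r"(\S+)\[(\d+)\]: segfault at ([0-9a-f]+) rip ([0-9a-f]+) "
    r"rsp ([0-9a-f]+) error (\d+)"
)


def decode(line):
    """Decode one kernel segfault line; return None if it does not match."""
    m = SEGV.search(line)
    if m is None:
        return None
    prog, pid, addr, _rip, rsp, err = m.groups()
    addr, rsp, err = int(addr, 16), int(rsp, 16), int(err)
    return {
        "program": prog,
        "pid": int(pid),
        "access": "write" if err & PF_WRITE else "read",
        "cause": "protection violation" if err & PF_PROT else "page not present",
        "mode": "user" if err & PF_USER else "kernel",
        # Signed distance of the faulting address from the stack pointer.
        "addr_minus_rsp": addr - rsp,
    }


line = ("vaspmpitst[10388]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 "
        "rsp 00007fff0e4f78b0 error 6")
print(decode(line))
```

For these lines the result is a user-mode write to a page that was not present, at an address 8 bytes below rsp. A write just below the stack pointer is often a sign that the process ran out of stack rather than that it followed a bad pointer. This is only a hypothesis, and the messages alone cannot confirm it.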

Could the experts here tell me what the possible causes of this kind of error are, and how to fix it?

Thank you so much.

Frequent problem

Posted: Mon Jul 02, 2007 6:13 am
by admin
This error is not VASP-related; it is most probably due to an MPI problem. Please contact your system administrator.