Frequent problem

Problems running VASP: crashes, internal errors, "wrong" results.

midair77
Newbie
Posts: 11
Joined: Mon Apr 02, 2007 11:32 pm

#1 Post by midair77 » Thu Jun 28, 2007 10:30 pm

Hi all. We get this segfault error quite frequently, for a variety of job sizes. This particular job ran on 4 nodes with 16 processors, and each node has 8 GB of RAM.

running on 16 nodes
each image running on 2 nodes
distr: one band on 1 nodes, 2 groups
vasp.4.6.19 08Dec03 complex
01/POSCAR found : 4 types and 15 ions

-----------------------------------------------------------------------------
| |
| ADVICE TO THIS USER RUNNING 'VASP/VAMP' (HEAR YOUR MASTER'S VOICE ...): |
| |
| You enforced a specific xc-type in the INCAR file, |
| a different type was found on the POTCAR file |
| I HOPE YOU KNOW, WHAT YOU ARE DOING |
| |
-----------------------------------------------------------------------------

LDA part: xc-table for Pade appr. of Perdew
00/POSCAR found : 4 types and 15 ions
09/POSCAR found : 4 types and 15 ions
POSCAR, INCAR and KPOINTS ok, starting setup
WARNING: wrap around errors must be expected
FFT: planning ...
reading WAVECAR
WARNING: random wavefunctions but no delay for mixing, default for NELMDL
entering main loop
N E dE d eps ncg rms rms(c)
RMM: 1 0.167129221272E+04 0.16713E+04 -0.38097E+04 780 0.105E+03
*******
1 F= -.62961368E+03 E0= -.62951081E+03 d E =-.629614E+03
rm_l_2_10283: p4_error: interrupt SIGx: 15
bm_list_10389: p4_error: listener select: -1
rm_l_4_10285: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_4_10285: p4_error: net_send write: -1
rm_l_3_10284: (972.683594) net_send: could not write to fd=6, errno = 9
rm_l_3_10284: p4_error: net_send write: -1
rm_l_4_10285: (972.683594) net_send: could not write to fd=5, errno = 104
rm_l_6_10015: (972.699219) net_send: could not write to fd=6, errno = 9
rm_l_6_10015: p4_error: net_send write: -1
rm_l_7_10017: (972.699219) net_send: could not write to fd=7, errno = 9
rm_l_7_10017: p4_error: net_send write: -1
rm_l_8_10272: (972.687500) net_send: could not write to fd=6, errno = 9
rm_l_8_10272: p4_error: net_send write: -1
rm_l_9_10275: (972.687500) net_send: could not write to fd=8, errno = 9
rm_l_9_10275: p4_error: net_send write: -1

The corresponding error file contains:
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
p4_error: latest msg from perror: Bad file descriptor
forrtl: error (78): process killed (SIGTERM)
mpiexec: Warning: tasks 0-9 died with signal 11 (Segmentation fault).
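
The numeric codes in the two listings above map to standard signal and errno names, which makes the sequence easier to read. Below is a minimal sketch that decodes them, assuming Linux errno numbering (the small table is hard-coded for that reason). The helper name `decode_line` and the regular expressions are hypothetical and written only for the line shapes that appear in this post.

```python
import re
import signal

# Linux errno values that appear in the log above. Hard-coded (an assumption
# about the platform) so the sketch gives the same answer wherever it runs.
LINUX_ERRNO = {
    9: "EBADF: Bad file descriptor",
    104: "ECONNRESET: Connection reset by peer",
}

# "<process label>: ... errno = <n>"
ERRNO_RE = re.compile(r"^(\S+): .*errno = (\d+)")
# "SIGx: <n>" (p4 device) or "signal <n>" (mpiexec)
SIGNAL_RE = re.compile(r"(?:SIGx: |signal )(\d+)")


def decode_line(line):
    """Return a short note for one log line, or None if it carries no code."""
    m = ERRNO_RE.search(line)
    if m:
        code = int(m.group(2))
        meaning = LINUX_ERRNO.get(code, "unknown errno " + str(code))
        return m.group(1) + ": " + meaning
    m = SIGNAL_RE.search(line)
    if m:
        # signal.Signals maps a signal number to its symbolic name.
        return signal.Signals(int(m.group(1))).name
    return None


sample = [
    "rm_l_2_10283: p4_error: interrupt SIGx: 15",
    "rm_l_4_10285: (972.683594) net_send: could not write to fd=6, errno = 9",
    "rm_l_4_10285: (972.683594) net_send: could not write to fd=5, errno = 104",
    "mpiexec: Warning: tasks 0-9 died with signal 11 (Segmentation fault).",
]

for line in sample:
    print(decode_line(line))
```

Read this way, the `net_send` failures (bad file descriptor, connection reset) look like what remaining processes report when their peers have already gone away, while signal 11 is the segmentation fault itself. The log alone does not prove that ordering, so treat it as a reading rather than a diagnosis.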

I also found the following in the output of the dmesg command and in /var/log/messages:

vaspmpitst[10388]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10386]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10387]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
vaspmpitst[10371]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp 00007fff0e4f78b0 error 6
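
The kernel lines above can be decoded field by field. The sketch below parses one such line and splits the x86 page-fault error code into its three low bits (bit 0: page was present, bit 1: access was a write, bit 2: fault came from user mode). The function name `decode_segfault` and the returned dictionary keys are hypothetical, and the regular expression only targets the message format shown in this post.

```python
import re

# Matches the kernel message format quoted in this post.
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def decode_segfault(line):
    """Parse one kernel segfault message; return None if it does not match."""
    # Collapse whitespace so a message wrapped over two lines still parses.
    m = SEGFAULT_RE.search(" ".join(line.split()))
    if m is None:
        return None
    addr = int(m["addr"], 16)
    rsp = int(m["rsp"], 16)
    err = int(m["err"])
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "page_present": bool(err & 1),  # bit 0: 0 means the page was not mapped
        "write": bool(err & 2),         # bit 1: 1 means a write access
        "user_mode": bool(err & 4),     # bit 2: 1 means fault in user mode
        "rsp_minus_addr": rsp - addr,   # distance from stack pointer to fault
    }


sample = (
    "vaspmpitst[10388]: segfault at 00007fff0e4f78a8 rip 00000000005f82f7 rsp\n"
    "00007fff0e4f78b0 error 6"
)
info = decode_segfault(sample)
print(info)
```

For the lines in this post, the decode gives a user-mode write to an unmapped page located 8 bytes below the stack pointer, with the same instruction address in every process. That pattern is often associated with a process running out of stack, so checking the stack limit (`ulimit -s`) on the compute nodes may be worth a try. This is only one possible reading and is not confirmed anywhere in the thread.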

Could the experts here tell me what could be the possible causes for this kind of error and how to fix it?

Thank you so much.

admin
Administrator
Posts: 2922
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

#2 Post by admin » Mon Jul 02, 2007 6:13 am

This error is not VASP-related; it is most probably due to an MPI error. Please contact your system administrator.
