Job Running Error

abdul_jaleel1
Newbie
Posts: 26
Joined: Wed Mar 11, 2020 12:30 pm
License Nr.: 20-0027
Location: Pakistan

Job Running Error

#1 Post by abdul_jaleel1 » Thu May 26, 2022 11:05 am

I am usually unable to run jobs with more than about 90 atoms. Please have a look at the error and suggest a solution.
ERROR
WARNING: random wavefunctions but no delay for mixing, default for NELMDL

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 186419 RUNNING AT physics
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 186420 RUNNING AT physics
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 2 PID 186421 RUNNING AT physics
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

Job script

#!/bin/bash
#PBS -q batch
#PBS -N testing
#PBS -l nodes=1:ppn=32
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=1
source /home/abdul/.bashrc
mpirun -np 32 /home/abdul/VASP/vasp.6.1.2.fixcell/bin/vasp_std > vasprun.l
POSCAR
This file is generated by VASPKIT code
1.000000
14.0653711830732568 0.0000000000000000 0.0000000000000000
-7.0326378837600583 12.1817370217174652 0.0000000000000000
0.0000000000000000 0.0000000000000000 35.0135993958000000
In W Se Te
24 16 24 32
Direct
0.0008398486615709 0.0026303932638031 0.6395539078451030 In1
0.0008398486615709 0.5026303932638031 0.6395539078451030 In2
0.5008398486615709 0.0026303932638031 0.6395539078451030 In3
0.5008398486615709 0.5026303932638031 0.6395539078451030 In4
0.1685780852979015 0.3387844397100557 0.6393165815766640 In5
0.1685780852979015 0.8387844397100557 0.6393165815766640 In6
0.6685780852979015 0.3387844397100557 0.6393165815766640 In7
0.6685780852979015 0.8387844397100557 0.6393165815766640 In8
0.3368150410057580 0.1704069459662106 0.6395061673677972 In9
0.3368150410057580 0.6704069459662106 0.6395061673677972 In10
0.8368150410057580 0.1704069459662106 0.6395061673677972 In11
0.8368150410057580 0.6704069459662106 0.6395061673677972 In12
0.0000208924376611 0.0021745229119978 0.5597127532106815 In13
0.0000208924376611 0.5021745229119978 0.5597127532106815 In14
0.5000208924376611 0.0021745229119978 0.5597127532106815 In15
0.5000208924376611 0.5021745229119978 0.5597127532106815 In16
0.1684358981286210 0.3393404780794760 0.5595405345313570 In17
0.1684358981286210 0.8393404780794760 0.5595405345313570 In18
0.6684358981286210 0.3393404780794760 0.5595405345313570 In19
0.6684358981286210 0.8393404780794760 0.5595405345313570 In20
0.3372682688618494 0.1703286449131534 0.5597246691324855 In21
0.3372682688618494 0.6703286449131534 0.5597246691324855 In22
0.8372682688618494 0.1703286449131534 0.5597246691324855 In23
0.8372682688618494 0.6703286449131534 0.5597246691324855 In24
0.2459844288476805 0.0854796391002310 0.3713128147671061 W1
0.2459844288476805 0.5854796391002310 0.3713128147671061 W2
0.7459844288476805 0.0854796391002310 0.3713128147671061 W3
0.7459844288476805 0.5854796391002310 0.3713128147671061 W4
0.2461496886574750 0.3367528139602705 0.3713504010830440 W5 ..................................

System Information
%Cpu(s): 28.9 us, 0.0 sy, 0.0 ni, 71.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 21129285+total, 17721987+free, 32960643+used, 11123176 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 17762040+avail Mem

martin.schlipf
Global Moderator
Posts: 466
Joined: Fri Nov 08, 2019 7:18 am

Re: Job Running Error

#2 Post by martin.schlipf » Fri May 27, 2022 7:42 am

Please prepare all input and the important output files as a zip-file as specified in the sticky post, otherwise we cannot reproduce this issue.
admin wrote: Mon Sep 24, 2018 1:44 pm
  • Provide a report in form of a zip-file in the attachment of your post that contains:
    • all input files of the job, that is POSCAR, INCAR, KPOINTS, POTCAR
    • OUTCAR and stdout of the run
    • your jobscript, if you use one
That being said, the most likely error seems to be insufficient memory. VASP will report some memory requirements to the OUTCAR file. If your code crashes before that point you can try to reduce the memory demand by reducing e.g. ENCUT or KPOINTS. Should these changes be sufficient to make the system work, you need to request more memory to run the final calculation.
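
As a rough sketch of the memory check described above, the following shell function lists the memory-related lines that VASP writes to OUTCAR. The function name `outcar_memory_lines` is hypothetical and not part of VASP, and the exact wording of those lines may differ between versions, so the search is a plain case-insensitive match on "memory".

```shell
#!/bin/bash
# Hypothetical helper (not part of VASP): list memory-related lines in an OUTCAR.
# The first argument is the path to the file; it defaults to ./OUTCAR.
outcar_memory_lines() {
    local outcar="${1:-OUTCAR}"
    if [ ! -f "$outcar" ]; then
        echo "no OUTCAR found at $outcar" >&2
        return 1
    fi
    # Case-insensitive match, because the exact wording depends on the VASP version.
    grep -i "memory" "$outcar"
}
```

Calling `outcar_memory_lines OUTCAR` in the run directory prints nothing if the job crashed before VASP reported its memory requirements. That is the situation in which reducing ENCUT or the k-point mesh is worth trying.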

abdul_jaleel1
Newbie
Posts: 26
Joined: Wed Mar 11, 2020 12:30 pm
License Nr.: 20-0027
Location: Pakistan

Re: Job Running Error

#3 Post by abdul_jaleel1 » Fri May 27, 2022 10:36 am

All the required data is attached. I have checked the system memory. Please have a look; I don't know why the problem occurs.

%Cpu(s): 28.9 us, 0.0 sy, 0.0 ni, 71.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 21129285+total, 17721987+free, 32960643+used, 11123176 buff/cache
KiB Swap: 4194300 total, 4194300 free, 0 used. 17762040+avail Mem

martin.schlipf
Global Moderator
Posts: 466
Joined: Fri Nov 08, 2019 7:18 am

Re: Job Running Error

#4 Post by martin.schlipf » Fri May 27, 2022 1:23 pm

I noticed you set ISTART=1. You typically shouldn't do this and instead rely on the default. Otherwise you may run into issues if no WAVECAR is present.
ISTART exists mainly for the opposite case: when you do not want to continue from an existing WAVECAR, you can set it to 0 to start from scratch.
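
A minimal sketch of a pre-flight check for this point, assuming it is run in the directory that holds INCAR and WAVECAR. The function name `wavecar_status` is hypothetical. It only reports whether a non-empty WAVECAR is there; with ISTART left unset, VASP defaults to 1 if a WAVECAR exists and to 0 otherwise.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of VASP): report what ISTART=1 would find.
wavecar_status() {
    local wavecar="${1:-WAVECAR}"
    if [ -s "$wavecar" ]; then
        echo "present"   # non-empty file: continuing from it is possible
    else
        echo "missing"   # absent or empty: an explicit ISTART=1 has nothing to read
    fi
}

# Show any explicit ISTART setting in the INCAR of the current directory.
grep -i "^[[:space:]]*ISTART" INCAR 2>/dev/null || echo "no explicit ISTART found in INCAR"
echo "WAVECAR: $(wavecar_status)"
```

If this prints an explicit `ISTART = 1` together with `WAVECAR: missing`, removing the ISTART line from INCAR lets VASP fall back to its default.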

As for your system, on my machine it runs fine with the number of cores you used. Memory should not be an issue actually, because it is a DFT calculation and should require less than a GB per core. So unless you have really little memory available it should be fine.
The system is also so small that you may want to try running with fewer cores, because you may not benefit from all of them. Otherwise, you can explore adding INCAR tags to select more parallelization levels (NCORE, KPAR). Perhaps you can also try to symmetrize the structure, because you get a warning about the symmetry of the system.
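
One way to try fewer cores, sketched as a dry run that only prints the commands. The helper name `vasp_command` and the core counts 8, 16 and 32 are illustrative assumptions; the executable path is the one from the job script in post #1.

```shell
#!/bin/bash
# Hypothetical helper: build the mpirun command line for a given number of MPI ranks.
# The executable path is taken from the job script in post #1.
vasp_command() {
    local nranks="$1"
    echo "mpirun -np ${nranks} /home/abdul/VASP/vasp.6.1.2.fixcell/bin/vasp_std"
}

# Dry run: print the commands for a few core counts (illustrative values)
# instead of executing them.
for nranks in 8 16 32; do
    vasp_command "$nranks"
done
```

In a real job script the `#PBS -l nodes=1:ppn=...` request would be lowered to match, and each core count would run in its own copy of the input directory so that the run times can be compared.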

Finally, you could try to use a newer version of VASP.
