Jobs hanging/freezing

Problems running VASP: crashes, internal errors, "wrong" results.



Ben_Ellis1
Newbie
Posts: 1
Joined: Mon Mar 28, 2022 2:54 pm

Jobs hanging/freezing

#1 Post by Ben_Ellis1 » Fri Jul 04, 2025 1:20 pm

I am experiencing problems when running standard (std) VASP calculations. The jobs periodically freeze or hang and stop writing output. SSH'ing onto the HPC node shows that the VASP std instances are still running and that the RAM hasn't maxed out. This seems to occur randomly and does not depend on the number of nodes or cores. It also seems to occur independently of the VASP version: I have seen it with both VASP/6.1.1 and VASP/6.4.2. It has happened on a range of materials, including metal oxides, metals, and inorganics, and seems to be unaffected by INCAR settings (with and without KPAR).

The systems on which we have seen this are both Red Hat Linux x86_64 platforms, with up to 64 or 128 cores per node. We are using the gcc/12.3.0 compiler with Open MPI.

Has anyone seen this issue before, or have any suggestions as to why this is happening? I am happy to provide more information if needed.


andreas.singraber
Global Moderator
Posts: 300
Joined: Mon Apr 26, 2021 7:40 am

Re: Jobs hanging/freezing

#2 Post by andreas.singraber » Mon Jul 14, 2025 10:28 am

Hello!

Sorry for this very late reply! Since the problem you describe seems to appear across many different systems, INCAR setups, and even VASP versions, it is hard to imagine that there is a direct cause inside the VASP source code. Can you be more specific about how you identify that the jobs "freeze"? If you log in via ssh and run top, can you still see the VASP instances running at 100%?

File output is not a good indicator of whether a program actually hangs, because it is usually buffered by the OS. That means that even when the VASP source code has already issued a write statement, you may not see the file output immediately. Instead, the data is first written to an OS-controlled buffer rather than directly to the file. Only when the buffer is full will the OS finally write the data to the corresponding file. On HPC systems this can take a considerable amount of time (depending on the amount of data that is written), and it may look like the job is frozen. For example, the output in the OUTCAR file may stop abruptly (even in the middle of a line) and continue only after a longer stretch of calculation, once enough output lines have accumulated in the OS buffer. Although annoying, this behavior is totally normal and there is no need to be worried.
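The buffering effect described above can be reproduced in miniature. The sketch below is only an analogy: it uses Python's own user-space file buffer as a stand-in, whereas on a cluster the delay may come from the Fortran runtime, the OS, or the parallel file system, and this example does not distinguish between them. The function name and the 64 KiB buffer size are arbitrary choices for illustration.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload: bytes):
    """Return the on-disk file size before and after flushing a buffered write."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # A 64 KiB buffer (arbitrary size) holds small writes back until it fills up.
        with open(path, "wb", buffering=64 * 1024) as f:
            f.write(payload)
            before = os.path.getsize(path)  # the payload is still in the buffer
            f.flush()                       # hand the buffered data over to the OS
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)
```

For a payload much smaller than the buffer, the size seen before the flush is zero even though the write call has already returned, which is the same pattern as an OUTCAR that appears to have stopped growing.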

A quick test to see if there are any actual hang-ups in the middle of a VASP run is to look at the LOOP or LOOP+ timings provided in the OUTCAR file. Compare the output from different machines, one with and one without "freezing". Check whether you can spot unusually long iterations on the machines with "freezing". Or are the numbers comparable?
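The LOOP comparison could be scripted roughly as follows. This is a minimal sketch, not an official VASP tool: it assumes the timing lines look like `LOOP:  cpu time   10.00: real time   10.50` (and likewise for `LOOP+`), and the function names and the "5 times the median" threshold are hypothetical choices made for illustration.

```python
import re
import statistics

# Assumed shape of an OUTCAR timing line:
#   "      LOOP:  cpu time     10.00: real time     10.50"
LOOP_RE = re.compile(r"^\s*(LOOP\+?):\s+cpu time\s*([\d.]+):\s+real time\s*([\d.]+)")


def loop_timings(lines):
    """Collect the real (wall-clock) times of LOOP and LOOP+ entries, in file order."""
    timings = {"LOOP": [], "LOOP+": []}
    for line in lines:
        match = LOOP_RE.match(line)
        if match:
            timings[match.group(1)].append(float(match.group(3)))
    return timings


def slow_iterations(real_times, factor=5.0):
    """Return (index, time) pairs whose time exceeds factor times the median."""
    if not real_times:
        return []
    median = statistics.median(real_times)
    return [(i, t) for i, t in enumerate(real_times) if t > factor * median]
```

To use it, pass an open file, for example `loop_timings(open("OUTCAR"))`, once for a run that "froze" and once for a run that did not, and compare what `slow_iterations` reports for each. A large gap between cpu time and real time on the same line would also be worth noting, although this sketch only looks at real time.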

All the best,
Andreas Singraber

