Bug: pppm/dplr + fix dplr hangs or crashes with multi-MPI (except ntasks=1), including inconsistent behavior with restart and read_data #5362
Ziyang-You asked this question in Q&A
When using pair_style deepmd together with fix dplr and kspace_style pppm/dplr, the simulation runs reliably only with a single MPI process (ntasks=1). With multiple MPI processes (ntasks ≥ 2), it frequently hangs indefinitely during PPPM initialization or crashes with MPI collective errors. The behavior is inconsistent: some systems (initialized from read_data) run with multiple MPI ranks, while others (especially those started from read_restart) fail. This makes DPLR simulations impractical on HPC clusters, since single-MPI runs are extremely slow.
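A minimal input sketch of the style combination described above. This is not taken from the attached in.zip; the model file name, atom types, tolerances, and the exact fix dplr keyword arguments are hypothetical placeholders and should be checked against the DeePMD-kit DPLR documentation:

```
# Hypothetical minimal DPLR setup (placeholder names and parameters)
units           metal
atom_style      full
read_data       conf.lmp            # failure also reported when using read_restart

# DeePMD pair style with a DPLR-capable model -- placeholder model file
pair_style      deepmd model.pb
pair_coeff      * *

# fix dplr binds Wannier-centroid virtual sites to real atoms;
# argument list here is illustrative, not verified
fix             0 all dplr model model.pb type_associate 1 3

# long-range electrostatics via the DPLR-aware PPPM variant --
# the reported hang occurs during PPPM initialization with ntasks >= 2
kspace_style    pppm/dplr 1e-5

run             1000
```

With this kind of input, `mpirun -np 1 lmp -in in.lammps` reportedly completes, while `mpirun -np 4 lmp -in in.lammps` hangs or aborts with MPI collective errors.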
Environment
pip install deepmd-kit[cpu,lmp]

Input script, sbatch inputs, and output: in.zip