| 1 | +name: Build and Deploy LAMMPS |
| 2 | +description: Build a Docker container and deploy it as a Kubernetes Job. |
| 3 | +plan: |
| 4 | +- agent: build |
| 5 | + context: |
| 6 | + environment: "AWS CPU instance in Kubernetes to run across nodes" |
| 7 | + application: lammps-reax |
| 8 | + platforms: linux/amd64,linux/arm64 |
| 9 | + container: ghcr.io/converged-computing/fractale-agent-experiments:lammps-reax |
| 10 | + push: true |
| 11 | + max_attempts: 10 |
| 12 | + details: | |
| 13 | + Ensure all files matching the glob examples/reaxff/HNS/* (relative to the root of the LAMMPS codebase) are in the WORKDIR. Clone the latest branch of LAMMPS. You MUST put lmp on the PATH. You MUST install libgomp1. |
| 14 | + This will be run with a workload manager that can bootstrap MPI. You MUST install MPI but you do not need ssh. |
| 15 | + You MUST install OpenMPI 4.1.2 with libfabric --with-efa for AWS. |
| 16 | + You MUST build the container for a multi-node MPI environment. |
| 17 | + |
| 18 | +- agent: minicluster |
| 19 | + context: |
| 20 | + environment: "AWS CPU instance in Kubernetes" |
| 21 | + container: ghcr.io/converged-computing/fractale-agent-experiments:lammps-reax |
| 22 | + max_attempts: 10 |
| 23 | + max_runtime: 300 |
| 24 | + optimize: | |
| 25 | + You MUST maximize the LAMMPS figure of merit (FOM), reported as k-atom or m-atom steps per second. |
| 26 | + You MUST choose the problem size to maximize FOM. You MUST start with 1 node, then 2 nodes. You MUST set environment variables for MPI to use EFA with libfabric. |
| 27 | + resources: | |
| 28 | + The resource spec you received earlier describes an autoscaling cluster, so the possible nodes may not exist yet. You MUST add a nodeSelector to use: |
| 29 | + m7g.16xlarge, 64 CPU, 256 GiB Memory, ARM (Graviton3) |
| 30 | + You are still limited to up to 4 nodes. |
| 31 | + testing: | |
| 32 | + Run in.reaxff.hns in the pwd with lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxff.hns -nocite for testing only. |
| 33 | + details: | |
| 34 | + You MUST run on up to 4 nodes and you MUST test on only 1 node first. |
| 35 | + The Flux Operator runs the command with flux run in the pwd, with the number of tasks determined by spec.tasks. |
| 36 | + You MUST set resource requests and limits to use vpc.amazonaws.com/efa: 1 |
| 37 | + Since this is an ARM instance you MUST change the flux.container.image to be |
| 38 | + ghcr.io/converged-computing/flux-view-rocky:arm-9. Otherwise, do not change or set it. |
| 39 | + If you are using an ARM instance you MUST also set flux.arch: "arm". |
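
For reviewers, the constraints in the minicluster step can be sketched as the kind of Flux Operator MiniCluster manifest the agent would be expected to produce for the first single-node test. This is an illustration under stated assumptions, not output from the agent: the metadata name, the task count, the `FI_PROVIDER` variable, and the use of the `node.kubernetes.io/instance-type` label as the nodeSelector key are not in the plan, and the field layout should be checked against the installed MiniCluster CRD version.

```yaml
# Hypothetical manifest: a sketch of what the plan's constraints imply.
apiVersion: flux-framework.org/v1alpha2
kind: MiniCluster
metadata:
  name: lammps-reax-test        # assumed name, not from the plan
spec:
  size: 1                       # plan: test on 1 node first, then scale up to 4
  tasks: 64                     # assumption: one task per CPU on m7g.16xlarge
  flux:
    arch: "arm"                 # plan: required on ARM instances
    container:
      image: ghcr.io/converged-computing/flux-view-rocky:arm-9
  pod:
    nodeSelector:
      # assumption: select the instance type via the well-known Kubernetes label
      node.kubernetes.io/instance-type: m7g.16xlarge
  containers:
    - image: ghcr.io/converged-computing/fractale-agent-experiments:lammps-reax
      # test command from the plan's testing section
      command: lmp -v x 2 -v y 2 -v z 2 -in ./in.reaxff.hns -nocite
      environment:
        FI_PROVIDER: efa        # assumption: one way to point libfabric at EFA
      resources:
        requests:
          vpc.amazonaws.com/efa: 1
        limits:
          vpc.amazonaws.com/efa: 1
```

Scaling to 2 or 4 nodes would change only `size` and `tasks`; the image, architecture, and EFA settings stay the same.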