Proteus is an automated computational research pipeline designed for Materials Science and Pharmacology. It simulates the self-assembly and nanoprecipitation of polymer chains. By automating the transition from simple chemical text codes (SMILES) to complex physical Molecular Dynamics (MD) simulations, it allows researchers to rapidly screen and analyze how different polymers fold and interact with other molecules.
A major focus of the project is "The Payload" (Drug Encapsulation)—simulating how a polymer "suitcase" (nanoparticle) wraps around and traps smaller drug molecules for targeted medical delivery.
Proteus acts as a virtual crash test for medicine and materials. In the real world, synthesizing and testing hundreds of polymer variations in a lab is incredibly expensive and time-consuming. Proteus solves this by allowing researchers to:
- Screen rapidly: Test thousands of polymer/drug combinations on a computer for pennies.
- Measure Efficiency: Automatically calculate the "Encapsulation Efficiency" (what percentage of the drug actually stayed inside the polymer).
- Visualize: Generate high-quality 3D animations and cross-sections of the molecular folding process to see exactly why a design succeeded or failed.
- Save Resources: Identify the top 2 or 3 candidates virtually, ensuring physical lab resources are only spent on the most promising designs.
Proteus is built on a highly optimized, modular pipeline:
-
Module I - Topology Architect (
src/topology.py): Uses RDKit and the Universal Force Field (UFF) to convert 1D SMILES strings into 3D geometries. It automatically detects bonds, angles, and proper dihedrals, assigning physical properties using a CHONS (Carbon, Hydrogen, Oxygen, Nitrogen, Sulfur) OPLS-AA force field. -
Module II - Simulation Engine (
src/simulation.py): Generates and executes LAMMPS simulation scripts. It uses Langevin dynamics to simulate an implicit solvent (like water or ethanol) with specifically tuned viscosity/damping to create realistic molecular drifting and folding. -
Module III - Analytics (
src/analysis.py): Features a highly memory-efficient parser that reads gigabyte-sized simulation logs line-by-line. It calculates thermodynamic stability, the final size of the nanoparticle (Radius of Gyration,$R_g$ ), and the exact geometric Encapsulation Efficiency of payload molecules. -
Module IV - Visualization (
src/visualization.py): Uses Ovito (Open Visualization Tool) via Python to automatically render color-coded, GPU-accelerated 3D GIFs and cross-sectional views of the simulation.
- Added: Project scaffolding, basic SMILES-to-LAMMPS pipeline.
- Added: Core modules:
topology.py,simulation.py,analysis.py, andmain.pycontroller. - Added: Basic visualization via Ovito for rendering animations.
- Added:
run.shhelper script for easy CLI usage.
- Note: Proteus was briefly developed as a full-stack web application before being split.
- Added: A Next.js frontend featuring a highly complex "Triple-Nested Tesseract" 3D glowing hero component using Three.js and WebGL.
- Added: FastAPI backend, Celery workers, Redis, and PostgreSQL via Docker to handle asynchronous simulation requests from the web.
- Added: Email notification system for completed simulations.
- Removed: All web and backend components were entirely migrated out of this repository to a dedicated platform repo to keep this CLI tool lightweight and focused strictly on the physics engine.
-
Added: An AI inference engine (
src/ml/) designed to instantly predict polymer properties without running the full physics simulation. - Added: A PyTorch-based ChemBERTa (Transformer) model.
-
Achieved: Trained the model on synthetic datasets to predict
$R_g$ with 90% accuracy. -
Removed/Archived: The entire AI suite (
src/ml/, weights, and datasets) was moved toarchive/ai/. The project's philosophy pivoted back to prioritizing high-fidelity physical Molecular Dynamics over AI estimations.
- Added: "The Payload" (Drug Encapsulation) feature. The CLI now accepts
--payloadand--payload_countto inject secondary drug molecules into the polymer solvent. - Added: Comprehensive CHONS OPLS-AA forcefield support.
- Added: Dynamic bond order detection (Single, Double, Triple) and automatic dihedral (torsion) detection using RDKit for realistic polymer stiffness.
- Added:
VARIABLES.mdestablished as the Single Source of Truth for all physical parameters. - UX Philosophy Enforced: All complex features, especially the Payload, were made strictly optional to maintain a clean "UX First" user experience.
-
Added:
--htsflag for processing CSV spreadsheets containing hundreds of polymer/payload combinations. -
Added: Automatic ranking systems that sort HTS results by structural stability (
$R_g$ ) or Encapsulation Efficiency. - Added: Automated PDF lab report generation including 2D chemical structures, stability plots, and analysis results.
-
Added:
--cutflag to render 3D cross-sections of the nanoparticles to see the drug trapped inside. -
Added: Solvent profiles (
--solvent water/ethanol) that automatically configure physical simulation parameters.
- Optimized (
topology.py): Replaced manual math with C++ optimized RDKitrdMolTransformsroutines. - Optimized (
simulation.py): Implemented streamingstdoutviasubprocess.Popento prevent memory bloat during long LAMMPS runs. Added GPU scaling. - Optimized (
analysis.py): Rewrote the trajectory parser to seek from the end of the file and read line-by-line, preventing memory crashes when analyzing multi-gigabyte files. - Optimized (
visualization.py): Added element-specific atomic radii and dynamic frame sampling to drastically reduce GIF rendering times while improving visual quality. - Added: A complete unit testing suite using
pytest.