Anfal-AR/single-reactor-scheduling

Single-Reactor Pharmaceutical Batch Scheduling Under Uncertainty

Research companion repository for the paper:

Rababah, A. (2025). Heuristic Scheduling Strategies for Single-Reactor Pharmaceutical Batch Production Under Uncertainty: A Comparative Statistical and Machine Learning Analysis. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-wq0tr


📋 Abstract

Pharmaceutical batch production faces significant scheduling challenges due to operational uncertainties including equipment failures, yield variability, and demand fluctuations. This study evaluates three scheduling heuristics (FIFO, SPT, LPT) for single-reactor configurations across varying uncertainty levels using a discrete-event simulation model of a 10,000L bioreactor producing three antibiotic products.

Key Findings:

  • Campaign-based strategies (SPT, LPT) reduce makespan by up to 16.5% relative to round-robin FIFO (marginal means: SPT 1,992 h vs. FIFO 2,387 h)
  • SPT and LPT are statistically equivalent (Cohen's d = 0.12), providing operational flexibility
  • No heuristic × uncertainty interaction: benefits remain consistent across all uncertainty levels
  • Machine learning models achieve 90-97% accuracy in predicting schedule robustness
  • Polynomial SVM achieves best performance (96.7% accuracy, AUC = 0.972)

πŸ”¬ Study Design

Experimental Configuration

| Parameter | Value |
|---|---|
| Reactor Configuration | Single 10,000L bioreactor |
| Products | 3 antibiotics (A: 48h, B: 72h, C: 120h fermentation) |
| Scheduling Heuristics | FIFO, SPT, LPT |
| Uncertainty Levels | Low, Medium, High |
| Total Observations | 450 (3 × 3 × 50 scenarios) |
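
As a rough illustration of how the 3 × 3 × 50 design yields 450 observations, the sketch below enumerates every heuristic, uncertainty level, and replicate combination. The `replicate` field and the order in which `scenario_id` is assigned are assumptions made for this example; the actual dataset may number its rows differently.

```python
from itertools import product

HEURISTICS = ["FIFO", "SPT", "LPT"]
UNCERTAINTY_LEVELS = ["Low", "Medium", "High"]
SCENARIOS_PER_CELL = 50  # scenarios per heuristic/uncertainty combination


def build_design():
    """Return one dict per observation in the full factorial design."""
    rows = []
    cells = product(HEURISTICS, UNCERTAINTY_LEVELS, range(1, SCENARIOS_PER_CELL + 1))
    for obs_id, (heuristic, level, replicate) in enumerate(cells, start=1):
        rows.append({
            "scenario_id": obs_id,        # assumed numbering, 1..450
            "heuristic": heuristic,
            "uncertainty_level": level,
            "replicate": replicate,       # hypothetical helper column
        })
    return rows


design = build_design()
print(len(design))  # 450
```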

Uncertainty Parameters

| Parameter | Low | Medium | High |
|---|---|---|---|
| Equipment Failure Probability | 2% | 5% | 8% |
| Yield Variability (CV) | ±5% | ±10% | ±15% |
| Processing Time Deviation | ±5% | ±10% | ±15% |

Scheduling Heuristics Evaluated

  • FIFO (First-In-First-Out): Round-robin cycling through products A→B→C
  • SPT (Shortest Processing Time): Campaign-based, shortest products first (A→B→C)
  • LPT (Longest Processing Time): Campaign-based, longest products first (C→B→A)
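
The three orderings above can be sketched as a small sequence builder. This is a minimal illustration, not the simulation code used in the paper: the function names and the `batches_per_product` argument are hypothetical, and the batch counts in the usage lines are made up. The changeover counter shows why campaign-based orderings need fewer product switches than round-robin.

```python
# Fermentation times from the Experimental Configuration table (hours).
FERMENTATION_HOURS = {"A": 48, "B": 72, "C": 120}
ROUND_ROBIN_ORDER = ["A", "B", "C"]


def build_sequence(heuristic, batches_per_product):
    """Return the ordered list of product labels for one production cycle."""
    if heuristic == "FIFO":
        # Round-robin: cycle A -> B -> C until every batch is scheduled.
        remaining = {p: batches_per_product.get(p, 0) for p in ROUND_ROBIN_ORDER}
        sequence = []
        while any(count > 0 for count in remaining.values()):
            for p in ROUND_ROBIN_ORDER:
                if remaining[p] > 0:
                    sequence.append(p)
                    remaining[p] -= 1
        return sequence
    if heuristic not in ("SPT", "LPT"):
        raise ValueError(f"Unknown heuristic: {heuristic!r}")
    # Campaigns: run all batches of one product before switching.
    order = sorted(FERMENTATION_HOURS, key=FERMENTATION_HOURS.get,
                   reverse=(heuristic == "LPT"))
    return [p for p in order for _ in range(batches_per_product.get(p, 0))]


def count_changeovers(sequence):
    """Count adjacent pairs where the product changes."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)


batches = {"A": 2, "B": 2, "C": 2}  # illustrative batch counts only
fifo_seq = build_sequence("FIFO", batches)
spt_seq = build_sequence("SPT", batches)
lpt_seq = build_sequence("LPT", batches)
print(fifo_seq, count_changeovers(fifo_seq))
print(spt_seq, count_changeovers(spt_seq))
```

With two batches per product, the round-robin sequence switches product after every batch, while each campaign sequence switches only twice.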

📊 Key Results

Statistical Analysis (Two-Way ANOVA)

| Source | F | p | η² | Interpretation |
|---|---|---|---|---|
| Uncertainty Level | 346.69 | <.001 | 0.437 | Large effect |
| Heuristic | 225.71 | <.001 | 0.285 | Large effect |
| Interaction | 0.103 | .981 | <0.001 | No interaction |

Model explains ~72% of variance in makespan (η² combined = 0.722)

Mean Makespan by Condition (Hours)

| Uncertainty | SPT | LPT | FIFO |
|---|---|---|---|
| Low | 1,762 (CV=2.6%) | 1,778 (CV=2.5%) | 2,157 (CV=3.0%) |
| Medium | 1,914 (CV=6.1%) | 1,942 (CV=6.1%) | 2,322 (CV=6.5%) |
| High | 2,300 (CV=11.0%) | 2,323 (CV=11.7%) | 2,680 (CV=11.6%) |
| Marginal Mean | 1,992 | 2,014 | 2,387 |
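
The headline savings figure follows directly from the marginal means in the table. The short sketch below reproduces the arithmetic; only the three marginal means come from the table, and the function name is chosen for this example.

```python
# Marginal mean makespan in hours, taken from the table above.
MARGINAL_MEAN_HOURS = {"SPT": 1992, "LPT": 2014, "FIFO": 2387}


def reduction_vs_fifo(heuristic):
    """Return (hours saved, percent of FIFO makespan saved)."""
    fifo = MARGINAL_MEAN_HOURS["FIFO"]
    saved = fifo - MARGINAL_MEAN_HOURS[heuristic]
    return saved, 100.0 * saved / fifo


for name in ("SPT", "LPT"):
    hours, pct = reduction_vs_fifo(name)
    print(f"{name}: {hours} h saved ({pct:.1f}% of FIFO makespan)")
# SPT: 395 h saved (16.5% of FIFO makespan)
# LPT: 373 h saved (15.6% of FIFO makespan)
```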

Machine Learning Classification Performance

| Model | Test Accuracy | AUC | F1 | MCC |
|---|---|---|---|---|
| Polynomial SVM | 96.7% | 0.972 | 0.967 | 0.934 |
| Random Forest | 94.4% | 0.985 | 0.944 | 0.890 |
| RBF SVM | 94.4% | 0.946 | 0.945 | 0.891 |
| Decision Tree | 94.4% | 0.944 | 0.945 | 0.884 |
| Gradient Boosting | 93.3% | 0.997 | 0.933 | 0.869 |
| Linear SVM | 92.2% | 0.920 | 0.922 | 0.845 |
| Logistic Regression | 90.0% | 0.900 | 0.900 | 0.799 |
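
For readers less familiar with the metrics in this table, the sketch below computes accuracy, F1, and the Matthews correlation coefficient (MCC) from a binary confusion matrix using their standard definitions. The counts in the usage lines are invented for illustration and are not taken from the paper. JASP may average F1 across classes, so the positive-class F1 shown here need not match the reported values exactly.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, positive-class F1, and MCC from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Invented counts for illustration only (not results from the study).
metrics = binary_metrics(tp=40, fp=5, fn=5, tn=40)
print({name: round(value, 3) for name, value in metrics.items()})
```

MCC uses all four cells of the confusion matrix, which is why it can rank models differently from accuracy when the classes are imbalanced.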

Feature Importance (Averaged Across Models)

| Feature | Relative Importance |
|---|---|
| Heuristic Type | 33% |
| Uncertainty Level | 30% |
| Total Demand | 18% |
| Average Yield | 8% |
| Downtime Hours | 6% |
| Equipment Failure | 5% |

📁 Repository Structure

single-reactor-scheduling/
├── README.md                    # This file
├── LICENSE                      # MIT License
├── DATA_DICTIONARY.md           # Variable definitions
├── CITATION.cff                 # Citation metadata
│
├── data/
│   ├── optimized_dataset.csv            # Base simulation output (450 obs)
│   └── optimized_dataset_ML.csv         # ML-ready with target variables
│
├── scripts/
│   ├── create_ml_dataset.py             # Adds ML target variables
│   └── visualization_script.py          # Generates all 8 figures
│
├── jasp/
│   ├── statistical_analysis.jasp        # ANOVA, regression, post-hoc
│   └── ml_classification.jasp           # All ML models
│
├── figures/
│   ├── Figure1_Main_Results.png         # Makespan by heuristic & interaction
│   ├── Figure2_Mechanism_Analysis.png   # Changeover & learning effects
│   ├── Figure3_Uncertainty_Effects.png  # Distribution & SPT advantage
│   ├── FigureS1_Data_Quality.png        # Data quality dashboard
│   ├── FigureS2_Statistical_Diagnostics.png  # ANOVA assumptions
│   └── FigureS3_Detailed_Comparison.png # Violin plots by condition
│
└── paper/
    └── Rababah_2025_Single_Reactor_Scheduling.pdf

🛠️ Analysis Workflow

Software Requirements

| Software | Version | Purpose |
|---|---|---|
| Python | 3.8+ | Data preparation & visualization |
| JASP | 0.18+ | All statistical & ML analysis |
| pandas | 1.5+ | Data manipulation |
| numpy | 1.21+ | Numerical operations |
| matplotlib | 3.5+ | Figure generation |
| seaborn | 0.12+ | Statistical visualization |
| scipy | 1.9+ | Statistical functions |

Division of Labor

| Task | Tool | Script/File |
|---|---|---|
| Dataset preparation | Python | create_ml_dataset.py |
| Figure generation | Python | visualization_script.py |
| Descriptive statistics | JASP | statistical_analysis.jasp |
| Two-Way ANOVA | JASP | statistical_analysis.jasp |
| Kruskal-Wallis tests | JASP | statistical_analysis.jasp |
| Post-hoc comparisons | JASP | statistical_analysis.jasp |
| Multiple regression | JASP | statistical_analysis.jasp |
| ML Classification | JASP | ml_classification.jasp |

Reproduction Steps

Step 1: Prepare ML Dataset

cd scripts/
python create_ml_dataset.py

This creates optimized_dataset_ML.csv with three target variables:

  • schedule_robust: Binary (0=Vulnerable, 1=Robust)
  • performance_class: 3-class (Excellent/Acceptable/Poor)
  • performance_numeric: Ordinal encoding (0/1/2)
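
The relationship between performance_class and performance_numeric can be sketched as a simple lookup. The README states only that the encoding is 0/1/2, so the direction of the mapping below (Excellent = 0, Poor = 2) is an assumption; check DATA_DICTIONARY.md or create_ml_dataset.py for the ordering actually used.

```python
# Assumed ordering: the source gives the codes 0/1/2 but not which label gets which.
PERFORMANCE_CODES = {"Excellent": 0, "Acceptable": 1, "Poor": 2}


def encode_performance(labels):
    """Map performance_class labels to ordinal codes, rejecting unknown labels."""
    try:
        return [PERFORMANCE_CODES[label] for label in labels]
    except KeyError as exc:
        raise ValueError(f"Unknown performance_class: {exc.args[0]!r}") from exc


print(encode_performance(["Excellent", "Poor", "Acceptable"]))  # [0, 2, 1]
```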

Step 2: Generate Figures

cd scripts/
python visualization_script.py

This generates all 8 publication-quality figures:

Main Paper Figures:

| Figure | Description | Output Files |
|---|---|---|
| Figure 1 | Main results: boxplots & interaction plot | Figure1_Main_Results.png/pdf |
| Figure 2 | Mechanism: changeover time, learning savings, changeover count | Figure2_Mechanism_Analysis.png/pdf |
| Figure 3 | Uncertainty effects: distributions & SPT advantage | Figure3_Uncertainty_Effects.png/pdf |

Supplementary Figures:

| Figure | Description | Output Files |
|---|---|---|
| Figure S1 | Data quality dashboard (6 panels) | FigureS1_Data_Quality.png/pdf |
| Figure S2 | ANOVA diagnostics: Q-Q, residuals, homogeneity | FigureS2_Statistical_Diagnostics.png/pdf |
| Figure S3 | Violin plots by heuristic × uncertainty | FigureS3_Detailed_Comparison.png/pdf |

Summary Tables:

  • Table1_PanelA_Heuristic_Stats.csv
  • Table1_PanelB_Uncertainty_Stats.csv
  • Table1_PanelC_CrossTab.csv

Step 3: Run Statistical Analysis in JASP

  1. Open jasp/statistical_analysis.jasp
  2. Analyses included:
    • Descriptive Statistics
    • Two-Way Factorial ANOVA
    • Kruskal-Wallis Tests (non-parametric confirmation)
    • Post-Hoc Comparisons (Tukey HSD, Dunn's test)
    • Multiple Linear Regression
    • Assumption Diagnostics

Step 4: Run ML Classification in JASP

  1. Open jasp/ml_classification.jasp
  2. Set variable types:
    • schedule_robust → Nominal (target)
    • performance_class → Nominal (multi-class target)
  3. ML models configured:
    • Random Forest (100 trees)
    • Gradient Boosting (100 iterations)
    • SVM (Linear, RBF, Polynomial kernels)
    • Decision Tree (max depth = 30)
    • Logistic Regression

📖 Data Dictionary

See DATA_DICTIONARY.md for complete variable definitions.

Quick Reference

| Variable | Type | Description |
|---|---|---|
| scenario_id | Integer | Unique identifier (1-450) |
| heuristic | Categorical | FIFO, SPT, or LPT |
| uncertainty_level | Ordinal | Low, Medium, or High |
| makespan | Continuous | Total production time (hours) |
| utilization | Continuous | Equipment utilization (0-1) |
| total_changeover_time | Continuous | Sum of changeover durations |
| total_learning_savings | Continuous | Time saved via learning curves |
| schedule_robust | Binary | ML target (0/1) |
| performance_class | Categorical | ML target (3-class) |
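
A quick sanity check against the quick reference might look like the sketch below. It assumes the CSV column names match the variable names listed above and that categorical values are stored as the literal strings shown; neither is confirmed here. The two sample rows are synthetic and exist only to exercise the checks.

```python
import csv
import io

VALID_HEURISTICS = {"FIFO", "SPT", "LPT"}
VALID_LEVELS = {"Low", "Medium", "High"}


def validate_rows(handle):
    """Yield (line_number, message) for each value that violates the quick reference."""
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        if row["heuristic"] not in VALID_HEURISTICS:
            yield line_no, f"unknown heuristic {row['heuristic']!r}"
        if row["uncertainty_level"] not in VALID_LEVELS:
            yield line_no, f"unknown uncertainty_level {row['uncertainty_level']!r}"
        if not 0.0 <= float(row["utilization"]) <= 1.0:
            yield line_no, f"utilization outside 0-1: {row['utilization']}"
        if float(row["makespan"]) <= 0:
            yield line_no, f"non-positive makespan: {row['makespan']}"


# Synthetic rows for demonstration; the second row is deliberately invalid.
sample = io.StringIO(
    "scenario_id,heuristic,uncertainty_level,makespan,utilization\n"
    "1,SPT,Low,1760.0,0.91\n"
    "2,EDD,Low,1800.0,1.20\n"
)
problems = list(validate_rows(sample))
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

To check the real file, pass an open handle instead of the sample, for example `open("data/optimized_dataset.csv", newline="")`.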

📈 Practical Implications

For Production Planning

  • ✅ Use campaign-based scheduling (SPT or LPT) as the default strategy
  • ❌ Avoid FIFO/round-robin except when explicitly required
  • 📊 Expected savings: ~395 hours (~16.5%) per production cycle

For Risk Management

  • 🎯 Deploy ML-based prediction tools to flag high-risk schedules
  • πŸ“ˆ Increase buffer time allocation under high uncertainty
  • πŸ“‰ Monitor CV of schedule outcomes as reliability metric
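
Monitoring the coefficient of variation (CV) of makespan needs only the mean and standard deviation of recent outcomes. The sketch below uses the sample standard deviation, which is an assumption, since the README does not say whether its CV values use the sample or population form. The input values are synthetic.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Synthetic makespans in hours, for illustration only.
recent_makespans = [1900, 2000, 2100]
cv_percent = coefficient_of_variation(recent_makespans)
print(f"CV = {cv_percent:.1f}%")  # CV = 5.0%
```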

Decision Rules from ML Models

IF heuristic = FIFO:
    → VULNERABLE schedule (94-101 of 150 cases)

IF heuristic = SPT or LPT:
    IF uncertainty = Low or Medium:
        → ROBUST schedule (116-118 of 200 cases)
    IF uncertainty = High:
        → Check demand level for final classification
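
The rules above translate into a small lookup function. This is a sketch of the majority outcome in each branch, not a trained model: the case counts in the rules show that each branch still contains exceptions. The `CHECK_DEMAND` label is a placeholder invented for this example, because the rules do not give the demand threshold.

```python
def classify_schedule(heuristic, uncertainty):
    """Return the majority-outcome label implied by the decision rules."""
    if heuristic == "FIFO":
        return "VULNERABLE"
    if heuristic in ("SPT", "LPT"):
        if uncertainty in ("Low", "Medium"):
            return "ROBUST"
        if uncertainty == "High":
            # Placeholder: the rules defer to demand level without a stated cutoff.
            return "CHECK_DEMAND"
        raise ValueError(f"Unknown uncertainty level: {uncertainty!r}")
    raise ValueError(f"Unknown heuristic: {heuristic!r}")


print(classify_schedule("FIFO", "Low"))    # VULNERABLE
print(classify_schedule("SPT", "Medium"))  # ROBUST
print(classify_schedule("LPT", "High"))    # CHECK_DEMAND
```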

📚 Citation

If you use this dataset or methodology, please cite:

@article{rababah2025single,
  title={Heuristic Scheduling Strategies for Single-Reactor Pharmaceutical 
         Batch Production Under Uncertainty: A Comparative Statistical 
         and Machine Learning Analysis},
  author={Rababah, Anfal},
  journal={ChemRxiv},
  year={2025},
  doi={10.26434/chemrxiv-2025-wq0tr}
}

πŸ”— Related Work

This study complements multi-reactor scheduling research. For facilities with parallel processing capacity, different scheduling considerations apply.


📬 Contact

Anfal Rababah


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • JASP Team for open-source statistical software
  • Python scientific computing community (NumPy, pandas, Matplotlib, Seaborn, SciPy)
  • Anthropic's Claude AI for technical writing assistance