Research companion repository for the paper:
Rababah, A. (2025). Heuristic Scheduling Strategies for Single-Reactor Pharmaceutical Batch Production Under Uncertainty: A Comparative Statistical and Machine Learning Analysis. ChemRxiv. https://doi.org/10.26434/chemrxiv-2025-wq0tr
Pharmaceutical batch production faces significant scheduling challenges due to operational uncertainties including equipment failures, yield variability, and demand fluctuations. This study evaluates three scheduling heuristics (FIFO, SPT, LPT) for single-reactor configurations across varying uncertainty levels using a discrete-event simulation model of a 10,000L bioreactor producing three antibiotic products.
Key Findings:
- Campaign-based strategies (SPT, LPT) reduce makespan by up to 16.5% relative to round-robin FIFO
- SPT and LPT perform nearly identically (Cohen's d = 0.12, a negligible effect), providing operational flexibility
- No heuristic × uncertainty interaction: benefits remain consistent across all uncertainty levels
- Machine learning models achieve 90-97% accuracy in predicting schedule robustness
- Polynomial SVM achieves best performance (96.7% accuracy, AUC = 0.972)
| Parameter | Value |
|---|---|
| Reactor Configuration | Single 10,000L bioreactor |
| Products | 3 antibiotics (A: 48h, B: 72h, C: 120h fermentation) |
| Scheduling Heuristics | FIFO, SPT, LPT |
| Uncertainty Levels | Low, Medium, High |
| Total Observations | 450 (3 × 3 × 50 scenarios) |
| Parameter | Low | Medium | High |
|---|---|---|---|
| Equipment Failure Probability | 2% | 5% | 8% |
| Yield Variability (CV) | ±5% | ±10% | ±15% |
| Processing Time Deviation | ±5% | ±10% | ±15% |
- FIFO (First-In-First-Out): Round-robin cycling through products A→B→C
- SPT (Shortest Processing Time): Campaign-based, shortest products first (A→B→C)
- LPT (Longest Processing Time): Campaign-based, longest products first (C→B→A)
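Why campaigns help can be illustrated with a small deterministic sequencing sketch. It is not the paper's discrete-event simulation: it assumes a hypothetical demand given as batch counts per product and a hypothetical fixed changeover time (`CHANGEOVER_H`), and it ignores failures, yield variability, and learning effects. Only the nominal fermentation times (48/72/120 h) come from this README.

```python
from typing import Dict, List

# Nominal fermentation times (hours) from the study configuration.
PROCESSING_H = {"A": 48, "B": 72, "C": 120}
# Hypothetical changeover duration (hours); NOT a value from the paper.
CHANGEOVER_H = 8


def fifo_sequence(batches: Dict[str, int]) -> List[str]:
    """Round-robin A→B→C, one batch per product per pass, until demand is met."""
    remaining = {p: batches.get(p, 0) for p in ("A", "B", "C")}
    seq: List[str] = []
    while any(remaining.values()):
        for product in ("A", "B", "C"):
            if remaining[product] > 0:
                seq.append(product)
                remaining[product] -= 1
    return seq


def campaign_sequence(batches: Dict[str, int], longest_first: bool = False) -> List[str]:
    """Group all batches of a product together; SPT by default, LPT if longest_first."""
    order = sorted(batches, key=PROCESSING_H.__getitem__, reverse=longest_first)
    return [p for p in order for _ in range(batches[p])]


def count_changeovers(seq: List[str]) -> int:
    """A changeover happens whenever consecutive batches are different products."""
    return sum(1 for prev, nxt in zip(seq, seq[1:]) if prev != nxt)


def nominal_makespan(seq: List[str]) -> int:
    """Processing time plus changeovers, with no uncertainty (hours)."""
    return sum(PROCESSING_H[p] for p in seq) + CHANGEOVER_H * count_changeovers(seq)


if __name__ == "__main__":
    demand = {"A": 2, "B": 2, "C": 2}  # hypothetical demand
    for name, seq in [("FIFO", fifo_sequence(demand)),
                      ("SPT", campaign_sequence(demand)),
                      ("LPT", campaign_sequence(demand, longest_first=True))]:
        print(name, seq, count_changeovers(seq), nominal_makespan(seq))
```

With two batches of each product, FIFO incurs 5 changeovers (520 h nominal) while both campaign orders incur 2 (496 h). SPT and LPT differ only in order, not in changeover count, which is consistent with their near-identical makespans.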
| Source | F | p | η² | Interpretation |
|---|---|---|---|---|
| Uncertainty Level | 346.69 | <.001 | 0.437 | Large effect |
| Heuristic | 225.71 | <.001 | 0.285 | Large effect |
| Interaction | 0.103 | .981 | <0.001 | No interaction |
The two main effects together explain ~72% of the variance in makespan (combined η² = 0.437 + 0.285 = 0.722).
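The ANOVA itself is run in JASP. As an independent cross-check, the sketch below computes a two-way ANOVA with interaction in pandas. It assumes a balanced design (450 = 3 × 3 × 50, equal cell sizes), where Type I, II, and III sums of squares coincide, and reports classical η² = SS_effect / SS_total, the quantity whose main-effect values sum to 0.722 above. Column names follow the data dictionary; the function name is hypothetical.

```python
import numpy as np
import pandas as pd


def two_way_anova(df: pd.DataFrame, a: str = "heuristic",
                  b: str = "uncertainty_level", y: str = "makespan") -> pd.DataFrame:
    """Two-way ANOVA with interaction for a BALANCED design.

    Returns SS, degrees of freedom, F, and classical eta squared per source.
    """
    grand = df[y].mean()
    ss_total = ((df[y] - grand) ** 2).sum()

    # Each observation contributes the squared deviation of its group mean.
    ss_a = ((df.groupby(a)[y].transform("mean") - grand) ** 2).sum()
    ss_b = ((df.groupby(b)[y].transform("mean") - grand) ** 2).sum()
    ss_cells = ((df.groupby([a, b])[y].transform("mean") - grand) ** 2).sum()

    k_a, k_b = df[a].nunique(), df[b].nunique()
    ss = {a: ss_a, b: ss_b, "interaction": ss_cells - ss_a - ss_b,
          "error": ss_total - ss_cells}
    dof = {a: k_a - 1, b: k_b - 1, "interaction": (k_a - 1) * (k_b - 1),
           "error": len(df) - k_a * k_b}
    ms_error = ss["error"] / dof["error"]

    table = pd.DataFrame({"ss": pd.Series(ss), "df": pd.Series(dof)})
    table["F"] = (table["ss"] / table["df"]) / ms_error
    table.loc["error", "F"] = np.nan  # F is not defined for the residual row
    table["eta_sq"] = table["ss"] / ss_total
    return table
```

Called on the full dataset, the `eta_sq` values for `heuristic` and `uncertainty_level` should be close to the JASP output if the balanced-design assumption holds.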
| Uncertainty Level | SPT mean makespan (h) | LPT mean makespan (h) | FIFO mean makespan (h) |
|---|---|---|---|
| Low | 1,762 (CV=2.6%) | 1,778 (CV=2.5%) | 2,157 (CV=3.0%) |
| Medium | 1,914 (CV=6.1%) | 1,942 (CV=6.1%) | 2,322 (CV=6.5%) |
| High | 2,300 (CV=11.0%) | 2,323 (CV=11.7%) | 2,680 (CV=11.6%) |
| Marginal Mean | 1,992 | 2,014 | 2,387 |
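The headline reduction and the ~395 h saving are simple arithmetic on the marginal means in this table. The sketch reproduces them and also shows a coefficient-of-variation helper; it assumes CV uses the sample standard deviation, which may differ slightly from how the reported CVs were computed.

```python
from statistics import mean, stdev

# Marginal mean makespans (hours) as reported in the table above.
MARGINAL_MEANS_H = {"SPT": 1992, "LPT": 2014, "FIFO": 2387}


def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` versus `baseline`, in percent."""
    return 100.0 * (baseline - value) / baseline


def cv_percent(values) -> float:
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


if __name__ == "__main__":
    fifo = MARGINAL_MEANS_H["FIFO"]
    for h in ("SPT", "LPT"):
        saved = fifo - MARGINAL_MEANS_H[h]
        pct = percent_reduction(fifo, MARGINAL_MEANS_H[h])
        print(f"{h}: {saved} h shorter than FIFO ({pct:.1f}%)")
```

This prints a 395 h (16.5%) reduction for SPT and 373 h (15.6%) for LPT, so the 16.5% figure corresponds to SPT versus FIFO.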
| Model | Test Accuracy | AUC | F1 | MCC |
|---|---|---|---|---|
| Polynomial SVM | 96.7% | 0.972 | 0.967 | 0.934 |
| Random Forest | 94.4% | 0.985 | 0.944 | 0.890 |
| RBF SVM | 94.4% | 0.946 | 0.945 | 0.891 |
| Decision Tree | 94.4% | 0.944 | 0.945 | 0.884 |
| Gradient Boosting | 93.3% | 0.997 | 0.933 | 0.869 |
| Linear SVM | 92.2% | 0.920 | 0.922 | 0.845 |
| Logistic Regression | 90.0% | 0.900 | 0.900 | 0.799 |
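Accuracy alone can look good on imbalanced classes, which is why the table also reports MCC, a metric that uses all four cells of the confusion matrix. The sketch below computes accuracy, F1, and MCC from binary counts. It assumes the positive class is Robust (1); JASP may average F1 across classes, so values from this sketch need not match the table exactly.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, positive-class F1, and Matthews correlation coefficient."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC ranges from -1 (total disagreement) through 0 (chance) to +1 (perfect).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}
```

For example, a classifier that is right half the time with balanced errors scores MCC = 0, even though its accuracy is 50%.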
| Feature | Relative Importance |
|---|---|
| Heuristic Type | 33% |
| Uncertainty Level | 30% |
| Total Demand | 18% |
| Average Yield | 8% |
| Downtime Hours | 6% |
| Equipment Failure | 5% |
single-reactor-scheduling/
├── README.md                      # This file
├── LICENSE                        # MIT License
├── DATA_DICTIONARY.md             # Variable definitions
├── CITATION.cff                   # Citation metadata
│
├── data/
│   ├── optimized_dataset.csv      # Base simulation output (450 obs)
│   └── optimized_dataset_ML.csv   # ML-ready with target variables
│
├── scripts/
│   ├── create_ml_dataset.py       # Adds ML target variables
│   └── visualization_script.py    # Generates all 8 figures
│
├── jasp/
│   ├── statistical_analysis.jasp  # ANOVA, regression, post-hoc
│   └── ml_classification.jasp     # All ML models
│
├── figures/
│   ├── Figure1_Main_Results.png              # Makespan by heuristic & interaction
│   ├── Figure2_Mechanism_Analysis.png        # Changeover & learning effects
│   ├── Figure3_Uncertainty_Effects.png       # Distribution & SPT advantage
│   ├── FigureS1_Data_Quality.png             # Data quality dashboard
│   ├── FigureS2_Statistical_Diagnostics.png  # ANOVA assumptions
│   └── FigureS3_Detailed_Comparison.png      # Violin plots by condition
│
└── paper/
    └── Rababah_2025_Single_Reactor_Scheduling.pdf
| Software | Version | Purpose |
|---|---|---|
| Python | 3.8+ | Data preparation & visualization |
| JASP | 0.18+ | All statistical & ML analysis |
| pandas | 1.5+ | Data manipulation |
| numpy | 1.21+ | Numerical operations |
| matplotlib | 3.5+ | Figure generation |
| seaborn | 0.12+ | Statistical visualization |
| scipy | 1.9+ | Statistical functions |
| Task | Tool | Script/File |
|---|---|---|
| Dataset preparation | Python | create_ml_dataset.py |
| Figure generation | Python | visualization_script.py |
| Descriptive statistics | JASP | statistical_analysis.jasp |
| Two-Way ANOVA | JASP | statistical_analysis.jasp |
| Kruskal-Wallis tests | JASP | statistical_analysis.jasp |
| Post-hoc comparisons | JASP | statistical_analysis.jasp |
| Multiple regression | JASP | statistical_analysis.jasp |
| ML Classification | JASP | ml_classification.jasp |
cd scripts/
python create_ml_dataset.py

This creates `optimized_dataset_ML.csv` with three target variables:
- `schedule_robust`: Binary (0 = Vulnerable, 1 = Robust)
- `performance_class`: 3-class (Excellent / Acceptable / Poor)
- `performance_numeric`: Ordinal encoding (0/1/2)
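Before importing the file into JASP, a quick sanity check of the target columns can catch encoding mistakes. The sketch below checks only the allowed values listed above; it additionally assumes (not stated in this README) that each `performance_class` label maps to exactly one `performance_numeric` code. The function name is hypothetical, and a call such as `check_ml_targets(pd.read_csv("../data/optimized_dataset_ML.csv"))` assumes you run it from `scripts/`.

```python
import pandas as pd

# Allowed values for each target column, as documented above.
ALLOWED_VALUES = {
    "schedule_robust": {0, 1},
    "performance_class": {"Excellent", "Acceptable", "Poor"},
    "performance_numeric": {0, 1, 2},
}


def check_ml_targets(df: pd.DataFrame) -> list:
    """Return a list of problems found in the ML target columns (empty if none)."""
    problems = []
    for col, allowed in ALLOWED_VALUES.items():
        if col not in df.columns:
            problems.append(f"{col}: column missing")
            continue
        if df[col].isna().any():
            problems.append(f"{col}: contains missing values")
        unexpected = set(df[col].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{col}: unexpected values {sorted(map(str, unexpected))}")
    # Assumption: each performance_class label maps to a single numeric code.
    if {"performance_class", "performance_numeric"} <= set(df.columns):
        codes_per_class = df.groupby("performance_class")["performance_numeric"].nunique()
        if (codes_per_class > 1).any():
            problems.append("performance_class: maps to more than one performance_numeric code")
    return problems
```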
cd scripts/
python visualization_script.py

This generates all 8 publication-quality figures:
Main Paper Figures:
| Figure | Description | Output Files |
|---|---|---|
| Figure 1 | Main results: boxplots & interaction plot | Figure1_Main_Results.png/pdf |
| Figure 2 | Mechanism: changeover time, learning savings, changeover count | Figure2_Mechanism_Analysis.png/pdf |
| Figure 3 | Uncertainty effects: distributions & SPT advantage | Figure3_Uncertainty_Effects.png/pdf |
Supplementary Figures:
| Figure | Description | Output Files |
|---|---|---|
| Figure S1 | Data quality dashboard (6 panels) | FigureS1_Data_Quality.png/pdf |
| Figure S2 | ANOVA diagnostics: Q-Q, residuals, homogeneity | FigureS2_Statistical_Diagnostics.png/pdf |
| Figure S3 | Violin plots by heuristic × uncertainty | FigureS3_Detailed_Comparison.png/pdf |
Summary Tables:
- Table1_PanelA_Heuristic_Stats.csv
- Table1_PanelB_Uncertainty_Stats.csv
- Table1_PanelC_CrossTab.csv
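The exact columns of these CSVs are not documented here, so the following is only a hypothetical reconstruction of what three such panels could contain: makespan statistics by heuristic (Panel A), by uncertainty level (Panel B), and mean makespan per heuristic × uncertainty cell (Panel C). Column names follow the data dictionary; `table1_panels` is an illustrative name, not a function from `visualization_script.py`.

```python
import pandas as pd

LEVEL_ORDER = ["Low", "Medium", "High"]
STATS = ["count", "mean", "std", "min", "max"]


def table1_panels(df: pd.DataFrame):
    """Hypothetical Panel A/B/C summaries of makespan (hours)."""
    panel_a = df.groupby("heuristic")["makespan"].agg(STATS)
    panel_a["cv_pct"] = 100 * panel_a["std"] / panel_a["mean"]

    panel_b = df.groupby("uncertainty_level")["makespan"].agg(STATS).reindex(LEVEL_ORDER)
    panel_b["cv_pct"] = 100 * panel_b["std"] / panel_b["mean"]

    # Panel C: mean makespan for every heuristic x uncertainty cell.
    panel_c = (df.pivot_table(index="heuristic", columns="uncertainty_level",
                              values="makespan", aggfunc="mean")
               .reindex(columns=LEVEL_ORDER))
    return panel_a, panel_b, panel_c
```

Each panel can then be written with `DataFrame.to_csv`, for example `panel_a.to_csv("Table1_PanelA_Heuristic_Stats.csv")`.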
- Open `jasp/statistical_analysis.jasp`.
- Analyses included:
- Descriptive Statistics
- Two-Way Factorial ANOVA
- Kruskal-Wallis Tests (non-parametric confirmation)
- Post-Hoc Comparisons (Tukey HSD, Dunn's test)
- Multiple Linear Regression
- Assumption Diagnostics
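The Kruskal-Wallis test and Tukey HSD can also be cross-checked in Python with SciPy (already listed as a dependency; `scipy.stats.tukey_hsd` requires SciPy 1.8 or later). This is a sketch, not the JASP workflow: Dunn's test is not part of SciPy, so it is left to the JASP file. The helper name is hypothetical.

```python
import pandas as pd
from scipy import stats


def kruskal_and_tukey(df: pd.DataFrame, factor: str = "heuristic",
                      response: str = "makespan"):
    """Kruskal-Wallis H test plus Tukey HSD pairwise comparisons across factor levels."""
    levels = sorted(df[factor].unique())
    samples = [df.loc[df[factor] == lvl, response].to_numpy() for lvl in levels]
    kw = stats.kruskal(*samples)        # non-parametric omnibus test
    tukey = stats.tukey_hsd(*samples)   # pairwise; tukey.pvalue[i, j] compares levels i and j
    return levels, kw, tukey
```

`tukey.pvalue` is a square matrix indexed in the same order as `levels`, so `tukey.pvalue[0, 2]` compares the first and third sorted levels.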
- Open `jasp/ml_classification.jasp`.
- Set variable types:
  - `schedule_robust` → Nominal (target)
  - `performance_class` → Nominal (multi-class target)
- ML models configured:
- Random Forest (100 trees)
- Gradient Boosting (100 iterations)
- SVM (Linear, RBF, Polynomial kernels)
- Decision Tree (max depth = 30)
- Logistic Regression
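For readers who prefer Python, the model list above can be mirrored in scikit-learn. Note that scikit-learn is a swapped-in library and not a dependency of this repository: only the tree count, boosting iterations, kernels, and max depth come from this README, while feature scaling, `probability=True`, `max_iter`, and the random seed are assumptions. JASP's defaults (data split, other hyperparameters) differ, so results will not match the reported table exactly.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed: int = 42) -> dict:
    """scikit-learn analogues of the JASP models (settings partly assumed)."""

    def scaled(estimator):
        # Assumption: standardize features for distance/margin-based models.
        return make_pipeline(StandardScaler(), estimator)

    return {
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=seed),
        "Linear SVM": scaled(SVC(kernel="linear", probability=True, random_state=seed)),
        "RBF SVM": scaled(SVC(kernel="rbf", probability=True, random_state=seed)),
        "Polynomial SVM": scaled(SVC(kernel="poly", probability=True, random_state=seed)),
        "Decision Tree": DecisionTreeClassifier(max_depth=30, random_state=seed),
        "Logistic Regression": scaled(LogisticRegression(max_iter=1000)),
    }
```

Categorical predictors such as `heuristic` and `uncertainty_level` would need encoding (for example one-hot) before fitting these estimators.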
See DATA_DICTIONARY.md for complete variable definitions.
| Variable | Type | Description |
|---|---|---|
| `scenario_id` | Integer | Unique identifier (1-450) |
| `heuristic` | Categorical | FIFO, SPT, or LPT |
| `uncertainty_level` | Ordinal | Low, Medium, or High |
| `makespan` | Continuous | Total production time (hours) |
| `utilization` | Continuous | Equipment utilization (0-1) |
| `total_changeover_time` | Continuous | Sum of changeover durations |
| `total_learning_savings` | Continuous | Time saved via learning curves |
| `schedule_robust` | Binary | ML target (0/1) |
| `performance_class` | Categorical | ML target (3-class) |
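When loading the CSV in pandas, it helps to encode the types from this table explicitly, in particular `uncertainty_level` as an ordered categorical so that sorting and plotting follow Low < Medium < High. The loader below is a hypothetical helper, not part of the repository scripts; the demo rows are toy values.

```python
import io

import pandas as pd

HEURISTIC = pd.CategoricalDtype(["FIFO", "SPT", "LPT"])
UNCERTAINTY = pd.CategoricalDtype(["Low", "Medium", "High"], ordered=True)


def load_dataset(source) -> pd.DataFrame:
    """Read the simulation CSV and apply the types from the data dictionary."""
    df = pd.read_csv(source)
    for col, dtype in (("heuristic", HEURISTIC), ("uncertainty_level", UNCERTAINTY)):
        converted = df[col].astype(dtype)
        # astype() silently turns unknown labels into NaN, so check explicitly.
        unknown = df.loc[converted.isna() & df[col].notna(), col].unique()
        if len(unknown):
            raise ValueError(f"{col}: unexpected labels {list(unknown)}")
        df[col] = converted
    if not df["scenario_id"].is_unique:
        raise ValueError("scenario_id is not unique")
    if "utilization" in df and not df["utilization"].between(0, 1).all():
        raise ValueError("utilization outside [0, 1]")
    return df


if __name__ == "__main__":
    toy = io.StringIO(
        "scenario_id,heuristic,uncertainty_level,makespan,utilization\n"
        "1,SPT,High,2000,0.9\n"
        "2,FIFO,Low,2100,0.8\n"
        "3,LPT,Medium,1900,0.85\n"
    )
    frame = load_dataset(toy)
    print(frame.sort_values("uncertainty_level")[["scenario_id", "uncertainty_level"]])
```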
- Use campaign-based scheduling (SPT or LPT) as the default strategy
- Avoid FIFO/round-robin except when explicitly required
- Expected savings: ~395 hours (~16.5%) per production cycle with SPT versus FIFO
- Deploy ML-based prediction tools to flag high-risk schedules
- Increase buffer time allocation under high uncertainty
- Monitor the coefficient of variation (CV) of schedule outcomes as a reliability metric
IF heuristic = FIFO:
    → VULNERABLE schedule (94-101 of 150 cases)
IF heuristic = SPT or LPT:
    IF uncertainty = Low or Medium:
        → ROBUST schedule (116-118 of 200 cases)
    IF uncertainty = High:
        → Check demand level for final classification
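The rule above can be written as a small function for use in a scheduling tool. Two caveats: the rule summarizes majority tendencies (for example, 94-101 of 150 FIFO cases, not all of them), and the README gives no demand threshold for the high-uncertainty branch, so this sketch returns `None` there instead of inventing a cutoff. The function name is hypothetical.

```python
from typing import Optional


def classify_schedule(heuristic: str, uncertainty: str) -> Optional[str]:
    """Apply the simplified decision rule.

    Returns "Vulnerable" or "Robust", or None when the rule says to check the
    demand level (threshold not specified here).
    """
    if heuristic == "FIFO":
        return "Vulnerable"
    if heuristic in ("SPT", "LPT"):
        if uncertainty in ("Low", "Medium"):
            return "Robust"
        if uncertainty == "High":
            return None  # defer to a demand-based check
    raise ValueError(f"unknown combination: {heuristic!r}, {uncertainty!r}")
```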
If you use this dataset or methodology, please cite:
@article{rababah2025single,
title={Heuristic Scheduling Strategies for Single-Reactor Pharmaceutical
Batch Production Under Uncertainty: A Comparative Statistical
and Machine Learning Analysis},
author={Rababah, Anfal},
journal={ChemRxiv},
year={2025},
doi={10.26434/chemrxiv-2025-wq0tr}
}

This study complements multi-reactor scheduling research. For facilities with parallel processing capacity, different scheduling considerations apply.
Anfal Rababah
- Email: Anfal0Rababah@gmail.com
- ORCID: 0009-0003-7450-8907
This project is licensed under the MIT License - see the LICENSE file for details.
- JASP Team for open-source statistical software
- Python scientific computing community (NumPy, pandas, Matplotlib, Seaborn, SciPy)
- Anthropic's Claude AI for technical writing assistance