"You can't schedule what you can't measure. But you can estimate it."
In heat treatment operations, time is the invisible raw material. Every rack that enters a muffle furnace occupies it for a block of time that no other job can reclaim. Getting that estimate wrong — or not having it at all — means scheduling slots built on intuition, buffers padded beyond what physics requires, and planners making decisions with one hand tied behind their back.
The plant in this study runs ~300 part numbers through a single quench rack furnace. For 200 of those PNs, historical cycle records exist. For the remaining 100 low-volume PNs, no such history was ever captured — making them effectively invisible to the scheduling system.
This project closes that gap with multiple linear regression. A model trained on 500 observed furnace cycles learns how five process variables jointly determine cycle time. That model is then used in two ways: real-time estimation before a job starts, and a planning gap-fill that gives the scheduling team quantitative cycle time estimates for all 100 unobserved part numbers — for the first time.
- 500 furnace cycle records captured from the muffle furnace PLC and plant MES layer
- Target: `cycle_time_min` — total furnace cycle time per rack (continuous, minutes)
- Range: 208.1 – 382.4 min | Mean: 299.9 min | Std: 30.3 min
- Source: 200 high-volume part numbers with sufficient production history
| Column | Type | Description |
|---|---|---|
| `rack_piece_count` | int | Total pieces loaded in the rack |
| `avg_piece_mass_kg` | float | Weighted average mass per piece (kg) |
| `part_mix_count` | int | Number of distinct part numbers in the rack (1–5) |
| `avg_furnace_temp_c` | float | Mean furnace chamber temperature during cycle (°C) |
| `burner_power_pct` | float | Burner set-point during the hold phase (%) |
| `cycle_time_min` | float | Target — total furnace cycle time (minutes) |
| Variable(s) | Source System | Notes |
|---|---|---|
| `rack_piece_count` | MES / Production Order | Logged at rack load confirmation before furnace door closes |
| `avg_piece_mass_kg` | ERP / Part Master | Nominal mass from engineering drawing or incoming weighing |
| `part_mix_count` | MES / Work Order Routing | Number of distinct PNs on the rack — derived from the production order |
| `avg_furnace_temp_c` | Furnace PLC / SCADA | Mean thermocouple reading during the hold phase of the cycle |
| `burner_power_pct` | Furnace PLC | Burner set-point captured at cycle initiation |
| `cycle_time_min` | MES / Cycle Log | Timestamp delta: door-close to quench-start, extracted from event log |
In real-world operations, this dataset does not exist in a single system. Rack configuration comes from the MES production order, part mass from the ERP part master, and cycle timestamps from the PLC event log — three separate extractions that require a join on rack ID and shift timestamp before a single row of training data can be assembled.
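The three-extraction join described above can be sketched with pandas. The table and column names here (`mes_orders`, `erp_parts`, `plc_events`, `rack_id`) are hypothetical stand-ins — the real extracts and join keys will differ per plant:

```python
# Sketch of the three-system join that assembles one training row per rack.
# Table/column names are illustrative assumptions, not the real extracts.
import pandas as pd

mes_orders = pd.DataFrame({        # MES production order: rack configuration
    "rack_id": ["R001", "R002"],
    "part_number": ["PN-100", "PN-200"],
    "rack_piece_count": [120, 200],
    "part_mix_count": [1, 4],
})
erp_parts = pd.DataFrame({         # ERP part master: nominal piece mass
    "part_number": ["PN-100", "PN-200"],
    "avg_piece_mass_kg": [1.2, 2.8],
})
plc_events = pd.DataFrame({        # PLC event log: temps, burner, timestamps
    "rack_id": ["R001", "R002"],
    "avg_furnace_temp_c": [900.0, 860.0],
    "burner_power_pct": [92.0, 78.0],
    "cycle_time_min": [239.2, 343.6],
})

training = (
    mes_orders
    .merge(erp_parts, on="part_number", how="inner")  # attach part mass
    .merge(plc_events, on="rack_id", how="inner")     # attach cycle signals
)
```

Inner joins drop racks missing from any one system, which is usually the safer default for training data than padding with nulls.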
Algorithm: Multiple Linear Regression — `sklearn.linear_model.LinearRegression` + `statsmodels` OLS
Linear regression is the right model here for the same reason it was chosen in classical industrial statistics: when the relationship is additive and each variable measures a distinct physical dimension, OLS is unbiased, interpretable, and statistically verifiable. The five features in this dataset are nearly orthogonal (VIF < 1.02 for all), meaning each coefficient measures a clean, independent process effect.
Two model representations are used simultaneously: scikit-learn for predictions and the simulator, statsmodels for the full inferential layer — p-values, confidence intervals, F-statistic, and regression diagnostics. Both agree on coefficients; statsmodels adds the statistical accountability that turns a number into an engineering argument.
Preprocessing: No scaling required for OLS prediction — coefficients absorb unit differences and express each effect in physical units (minutes per kg, minutes per part type, etc.).
Split: 80/20 train/test, random_state=42.
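The modeling step can be sketched end to end on synthetic data. The feature names and split match the project, but the generated values, noise level, and intercept (400) are illustrative stand-ins for the real 500 cycles; `statsmodels` would be fit on the same matrix for the inferential layer:

```python
# Minimal sketch of the OLS fit on synthetic data shaped like the real set.
# The target is built from the reported coefficients plus noise -- a stand-in,
# not the actual quench_data.csv.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(80, 220, n),     # rack_piece_count
    rng.uniform(0.6, 4.0, n),     # avg_piece_mass_kg
    rng.integers(1, 6, n),        # part_mix_count (1-5)
    rng.uniform(840, 920, n),     # avg_furnace_temp_c
    rng.uniform(70, 100, n),      # burner_power_pct
])
y = (0.30 * X[:, 0] + 22.42 * X[:, 1] + 10.26 * X[:, 2]
     - 0.22 * X[:, 3] - 0.35 * X[:, 4] + 400      # intercept is illustrative
     + rng.normal(0, 11, n))                       # ~RMSE-scale noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)   # no scaling needed for OLS
r2 = model.score(X_test, y_test)
```

Because OLS coefficients are expressed in the raw physical units, `model.coef_` can be read directly as min/piece, min/kg, min/type, min/°C, and min/% — the same units as the coefficient table below.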
| Metric | Value | Operational Meaning |
|---|---|---|
| R² | 0.847 | 84.7% of cycle-time variance explained by five process inputs |
| Adj. R² | 0.843 | All five features earn their place — penalty for predictors barely changes R² |
| RMSE | 11.17 min | Typical prediction error — scheduler builds ±12 min buffer into slot assignments |
| MAE | 9.21 min | Mean absolute error — under 4% of the 300 min mean cycle time |
| F-statistic | 429.1 (p ≈ 0) | Five features are jointly and highly significant |
| Train / Test | 400 / 100 cycles | 80/20 split, random_state=42 |
| Feature | Coefficient | Direction | Engineering Interpretation |
|---|---|---|---|
| `avg_piece_mass_kg` | +22.42 min/kg | ↑ Increases cycle | Dominant driver — thermal mass saturates the cycle; +1 kg avg = +22 min |
| `part_mix_count` | +10.26 min/type | ↑ Increases cycle | Metallurgical complexity — tightest alloy window governs the whole rack |
| `burner_power_pct` | −0.35 min/% | ↓ Reduces cycle | Primary controllable reduction lever — 10% more power saves ~3.5 min |
| `rack_piece_count` | +0.30 min/piece | ↑ Increases cycle | Loading density — more pieces means more total mass to heat |
| `avg_furnace_temp_c` | −0.22 min/°C | ↓ Reduces cycle | Secondary lever — running 40°C hotter shaves ~9 min |
Every coefficient carries p < 0.001 and VIF < 1.02. These are not competing signals — they are additive, orthogonal, and directly actionable.
The model's most operationally significant output is not a point prediction — it is the systematic estimation of cycle times for 100 previously unschedulable part numbers.
Before this project, those 100 low-volume PNs had no slot estimates. Planners either left them out of the schedule or padded their slots with worst-case buffers. The gap-fill exercise applies the trained model to all 100 PNs under a standard rack condition (120 pieces, 880°C, 85% burner, single PN type), producing a complete planning reference table — with known model uncertainty of ±11.2 min.
This is calibrated inference, not extrapolation. The mass range of the low-volume PNs (0.6–4.0 kg) falls entirely within the training envelope.
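The gap-fill is just the fitted equation evaluated under the standard rack condition. A sketch using the published coefficients — the intercept (~396.2 min) is backed out from the Scenario A figures in this README rather than taken from the model output, and the low-volume PN masses are hypothetical placeholders:

```python
# Gap-fill sketch: published coefficients applied under the standard rack
# condition (120 pcs, 1 PN, 880 C, 85% burner).
COEF = {
    "rack_piece_count":   0.30,    # min/piece
    "avg_piece_mass_kg": 22.42,    # min/kg
    "part_mix_count":    10.26,    # min/type
    "avg_furnace_temp_c": -0.22,   # min/degC
    "burner_power_pct":  -0.35,    # min/%
}
INTERCEPT = 396.24  # derived from Scenario A, not from the published tables

def predict_cycle_min(pieces, mass_kg, mix, temp_c, burner_pct):
    return (INTERCEPT
            + COEF["rack_piece_count"] * pieces
            + COEF["avg_piece_mass_kg"] * mass_kg
            + COEF["part_mix_count"] * mix
            + COEF["avg_furnace_temp_c"] * temp_c
            + COEF["burner_power_pct"] * burner_pct)

# Planning estimates for hypothetical low-volume PN masses (all in-envelope)
for mass in (0.6, 2.0, 4.0):
    est = predict_cycle_min(120, mass, 1, 880, 85)
    print(f"{mass:.1f} kg -> {est:.1f} min (+/- 11.2 min)")
```

Because every term is linear and in physical units, the same function reproduces the scenario table below to within rounding.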
Three operational scenarios demonstrate the model's planning value:
| Scenario | Configuration | Predicted | Notes |
|---|---|---|---|
| A — Light Rack, High Efficiency | 120 pcs · 1.2 kg · 1 PN · 900°C · 92% | 239.2 min (3.99 h) | Fastest achievable — light, homogeneous, high temp |
| B — Heavy Mixed Rack, Risk Cycle | 200 pcs · 2.8 kg · 4 PNs · 860°C · 78% | 343.6 min (5.73 h) | Worst-case — dense, mixed, cool, under-powered |
| C — Heavy Rack, Corrected Recipe | 200 pcs · 2.8 kg · 4 PNs · 900°C · 95% | 328.7 min (5.48 h) | +40°C and +17% power saves 15.0 min (4.4%) |
The B→C recovery shows that temperature and burner power — the two controllable levers — can partially compensate for a difficult rack configuration. The scheduler now has a number to work with, not a guess.
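The B→C saving can be checked by hand from the coefficient table alone, since the rack terms and intercept cancel when only temperature and burner power change:

```python
# B -> C recovery check using only the published coefficients.
# Rack configuration is identical in both scenarios, so those terms cancel.
delta = (-0.22) * (900 - 860) + (-0.35) * (95 - 78)  # temp and burner levers
print(f"Predicted change: {delta:.2f} min")  # ~-14.75; the table rounds to 15.0
```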
- **Coefficients are the product, not the model.** In industrial regression, the equation itself is the deliverable — each coefficient answers a quantitative question that the process team can act on today without opening a laptop.
- **Adj. R² barely moving from R² validates feature selection.** When the penalty for adding predictors barely changes the score (0.847 → 0.843), all five variables carry genuine signal. No feature is diluting the model.
- **Low-volume PNs are not unknowable — they are unmeasured.** The gap-fill exercise turns a planning blind spot into a scheduling table. The difference between "we don't know" and "we haven't measured it yet" matters enormously for operations.
- **statsmodels alongside scikit-learn is not redundant — it is necessary.** Scikit-learn gives predictions; statsmodels gives accountability. p-values and confidence intervals are what turn a model into an engineering argument that a plant manager will accept.
- **The scheduler's uncertainty is now quantified.** Before the model, a planner building a slot for a 2.8 kg, 4-PN rack had no reference. After the model, they have a 343.6 min estimate with a ±11 min known error band. That is the difference between buffering by intuition and buffering by data.
```
MR_Multi_Factor_Process_Impact/
├── 08_Multi_Factor_Process_Impact.ipynb  ← Main notebook
├── quench_data.csv                       ← Complete dataset (GitHub public)
├── requirements.txt                      ← Python dependencies
└── README.md                             ← This file
```
✅ This project is completely free. Full dataset and simulator included. If this helped you, check out the rest of the portfolio at lozanolsa.gumroad.com.
```bash
git clone https://github.com/LozanoLsa/MR_Multi_Factor_Process_Impact
cd MR_Multi_Factor_Process_Impact
pip install pandas numpy scikit-learn statsmodels matplotlib seaborn scipy jupyter
jupyter notebook 08_Multi_Factor_Process_Impact.ipynb
```
**Next Option — Run the simulator:**
```bash
python app.py
```

Or click the badge above to run it directly in Google Colab — no installation needed.
---
Luis Lozano | Operational Excellence Manager · Master Black Belt · Machine Learning
GitHub: LozanoLsa · Gumroad: lozanolsa.gumroad.com
Turning Operations into Predictive Systems — Clone it. Fork it. Improve it.