Heat Treatment Cycle Time Intelligence — Multi-Factor Regression for Quench Rack Scheduling

"You can't schedule what you can't measure. But you can estimate it."

🎯 Business Problem

In heat treatment operations, time is the invisible raw material. Every rack that enters a muffle furnace occupies it for a block of time that no other job can reclaim. Getting that estimate wrong — or not having it at all — means scheduling slots built on intuition, buffers padded beyond what physics requires, and planners making decisions with one hand tied behind their back.

The plant in this study runs ~300 part numbers through a single quench rack furnace. For 200 of those PNs, historical cycle records exist. For the remaining 100 low-volume PNs, no such history was ever captured — making them effectively invisible to the scheduling system.

This project closes that gap with multiple linear regression. A model trained on 500 observed furnace cycles learns how five process variables jointly determine cycle time. That model is then used in two ways: real-time estimation before a job starts, and a planning gap-fill that gives the scheduling team quantitative cycle time estimates for all 100 unobserved part numbers — for the first time.

📊 Dataset

500 furnace cycle records captured from the muffle furnace PLC and plant MES layer
Target: cycle_time_min — total furnace cycle time per rack (continuous, minutes)
Range: 208.1 – 382.4 min | Mean: 299.9 min | Std: 30.3 min
Source: 200 high-volume part numbers with sufficient production history

Column	Type	Description
`rack_piece_count`	int	Total pieces loaded in the rack
`avg_piece_mass_kg`	float	Weighted average mass per piece (kg)
`part_mix_count`	int	Number of distinct part numbers in the rack (1–5)
`avg_furnace_temp_c`	float	Mean furnace chamber temperature during cycle (°C)
`burner_power_pct`	float	Burner set-point during the hold phase (%)
`cycle_time_min`	float	Target — total furnace cycle time (minutes)

Data Origin (Real-World Perspective)

Variable(s)	Source System	Notes
`rack_piece_count`	MES / Production Order	Logged at rack load confirmation before furnace door closes
`avg_piece_mass_kg`	ERP / Part Master	Nominal mass from engineering drawing or incoming weighing
`part_mix_count`	MES / Work Order Routing	Number of distinct PNs on the rack — derived from the production order
`avg_furnace_temp_c`	Furnace PLC / SCADA	Mean thermocouple reading during the hold phase of the cycle
`burner_power_pct`	Furnace PLC	Burner set-point captured at cycle initiation
`cycle_time_min`	MES / Cycle Log	Timestamp delta: door-close to quench-start, extracted from event log

In real-world operations, this dataset does not exist in a single system. Rack configuration comes from the MES production order, part mass from the ERP part master, and cycle timestamps from the PLC event log — three separate extractions that require a join on rack ID and shift timestamp before a single row of training data can be assembled.

🤖 Model

Algorithm: Multiple Linear Regression — sklearn.linear_model.LinearRegression + statsmodels OLS

Linear regression is the right model here for the same reason it was chosen in classical industrial statistics: when the relationship is additive and each variable measures a distinct physical dimension, OLS is unbiased, interpretable, and statistically verifiable. The five features in this dataset are nearly orthogonal (VIF < 1.02 for all), meaning each coefficient measures a clean, independent process effect.

Two model representations are used simultaneously: scikit-learn for predictions and the simulator, statsmodels for the full inferential layer — p-values, confidence intervals, F-statistic, and regression diagnostics. Both agree on coefficients; statsmodels adds the statistical accountability that turns a number into an engineering argument.

Preprocessing: No scaling required for OLS prediction — coefficients absorb unit differences and express each effect in physical units (minutes per kg, minutes per part type, etc.). Split: 80/20 train/test, random_state=42.

📈 Key Results

Metric	Value	Operational Meaning
R²	0.847	84.7% of cycle-time variance explained by five process inputs
Adj. R²	0.843	All five features earn their place — penalty for predictors barely changes R²
RMSE	11.17 min	Typical prediction error — scheduler builds ±12 min buffer into slot assignments
MAE	9.21 min	Median absolute error — under 4% of the 300 min mean cycle time
F-statistic	429.1 (p ≈ 0)	Five features are jointly and highly significant
Train / Test	400 / 100 cycles	80/20 split, `random_state=42`

Feature Coefficients — What Each Variable Costs in Minutes

Feature	Coefficient	Direction	Engineering Interpretation
`avg_piece_mass_kg`	+22.42 min/kg	↑ Increases cycle	Dominant driver — thermal mass saturates the cycle; +1 kg avg = +22 min
`part_mix_count`	+10.26 min/type	↑ Increases cycle	Metallurgical complexity — tightest alloy window governs the whole rack
`burner_power_pct`	−0.35 min/%	↓ Reduces cycle	Primary controllable reduction lever — 10% more power saves ~3.5 min
`rack_piece_count`	+0.30 min/piece	↑ Increases cycle	Loading density — more pieces means more total mass to heat
`avg_furnace_temp_c`	−0.22 min/°C	↓ Reduces cycle	Secondary lever — running 40°C hotter shaves ~9 min

Every coefficient carries p < 0.001 and VIF < 1.02. These are not competing signals — they are additive, orthogonal, and directly actionable.

🗓️ The Planning Gap-Fill

The model's most operationally significant output is not a point prediction — it is the systematic estimation of cycle times for 100 previously unschedulable part numbers.

Before this project, those 100 low-volume PNs had no slot estimates. Planners either left them out of the schedule or padded their slots with worst-case buffers. The gap-fill exercise applies the trained model to all 100 PNs under a standard rack condition (120 pieces, 880°C, 85% burner, single PN type), producing a complete planning reference table — with known model uncertainty of ±11.2 min.

This is calibrated inference, not extrapolation. The mass range of the low-volume PNs (0.6–4.0 kg) falls entirely within the training envelope.

🔧 Simulation & Scenarios

Three operational scenarios demonstrate the model's planning value:

Scenario	Configuration	Predicted	Notes
A — Light Rack, High Efficiency	120 pcs · 1.2 kg · 1 PN · 900°C · 92%	239.2 min (3.99 h)	Fastest achievable — light, homogeneous, high temp
B — Heavy Mixed Rack, Risk Cycle	200 pcs · 2.8 kg · 4 PNs · 860°C · 78%	343.6 min (5.73 h)	Worst-case — dense, mixed, cool, under-powered
C — Heavy Rack, Corrected Recipe	200 pcs · 2.8 kg · 4 PNs · 900°C · 95%	328.7 min (5.48 h)	+40°C and +17% power saves 15.0 min (4.4%)

The B→C recovery shows that temperature and burner power — the two controllable levers — can partially compensate for a difficult rack configuration. The scheduler now has a number to work with, not a guess.

💡 Key Learnings

Coefficients are the product, not the model. In industrial regression, the equation itself is the deliverable — each coefficient answers a quantitative question that the process team can act on today without opening a laptop.
Adj. R² barely moving from R² validates feature selection. When the penalty for adding predictors barely changes the score (0.847 → 0.843), all five variables carry genuine signal. No feature is diluting the model.
Low-volume PNs are not unknowable — they are unmeasured. The gap-fill exercise turns a planning blindspot into a scheduling table. The difference between "we don't know" and "we haven't measured it yet" matters enormously for operations.
statsmodels alongside scikit-learn is not redundant — it is necessary. Scikit-learn gives predictions; statsmodels gives accountability. p-values and confidence intervals are what turn a model into an engineering argument that a plant manager will accept.
The scheduler's uncertainty is now quantified. Before the model, a planner building a slot for a 2.8 kg, 4-PN rack had no reference. After the model, they have a 343.6 min estimate with a ±11 min known error band. That is the difference between buffering by intuition and buffering by data.

🗂️ Repository Structure

MR_Multi_Factor_Process_Impact/
├── 08_Multi_Factor_Process_Impact.ipynb   ← Main notebook
├── quench_data.csv                         ← complete data set (GitHub public)
├── requirements.txt                        ← Python dependencies
└── README.md                               ← This file

✅ This project is completely free. Full dataset and simulator included. If this helped you, check out the rest of the portfolio at lozanolsa.gumroad.com.

🚀 How to Run

git clone https://github.com/LozanoLsa/MR_Multi_Factor_Process_Impact
cd MR_Multi_Factor_Process_Impact
pip install pandas numpy scikit-learn statsmodels matplotlib seaborn scipy jupyter
jupyter notebook 08_Multi_Factor_Process_Impact.ipynb

**Next Option — Run the simulator:**
```bash
python app.py

Or click the badge above to run it directly in Google Colab — no installation needed.

—

👤 Author

Luis Lozano | Operational Excellence Manager · Master Black Belt · Machine Learning
GitHub: LozanoLsa · Gumroad: lozanolsa.gumroad.com

Turning Operations into Predictive Systems — Clone it. Fork it. Improve it.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
08_Multi_Factor_Process_Impact.ipynb		08_Multi_Factor_Process_Impact.ipynb
LICENSE		LICENSE
Quench_Cycle_Time.pdf		Quench_Cycle_Time.pdf
README.md		README.md
app.py		app.py
cover_08.png		cover_08.png
quench_data.csv		quench_data.csv
requirements.txt		requirements.txt
thumb_08.png		thumb_08.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Heat Treatment Cycle Time Intelligence — Multi-Factor Regression for Quench Rack Scheduling

🎯 Business Problem

📊 Dataset

Data Origin (Real-World Perspective)

🤖 Model

📈 Key Results

Feature Coefficients — What Each Variable Costs in Minutes

🗓️ The Planning Gap-Fill

🔧 Simulation & Scenarios

💡 Key Learnings

🗂️ Repository Structure

🚀 How to Run

👤 Author

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Heat Treatment Cycle Time Intelligence — Multi-Factor Regression for Quench Rack Scheduling

🎯 Business Problem

📊 Dataset

Data Origin (Real-World Perspective)

🤖 Model

📈 Key Results

Feature Coefficients — What Each Variable Costs in Minutes

🗓️ The Planning Gap-Fill

🔧 Simulation & Scenarios

💡 Key Learnings

🗂️ Repository Structure

🚀 How to Run

👤 Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages