Author: Mahdi Rajaee
Institution: Politecnico di Torino
Date: 17/02/2025
Gaussian Process (GP) regression is a powerful non-parametric technique for modeling complex, continuous-valued functions. Unlike traditional linear regression, GP regression places a prior over functions, enabling robust uncertainty quantification and flexible function fitting. This report presents two applications of GP regression:
- Synthetic Data Experiment: Generating and analyzing a Gaussian random process filtered by a Gaussian kernel.
- Parkinson’s Disease Dataset: Predicting the `total_UPDRS` measure (Unified Parkinson’s Disease Rating Scale) from clinical features.
The primary goals are:
- Demonstrating how to build and use GP regression for a time-series (synthetic data).
- Showing how to apply GP regression to real-world clinical data (Parkinson’s dataset).
- Exploring hyperparameter tuning (length scale (r) and noise variance (\sigma_{\nu}^2)) to minimize prediction error, and comparing with Linear Least Squares (LLS).
We generate a white Gaussian noise signal (x(t)) of length (N_p = 200), then convolve it with a Gaussian impulse response (h(t)) of finite duration to form a synthetic Gaussian random process (y(t)). Specifically:
- White Noise: [ x(t) \sim \mathcal{N}(0, 1) ] for (t \in \{0,1,\ldots,N_p-1\}).
- Gaussian Filter: [ h(t) = \exp\left(- \frac{t^2}{T^2}\right) ], normalized to unit energy.
- Filtered Output: [ y(t) = (x * h)(t), ] computed with `mode='same'` so the output matches the original time support (see the code sketch after this list).
- Impulse Response & Realization: We visualize the impulse response (h(t)) vs. (t) and the process realization (y(t)) vs. (t).
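A minimal code sketch of this generation step (assuming NumPy; the filter width `T` and the support chosen for `h` are illustrative values, as the report does not fix them):

```python
import numpy as np

Np, T = 200, 10                        # Np from the text; T is an assumed value
rng = np.random.default_rng(0)

x = rng.standard_normal(Np)            # white Gaussian noise x(t)
t_h = np.arange(-3 * T, 3 * T + 1)     # finite support for the impulse response
h = np.exp(-t_h**2 / T**2)             # Gaussian impulse response h(t)
h /= np.linalg.norm(h)                 # normalize h to unit energy
y = np.convolve(x, h, mode='same')     # y(t) = (x * h)(t), same time support
```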
The filter design ensures (y(t)) is a zero-mean Gaussian process whose autocorrelation (R_Y(\tau)) is itself a Gaussian function of the lag:
[ R_Y(\tau) = \exp\left(-\frac{\tau^2}{2T^2}\right). ]
From a finite set of points (\{t_k\}), we build the covariance matrix (R) via:
[ R_{ij} = R_Y(t_i - t_j). ]
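In code this construction is direct; a sketch reusing `Np` and the illustrative `T` from above (`ry` is a hypothetical helper name):

```python
def ry(tau, T=10):
    """Gaussian autocorrelation R_Y(tau) = exp(-tau^2 / (2 T^2))."""
    return np.exp(-tau**2 / (2 * T**2))

t_k = np.arange(Np)                    # the Np time indices
R = ry(t_k[:, None] - t_k[None, :])    # R_ij = R_Y(t_i - t_j)
```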
- Sampling: Select (M = 10) random indices (\{t_k\}) as training points (\{(t_k, y(t_k))\}).
- Test Point: Randomly pick another point (t_*) for regression.
- Covariance Matrix: Compute
[
R_{Y,N} =
\begin{pmatrix}
R_{M\times M} & k \\
k^T & d
\end{pmatrix},
]
where
- (R_{M\times M}) is the training covariance submatrix,
- (k) is the cross-covariance with the test point,
- (d) is the test point’s variance.
- GP Prediction (see the sketch after this list): [ \hat{y}(t_*) = k^T R_{M\times M}^{-1} \mathbf{y} \quad\text{and}\quad \sigma^2 = d - k^T R_{M\times M}^{-1} k. ]
- Results:
- We plot training points (markers), true process (y(t)), and the GP estimate at (t_*).
- Compare with Linear Least Squares (LLS) by fitting (y(t) \approx a t + b).
- Display the confidence interval (\pm \sigma).
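A minimal sketch of the sampling and prediction steps above, reusing `y`, `t_k`, and `ry` from the earlier sketches (the LLS comparison and plotting are omitted):

```python
M = 10
rng = np.random.default_rng(1)
idx = rng.choice(len(t_k), size=M, replace=False)   # M random training indices
t_tr, y_tr = t_k[idx], y[idx]                       # training pairs (t_k, y(t_k))
t_star = rng.choice(np.setdiff1d(t_k, t_tr))        # a held-out test point t_*

R_MM = ry(t_tr[:, None] - t_tr[None, :])            # training covariance R_{MxM}
k = ry(t_tr - t_star)                               # cross-covariance vector k
d = ry(0.0)                                         # test-point variance d

y_hat = k @ np.linalg.solve(R_MM, y_tr)             # posterior mean at t_*
sigma2 = d - k @ np.linalg.solve(R_MM, k)           # posterior variance
```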
Outcome: Repeated experiments confirm that GP regression accurately reconstructs missing points from a small training set, outperforming a simple LLS fit, especially when the data are distinctly non-linear.
The Parkinson’s dataset contains clinical features and UPDRS measures. For this experiment:
- Regressand: `total_UPDRS`
- Regressors: `motor_UPDRS`, `age`, `PPE` (3 features)
- The dataset is shuffled and split into:
  - Training set (50%)
  - Validation set (25%)
  - Test set (25%)
- Each feature is normalized by subtracting the training mean and dividing by the training standard deviation (see the sketch after this list).
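A sketch of the preprocessing under these choices (assumes pandas and NumPy; the file name is a placeholder for the actual Parkinson's data file):

```python
import numpy as np
import pandas as pd

# 'parkinsons_updrs.csv' is a hypothetical file name; shuffle with a fixed seed.
df = pd.read_csv('parkinsons_updrs.csv').sample(frac=1, random_state=0)

X = df[['motor_UPDRS', 'age', 'PPE']].to_numpy()
y = df['total_UPDRS'].to_numpy()

n_tr, n_va = len(df) // 2, len(df) // 4              # 50% / 25% / 25% split
X_tr, X_va, X_te = np.split(X, [n_tr, n_tr + n_va])
y_tr, y_va, y_te = np.split(y, [n_tr, n_tr + n_va])

mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)         # training statistics only
X_tr, X_va, X_te = (X_tr - mu) / sd, (X_va - mu) / sd, (X_te - mu) / sd
y_mu, y_sd = y_tr.mean(), y_tr.std()
y_tr_n = (y_tr - y_mu) / y_sd                        # normalized regressand
```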
Following the same GP framework: [ R_{Y}(n,k) = \theta \exp\Bigl(-\frac{\|x_n - x_k\|^2}{2r^2}\Bigr) + \sigma_{\nu}^2 \delta_{n,k}. ] Because the data are normalized, we set:
- (\theta = 1),
- Hyperparameters to tune: (r^2) (length scale) and (\sigma_{\nu}^2) (noise variance).
Implementation Steps:
- Nearest Neighbors: For computational efficiency, we select the (N-1) training points closest to a validation point (x).
- Build the sub-covariance matrix among these (N-1) neighbors plus the target point, (N = 10) points in total (see the sketch after this list).
- Predict (\hat{y}) using the GP formula.
- Evaluate the MSE on the validation set.
- Tune (r^2) and (\sigma_{\nu}^2) to minimize validation MSE (grid search).
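A hedged sketch of these steps (the helper name `gp_predict_nn` is mine; it assumes the normalized arrays from the preprocessing sketch and (\theta = 1)):

```python
def gp_predict_nn(X_train, y_train, x, r2, s2, N=10):
    """GP posterior mean at x from the N-1 nearest training points."""
    d2 = ((X_train - x)**2).sum(axis=1)              # squared distances to x
    nn = np.argsort(d2)[:N - 1]                      # indices of N-1 neighbours
    Xn, yn = X_train[nn], y_train[nn]
    K = np.exp(-((Xn[:, None] - Xn[None, :])**2).sum(-1) / (2 * r2))
    K += s2 * np.eye(len(nn))                        # sigma_nu^2 on the diagonal
    k = np.exp(-((Xn - x)**2).sum(-1) / (2 * r2))    # cross-covariance with x
    return k @ np.linalg.solve(K, yn)                # posterior mean
```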
We grid search over a range of plausible (r^2) and (\sigma_{\nu}^2) values:
- Range for (r^2): e.g. `[0.1, 0.5, 1.0, 2.0, ..., 5.0]`.
- Range for (\sigma_{\nu}^2): e.g. `[1e-4, 5e-4, 1e-3, ..., 1e-2]`.
- For each pair, compute the validation MSE (\|y_\text{val} - \hat{y}_\text{val}\|^2).
Optimal Hyperparameters are then selected to minimize the validation error.
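One way to express this selection in code, using the grids above and `gp_predict_nn` (a sketch, not the report's exact script; validation MSE is computed on the un-normalized scale):

```python
best_mse, best_r2, best_s2 = np.inf, None, None
for r2 in [0.1, 0.5, 1.0, 2.0, 5.0]:                 # candidate r^2 values
    for s2 in [1e-4, 5e-4, 1e-3, 1e-2]:              # candidate sigma_nu^2 values
        pred = np.array([gp_predict_nn(X_tr, y_tr_n, x, r2, s2) for x in X_va])
        mse = np.mean((pred * y_sd + y_mu - y_va)**2)
        if mse < best_mse:
            best_mse, best_r2, best_s2 = mse, r2, s2
```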
With the best (r^2) and (\sigma_{\nu}^2) found, we:
- Predict `total_UPDRS` for every test sample (see the sketch after this list).
- Un-normalize the predictions and compute:
  - Mean error, standard deviation, and MSE of `y_test - y_hat`.
  - A histogram of the prediction errors.
  - The (R^2) score to measure goodness of fit.
- Plot predicted vs. true unnormalized `total_UPDRS` with (3\sigma) error bars.
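A sketch of the numeric part of this evaluation, continuing from the grid search above (histogram and error-bar plots omitted):

```python
pred = np.array([gp_predict_nn(X_tr, y_tr_n, x, best_r2, best_s2) for x in X_te])
y_hat = pred * y_sd + y_mu                           # un-normalized predictions

err = y_te - y_hat
print('mean error:', err.mean())
print('error std :', err.std())
print('MSE       :', np.mean(err**2))
print('R^2       :', 1 - np.sum(err**2) / np.sum((y_te - y_te.mean())**2))
```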
Result: GP regression typically improves on LLS by better capturing the non-linearities of the clinical data.
This research demonstrates Gaussian Process Regression on both synthetic and real-world datasets. Key insights:
- Synthetic Data:
  - GP regression accurately recovers process values using a small subset of points.
  - Confidence intervals reflect true process variability better than linear methods.
- Parkinson’s Disease Dataset:
  - GP regression with tuned hyperparameters provides stronger predictive performance than basic LLS.
  - The ability to model non-linear relationships between `motor_UPDRS`, `age`, `PPE`, and `total_UPDRS` is crucial in medical applications.
- Hyperparameter Tuning:
  - Proper selection of the length scale (r) and noise variance (\sigma_{\nu}^2) significantly impacts prediction accuracy.
  - Using a validation set is essential to avoid overfitting.
- Comparison with LLS:
  - While LLS is a useful baseline, GP regression typically yields lower MSE and additionally provides uncertainty estimates.
Overall, Gaussian Process Regression proves to be a flexible, robust approach for both synthetic and real-world data, and it is highly recommended for tasks where uncertainty quantification and non-linear modeling are paramount.