From 01b32c65e0d0a3628b243dd9ad1900dcb142c4a0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E2=80=98topepo=E2=80=99?= <‘mxkuhn@gmail.com’>
Date: Mon, 5 Aug 2024 13:35:08 -0400
Subject: [PATCH] implementation notes

---
 quantile-regression/README.md | 69 +++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)
 create mode 100644 quantile-regression/README.md

diff --git a/quantile-regression/README.md b/quantile-regression/README.md
new file mode 100644
index 0000000..05776f1
--- /dev/null
+++ b/quantile-regression/README.md
@@ -0,0 +1,69 @@
+# Quantile Regression
+
+This document discusses how quantile regression could be included in tidymodels.
+
+The discussion below is organized by package.
+
+## parsnip
+
+Users will probably have to specify the quantiles when they create the model specification (see the section on new models).
+
+### Engines and Modes
+
+The main question is: Should we make new engines available via `set_engine()` or create a new mode of `"quantile regression"`? 
+
+Pros of `set_engine()`: 
+
+* Most (but not all) quantile regression predictions will be for numeric modes. 
+* Quantile models do not differ enough from existing models to require a new mode (unlike censored regression). For example, our yardstick functions to compute loss for quantile regression can ingest the list column of predictions (similar to dynamic survival models).
+* Where do we specify the list of quantiles? It would be suboptimal to add a main option to each function. Users could set it in `set_engine()`, but that would require more specialized parsing of the return value of that function.
+
+Cons of `set_engine()`: 
+
+* There could be confusion around the type of model and prediction being made. Suppose that `rpart` could make quantile predictions (it cannot). Would using an engine of `"rpart"` result in the regular CART model or one optimized via quantile loss?
+* Since the list of quantiles would be specified upfront, it would be advantageous to do this with the new mode (`set_mode("quantreg", quantiles = (1:9) / 10)`).
+
+To test a new mode, there is a [parsnip branch](https://github.com/tidymodels/parsnip/tree/quantile-mode) that enables it. You can install it with:
+
+```
+pak::pak(c("tidymodels/parsnip@quantile-mode"), ask = FALSE)
+```
+
+### New Models
+
+Some packages already fit these models, notably quantreg and quantregForest. 
+
+We can also make our own (I might do this with brulee). If so, the model functions should reuse the main argument names that we already have. So for a neural network, use `hidden_units`, `penalty`, etc.
+
+Many models need a specific training run for each quantile value (since the loss depends on it). For this reason, we would have users specify the required quantiles when they specify the model. 
+
+We should make a general class for these models in an engine-specific package. 
+
+#### Predictions
+
+We currently note in `?predict.model_fit`:
+
+> For `type = "quantile"`, the tibble has a `.pred` column, which is a list-column. Each list element contains a tibble with columns `.pred` and `.quantile` (and perhaps other columns).
+
+This should be changed so the list column is called `.pred_quantile` if we want `.pred` to contain the 50% quantile results. 
+
+The format of each element of the list column would be:
+
+```
+tibble(.quantile = numeric(), .pred_quantile = numeric())
+```
+
+## yardstick
+
+We need one or more model performance metrics for quantile loss that work across all quantiles. They will need a new class too (say `"quantile_metric"`).
+
+## tune
+
+We will need an `estimate_quantiles()` function (analogous to functions such as [`estimate_class_prob()`](https://github.com/tidymodels/tune/blob/main/R/grid_performance.R#L108C1-L108C20)).
+
+Some specific callouts: 
+
+- https://github.com/tidymodels/tune/blob/main/R/grid_performance.R#L4
+- https://github.com/tidymodels/tune/blob/main/R/grid_performance.R#L98
+
+