Skip to content

Post-processing write-up#27

Open
topepo wants to merge 4 commits into
mainfrom
post-proc
Open

Post-processing write-up#27
topepo wants to merge 4 commits into
mainfrom
post-proc

Conversation

@topepo
Copy link
Copy Markdown
Member

@topepo topepo commented May 12, 2023

No description provided.

Comment thread post-processing/readme.qmd Outdated

wflow_individ <-
wflow_2 %>%
add_prob_calibration(cal_object) %>%
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a note about where cal_object comes from

Comment thread post-processing/readme.qmd Outdated
# 'object' is a pre-made calibration object
# Data Inputs: class probabilities
# Data Outputs: class probabilities and recomputed class predictions
add_prob_calibration(object, .priority = 1.0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these also have x as a first argument. Where x is the workflow

# Potentially tunable
# Data Inputs: class probabilities
# Data Outputs: class predictions
add_prob_threshold(threshold = numeric(), .priority = 2.0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need a levels argument and an ordered argument? Like probably::make_two_class_pred()

# Potentially tunable
# Data Inputs: class probabilities
# Data Outputs: class predictions
add_cls_eq_zone(value = numeric(), threshold = numeric(), .priority = 3.0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In probably probably::make_class_pred() had a buffer argument that created a range of [threshold - buffer[1], threshold + buffer[2]] where anything inside the buffer range was marked equivocal. Maybe you could use the buffer arg here?


Maybe also name it something similar to add_prob_threshold() like add_prob_threshold_buffered() where:

  • add_prob_threshold() always returns a factor (maybe ordered)
  • add_prob_threshold_buffered() always returns a <class_pred> from probably

# User will have to always set the priority
# Data Inputs: all predictions
# Data Outputs: all predictions (only these columns are retained)
add_post_mutate(..., .priority)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider giving all of these functions a common prefix that differentiates them from the other add_*() functions, like add_post_*():

add_post_calibration() # do you need prob vs reg calibration? can you just "figure it out"? can it be an argument?
add_post_threshold()
add_post_threshold_buffered()
add_post_mutate()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conflicted on this. I agree this would be nice for tab completion, but is inconsistent with the naming convention for preprocessors: add_variables(), add_recipe(), add_formula().

Comment thread post-processing/readme.qmd Outdated
- A container for the list of possible post-processors specified by the user.
- A validation system to resolve conflicts in type or priority.
- An interface to apply the operations to the predicted values.
- The requisite package dependencies (primarily the probably package)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mildly worried about the number of Imports that dev probably has. It is fairly high, and might be worth it to go back and see if some of them can be moved to Suggests as optional deps


- `new_stage_post()`, and `new_action_post()` are existing constructors.

We will require a `.fit_post(workflow, data)` that will execute _only_ the post-processing operations; the `workflows` object will already have trained the `pre` and `fit` stages.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is really an "internal" function, I think we can just assume that the workflow has already trained the pre and fit stages, i.e. we don't need to try to do any checks to see if that is true or not. It should only be used by workflows internally and tune

@simonpcouch
Copy link
Copy Markdown

Looking at this now—will merge some of Davis' suggestions and push some small edits. Will leave a review when finished!

simonpcouch and others added 2 commits May 16, 2023 12:50
Co-authored-by: Davis Vaughan <davis@rstudio.com>
Copy link
Copy Markdown

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few big-picture edits from me beyond Davis' comments!

- An interface to apply the operations to the predicted values.
- The requisite package dependencies (primarily the probably package).

The dwai code may eventually make its way into workflows or probably.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is our plan, I might argue we put this functionality in workflows or probably from the get-go. Feels a bit like this living its own package could eventually feel like technical debt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants