From 5ca59dd5c57e997e27eeb9a13171b52269ba733d Mon Sep 17 00:00:00 2001
From: Phil Henrickson
Date: Tue, 20 Aug 2024 13:57:48 -0500
Subject: [PATCH 1/3] updates to targets; adding production

---
 docs/index.html                   |   3 +-
 docs/materials/01-git.html        | 128 +++++++--------
 docs/materials/02-renv.html       |   2 +-
 docs/materials/03-targets.html    | 169 ++++++++++----------
 docs/materials/04-production.html | 256 ++++++++++++++++++++++++------
 docs/search.json                  |  96 ++++++-----
 index.qmd                         |   5 +-
 materials/04-production.qmd       |  51 +++---
 8 files changed, 436 insertions(+), 274 deletions(-)

diff --git a/docs/index.html b/docs/index.html
index da75d9b..94c3998 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -115,7 +115,7 @@

What They Didn’t Teach You About Data Science

About

-

This workshop focuses on using R/RStudio with Git, renv, and targets. This will be pretty hands on, so it will be helpful to configure as much as possible in advance. I can help debug if we have any issues on the day.

+

This workshop focuses on using R/RStudio with Git/GitHub, renv, and targets. This will be pretty hands on, so it will be helpful to configure as much as possible in advance. I can help debug if we have any issues on the day.

Setup

    Set up the following pieces in advance:

@@ -137,6 +137,7 @@

Schedule

  • Git + GitHub
  • renv
  • targets
  • +
  • production
  •
diff --git a/docs/materials/01-git.html b/docs/materials/01-git.html
index 53ad961..4ffe05a 100644
--- a/docs/materials/01-git.html
+++ b/docs/materials/01-git.html
@@ -3922,7 +3922,7 @@

    -
    +
    @@ -4071,7 +4071,7 @@

    -
    +
    @@ -4193,20 +4193,20 @@

    -
    -
    @@ -4799,15 +4799,15 @@

    -
    +

    -
    +

    -
    +

    We realize that we shouldn’t be calculating sentiment at the line-level and then aggregating, because short positive statements potentially end up getting as much weight as longer complaints.
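To make the weighting problem concrete, here is a minimal base R sketch. The two lines of text and their sentiment scores are made up for illustration (they are not the workshop data), and weighting by word count is only one possible fix, assumed here for simplicity.

```r
# Hypothetical line-level sentiment scores for a single review (illustrative only).
review <- data.frame(
  text = c("Great!", "The plot dragged on and the dialogue felt flat and lifeless"),
  sentiment = c(1, -1),
  stringsAsFactors = FALSE
)

# Number of words per line, used as the weight.
review$n_words <- lengths(strsplit(review$text, "\\s+"))

# Unweighted: each line counts equally, so the short praise cancels the long complaint.
unweighted <- mean(review$sentiment)

# Weighted by length: the longer complaint dominates the aggregate score.
weighted <- weighted.mean(review$sentiment, w = review$n_words)

print(c(unweighted = unweighted, weighted = weighted))
```

The unweighted mean treats a one-word line the same as an eleven-word line, which is the distortion described above.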

    @@ -5513,7 +5513,7 @@

    -
    +
    10:00
    @@ -5694,7 +5694,7 @@

    -
    +
    10:00
    @@ -6061,7 +6061,7 @@

    -
    +
    10:00
    @@ -6182,7 +6182,7 @@

    -
    +
    15:00
diff --git a/docs/materials/02-renv.html b/docs/materials/02-renv.html
index 1e31f26..3364ed7 100644
--- a/docs/materials/02-renv.html
+++ b/docs/materials/02-renv.html
@@ -2342,7 +2342,7 @@

    -
    +
    15:00
diff --git a/docs/materials/03-targets.html b/docs/materials/03-targets.html
index 2886042..0ac52c6 100644
--- a/docs/materials/03-targets.html
+++ b/docs/materials/03-targets.html
@@ -9540,8 +9540,8 @@

    Or, we can visualize the pipeline using tar_glimpse().

    -
    - +
    +
    @@ -9551,8 +9551,8 @@

    tar_visnetwork() provides a more detailed breakdown of the pipeline, including the status of individual targets, as well as the functions and where they are used.

    -
    - +
    +

    @@ -9567,7 +9567,7 @@

    #> v skipped target starwars
     #> v skipped target sentences
     #> v skipped target sentiment
    -#> v skipped pipeline [0.043 seconds]
    +#> v skipped pipeline [0.044 seconds]
    @@ -9584,7 +9584,7 @@

    -
    +
    @@ -9800,7 +9800,7 @@

    -
    +
    15:00
    @@ -10612,16 +10612,16 @@

    #> # A tibble: 10 × 2
     #>    splits          id
     #>    <list>          <chr>
    -#>  1 <split [32/11]> Bootstrap01
    -#>  2 <split [32/13]> Bootstrap02
    -#>  3 <split [32/12]> Bootstrap03
    -#>  4 <split [32/13]> Bootstrap04
    -#>  5 <split [32/13]> Bootstrap05
    -#>  6 <split [32/13]> Bootstrap06
    -#>  7 <split [32/12]> Bootstrap07
    -#>  8 <split [32/11]> Bootstrap08
    -#>  9 <split [32/9]>  Bootstrap09
    -#> 10 <split [32/15]> Bootstrap10
    +#>  1 <split [32/10]> Bootstrap01
    +#>  2 <split [32/9]>  Bootstrap02
    +#>  3 <split [32/10]> Bootstrap03
    +#>  4 <split [32/14]> Bootstrap04
    +#>  5 <split [32/8]>  Bootstrap05
    +#>  6 <split [32/12]> Bootstrap06
    +#>  7 <split [32/14]> Bootstrap07
    +#>  8 <split [32/10]> Bootstrap08
    +#>  9 <split [32/12]> Bootstrap09
    +#> 10 <split [32/13]> Bootstrap10
    @@ -10657,7 +10657,7 @@

    one_split
    #> <Analysis/Assess/Total>
    -#> <32/11/32>
    +#> <32/10/32>
    @@ -10670,55 +10670,54 @@

    rsample::training()
    #>                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Valiant                18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Cadillac Fleetwood     10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Ferrari Dino           19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    +#> Hornet 4 Drive...1     21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    +#> Valiant...2            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Hornet Sportabout...3  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +#> Mazda RX4 Wag          21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Fiat X1-9...5          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Toyota Corolla...6     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Lotus Europa...7       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    +#> Lotus Europa...8       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    +#> Valiant...9            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Lincoln Continental    10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    +#> Merc 450SE             16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Pontiac Firebird       19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Fiat X1-9...13         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
     #> Camaro Z28             13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    -#> Volvo 142E             21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> Mazda RX4 Wag...6      21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Ford Pantera L...7     15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Datsun 710...8         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> AMC Javelin...9        15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Lotus Europa...10      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Duster 360...11        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Lotus Europa...12      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Hornet 4 Drive...13    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Merc 280...14          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    +#> Fiat 128               32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    +#> Toyota Corolla...16    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Toyota Corolla...17    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
     #> Merc 230               22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    -#> Pontiac Firebird       19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    -#> Toyota Corona...17     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> Hornet 4 Drive...18    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Merc 240D              24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    -#> Duster 360...20        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Hornet 4 Drive...21    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Chrysler Imperial...22 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Chrysler Imperial...23 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Ford Pantera L...24    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Mazda RX4 Wag...25     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Dodge Challenger       15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    +#> Valiant...20           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Ferrari Dino...21      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
     #> Merc 280C              17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    -#> Fiat X1-9              27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Datsun 710...28        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> Toyota Corona...29     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> AMC Javelin...30       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Merc 280...31          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    -#> Toyota Corolla         33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Cadillac Fleetwood     10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    +#> Porsche 914-2...24     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Hornet 4 Drive...25    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    +#> Duster 360             14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Merc 280               19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    +#> Ferrari Dino...28      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    +#> Chrysler Imperial      14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Porsche 914-2...30     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Hornet Sportabout...31 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +#> Merc 450SL             17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    # extract test set
     one_split |>
     rsample::testing()
    -
    #>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    -#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    -#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    -#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    -#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    -#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    -#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +
    #>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    +#> Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    +#> Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Merc 450SLC    15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    +#> Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    +#> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    +#> AMC Javelin    15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    +#> Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    +#> Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +#> Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
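The analysis/assessment split printed above can be mimicked in a few lines of base R. This is only a sketch of the idea behind a bootstrap resample, not how rsample implements it; the seed is arbitrary, so the row counts will not match the output shown in the slides.

```r
set.seed(123)  # arbitrary seed, only so the sketch is repeatable

n <- nrow(mtcars)

# Analysis set: n rows drawn with replacement, so some cars appear more than once.
in_id <- sample(n, size = n, replace = TRUE)
analysis <- mtcars[in_id, ]

# Assessment set: the rows that were never drawn (the "out-of-bag" rows).
out_id <- setdiff(seq_len(n), in_id)
assessment <- mtcars[out_id, ]

# Same three numbers as the <Analysis/Assess/Total> summary.
print(c(analysis = nrow(analysis), assess = nrow(assessment), total = n))
```

Because rows are sampled with replacement, the analysis set always has 32 rows while the assessment set varies in size from one resample to the next, which is why the split sizes change between renders.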
    @@ -10776,7 +10775,7 @@

    geom_vline(xintercept = 0, linetype = 'dashed') - +

    metrics

    for predictive modeling workflows, rsample is typically used in conjunction with yardstick and tune to estimate model performance for a model or tune a model across parameters

    @@ -10808,11 +10807,11 @@

    mutate_if(is.numeric, round, 3)
    #> # A tibble: 3 × 6
    -#>   .metric .estimator   mean     n std_err .config             
    -#>   <chr>   <chr>       <dbl> <dbl>   <dbl> <chr>               
    -#> 1 ccc     standard    0.457    10   0.095 Preprocessor1_Model1
    -#> 2 rmse    standard   11.1      10   2.86  Preprocessor1_Model1
    -#> 3 rsq     standard    0.403    10   0.095 Preprocessor1_Model1
    +#>   .metric .estimator  mean     n std_err .config
    +#>   <chr>   <chr>      <dbl> <dbl>   <dbl> <chr>
    +#> 1 ccc     standard   0.488    10   0.104 Preprocessor1_Model1
    +#> 2 rmse    standard   9.22     10   2.38  Preprocessor1_Model1
    +#> 3 rsq     standard   0.436    10   0.084 Preprocessor1_Model1
    @@ -10959,7 +10958,7 @@

    -
    +
    10:00
    @@ -10988,7 +10987,7 @@

    -
    +
    10:00
    @@ -11011,21 +11010,21 @@

    #> # A tibble: 15 × 3
     #>    air_time distance arr_delay
     #>       <dbl>    <dbl> <fct>    
    -#>  1  -1.07    -1.15   on_time  
    -#>  2  -0.726   -0.676  on_time  
    -#>  3  -0.163   -0.0435 on_time  
    -#>  4  -0.109   -0.0231 on_time  
    -#>  5  -1.16    -1.15   on_time  
    -#>  6   1.64     1.93   on_time  
    -#>  7   2.09     1.94   late     
    -#>  8  -0.434   -0.429  on_time  
    -#>  9  -0.455   -0.379  on_time  
    -#> 10  -0.682   -0.680  on_time  
    -#> 11  -1.24    -1.15   on_time  
    -#> 12  -1.20    -1.19   late     
    -#> 13  -0.228   -1.11   on_time  
    -#> 14  -0.0225   0.0342 on_time  
    -#> 15  -0.423   -0.376  on_time
    +#>  1  -1.04    -1.01   late
    +#>  2  -0.855   -0.830  on_time
    +#>  3  -0.412   -0.409  on_time
    +#>  4  -0.325   -0.229  on_time
    +#>  5  -0.0225   0.0397 on_time
    +#>  6   1.78     2.08   on_time
    +#>  7   0.216    0.0465 on_time
    +#>  8   0.107    0.0465 on_time
    +#>  9  -0.542   -0.562  on_time
    +#> 10  -0.509   -0.379  late
    +#> 11  -0.423   -0.409  on_time
    +#> 12  -0.131   -0.229  on_time
    +#> 13   0.518    0.731  on_time
    +#> 14   1.48     1.65   on_time
    +#> 15  -0.855   -0.847  late

    @@ -11047,7 +11046,7 @@

    -
    +
    10:00
    @@ -11082,7 +11081,7 @@

    -
    +
    10:00
    @@ -11143,7 +11142,7 @@

    -
    +
    15:00
diff --git a/docs/materials/04-production.html b/docs/materials/04-production.html
index 10487d2..b196ef8 100644
--- a/docs/materials/04-production.html
+++ b/docs/materials/04-production.html
@@ -1413,8 +1413,8 @@

    -
    -

    +
    +

    environments as code

    DevOps principles aim to create software that builds security, stability, and scalability into the software from the very beginning. The idea is to avoid building software that works locally, but doesn’t work well in collaboration or production.

    @@ -1422,10 +1422,10 @@

    So much of DevOps boils down to preventing the well-it-runs-on-my-machine problem.

    -
    +

    -
    +

    DevOps principles aim to create software that builds security, stability, and scalability into the software from the very beginning. The idea is to avoid building software that works locally, but doesn’t work well in collaboration or production.

    @@ -1441,14 +1441,14 @@

    -
    +

    How close are we to creating fully reproducible environments via code? What are we missing?

    -
    +

    -
    +

    How close are we to creating fully reproducible environments via code? What are we missing?

    We’ve only really covered one layer:

    @@ -1457,7 +1457,7 @@

    renv and venv allow us to create isolated virtual environments in which to execute our code.

    -
    +

    your data science environment is the stack of software and hardware below your code, from the R and Python packages you’re using right down to the physical hardware your code runs on.
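As a rough illustration of that stack, the base R sketch below reports what the current session knows about each layer. The variable names and the grouping into packages, system, and hardware are assumptions made for this example; it only inspects the environment and does not reproduce it.

```r
# Packages layer: the R version plus the packages attached in this session.
packages_layer <- list(
  r_version = R.version.string,
  attached = .packages()
)

# System layer: operating system name and release as reported by the OS.
system_layer <- Sys.info()[c("sysname", "release")]

# Hardware layer: machine architecture (e.g. x86_64 or arm64).
hardware_layer <- Sys.info()[["machine"]]

str(list(
  packages = packages_layer,
  system = system_layer,
  hardware = hardware_layer
))
```

Running this on a laptop and again on a CI runner is a quick way to see how much of the stack differs between the two, even when the code is identical.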

    @@ -1473,7 +1473,7 @@

    -
    +

    We’ve covered creating and taking down one layer:

      @@ -1481,7 +1481,7 @@

    renv and venv allow us to create isolated virtual environments in which to execute our code.

    -
    +

    But there are three main layers to think about:

      @@ -1497,7 +1497,7 @@

      API keys, database credentials, ODBC drivers…

    -
    +

    But there are three main layers to think about:

      @@ -1509,19 +1509,19 @@

      Your code has to actually run on something. Even if it’s in the cloud it’s still running on a physical machine somewhere.

    -
    +

    -

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    -
    +

    -
    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    Then, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.

    -
    +

    Let’s revisit the GitHub action we saw earlier.

    # name: updating the README
    @@ -1577,7 +1577,7 @@ 

    # git push origin || echo "No changes to commit" #
    -
    +

    This is essentially just a script that:

      @@ -1590,62 +1590,222 @@

    1. Renders the Quarto README and commits/pushes it to the repository
    -
    +

    Now, to be clear, this is a lot of work to just render a goddamn README.

    But we use the same setup to do more elaborate work, such as running the whole dang pipeline via a GitHub Action.

    -
    +

    We’ve been building pipelines with targets.

    +
    +

    If you run targets::tar_github_actions(), you will notice a new file .github/workflows/targets.yaml appears in your project working directory

    +
    +
    +
    
    +# MIT License
    +# Copyright (c) 2021 Eli Lilly and Company
    +# Author: William Michael Landau (will.landau at gmail)
    +# Written with help from public domain (CC0 1.0 Universal) workflow files by Jim Hester:
    +# * https://github.com/r-lib/actions/blob/master/examples/check-full.yaml
    +# * https://github.com/r-lib/actions/blob/master/examples/blogdown.yaml
    +#
    +# Permission is hereby granted, free of charge, to any person obtaining a copy
    +# of this software and associated documentation files (the "Software"), to deal
    +# in the Software without restriction, including without limitation the rights
    +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    +# copies of the Software, and to permit persons to whom the Software is
    +# furnished to do so, subject to the following conditions:
    +#
    +# The above copyright notice and this permission notice shall be included in all
    +# copies or substantial portions of the Software.
    +#
    +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    +# SOFTWARE.
    +
    +on:
    +  push:
    +    branches:
    +      - main
    +      - master
    +
    +name: targets
    +
    +jobs:
    +  targets:
    +    runs-on: ubuntu-latest
    +    env:
    +      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    +      RENV_PATHS_ROOT: ~/.local/share/renv
    +    steps:
    +      - uses: actions/checkout@v2
    +      - uses: r-lib/actions/setup-r@v2
    +      - uses: r-lib/actions/setup-pandoc@v2
    +
    +      - name: Install Mac system dependencies
    +        if: runner.os == 'macOS'
    +        run: brew install zeromq
    +
    +      - name: Install Linux system dependencies
    +        if: runner.os == 'Linux'
    +        run: |
    +          sudo apt-get install libcurl4-openssl-dev
    +          sudo apt-get install libssl-dev
    +          sudo apt-get install libzmq3-dev
    +
    +      - name: Cache packages
    +        uses: actions/cache@v1
    +        with:
    +          path: ${{ env.RENV_PATHS_ROOT }}
    +          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
    +          restore-keys: ${{ runner.os }}-renv-
    +
    +      - name: Restore packages
    +        shell: Rscript {0}
    +        run: |
    +          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
    +          renv::restore()
    +
    +      - name: Check if previous runs exists
    +        id: runs-exist
    +        run: git ls-remote --exit-code --heads origin targets-runs
    +        continue-on-error: true
    +
    +      - name: Checkout previous run
    +        if: steps.runs-exist.outcome == 'success'
    +        uses: actions/checkout@v2
    +        with:
    +          ref: targets-runs
    +          fetch-depth: 1
    +          path: .targets-runs
    +
    +      - name: Restore output files from the previous run
    +        if: steps.runs-exist.outcome == 'success'
    +        run: |
    +          for (dest in scan(".targets-runs/.targets-files", what = character())) {
    +            source <- file.path(".targets-runs", dest)
    +            if (!file.exists(dirname(dest))) dir.create(dirname(dest), recursive = TRUE)
    +            if (file.exists(source)) file.rename(source, dest)
    +          }
    +        shell: Rscript {0}
    +
    +      - name: Run targets pipeline
    +        run: targets::tar_make()
    +        shell: Rscript {0}
    +
    +      - name: Identify files that the targets pipeline produced
    +        run: git ls-files -mo --exclude=renv > .targets-files
    +
    +      - name: Create the runs branch if it does not already exist
    +        if: steps.runs-exist.outcome != 'success'
    +        run: git checkout --orphan targets-runs
    +
    +      - name: Put the worktree in the runs branch if the latter already exists
    +        if: steps.runs-exist.outcome == 'success'
    +        run: |
    +          rm -r .git
    +          mv .targets-runs/.git .
    +          rm -r .targets-runs
    +
    +      - name: Upload latest run
    +        run: |
    +          git config --local user.name "GitHub Actions"
    +          git config --local user.email "actions@github.com"
    +          rm -r .gitignore .github/workflows
    +          git add --all -- ':!renv'
    +          for file in $(git ls-files -mo --exclude=renv)
    +          do
    +            git add --force $file
    +          done
    +          git commit -am "Run pipeline"
    +          git push origin targets-runs
    +
    +      - name: Prepare failure artifact
    +        if: failure()
    +        run: rm -rf .git .github .targets-files .targets-runs
    +
    +      - name: Post failure artifact
    +        if: failure()
    +        uses: actions/upload-artifact@main
    +        with:
    +          name: ${{ runner.os }}-r${{ matrix.config.r }}-results
    +          path: .
    +
    +
    +
    +

    +

    This generates a GitHub Action template that will reproduce your project environment, run the pipeline, and output the results.

    +
    +

    Note: you will still need to configure anything your environment depends on, such as API keys, database credentials, etc.
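One common way to handle this is to read credentials from environment variables, the same mechanism the workflow uses to pass `GITHUB_PAT` through its `env:` block. The sketch below is a minimal base R version; `get_secret()` and `MY_API_KEY` are hypothetical names introduced for this example, and the value is set inline only so the snippet runs on its own.

```r
# Read a required secret from an environment variable, failing loudly if it is unset.
# get_secret() is a hypothetical helper, not part of any package.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set", call. = FALSE)
  }
  value
}

# MY_API_KEY is a made-up name. In practice it would be defined outside the code,
# for example in .Renviron locally or as a repository secret in GitHub Actions.
Sys.setenv(MY_API_KEY = "not-a-real-key")
api_key <- get_secret("MY_API_KEY")
```

Failing early with a clear message tends to be easier to debug on a remote runner than a downstream authentication error.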

    +
    +
    +

    This also relies on using GitHub runners for your compute and storage, both of which are limited by design; they are not intended for heavy workloads.

    +
    +
    +

    But these illustrate the steps for reproducing your data science environment via code.

    +

    -

    For instance, we’ve been building a pipelines with targets.

    +

    How do

    -
    +

    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    +

    Then, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.

    +

    This enables us to create separate environments in which we can do our development and testing before promoting code to production.

    -
    -

    environments as code

    +
    +

    -
    -

    project

    -

    What would we need to

    -

    we have to date covered:

    +
    +

    +

    This style of thinking is typically focused on things like software/applications, where different versions are incrementally developed, tested, and released as updates.

    +
    +

    How does data science differ?

    +
    +
    +
    +

    data science project architecture

    +

    What is the typical output of a data science project?

    +
      -
    • Git/GitHub for versioning and sharing our code
    • -
    • renv for reproducing our code dependencies
    • -
    • targets for creating repeatable pipelines
    • +
    • a job: a script that trains a model, updates a dataset, writes to a database
    -

    putting things into production is a matter of managing environments

    -
    -
    -
    -

    environment

    +
    +
      -
    • code
    • -
    • packages
    • -
    • system
    • -
    • hardware
    • +
    • an app: created in Shiny, Streamlit, Dash,
    - -
    -

    project architecture

    -

    What is the typical output of a data science project?

    +
    +
    +
      +
    • a report: a presentation, book, article, that is rendered from code
    • +
    +
      -
    • a job: a script that trains a model, updates a dataset, writes to a database

    • -
    • an app: created in Shiny, Streamlit, Dash,

    • -
    • a report: a presentation, book, article, that is rendered from code

    • -
    • an API

    • +
    • an API
    -
    +

    +

    Think back to where we left our flights project.

    + + + + + + +

    @@ -4071,7 +4071,7 @@

    -
    +
    @@ -4193,20 +4193,20 @@

    -
    -
    @@ -4799,15 +4799,15 @@

    -
    +

    -
    +

    -
    +

    We realize that we shouldn’t be calculating sentiment at the line-level and then aggregating, because short positive statements potentially end up getting as much weight as longer complaints.

    @@ -5513,7 +5513,7 @@

    -
    +
    10:00
    @@ -5694,7 +5694,7 @@

    -
    +
    10:00
    @@ -6061,7 +6061,7 @@

    -
    +
    10:00
    @@ -6182,7 +6182,7 @@

    -
    +
    15:00
diff --git a/docs/materials/02-renv.html b/docs/materials/02-renv.html
index 3364ed7..b968a8c 100644
--- a/docs/materials/02-renv.html
+++ b/docs/materials/02-renv.html
@@ -2342,7 +2342,7 @@

    -
    +
    15:00
diff --git a/docs/materials/03-targets.html b/docs/materials/03-targets.html
index 0ac52c6..5736ee4 100644
--- a/docs/materials/03-targets.html
+++ b/docs/materials/03-targets.html
@@ -9540,8 +9540,8 @@

    Or, we can visualize the pipeline using tar_glimpse().

    -
    - +
    +
    @@ -9551,8 +9551,8 @@

    tar_visnetwork() provides a more detailed breakdown of the pipeline, including the status of individual targets, as well as the functions and where they are used.

    -
    - +
    +

    @@ -9567,7 +9567,7 @@

    #> v skipped target starwars
     #> v skipped target sentences
     #> v skipped target sentiment
    -#> v skipped pipeline [0.044 seconds]
    +#> v skipped pipeline [0.043 seconds]
    @@ -9584,7 +9584,7 @@

    -
    +
    @@ -9800,7 +9800,7 @@

    -
    +
    15:00
    @@ -10612,16 +10612,16 @@

    #> # A tibble: 10 × 2
     #>    splits          id
     #>    <list>          <chr>
    -#>  1 <split [32/10]> Bootstrap01
    -#>  2 <split [32/9]>  Bootstrap02
    -#>  3 <split [32/10]> Bootstrap03
    -#>  4 <split [32/14]> Bootstrap04
    -#>  5 <split [32/8]>  Bootstrap05
    -#>  6 <split [32/12]> Bootstrap06
    -#>  7 <split [32/14]> Bootstrap07
    -#>  8 <split [32/10]> Bootstrap08
    +#>  1 <split [32/9]>  Bootstrap01
    +#>  2 <split [32/10]> Bootstrap02
    +#>  3 <split [32/11]> Bootstrap03
    +#>  4 <split [32/16]> Bootstrap04
    +#>  5 <split [32/12]> Bootstrap05
    +#>  6 <split [32/13]> Bootstrap06
    +#>  7 <split [32/12]> Bootstrap07
    +#>  8 <split [32/13]> Bootstrap08
     #>  9 <split [32/12]> Bootstrap09
    -#> 10 <split [32/13]> Bootstrap10
    +#> 10 <split [32/11]> Bootstrap10
    @@ -10657,7 +10657,7 @@

    one_split
    #> <Analysis/Assess/Total>
    -#> <32/10/32>
    +#> <32/9/32>
    @@ -10669,55 +10669,54 @@

    one_split |>rsample::training()
    -
    #>                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Hornet 4 Drive...1     21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Valiant...2            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Hornet Sportabout...3  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Mazda RX4 Wag          21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Fiat X1-9...5          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Toyota Corolla...6     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    -#> Lotus Europa...7       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Lotus Europa...8       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Valiant...9            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Lincoln Continental    10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    -#> Merc 450SE             16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    -#> Pontiac Firebird       19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    -#> Fiat X1-9...13         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Camaro Z28             13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    -#> Fiat 128               32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    -#> Toyota Corolla...16    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    -#> Toyota Corolla...17    33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    -#> Merc 230               22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    -#> Dodge Challenger       15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    -#> Valiant...20           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Ferrari Dino...21      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    -#> Merc 280C              17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    -#> Cadillac Fleetwood     10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Porsche 914-2...24     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Hornet 4 Drive...25    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Duster 360             14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Merc 280               19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    -#> Ferrari Dino...28      19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    -#> Chrysler Imperial      14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Porsche 914-2...30     26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Hornet Sportabout...31 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Merc 450SL             17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    +
    #>                          mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Volvo 142E...1          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +#> Cadillac Fleetwood...2  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    +#> Lotus Europa            30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    +#> Volvo 142E...4          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +#> Merc 230                22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    +#> Honda Civic...6         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    +#> Merc 280                19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    +#> Chrysler Imperial       14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Porsche 914-2...9       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Hornet Sportabout...10  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +#> Porsche 914-2...11      26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Fiat X1-9...12          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Honda Civic...13        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    +#> Mazda RX4 Wag           21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Fiat 128                32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    +#> Merc 450SLC             15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    +#> Camaro Z28              13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    +#> Cadillac Fleetwood...18 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    +#> Merc 450SL              17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    +#> Ferrari Dino            19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    +#> Duster 360              14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Merc 280C               17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    +#> Mazda RX4...23          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    +#> Toyota Corona...24      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    +#> Toyota Corona...25      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    +#> Toyota Corolla          33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Mazda RX4...27          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    +#> Dodge Challenger        15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    +#> Ford Pantera L          15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    +#> Fiat X1-9...30          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Volvo 142E...31         21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +#> Hornet Sportabout...32  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    # extract test set
     one_split |>
     rsample::testing()
    -
    #>                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    -#> Merc 450SLC    15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    -#> Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> AMC Javelin    15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    -#> Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +
    #>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    +#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    +#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    +#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    +#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    @@ -10775,7 +10774,7 @@

    geom_vline(xintercept = 0, linetype = 'dashed') - +

    metrics

    for predictive modeling workflows, rsample is typically used in conjunction with yardstick and tune to estimate model performance for a model or tune a model across parameters
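as a rough sketch of how these three packages fit together: rsample creates the resamples, yardstick defines the metrics, and tune fits the workflow on each resample and summarizes the results. The formula `mpg ~ wt + hp`, the linear model, and the seed value are assumptions chosen for illustration, not necessarily the workflow used in these slides.

```r
# sketch only: the model, formula, and seed are illustrative assumptions
library(tidymodels)

# bootstraps are drawn at random, so fix the seed to get the same splits each time
set.seed(1999)
boots <- rsample::bootstraps(mtcars, times = 10)

# yardstick: choose the metrics to estimate
my_metrics <- yardstick::metric_set(yardstick::rmse, yardstick::rsq, yardstick::ccc)

# a simple workflow: formula preprocessor + linear regression
wflow <-
  workflows::workflow() |>
  workflows::add_formula(mpg ~ wt + hp) |>
  workflows::add_model(parsnip::linear_reg())

# tune: fit on each analysis set, assess on each held-out set, then summarize
resample_metrics <-
  wflow |>
  tune::fit_resamples(resamples = boots, metrics = my_metrics) |>
  tune::collect_metrics()

resample_metrics
```

without the `set.seed()` call, the splits (and so the metric estimates) will differ every time the code is re-run.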

    @@ -10809,9 +10808,9 @@

    #> # A tibble: 3 × 6
     #>   .metric .estimator  mean     n std_err .config             
     #>   <chr>   <chr>      <dbl> <dbl>   <dbl> <chr>               
    -#> 1 ccc     standard   0.488    10   0.104 Preprocessor1_Model1
    -#> 2 rmse    standard   9.22     10   2.38  Preprocessor1_Model1
    -#> 3 rsq     standard   0.436    10   0.084 Preprocessor1_Model1
    +#> 1 ccc     standard   0.634    10   0.086 Preprocessor1_Model1
    +#> 2 rmse    standard   7.92     10   2.50  Preprocessor1_Model1
    +#> 3 rsq     standard   0.546    10   0.079 Preprocessor1_Model1
    @@ -10958,7 +10957,7 @@

    -
    +
    10:00
    @@ -10987,7 +10986,7 @@

    -
    +
    10:00
    @@ -11010,21 +11009,21 @@

    #> # A tibble: 15 × 3
     #>    air_time distance arr_delay
     #>       <dbl>    <dbl> <fct>    
    -#>  1  -1.04    -1.01   late     
    -#>  2  -0.855   -0.830  on_time  
    -#>  3  -0.412   -0.409  on_time  
    -#>  4  -0.325   -0.229  on_time  
    -#>  5  -0.0225   0.0397 on_time  
    -#>  6   1.78     2.08   on_time  
    -#>  7   0.216    0.0465 on_time  
    -#>  8   0.107    0.0465 on_time  
    -#>  9  -0.542   -0.562  on_time  
    -#> 10  -0.509   -0.379  late     
    -#> 11  -0.423   -0.409  on_time  
    -#> 12  -0.131   -0.229  on_time  
    -#> 13   0.518    0.731  on_time  
    -#> 14   1.48     1.65   on_time  
    -#> 15  -0.855   -0.847  late
    +#>  1  -1.16    -1.05   on_time  
    +#>  2  -1.10    -1.05   on_time  
    +#>  3   0.529    0.506  on_time  
    +#>  4  -1.24    -1.15   late     
    +#>  5   1.96     1.96   on_time  
    +#>  6  -1.20    -1.17   on_time  
    +#>  7  -0.390   -0.412  on_time  
    +#>  8  -0.791   -0.734  on_time  
    +#>  9  -0.120   -0.0435 on_time  
    +#> 10  -0.0441  -0.0408 on_time  
    +#> 11  -0.769   -0.836  late     
    +#> 12   1.99     2.08   on_time  
    +#> 13  -0.131   -0.207  late     
    +#> 14  -1.18    -1.13   on_time  
    +#> 15   0.0424   0.0492 on_time

    @@ -11046,7 +11045,7 @@

    -
    +
    10:00
    @@ -11081,7 +11080,7 @@

    -
    +
    10:00
    @@ -11106,7 +11105,7 @@

    -
    +

    This pipeline produces a final workflow that we then turn into a vetiver_model for the purpose of using the model in a production setting.

    vetiver provides a standardized way for bundling workflows with the information needed to version, store, and deploy them.
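a minimal sketch of that bundling step, assuming a small fitted workflow on `mtcars` as a stand-in for the final flights workflow. The model name `"mtcars_lm"` and the temporary board are hypothetical choices for illustration; a real project would point at a persistent board.

```r
# sketch only: the workflow, model name, and board are illustrative assumptions
library(tidymodels)
library(vetiver)
library(pins)

# stand-in for the final fitted workflow produced by the pipeline
final_wflow <-
  workflow() |>
  add_formula(mpg ~ wt + hp) |>
  add_model(linear_reg()) |>
  fit(data = mtcars)

# bundle the workflow with the metadata needed to version and deploy it
v <- vetiver_model(final_wflow, model_name = "mtcars_lm")

# store the bundled model on a (temporary, versioned) pins board
board <- board_temp(versioned = TRUE)
vetiver_pin_write(board, v)

# read it back, as a downstream job or API would
v_restored <- vetiver_pin_read(board, "mtcars_lm")
v_restored$model_name
```

the board is the piece that changes between environments; the code that writes and reads the model stays the same.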

    @@ -11142,7 +11141,7 @@

    -
    +
    15:00
    diff --git a/docs/materials/04-production.html b/docs/materials/04-production.html index b196ef8..5876e2a 100644 --- a/docs/materials/04-production.html +++ b/docs/materials/04-production.html @@ -1754,18 +1754,14 @@

    -

    How do

    -
    -
    -

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    Then, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.

    This enables us to create separate environments in which we can do our development and testing before promoting code to production.

    -
    +

    -
    +

    This style of thinking is typically focused on things like software/applications, where different versions are incrementally developed, tested, and released as updates.

    @@ -1796,7 +1792,7 @@

    data science project architecture

    -
    +

    think back to where we left our flights project.

    diff --git a/docs/search.json b/docs/search.json index 5e407d2..3d857a4 100644 --- a/docs/search.json +++ b/docs/search.json @@ -123,7 +123,7 @@ "href": "materials/03-targets.html#section-15", "title": "", "section": "", - "text": "We then run the pipeline using tar_make(), which will detail the steps that are being carried out and whether they were re-run or skipped.\n\ntargets::tar_make()\n\n\n\n#> v skipped target starwars\n#> v skipped target sentences\n#> v skipped target sentiment\n#> v skipped pipeline [0.044 seconds]" + "text": "We then run the pipeline using tar_make(), which will detail the steps that are being carried out and whether they were re-run or skipped.\n\ntargets::tar_make()\n\n\n\n#> v skipped target starwars\n#> v skipped target sentences\n#> v skipped target sentiment\n#> v skipped pipeline [0.043 seconds]" }, { "objectID": "materials/03-targets.html#section-16", @@ -480,7 +480,7 @@ "href": "materials/03-targets.html#section-60", "title": "", "section": "", - "text": "creating bootstraps\n\n# bootstrap\nboots = rsample::bootstraps(mtcars, times =10)\nboots\n\n#> # Bootstrap sampling \n#> # A tibble: 10 × 2\n#> splits id \n#> <list> <chr> \n#> 1 <split [32/10]> Bootstrap01\n#> 2 <split [32/9]> Bootstrap02\n#> 3 <split [32/10]> Bootstrap03\n#> 4 <split [32/14]> Bootstrap04\n#> 5 <split [32/8]> Bootstrap05\n#> 6 <split [32/12]> Bootstrap06\n#> 7 <split [32/14]> Bootstrap07\n#> 8 <split [32/10]> Bootstrap08\n#> 9 <split [32/12]> Bootstrap09\n#> 10 <split [32/13]> Bootstrap10" + "text": "creating bootstraps\n\n# bootstrap\nboots = rsample::bootstraps(mtcars, times =10)\nboots\n\n#> # Bootstrap sampling \n#> # A tibble: 10 × 2\n#> splits id \n#> <list> <chr> \n#> 1 <split [32/9]> Bootstrap01\n#> 2 <split [32/10]> Bootstrap02\n#> 3 <split [32/11]> Bootstrap03\n#> 4 <split [32/16]> Bootstrap04\n#> 5 <split [32/12]> Bootstrap05\n#> 6 <split [32/13]> Bootstrap06\n#> 7 <split [32/12]> Bootstrap07\n#> 8 <split [32/13]> Bootstrap08\n#> 9 <split 
[32/12]> Bootstrap09\n#> 10 <split [32/11]> Bootstrap10" }, { "objectID": "materials/03-targets.html#section-61", @@ -494,14 +494,14 @@ "href": "materials/03-targets.html#section-62", "title": "", "section": "", - "text": "each individual row contains an rsplit object, which has the original data stored as a single training/test split\n\n# grab one split\none_split =\nboots |>\npluck(\"splits\", 1)\n\none_split\n\n#> <Analysis/Assess/Total>\n#> <32/10/32>" + "text": "each individual row contains an rsplit object, which has the original data stored as a single training/test split\n\n# grab one split\none_split =\nboots |>\npluck(\"splits\", 1)\n\none_split\n\n#> <Analysis/Assess/Total>\n#> <32/9/32>" }, { "objectID": "materials/03-targets.html#section-63", "href": "materials/03-targets.html#section-63", "title": "", "section": "", - "text": "these sets can be extracted via the functions rsample::training() or rsample::testing()\n\n# extract training set\none_split |>\nrsample::training()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Hornet 4 Drive...1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n#> Valiant...2 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Hornet Sportabout...3 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n#> Fiat X1-9...5 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Toyota Corolla...6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n#> Lotus Europa...7 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n#> Lotus Europa...8 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n#> Valiant...9 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n#> Fiat X1-9...13 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\n#> Toyota Corolla...16 33.9 4 71.1 
65 4.22 1.835 19.90 1 1 4 1\n#> Toyota Corolla...17 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\n#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\n#> Valiant...20 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Ferrari Dino...21 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\n#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\n#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Porsche 914-2...24 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Hornet 4 Drive...25 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\n#> Ferrari Dino...28 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\n#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Porsche 914-2...30 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Hornet Sportabout...31 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3\n\n# extract test set\none_split |>\nrsample::testing()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\n#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\n#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\n#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\n#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2" + "text": "these sets can be extracted via the functions rsample::training() or rsample::testing()\n\n# extract training set\none_split |>\nrsample::training()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Volvo 142E...1 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Cadillac Fleetwood...2 
10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n#> Volvo 142E...4 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\n#> Honda Civic...6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\n#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Porsche 914-2...9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Hornet Sportabout...10 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n#> Porsche 914-2...11 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Fiat X1-9...12 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Honda Civic...13 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\n#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\n#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n#> Cadillac Fleetwood...18 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3\n#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\n#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\n#> Mazda RX4...23 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Toyota Corona...24 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> Toyota Corona...25 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n#> Mazda RX4...27 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\n#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\n#> Fiat X1-9...30 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Volvo 142E...31 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Hornet Sportabout...32 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n\n# extract test set\none_split |>\nrsample::testing()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Datsun 710 22.8 4 108.0 93 
3.85 2.320 18.61 1 1 4 1\n#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\n#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8" }, { "objectID": "materials/03-targets.html#section-64", @@ -529,7 +529,7 @@ "href": "materials/03-targets.html#section-66", "title": "", "section": "", - "text": "then we can fit our workflow across resamples and estimate its performance across these metrics\n\nwflow |>\n fit_resamples(\n resamples = boots,\n metrics = my_metrics\n ) |>\n collect_metrics() |>\n mutate_if(is.numeric, round, 3)\n\n#> # A tibble: 3 × 6\n#> .metric .estimator mean n std_err .config \n#> <chr> <chr> <dbl> <dbl> <dbl> <chr> \n#> 1 ccc standard 0.488 10 0.104 Preprocessor1_Model1\n#> 2 rmse standard 9.22 10 2.38 Preprocessor1_Model1\n#> 3 rsq standard 0.436 10 0.084 Preprocessor1_Model1" + "text": "then we can fit our workflow across resamples and estimate its performance across these metrics\n\nwflow |>\n fit_resamples(\n resamples = boots,\n metrics = my_metrics\n ) |>\n collect_metrics() |>\n mutate_if(is.numeric, round, 3)\n\n#> # A tibble: 3 × 6\n#> .metric .estimator mean n std_err .config \n#> <chr> <chr> <dbl> <dbl> <dbl> <chr> \n#> 1 ccc standard 0.634 10 0.086 Preprocessor1_Model1\n#> 2 rmse standard 7.92 10 2.50 Preprocessor1_Model1\n#> 3 rsq standard 0.546 10 0.079 Preprocessor1_Model1" }, { "objectID": "materials/03-targets.html#key-tidymodels-concepts-2", @@ -585,7 +585,7 @@ "href": "materials/03-targets.html#section-69", "title": "", "section": "", - "text": "We can create a recipe in the following way:\n\nrec=\n recipe(arr_delay ~ air_time + distance, data = flights) |>\n 
step_impute_median(all_numeric_predictors()) |>\n step_normalize(all_numeric_predictors())\n\n\nWe can then see how this recipe prepares data if we prep it on our training set and then use bake.\n\n\n#> # A tibble: 15 × 3\n#> air_time distance arr_delay\n#> <dbl> <dbl> <fct> \n#> 1 -1.04 -1.01 late \n#> 2 -0.855 -0.830 on_time \n#> 3 -0.412 -0.409 on_time \n#> 4 -0.325 -0.229 on_time \n#> 5 -0.0225 0.0397 on_time \n#> 6 1.78 2.08 on_time \n#> 7 0.216 0.0465 on_time \n#> 8 0.107 0.0465 on_time \n#> 9 -0.542 -0.562 on_time \n#> 10 -0.509 -0.379 late \n#> 11 -0.423 -0.409 on_time \n#> 12 -0.131 -0.229 on_time \n#> 13 0.518 0.731 on_time \n#> 14 1.48 1.65 on_time \n#> 15 -0.855 -0.847 late" + "text": "We can create a recipe in the following way:\n\nrec=\n recipe(arr_delay ~ air_time + distance, data = flights) |>\n step_impute_median(all_numeric_predictors()) |>\n step_normalize(all_numeric_predictors())\n\n\nWe can then see how this recipe prepares data if we prep it on our training set and then use bake.\n\n\n#> # A tibble: 15 × 3\n#> air_time distance arr_delay\n#> <dbl> <dbl> <fct> \n#> 1 -1.16 -1.05 on_time \n#> 2 -1.10 -1.05 on_time \n#> 3 0.529 0.506 on_time \n#> 4 -1.24 -1.15 late \n#> 5 1.96 1.96 on_time \n#> 6 -1.20 -1.17 on_time \n#> 7 -0.390 -0.412 on_time \n#> 8 -0.791 -0.734 on_time \n#> 9 -0.120 -0.0435 on_time \n#> 10 -0.0441 -0.0408 on_time \n#> 11 -0.769 -0.836 late \n#> 12 1.99 2.08 on_time \n#> 13 -0.131 -0.207 late \n#> 14 -1.18 -1.13 on_time \n#> 15 0.0424 0.0492 on_time" }, { "objectID": "materials/03-targets.html#section-70", @@ -2104,18 +2104,11 @@ "href": "materials/04-production.html#section-37", "title": "", "section": "", - "text": "How do" - }, - { - "objectID": "materials/04-production.html#section-38", - "href": "materials/04-production.html#section-38", - "title": "", - "section": "", "text": "So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science 
environment.\nThen, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.\nThis enables us to create separate environments in which we can do our development and testing before promoting code to production." }, { - "objectID": "materials/04-production.html#section-40", - "href": "materials/04-production.html#section-40", + "objectID": "materials/04-production.html#section-39", + "href": "materials/04-production.html#section-39", "title": "", "section": "", "text": "This style of thinking is typically focused on things like software/applications, where different versions are incrementally developed, tested, and released as updates.\n\nHow does data science differ?" @@ -2128,8 +2121,8 @@ "text": "data science project architecture\nWhat is the typical output of a data science project?\n\n\na job: a script that trains a model, updates a dataset, writes to a database\n\n\n\n\nan app: created in Shiny, Streamlit, Dash,\n\n\n\n\na report: a presentation, book, article, that is rendered from code\n\n\n\n\nan API" }, { - "objectID": "materials/04-production.html#section-41", - "href": "materials/04-production.html#section-41", + "objectID": "materials/04-production.html#section-40", + "href": "materials/04-production.html#section-40", "title": "", "section": "", "text": "thing back to where we left our flights project." diff --git a/materials/03-targets.qmd b/materials/03-targets.qmd index d32bdbf..7601232 100644 --- a/materials/03-targets.qmd +++ b/materials/03-targets.qmd @@ -1405,9 +1405,6 @@ estimates ``` - - - ## ```{r} @@ -1718,7 +1715,7 @@ Then, we'll refit to training + validation + test and prepare the model for depl ## -![](images/flights_stable.png) +![](images/flights_stable.png){fig-align="center"} . . . 
diff --git a/materials/04-production.qmd b/materials/04-production.qmd index 4cf0832..89a0785 100644 --- a/materials/04-production.qmd +++ b/materials/04-production.qmd @@ -574,10 +574,6 @@ But these illustrate the steps for *reproducing your data science environment vi ## -How do - -## - [So, putting things *in production* in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.]{.semi-transparent} [Then, it becomes a matter of reproducing each of these pieces *via code*. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.]{.semi-transparent} From fab799340314080221794176ddd951fd183d7c5a Mon Sep 17 00:00:00 2001 From: Phil Henrickson Date: Mon, 26 Aug 2024 09:20:37 -0500 Subject: [PATCH 3/3] re-rendering --- docs/materials/01-git.html | 128 ++++++++++++------------- docs/materials/02-renv.html | 2 +- docs/materials/03-targets.html | 168 +++++++++++++++++---------------- docs/search.json | 10 +- 4 files changed, 156 insertions(+), 152 deletions(-) diff --git a/docs/materials/01-git.html b/docs/materials/01-git.html index 80491c6..3256e6d 100644 --- a/docs/materials/01-git.html +++ b/docs/materials/01-git.html @@ -3922,7 +3922,7 @@

    -
    +

    @@ -4071,7 +4071,7 @@

    -
    +
    @@ -4193,20 +4193,20 @@

    -
    - @@ -4799,15 +4799,15 @@

    -
    +

    -
    +

    -
    +

    We realize that we shouldn’t be calculating sentiment at the line-level and then aggregating, because short positive statements potentially end up getting as much weight as longer complaints.
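one way to see the problem, using two made-up lines with made-up sentiment scores (this is a toy illustration, not the workshop's Star Wars data or its actual fix): a plain average treats a one-word line the same as a long one, while weighting by word count does not.

```r
# toy data: the text and sentiment scores are invented for illustration
script_lines <- data.frame(
  text = c(
    "Great!",
    "The hyperdrive failed again and nobody on this ship seems able to fix it."
  ),
  sentiment = c(1, -1)
)

# number of words per line, used as the weight
script_lines$n_words <- lengths(strsplit(script_lines$text, "\\s+"))

# line-level average: both lines count equally, so they cancel out
line_level <- mean(script_lines$sentiment)

# word-weighted average: the long complaint counts for more
word_weighted <- weighted.mean(script_lines$sentiment, w = script_lines$n_words)

c(line_level = line_level, word_weighted = word_weighted)
```

the line-level average comes out neutral even though almost all of the text is negative.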

    @@ -5513,7 +5513,7 @@

    -
    +
    10:00
    @@ -5694,7 +5694,7 @@

    -
    +
    10:00
    @@ -6061,7 +6061,7 @@

    -
    +
    10:00
    @@ -6182,7 +6182,7 @@

    -
    +
    15:00
    diff --git a/docs/materials/02-renv.html b/docs/materials/02-renv.html index b968a8c..42548cf 100644 --- a/docs/materials/02-renv.html +++ b/docs/materials/02-renv.html @@ -2342,7 +2342,7 @@

    -
    +
    15:00
    diff --git a/docs/materials/03-targets.html b/docs/materials/03-targets.html index 5736ee4..1c357ca 100644 --- a/docs/materials/03-targets.html +++ b/docs/materials/03-targets.html @@ -9540,8 +9540,8 @@

    Or, we can visualize the pipeline using tar_glimpse().

    -
    - +
    +
    @@ -9551,8 +9551,8 @@

    tar_visnetwork() provides a more detailed breakdown of the pipeline, including the status of individual targets, as well as the functions and where they are used.
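a small, self-contained sketch of both views, assuming a three-target toy pipeline written to a temporary directory with `tar_dir()` and `tar_script()`. The target names `raw`, `model`, and `coefs` are hypothetical, and both graph functions need the visNetwork package installed.

```r
# sketch only: a toy pipeline, not the workshop's Star Wars or flights pipeline
library(targets)

outdated <- tar_dir({
  # write a throwaway _targets.R into the temporary directory
  tar_script({
    library(targets)
    list(
      tar_target(raw, mtcars),
      tar_target(model, lm(mpg ~ wt, data = raw)),
      tar_target(coefs, coef(model))
    )
  }, ask = FALSE)

  # build every target once
  tar_make()

  # targets only: a quick look at the dependency graph
  glimpse_graph <- tar_glimpse()

  # targets plus functions and their up-to-date / outdated status
  full_graph <- tar_visnetwork()

  # names of targets that would re-run on the next tar_make()
  tar_outdated()
})

outdated
```

in an interactive session, printing `tar_glimpse()` or `tar_visnetwork()` from the project root opens the graph in the viewer.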

    -
    - +
    +

    @@ -9584,7 +9584,7 @@

    -
    +
    @@ -9800,7 +9800,7 @@

    -
    +
    15:00
    @@ -10612,16 +10612,16 @@

    #> # A tibble: 10 × 2
     #>    splits          id         
     #>    <list>          <chr>      
    -#>  1 <split [32/9]>  Bootstrap01
    -#>  2 <split [32/10]> Bootstrap02
    +#>  1 <split [32/13]> Bootstrap01
    +#>  2 <split [32/11]> Bootstrap02
     #>  3 <split [32/11]> Bootstrap03
    -#>  4 <split [32/16]> Bootstrap04
    -#>  5 <split [32/12]> Bootstrap05
    -#>  6 <split [32/13]> Bootstrap06
    -#>  7 <split [32/12]> Bootstrap07
    -#>  8 <split [32/13]> Bootstrap08
    +#>  4 <split [32/8]>  Bootstrap04
    +#>  5 <split [32/10]> Bootstrap05
    +#>  6 <split [32/12]> Bootstrap06
    +#>  7 <split [32/10]> Bootstrap07
    +#>  8 <split [32/10]> Bootstrap08
     #>  9 <split [32/12]> Bootstrap09
    -#> 10 <split [32/11]> Bootstrap10
    +#> 10 <split [32/10]> Bootstrap10
    @@ -10657,7 +10657,7 @@

    one_split
    #> <Analysis/Assess/Total>
    -#> <32/9/32>
    +#> <32/13/32>
    @@ -10669,54 +10669,58 @@

    one_split |>rsample::training()
    -
    #>                          mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Volvo 142E...1          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> Cadillac Fleetwood...2  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Lotus Europa            30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Volvo 142E...4          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> Merc 230                22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    -#> Honda Civic...6         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Merc 280                19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    -#> Chrysler Imperial       14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Porsche 914-2...9       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Hornet Sportabout...10  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Porsche 914-2...11      26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Fiat X1-9...12          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Honda Civic...13        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Mazda RX4 Wag           21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Fiat 128                32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    -#> Merc 450SLC             15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    -#> Camaro Z28              13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    -#> Cadillac Fleetwood...18 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Merc 450SL              17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    -#> Ferrari Dino            19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    -#> Duster 360              14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Merc 280C               17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    -#> Mazda RX4...23          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Toyota Corona...24      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> Toyota Corona...25      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> Toyota Corolla          33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    -#> Mazda RX4...27          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Dodge Challenger        15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    -#> Ford Pantera L          15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Fiat X1-9...30          27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Volvo 142E...31         21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> Hornet Sportabout...32  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +
    #>                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Merc 230               22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    +#> AMC Javelin            15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    +#> Pontiac Firebird...3   19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Toyota Corona...4      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    +#> Merc 450SE...5         16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Hornet Sportabout      18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +#> Mazda RX4 Wag...7      21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Ford Pantera L         15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    +#> Merc 450SL             17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    +#> Duster 360...10        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Porsche 914-2          26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Maserati Bora...12     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +#> Maserati Bora...13     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +#> Merc 240D...14         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Merc 280               19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    +#> Chrysler Imperial...16 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Fiat X1-9...17         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Duster 360...18        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Duster 360...19        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Merc 450SE...20        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Chrysler Imperial...21 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Datsun 710             22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    +#> Volvo 142E             21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +#> Merc 450SE...24        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Pontiac Firebird...25  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Merc 240D...26         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Mazda RX4 Wag...27     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Lincoln Continental    10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    +#> Fiat X1-9...29         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Chrysler Imperial...30 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Duster 360...31        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Toyota Corona...32     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    # extract test set
     one_split |>
     rsample::testing()
    -
    #>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    -#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    -#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    -#> AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    -#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +
    #>                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    +#> Hornet 4 Drive     21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    +#> Valiant            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Merc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    +#> Merc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    +#> Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    +#> Fiat 128           32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    +#> Honda Civic        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    +#> Toyota Corolla     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Dodge Challenger   15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    +#> Camaro Z28         13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    +#> Lotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    +#> Ferrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    @@ -10774,7 +10778,7 @@

    geom_vline(xintercept = 0, linetype = 'dashed') - +

    metrics

    for predictive modeling workflows, rsample is typically used in conjunction with yardstick and tune to estimate model performance for a model or tune a model across parameters

    @@ -10808,9 +10812,9 @@

    #> # A tibble: 3 × 6
     #>   .metric .estimator  mean     n std_err .config             
     #>   <chr>   <chr>      <dbl> <dbl>   <dbl> <chr>               
    -#> 1 ccc     standard   0.634    10   0.086 Preprocessor1_Model1
    -#> 2 rmse    standard   7.92     10   2.50  Preprocessor1_Model1
    -#> 3 rsq     standard   0.546    10   0.079 Preprocessor1_Model1
    +#> 1 ccc     standard   0.535    10   0.106 Preprocessor1_Model1
    +#> 2 rmse    standard   6.30     10   0.816 Preprocessor1_Model1
    +#> 3 rsq     standard   0.489    10   0.081 Preprocessor1_Model1
    @@ -10957,7 +10961,7 @@

    -
    +
    10:00
    @@ -10986,7 +10990,7 @@

    -
    +
    10:00
    @@ -11009,21 +11013,21 @@

    #> # A tibble: 15 × 3
     #>    air_time distance arr_delay
     #>       <dbl>    <dbl> <fct>    
    -#>  1  -1.16    -1.05   on_time  
    -#>  2  -1.10    -1.05   on_time  
    -#>  3   0.529    0.506  on_time  
    -#>  4  -1.24    -1.15   late     
    -#>  5   1.96     1.96   on_time  
    -#>  6  -1.20    -1.17   on_time  
    -#>  7  -0.390   -0.412  on_time  
    -#>  8  -0.791   -0.734  on_time  
    -#>  9  -0.120   -0.0435 on_time  
    -#> 10  -0.0441  -0.0408 on_time  
    -#> 11  -0.769   -0.836  late     
    -#> 12   1.99     2.08   on_time  
    -#> 13  -0.131   -0.207  late     
    -#> 14  -1.18    -1.13   on_time  
    -#> 15   0.0424   0.0492 on_time
    +#>  1   0.562    0.491  on_time
    +#>  2   2.60     2.11   on_time
    +#>  3  -0.228    0.453  on_time
    +#>  4  -0.531   -0.419  on_time
    +#>  5  -0.250   -0.419  on_time
    +#>  6  -0.801   -0.851  on_time
    +#>  7  -0.488   -0.438  on_time
    +#>  8  -0.791   -0.734  on_time
    +#>  9   0.670    0.791  on_time
    +#> 10   1.85     2.02   on_time
    +#> 11   0.291   -0.0435 late
    +#> 12  -0.228   -0.131  on_time
    +#> 13  -0.401   -0.539  late
    +#> 14  -0.455   -0.429  on_time
    +#> 15   0.183   -0.0272 on_time

    @@ -11045,7 +11049,7 @@

    -
    +
    10:00
    @@ -11080,7 +11084,7 @@

    -
    +
    10:00
    @@ -11141,7 +11145,7 @@

    -
    +
    15:00
    diff --git a/docs/search.json b/docs/search.json index 3d857a4..2e5f2af 100644 --- a/docs/search.json +++ b/docs/search.json @@ -480,7 +480,7 @@ "href": "materials/03-targets.html#section-60", "title": "", "section": "", - "text": "creating bootstraps\n\n# bootstrap\nboots = rsample::bootstraps(mtcars, times =10)\nboots\n\n#> # Bootstrap sampling \n#> # A tibble: 10 × 2\n#> splits id \n#> <list> <chr> \n#> 1 <split [32/9]> Bootstrap01\n#> 2 <split [32/10]> Bootstrap02\n#> 3 <split [32/11]> Bootstrap03\n#> 4 <split [32/16]> Bootstrap04\n#> 5 <split [32/12]> Bootstrap05\n#> 6 <split [32/13]> Bootstrap06\n#> 7 <split [32/12]> Bootstrap07\n#> 8 <split [32/13]> Bootstrap08\n#> 9 <split [32/12]> Bootstrap09\n#> 10 <split [32/11]> Bootstrap10" + "text": "creating bootstraps\n\n# bootstrap\nboots = rsample::bootstraps(mtcars, times =10)\nboots\n\n#> # Bootstrap sampling \n#> # A tibble: 10 × 2\n#> splits id \n#> <list> <chr> \n#> 1 <split [32/13]> Bootstrap01\n#> 2 <split [32/11]> Bootstrap02\n#> 3 <split [32/11]> Bootstrap03\n#> 4 <split [32/8]> Bootstrap04\n#> 5 <split [32/10]> Bootstrap05\n#> 6 <split [32/12]> Bootstrap06\n#> 7 <split [32/10]> Bootstrap07\n#> 8 <split [32/10]> Bootstrap08\n#> 9 <split [32/12]> Bootstrap09\n#> 10 <split [32/10]> Bootstrap10" }, { "objectID": "materials/03-targets.html#section-61", @@ -494,14 +494,14 @@ "href": "materials/03-targets.html#section-62", "title": "", "section": "", - "text": "each individual row contains an rsplit object, which has the original data stored as a single training/test split\n\n# grab one split\none_split =\nboots |>\npluck(\"splits\", 1)\n\none_split\n\n#> <Analysis/Assess/Total>\n#> <32/9/32>" + "text": "each individual row contains an rsplit object, which has the original data stored as a single training/test split\n\n# grab one split\none_split =\nboots |>\npluck(\"splits\", 1)\n\none_split\n\n#> <Analysis/Assess/Total>\n#> <32/13/32>" }, { "objectID": "materials/03-targets.html#section-63", "href": 
"materials/03-targets.html#section-63", "title": "", "section": "", - "text": "these sets can be extracted via the functions rsample::training() or rsample::testing()\n\n# extract training set\none_split |>\nrsample::training()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Volvo 142E...1 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Cadillac Fleetwood...2 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n#> Volvo 142E...4 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\n#> Honda Civic...6 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\n#> Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Porsche 914-2...9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Hornet Sportabout...10 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n#> Porsche 914-2...11 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Fiat X1-9...12 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Honda Civic...13 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\n#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\n#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n#> Cadillac Fleetwood...18 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3\n#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6\n#> Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\n#> Mazda RX4...23 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Toyota Corona...24 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> Toyota Corona...25 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n#> Mazda RX4...27 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\n#> Ford Pantera L 15.8 
8 351.0 264 4.22 3.170 14.50 0 1 5 4\n#> Fiat X1-9...30 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Volvo 142E...31 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Hornet Sportabout...32 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n\n# extract test set\none_split |>\nrsample::testing()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\n#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n#> Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\n#> Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n#> Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8" + "text": "these sets can be extracted via the functions rsample::training() or rsample::testing()\n\n# extract training set\none_split |>\nrsample::training()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2\n#> AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2\n#> Pontiac Firebird...3 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n#> Toyota Corona...4 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n#> Merc 450SE...5 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2\n#> Mazda RX4 Wag...7 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n#> Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4\n#> Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3\n#> Duster 360...10 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2\n#> Maserati Bora...12 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n#> Maserati Bora...13 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8\n#> Merc 240D...14 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n#> Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4\n#> 
Chrysler Imperial...16 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Fiat X1-9...17 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Duster 360...18 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Duster 360...19 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Merc 450SE...20 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Chrysler Imperial...21 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1\n#> Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2\n#> Merc 450SE...24 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3\n#> Pontiac Firebird...25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2\n#> Merc 240D...26 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2\n#> Mazda RX4 Wag...27 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4\n#> Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4\n#> Fiat X1-9...29 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1\n#> Chrysler Imperial...30 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4\n#> Duster 360...31 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4\n#> Toyota Corona...32 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1\n\n# extract test set\none_split |>\nrsample::testing()\n\n#> mpg cyl disp hp drat wt qsec vs am gear carb\n#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4\n#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1\n#> Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1\n#> Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4\n#> Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3\n#> Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4\n#> Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1\n#> Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2\n#> Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1\n#> Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2\n#> Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4\n#> Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2\n#> Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6" }, { "objectID": "materials/03-targets.html#section-64", @@ -529,7 
+529,7 @@ "href": "materials/03-targets.html#section-66", "title": "", "section": "", - "text": "then we can fit our workflow across resamples and estimate its performance across these metrics\n\nwflow |>\n fit_resamples(\n resamples = boots,\n metrics = my_metrics\n ) |>\n collect_metrics() |>\n mutate_if(is.numeric, round, 3)\n\n#> # A tibble: 3 × 6\n#> .metric .estimator mean n std_err .config \n#> <chr> <chr> <dbl> <dbl> <dbl> <chr> \n#> 1 ccc standard 0.634 10 0.086 Preprocessor1_Model1\n#> 2 rmse standard 7.92 10 2.50 Preprocessor1_Model1\n#> 3 rsq standard 0.546 10 0.079 Preprocessor1_Model1" + "text": "then we can fit our workflow across resamples and estimate its performance across these metrics\n\nwflow |>\n fit_resamples(\n resamples = boots,\n metrics = my_metrics\n ) |>\n collect_metrics() |>\n mutate_if(is.numeric, round, 3)\n\n#> # A tibble: 3 × 6\n#> .metric .estimator mean n std_err .config \n#> <chr> <chr> <dbl> <dbl> <dbl> <chr> \n#> 1 ccc standard 0.535 10 0.106 Preprocessor1_Model1\n#> 2 rmse standard 6.30 10 0.816 Preprocessor1_Model1\n#> 3 rsq standard 0.489 10 0.081 Preprocessor1_Model1" }, { "objectID": "materials/03-targets.html#key-tidymodels-concepts-2", @@ -585,7 +585,7 @@ "href": "materials/03-targets.html#section-69", "title": "", "section": "", - "text": "We can create a recipe in the following way:\n\nrec=\n recipe(arr_delay ~ air_time + distance, data = flights) |>\n step_impute_median(all_numeric_predictors()) |>\n step_normalize(all_numeric_predictors())\n\n\nWe can then see how this recipe prepares data if we prep it on our training set and then use bake.\n\n\n#> # A tibble: 15 × 3\n#> air_time distance arr_delay\n#> <dbl> <dbl> <fct> \n#> 1 -1.16 -1.05 on_time \n#> 2 -1.10 -1.05 on_time \n#> 3 0.529 0.506 on_time \n#> 4 -1.24 -1.15 late \n#> 5 1.96 1.96 on_time \n#> 6 -1.20 -1.17 on_time \n#> 7 -0.390 -0.412 on_time \n#> 8 -0.791 -0.734 on_time \n#> 9 -0.120 -0.0435 on_time \n#> 10 -0.0441 -0.0408 on_time \n#> 11 -0.769 -0.836 
late \n#> 12 1.99 2.08 on_time \n#> 13 -0.131 -0.207 late \n#> 14 -1.18 -1.13 on_time \n#> 15 0.0424 0.0492 on_time" + "text": "We can create a recipe in the following way:\n\nrec=\n recipe(arr_delay ~ air_time + distance, data = flights) |>\n step_impute_median(all_numeric_predictors()) |>\n step_normalize(all_numeric_predictors())\n\n\nWe can then see how this recipe prepares data if we prep it on our training set and then use bake.\n\n\n#> # A tibble: 15 × 3\n#> air_time distance arr_delay\n#> <dbl> <dbl> <fct> \n#> 1 0.562 0.491 on_time \n#> 2 2.60 2.11 on_time \n#> 3 -0.228 0.453 on_time \n#> 4 -0.531 -0.419 on_time \n#> 5 -0.250 -0.419 on_time \n#> 6 -0.801 -0.851 on_time \n#> 7 -0.488 -0.438 on_time \n#> 8 -0.791 -0.734 on_time \n#> 9 0.670 0.791 on_time \n#> 10 1.85 2.02 on_time \n#> 11 0.291 -0.0435 late \n#> 12 -0.228 -0.131 on_time \n#> 13 -0.401 -0.539 late \n#> 14 -0.455 -0.429 on_time \n#> 15 0.183 -0.0272 on_time" }, { "objectID": "materials/03-targets.html#section-70",