diff --git a/docs/index.html b/docs/index.html
index da75d9b..94c3998 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -115,7 +115,7 @@

What They Didn’t Teach You About Data Science

About

-

This workshop focuses on using R/RStudio with Git, renv, and targets. This will be pretty hands-on, so it will be helpful to configure as much as possible in advance. I can help debug if we have any issues on the day.

+

This workshop focuses on using R/RStudio with Git/GitHub, renv, and targets. This will be pretty hands-on, so it will be helpful to configure as much as possible in advance. I can help debug if we have any issues on the day.

Setup

Set up the following pieces in advance:

@@ -137,6 +137,7 @@

Schedule

  • Git + GitHub
  • renv
  • targets
  • +
  • production
diff --git a/docs/materials/01-git.html b/docs/materials/01-git.html
index 53ad961..3256e6d 100644
--- a/docs/materials/01-git.html
+++ b/docs/materials/01-git.html
@@ -3922,7 +3922,7 @@

    -
    +
    @@ -4071,7 +4071,7 @@

    -
    +
    @@ -4193,20 +4193,20 @@

    -
    - @@ -4799,15 +4799,15 @@

    -
    +

    -
    +

    -
    +

    We realize that we shouldn’t be calculating sentiment at the line-level and then aggregating, because short positive statements potentially end up getting as much weight as longer complaints.
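
To see why line-level aggregation can mislead, here is a minimal sketch in base R. The lines, word counts, and scores are made-up illustration values, not data from the workshop, and weighting by word count is just one possible alternative, assumed here for illustration.

```r
# Hypothetical review split into three lines (illustrative values only).
review <- data.frame(
  line      = c("Great!", "Loved the staff", "<a 40-word complaint>"),
  n_words   = c(1, 3, 40),
  sentiment = c(1, 1, -1)  # assumed line-level scores in [-1, 1]
)

# Line-level average: every line counts equally, so two short
# positive lines outvote one long complaint.
line_level <- mean(review$sentiment)

# Length-weighted average: each line counts in proportion to its words.
weighted <- weighted.mean(review$sentiment, w = review$n_words)

print(c(line_level = line_level, weighted = weighted))
```

The unweighted mean comes out positive while the length-weighted mean comes out negative, which is the distortion the sentence above describes.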

    @@ -5513,7 +5513,7 @@

    -
    +
    10:00
    @@ -5694,7 +5694,7 @@

    -
    +
    10:00
    @@ -6061,7 +6061,7 @@

    -
    +
    10:00
    @@ -6182,7 +6182,7 @@

    -
    +
    15:00
diff --git a/docs/materials/02-renv.html b/docs/materials/02-renv.html
index 1e31f26..42548cf 100644
--- a/docs/materials/02-renv.html
+++ b/docs/materials/02-renv.html
@@ -2342,7 +2342,7 @@

    -
    +
    15:00
diff --git a/docs/materials/03-targets.html b/docs/materials/03-targets.html
index 2886042..1c357ca 100644
--- a/docs/materials/03-targets.html
+++ b/docs/materials/03-targets.html
@@ -9540,8 +9540,8 @@

    Or, we can visualize the pipeline using tar_glimpse().

    -
    - +
    +
    @@ -9551,8 +9551,8 @@

    tar_visnetwork() provides a more detailed breakdown of the pipeline, including the status of individual targets, as well as the functions and where they are used.

    -
    - +
    +

    @@ -9584,7 +9584,7 @@

    -
    +
    @@ -9800,7 +9800,7 @@

    -
    +
    15:00
    @@ -10612,16 +10612,16 @@

    #> # A tibble: 10 × 2
     #>    splits          id
     #>    <list>          <chr>
    -#>  1 <split [32/11]> Bootstrap01
    -#>  2 <split [32/13]> Bootstrap02
    -#>  3 <split [32/12]> Bootstrap03
    -#>  4 <split [32/13]> Bootstrap04
    -#>  5 <split [32/13]> Bootstrap05
    -#>  6 <split [32/13]> Bootstrap06
    -#>  7 <split [32/12]> Bootstrap07
    -#>  8 <split [32/11]> Bootstrap08
    -#>  9 <split [32/9]>  Bootstrap09
    -#> 10 <split [32/15]> Bootstrap10
    +#>  1 <split [32/13]> Bootstrap01
    +#>  2 <split [32/11]> Bootstrap02
    +#>  3 <split [32/11]> Bootstrap03
    +#>  4 <split [32/8]>  Bootstrap04
    +#>  5 <split [32/10]> Bootstrap05
    +#>  6 <split [32/12]> Bootstrap06
    +#>  7 <split [32/10]> Bootstrap07
    +#>  8 <split [32/10]> Bootstrap08
    +#>  9 <split [32/12]> Bootstrap09
    +#> 10 <split [32/10]> Bootstrap10
    @@ -10657,7 +10657,7 @@

    one_split
    #> <Analysis/Assess/Total>
    -#> <32/11/32>
    +#> <32/13/32>
    @@ -10670,55 +10670,57 @@

    rsample::training()
    #>                         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Valiant                18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    -#> Cadillac Fleetwood     10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    -#> Ferrari Dino           19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
    -#> Camaro Z28             13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    -#> Volvo 142E             21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    -#> Mazda RX4 Wag...6      21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Ford Pantera L...7     15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Datsun 710...8         22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> AMC Javelin...9        15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Lotus Europa...10      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Duster 360...11        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Lotus Europa...12      30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    -#> Hornet 4 Drive...13    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Merc 280...14          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
     #> Merc 230               22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
    -#> Pontiac Firebird       19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    -#> Toyota Corona...17     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> Hornet 4 Drive...18    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Merc 240D              24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    -#> Duster 360...20        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    -#> Hornet 4 Drive...21    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    -#> Chrysler Imperial...22 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Chrysler Imperial...23 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    -#> Ford Pantera L...24    15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    -#> Mazda RX4 Wag...25     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    -#> Merc 280C              17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    -#> Fiat X1-9              27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    -#> Datsun 710...28        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    -#> Toyota Corona...29     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    -#> AMC Javelin...30       15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    -#> Merc 280...31          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    -#> Toyota Corolla         33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> AMC Javelin            15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
    +#> Pontiac Firebird...3   19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Toyota Corona...4      21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    +#> Merc 450SE...5         16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Hornet Sportabout      18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    +#> Mazda RX4 Wag...7      21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Ford Pantera L         15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
    +#> Merc 450SL             17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    +#> Duster 360...10        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Porsche 914-2          26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    +#> Maserati Bora...12     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +#> Maserati Bora...13     15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +#> Merc 240D...14         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Merc 280               19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
    +#> Chrysler Imperial...16 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Fiat X1-9...17         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Duster 360...18        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Duster 360...19        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Merc 450SE...20        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Chrysler Imperial...21 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Datsun 710             22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
    +#> Volvo 142E             21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
    +#> Merc 450SE...24        16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    +#> Pontiac Firebird...25  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
    +#> Merc 240D...26         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
    +#> Mazda RX4 Wag...27     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
    +#> Lincoln Continental    10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    +#> Fiat X1-9...29         27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
    +#> Chrysler Imperial...30 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
    +#> Duster 360...31        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
    +#> Toyota Corona...32     21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
    # extract test set
     one_split |>
     rsample::testing()
    -
    #>                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    -#> Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    -#> Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
    -#> Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
    -#> Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
    -#> Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    -#> Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
    -#> Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    -#> Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    -#> Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    -#> Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
    -#> Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
    +
    #>                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
    +#> Mazda RX4          21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
    +#> Hornet 4 Drive     21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
    +#> Valiant            18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
    +#> Merc 280C          17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
    +#> Merc 450SLC        15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
    +#> Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
    +#> Fiat 128           32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
    +#> Honda Civic        30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
    +#> Toyota Corolla     33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
    +#> Dodge Challenger   15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
    +#> Camaro Z28         13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
    +#> Lotus Europa       30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
    +#> Ferrari Dino       19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
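
The split sizes above follow from how a bootstrap resample is built: the analysis set is drawn with replacement and has as many rows as the original data (32 for mtcars), while the assessment set holds the rows that were never drawn. Below is a minimal base R sketch of that idea. It is not rsample's implementation, and the seed is arbitrary, so the assessment size will not match the output shown.

```r
set.seed(123)  # arbitrary seed, chosen only for this sketch

n <- nrow(mtcars)  # 32 rows

# Analysis set: n row indices sampled with replacement, so duplicates occur.
analysis_idx <- sample(n, size = n, replace = TRUE)

# Assessment set: rows that were never drawn ("out-of-bag" rows).
assessment_idx <- setdiff(seq_len(n), analysis_idx)

analysis   <- mtcars[analysis_idx, ]
assessment <- mtcars[assessment_idx, ]

# Mirrors the <Analysis/Assess/Total> summary printed for one_split.
cat(sprintf("<%d/%d/%d>\n", nrow(analysis), nrow(assessment), n))
```

Because rows are sampled with replacement, the same car can appear several times in the analysis set, which is why the training output carries suffixes such as `...12` and `...13` on repeated row names.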
    @@ -10776,7 +10778,7 @@

    geom_vline(xintercept = 0, linetype = 'dashed') - +

    metrics

    for predictive modeling workflows, rsample is typically used in conjunction with yardstick and tune to estimate model performance for a model or tune a model across parameters

    @@ -10808,11 +10810,11 @@

    mutate_if(is.numeric, round, 3)
    #> # A tibble: 3 × 6
    -#>   .metric .estimator   mean     n std_err .config             
    -#>   <chr>   <chr>       <dbl> <dbl>   <dbl> <chr>               
    -#> 1 ccc     standard    0.457    10   0.095 Preprocessor1_Model1
    -#> 2 rmse    standard   11.1      10   2.86  Preprocessor1_Model1
    -#> 3 rsq     standard    0.403    10   0.095 Preprocessor1_Model1
    +#>   .metric .estimator  mean     n std_err .config
    +#>   <chr>   <chr>      <dbl> <dbl>   <dbl> <chr>
    +#> 1 ccc     standard   0.535    10   0.106 Preprocessor1_Model1
    +#> 2 rmse    standard   6.30     10   0.816 Preprocessor1_Model1
    +#> 3 rsq     standard   0.489    10   0.081 Preprocessor1_Model1
    @@ -10959,7 +10961,7 @@

    -
    +
    10:00
    @@ -10988,7 +10990,7 @@

    -
    +
    10:00
    @@ -11011,21 +11013,21 @@

    #> # A tibble: 15 × 3
     #>    air_time distance arr_delay
     #>       <dbl>    <dbl> <fct>    
    -#>  1  -1.07    -1.15   on_time  
    -#>  2  -0.726   -0.676  on_time  
    -#>  3  -0.163   -0.0435 on_time  
    -#>  4  -0.109   -0.0231 on_time  
    -#>  5  -1.16    -1.15   on_time  
    -#>  6   1.64     1.93   on_time  
    -#>  7   2.09     1.94   late     
    -#>  8  -0.434   -0.429  on_time  
    -#>  9  -0.455   -0.379  on_time  
    -#> 10  -0.682   -0.680  on_time  
    -#> 11  -1.24    -1.15   on_time  
    -#> 12  -1.20    -1.19   late     
    -#> 13  -0.228   -1.11   on_time  
    -#> 14  -0.0225   0.0342 on_time  
    -#> 15  -0.423   -0.376  on_time
    +#>  1    0.562   0.491  on_time
    +#>  2    2.60    2.11   on_time
    +#>  3   -0.228   0.453  on_time
    +#>  4   -0.531  -0.419  on_time
    +#>  5   -0.250  -0.419  on_time
    +#>  6   -0.801  -0.851  on_time
    +#>  7   -0.488  -0.438  on_time
    +#>  8   -0.791  -0.734  on_time
    +#>  9    0.670   0.791  on_time
    +#> 10    1.85    2.02   on_time
    +#> 11    0.291  -0.0435 late
    +#> 12   -0.228  -0.131  on_time
    +#> 13   -0.401  -0.539  late
    +#> 14   -0.455  -0.429  on_time
    +#> 15    0.183  -0.0272 on_time

    @@ -11047,7 +11049,7 @@

    -
    +
    10:00
    @@ -11082,7 +11084,7 @@

    -
    +
    10:00
    @@ -11107,7 +11109,7 @@

    -
    +

    This pipeline produces a final workflow that we then turn into a vetiver_model for the purpose of using the model in a production setting.

    vetiver provides a standardized way to bundle workflows with the information needed to version, store, and deploy them.

    @@ -11143,7 +11145,7 @@

    -
    +
    15:00
diff --git a/docs/materials/04-production.html b/docs/materials/04-production.html
index 10487d2..5876e2a 100644
--- a/docs/materials/04-production.html
+++ b/docs/materials/04-production.html
@@ -1413,8 +1413,8 @@

    -
    -

    +
    +

    environments as code

    DevOps principles aim to build security, stability, and scalability into software from the very beginning. The idea is to avoid building software that works locally, but doesn’t work well in collaboration or production.

    @@ -1422,10 +1422,10 @@

    So much of DevOps boils down to preventing the well-it-runs-on-my-machine problem.

    -
    +

    -
    +

    DevOps principles aim to build security, stability, and scalability into software from the very beginning. The idea is to avoid building software that works locally, but doesn’t work well in collaboration or production.

    @@ -1441,14 +1441,14 @@

    -
    +

    How close are we to creating fully reproducible environments via code? What are we missing?

    -
    +

    -
    +

    How close are we to creating fully reproducible environments via code? What are we missing?

    We’ve only really covered one layer:

    @@ -1457,7 +1457,7 @@

    renv and venv allow us to create isolated virtual environments in which to execute our code.

    -
    +

    your data science environment is the stack of software and hardware below your code, from the R and Python packages you’re using right down to the physical hardware your code runs on.

    @@ -1473,7 +1473,7 @@

    -
    +

    We’ve covered creating and taking down one layer:

      @@ -1481,7 +1481,7 @@

    renv and venv allow us to create isolated virtual environments in which to execute our code.

    -
    +

    But there are three main layers to think about:

      @@ -1497,7 +1497,7 @@

      API keys, database credentials, ODBC drivers…

    -
    +

    But there are three main layers to think about:

      @@ -1509,19 +1509,19 @@

      Your code has to actually run on something. Even if it’s in the cloud it’s still running on a physical machine somewhere.

    -
    +

    -

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    -
    +

    -
    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    Then, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.

    -
    +

    Let’s revisit the GitHub action we saw earlier.

    # name: updating the README
    @@ -1577,7 +1577,7 @@ 

    # git push origin || echo "No changes to commit" #
    -
    +

    This is essentially just a script that:

      @@ -1590,62 +1590,218 @@

    1. Renders the Quarto README and commits/pushes it to the repository
    -
    +

    Now, to be clear, this is a lot of work to just render a goddamn README.

    But we use the same setup to do more elaborate work, such as running the whole dang pipeline via a GitHub Action.

    -
    +

    We’ve been building pipelines with targets.

    +
    +

    If you run targets::tar_github_actions(), a new file, .github/workflows/targets.yaml, appears in your project working directory.

    +
    +
    +
    
    +# MIT License
    +# Copyright (c) 2021 Eli Lilly and Company
    +# Author: William Michael Landau (will.landau at gmail)
    +# Written with help from public domain (CC0 1.0 Universal) workflow files by Jim Hester:
    +# * https://github.com/r-lib/actions/blob/master/examples/check-full.yaml
    +# * https://github.com/r-lib/actions/blob/master/examples/blogdown.yaml
    +#
    +# Permission is hereby granted, free of charge, to any person obtaining a copy
    +# of this software and associated documentation files (the "Software"), to deal
    +# in the Software without restriction, including without limitation the rights
    +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    +# copies of the Software, and to permit persons to whom the Software is
    +# furnished to do so, subject to the following conditions:
    +#
    +# The above copyright notice and this permission notice shall be included in all
    +# copies or substantial portions of the Software.
    +#
    +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    +# SOFTWARE.
    +
    +on:
    +  push:
    +    branches:
    +      - main
    +      - master
    +
    +name: targets
    +
    +jobs:
    +  targets:
    +    runs-on: ubuntu-latest
    +    env:
    +      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    +      RENV_PATHS_ROOT: ~/.local/share/renv
    +    steps:
    +      - uses: actions/checkout@v2
    +      - uses: r-lib/actions/setup-r@v2
    +      - uses: r-lib/actions/setup-pandoc@v2
    +
    +      - name: Install Mac system dependencies
    +        if: runner.os == 'macOS'
    +        run: brew install zeromq
    +
    +      - name: Install Linux system dependencies
    +        if: runner.os == 'Linux'
    +        run: |
    +          sudo apt-get install libcurl4-openssl-dev
    +          sudo apt-get install libssl-dev
    +          sudo apt-get install libzmq3-dev
    +
    +      - name: Cache packages
    +        uses: actions/cache@v1
    +        with:
    +          path: ${{ env.RENV_PATHS_ROOT }}
    +          key: ${{ runner.os }}-renv-${{ hashFiles('**/renv.lock') }}
    +          restore-keys: ${{ runner.os }}-renv-
    +
    +      - name: Restore packages
    +        shell: Rscript {0}
    +        run: |
    +          if (!requireNamespace("renv", quietly = TRUE)) install.packages("renv")
    +          renv::restore()
    +
    +      - name: Check if previous runs exists
    +        id: runs-exist
    +        run: git ls-remote --exit-code --heads origin targets-runs
    +        continue-on-error: true
    +
    +      - name: Checkout previous run
    +        if: steps.runs-exist.outcome == 'success'
    +        uses: actions/checkout@v2
    +        with:
    +          ref: targets-runs
    +          fetch-depth: 1
    +          path: .targets-runs
    +
    +      - name: Restore output files from the previous run
    +        if: steps.runs-exist.outcome == 'success'
    +        run: |
    +          for (dest in scan(".targets-runs/.targets-files", what = character())) {
    +            source <- file.path(".targets-runs", dest)
    +            if (!file.exists(dirname(dest))) dir.create(dirname(dest), recursive = TRUE)
    +            if (file.exists(source)) file.rename(source, dest)
    +          }
    +        shell: Rscript {0}
    +
    +      - name: Run targets pipeline
    +        run: targets::tar_make()
    +        shell: Rscript {0}
    +
    +      - name: Identify files that the targets pipeline produced
    +        run: git ls-files -mo --exclude=renv > .targets-files
    +
    +      - name: Create the runs branch if it does not already exist
    +        if: steps.runs-exist.outcome != 'success'
    +        run: git checkout --orphan targets-runs
    +
    +      - name: Put the worktree in the runs branch if the latter already exists
    +        if: steps.runs-exist.outcome == 'success'
    +        run: |
    +          rm -r .git
    +          mv .targets-runs/.git .
    +          rm -r .targets-runs
    +
    +      - name: Upload latest run
    +        run: |
    +          git config --local user.name "GitHub Actions"
    +          git config --local user.email "actions@github.com"
    +          rm -r .gitignore .github/workflows
    +          git add --all -- ':!renv'
    +          for file in $(git ls-files -mo --exclude=renv)
    +          do
    +            git add --force $file
    +          done
    +          git commit -am "Run pipeline"
    +          git push origin targets-runs
    +
    +      - name: Prepare failure artifact
    +        if: failure()
    +        run: rm -rf .git .github .targets-files .targets-runs
    +
    +      - name: Post failure artifact
    +        if: failure()
    +        uses: actions/upload-artifact@main
    +        with:
    +          name: ${{ runner.os }}-r${{ matrix.config.r }}-results
    +          path: .
    +
    +
    +
    +

    +

    This generates a GitHub Action template that will reproduce your project environment, run the pipeline, and output the results.

    +
    +

    Note: you will still need to configure things on which your environment depends, such as API keys, database credentials, etc.
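
One common way to handle this is to store credentials as repository secrets, expose them to the job as environment variables (the workflow above does this for GITHUB_PAT), and read them at run time instead of hard-coding them. Here is a minimal sketch in base R, assuming a hypothetical variable name MY_API_KEY and a hypothetical helper get_secret(); neither comes from the workshop materials.

```r
# Hypothetical helper: read a required credential from an environment
# variable and fail early with a clear message if it is missing.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA_character_)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Usage: MY_API_KEY is an assumed name, set here only for demonstration.
# On a runner it would be supplied through the workflow's env block instead.
Sys.setenv(MY_API_KEY = "not-a-real-key")
api_key <- get_secret("MY_API_KEY")
```

Failing early keeps a missing credential from surfacing as a confusing error deep inside the pipeline.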

    +
    +
    +

    This also relies on GitHub runners for your compute and storage, both of which are limited by design; they are not intended for heavy workloads.

    +
    +
    +

    But these workflows illustrate the steps for reproducing your data science environment via code.

    +

    -

    For instance, we’ve been building pipelines with targets.

    +

    So, putting things in production in a safe and reliable way starts with recognizing the different pieces we need to recreate our data science environment.

    +

    Then, it becomes a matter of reproducing each of these pieces via code. This part sounds super complicated, and it can be, but a lot of smart people have put a lot of time into making it easier.

    +

    This enables us to create separate environments in which we can do our development and testing before promoting code to production.

    -
    -

    environments as code

    +
    +

    +

    This style of thinking is typically focused on things like software/applications, where different versions are incrementally developed, tested, and released as updates.

    +
    +

    How does data science differ?

    +
    -
    -

    project

    -

    What would we need to

    -

    we have to date covered:

    +
    +

    data science project architecture

    +

    What is the typical output of a data science project?

    +
      -
    • Git/GitHub for versioning and sharing our code
    • -
    • renv for reproducing our code dependencies
    • -
    • targets for creating repeatable pipelines
    • +
    • a job: a script that trains a model, updates a dataset, writes to a database
    -

    putting things into production is a matter of managing environments

    -
    -
    -
    -

    environment

    +

    +
      -
    • code
    • -
    • packages
    • -
    • system
    • -
    • hardware
    • +
    • an app: created in Shiny, Streamlit, or Dash
    - -
    -

    project architecture

    -

    What is the typical output of a data science project?

    +
      -
    • a job: a script that trains a model, updates a dataset, writes to a database

    • -
    • an app: created in Shiny, Streamlit, Dash,

    • -
    • a report: a presentation, book, article, that is rendered from code

    • -
    • an API

    • +
    • a report: a presentation, book, article, that is rendered from code
    • +
    +
    +
    +
      +
    • an API
    -
    +

    +

    thing back to where we left our flights project.

    + + + + + + +