Intro2StatisticalLearning/index.qmd at main · ASPteaching/Intro2StatisticalLearning · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
---
title: "Introduction to Statistical Learning"
---

# Introduction

This [web page](https://aspteaching.github.io/Intro2StatisticalLearning/) and its [associated github repository](https://github.com/aspteaching/Intro2StatisticalLearning/) is intended to provide materials
(slides, scripts datasets etc) for _some of the units[^1]_ of the _Statistical Learning_
course at the [UPC-UB MSc in Statistics and Operations Research (MESIO)](https://mesioupcub.masters.upc.edu/en).

[^1:] Specifically the materials in this page are about chapters:

  1. Introduction
  4. Tree and Ensemble Methods
  5. Artificial Neural Networks..

The main reason for creating this page, apart of making it open and facilitatigt the hierrchjichal presentation of the materials, is that some labs may take time to run. So, given that it is not an option to run them in class we provide links to the HTML files obtaind when running the R or Python notebooks provided.

By the other side, in order to make the materials completely reproducible, the page is based on a github repository, which can be cloned or downladed in order for anyone who wishes to play around with the materials and reproduce or improve the examples.


## Introduction to Statistical Learning

- [Course presentation](C0-Intro2StatLearn-PD_AS.qmd)

The first part of the course introduces Statistical Learning and relates it to the related fields of Statistics, Machine Learning and Statistical Learning.

After this, an introduction to Supervised Learning (Prediction and Classification) and to Model Validation and Resampling (Bootstrap) concepts is presented.

<!-- - [1.1 Overview of Supervised Learning](C1.1-SupervisedLearning-PD_AS.qmd) -->
<!-- - [1.2 Model validation and Resampling](C1.2-Model_validation_and_Resampling.qmd) -->

<!-- - R-labs -->
<!--   - [Regression with KNN](labs/Lab-C1.1-knn4regression.Rmd) -->
<!--   - [Classification with KNN](labs/Lab-C1.2-knn4classification) -->

- Complements
  - [Introduction to biomarkers and diagnostic tests](FromBiomarker2DiagnosticTests.pdf)

## Tree based methods

### Decision Trees

Decision trees are a type of non-parametric classifiers which have been Very successful because of their interpretability, flexibility and a very decent accuracy.

<!-- - [2.1-DecisionTrees-Slides](2.1-DecisionTrees-Slides.qmd) -->
<!-- - R-labs -->
<!--     - [Lab_1- Classification and Regression Trees](CART-Examples.html) -->
<!-- - Python-labs -->
<!--     - [Lab_1- Decision Trees lab (from ISL. Ch 08)](ISLch08-baggboost-lab.ipynb) -->

### Ensemble methods

The term "Ensemble" (together in french) refers to distinct approaches to build predictiors by combining multiple models.

They have proved to addres well some limitations of trees therefore improving accuracy and robustness as well as being able to reduce overfitting and capture complex relationships.

<!-- - [2.2-Ensemble Methods. Slides](C2.2-Ensemble_Methods-Slides.qmd) -->
<!-- - R-Labs -->
<!--   - [Ensemble Lab 1 (Random Forest)](labs/Lab-C.2.4-Ensembles-1RF.qmd) -->
<!--   - [Ensemble Lab 2 (Boosting)](Lab-C.2.4-Boosting.qmd) -->
<!--   - [Ensemble Lab 2b (Boosting Optimization)](Lab-C.2.4b-BoostingOptimization.qmd) -->

<!--   - [Ensemble Lab 3 (Caret)](The-caret_package.qmd) -->


## Artifical Neural Networks

### Shallow Neural Networks

These are raditional ML models, inspired in brain, that simulate neuron behavior, thata is they receive an input, which is processed and an  output prediction is produced.

For long their applicability has been relatively restricted to a few fields or problems due mainly to their "black box" functioning that made them hard to interpret.

The scenario has now completely changed with the advent of deep neural networks which are in the basis of many powerful applications of artificial intelligence.

<!-- - [Neural Networks Slides](C3.1-Introduction_to_ANN-Slides.qmd) -->
<!-- - Labs -->
<!--   - NeuralNets Lab (Rmd version) -->
<!--   - NeuralNets Lab (Python notebook -->

### Deep Neural Networks

Esssentially these are ANN with multiple hidden layers with allow overpassing many of their limitations.

They can be tuned in a much more automatical way and have been applied to many complex tasks. such as Computer vision, Natural Language Processing or Recommender systems.

<!-- - [Deep learning Slides](C3.2-Introduction_to_Deep_Learning-Slides.qmd) -->
<!--   - DeepLearning Lab (Rmd version) -->
<!--   - DeepLearning Lab (Python notebook -->


# References and resources

## References for Tree based methods

- Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. CRC press.

- Brandon M. Greenwell (202) Tree-Based Methods for Statistical Learning in R. 1st Edition. Chapman and Hall/CRC DOI: https://doi.org/10.1201/9781003089032 Web site

- Efron, B., Hastie T. (2016) Computer Age Statistical Inference. Cambridge University Press. Web site

- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.

- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). Springer.

## References for deep neural networks

- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). MIT press. Web site

- LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

- Chollet, F. (2018). Deep learning with Python. Manning Publications.

- Chollet, F. (2023). Deep learning with R . 2nd edition. Manning Publications.

## Some interesting online resources

### Statistical/Machine Learning in General

- [Google's Machine Learning Crash Course](https://developers.google.com/machine-learning/crash-course)

- [Applied Data Mining and Statistical Learning (Penn Statte-University)](https://online.stat.psu.edu/stat508/)

- [R for statistical learning](https://daviddalpiaz.github.io/r4sl/)

- [Introduction to Statistical Learning (ISL)]()

### Decision Trees

- [Decision Trees free course (9 videos). By Analytics Vidhya](https://www.youtube.com/playlist?list=PLdKd-j64gDcC5TCZEqODMZtAotCfm5Zkh)

- [An Introduction to Recursive Partitioning Using the RPART Routines](https://cran.r-project.org/web/packages/rpart/vignettes/longintro.pdf)

### Neural Networks

- [Deep Learning and Neural Networks (Andrew Ng @ Coursera)]() (https://www.coursera.org/learn/neural-networks-deep-learning)

- [The Neural network Zoo](https://www.asimovinstitute.org/neural-network-zoo/)

-[The Neural network Playground](https://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.45726&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false)


This page has been created as Quarto Website project.

To learn more about Quarto websites visit <https://quarto.org/docs/websites>.