---
format: html
editor: visual
---

## What is this?

This is my documentation of the statistical techniques that I've worked on and plan to work on in the future. There are many other data science skills that I enjoy: deep learning, data pipeline engineering, visualization. But I won't showcase those here; this page is purely about statistical methods.

The textbook/course [Statistical Rethinking](https://oceanrep.geomar.de/id/eprint/55819/1/Statistical%20Rethinking%202nd%20Edition.pdf) is a fun way to unlearn the statistics you were taught in STAT 101. McElreath is iconoclastic, which made me want to adopt his book as the backbone of this project: his chapter structure, his code, and his fiery attitude toward what constitutes good science.

## Why Statistics?

Statistics are powerful!

-   Statistics helped the Oakland A's win just as many games as the Yankees with 1/3 of the budget.

-   Your insurance, bank loan, retirement savings, car, and smartphone all took a lot of human ingenuity, but that was not enough! They also required lots of statistical work to be engineered, planned, and fine-tuned.

-   Statistics are a crucial element of scientific progress. The vast majority of academic papers from sociology to physics rely on similar statistical tools to validate their claims.

Some common objections to using statistics:

<details>

<summary>**Surely machine learning creates more accurate predictions than traditional statistical models?**</summary>

Machine learning techniques are often far more powerful predictors than traditional statistics. But they have their downsides: they require lots of data, they require lots of computation, and worst of all, they're black boxes. They don't tell you why they predict the things that they predict or what causes what (at least not yet). This is why scientific papers use traditional statistics to validate their claims.

</details>

<details>

<summary>**Businesses like to say they're data-driven, but what they really want is system automation (e.g. invoices need to be categorized and sent to accounting).**</summary>

Yes, businesses need lots of automation, but they also need help making decisions: who to hire, who to let go, what product to sell, and at what price to sell it, to name a few. These decisions sit in a weird gray zone. They require some human judgement but would benefit from computer aid. Statistics are the best set of tools for many of these problems. They allow both humans and computers to add their judgement to an analysis, getting a better result than either would on their own.

</details>

<details>

<summary>**Visualizations are more intuitive and persuasive to audiences. [Our World in Data](https://ourworldindata.org/) has changed more minds than dull economic statistics.**</summary>

I love visualizations and try to use them as often as I can. But they're severely constrained in what they can accomplish. You won't be able to easily show how mountain terrain, soil fertility, precipitation, and historical wealth all help predict a country's GDP without showing your audience half a dozen maps. Statistical models allow us to easily see exactly how each of those factors is associated with GDP.

</details>

## Course Contents

### 1) Linear Regression 📈
How can we use one variable (e.g. Education) to predict another (e.g. Salary)? Doctorates love this one weird trick, which is correlated with scientific progress ($R^2 = 0.5$).
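
As a rough taste of the idea, here is a minimal sketch that fits a straight line to a handful of points. The numbers are invented purely for illustration, the names `fit_line`, `education_years`, and `salary_thousands` are hypothetical, and the fit is plain least squares rather than the Bayesian approach the book takes.

```python
# Hypothetical toy data, invented for illustration only (not from the book or any survey).
education_years = [10, 12, 14, 16, 18, 20]
salary_thousands = [32, 41, 45, 55, 58, 69]


def fit_line(x, y):
    """Ordinary least squares with one predictor: y ≈ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(education_years, salary_thousands)
predicted_salary = intercept + slope * 15  # prediction for 15 years of education

print(f"intercept: {intercept:.2f}")                  # intercept: -2.71
print(f"slope: {slope:.2f}")                          # slope: 3.51
print(f"prediction at 15: {predicted_salary:.2f}")    # prediction at 15: 50.00
```

In this made-up data set, each extra year of education goes with roughly 3.5 thousand more in salary. That is a description of the association, not a claim about what causes what.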

[Read Chapter →](ch4_linear_regression_polished.qmd)

---

### 2) Multiple Regression 📊
You're worried that the correlation between *ice cream* sales and *drownings* might just be because it's hot outside.
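
A small simulation can make that worry concrete. The sketch below assumes a made-up world in which temperature drives both ice cream sales and drownings and neither affects the other. All variable names and helper functions (`corr`, `residuals`) are hypothetical, and the data are simulated, not real.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(y, x):
    """What is left of y after removing its least-squares linear fit on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return [yi - (mean_y + slope * (xi - mean_x)) for xi, yi in zip(x, y)]


# Simulated world (an assumption of this sketch): temperature is the common cause.
n = 2000
temperature = [random.gauss(0, 1) for _ in range(n)]
ice_cream = [t + random.gauss(0, 1) for t in temperature]
drownings = [t + random.gauss(0, 1) for t in temperature]

raw = corr(ice_cream, drownings)
adjusted = corr(residuals(ice_cream, temperature), residuals(drownings, temperature))

print(f"raw correlation: {raw:.2f}")
print(f"after adjusting for temperature: {adjusted:.2f}")
```

The raw correlation should come out clearly positive, while the adjusted one should sit near zero. Removing temperature's linear fit from both variables gives the same ice cream coefficient you would get by adding temperature as a second predictor in the regression, which is the job multiple regression does.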

[Read Chapter →](ch5_multilinear.qmd)

---

### 3) Causality 🔀
[Oops](https://x.com/robinhanson/status/1925009123675975702) you can't just keep adding things to the regression 🫣

[Read Chapter →](ch6_DAG.qmd)

---

### 4) Model Metrics 🎯
AIC, LOO, Occam's razor, Oh my!

[Read Chapter →](ch7ModelComparison.qmd)

---

### 5) Interactions 🎨
The effect of one covariate, conditional on the value of another!

[Read Chapter →](ch8_conditional.qmd)

---

### 6) Integer Models 🔢
Binomial and Poisson regression for outcomes that are counts.

[Read Chapter →](ch11_integerGLM.qmd)

---

### 7) Generalized Linear Madness 🤪

[Read Chapter →](ch16_GLMs.qmd)

---

<br>

<br>


## References

Source code is on [GitHub](https://github.com/probably-jaden/Statistical-Rethinking).