# Calculus

In this section we'll focus on three big ideas from calculus: derivatives, optimization, and integrals.

## Derivatives

Derivatives are about (instantaneous) rate of change.

> "In the fall of 1972 President Nixon announced that the rate of increase of inflation was decreasing. This was the first time a sitting president used the third derivative to advance his case for reelection" ([Rossi 1996](https://www.ams.org/notices/199610/page2.pdf))

Let's dissect what Nixon might have said:

> Inflation's \[first derivative, of prices\] rate of increase \[second derivative\] is going down \[third derivative\].

A more graphical way to think about a derivative is as a *slope*. Let's consider a linear function of the form $y = 2 x$:

```{r}
#| message: false
library(tidyverse) # could also just do library(ggplot2)
ggplot() +
  stat_function(fun = function(x){2 * x},
                xlim = c(-10, 10))
```

We can imagine any political variables on the x- and y-axes. What is the rate of change? In other words, what is the derivative? Remember that we can calculate the slope between two points $x_1$ and $x_2$ with:

$$
m = \frac{f(x_2) - f(x_1)}{x_2-x_1}
$$
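As a quick check, the slope formula can be wrapped in a small R helper. This is only a sketch: the names `slope()` and `f_linear()` are made up for illustration and do not come from any package.

```{r}
# hypothetical helper: rate of change of f between x1 and x2
slope <- function(f, x1, x2) {
  (f(x2) - f(x1)) / (x2 - x1)
}

# the linear function plotted above
f_linear <- function(x) 2 * x

slope(f_linear, x1 = 1, x2 = 3)
slope(f_linear, x1 = -10, x2 = 10)
```

For a straight line, the result should be the same (here, 2) no matter which two points we pick.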

Now consider another slightly more complicated function, a quadratic one, $y = x^2$:

```{r}
ggplot() +
  stat_function(fun = function(x){x ^ 2},
                xlim = c(-10, 10))
```

What happens when we apply our slope function?

::: callout-note
#### Exercise

1)  Use the slope formula to calculate the rate of change between 5 and 6.

2)  Use the slope formula to calculate the rate of change between 5 and 5.5.

3)  Use the slope formula to calculate the rate of change between 5 and 5.1.
:::

Takeaway: here the derivative depends on the value of $x$. It is actually $2x$.

Differential calculus is about finding these derivatives in a more straightforward manner! We can generalize our slope formula as follows:

$$
m = \frac{f(x_1+ \Delta x) - f(x_1)}{\Delta x}
$$

The point is that when $\Delta x$ is arbitrarily small, we'll get our rate of change. Formalizing this:

$$
\lim_{\Delta x\to0} \frac{f(x_1+ \Delta x) - f(x_1)}{\Delta x} = \frac{d}{dx} f(x) = \frac{dy}{dx} = f'(x)
$$

A few points on notation:

-   $\frac{d}{dx} f(x)$ is read "The derivative of $f$ of $x$ with respect to $x$."

    -   The variable with respect to which we're differentiating is the one that appears in the bottom (in the case above, this is $x$).

    ::: callout-warning
    While the above looks like a fraction, it's really not. Do not try to cancel out the $d$s!
    :::

-   $f'(x)$ (read: "$f$ prime $x$") is the derivative of $f(x)$. This is a more compact form to refer to derivatives when you have defined $f(x)$ elsewhere.
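To see the limit definition at work, here is a short derivation for $y = x^2$ that uses only algebra. It recovers the $2x$ from the takeaway above:

$$
\begin{aligned}
\frac{d}{dx}x^2 &= \lim_{\Delta x\to0} \frac{(x + \Delta x)^2 - x^2}{\Delta x} \\
&= \lim_{\Delta x\to0} \frac{2x\Delta x + (\Delta x)^2}{\Delta x} \\
&= \lim_{\Delta x\to0} (2x + \Delta x) = 2x
\end{aligned}
$$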

### Rules of differentiation

How do we compute derivatives? Sometimes you can try a bunch of numbers and get at the answer. Sometimes you can use the limit-based formula above, if you know a few properties of limits. But in most cases you will either use software (more on this later) or the **rules of differentiation**, which we will cover now.

**Constant rule:** $(c)' = 0$.

There is no change in a constant:

```{r}
ggplot() +
  stat_function(fun = function(x){2}, xlim = c(-10, 10))
```

**Coefficient rule:** $(c \cdot f(x))' = c \cdot f'(x)$.

```{r}
ggplot() +
  stat_function(fun = function(x){2 * x}, xlim = c(-10, 10), aes(color = "y = 2x")) +
  stat_function(fun = function(x){4 * x}, xlim = c(-10, 10), aes(color = "y = 4x")) +
  scale_color_manual("Function", values = c("red", "blue"))
```

**Sum/difference rule:** $(f(x) \pm g(x))' = f'(x) \pm g'(x)$.

Together, the coefficient rule and the sum/difference rule tell us that the derivative is a *linear operator*.

**Power rule:** $(x^n)'=nx^{(n-1)}$

Remember when we wanted to calculate the derivative of $y=x^2$ above? We can use the power rule, with $n=2$: $\;nx^{(n-1)} = 2x^{(2-1)}=2x$. Let's try out $\frac{d}{dx}4x^3$ and $\frac{d}{dx}(x^2 + 2x)$ on the board.
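The two board examples can be worked out by combining the power rule with the coefficient and sum rules. One way to write the steps:

$$
\begin{aligned}
\frac{d}{dx}4x^3 &= 4 \cdot \frac{d}{dx}x^3 = 4 \cdot 3x^{2} = 12x^2 \\
\frac{d}{dx}(x^2 + 2x) &= \frac{d}{dx}x^2 + 2 \cdot \frac{d}{dx}x = 2x + 2
\end{aligned}
$$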

::: callout-note
#### Exercise

Use the differentiation rules we have covered so far to calculate the derivatives of $y$ with respect to $x$ of the following functions:

1)  $y = 2x^2 + 10$
2)  $y = 5x^4 - \frac{2}{3}x^3$
3)  $y = 9 \sqrt x$
4)  $y = \frac{4}{x^2}$
5)  $y = ax^3 + b$, where $a$ and $b$ are constants.
6)  $y = \frac{2w}{5}$
:::

**Exponent and logarithm rules:**

$$
\begin{aligned}
(c^x)' &= c^x \cdot \ln(c), & \forall c>0 \\
(e^x)' &= e^x \\
\\
(\log_a(x))' &= \frac{1}{x \cdot \ln(a)}, & \forall x>0  \\
(\ln(x))' &= \frac{1}{x}, & \forall x>0
\end{aligned}
$$

We saw previously how Euler's number ($e$) arises from compound interest. The properties above make it very useful in a lot of calculus applications!
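One way to build trust in these rules is to compare them with a numerical approximation. The sketch below uses a central difference with a small step `h`; the helper `num_deriv()` and the choice `h = 1e-6` are assumptions made for illustration, not part of any package.

```{r}
# hypothetical helper: central-difference approximation of f'(x)
num_deriv <- function(f, x, h = 1e-6) {
  (f(x + h) - f(x - h)) / (2 * h)
}

# (e^x)' = e^x, checked at x = 1
c(numerical = num_deriv(exp, 1), rule = exp(1))

# (c^x)' = c^x * ln(c), checked for c = 2 at x = 3
c(numerical = num_deriv(function(x) 2^x, 3), rule = 2^3 * log(2))

# (ln(x))' = 1 / x, checked at x = 4
c(numerical = num_deriv(log, 4), rule = 1 / 4)
```

In each pair the two numbers should agree to several decimal places. Remember that `log()` in R is the natural logarithm.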

::: callout-note
#### Exercise

Compute the following:

1)  $\frac{d}{dx}(10e^x)$
2)  $\frac{d}{dx}(ln(x) - \frac{e^2}{3})$
:::

Now we'll get to a couple of more advanced (and powerful) rules.

* **Product rule:** $(f(x)g(x))'=f'(x)g(x) + g'(x)f(x)$

* **Quotient rule:** $\displaystyle(\frac{f(x)}{g(x)})' = \frac{f'(x)g(x) - g'(x)f(x)}{[g(x)]^2}$

* **Chain rule:** $(f(g(x)))' = f'(g(x)) \cdot g'(x)$

> Let's calculate $\frac{d}{dx}(3 \cdot ln(x) \cdot x^2)$ on the board.

> Let's compute $\frac{d}{dx}(e^{x^{2}})$ on the board.
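Both board examples follow directly from the rules above. For the first we use the coefficient and product rules, with $f(x) = \ln(x)$ and $g(x) = x^2$. For the second we use the chain rule, with outer function $e^u$ and inner function $u = x^2$:

$$
\begin{aligned}
\frac{d}{dx}(3 \cdot \ln(x) \cdot x^2) &= 3 \left( \frac{1}{x} \cdot x^2 + 2x \cdot \ln(x) \right) = 3x + 6x\ln(x) \\
\frac{d}{dx}(e^{x^{2}}) &= e^{x^2} \cdot 2x
\end{aligned}
$$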

::: callout-note
#### Exercise

Use the differentiation rules we have covered so far to calculate the derivatives of $y$ with respect to $x$ of the following functions:

1)  $x^3 \cdot x$
2)  $e^x \cdot x^2$
3)  $(3x^4-8)^2$
:::

### Higher-order derivatives

We saw how politicians can refer to higher-order derivatives. To compute them, you simply "pass the outputs," starting from the lowest order and going up.

The second derivative tells us whether the slope of a function is increasing, decreasing, or staying the same at any point $x$ on the function's domain. For example, when driving a car:

-   $f(x)$ = distance traveled at time $x$
-   $f'(x)$ = speed at time $x$
-   $f''(x)$ = acceleration at time $x$

Let's compute the following second derivative:

$$f''(x) = \frac{d^2}{dx^2}(x^4), \quad \text{where } f(x) = x^4$$

-   First, we take the first derivative: $f'(x)=(x^4)'=4x^3$

-   Then we use that output to take the second derivative: $f''(x)=(4x^3)'=12x^2$

-   We can keep going... for example, the third derivative: $$f'''(x) = (12x^2)' = 24x$$
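We can check this chain of derivatives in R. Base R's `D()` function performs symbolic differentiation of simple expressions, so applying it repeatedly "passes the outputs" just as above. This is only a sketch; the object names are illustrative.

```{r}
f0 <- quote(x^4)   # the original function
f1 <- D(f0, "x")   # first derivative
f2 <- D(f1, "x")   # second derivative
f3 <- D(f2, "x")   # third derivative

# D() does not fully simplify its output, so we also evaluate each
# derivative at x = 2 to compare with 4x^3, 12x^2, and 24x
f2
c(first  = eval(f1, list(x = 2)),
  second = eval(f2, list(x = 2)),
  third  = eval(f3, list(x = 2)))
```

At $x = 2$ the hand-computed derivatives give $4 \cdot 2^3 = 32$, $12 \cdot 2^2 = 48$, and $24 \cdot 2 = 48$, which the evaluated expressions should match.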

::: callout-note
#### Exercise

Compute the following:

1)  $\frac{d^3}{dx^3}(x^5)$
2)  $\frac{d^2}{dx^2}(4x^{3/2})$
3)  $\frac{d^2}{dx^2}(4 \cdot \ln(x))$
:::

### Partial derivatives

For a function $f(x,z)$, we might want to know how the function changes with respect to $x$. We call this a *partial derivative*:

$$
\frac{\partial}{\partial x}f(x,z) = \frac{\partial y}{\partial x} = \partial_x f
$$

To obtain a partial derivative, we treat all other variables as constants and take the derivative with respect to the variable of interest (here $x$). For example:

$$
\begin{aligned}
y = f(x,z) &= xz \\
\frac{\partial y}{\partial x} &= z
\end{aligned}
$$

What is $\displaystyle\frac{\partial y}{\partial z}?$

Let's solve $\displaystyle\frac{\partial (x^2y+xy^2-x)}{\partial x}$ and $\displaystyle\frac{\partial (x^2y+xy^2-x)}{\partial y}$ on the board.
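For reference, here is how the two board examples work out. In each line we treat the other variable as a constant:

$$
\begin{aligned}
\frac{\partial (x^2y+xy^2-x)}{\partial x} &= 2xy + y^2 - 1 \\
\frac{\partial (x^2y+xy^2-x)}{\partial y} &= x^2 + 2xy
\end{aligned}
$$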

::: callout-tip
#### Example

Let's say that $y$ is how much I like a movie, $d$ is how many dogs a movie has, and $e$ is how many explosions a movie has. I claim that how much I like a movie can be expressed by a function of the type $y = f(d, e)$. Evaluate the following situations:

1.  I like dogs and I don't care about action. So I believe that the true relationship is $y = f(d, e) = 3 \cdot d$. What is $\displaystyle\frac{\partial y}{\partial d}$, and how can we interpret it?

2.  I like dogs and I like action. So I believe that the true relationship is $y = f(d, e) = 3 \cdot d + 1 \cdot e$. What is $\displaystyle\frac{\partial y}{\partial d}$, and how can we interpret it?

3.  I like dogs and I like action. *But I definitely don't like them together*: I don't want the dogs to be in danger! So I believe that the true relationship is $y = f(d, e) = 3 \cdot d + 1 \cdot e -10 \cdot d \cdot e$. What is $\displaystyle\frac{\partial y}{\partial d}$, and how can we interpret it?
:::
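As a sketch of how the third situation might be worked out, treat $e$ as a constant and differentiate with respect to $d$:

$$
\frac{\partial y}{\partial d} = \frac{\partial}{\partial d}(3 \cdot d + 1 \cdot e - 10 \cdot d \cdot e) = 3 - 10 \cdot e
$$

So the effect of one more dog is no longer a fixed number: it depends on how many explosions the movie has. With no explosions ($e = 0$) each extra dog adds 3 to $y$, while with one explosion ($e = 1$) each extra dog changes $y$ by $3 - 10 = -7$.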

::: callout-note
#### Exercise

Take the partial derivative with respect to $x$ and with respect to $z$ of the following functions. What would the notation for each look like?

1)  $y = 3xz - x$
2)  $x^3+z^3+x^4z^4$
3)  $e^{xz}$
:::

### Differentiability of functions

Not all functions are differentiable at every point of their domains!

An important concept here is whether functions are **continuous** at a point:

-   Informally: A function is continuous at a point if its graph has no holes or breaks at that point

-   Formally: A function is continuous at a point $a$ if: $\lim_{x \to a} f(x)=f(a)$

When is a function **differentiable** at a point?

-   If a function is differentiable at a point, it is also continuous at that point.

-   If a function is continuous at a point, it is *not* necessarily differentiable at that point.

    -   The derivative does not exist at sharp turns, cusps, or vertical tangents.

```{r, message=FALSE, warning=FALSE}
ggplot() +
  stat_function(fun = function(x){abs(x) + 2}, xlim = c(-4, 4),
                aes(color = "y = |x| + 2")) +
  stat_function(fun = function(x){sqrt(abs(x)) + 1}, xlim = c(-4, 4),
                aes(color = "y = √(|x|) + 1")) +
  stat_function(fun = function(x){sign(x) * abs(x)^(1 / 3)}, xlim = c(-4, 4),
                aes(color = "y = ₃√x")) +
  scale_colour_manual("Function", values = c("red", "blue", "black")) +
  labs(title = "Examples of functions that are not differentiable at x=0")
```

> Informally, functions need to be continuous and reasonably smooth to be differentiable.
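A numerical way to see the problem at a sharp turn is to compute the slope formula from the left and from the right of the point. The sketch below does this for $y = |x| + 2$ at $x = 0$; the step size `h` is an arbitrary small number chosen for illustration.

```{r}
f <- function(x) abs(x) + 2
h <- 1e-6  # arbitrary small step

slope_left  <- (f(0) - f(0 - h)) / h  # approaching 0 from the left
slope_right <- (f(0 + h) - f(0)) / h  # approaching 0 from the right

c(left = slope_left, right = slope_right)
```

The two one-sided slopes come out at roughly $-1$ and $1$. Since they disagree, the limit that defines the derivative does not exist at $x = 0$, even though the function is continuous there.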

### How do computers calculate derivatives?

In quite a few statistics and machine learning problems, computers need to compute derivatives of arbitrarily complex functions, perhaps millions of times. How do they do it? (see [Baydin et al. 2018](https://dl.acm.org/doi/abs/10.5555/3122009.3242010) for discussion of these three approaches)

-   Symbolic differentiation: automatically combine the rules of differentiation (power rule, product rule, etc.). It is what math solvers use, e.g., [WolframAlpha](https://www.wolframalpha.com/calculators/derivative-calculator/) or (presumably) [Symbolab](https://www.symbolab.com/solver/derivative-calculator).

-   Numerical differentiation: infer the derivative by computing the function at different sample values (like we did with $y=x^2$ before). This is what, for example, R's `optim()` function does behind the scenes.

```{r}
# minimize the x^2 + 5 function:
optim(par = 0, fn = function(x){x ^ 2 + 5}, method = "L-BFGS-B")
```

-   Automatic differentiation: track how every function is constructed from (differentiable) elementary computer operations (e.g., binary arithmetic), and get the result using the chain rule. Implemented in the [`torch`](https://torch.mlverse.org/) R package, the [`TensorFlow`](https://www.tensorflow.org/), [`PyTorch`](https://pytorch.org/), and [`JAX`](https://jax.readthedocs.io/en/) Python libraries, and the [`ReverseDiff.jl`](https://juliadiff.org/ReverseDiff.jl/) and [`Zygote.jl`](https://fluxml.ai/Zygote.jl/) Julia packages.

![An example of computing the gradient of an esoteric function using Zygote.jl (from [its documentation](https://fluxml.ai/Zygote.jl/stable/#Taking-Gradients-1))](images/julia_zygote.png)
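To connect this back to `optim()`: when no gradient is supplied, `optim()` approximates it by finite differences (numerical differentiation), but we can also pass the derivative ourselves through the `gr` argument. A minimal sketch follows, reusing the $x^2 + 5$ function from above. The starting value `par = 3` and the object names are arbitrary choices for illustration.

```{r}
f <- function(x) x^2 + 5     # function to minimize
f_grad <- function(x) 2 * x  # its derivative, from the power rule

# gradient approximated numerically by optim()
fit_numeric <- optim(par = 3, fn = f, method = "L-BFGS-B")

# gradient supplied analytically
fit_analytic <- optim(par = 3, fn = f, gr = f_grad, method = "L-BFGS-B")

c(numeric = fit_numeric$par, analytic = fit_analytic$par)
```

Both runs should land very close to the true minimum at $x = 0$, where the function equals 5.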

## Optimization

*Optimization* allows us to find the minimum or maximum values (or *extrema*) a function takes. It has many applications in the social sciences:

-   Formal theory: utility maximization, continuous choices

-   Ordinary Least Squares (OLS): Focuses on *minimizing* the squared errors between observed data and model-estimated values

-   Maximum Likelihood Estimation (MLE): Focuses on *maximizing* a likelihood function, given observed values.

### Extrema

On **extrema**: informally, a maximum is just the highest value a function takes, and a minimum is the lowest value.

In some situations, it can be easy to identify extrema intuitively by looking at a graph of the function.

- Maxima are high points ("peaks")

- Minima are low points ("valleys")

We can use derivatives (rates of change!) to get at extrema.

### Critical points and the First-Order Condition

At critical points (or stationary points), the derivative is zero or fails to exist. At these, the function has *usually* reached a (local) maximum or minimum.

- At a maximum, the function must be increasing before the point and decreasing after it.

- At a minimum, the function must be decreasing before the point and increasing after it.

::: callout-warning
Local extrema occur at critical points, but not all critical points are extrema. For instance, sometimes the graph is changing between concave and convex ("inflection points"). Or sometimes the function is not differentiable at that point for other reasons.
:::

We can find the local maxima and/or minima of a function by taking the derivative, setting it equal to zero, and solving for $x$ (or whatever variable). This gives us the First-Order Condition (FOC).

$$FOC: f'(x)=0$$

### Second-Order Condition

Notice that after this we only know that there is a critical point. **BUT** we don't know if we've found a maximum or minimum, or even if we've found an extremum.

To determine whether we are seeing a (local) maximum or minimum, we can use the **Second Derivative Test**:

-   Start by identifying $f''(x)$

-   Substitute in the stationary points $(x^*)$ identified from the FOC.

    -   If $f''(x^*) > 0$, we have a local minimum

    -   If $f''(x^*) < 0$, we have a local maximum

    -   If $f''(x^*) = 0$, the test is inconclusive: we (may) have an inflection point and need to calculate higher-order derivatives (don't worry about this now)

Collectively these give us the **Second-Order Condition (SOC)**.

Let's do this procedure and obtain the FOC and SOC for $\displaystyle y=\frac{1}{2} x^3 + 3 x^2 - 2$ on the board. What do we learn? Compare this with the plot of the function on [Desmos](https://www.desmos.com/calculator).
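One way to write out the steps for this example:

$$
\begin{aligned}
f(x) &= \frac{1}{2} x^3 + 3 x^2 - 2 \\
\text{FOC: } f'(x) &= \frac{3}{2} x^2 + 6x = x \left(\frac{3}{2}x + 6\right) = 0 \;\Rightarrow\; x^* = 0 \text{ or } x^* = -4 \\
\text{SOC: } f''(x) &= 3x + 6 \\
f''(0) &= 6 > 0 \;\Rightarrow\; \text{local minimum at } x = 0, \text{ where } f(0) = -2 \\
f''(-4) &= -6 < 0 \;\Rightarrow\; \text{local maximum at } x = -4, \text{ where } f(-4) = 14
\end{aligned}
$$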

### Local or global extrema?

Now when it comes to knowing whether extrema are local or global:

-   Here we use the **Extreme value theorem**, which states that if a real-valued function is continuous on a closed and bounded (i.e., finite) interval, the function must attain a global maximum and a global minimum on that interval. Importantly, in this situation the global extrema exist, and **they are either at the local extrema or at the boundaries** (where the derivative need not be zero, so the FOC will not find them).

-   So to find the minimum/maximum on some interval, compare the local min/max to the value of the function at the interval's endpoints. If the interval is unbounded, e.g., $(-\infty, +\infty)$, check the function's limits as $x$ approaches $-\infty$ and $+\infty$ instead; in that case global extrema may not exist.

Let's try this last step for our example above, $\displaystyle y=\frac{1}{2} x^3 + 3 x^2 - 2$, to get the global extrema in the entire domain.
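For this example the interval is the whole real line, so instead of endpoints we look at the limits. The cubic term dominates for large $|x|$:

$$
\lim_{x \to -\infty} \left(\frac{1}{2} x^3 + 3 x^2 - 2\right) = -\infty, \qquad \lim_{x \to +\infty} \left(\frac{1}{2} x^3 + 3 x^2 - 2\right) = +\infty
$$

Because the function is unbounded in both directions, it has no global maximum or minimum on $(-\infty, +\infty)$; any critical points the FOC finds can only be local extrema.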

::: callout-note
#### Exercise

Identify the global extrema of the function $\displaystyle \frac{x^3}{3} - \frac{3}{2}x^2 -10x$ in the interval $[-6, 6]$.
:::

## Integrals

Informally, we can think of integrals as the flip side of derivatives.

We can motivate integrals as a way of finding the area under a curve. Sometimes finding the area is easy. What's the area under the curve between $x=-1$ and $x=1$ for this function?

$$
f(x) =
\begin{cases}
\frac{1}{3} & \text{for } x \in [0, 3] \\
0 & \text{otherwise}
\end{cases}
$$

Normally, finding the area under a curve is much harder. But this is basically the question behind integration.

### Integrals are about infinitesimals too

Let's say we have a function $y = x^2$, and we want to find the area under the curve from $x=0$ to $x=1$. How would we do this?

```{r}
ggplot() +
  # draw main function
  stat_function(fun = function(x){x ^ 2}, xlim = c(-2, 2)) +
  # fill area under the curve between x = 0 and x = 1
  geom_area(mapping = aes(x = 0), stat = "function",
            fun = function(x){x ^ 2}, xlim = c(0, 1), fill = "red")
```

One way to approximate this area is by drawing narrow rectangles that cover the area in red. Let's draw this on the board.

Our approximation is rough, but it gets better and better the narrower the rectangles are:

$$
\text{Area} = \lim_{\Delta x \to 0}\sum_{i=1}^{n}{f(x_i) \cdot \Delta x},
$$

where $\Delta x$ is the width of the rectangles, $n$ is their number, and $x_i$ is a point on the base of the $i$-th rectangle.

This is actually one way to define the **definite integral**, $\displaystyle\int_a^bf(x)dx$ (also known as the Riemann integral). We'll learn how to compute these in a few moments.
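The rectangle idea is easy to try in R. The sketch below uses rectangles whose height is the function's value at their left edge (a "left Riemann sum"); that choice, and the helper name `riemann_left()`, are assumptions made for illustration.

```{r}
f <- function(x) x^2

# hypothetical helper: total area of n left-edge rectangles between a and b
riemann_left <- function(f, a, b, n) {
  delta_x <- (b - a) / n                    # width of each rectangle
  x_left <- a + (seq_len(n) - 1) * delta_x  # left edge of each rectangle
  sum(f(x_left) * delta_x)
}

# the approximation changes as the rectangles get narrower
sapply(c(10, 100, 1000), function(n) riemann_left(f, a = 0, b = 1, n = n))
```

As `n` grows, the sums should settle toward a single number, which is the definite integral.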

### Indefinite integrals as antiderivatives

The **indefinite integral**, also known as the **antiderivative**, $F(x)$ reverses differentiation: it is a function whose derivative is $f(x)$. $$F(x)= \displaystyle\int f(x) \text{ } dx$$

This means if you take the derivative of $F(x)$, you wind up back at $f(x)$. $$F' = f \text{ or } \displaystyle\frac{dF(x)}{dx} = f(x)$$

For example, what is the antiderivative for a constant function $f(x) = 1$? Is there just one? (this example comes from [Moore and Siegel, 2013](https://press.princeton.edu/books/paperback/9780691159171/a-mathematics-course-for-political-and-social-research), p. 137).

This process is called *anti-differentiation*. We can use this concept to help us solve definite integrals!

### Solving definite integrals

One way to calculate definite integrals, known as the "fundamental theorem of calculus," is shown below:

$$\displaystyle\int_{a}^{b} f(x) \text{ } dx = F(b)-F(a) = F(x)\bigg|_{a}^{b}$$

First we find the antiderivative (indefinite integral) of $f(x)$, which we denote $F(x)$. Then we evaluate it at the upper limit $b$ and at the lower limit $a$, and subtract the second result from the first.

::: callout-warning
$C$ in the following definitions and rules is called the "constant of integration." We need to add it when we define *all* antiderivatives (integrals) of a function because the anti-derivative "undoes" the derivative.

Remember that the derivative of any constant is zero. So if we find an integral $F(x)$ whose derivative is $f(x)$, adding (or subtracting) any constant will give us another integral $F(x)+C$ whose derivative is *also* $f(x)$.
:::

### Rules of integration

Many of the rules of integration have counterparts in differentiation.

**Coefficient rule:** $\displaystyle\int c f(x)\,dx = c \int f(x)\,dx$

**Sum/difference rule:** $\displaystyle\int (f(x) \pm g(x))\,dx = \int f(x)\,dx \pm \int g(x)\,dx$

**Constant rule:** $\displaystyle\int c\,dx = cx + C$

**Power rule:** $\displaystyle\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \qquad \forall n \neq -1$

**Reciprocal rule:** $\displaystyle\int \frac{1}{x}\,dx = \ln|x| + C$

**Exponent and logarithm rules:**

$$
\begin{aligned}
\displaystyle \int e^x \,dx &= e^x+C \\
\displaystyle \int c^x \,dx &= \frac{c^x}{\ln(c)}+C\\
\\
\displaystyle \int \ln(x) \,dx &= x \cdot \ln(x) - x+C \\
\displaystyle \int \log_c(x) \,dx &= \frac{x \cdot \ln(x) - x}{\ln(c)}+C
\end{aligned}
$$

The next two rules are analogous to the product rule and the chain rule, respectively:

**Integration by parts:** $\displaystyle \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx$
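As an illustration, integration by parts recovers the rule for $\int \ln(x)\,dx$ listed above. Take $f(x) = \ln(x)$ and $g'(x) = 1$, so that $f'(x) = \frac{1}{x}$ and $g(x) = x$:

$$
\int \ln(x) \cdot 1 \,dx = \ln(x) \cdot x - \int \frac{1}{x} \cdot x \,dx = x \cdot \ln(x) - x + C
$$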

**Integration by substitution:**

$$
\begin{aligned}
1.& \text{ Have }\displaystyle \int f(g(x))g'(x)\,dx \\
2.& \text{ Set } u=g(x) \text{, so that } du = g'(x)\,dx \\
3.& \text{ Compute } \int f(u)\,du \\
4.& \text{ Replace } u \text{ with } g(x)
\end{aligned}
$$

Let's do an example on the board: $\displaystyle \int e^{x^2} 2x  \,dx$.
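Following the four steps for this example, with $g(x) = x^2$ and $f(u) = e^u$:

$$
\begin{aligned}
u &= x^2, \qquad du = 2x\,dx \\
\int e^{x^2} 2x \,dx &= \int e^u \,du = e^u + C = e^{x^2} + C
\end{aligned}
$$

We can check the answer by differentiating: the chain rule gives $(e^{x^2})' = e^{x^2} \cdot 2x$.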

### Solving the problem

Remember our function $y=x^2$ and our goal of finding the area under the curve from $x=0$ to $x=1$. We can describe this problem as $\displaystyle \int_0^1 x^2 dx$.

Find the indefinite integral, $F(x)$:

$$
\displaystyle\int x^2 \text{ } dx = \displaystyle\frac{x^3}{3}+C
$$

Now we'll use the fundamental theorem of calculus. Evaluate the antiderivative at the lower and upper limits, $F(0)$ and $F(1)$:

- $F(0) = 0$

- $F(1) = \displaystyle\frac{1}{3}$

- Technically $0 + C$ and $\displaystyle\frac{1}{3} + C$, but the C's will fall out in the next step

Calculate $F(1) - F(0)$: $$\displaystyle\frac{1}{3} - 0 = \displaystyle\frac{1}{3}$$
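We can double-check this in R. The sketch below evaluates the antiderivative by hand and compares it with base R's `integrate()`, which approximates definite integrals numerically. The function names `f` and `F_anti` are illustrative.

```{r}
f <- function(x) x^2           # the integrand
F_anti <- function(x) x^3 / 3  # an antiderivative (taking C = 0)

# fundamental theorem of calculus: F(b) - F(a)
F_anti(1) - F_anti(0)

# numerical integration from 0 to 1
integrate(f, lower = 0, upper = 1)
```

Both approaches should report an area of about one third.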

::: callout-note
#### Exercise

Solve the following indefinite integrals:

1.  $\int x^2 \text{ } dx$

2.  $\int 3x^2\text{ } dx$

3.  $\int x\text{ } dx$

4.  $\int (3x^2 + 2x - 7\text{ })dx$

5.  $\int \dfrac{2}{x}\text{ }dx$

And solve the following definite integrals:

1.  $\displaystyle\int_{1}^{7} x^2 \text{ } dx$

2.  $\displaystyle\int_{1}^{10} 3x^2 \text{ } dx$

3.  $\displaystyle\int_7^7 x\text{ } dx$

4.  $\displaystyle\int_{1}^{5} (3x^2 + 2x - 7)\text{ }dx$

5.  $\displaystyle\int_{1}^{e} \dfrac{2}{x}\text{ }dx$
:::