<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<!-- 2023-07-15 Sat 18:07 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>PyTorch Examples</title>
<meta name="author" content="ape" />
<meta name="generator" content="Org Mode" />
<link rel="stylesheet" type="text/css" href="https://fniessen.github.io/org-html-themes/src/readtheorg_theme/css/htmlize.css"/>
<link rel="stylesheet" type="text/css" href="https://fniessen.github.io/org-html-themes/src/readtheorg_theme/css/readtheorg.css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="https://fniessen.github.io/org-html-themes/src/lib/js/jquery.stickytableheaders.min.js"></script>
<script type="text/javascript" src="https://fniessen.github.io/org-html-themes/src/readtheorg_theme/js/readtheorg.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
displayAlign: "center",
displayIndent: "0em",
"HTML-CSS": { scale: 100,
linebreaks: { automatic: "false" },
webFont: "TeX"
},
SVG: {scale: 100,
linebreaks: { automatic: "false" },
font: "TeX"},
NativeMML: {scale: 100},
TeX: { equationNumbers: {autoNumber: "AMS"},
MultLineWidth: "85%",
TagSide: "right",
TagIndent: ".8em"
}
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js?config=TeX-AMS_HTML"></script>
</head>
<body>
<div id="content" class="content">
<h1 class="title">PyTorch Examples</h1>
<div id="table-of-contents" role="doc-toc">
<h2>Table of Contents</h2>
<div id="text-table-of-contents" role="doc-toc">
<ul>
<li><a href="#org3a62d9f">Linear Regression</a>
<ul>
<li><a href="#org4ed70ed">Prediction equation</a></li>
<li><a href="#orgb41e8f3">Optimizing the Weights with Gradient Descent</a></li>
<li><a href="#org5365715">Choice of Loss function</a></li>
<li><a href="#org0f893bc">Examples in PyTorch</a></li>
</ul>
</li>
<li><a href="#org7716503">Logistic Regression for Linear Classification</a>
<ul>
<li><a href="#orgd56c3f0">Example: Logistic Regression in 1D</a></li>
</ul>
</li>
<li><a href="#org3c991b5">Neural Networks for Classification</a>
<ul>
<li><a href="#orgcf9974f">Single-Linear-Layer Softmax Classifier</a></li>
<li><a href="#org20a0785">Multi-Layer Perceptron with Embedding</a></li>
</ul>
</li>
<li><a href="#org0cb365f">Deep Neural Network</a></li>
<li><a href="#org353f393">Convolutional Neural Network</a></li>
<li><a href="#orgc459def">Notes</a>
<ul>
<li><a href="#org150363b">argmax example:</a></li>
<li><a href="#org4d29278">Definitions</a></li>
<li><a href="#org23a6850">PyTorch Modules</a></li>
<li><a href="#orgdd265e2">Basic outline of a script</a></li>
<li><a href="#orge67a4b9">Tensors</a></li>
</ul>
</li>
</ul>
</div>
</div>
<p>
A repository of machine learning notes and Python scripts demonstrating PyTorch in simple scenarios. Each section begins with a brief outline of the learning method, followed by a discussion of the accompanying Python scripts. The scripts integrate progressively more of PyTorch's library.
</p>
<p>
GitHub does not render my equations very well. For a better experience, view the web version of this README: <a href="https://raw.githack.com/JohnNehls/PyTorchExamples/main/README.html">LINK HERE</a>.
</p>
<div id="outline-container-org3a62d9f" class="outline-2">
<h2 id="org3a62d9f">Linear Regression</h2>
<div class="outline-text-2" id="text-org3a62d9f">
<p>
The ubiquitous linear regression is deployed when we can stomach the assumption that the relationship between the features and the target is approximately linear and that the noise is "well-behaved", meaning it follows a normal distribution.
</p>
<p>
Minimizing the mean squared error is equivalent to maximum likelihood estimation of a
linear model under the assumption of additive Gaussian noise. [D2L 3.1.3]
</p>
</div>
<div id="outline-container-org4ed70ed" class="outline-3">
<h3 id="org4ed70ed">Prediction equation</h3>
<div class="outline-text-3" id="text-org4ed70ed">
<p>
Linear Regression is used to predict a numerical value, \(\hat{y}\), from observations, \(X = \{x_1, x_2, \cdots, x_{n}\}\), with the equation \[\hat{y}(X) = \sum_{i=0}^{n} w_i f_i (X)\] where the name <i>linear</i> (in linear regression) comes from the fact that the equation is linear w.r.t. the weights, \(w_i\).
</p>
<p>
The combination of function and data point \(f_i (X)\) is called a <b>feature</b>. Features make up the <i>basis</i> for the regression, and thus should be unique. <b>The features are up to us; we must only keep the prediction equation linear w.r.t. the weights.</b>
</p>
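<p>
As an illustration, a minimal sketch in plain Python (the feature functions and weights below are made up for the example): each feature \(f_i(X)\) may be an arbitrary function of the observations, and the prediction is simply their weighted sum.
</p>

```python
import math

def predict(weights, features, X):
    # y_hat = sum_i w_i * f_i(X); the model is linear in the weights only
    return sum(w * f(X) for w, f in zip(weights, features))

# hypothetical features built from a two-component observation X = (x0, x1)
features = [
    lambda X: 1.0,             # constant feature: its weight acts as a bias
    lambda X: X[0],            # a raw observation
    lambda X: X[0] * X[1],     # an interaction of two observations
    lambda X: math.sin(X[1]),  # a nonlinear transform is still "linear regression"
]
weights = [2.0, 0.5, -1.0, 3.0]

y_hat = predict(weights, features, (2.0, 0.0))  # 2.0*1 + 0.5*2.0 + 0 + 0 = 3.0
```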
</div>
<div id="outline-container-org0145e13" class="outline-4">
<h4 id="org0145e13">Examples</h4>
<div class="outline-text-4" id="text-org0145e13">
<p>
We can have a linear combination of the observations,
\[\hat{y} = w_0 x_0 + w_1 x_1 + w_2 x_2 + w_3 x_3\]
or a linear combination of polynomials on one data point,
\[\hat{y} = w_0 + w_1 x + w_2 x^2 + w_3 x^3\],
or even a linear combination of special functions of mixed data points, \[\hat{y} = w_0 \left( x_0 + x_3 \right) + w_1 \arcsin(x_0) + w_2 \tanh(x_1^2) + w_3 x_0^5\].
</p>
</div>
</div>
<div id="outline-container-orgc5f4930" class="outline-4">
<h4 id="orgc5f4930">Choosing the features</h4>
<div class="outline-text-4" id="text-orgc5f4930">
<p>
In the linear regression paradigm, the features are chosen before the optimization. Thus, the guidance is to investigate the data and try to find important features; how to do this is beyond the scope of these notes. <i>footnote:</i> In deep learning the features themselves are also learned in the optimization process.
</p>
</div>
</div>
<div id="outline-container-org164dc7b" class="outline-4">
<h4 id="org164dc7b">Canonical formulation</h4>
<div class="outline-text-4" id="text-org164dc7b">
<p>
The way this prediction equation is usually expressed is that each \(x\) itself is a feature, so the prediction equation can be written with linear algebra,
\[\begin{align}
\hat{y}(X) &=& w_0 + \sum_{i=1}^{n} w_i x_i \\
&=& w_0 + \textbf{w}^T \textbf{x} \\
&=& \textbf{X} \textbf{w} + w_0
\end{align}\]
which clearly shows the linear algebra under the hood at the expense of leaving out the idea of how the features are formed.
</p>
<p>
I chose to present it as I did above as it more clearly shows, given a set of data, what we are able to do to come up with an effective model; namely, combine and transform the measurements as we wish.
</p>
<p>
Regardless of preference, the following material is agnostic to the expression of the prediction equation.
</p>
</div>
<ul class="org-ul">
<li><a id="org3ef584a"></a>including combinations of features<br />
<div class="outline-text-5" id="text-org3ef584a">
<p>
There is also a canonical formulation including the set of pairwise features \[\psi_{i,j} = x_i \times x_j\] where we ignore the repeated pairs.
</p>
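<p>
A quick sketch of generating these pairwise features in plain Python (illustrative only): <code>itertools.combinations_with_replacement</code> enumerates each unordered pair once, so the duplicate \(\psi_{j,i}\) is never produced.
</p>

```python
from itertools import combinations_with_replacement

def pairwise_features(x):
    # psi_{i,j} = x_i * x_j, taking each unordered pair (i, j) only once
    return [x[i] * x[j]
            for i, j in combinations_with_replacement(range(len(x)), 2)]

feats = pairwise_features([1.0, 2.0, 3.0])
# pairs (0,0) (0,1) (0,2) (1,1) (1,2) (2,2) -> [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```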
</div>
</li>
</ul>
</div>
</div>
<div id="outline-container-orgb41e8f3" class="outline-3">
<h3 id="orgb41e8f3">Optimizing the Weights with Gradient Descent</h3>
<div class="outline-text-3" id="text-orgb41e8f3">
<p>
The goal is to have the predictions, \(\hat{y}(X)\), be accurate. This is done by having, hopefully, lots of data,
\[\{ y^0, X^0\},\{y^1, X^1\}, \cdots, \{y^{N-1}, X^{N-1}\}\]
which now includes the observed values of the parameter we hope to predict, \(y\), to compare to.
We do this by minimizing the loss,
\[loss = \frac{1}{N}\sum_{i=0}^{N-1}\left( y^i - \hat{y}(X^i) \right)^2\],
where the only free parameters are the weights in \(\hat{y}\).
</p>
<p>
<i>footnote:</i> In a linear algebra context, we would consider the full system as a matrix multiplication, notionally written \[\bf{\hat{y}} = \textbf{X}\textbf{w},\] and be concerned with whether it is over- or under-determined and how to <i>regularize</i>. In these notes we expect the system to be overdetermined (there are more data examples than features).
</p>
<p>
In the machine learning context, we calculate the gradient of the loss w.r.t. the weights, \(\nabla loss\), and take a <i>step</i> in that direction by updating the weights with
\[w^{new} = w^{old} - n \nabla loss\]
where \(n\) is a scalar called the <b>learning rate</b>.
</p>
<p>
This is done iteratively, either for a set number of iterations or until the loss reaches a desired threshold. It must be said that this scheme is a <span class="underline">local</span> minimum finder; it will find a local minimum, but this is not necessarily the lowest possible cost, and the minimum found depends greatly on the starting set of weights.
</p>
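<p>
The update rule can be sketched in a few lines of plain Python (synthetic data and a hand-picked learning rate, for illustration only), fitting a single weight \(w\) with no bias:
</p>

```python
# gradient descent on loss(w) = (1/N) * sum_i (y_i - w*x_i)^2
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with the true weight w = 2

w = 0.0    # the starting weight matters: it determines which minimum is found
lr = 0.05  # the learning rate n

for _ in range(100):
    # d(loss)/dw = (1/N) * sum_i 2 * (w*x_i - y_i) * x_i
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w = w - lr * grad  # step against the gradient to descend the loss

# w has converged to (approximately) 2.0
```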
</div>
</div>
<div id="outline-container-org5365715" class="outline-3">
<h3 id="org5365715">Choice of Loss function</h3>
<div class="outline-text-3" id="text-org5365715">
<p>
It should be obvious that the choice of loss function greatly dictates the optimized weights. This is often stated as the crux of machine learning. At this point, we will only point out how it plays out in the example of Linear Regression via a small generalization of our previous loss function,
</p>
<p>
\[loss(m) = \frac{1}{N} \sum_{i=0}^{N-1} \left( | y^i - \hat{y}(X^i) | \right)^m\]
</p>
<p>
where the larger the value of \(m\), the more large outliers in the prediction error \(|y - \hat{y}|\) contribute to the loss and, thus, the higher their priority for minimization during the optimization.
</p>
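<p>
A small numerical illustration (made-up residuals): with one outlier among small errors, the outlier's share of the total loss grows quickly with \(m\).
</p>

```python
def loss_m(errors, m):
    # loss(m) = (1/N) * sum_i |error_i| ** m
    return sum(abs(e) ** m for e in errors) / len(errors)

errors = [0.1, 0.1, 0.1, 5.0]  # three small residuals and one outlier

# the outlier's fraction of the total loss for m = 1 and m = 2
share_m1 = 5.0 ** 1 / (len(errors) * loss_m(errors, 1))  # about 0.94
share_m2 = 5.0 ** 2 / (len(errors) * loss_m(errors, 2))  # about 0.999
```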
</div>
</div>
<div id="outline-container-org0f893bc" class="outline-3">
<h3 id="org0f893bc">Examples in PyTorch</h3>
<div class="outline-text-3" id="text-org0f893bc">
<ul class="org-ul">
<li>The following section shows PyTorch's use of Gradient Descent to fit a line to a noisy data set.</li>
<li>The standard linear regression naming conventions are used: the input data \(x\) and \(y\), the fit parameter \(w\) and bias \(b\), and the predicted dependent value,
\[\hat{y} = w x +b.\]</li>
<li>Each regression is found using the <b>mean squared error</b> (MSE) cost function,
\[loss = \frac{1}{N} \sum ( y_i - \hat{y}_i)^2.\]</li>
<li>Each epoch moves the parameters such that the MSE\((\hat{y}, y)\) is minimized.
<ul class="org-ul">
<li>Note: the epoch-0 line in each figure displays the initial values of the parameters.</li>
</ul></li>
</ul>
</div>
<div id="outline-container-orgb5681c6" class="outline-4">
<h4 id="orgb5681c6">Simple Gradient Descent</h4>
<div class="outline-text-4" id="text-orgb5681c6">
<p>
<a href="./LR_noDatasetClass.py">Example script</a> using PyTorch for partial derivatives within a simple linear regression on a data set with normal noise added. This serves as a first step in using PyTorch, as it does not employ any of the other PyTorch features that are the subject of the following examples.
</p>
<p>
A simple gradient descent step via PyTorch's automatic differentiation:
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #98be65;"># computes the partial derivatives of the loss w.r.t. all of the contributing</span>
<span style="color: #98be65;"># tensors constructed with "requires_grad = True".</span>
loss.<span style="color: #458588; font-style: italic;">backward</span><span style="color: #458588;">()</span>
<span style="color: #98be65;">#gradient descent</span>
w.<span style="color: #83a598; font-style: italic;">data</span> <span style="color: #fb4933;">=</span> w.<span style="color: #d3869b; font-style: italic;">data</span> <span style="color: #fb4933;">-</span> lr<span style="color: #fb4933;">*</span>w.<span style="color: #d3869b; font-style: italic;">grad</span>.<span style="color: #d3869b; font-style: italic;">data</span>
b.<span style="color: #83a598; font-style: italic;">data</span> <span style="color: #fb4933;">=</span> b.<span style="color: #d3869b; font-style: italic;">data</span> <span style="color: #fb4933;">-</span> lr<span style="color: #fb4933;">*</span>b.<span style="color: #d3869b; font-style: italic;">grad</span>.<span style="color: #d3869b; font-style: italic;">data</span>
<span style="color: #98be65;">#must zero out the gradient otherwise PyTorch accumulates the gradient.</span>
w.<span style="color: #d3869b; font-style: italic;">grad</span>.<span style="color: #d3869b; font-style: italic;">data</span>.<span style="color: #458588; font-style: italic;">zero_</span><span style="color: #458588;">()</span>
b.<span style="color: #d3869b; font-style: italic;">grad</span>.<span style="color: #d3869b; font-style: italic;">data</span>.<span style="color: #458588; font-style: italic;">zero_</span><span style="color: #458588;">()</span>
</pre>
</div>
<div id="orge3ed341" class="figure">
<p><img src="./figs/LR_noDatasetClass.png" alt="LR_noDatasetClass.png" />
</p>
</div>
</div>
<ul class="org-ul">
<li><a id="org6c0af77"></a>Comments<br />
<div class="outline-text-5" id="text-org6c0af77">
<ul class="org-ul">
<li>The optimal learning rate is directly connected to how good the initial guess is and how noisy the data is.
<ul class="org-ul">
<li>If there is a very large loss (error) and a moderate learning rate, the step may be too large, leading to an even larger loss and thus an even larger step, and so on, until the loss is NaN.</li>
</ul></li>
<li>With a single learning rate, the slope was learned much faster than the bias.</li>
</ul>
</div>
</li>
</ul>
</div>
<div id="outline-container-org8628d6e" class="outline-4">
<h4 id="org8628d6e">Mini-Batch Gradient Descent using Dataset and DataLoader</h4>
<div class="outline-text-4" id="text-org8628d6e">
<p>
<a href="./LR_miniBatch_datasetDataLoader.py">Example script</a> using mini-batch gradient descent for linear regression, while also using PyTorch's Dataset and DataLoader features.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #fb4933;">class</span> <span style="color: #d3869b;">noisyLineData</span><span style="color: #458588;">(</span><span style="color: #d3869b;">Dataset</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__init__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">N</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">100</span>, <span style="color: #83a598;">slope</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">3</span>, <span style="color: #83a598;">intercept</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">2</span>, <span style="color: #83a598;">stdDev</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">100</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">x</span> <span style="color: #fb4933;">=</span> torch.<span style="color: #458588; font-style: italic;">linspace</span><span style="color: #458588;">(</span><span style="color: #fb4933;">-</span><span style="color: #d3869b;">100</span>,<span style="color: #d3869b;">100</span>,<span style="color: #d3869b;">N</span><span style="color: #458588;">)</span>
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">y</span> <span style="color: #fb4933;">=</span> slope<span style="color: #fb4933;">*</span><span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">x</span> <span style="color: #fb4933;">+</span> intercept <span style="color: #fb4933;">+</span> np.<span style="color: #d3869b; font-style: italic;">random</span>.<span style="color: #458588; font-style: italic;">normal</span><span style="color: #458588;">(</span><span style="color: #d3869b;">0</span>, stdDev, <span style="color: #d3869b;">N</span><span style="color: #458588;">)</span> <span style="color: #98be65;">#can use numpy for random</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__getitem__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">index</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">return</span> <span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">x</span><span style="color: #458588;">[</span>index<span style="color: #458588;">]</span>, <span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">y</span><span style="color: #458588;">[</span>index<span style="color: #458588;">]</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__len__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">return</span> <span style="color: #cd8500;">len</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">x</span><span style="color: #458588;">)</span>
<span style="color: #83a598;">data</span> <span style="color: #fb4933;">=</span> <span style="color: #458588;">noisyLineData</span><span style="color: #458588;">()</span>
<span style="color: #83a598;">trainloader</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">DataLoader</span><span style="color: #458588;">(</span><span style="color: #cd8500;">dataset</span> <span style="color: #fb4933;">=</span> data, <span style="color: #cd8500;">batch_size</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">20</span><span style="color: #458588;">)</span>
</pre>
</div>
<div id="org9ac1bf0" class="figure">
<p><img src="./figs/LR_miniBatch_datasetDataLoader.png" alt="LR_miniBatch_datasetDataLoader.png" />
</p>
</div>
</div>
<ul class="org-ul">
<li><a id="org08b9321"></a>Comments<br />
<div class="outline-text-5" id="text-org08b9321">
<ul class="org-ul">
<li>The <b>Dataset</b> and <b>DataLoader</b> concepts are simple and useful for abstracting out the data.
<ul class="org-ul">
<li>They will be particularly useful when the data is larger than we can hold in the machine's memory.</li>
</ul></li>
<li>With the same learning rates as for the full gradient descent, the mini-batch version often learned considerably faster per epoch than simple Gradient Descent.</li>
</ul>
</div>
</li>
</ul>
</div>
<div id="outline-container-org7c7be3c" class="outline-4">
<h4 id="org7c7be3c">Mini-Batch Gradient Descent the full PyTorch Way</h4>
<div class="outline-text-4" id="text-org7c7be3c">
<p>
<a href="./LR_miniBatch_PyTorchWay.py">Example script</a> of the same linear regression scenario, now using <code>nn.modules</code> for the model and the <code>optim</code> for optimization (the step):
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #fb4933;">class</span> linear_regression<span style="color: #458588;">(</span>nn.<span style="color: #d3869b; font-style: italic;">Module</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__init__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">input_size</span>, <span style="color: #83a598;">output_size</span><span style="color: #458588;">)</span>:
<span style="color: #98be65;">#call the super's constructor and use it without having to store it directly.</span>
<span style="color: #cd8500;">super</span><span style="color: #458588;">(</span>linear_regression, <span style="color: #fb4933;">self</span><span style="color: #458588;">)</span>.<span style="color: #458588; font-style: italic;">__init__</span><span style="color: #458588;">()</span>
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">linear</span> <span style="color: #fb4933;">=</span> nn.<span style="color: #d3869b; font-style: italic;">Linear</span><span style="color: #458588;">(</span>input_size, output_size<span style="color: #458588;">)</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">forward</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">x</span><span style="color: #458588;">)</span>:
<span style="color: #cdbe70;">"""</span><span style="color: #cdbe70;">Prediction</span><span style="color: #cdbe70;">"""</span>
<span style="color: #fb4933;">return</span> <span style="color: #fb4933;">self</span>.<span style="color: #458588; font-style: italic;">linear</span><span style="color: #458588;">(</span>x<span style="color: #458588;">)</span>
<span style="color: #83a598;">criterion</span> <span style="color: #fb4933;">=</span> nn.<span style="color: #d3869b; font-style: italic;">MSELoss</span><span style="color: #458588;">()</span>
<span style="color: #83a598;">model</span> <span style="color: #fb4933;">=</span> <span style="color: #458588;">linear_regression</span><span style="color: #458588;">(</span><span style="color: #d3869b;">1</span>,<span style="color: #d3869b;">1</span><span style="color: #458588;">)</span>
model.<span style="color: #458588; font-style: italic;">state_dict</span><span style="color: #458588;">()[</span><span style="color: #cdbe70;">'linear.weight'</span><span style="color: #458588;">][</span><span style="color: #d3869b;">0</span><span style="color: #458588;">]</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">0</span>
model.<span style="color: #458588; font-style: italic;">state_dict</span><span style="color: #458588;">()[</span><span style="color: #cdbe70;">'linear.bias'</span><span style="color: #458588;">][</span><span style="color: #d3869b;">0</span><span style="color: #458588;">]</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">0</span>
<span style="color: #83a598;">optimizer</span> <span style="color: #fb4933;">=</span> optim.<span style="color: #d3869b; font-style: italic;">SGD</span><span style="color: #458588;">(</span>model.<span style="color: #458588; font-style: italic;">parameters</span><span style="color: #b16286;">()</span>, <span style="color: #cd8500;">lr</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">1e-4</span><span style="color: #458588;">)</span>
</pre>
</div>
<div id="orge51928d" class="figure">
<p><img src="./figs/LR_miniBatch_PyTorchway.png" alt="LR_miniBatch_PyTorchway.png" />
</p>
</div>
</div>
</div>
<div id="outline-container-orga0dc610" class="outline-4">
<h4 id="orga0dc610">Comments</h4>
<div class="outline-text-4" id="text-orga0dc610">
<ul class="org-ul">
<li>The optimizer <code>optim.SGD</code> easily beats the hand-written mini-batch version per epoch.</li>
</ul>
</div>
</div>
</div>
</div>
<div id="outline-container-org7716503" class="outline-2">
<h2 id="org7716503">Logistic Regression for Linear Classification</h2>
<div class="outline-text-2" id="text-org7716503">
<ul class="org-ul">
<li>We map the output of a line/plane to [0,1] for classification. To do this, we use the sigmoid function,</li>
</ul>
<p>
\[\sigma(z) = \frac{1}{1+e^{-z}},\]
since a simple binary step function has a flat gradient (zero almost everywhere) and thus leads to slow learning.
</p>
<ul class="org-ul">
<li>As a prediction we use,</li>
</ul>
<p>
\[ \hat{y}= 1 \text{ if } \sigma(x) >0.5 \text{ else }\hat{y} =0.\]
</p>
<ul class="org-ul">
<li>We then use a new loss that reflects these predictions: <b>Binary Cross Entropy</b> loss.</li>
</ul>
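<p>
The pieces above can be sketched in plain Python (illustrative values; in the scripts, PyTorch's <code>nn.BCELoss</code> plays the role of the hand-written loss here):
</p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(z):
    # hard prediction: y_hat = 1 if sigma(z) > 0.5, else 0
    return 1 if sigmoid(z) > 0.5 else 0

def bce(p, y):
    # binary cross entropy for one example: predicted probability p, label y
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

p = sigmoid(2.0)             # roughly 0.88: fairly confident "class 1"
loss_if_right = bce(p, 1.0)  # small loss when the label agrees
loss_if_wrong = bce(p, 0.0)  # much larger loss when it does not
```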
</div>
<div id="outline-container-orgd56c3f0" class="outline-3">
<h3 id="orgd56c3f0">Example: Logistic Regression in 1D</h3>
<div class="outline-text-3" id="text-orgd56c3f0">
<p>
<a href="./LogReg_PyTorch.py">Example script</a>
Now we compose a linear model with the sigmoid function to find the line/plane/hyperplane between two classes, here labeled [0,1].
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #98be65;">#create noisy data</span>
<span style="color: #fb4933;">class</span> <span style="color: #d3869b;">NoisyBinaryData</span><span style="color: #458588;">(</span><span style="color: #d3869b;">Dataset</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__init__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">N</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">100</span>, <span style="color: #83a598;">x0</span><span style="color: #fb4933;">=-</span><span style="color: #d3869b;">3</span>, <span style="color: #83a598;">x1</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">5</span>, <span style="color: #83a598;">stdDev</span><span style="color: #fb4933;">=</span><span style="color: #d3869b;">2</span><span style="color: #458588;">)</span>:
<span style="color: #83a598;">xlist</span> <span style="color: #fb4933;">=</span> <span style="color: #458588;">[]</span>; <span style="color: #83a598;">ylist</span> <span style="color: #fb4933;">=</span> <span style="color: #458588;">[]</span>
<span style="color: #fb4933;">for</span> <span style="color: #83a598;">i</span> <span style="color: #fb4933;">in</span> <span style="color: #cd8500;">range</span><span style="color: #458588;">(</span><span style="color: #d3869b;">N</span><span style="color: #458588;">)</span>:
<span style="color: #98be65;">#class 0</span>
<span style="color: #fb4933;">if</span> np.<span style="color: #d3869b; font-style: italic;">random</span>.<span style="color: #458588; font-style: italic;">rand</span><span style="color: #458588;">()</span><span style="color: #fb4933;"><</span><span style="color: #d3869b;">0.5</span>:
xlist.<span style="color: #458588; font-style: italic;">append</span><span style="color: #458588;">(</span>np.<span style="color: #d3869b; font-style: italic;">random</span>.<span style="color: #458588; font-style: italic;">normal</span><span style="color: #b16286;">(</span>x0,stdDev<span style="color: #b16286;">)</span><span style="color: #458588;">)</span>
ylist.<span style="color: #458588; font-style: italic;">append</span><span style="color: #458588;">(</span><span style="color: #d3869b;">0.0</span><span style="color: #458588;">)</span>
<span style="color: #98be65;">#class 1</span>
<span style="color: #fb4933;">else</span>:
xlist.<span style="color: #458588; font-style: italic;">append</span><span style="color: #458588;">(</span>np.<span style="color: #d3869b; font-style: italic;">random</span>.<span style="color: #458588; font-style: italic;">normal</span><span style="color: #b16286;">(</span>x1,stdDev<span style="color: #b16286;">)</span><span style="color: #458588;">)</span>
ylist.<span style="color: #458588; font-style: italic;">append</span><span style="color: #458588;">(</span><span style="color: #d3869b;">1.0</span><span style="color: #458588;">)</span>
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">x</span> <span style="color: #fb4933;">=</span> torch.<span style="color: #458588; font-style: italic;">tensor</span><span style="color: #458588;">(</span>xlist<span style="color: #458588;">)</span>.<span style="color: #458588; font-style: italic;">view</span><span style="color: #458588;">(</span><span style="color: #fb4933;">-</span><span style="color: #d3869b;">1</span>,<span style="color: #d3869b;">1</span><span style="color: #458588;">)</span>
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">y</span> <span style="color: #fb4933;">=</span> torch.<span style="color: #458588; font-style: italic;">tensor</span><span style="color: #458588;">(</span>ylist<span style="color: #458588;">)</span>.<span style="color: #458588; font-style: italic;">view</span><span style="color: #458588;">(</span><span style="color: #fb4933;">-</span><span style="color: #d3869b;">1</span>,<span style="color: #d3869b;">1</span><span style="color: #458588;">)</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__getitem__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">index</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">return</span> <span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">x</span><span style="color: #458588;">[</span>index<span style="color: #458588;">]</span>, <span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">y</span><span style="color: #458588;">[</span>index<span style="color: #458588;">]</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__len__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">return</span> <span style="color: #cd8500;">len</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>.<span style="color: #d3869b; font-style: italic;">x</span><span style="color: #458588;">)</span>
np.<span style="color: #d3869b; font-style: italic;">random</span>.<span style="color: #458588; font-style: italic;">seed</span><span style="color: #458588;">(</span><span style="color: #d3869b;">0</span><span style="color: #458588;">)</span>
<span style="color: #83a598;">data</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">NoisyBinaryData</span><span style="color: #458588;">()</span>
<span style="color: #83a598;">trainloader</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">DataLoader</span><span style="color: #458588;">(</span><span style="color: #cd8500;">dataset</span> <span style="color: #fb4933;">=</span> data, <span style="color: #cd8500;">batch_size</span> <span style="color: #fb4933;">=</span> <span style="color: #d3869b;">20</span><span style="color: #458588;">)</span>
<span style="color: #98be65;"># create my "own" logistic regression model</span>
<span style="color: #fb4933;">class</span> logistic_regression<span style="color: #458588;">(</span>nn.<span style="color: #d3869b; font-style: italic;">Module</span><span style="color: #458588;">)</span>:
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">__init__</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">input_size</span>, <span style="color: #83a598;">output_size</span><span style="color: #458588;">)</span>:
<span style="color: #98be65;">#call the super's constructor and use it without having to store it directly.</span>
<span style="color: #cd8500;">super</span><span style="color: #458588;">(</span>logistic_regression, <span style="color: #fb4933;">self</span><span style="color: #458588;">)</span>.<span style="color: #458588; font-style: italic;">__init__</span><span style="color: #458588;">()</span>
<span style="color: #fb4933;">self</span>.<span style="color: #83a598; font-style: italic;">linear</span> <span style="color: #fb4933;">=</span> nn.<span style="color: #d3869b; font-style: italic;">Linear</span><span style="color: #458588;">(</span>input_size, output_size<span style="color: #458588;">)</span>
<span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">forward</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>, <span style="color: #83a598;">x</span><span style="color: #458588;">)</span>:
<span style="color: #cdbe70;">"""</span><span style="color: #cdbe70;">Prediction</span><span style="color: #cdbe70;">"""</span>
<span style="color: #fb4933;">return</span> torch.<span style="color: #458588; font-style: italic;">sigmoid</span><span style="color: #458588;">(</span><span style="color: #fb4933;">self</span>.<span style="color: #458588; font-style: italic;">linear</span><span style="color: #b16286;">(</span>x<span style="color: #b16286;">)</span><span style="color: #458588;">)</span>
</pre>
</div>
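<p>
A quick sanity check of the model above (a minimal sketch; the class body is repeated here so the snippet runs on its own, and the input values are made up for illustration):
</p>

```python
import torch
import torch.nn as nn

class logistic_regression(nn.Module):
    """Single linear layer followed by a sigmoid, as defined above."""
    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        """Prediction"""
        return torch.sigmoid(self.linear(x))

torch.manual_seed(0)
model = logistic_regression(1, 1)
probs = model(torch.tensor([[-2.0], [0.0], [2.0]]))
print(probs)  # each output lies strictly between 0 and 1
```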
</div>
<div id="outline-container-orgf57eb83" class="outline-4">
<h4 id="orgf57eb83">Loss</h4>
<div class="outline-text-4" id="text-orgf57eb83">
<p>
The loss is changed so that we separate the data rather than fit it at each epoch.
I first used the cross-entropy loss, but had a problem with NaNs.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #fb4933;">def</span> <span style="color: #fabd2f;">criterion</span><span style="color: #458588;">(</span><span style="color: #83a598;">yhat</span>,<span style="color: #83a598;">y</span><span style="color: #458588;">)</span>:
<span style="color: #83a598;">out</span> <span style="color: #fb4933;">=</span> <span style="color: #fb4933;">-</span><span style="color: #d3869b;">1</span> <span style="color: #fb4933;">*</span> torch.<span style="color: #458588; font-style: italic;">mean</span><span style="color: #458588;">(</span>y <span style="color: #fb4933;">*</span> torch.<span style="color: #458588; font-style: italic;">log</span><span style="color: #b16286;">(</span>yhat<span style="color: #b16286;">)</span> <span style="color: #fb4933;">+</span> <span style="color: #b16286;">(</span><span style="color: #d3869b;">1</span> <span style="color: #fb4933;">-</span> y<span style="color: #b16286;">)</span> <span style="color: #fb4933;">*</span> torch.<span style="color: #458588; font-style: italic;">log</span><span style="color: #b16286;">(</span><span style="color: #d3869b;">1</span> <span style="color: #fb4933;">-</span> yhat<span style="color: #b16286;">)</span><span style="color: #458588;">)</span>
<span style="color: #fb4933;">return</span> out
</pre>
</div>
<p>
Since \(\log(0) = -\infty\), PyTorch's BCELoss avoids this issue by clamping its log outputs to be greater than or equal to \(-100\), keeping the loss finite. See the <a href="https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html">BCELoss documentation</a>.
</p>
<div class="org-src-container">
<pre class="src src-python"><span style="color: #83a598;">criterion</span> <span style="color: #fb4933;">=</span> nn.<span style="color: #d3869b; font-style: italic;">BCELoss</span><span style="color: #458588;">()</span>
</pre>
</div>
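<p>
The difference is easy to see by evaluating both losses on a prediction that hits exactly 0 (a minimal sketch; the inputs are made up for illustration):
</p>

```python
import torch
import torch.nn as nn

# made-up predictions and targets, chosen so yhat hits exactly 0 and 1
yhat = torch.tensor([0.0, 0.5, 1.0])
y = torch.tensor([0.0, 1.0, 1.0])

# hand-rolled BCE: 0 * log(0) = 0 * (-inf) = nan, which poisons the mean
manual = -torch.mean(y * torch.log(yhat) + (1 - y) * torch.log(1 - yhat))

# nn.BCELoss clamps its log terms, so the result stays finite
bce = nn.BCELoss()(yhat, y)
print(manual, bce)
```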
<div id="orgd955f34" class="figure">
<p><img src="./figs/LogReg_PyTorch.png" alt="LogReg_PyTorch.png" />
</p>
</div>
</div>
</div>
<div id="outline-container-orgdc13570" class="outline-4">
<h4 id="orgdc13570">Comments</h4>
<div class="outline-text-4" id="text-orgdc13570">
<ul class="org-ul">
<li>The fitted curve does more than simply separate the data; a constant \(y = 0.5\) would separate it while giving no predictive power.</li>
</ul>
</div>
</div>
</div>
</div>
<div id="outline-container-org3c991b5" class="outline-2">
<h2 id="org3c991b5">Neural Networks for Classification</h2>
<div class="outline-text-2" id="text-org3c991b5">
</div>
<div id="outline-container-orgcf9974f" class="outline-3">
<h3 id="orgcf9974f">Single-Linear-Layer Softmax Classifier</h3>
<div class="outline-text-3" id="text-orgcf9974f">
<ul class="org-ul">
<li>Used to linearly classify between two or more classes.</li>
<li>Softmax Equation:</li>
</ul>
<p>
\[S(y_i) = \frac{\exp(y_i)}{\sum_j \exp(y_j)}\]
</p>
<ul class="org-ul">
<li>where, notably, \(S(y_i) \in [0,1]\) and \(\sum S(y_i) = 1\)</li>
</ul>
<ul class="org-ul">
<li>Softmax relies on the classic <code>argmax</code> programming function, \[\hat{y} = \operatorname{argmax}_i S(y_i)\]</li>
<li>Softmax uses parameter vectors where the dot product is used to classify.</li>
<li>The complicated part here is the <b>loss</b>. How to incentivize this behavior with a decent gradient for learning.</li>
</ul>
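<p>
These properties are easy to verify with <code>torch.softmax</code> (a minimal sketch; the logits are made up for illustration):
</p>

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)

print(probs.sum())          # probabilities sum to 1
print(torch.argmax(probs))  # index of the largest probability
```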
</div>
<ul class="org-ul">
<li><a id="org4e261f6"></a>Softmax in PyTorch<br />
<div class="outline-text-5" id="text-org4e261f6">
<p>
Example: <a href="./softMax_linLayer_makemore.md">./softMax_linLayer_makemore.md</a>, which can be run in VSCode with the Jupyter Notebook and Jupytext extensions, or converted to a Jupyter Notebook by executing:
</p>
<div class="org-src-container">
<pre class="src src-shell"><span style="color: #458588;">jupytext</span> softMax_linLayer_makemore.md <span style="color: #d3869b;">-o</span> softMax_linLayer_makemore.ipynb
</pre>
</div>
<ul class="org-ul">
<li>you may need to install jupytext first: <code>pip install jupytext</code></li>
</ul>
</div>
</li>
</ul>
</div>
<div id="outline-container-org20a0785" class="outline-3">
<h3 id="org20a0785">Multi-Layer Perceptron with Embedding</h3>
<div class="outline-text-3" id="text-org20a0785">
<p>
Example: <a href="./MLP_makemore.md">./MLP_makemore.md</a>, which can be run in VSCode with the Jupyter Notebook and Jupytext extensions, or converted to a Jupyter Notebook by executing:
</p>
<div class="org-src-container">
<pre class="src src-shell"><span style="color: #458588;">jupytext</span> MLP_makemore.md <span style="color: #d3869b;">-o</span> MLP_makemore.ipynb
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-org0cb365f" class="outline-2">
<h2 id="org0cb365f">Deep Neural Network</h2>
</div>
<div id="outline-container-org353f393" class="outline-2">
<h2 id="org353f393">Convolutional Neural Network</h2>
</div>
<div id="outline-container-orgc459def" class="outline-2">
<h2 id="orgc459def">Notes</h2>
<div class="outline-text-2" id="text-orgc459def">
</div>
<div id="outline-container-org150363b" class="outline-3">
<h3 id="org150363b">argmax example:</h3>
<div class="outline-text-3" id="text-org150363b">
<ul class="org-ul">
<li>Find three functions, one for each class, where the function corresponding to each class takes the largest value in the region where that class resides.
<ul class="org-ul">
<li>Then <code>argmax</code> is used to retrieve the class designation.</li>
</ul></li>
<li><p>
\(z0 = -x\), \(z1 = 1\), and \(z2 = x - 1\), with \(f(x) = [z0(x), z1(x), z2(x)]\), give:
</p>
<ul class="org-ul">
<li>class 0 for \(x \in (-\infty, -1)\)</li>
<li>class 1 for \(x \in (-1, 2)\)</li>
</ul>
<ul class="org-ul">
<li><p>
class 2 for \(x \in (2, \infty)\)
</p>
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-right" />
<col class="org-right" />
<col class="org-right" />
<col class="org-right" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left"> </th>
<th scope="col" class="org-right">z0</th>
<th scope="col" class="org-right">z1</th>
<th scope="col" class="org-right">z2</th>
<th scope="col" class="org-right">\(\hat{y}\)</th>
</tr>
<tr>
<th scope="col" class="org-left">arg</th>
<th scope="col" class="org-right">0</th>
<th scope="col" class="org-right">1</th>
<th scope="col" class="org-right">2</th>
<th scope="col" class="org-right">argmax</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left">f(-5)</td>
<td class="org-right">5</td>
<td class="org-right">1</td>
<td class="org-right">-6</td>
<td class="org-right">0</td>
</tr>
<tr>
<td class="org-left">f(1)</td>
<td class="org-right">-1</td>
<td class="org-right">1</td>
<td class="org-right">0</td>
<td class="org-right">1</td>
</tr>
<tr>
<td class="org-left">f(4)</td>
<td class="org-right">-4</td>
<td class="org-right">1</td>
<td class="org-right">3</td>
<td class="org-right">2</td>
</tr>
</tbody>
</table></li>
</ul></li>
</ul>
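<p>
The table above can be reproduced directly with <code>torch.argmax</code> (a minimal sketch using the three score functions defined above):
</p>

```python
import torch

def f(x):
    # scores z0 = -x, z1 = 1, z2 = x - 1 from the note above
    return torch.stack([-x, torch.ones_like(x), x - 1.0], dim=-1)

x = torch.tensor([-5.0, 1.0, 4.0])
yhat = torch.argmax(f(x), dim=-1)  # class with the largest score
print(yhat)  # tensor([0, 1, 2])
```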
</div>
</div>
<div id="outline-container-org4d29278" class="outline-3">
<h3 id="org4d29278">Definitions</h3>
<div class="outline-text-3" id="text-org4d29278">
</div>
<div id="outline-container-orgec7de92" class="outline-4">
<h4 id="orgec7de92">Regularization</h4>
<div class="outline-text-4" id="text-orgec7de92">
<p>
Regularization is "any modification to a learning algorithm which is intended to decrease its generalization error but not its training error". ~Goodfellow <i>et al.</i>
</p>
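<p>
One standard regularizer in PyTorch that matches this definition is L2 weight decay, applied through the optimizer (a minimal sketch; the layer sizes and hyperparameters are made up for illustration):
</p>

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
# weight_decay adds an L2 penalty on the parameters at each update step,
# discouraging large weights without changing the training objective itself
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
print(optimizer.param_groups[0]["weight_decay"])
```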
</div>
</div>
</div>
<div id="outline-container-org23a6850" class="outline-3">
<h3 id="org23a6850">PyTorch Modules</h3>
<div class="outline-text-3" id="text-org23a6850">
</div>
<div id="outline-container-org1b72699" class="outline-4">
<h4 id="org1b72699">nn</h4>
</div>
<div id="outline-container-org4f1ab15" class="outline-4">
<h4 id="org4f1ab15">torchvision.transforms</h4>
</div>
<div id="outline-container-orga849511" class="outline-4">
<h4 id="orga849511">torchvision.datasets</h4>
</div>
</div>
<div id="outline-container-orgdd265e2" class="outline-3">
<h3 id="orgdd265e2">Basic outline of a script</h3>
<div class="outline-text-3" id="text-orgdd265e2">
<ol class="org-ol">
<li>Load Data</li>
<li>Create Model</li>
<li>Train Model</li>
<li>View Results</li>
</ol>
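<p>
The four steps above can be sketched end to end on synthetic data (a minimal sketch; the data and hyperparameters are made up for illustration):
</p>

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# 1. Load Data (synthetic: y = 2x plus noise)
x = torch.linspace(-1, 1, 100).view(-1, 1)
y = 2 * x + 0.1 * torch.randn_like(x)
trainloader = DataLoader(TensorDataset(x, y), batch_size=20)

# 2. Create Model
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# 3. Train Model
for epoch in range(50):
    for xb, yb in trainloader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()

# 4. View Results (the weight should approach 2, the bias 0)
print(model.weight.item(), model.bias.item())
```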
</div>
</div>
<div id="outline-container-orge67a4b9" class="outline-3">
<h3 id="orge67a4b9">Tensors</h3>
</div>
</div>
</div>
<div id="postamble" class="status">
<p class="author">Author: ape</p>
<p class="date">Created: 2023-07-15 Sat 18:07</p>
<p class="validation"><a href="https://validator.w3.org/check?uri=referer">Validate</a></p>
</div>
</body>
</html>