
Commit aac4a49

Commit message: update
1 parent bf4a9f1

8 files changed: +151 −103 lines


doc/pub/week15/html/week15-bs.html

Lines changed: 11 additions & 6 deletions
@@ -345,15 +345,20 @@ <h2 id="what-are-diffusion-models" class="anchor">What are diffusion models? </h2>
 <h2 id="problems-with-probabilistic-models" class="anchor">Problems with probabilistic models </h2>
 
 <p>Historically, probabilistic models suffer from a tradeoff between two
-conflicting objectives: \textit{tractability} and
-\textit{flexibility}. Models that are \textit{tractable} can be
+conflicting objectives: <em>tractability</em> and
+<em>flexibility</em>. Models that are <em>tractable</em> can be
 analytically evaluated and easily fit to data (e.g. a Gaussian or
 Laplace). However, these models are unable to aptly describe structure
-in rich datasets. On the other hand, models that are \textit{flexible}
+in rich datasets. On the other hand, models that are <em>flexible</em>
 can be molded to fit structure in arbitrary data. For example, we can
 define models in terms of any (non-negative) function \( \phi(\boldsymbol{x}) \)
-yielding the flexible distribution \( p\left(\boldsymbol{x}\right) =
-\frac{\phi\left(\boldsymbol{x} \right)}{Z} \), where \( Z \) is a normalization
+yielding the flexible distribution
+</p>
+$$
+p\left(\boldsymbol{x}\right) =\frac{\phi\left(\boldsymbol{x} \right)}{Z},
+$$
+
+<p>where \( Z \) is a normalization
 constant. However, computing this normalization constant is generally
 intractable. Evaluating, training, or drawing samples from such
 flexible models typically requires a very expensive Monte Carlo
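
To make the cost mentioned in this hunk concrete, here is a minimal Python sketch of estimating \( Z \) by naive importance sampling; the toy density phi and the standard-normal proposal are illustrative assumptions, not part of the slides or the commit.

# Minimal sketch (illustrative, not from the commit): estimating the
# normalization constant Z = integral of phi(x) dx for an unnormalized
# density phi(x) by importance sampling.
import numpy as np

rng = np.random.default_rng(0)

def phi(x):
    # Any non-negative function defines an unnormalized density;
    # here a toy double-well shape.
    return np.exp(-(x**2 - 1.0)**2)

n = 100_000
x = rng.standard_normal(n)                      # samples from the proposal q
q = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # proposal density q(x)
weights = phi(x) / q                            # Z = E_q[ phi(x) / q(x) ]
z_hat = weights.mean()
std_err = weights.std(ddof=1) / np.sqrt(n)
print(f"Z estimate: {z_hat:.4f} +/- {std_err:.4f}")
# The error shrinks only like 1/sqrt(n), and the weight variance grows
# rapidly with dimension; this is the expensive Monte Carlo process the
# text refers to.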
@@ -674,7 +679,7 @@ <h2 id="interpretations" class="anchor">Interpretations </h2>
 <h2 id="the-last-term" class="anchor">The last term </h2>
 
 <ul>
-<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a \textit{consistency term}; it endeavors to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
+<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a <em>consistency term</em>; it attempts to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
 </ul>
 <!-- !split -->
 <h2 id="diffusion-models-part-2-from-url-https-arxiv-org-abs-2208-11970" class="anchor">Diffusion models, part 2, from <a href="https://arxiv.org/abs/2208.11970" target="_self"><tt>https://arxiv.org/abs/2208.11970</tt></a> </h2>

doc/pub/week15/html/week15-reveal.html

Lines changed: 13 additions & 6 deletions
@@ -270,15 +270,22 @@ <h2 id="what-are-diffusion-models">What are diffusion models? </h2>
 <h2 id="problems-with-probabilistic-models">Problems with probabilistic models </h2>
 
 <p>Historically, probabilistic models suffer from a tradeoff between two
-conflicting objectives: \textit{tractability} and
-\textit{flexibility}. Models that are \textit{tractable} can be
+conflicting objectives: <em>tractability</em> and
+<em>flexibility</em>. Models that are <em>tractable</em> can be
 analytically evaluated and easily fit to data (e.g. a Gaussian or
 Laplace). However, these models are unable to aptly describe structure
-in rich datasets. On the other hand, models that are \textit{flexible}
+in rich datasets. On the other hand, models that are <em>flexible</em>
 can be molded to fit structure in arbitrary data. For example, we can
 define models in terms of any (non-negative) function \( \phi(\boldsymbol{x}) \)
-yielding the flexible distribution \( p\left(\boldsymbol{x}\right) =
-\frac{\phi\left(\boldsymbol{x} \right)}{Z} \), where \( Z \) is a normalization
+yielding the flexible distribution
+</p>
+<p>&nbsp;<br>
+$$
+p\left(\boldsymbol{x}\right) =\frac{\phi\left(\boldsymbol{x} \right)}{Z},
+$$
+<p>&nbsp;<br>
+
+<p>where \( Z \) is a normalization
 constant. However, computing this normalization constant is generally
 intractable. Evaluating, training, or drawing samples from such
 flexible models typically requires a very expensive Monte Carlo
@@ -629,7 +636,7 @@ <h2 id="interpretations">Interpretations </h2>
 <h2 id="the-last-term">The last term </h2>
 
 <ul>
-<p><li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a \textit{consistency term}; it endeavors to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
+<p><li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a <em>consistency term</em>; it attempts to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
 </ul>
 </section>

doc/pub/week15/html/week15-solarized.html

Lines changed: 11 additions & 6 deletions
@@ -294,15 +294,20 @@ <h2 id="what-are-diffusion-models">What are diffusion models? </h2>
 <h2 id="problems-with-probabilistic-models">Problems with probabilistic models </h2>
 
 <p>Historically, probabilistic models suffer from a tradeoff between two
-conflicting objectives: \textit{tractability} and
-\textit{flexibility}. Models that are \textit{tractable} can be
+conflicting objectives: <em>tractability</em> and
+<em>flexibility</em>. Models that are <em>tractable</em> can be
 analytically evaluated and easily fit to data (e.g. a Gaussian or
 Laplace). However, these models are unable to aptly describe structure
-in rich datasets. On the other hand, models that are \textit{flexible}
+in rich datasets. On the other hand, models that are <em>flexible</em>
 can be molded to fit structure in arbitrary data. For example, we can
 define models in terms of any (non-negative) function \( \phi(\boldsymbol{x}) \)
-yielding the flexible distribution \( p\left(\boldsymbol{x}\right) =
-\frac{\phi\left(\boldsymbol{x} \right)}{Z} \), where \( Z \) is a normalization
+yielding the flexible distribution
+</p>
+$$
+p\left(\boldsymbol{x}\right) =\frac{\phi\left(\boldsymbol{x} \right)}{Z},
+$$
+
+<p>where \( Z \) is a normalization
 constant. However, computing this normalization constant is generally
 intractable. Evaluating, training, or drawing samples from such
 flexible models typically requires a very expensive Monte Carlo
@@ -621,7 +626,7 @@ <h2 id="interpretations">Interpretations </h2>
 <h2 id="the-last-term">The last term </h2>
 
 <ul>
-<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a \textit{consistency term}; it endeavors to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
+<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a <em>consistency term</em>; it attempts to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
 </ul>
 <!-- !split --><br><br><br><br><br><br><br><br><br><br>
 <h2 id="diffusion-models-part-2-from-url-https-arxiv-org-abs-2208-11970">Diffusion models, part 2, from <a href="https://arxiv.org/abs/2208.11970" target="_blank"><tt>https://arxiv.org/abs/2208.11970</tt></a> </h2>

doc/pub/week15/html/week15.html

Lines changed: 11 additions & 6 deletions
@@ -371,15 +371,20 @@ <h2 id="what-are-diffusion-models">What are diffusion models? </h2>
 <h2 id="problems-with-probabilistic-models">Problems with probabilistic models </h2>
 
 <p>Historically, probabilistic models suffer from a tradeoff between two
-conflicting objectives: \textit{tractability} and
-\textit{flexibility}. Models that are \textit{tractable} can be
+conflicting objectives: <em>tractability</em> and
+<em>flexibility</em>. Models that are <em>tractable</em> can be
 analytically evaluated and easily fit to data (e.g. a Gaussian or
 Laplace). However, these models are unable to aptly describe structure
-in rich datasets. On the other hand, models that are \textit{flexible}
+in rich datasets. On the other hand, models that are <em>flexible</em>
 can be molded to fit structure in arbitrary data. For example, we can
 define models in terms of any (non-negative) function \( \phi(\boldsymbol{x}) \)
-yielding the flexible distribution \( p\left(\boldsymbol{x}\right) =
-\frac{\phi\left(\boldsymbol{x} \right)}{Z} \), where \( Z \) is a normalization
+yielding the flexible distribution
+</p>
+$$
+p\left(\boldsymbol{x}\right) =\frac{\phi\left(\boldsymbol{x} \right)}{Z},
+$$
+
+<p>where \( Z \) is a normalization
 constant. However, computing this normalization constant is generally
 intractable. Evaluating, training, or drawing samples from such
 flexible models typically requires a very expensive Monte Carlo
@@ -698,7 +703,7 @@ <h2 id="interpretations">Interpretations </h2>
 <h2 id="the-last-term">The last term </h2>
 
 <ul>
-<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a \textit{consistency term}; it endeavors to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
+<li> \( \mathbb{E}_{q(\boldsymbol{x}_{t-1}, \boldsymbol{x}_{t+1}|\boldsymbol{x}_0)}\left[D_{KL}(q(\boldsymbol{x}_{t}|\boldsymbol{x}_{t-1})\vert\vert p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}|\boldsymbol{x}_{t+1}))\right] \) is a <em>consistency term</em>; it attempts to make the distribution at \( \boldsymbol{x}_t \) consistent, from both forward and backward processes. That is, a denoising step from a noisier image should match the corresponding noising step from a cleaner image, for every intermediate timestep; this is reflected mathematically by the KL Divergence. This term is minimized when we train \( p_{\theta}(\boldsymbol{x}_t|\boldsymbol{x}_{t+1}) \) to match the Gaussian distribution \( q(\boldsymbol{x}_t|\boldsymbol{x}_{t-1}) \).</li>
 </ul>
 <!-- !split --><br><br><br><br><br><br><br><br><br><br>
 <h2 id="diffusion-models-part-2-from-url-https-arxiv-org-abs-2208-11970">Diffusion models, part 2, from <a href="https://arxiv.org/abs/2208.11970" target="_blank"><tt>https://arxiv.org/abs/2208.11970</tt></a> </h2>
Binary file changed (0 Bytes); not shown.
