Commit a1c9a35

committed
as promised, more
1 parent b6774b9

13 files changed: +454, -0 lines changed

assignments/2021_A1.pdf (228 KB): binary file not shown.

assignments/2021_A1_solutions.pdf (337 KB): binary file not shown.

assignments/2021_A2.pdf (217 KB): binary file not shown.

assignments/2021_A2.tex

Lines changed: 148 additions & 0 deletions
@@ -0,0 +1,148 @@
\documentclass[a4paper,10pt, notitlepage]{report}
\usepackage{geometry}
\geometry{verbose,tmargin=30mm,bmargin=25mm,lmargin=25mm,rmargin=25mm}
\usepackage[utf8]{inputenc}
\usepackage[sectionbib]{natbib}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{enumitem}
\usepackage{xcolor}
\usepackage{cancel}
\usepackage{mathtools}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{float}
\PassOptionsToPackage{hyphens}{url}\usepackage{hyperref}
\hypersetup{colorlinks=true,citecolor=blue}


\newtheorem{thm}{Theorem}
\newtheorem{lemma}[thm]{Lemma}
\newtheorem{proposition}[thm]{Proposition}
\newtheorem{remark}[thm]{Remark}
\newtheorem{defn}[thm]{Definition}

%%%%%%%%%%%%%%%%%%%% Notation stuff
\newcommand{\pr}{\operatorname{Pr}} %% probability
\newcommand{\vr}{\operatorname{Var}} %% variance
\newcommand{\rs}{X_1, X_2, \ldots, X_n} %% random sample
\newcommand{\irs}{X_1, X_2, \ldots} %% infinite random sample
\newcommand{\rsd}{x_1, x_2, \ldots, x_n} %% random sample, realised
\newcommand{\bX}{\boldsymbol{X}} %% random sample, contracted form (bold)
\newcommand{\bx}{\boldsymbol{x}} %% random sample, realised, contracted form (bold)
\newcommand{\bT}{\boldsymbol{T}} %% statistic, vector form (bold)
\newcommand{\bt}{\boldsymbol{t}} %% statistic, realised, vector form (bold)
\newcommand{\emv}{\hat{\theta}}
\DeclarePairedDelimiter\ceil{\lceil}{\rceil}
\DeclarePairedDelimiter\floor{\lfloor}{\rfloor}

% Title Page
\title{Exam 2 (A2)}
\author{Class: Bayesian Statistics \\ Instructor: Luiz Max Carvalho}
\date{02/06/2021}

\begin{document}
\maketitle

\textbf{Turn-in date: 16/06/2021, until 23:59h Brasília time.}

\begin{center}
\fbox{\fbox{\parbox{1.0\textwidth}{\textsf{
\begin{itemize}
\item Please read through the whole exam before starting to answer;
\item State and prove all non-trivial mathematical results necessary to substantiate your arguments;
\item Do not forget to add appropriate scholarly references~\textit{at the end} of the document;
\item Mathematical expressions also receive punctuation;
\item You may write your answer to a question as a point-by-point response or in ``essay'' form, as you prefer;
\item Please hand in a single, \textbf{typeset} (\LaTeX) PDF file as your final main document.
Code appendices are welcome,~\textit{in addition} to the main PDF document;
\item You may consult any sources, provided you cite \textbf{ALL} of them (books, papers, blog posts, videos);
\item You may use symbolic algebra programs such as SymPy or Wolfram Alpha to help you get through the hairier calculations, provided you cite the tools you have used;
\item The exam is worth 100 %$\min\left\{\text{your\:score}, 100\right\}$
marks.
\end{itemize}}
}}}
\end{center}
% \newpage
% \section*{Hints}
% \begin{itemize}
% \item a
% \item b
% \end{itemize}
%
\newpage

\section*{Background}

This exam covers applications, namely estimation, prior sensitivity and prediction.
You will need a working knowledge of basic computing tools, and familiarity with MCMC is highly valuable.
Chapter 6 of \cite{Robert2007} gives an overview of computational techniques for Bayesian statistics.

\section*{Inferring population sizes -- theory}

Consider the model
\begin{equation*}
x_i \sim \operatorname{Binomial}(N, \theta),
\end{equation*}
with \textbf{both} $N$ and $\theta$ unknown, and suppose one observes $\boldsymbol{x} = \{x_1, x_2, \ldots, x_K\}$.
Here, we write $\xi = (N, \theta)$.

\begin{enumerate}[label=\alph*)]
\item (10 marks) Formulate a hierarchical prior ($\pi_1$) for $N$, i.e., elicit $F$ such that $N \mid \alpha \sim F(\alpha)$ and $\alpha \sim \Pi_A$.
Justify your choice;
\item (5 marks) Using the prior from the previous item, write out the full joint posterior kernel for all unknown quantities in the model, $p_1(\xi \mid \boldsymbol{x})$. \textit{Hint:} do not forget to include the appropriate indicator functions!;
\item (5 marks) Is your model identifiable?
\item (5 marks) Exhibit the marginal posterior density for $N$, $p_1(N \mid \boldsymbol{x})$;
\item (5 marks) Return to point (a) above and consider an alternative, uninformative prior structure for $\xi$, $\pi_2$.
Then, derive $p_2(N \mid \boldsymbol{x})$;
\item (10 marks) Formulate a third prior structure on $\xi$, $\pi_3$, that allows for closed-form marginalisation over the hyperparameters $\alpha$ -- see (a) -- and write out $p_3(N \mid \boldsymbol{x})$;
\item (10 marks) Show whether each of the marginal posteriors considered is proper.
Then, derive the posterior predictive distribution, $g_i(\tilde{x} \mid \boldsymbol{x})$, for each of the posteriors considered ($i = 1, 2, 3$);
\item (5 marks) Consider the loss function
\begin{equation}
\label{eq:relative_loss}
L(\delta(\boldsymbol{x}), N) = \left(\frac{\delta(\boldsymbol{x})-N}{N} \right)^2.
\end{equation}
Derive the Bayes estimator under this loss.
\end{enumerate}

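For the last item above, minimising the posterior expected loss in $\delta$ gives $\delta^* = \mathbb{E}[N^{-1}\mid \boldsymbol{x}] \,/\, \mathbb{E}[N^{-2}\mid \boldsymbol{x}]$. A minimal numerical sketch of that estimator (not part of the exam; the posterior draws below are placeholders, not from any of the models asked for):

```python
import numpy as np

def bayes_estimator_relative_loss(post_draws):
    # Under L(d, N) = ((d - N)/N)^2 the posterior expected loss is
    # d^2 E[N^-2 | x] - 2 d E[N^-1 | x] + 1; setting its derivative
    # in d to zero yields d* = E[N^-1 | x] / E[N^-2 | x].
    inv = 1.0 / np.asarray(post_draws, dtype=float)
    return inv.mean() / np.mean(inv ** 2)

# Placeholder posterior draws for N (any positive sample illustrates the point)
rng = np.random.default_rng(2021)
draws = rng.integers(60, 121, size=50_000).astype(float)
d_star = bayes_estimator_relative_loss(draws)
# By Cauchy-Schwarz, d* never exceeds the posterior mean,
# i.e. the relative loss shrinks the estimate downwards.
assert 0 < d_star <= draws.mean()
```

The shrinkage below the posterior mean reflects that relative loss penalises overestimation of small $N$ more heavily.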
\section*{Inferring population sizes -- practice}
Consider the problem of inferring the population sizes of major herbivores~\citep{Carroll1985}.
In the first case, one is interested in estimating the number of impala (\textit{Aepyceros melampus}) herds in the Kruger National Park, in northeastern South Africa.
An initial survey recorded the following numbers of herds: $\boldsymbol{x}_{\text{impala}} = \{15, 20, 21, 23, 26\}$.
Another scientific question is the number of individual waterbuck (\textit{Kobus ellipsiprymnus}) in the same park.
The observed numbers of waterbuck in separate sightings were $\boldsymbol{x}_{\text{waterbuck}} = \{53, 57, 66, 67, 72\}$ and may be regarded (for simplicity) as independent and identically distributed.

\begin{figure}[H]
\centering
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[scale=0.75]{figures/impala.jpeg}
\caption{Impala}
\end{subfigure}
\begin{subfigure}[b]{0.45\textwidth}
\centering
\includegraphics[scale=0.75]{figures/waterbuck.jpeg}
\caption{Waterbuck}
\end{subfigure}
\caption{Two antelope species whose population sizes we want to estimate.}
\label{fig:antelopes}
\end{figure}


\begin{enumerate}[label=\alph*)]
\setcounter{enumi}{8}
\item (20 marks) For each data set, sketch the marginal posterior distributions $p_1(N \mid \boldsymbol{x})$, $p_2(N \mid \boldsymbol{x})$ and $p_3(N \mid \boldsymbol{x})$.
Moreover, under each posterior, provide (i) the Bayes estimator under quadratic loss and under the loss in (\ref{eq:relative_loss}) and (ii) a 95\% credibility interval for $N$.
Discuss the differences and similarities between these distributions and estimates: do the prior modelling choices substantially impact the final inferences? If so, how?
\item (25 marks) Let $\bar{x} = K^{-1}\sum_{k=1}^K x_k$ and $s^2 = K^{-1}\sum_{k=1}^K (x_k-\bar{x})^2$.
For this problem, a sample is said to be \textit{stable} if $\bar{x}/s^2 \geq (\sqrt{2} + 1)/\sqrt{2}$ and \textit{unstable} otherwise.
Devise a simple method of moments estimator (MME) for $N$.
Then, using a Monte Carlo simulation, compare the MME to the three Bayes estimators under quadratic loss in terms of relative mean squared error.
How do the Bayes estimators compare to the MME in terms of the stability of the generated samples?
\textit{Hint}: You may want to follow the simulation setup of~\cite{Carroll1985}.
\end{enumerate}

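One natural route to the MME item (a sketch, not the official solution): matching $\mathbb{E}[X] = N\theta$ and $\operatorname{Var}[X] = N\theta(1-\theta)$ gives $\hat{\theta} = 1 - s^2/\bar{x}$ and $\hat{N} = \bar{x}/\hat{\theta} = \bar{x}^2/(\bar{x} - s^2)$, with the stability check defined above:

```python
import numpy as np

def mme_binomial_N(x):
    # Moment matching for x_i ~ Binomial(N, theta):
    # E[X] = N*theta and Var[X] = N*theta*(1 - theta) give
    # theta_hat = 1 - s2/xbar and N_hat = xbar/theta_hat.
    x = np.asarray(x, dtype=float)
    xbar, s2 = x.mean(), x.var()  # ddof=0, matching the exam's s^2
    theta_hat = 1.0 - s2 / xbar
    N_hat = xbar / theta_hat
    stable = xbar / s2 >= (np.sqrt(2) + 1) / np.sqrt(2)
    return N_hat, theta_hat, stable

impala = [15, 20, 21, 23, 26]
waterbuck = [53, 57, 66, 67, 72]
N_imp, th_imp, stab_imp = mme_binomial_N(impala)    # N_hat about 56.5
N_wat, th_wat, stab_wat = mme_binomial_N(waterbuck) # N_hat about 271.8
```

Note that both data sets come out unstable under the definition above ($\bar{x}/s^2 \approx 1.59$ and $1.30$, both below $(\sqrt{2}+1)/\sqrt{2} \approx 1.71$), which is precisely the regime where the MME is known to behave erratically.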
\bibliographystyle{apalike}
\bibliography{a2}
\end{document}

assignments/2024_A1.tex

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
\documentclass[a4paper,10pt, notitlepage]{report}
\usepackage[utf8]{inputenc}
\usepackage{natbib}
\usepackage{amssymb}
\usepackage{amsmath}
\usepackage{enumitem}
\usepackage{dsfont}
\usepackage{xcolor}
\usepackage{url}
\usepackage{cancel}
\usepackage{mathtools}
\usepackage{newclude}
\usepackage{booktabs}
\usepackage[normalem]{ulem}

%%%%%%%%%%%%%%%%%%%% Notation stuff
\newcommand{\pr}{\operatorname{Pr}} %% probability
\newcommand{\vr}{\operatorname{Var}} %% variance
\newcommand{\rs}{X_1, X_2, \ldots, X_n} %% random sample
\newcommand{\irs}{X_1, X_2, \ldots} %% infinite random sample
\newcommand{\rsd}{x_1, x_2, \ldots, x_n} %% random sample, realised
\newcommand{\bX}{\boldsymbol{X}} %% random sample, contracted form (bold)
\newcommand{\bx}{\boldsymbol{x}} %% random sample, realised, contracted form (bold)
\newcommand{\bT}{\boldsymbol{T}} %% statistic, vector form (bold)
\newcommand{\bt}{\boldsymbol{t}} %% statistic, realised, vector form (bold)
\newcommand{\emv}{\hat{\theta}}
\DeclarePairedDelimiter\ceil{\lceil}{\rceil}
\DeclarePairedDelimiter\floor{\lfloor}{\rfloor}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}

\DeclareRobustCommand{\bbone}{\text{\usefont{U}{bbold}{m}{n}1}}
\DeclareMathOperator{\EX}{\mathbb{E}} %% expected value

%%%%
\newif\ifanswers
\answerstrue % comment out to hide answers

% Title Page
\title{First exam (A1)}
\author{Class: Bayesian Statistics \\ Instructor: Luiz Max Carvalho \\ TA: Isaque Pim}
\date{22 May 2024}

\begin{document}
\maketitle

\begin{center}
\fbox{\fbox{\parbox{1.0\textwidth}{\textsf{
\begin{itemize}
\item You have 4 (four) hours to complete the exam;
\item Please read through the whole exam before you start giving your answers;
\item Answer all questions briefly;
\item Clearly mark your final answer with a square, circle or other geometric figure of your preference;
\item The exam is worth $\min\left\{\text{your\:score}, 100\right\}$ marks;
\item You may bring \textbf{\underline{one} ``cheat sheet''} (A4, both sides), which must be turned in together with your answers.
\end{itemize}}
}}}
\end{center}

\newpage

\section*{1. I like 'em short.}

For a prior distribution $\pi$, a set $C_x$ is said to be an
$\alpha$-credible set if $$P^\pi (\theta \in C_x \mid x) \geq 1-\alpha.$$
Such a region is called an HPD $\alpha$-credible region (for highest posterior density) if it can be written in the form
\begin{equation*}
\{\theta; \pi(\theta|x) > k_{\alpha}\} \subset C_x^\pi \subset \{\theta; \pi(\theta|x) \geq k_{\alpha}\},
\end{equation*}
where $k_{\alpha}$ is the largest bound such that
$P^\pi (\theta \in C_x^\alpha \mid x) \geq 1-\alpha$.
This construction is motivated by the fact that HPD regions minimise volume among all $\alpha$-credible regions.
A special and important case is that of \textit{HPD intervals}, when $C_x$ is an interval $(a, b)$.

\begin{enumerate}[label=\alph*)]
\item (20 marks) Show that if the posterior density (i) is unimodal and (ii) is not uniform on any interval of $(1 - \alpha)$ probability mass of $\Omega$, then the HPD region is an interval and it is unique.

\textbf{Hint:} formulate a minimisation problem in the two variables $a$ and $b$ with a probability restriction, and solve via the Lagrangian.

\item (20 marks) We can also use decision-theoretic criteria to choose between credible intervals.
A first idea is to balance the volume of the region against coverage guarantees through the loss function $$L(\theta, C) = \operatorname{vol}(C) + \mathds{1}_{C^c}(\theta).$$
Explain why the above loss is problematic.
\item * (20 bonus marks) Define the new loss function $$L^*(\theta, C) = g\left(\operatorname{vol}(C)\right) + \mathds{1}_{C^c}(\theta),$$
where $g$ is increasing and $0 \leq g(t) \leq 1$ for all $t$. Show that the Bayes estimator $C^\pi_x$ for $L^*$ is an HPD region.
\end{enumerate}
\ifanswers
\nocite{*}
\include*{sol1}
\fi

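For intuition on part (a), the minimisation the hint describes can also be carried out numerically: fix the left endpoint $a$, let the coverage constraint pin down $b$, and minimise the width $b - a$. A sketch using SciPy, with a Beta(4, 22) posterior chosen purely as an illustrative unimodal example:

```python
import numpy as np
from scipy import optimize, stats

post = stats.beta(4, 22)  # illustrative unimodal posterior
alpha = 0.05

def width(a):
    # b is determined by the coverage constraint P(a < theta < b | x) = 1 - alpha
    b = post.ppf(post.cdf(a) + 1 - alpha)
    return b - a

# minimise interval length over the left endpoint a in (0, ppf(alpha))
res = optimize.minimize_scalar(width, bounds=(1e-9, post.ppf(alpha) - 1e-9),
                               method="bounded")
a = res.x
b = post.ppf(post.cdf(a) + 1 - alpha)
# At the optimum the density is (approximately) equal at both endpoints,
# and the HPD interval is never wider than the equal-tailed interval.
eq_width = post.ppf(1 - alpha / 2) - post.ppf(alpha / 2)
assert b - a <= eq_width + 1e-8
```

The equal-density property at the endpoints is exactly the first-order (Lagrangian) condition the hint asks for.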
\section*{2. Savage!}

We will now study point hypothesis testing as a case of two nested models.
Let $\theta_0 \in \Omega_0 \subset \Omega$.
We want to compare model $M_0: \theta = \theta_0$ to $M_1: \theta \in \Omega$.
That is, under model $M_1$, $\theta$ can vary freely.
Assume further that the models are \textit{properly nested}, that is,
$$P(x \mid \theta, M_0) = P(x \mid \theta = \theta_0, M_1).$$

\begin{enumerate}[label=\alph*)]
\item (25 marks) Given observed data $x$, show that the Bayes factor $\operatorname{BF_{01}}$ can be written as
\begin{equation*}
\operatorname{BF_{01}} = \frac{p(\theta_0 \mid x, M_1)}{p(\theta_0 \mid M_1)},
\end{equation*}
where the numerator is the posterior density and the denominator is the prior density under $M_1$, both evaluated at $\theta_0$.
\item (25 marks) Apply the result from part (a) to the problem of testing whether a coin is fair.
Specifically, we want to compare $H_0: \theta = 0.5$ against $H_1: \theta \neq 0.5$, where $\theta$ is the probability of the coin landing heads.
Given $n=24$ trials and $x = 3$ heads, and employing a uniform prior on $\theta$, calculate the Bayes factor $\operatorname{BF_{01}}$.
Based on the Bayes factor, would you prefer $H_0$ over $H_1$? How strong would the prior need to be to change this preference?
\end{enumerate}
\textbf{Note}: The ratio above is called the \textit{Savage--Dickey} ratio. It provides a straightforward way to compute Bayes factors that can be more intuitive and less computationally intensive than other methods.
\ifanswers
\include*{sol2}
\fi
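The Savage--Dickey ratio makes the computation in part (b) a one-liner: with a uniform Beta(1, 1) prior, the posterior under $M_1$ after 3 heads in 24 trials is Beta(4, 22), and $\operatorname{BF_{01}}$ is the posterior-to-prior density ratio at $\theta_0 = 0.5$. A sketch (a numerical check, not a substitute for the derivation asked for):

```python
from scipy import stats

n, x, theta0 = 24, 3, 0.5
prior = stats.beta(1, 1)                  # uniform prior on theta
posterior = stats.beta(1 + x, 1 + n - x)  # Beta(4, 22) posterior under M1
bf01 = posterior.pdf(theta0) / prior.pdf(theta0)
# Closed form: pdf of Beta(4, 22) at 1/2 is 50600 / 2^24, roughly 0.0030,
# so the data favour H1 over H0 by a factor of about 330.
assert abs(bf01 - 50600 / 2**24) < 1e-10
```

The closed-form value follows since the prior density is 1 and the Beta(4, 22) density at $1/2$ is $(1/2)^{24}/B(4,22)$ with $1/B(4,22) = 25!/(3!\,21!) \cdot 25 \cdot \ldots$ collapsing to 50600.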
\section*{3. Hey, you're biased!}

Let $\bX = (\rs)$ be a random sample from an $\operatorname{Exponential}(\theta)$ distribution with $\theta > 0$ and common density $f(x \mid \theta) = \theta^{-1}\exp(-x/\theta)\mathbb{I}(x > 0)$ w.r.t. the Lebesgue measure on $\mathbb{R}$.

\begin{enumerate}[label=\alph*)]
\item (10 marks) Find a conjugate prior for $\theta$;
\item (20 marks) Exhibit the Bayes estimator under quadratic loss for $\theta$, $\delta_B(\bX)$;
\item (10 marks) Show that the bias of $\delta_B(\bX)$ is $O(n^{-1})$;
\item $\ast$ (10 bonus marks) Show how to obtain the uniformly minimum variance unbiased estimator (UMVUE) from $\delta_B(\bX)$ by taking limits of the hyperparameters.
\end{enumerate}

\ifanswers
\include*{sol3}
\fi
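A sanity check for items (a), (b) and (d) (a sketch of the standard conjugate route, not the official solution): in this mean parameterisation, an Inverse-Gamma$(\alpha, \beta)$ prior on $\theta$ is conjugate, the posterior is Inverse-Gamma$(\alpha + n, \beta + \sum_i x_i)$, and the quadratic-loss Bayes estimator is the posterior mean:

```python
import numpy as np

def bayes_quadratic(x, alpha, beta):
    # Inverse-Gamma(alpha, beta) prior, pi(theta) ~ theta^-(alpha+1) exp(-beta/theta),
    # combined with the Exponential(theta) likelihood above yields an
    # Inverse-Gamma(alpha + n, beta + sum(x)) posterior, whose mean
    # (beta + sum(x)) / (alpha + n - 1) is the Bayes estimator under quadratic loss.
    x = np.asarray(x, dtype=float)
    return (beta + x.sum()) / (alpha + x.size - 1)

x = [1.2, 0.7, 2.5, 0.4, 1.9]  # illustrative data
# Taking alpha -> 1 and beta -> 0 recovers the sample mean, the UMVUE of theta
assert abs(bayes_quadratic(x, 1.0, 0.0) - np.mean(x)) < 1e-12
```

The limit in the last line is the kind of hyperparameter limit item (d) alludes to: the prior influence vanishes and the estimator collapses to $\bar{x}$.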
\bibliographystyle{apalike}
\bibliography{refs}

\end{document}

assignments/2024_A1_solutions.pdf (275 KB): binary file not shown.
4.7 KB: binary file not shown.
5.36 KB: binary file not shown.
4.82 KB: binary file not shown.

assignments/refs.bib

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
@article{Dickey1971,
  title={The weighted likelihood ratio, linear hypotheses on normal location parameters},
  author={Dickey, James M},
  journal={The Annals of Mathematical Statistics},
  pages={204--223},
  year={1971},
  publisher={JSTOR}
}

@book{Shao2003,
  title={Mathematical Statistics},
  author={Shao, Jun},
  year={2003},
  publisher={Springer Science \& Business Media}
}

@book{Robert2007,
  title={The Bayesian choice: from decision-theoretic foundations to computational implementation},
  author={Robert, Christian P},
  volume={2},
  year={2007},
  publisher={Springer}
}
