My-CodeHub/simulation and sampling.Rmd at main · Nickmutembei/My-CodeHub · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
---
title: "Project 1: Simulation and Sampling"
output:
  html_document:
    toc: yes
    toc_float:
      collapsed: yes
      smooth_scroll: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```


# Question 1: Confidence interval approximation assessment

Create a function called estimate_coverage (in code.R; also document the function) to perform interval
coverage estimation taking arguments CI (a function object for confidence interval construction taking
arguments y and alpha and returning a 2-element vector [see Lab 4]), N (the number of simulation replications to use for the coverage estimate), alpha (1-alpha is the intended coverage probability), n (the sample size) and lambda (the true lambda values for the Poisson model).

Use your function estimate_coverage to estimate the coverage of the construction of confidence intervals from samples Poisson(λ) and discuss the following:

```{r}
#' Function to estimate interval coverage
#' @param CI A function object for constructing confidence intervals
#' @param N The number of simulation replications to use for the coverage estimate
#' @param alpha 1-alpha is the intended coverage probability
#' @param n The sample size
#' @param lambda The true lambda values for the Poisson model
#' @return A list containing the coverage probability and its standard error
estimate_coverage <- function(CI, N, alpha, n, lambda) {
  # Initialize coverage counter
  coverage_count <- 0

  # Simulate data and compute confidence interval for each replication
  for (i in 1:N) {
    # Simulate Poisson data
    y <- rpois(n, lambda)

    # Compute confidence interval
    ci <- CI(y, alpha)

    # Check if true parameter value is within confidence interval
    if (lambda >= ci[1] && lambda <= ci[2]) {
      coverage_count <- coverage_count + 1
    }
  }

  # Compute estimated coverage probability
  coverage_prob <- coverage_count / N

  # Return estimated coverage probability
  return(coverage_prob)
}


```

• Since the model involves the discrete Poisson distribution, one might ask how sensitive these results are to the precise values of λ and n. To investigate this, run the coverage estimation for different combinations of model parameters λ and n (fix N = 10000 and α = 0.1)

```{r}
# Define a function to construct a confidence interval using the MLE estimate
mle_ci <- function(y, alpha) {
  y_bar <- mean(y)
  se <- sqrt(y_bar / length(y))
  z_alpha <- qnorm(1 - alpha / 2)
  lower <- max(0, y_bar - z_alpha * se)
  upper <- y_bar + z_alpha * se
  return(c(lower, upper))
}

# Set parameters
N <- 10000
alpha <- 0.1
lambdas <- c(1, 5, 10)
ns <- c(10, 50, 100)

# Estimate coverage for different combinations of lambda and n
for (lambda in lambdas) {
  for (n in ns) {
    coverage_prob <- estimate_coverage(mle_ci, N, alpha, n, lambda)
    cat(sprintf("lambda = %d, n = %d, coverage probability = %.3f\n", lambda, n, coverage_prob))
  }
}

```

• Present your results of estimated coverage in two plots, (1) as a function of λ for fixed n = 2, and (2) as a function of n, for fixed λ = 3.

```{r}
# Load the grid package
library(grid)

# Set parameters
N <- 10000
alpha <- 0.1
lambdas <- seq(0.1, 10, length.out = 50)
n_fixed <- 2
lambda_fixed <- 3

# Estimate coverage as a function of lambda for fixed n
coverage_lambda <- sapply(lambdas, function(lambda) {
  estimate_coverage(mle_ci, N, alpha, n_fixed, lambda)
})
```

# plot for estimated coverage probability of confidence intervals at λ = 2

```{r}
# Plot coverage as a function of lambda for fixed n
plot(lambdas, coverage_lambda, type = "l", xlab = expression(lambda), ylab = "Coverage Probability",
     main = paste0("Coverage as a function of ", " (", "n =", n_fixed, ")"))

# Estimate coverage as a function of n for fixed lambda
ns <- seq(5, 500, length.out = 50)
coverage_n <- sapply(ns, function(n) {
  estimate_coverage(mle_ci, N, alpha, n, lambda_fixed)
})
```

# plot for estimated coverage probability of confidence intervals at λ = 3

```{r}
# Plot coverage as a function of n for fixed lambda
plot(ns, coverage_n, type = "l", xlab = "n", ylab = "Coverage Probability",
     main = paste0("Coverage as a function of ", "n", " (", " =", lambda_fixed, ")"))

```
# plots Interpretation

• Discuss the plots in regards to whether the coverage of the intervals achieve the desired 90% confidence level, and if not identify under which cases and provide a suggestion as to why.

- The two plots generated by the estimate_coverage function show the coverage probability of confidence intervals for different values of λ and n. The desired coverage level is 90%, which means that the true parameter value should be contained within the confidence interval in 90% of the simulations.

The first plot shows the coverage probability as a function of λ for fixed n = 2. It can be observed that the coverage probability is lower than the desired level of 0.9 for most values of λ, except for very small values of λ. The coverage probability decreases as λ increases, and the intervals become wider. This is likely due to the fact that the Poisson distribution becomes more skewed as λ increases, and the approximation of the MLE confidence interval using a normal distribution becomes less accurate. As a result, the coverage probability is lower than 0.9, indicating that the confidence intervals are too narrow.

The second plot shows the coverage probability as a function of n for fixed λ = 3. In this case, the coverage probability is close to the desired level of 0.9 for most values of n. However, the coverage probability is slightly lower for very small sample sizes and very large sample sizes. This is likely due to the fact that the MLE estimator becomes less accurate for very small sample sizes, and the normal approximation becomes less accurate for very large sample sizes. As a result, the coverage probability is slightly lower than 0.9 for these extreme values of n.

In summary, the plots show that the coverage probability of the MLE confidence intervals for the Poisson distribution is not always achieving the desired level of 0.9. The accuracy of the MLE estimator and the normal approximation are affected by the values of λ and n, and the coverage probability is lower than 0.9 when these approximations become less accurate. Therefore, it is important to carefully choose the sample size and parameter values to ensure that the approximations are accurate and the coverage probability is close to the desired level. Additionally, alternative methods for constructing confidence intervals, such as using bootstrap or Bayesian methods, may be more appropriate in cases where the normal approximation is not accurate

# Question 2: 3D printer materials prediction

The aim is to estimate the parameters of a Bayesian statistical model of material use in a 3D printer.

```{r}
library(robustbase)
data(filament1)

```