---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# simpriord

<!-- badges: start -->
<!-- badges: end -->

Fitting Bayesian models with informative priors is a common way to address the limitations of
statistical analysis with small sample sizes. One way to incorporate prior information into a model is to bind the data set from a previous study to the study's main data set. The goal of this package is to streamline
a workflow for running simulations that assess how well parameters are estimated when the main and previous data
come from different populations.

This document first covers software setup and then outlines the general workflow of the package.

## Installation

You can install the development version of simpriord from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("mncube/simpriord")
```

## Load Package

```{r example}
library(simpriord)
```

## Workflow for Assessing Parameter Estimation

Step 1: Define the data generating processes for the main data sets and the previous study data sets, using the format of simglm's sim_args parameter.

Note: This example closely follows the data generating mechanism in the Generate Response Variable section of simglm's Tidy Simulation vignette (https://cran.r-project.org/web/packages/simglm/vignettes/tidy_simulation.html).

```{r}
#Define data generating process for the main data's population
args_main <- list(
  formula = y ~ 1 + weight + age + sex,
  fixed = list(weight = list(var_type = 'continuous', mean = 0, sd = 1),
               age = list(var_type = 'ordinal', levels = 30:60),
               sex = list(var_type = 'factor', levels = c('male', 'female'))),
  error = list(variance = 25),
  sample_size = 15,
  reg_weights = c(2, 0.3, -0.1, 0.5))

#Define data generating process for the previous data's population
args_prior <- list(
  formula = y ~ 1 + weight + age + sex,
  fixed = list(weight = list(var_type = 'continuous', mean = 0, sd = 2),
               age = list(var_type = 'ordinal', levels = 30:60),
               sex = list(var_type = 'factor', levels = c('male', 'female'))),
  error = list(variance = 25),
  sample_size = 10,
  reg_weights = c(2, 0.4, -0.2, 0.5))
```
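To make the args_main definition concrete, here is a minimal base R sketch of the linear model it describes. It is not simglm's implementation: the variable names sex_female, beta, and sim_main are hypothetical, and treating age as numeric and sex as a male-reference dummy are assumptions about how the covariates enter the model.

```r
# Hypothetical base R stand-in for the data generating process in args_main.
# simglm's internal sampling and covariate coding may differ from this sketch.
set.seed(1)
n <- 15                                           # sample_size in args_main

weight <- rnorm(n, mean = 0, sd = 1)              # continuous covariate
age    <- sample(30:60, n, replace = TRUE)        # ordinal covariate, used as numeric here
sex    <- factor(sample(c("male", "female"), n, replace = TRUE),
                 levels = c("male", "female"))
sex_female <- as.numeric(sex == "female")         # assumed dummy coding, male = reference

# reg_weights from args_main: intercept, weight, age, sex
beta <- c(intercept = 2, weight = 0.3, age = -0.1, sex = 0.5)

# error variance of 25 corresponds to a standard deviation of 5
error <- rnorm(n, mean = 0, sd = sqrt(25))

y <- beta[["intercept"]] +
  beta[["weight"]] * weight +
  beta[["age"]] * age +
  beta[["sex"]] * sex_female +
  error

sim_main <- data.frame(y, weight, age, sex)
head(sim_main)
```

The previous study's population in args_prior differs only in the standard deviation of weight (2 instead of 1), the sample size (10 instead of 15), and the weight and age coefficients (0.4 and -0.2 instead of 0.3 and -0.1).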


Step 2: Use the sim_sglm_dfs function to generate 25 dataframes from the lists defined in Step 1.

```{r}
#Get dataframes
grv_out <- sim_sglm_dfs(data_m = NULL,
                        sim_args_m = args_main,
                        data_p = NULL,
                        sim_args_p = args_prior,
                        with_prior = 1,
                        frames = 25,
                        mod_name = list("mod1"))
```


grv_out is the list returned by sim_sglm_dfs. It contains the simulated dataframes (df_mod) and the data generating information for each dataframe (df_info).

```{r}
#Extract head and tail of the first dataframe
head(grv_out$df_mod[[1]])
tail(grv_out$df_mod[[1]])

#Extract data generating information for the first dataframe's main data
grv_out$df_info[[1]]$main

#Extract data generating information for the first dataframe's prior data
grv_out$df_info[[1]]$prior
```
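Each dataframe combines rows from the main data and the previous study's data. The following base R sketch illustrates that kind of binding. It is only an illustration: the helper make_df is hypothetical, the column name prior is taken from the model formula in Step 3, and the 0/1 coding and row order are assumptions rather than confirmed details of sim_sglm_dfs.

```r
# Illustrative sketch of binding previous-study data to main data.
# make_df() is a hypothetical helper; the 0/1 coding of prior is an assumption.
set.seed(2)

make_df <- function(n, weight_sd) {
  data.frame(y = rnorm(n), weight = rnorm(n, sd = weight_sd))
}

main_df  <- make_df(n = 15, weight_sd = 1)   # main study rows
prior_df <- make_df(n = 10, weight_sd = 2)   # previous study rows

main_df$prior  <- 0                          # flag main rows
prior_df$prior <- 1                          # flag previous-study rows

combined <- rbind(main_df, prior_df)         # stack the two data sets
table(combined$prior)
```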


Step 3: Use the sim_sglm_run function to fit the model specified in Step 1, extended to include a prior data indicator as a fixed effect:

y ~ 1 + weight + age + sex + prior

The model is fit to each dataframe in grv_out$df_mod using brms with default priors.

Note: To save computation time, this example uses a small number of dataframes (frames = 25 in Step 2) and
a small number of iterations (iter = 200).

```{r}
#Fit the model to the data (commented out here; the same call runs in a hidden chunk with include=FALSE for cleaner presentation)
#sglm_mod_fits <- sim_sglm_run(df_list = list(grv_out), iter = 200)
```


```{r include=FALSE}
#Fit the model to the data
sglm_mod_fits <- sim_sglm_run(df_list = list(grv_out), iter = 200)
```


sim_sglm_run returns a list of parameter estimates. Extract the estimates for the first dataframe in grv_out$df_mod as follows:

```{r}
sglm_mod_fits$fits[[1]][[1]]
```
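For intuition about the extended formula, the sketch below fits the same fixed-effects structure to one combined data set using lm from base R. This is a frequentist stand-in chosen so the example runs quickly without brms; it is not what sim_sglm_run does internally. The helper simulate_study and the 0/1 coding of prior are assumptions made for illustration.

```r
# Stand-in sketch: lm() instead of brms, on one hypothetical combined data set.
set.seed(3)

# simulate_study() is a hypothetical helper mirroring the Step 1 definitions
simulate_study <- function(n, weight_sd, beta, prior_flag) {
  weight <- rnorm(n, mean = 0, sd = weight_sd)
  age    <- sample(30:60, n, replace = TRUE)
  sex    <- factor(sample(c("male", "female"), n, replace = TRUE),
                   levels = c("male", "female"))
  y <- beta[1] + beta[2] * weight + beta[3] * age +
    beta[4] * (sex == "female") + rnorm(n, mean = 0, sd = 5)
  data.frame(y, weight, age, sex, prior = prior_flag)
}

combined <- rbind(
  simulate_study(15, weight_sd = 1, beta = c(2, 0.3, -0.1, 0.5), prior_flag = 0),
  simulate_study(10, weight_sd = 2, beta = c(2, 0.4, -0.2, 0.5), prior_flag = 1)
)

# Same fixed effects as the formula above, with the prior indicator added
fit <- lm(y ~ 1 + weight + age + sex + prior, data = combined)

coef(summary(fit))          # point estimates and standard errors
confint(fit, level = 0.95)  # interval estimates
```

In this sketch the prior coefficient absorbs an intercept shift between the two data sets, while the remaining coefficients are estimated from the pooled rows.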


Step 4: Use the sim_sglm_comp function to assess parameter estimation based on the dissimilarity between the main data's population and the previous data's population.

```{r}
#Get comparisons
sglm_comps <- sim_sglm_comp(fit_obj = sglm_mod_fits)
```


sim_sglm_comp returns an assessment of the parameter estimates in sglm_mod_fits, comparing them to the main population's parameters (the reg_weights element of args_main from Step 1):

```{r}
sglm_comps[["main"]][[1]][,c("term", "parameter", "bias", "bias_mcse",
                             "mse", "mse_mcse", "covered")]
```


The function also returns the same assessment relative to the previous population's parameters (the reg_weights element of args_prior):

```{r}
sglm_comps[["prior"]][[1]][,c("term", "parameter", "bias", "bias_mcse",
                              "mse", "mse_mcse", "covered")]
```
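The column names above correspond to standard simulation performance measures. The sketch below shows the usual textbook definitions of bias, mean squared error, their Monte Carlo standard errors, and interval coverage, computed in base R from made-up estimates. Whether sim_sglm_comp uses exactly these formulas, and whether covered is a proportion or a per-dataframe flag, is an assumption not confirmed here.

```r
# Textbook performance measures on made-up estimates (not simpriord output).
set.seed(4)

true_value <- 0.3    # e.g. the weight coefficient in args_main
n_reps     <- 25     # number of simulated dataframes

# Hypothetical point estimates and 95% interval bounds, one per dataframe
estimates  <- rnorm(n_reps, mean = 0.35, sd = 0.2)
half_width <- 1.96 * 0.2
lower      <- estimates - half_width
upper      <- estimates + half_width

bias      <- mean(estimates) - true_value
bias_mcse <- sd(estimates) / sqrt(n_reps)

sq_err    <- (estimates - true_value)^2
mse       <- mean(sq_err)
mse_mcse  <- sd(sq_err) / sqrt(n_reps)

# Proportion of intervals that contain the true value
covered   <- mean(lower <= true_value & true_value <= upper)

data.frame(bias, bias_mcse, mse, mse_mcse, covered)
```

With only 25 dataframes the Monte Carlo standard errors are large, so differences between the main and prior comparisons should be read with that uncertainty in mind.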