This document tracks all changes made to the book chapters.
| file | line | text before | text after |
|------|------|-------------|------------|
| introduction/getting-started.qmd | 141 | The fact that that the estimate | The fact that the estimate |
| introduction/getting-started.qmd | 176 | with`DeclareDesign` | with `DeclareDesign` |
| declaration-diagnosis-redesign/defining-inquiry.qmd | 20 | These complex inquires | These complex inquiries |
| declaration-diagnosis-redesign/diagnosing-designs.qmd | 112 | Examples of iagnosands | Examples of diagnosands |
| declaration-diagnosis-redesign/diagnosing-designs.qmd | 345 | The mean squared error is highest on the left and lowest in the middle. | The mean squared error is highest on the left and lowest in the middle. @fig-ch10num5 visualizes this. |
| introduction/what-is-a-research-design.qmd | 233 | in 60% of our the models | in 60% of the models |
| introduction/what-is-a-research-design.qmd | 281 | intact and, but we have to work | intact, but we have to work |
| introduction/what-is-a-research-design.qmd | 320 | not sigificant | not significant |
| introduction/what-is-a-research-design.qmd | 348 | Finally with look your designs | Finally, looking at your designs, they |
declaration-diagnosis-redesign/defining-inquiry.qmd (1 addition, 1 deletion)
@@ -17,7 +17,7 @@ In this book when we talk about inquiries, we will usually be referring to singl
While most inquiries are "atomic" in this way, some inquiries are more complex than a single-number summary. For example, the best linear predictor of $Y$ given $X$ is a two-number summary: it is the pair of numbers (the slope and intercept) that minimizes the total squared distance between the line and each value of $Y$. No need to stop at two-number summaries though. We could imagine the best quadratic predictor of $Y$ given $X$ (a three-number summary), and so on. We could have an inquiry that is the full conditional expectation function of $Y$ given $X$, no matter how wiggly, nonlinear, and nuanced the shape of that function. It could in principle be a 1,000-number summary of the model, or something still more complex.
-The inquiry could be constituted by a series of interrelated questions about the model. Indeed the goal of the research may be to generate or to test a model of the world.^[Here, we are referring to an "inquiry model" not a "reference model," as discussed in @sec-ch2s1ss1. We provide an example of this type of inquiry model in @sec-ch19s2.] For instance, a researcher might articulate a handful of important questions about the model that all have to come out a certain way or the model itself should be rejected. These complex inquires are made up of a series of atomic inquiries. We're interested in the sub-inquiries only insofar as they help us understand the real inquiry---is this model of the world a good one or not?
+The inquiry could be constituted by a series of interrelated questions about the model. Indeed the goal of the research may be to generate or to test a model of the world.^[Here, we are referring to an "inquiry model" not a "reference model," as discussed in @sec-ch2s1ss1. We provide an example of this type of inquiry model in @sec-ch19s2.] For instance, a researcher might articulate a handful of important questions about the model that all have to come out a certain way or the model itself should be rejected. These complex inquiries are made up of a series of atomic inquiries. We're interested in the sub-inquiries only insofar as they help us understand the real inquiry---is this model of the world a good one or not?
declaration-diagnosis-redesign/diagnosing-designs.qmd (2 additions, 2 deletions)
@@ -109,7 +109,7 @@ As described above, a diagnostic statistic is any summary function of $a_m$ and
| Robustness | Joint probability of rejecting the null hypothesis across multiple tests ||
| Maximum proportion of subjects harmed || $\max{\Pr(\mathrm{harm})}$ |
-: Examples of iagnosands. {#tbl-diagnosands}
+: Examples of diagnosands. {#tbl-diagnosands}
@@ -342,7 +342,7 @@ Often, we need to look at several diagnosands in order to understand what might
Many research design decisions involve trading off bias and variance. In trade-off settings, we may need to accept higher variance in order to decrease bias. Likewise, we may need to accept a bit of bias in order to achieve lower variance. The trade-off is captured by mean-squared error, which is the average squared distance between $a_d$ and $a_m$. Of course, we would ideally like to have as low a mean-squared error as possible, that is, we would like to achieve low variance and low bias simultaneously.
-To illustrate, consider the following three designs as represented by three targets. The inquiry is the bullseye of the target. The data and answer strategies combine to generate a process by which arrows are shot toward the target. On the left, we have a very bad archer: even though the estimates are unbiased in the sense that they hit the bullseye "on average", very few of the arrows are on target. In the middle, we have an excellent shot: they are both on target and low variance. On the right, we have an archer who is very consistent (low variance) but biased. The mean squared error is highest on the left and lowest in the middle.
+To illustrate, consider the following three designs as represented by three targets. The inquiry is the bullseye of the target. The data and answer strategies combine to generate a process by which arrows are shot toward the target. On the left, we have a very bad archer: even though the estimates are unbiased in the sense that they hit the bullseye "on average", very few of the arrows are on target. In the middle, we have an excellent shot: they are both on target and low variance. On the right, we have an archer who is very consistent (low variance) but biased. The mean squared error is highest on the left and lowest in the middle. @fig-ch10num5 visualizes this.
{#fig-ch10num5}
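The amended passage above describes mean-squared error as the quantity that captures the bias–variance trade-off among the three archers. As a rough illustration (a Python sketch of our own, not code from the book, which uses `R` and `DeclareDesign`; the three archers and all numeric settings here are assumptions for demonstration), the decomposition MSE ≈ bias² + variance can be checked by simulation:

```python
import random

random.seed(1)
TRUE_EFFECT = 0.0  # the "bullseye": the estimand the archers aim at

def mse(estimates, truth):
    """Average squared distance between each estimate and the truth."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

# Three hypothetical archers, each taking 10,000 shots:
noisy    = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # unbiased, high variance
accurate = [random.gauss(0.0, 0.1) for _ in range(10_000)]  # unbiased, low variance
biased   = [random.gauss(0.5, 0.1) for _ in range(10_000)]  # biased, low variance

print(mse(noisy, TRUE_EFFECT))     # ~1.00: variance dominates
print(mse(accurate, TRUE_EFFECT))  # ~0.01: low bias and low variance
print(mse(biased, TRUE_EFFECT))    # ~0.26: bias^2 (0.25) + variance (0.01)
```

The ordering matches the targets in the figure: MSE is highest for the unbiased but high-variance archer on the left and lowest for the accurate archer in the middle, while the consistent-but-biased archer on the right lands in between.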
introduction/getting-started.qmd (2 additions, 2 deletions)
@@ -138,7 +138,7 @@ diagnosis_4.1 |>
:::
-The output of the diagnosis includes the diagnosand values (top row), such as bias of $-0.02$, and our uncertainty about the diagnosand value (bootstrapped standard error in parentheses in the bottom row). The uncertainty estimates tell us whether we have conducted enough simulations to precisely estimate the diagnosands. The fact that that the estimate of bias is $-0.02$ and the standard error is $0.03$ means that we cannot distinguish the amount of bias from no bias at all.
+The output of the diagnosis includes the diagnosand values (top row), such as bias of $-0.02$, and our uncertainty about the diagnosand value (bootstrapped standard error in parentheses in the bottom row). The uncertainty estimates tell us whether we have conducted enough simulations to precisely estimate the diagnosands. The fact that the estimate of bias is $-0.02$ and the standard error is $0.03$ means that we cannot distinguish the amount of bias from no bias at all.
## Redesign {#redesign-function}
@@ -173,7 +173,7 @@ block_cluster_design <-
## Long term code usability
-We have written the code examples with`DeclareDesign` version 1.0.0, the package version we released along with the book. We are committed to the long-term maintenance of this software, but inevitably, the evolution of the `R` ecosystem and further package development will mean that some of the printed code will break in the future.
+We have written the code examples with `DeclareDesign` version 1.0.0, the package version we released along with the book. We are committed to the long-term maintenance of this software, but inevitably, the evolution of the `R` ecosystem and further package development will mean that some of the printed code will break in the future.
However, even if code eventually becomes obsolete, a virtue of writing out designs in code is that they are explicit: the entire design is encoded in the declarations we provide. Even if the code itself won't run, you can still use it to understand the design and to draw insights from the diagnosis.
introduction/what-is-a-research-design.qmd (4 additions, 4 deletions)
@@ -230,7 +230,7 @@ The two designs cost the same but differ on the empirical side. Which strategy s
\index{Assignment!Complete random assignment}
-**M*: We first define a model that stipulates a set of 18,000 units representing each officer and an unknown treatment effect of the training lying somewhere between 0 and 0.5. This range of possible effects implies in 60% of our the models we consider, the true effect is above our threshold for a program worth implementing, 0.2. Outcomes for each individual depend on their past infractions against citizens (their history). The importance of history is captured by the parameter `b`. We don't know how important the history variable is, so we will simulate over a plausible range for `b`. *M* here is a *set* of models as each "run" of the model will presuppose a different treatment effect for all subjects as well as distinct outcomes for all individuals.
+**M*: We first define a model that stipulates a set of 18,000 units representing each officer and an unknown treatment effect of the training lying somewhere between 0 and 0.5. This range of possible effects implies in 60% of the models we consider, the true effect is above our threshold for a program worth implementing, 0.2. Outcomes for each individual depend on their past infractions against citizens (their history). The importance of history is captured by the parameter `b`. We don't know how important the history variable is, so we will simulate over a plausible range for `b`. *M* here is a *set* of models as each "run" of the model will presuppose a different treatment effect for all subjects as well as distinct outcomes for all individuals.
**I*: The inquiry is the difference between the average treated outcome and the average untreated outcome, which correspond to the average treatment effect. We are writing it this way to highlight the similarity between the inquiry and the difference-in-means answer strategy that we will adopt.
@@ -278,7 +278,7 @@ program_diagnosands <-
The alternative design that differs on the empirical side in three ways. First fewer subjects are sampled. Second, information about the subjects' background (their "history") is used to implement a block randomization that conditions assignment on history. Third the subjects' history is taken into account in the analysis. This last choice is an instance of adjusting the answer strategy in light of a change to the data strategy.
-In @def-ch2num2, we can leave the model and inquiry intact and, but we have to work on the data and answer strategies.
+In @def-ch2num2, we can leave the model and inquiry intact, but we have to work on the data and answer strategies.
::: {#def-ch2num2 .declaration}
@@ -317,7 +317,7 @@ The results are shown in @fig-ch2num2.
When background factors don't make much of a difference for the social norms outcome, the first design outperforms the second: after all, the first design has a sample size of 150 compared with the second design's 100. We're successful over 30% of the time when using the first design, compared with about 25% when using the second. These rates seem low, but recall the treatment effect variation we built into the model implies that the program is *worth* implementing only 60% of the time, because the other 40% of the time, the true effects are smaller than 0.2.
-As subject history has a bigger impact on the outcome variable, however, the first design does worse and worse. In essence, the additional variation due to background factors makes it more difficult to separate signal from noise, making it more likely that our estimates are not sigificant and therefore more likely that we decline to implement the program.
+As subject history has a bigger impact on the outcome variable, however, the first design does worse and worse. In essence, the additional variation due to background factors makes it more difficult to separate signal from noise, making it more likely that our estimates are not significant and therefore more likely that we decline to implement the program.
Here is where the smaller design that blocks on subject history shines: this variation is conditioned on in two places, in the assignment strategy and in the estimator. The result is a more precise procedure that is better able to separate signal from noise. Ultimately, the blocked design has the same success rate regardless of the importance of the background factors.
@@ -345,7 +345,7 @@ Planning entails some or all of the following steps, depending on the design: co
Realization is the phase of research in which all those plans are executed. We implement the data strategy in order to gather information from the world. Once that's done, we follow the answer strategy in order to finally generate answers to the inquiry. Of course, that's only if things go exactly according to plan, which they never do. Survey questions don't work as we imagine, partner organizations lose interest in our study, subjects move or become otherwise unreachable. A critic or a reviewer may insist we change our answer strategy, or may think a different inquiry altogether is theoretically more appropriate. We may ourselves change how we think of the design as we embark on writing up the research project. It is likely that some features of *MIDA* will change during the realization phase in which case you can again use diagnosis to assess whether changes to *MIDA* are for good or for bad. Some design changes have very bad properties, like sifting through the data ex-post, finding a statistically significant result, then back-fitting a new *I* to match the new *A*. Indeed, if we declare and diagnose this actual answer strategy (sifting through data ex-post), we can show through design diagnosis that it yields misleading answers. Other changes made along the way may help the design quite a bit. If the planned design did not include covariate adjustment, but a friendly critic suggests adjusting for the pretreatment measure of the outcome, the "standard error" diagnosand might drop nicely. The point here is that design changes during the implementation process, whether necessitated by unforeseen logistical constraints or required by the review process, can be understood in terms of *M*, *I*, *D*, and *A* by reconciling the planned design with the design as implemented.
-A happy realization phase concludes with the publication of results. But the research design lifecycle is not finished: the study and its results should be integrated into the broader community of scientists, decision-makers, and the public. Studies should be archived, along with design information, to prepare for reanalysis. Future scholars may well want to reanalyze your data in order to learn more than is represented in the published article or book. Good reanalysis of study data requires a full understanding of the design as implemented, so archiving design information along with code and data is critical. Not only may your design be reanalyzed, it may also be replicated with fresh data. Ensuring that replication studies answer the same theoretical questions as original studies requires explicit design information without which replicators and original study authors may simply talk past one another. Indeed, as our studies are integrated into the scientific literature and beyond, we should anticipate disagreement over our claims. Resolving disputes is very difficult if parties do not share a common understanding of the research design. We might also anticipate that our results will be formally synthesized with others' work via meta-analysis. Meta-analysts need design information in order to be sure they aren't inappropriately mixing together studies that ask different questions or answer them too poorly to be of use. Finally with look your designs will be a model for others. Having an analytically complete representation of your design at hand will make it that much easier to use redesign to build on what you have done.
+A happy realization phase concludes with the publication of results. But the research design lifecycle is not finished: the study and its results should be integrated into the broader community of scientists, decision-makers, and the public. Studies should be archived, along with design information, to prepare for reanalysis. Future scholars may well want to reanalyze your data in order to learn more than is represented in the published article or book. Good reanalysis of study data requires a full understanding of the design as implemented, so archiving design information along with code and data is critical. Not only may your design be reanalyzed, it may also be replicated with fresh data. Ensuring that replication studies answer the same theoretical questions as original studies requires explicit design information without which replicators and original study authors may simply talk past one another. Indeed, as our studies are integrated into the scientific literature and beyond, we should anticipate disagreement over our claims. Resolving disputes is very difficult if parties do not share a common understanding of the research design. We might also anticipate that our results will be formally synthesized with others' work via meta-analysis. Meta-analysts need design information in order to be sure they aren't inappropriately mixing together studies that ask different questions or answer them too poorly to be of use. Finally, looking at your designs, they will be a model for others. Having an analytically complete representation of your design at hand will make it that much easier to use redesign to build on what you have done.