
Commit be288dd

Add fish and wells datasets and update show_loading
Added new Quarto markdown files for the fish and wells datasets in the data directory, providing descriptions and example analyses. Updated the show_loading function in quarto_related.R to accept the dataset directly instead of its attributes, improving usability and consistency with show_code.
1 parent d418543 commit be288dd

3 files changed

Lines changed: 113 additions & 2 deletions

R/edudat/R/quarto_related.R

Lines changed: 3 additions & 2 deletions
@@ -21,7 +21,7 @@ show_code <- function(df) {
 
 #' Generate R and Python code showing how to load a dataset, which is needed in quarto documents
 #'
-#' @param attributes A list containing the attributes of the dataset, including 'download_url' and 'source'
+#' @param df The edudat dataset, including 'download_url' and 'source' in the attributes
 #' @return No return value, called for side effects. Outputs formatted R and Python code for loading the data.
 #' @export
 #' @examples
@@ -32,7 +32,8 @@ show_code <- function(df) {
 #' attributes = attributes(df)
 #' show_loading(attributes(df))
 #' ```
-show_loading <- function(attributes, show_python = FALSE) {
+show_loading <- function(df, show_python = FALSE) {
+  attributes <- attributes(df)
   r_code <- sprintf("df <- read.csv(\"%s\")", attributes$download_url)
   r_code2 <- sprintf("df <- edudat::load_data(\"%s\")", attributes$source)
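
A minimal sketch of the new calling convention, assuming only what the hunk shows: the function reads `download_url` and `source` from the attributes of the data frame it receives. `show_loading_sketch`, the example URL, and the toy data frame are hypothetical stand-ins and not part of edudat. The last call illustrates why passing `attributes(df)`, as the `@examples` context lines still do, would no longer find the fields under the new signature.

```r
# Hypothetical stand-in that mirrors the lines visible in the diff;
# the real show_loading() prints formatted code, this one just returns strings.
show_loading_sketch <- function(df, show_python = FALSE) {
  attrs <- attributes(df)
  r_code <- sprintf("df <- read.csv(\"%s\")", attrs$download_url)
  r_code2 <- sprintf("df <- edudat::load_data(\"%s\")", attrs$source)
  c(r_code, r_code2)
}

# Toy data frame carrying the two attributes the function expects (made-up values)
df <- data.frame(count = c(0, 1, 3))
attr(df, "download_url") <- "https://example.org/fish.csv"
attr(df, "source") <- "fish.csv"

# New style: pass the dataset itself
out <- show_loading_sketch(df)
print(out)

# Old style: passing the attribute list finds neither field,
# so sprintf() receives NULL and returns nothing
stale <- show_loading_sketch(attributes(df))
print(length(stale))
```

Taking the dataset itself matches `show_code(df)`, so callers pass the same object to both helpers.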

data/fish.qmd

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+---
+title: "Fish Data"
+---
+
+## Description
+The data set contains the number of fish caught by groups of campers in a state park (taken from https://stats.idre.ucla.edu/r/dae/zip/). The task is to predict the number of fish caught by a fishing party. The data set is small: 250 groups visited the state park and provided the following information:
+
+- how many people are in the group
+- the number of children in the group
+- the use of live bait
+- whether the group came with a camper to the park.
+
+```{r, echo=FALSE, eval=FALSE}
+# Import the data
+df <- read.csv("https://stats.idre.ucla.edu/stat/data/fish.csv")
+write.csv(df, "fish.csv", row.names = FALSE)
+```
+
+
+
+## Investigating the Zero-Inflated Data
+
+```{r plot_data, echo=TRUE, eval=TRUE}
+data <- edudat::load_data("fish.csv")
+plot(table(data$count), xlim=c(0, 10), col="lightblue", main="Histogram of Fish Caught", xlab="Number of Fish Caught")
+```
+
+
+## Modeling the Zero-Inflated Data
+```{r zib, echo=TRUE, eval=FALSE}
+data <- edudat::load_data("fish.csv")
+# Creating factor variables
+data <- within(data, {
+  nofish <- factor(nofish)
+  livebait <- factor(livebait)
+  camper <- factor(camper)
+})
+str(data)
+library(pscl)
+summary(m1 <- zeroinfl(count ~ child + camper | persons, data = data))
+```
+
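
What "zero-inflated" means for a count variable can be sketched in base R: compare the observed share of zeros with the share a plain Poisson model with the same mean would predict. The counts below are simulated for illustration only and are not the fish data; the mixing proportion and the Poisson rate are arbitrary assumptions.

```r
set.seed(1)

# Simulated counts, NOT the fish data:
# 60% of groups are structural zeros, the rest follow Poisson(3)
n <- 250
structural_zero <- rbinom(n, 1, 0.6)
count <- ifelse(structural_zero == 1, 0, rpois(n, 3))

# Share of zeros actually observed
observed_zero_share <- mean(count == 0)

# Share of zeros a plain Poisson with the same mean would produce
poisson_zero_share <- dpois(0, lambda = mean(count))

print(c(observed = observed_zero_share, poisson = poisson_zero_share))
```

When the observed share is clearly larger than the Poisson share, a model with a separate zero component, such as the `zeroinfl()` call in the hunk, is a natural candidate.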

data/wells.qmd

Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+---
+title: "Wells Data"
+---
+
+## Description (from rstanarm)
+A survey of 3200 residents in a small area of Bangladesh suffering from arsenic contamination of groundwater.
+Respondents with elevated arsenic levels in their wells had been encouraged to switch their water source to a safe public or private well in the nearby area (dist meters away) and the survey was conducted several years later to learn which of the affected residents had switched wells.
+
+Source: Gelman and Hill (2007)
+
+3020 obs. of 5 variables
+
+- `switch`: Indicator for well-switching
+
+- `arsenic`: Arsenic level in respondent's well
+
+- `dist`: Distance (meters) from the respondent's house to the nearest well with safe drinking water.
+
+- `assoc`: Indicator for whether member(s) of the household participate in community organizations
+
+- `educ`: Years of education (head of household)
+
+Gelman, A. and Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge, UK. https://stat.columbia.edu/~gelman/arm/
+
+
+
+
+```{r, echo=TRUE, eval=FALSE}
+# Import the data
+library(rstanarm)
+data(wells)
+?wells
+write.csv(wells, "wells.csv", row.names = FALSE)
+
+if (FALSE){
+  # More info
+  install.packages("haven")
+  library(haven)
+  d = read_dta('/Users/oli/Downloads/ARM_Data/arsenic/all.dta')
+}
+
+```
+
+
+
+## Data
+
+```{r plot_data, echo=TRUE, eval=TRUE}
+data <- edudat::load_data("wells.csv")
+summary(data)
+```
+
+```{r plot1, echo=FALSE, eval=FALSE}
+library(ggplot2)
+
+```
+
+```{r plot, echo=FALSE, eval=FALSE}
+library(ggplot2)
+ggplot(data, aes(x = dist, y = after_stat(density), fill = switch == 1)) +
+  geom_density(alpha=0.5) +
+  scale_fill_manual(values = c("gray30", "skyblue"))
+```
+So people who live close to a safe well are especially likely to switch to it.
+
+
+
+
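
The relationship between `dist` and switching that the density plot is meant to show can also be checked with a logistic regression. The sketch below uses simulated data, not the survey; the intercept and slope used to generate it are arbitrary illustrative values, and `sim` is a hypothetical stand-in for the loaded data frame.

```r
set.seed(42)

# Simulated stand-in for the wells data (NOT the survey itself):
# the probability of switching falls as the distance to a safe well grows
n <- 1000
dist <- runif(n, min = 0, max = 300)
switched <- rbinom(n, 1, plogis(0.6 - 0.006 * dist))
sim <- data.frame(switch = switched, dist = dist)

# Logistic regression of the switching indicator on distance
fit <- glm(switch ~ dist, data = sim, family = binomial())
slope <- unname(coef(fit)["dist"])
print(coef(fit))
```

A negative slope on `dist` means the estimated probability of switching decreases with distance to the nearest safe well. With the real data the same call would take `data = data` after `edudat::load_data("wells.csv")`.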
