---
title: "Getting Started"
---

IRW datasets are hosted on [Redivis](https://redivis.com), a platform for academic data sharing and analysis. You can access IRW data in several ways, whether you're browsing, downloading tables, or working programmatically in R or Python.

NOTE: While you can explore datasets without logging in, a Redivis account is required to download data or use programmatic tools. You can create an account [here](https://redivis.com/?createAccount).

The sections below walk through each access option and include example code to help you get started.

## Option 1: Browse and Download from the Redivis Web Interface

If you're just looking to explore or manually download individual datasets, you can:

-   View datasets through the [IRW Data Browser](data.qmd)
-   Download tables as CSV files from the [Redivis web interface](https://redivis.com/datasets/as2e-cv7jb41fd/tables) (Redivis login required)

This is the simplest way to get started if you don’t need programmatic access.
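
Once a table has been downloaded, it can be loaded like any other CSV file. The sketch below assumes the file uses the long-format columns `id`, `item`, and `resp` that the analysis examples later on this page rely on. The helper name `load_irw_csv` is hypothetical, and the in-memory CSV is made-up data standing in for a real download.

```python
import io

import pandas as pd

# Assumed from the analysis examples on this page; real tables may carry extra columns
REQUIRED_COLUMNS = {"id", "item", "resp"}


def load_irw_csv(path_or_buffer):
    """Read a downloaded table and check for the expected long-format columns."""
    df = pd.read_csv(path_or_buffer)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df


# Tiny made-up CSV standing in for a real download
sample_csv = io.StringIO("id,item,resp\n1,a,1\n1,b,0\n2,a,1\n")
sample = load_irw_csv(sample_csv)
print(sample.head())
```

For a real file, pass the path of the downloaded CSV in place of the `io.StringIO` buffer.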

## Option 2: Use the `irw` packages for R or Python (Recommended)

We recommend the `irw` packages for R or Python if you want a simple, streamlined way to access IRW data.

You can find setup instructions and function references for the [R package](https://itemresponsewarehouse.github.io/Rpkg/) and the [Python package](https://github.com/itemresponsewarehouse/Python-pkg).

On first use, you’ll be prompted to log in with your Redivis account and grant access via OAuth (or use an API token). This authentication step is required once per session to securely connect to Redivis. For details, see the "Redivis Authentication" section in the package documentation.

### Setup & Example Usage

::: panel-tabset
## R

```{r}
#| eval: false
#| echo: true
# Install package with `devtools::install_github("itemresponsewarehouse/Rpkg")`
library(irw)

irw_info()                  # Overview of the IRW
irw_list_tables()          # List available tables
irw_filter(var = "rt")     # Search for tables with a specific variable
df <- irw_fetch("4thgrade_math_sirt")
```

## Python

```{python}
#| eval: false
#| echo: true
#| python.reticulate: false
# Install package with `pip install "git+https://github.com/itemresponsewarehouse/Python-pkg.git"`
import irw

irw.info()                  # Overview of the IRW
irw.list_tables()          # List available tables
irw.filter(var = "rt")     # Search for tables with a specific variable
df = irw.fetch("4thgrade_math_sirt")
```
:::

## Option 3: Use Redivis Client Libraries (R or Python)

If you prefer not to use the `irw` packages or want lower-level access to Redivis features, you can use the official Redivis client libraries. These are available for both [R](https://apidocs.redivis.com/client-libraries/redivis-r) and [Python](https://apidocs.redivis.com/client-libraries/redivis-python).

### Example access to IRW with Redivis Client Libraries

::: panel-tabset
## R (redivis-r)

```{r}
#| eval: false
#| echo: true
# first install redivis R package with `devtools::install_github("redivis/redivis-r", ref="main")`
library("redivis")

dataset <- redivis::user("datapages")$dataset("item_response_warehouse") # connect to IRW
df <- dataset$table("4thgrade_math_sirt")$to_tibble() # download data
```

## Python (redivis-python)

```{python}
#| eval: false
#| echo: true
#| python.reticulate: false
# first install redivis Python package with `pip install --upgrade redivis`
import redivis

dataset = redivis.user('datapages').dataset('item_response_warehouse') # connect to IRW
df = dataset.table('4thgrade_math_sirt').to_pandas_dataframe() # download data
```
:::

### How to use the Redivis Client Libraries

There are two main ways to use the Redivis client libraries:

1.  Use a Redivis Notebook

    -   Redivis notebooks come preloaded with the latest client library, so no installation or authentication is required. This makes them ideal for first-time users or lightweight workflows.
    -   We also provide example IRW workflows in Redivis notebooks [here](https://redivis.com/workspace/studies/1812/workflows).

2.  Use in Other Environments (e.g., RStudio, Jupyter, Colab)

    -   Install the appropriate client library (see the installation comments in the example code above).
    -   Authenticate with your Redivis account. First-time use prompts a browser login via OAuth; for long-running jobs, you can use an API token instead (see [here](https://apidocs.redivis.com/rest-api/authorization) for how to generate and set up a token).
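
For non-interactive jobs, it can help to check for a token before making any Redivis call. The sketch below assumes the client library reads the token from the `REDIVIS_API_TOKEN` environment variable; confirm this against the authorization documentation linked above. The helper name `has_redivis_token` is hypothetical. The token itself should be set in your shell or job scheduler, not written into code.

```python
import os


def has_redivis_token(env=None):
    """Return True if a non-empty REDIVIS_API_TOKEN is present.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    return bool(env.get("REDIVIS_API_TOKEN", "").strip())


if has_redivis_token():
    print("REDIVIS_API_TOKEN is set.")
else:
    print("REDIVIS_API_TOKEN is not set; expect a browser login on first use.")
```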

For more detailed setup and usage examples, see the full Redivis R and Python client documentation here:

-   [Redivis R Client documentation](https://apidocs.redivis.com/client-libraries/redivis-r)
-   [Redivis Python Client documentation](https://apidocs.redivis.com/client-libraries/redivis-python)

# Analysis of IRW data

The examples below show how to work with IRW data. Each code block imports several tables from the IRW and computes simple summary statistics (e.g., the number of responses, participants, and items). They should be a useful starting point for your own analyses.

### A first analysis

::: panel-tabset
## R

```{r}
#| echo: true

library(irw)
library(dplyr)

# Summarize one long-format IRW table (columns: id, item, resp)
compute_metadata <- function(df) {
  # Drop missing responses and make sure resp is numeric
  df <- df |> filter(!is.na(resp)) |> mutate(resp = as.numeric(resp))
  tibble(
    n_responses = nrow(df),
    n_categories = n_distinct(df$resp),
    n_participants = n_distinct(df$id),
    n_items = n_distinct(df$item),
    responses_per_participant = n_responses / n_participants,
    responses_per_item = n_responses / n_items,
    # Equivalent to n_responses / (n_participants * n_items)
    density = (sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)
  )
}

dataset_names <- c("4thgrade_math_sirt", "chess_lnirt", "dd_rotation")
tables <- irw::irw_fetch(dataset_names)  # fetch the listed tables
summaries_list <- lapply(tables, compute_metadata)
summaries <- bind_rows(summaries_list)
summaries <- cbind(table = dataset_names, summaries)  # label each row with its table name
summaries
```

## Python

```{python}
#| eval: false
#| echo: true
#| python.reticulate: false

import pandas as pd
from math import sqrt
import redivis

dataset_names = ["4thgrade_math_sirt", "chess_lnirt", "dd_rotation"]

def compute_metadata(df):
    """Summarize one long-format IRW table (columns: id, item, resp)."""
    # Drop missing responses, then convert resp to numeric on the filtered rows
    df = (df
          .loc[~df['resp'].isna()]
          .assign(resp=lambda d: pd.to_numeric(d['resp']))
         )
    n_responses = len(df)
    n_participants = df['id'].nunique()
    n_items = df['item'].nunique()

    return pd.DataFrame({
        'n_responses': [n_responses],
        'n_categories': [df['resp'].nunique()],
        'n_participants': [n_participants],
        'n_items': [n_items],
        'responses_per_participant': [n_responses / n_participants],
        'responses_per_item': [n_responses / n_items],
        # Equivalent to n_responses / (n_participants * n_items)
        'density': [(sqrt(n_responses) / n_participants) * (sqrt(n_responses) / n_items)]
    })

# Connect to the IRW dataset on Redivis
dataset = redivis.user('datapages').dataset('item_response_warehouse')

def get_data_summary(dataset_name):
    """Download one table and return its one-row summary."""
    df = dataset.table(dataset_name).to_pandas_dataframe()
    summary = compute_metadata(df)
    summary.insert(0, 'dataset_name', dataset_name)
    return summary

summaries_list = [get_data_summary(name) for name in dataset_names]
summaries = pd.concat(summaries_list, ignore_index=True)
print(summaries)
```
:::
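
To see what these summaries measure without a Redivis login, the same calculations can be run on a small made-up table. The sketch below uses invented responses (not IRW data) and a hypothetical helper, `summarize_responses`. It writes `density` in the simplified form `n_responses / (n_participants * n_items)`, which is algebraically the same as the square-root expression above: the share of the participant-by-item grid that is observed.

```python
import pandas as pd


def summarize_responses(df):
    """Compute the summary statistics above for one long-format table."""
    df = df.dropna(subset=["resp"])  # ignore missing responses
    n_responses = len(df)
    n_participants = df["id"].nunique()
    n_items = df["item"].nunique()
    return {
        "n_responses": n_responses,
        "n_categories": df["resp"].nunique(),
        "n_participants": n_participants,
        "n_items": n_items,
        "responses_per_participant": n_responses / n_participants,
        "responses_per_item": n_responses / n_items,
        "density": n_responses / (n_participants * n_items),
    }


# Invented data: 3 participants, 2 items, one missing response
toy = pd.DataFrame({
    "id": [1, 1, 2, 2, 3, 3],
    "item": ["a", "b", "a", "b", "a", "b"],
    "resp": [1, 0, 1, None, 0, 1],
})
toy_summary = summarize_responses(toy)
print(toy_summary)
```

Here five of the six possible participant-item pairs are observed, so `density` is 5/6.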