
Commit 97e8d9e

jeremymanning and claude committed
Add FRFR (Feature-Rich Free Recall) dataset with tutorial
- Add frfr.egg dataset: 452 subjects across 11 experimental conditions investigating how different word features affect memory organization
- Update load_example_data() to support 'frfr' dataset
- Add plot_frfr_data.py tutorial demonstrating SPC, PFR, Lag-CRP, and memory fingerprint analyses by condition and early/late lists

Features in FRFR data: category, color, location, size, firstLetter, wordLength, temporal, condition (experiment type), list_type (early/late)

Conditions: feature-rich, category, color, length, first-letter, location, size, adaptive, reduced, reduced-early, reduced-late

Reference: Heusser, Fitzpatrick & Manning (2018). bioRxiv.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 4b5d851 commit 97e8d9e
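
The commit message describes analyses split by condition and by early versus late lists. As a rough illustration of that grouping (not quail's own code), the sketch below builds the two label structures from toy inputs. The names `build_listgroup` and `conditions_by_subject` are hypothetical, and the cutoff of 8 is an assumption taken from the tutorial's description of 16 lists per subject.

```python
from collections import Counter


def build_listgroup(n_subjects, n_lists, cutoff=8):
    """Label each list 'early' or 'late' for every subject.

    Lists with a zero-based index below `cutoff` are 'early'; the rest are 'late'.
    """
    labels = ['early' if i < cutoff else 'late' for i in range(n_lists)]
    # Copy the labels so each subject owns an independent list.
    return [list(labels) for _ in range(n_subjects)]


# Hypothetical toy input: one condition label per subject.
conditions_by_subject = ['feature-rich', 'category', 'category', 'reduced']

listgroup = build_listgroup(n_subjects=len(conditions_by_subject), n_lists=16)
condition_counts = Counter(conditions_by_subject)

for condition, count in sorted(condition_counts.items()):
    print(f"{condition}: {count}")
```

Keeping one label list per subject mirrors the nested `listgroup` the tutorial builds, so the same structure could hold subject-specific splits if the cutoff ever differed between subjects.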

3 files changed

Lines changed: 163 additions & 8 deletions

examples/plot_frfr_data.py

Lines changed: 142 additions & 0 deletions
@@ -0,0 +1,142 @@
+# -*- coding: utf-8 -*-
+"""
+============================================
+Analyze Feature-Rich Free Recall (FRFR) Data
+============================================
+
+This example demonstrates analyzing the Feature-Rich Free Recall (FRFR) dataset,
+which investigates how different word features affect memory organization during
+free recall. The dataset contains 452 subjects across 11 experimental conditions,
+each varying which word features were made salient during encoding.
+
+Experimental conditions:
+- feature-rich: All features varied (color, location, category, size, etc.)
+- category: Only category information varied
+- color: Only color information varied
+- length: Only word length varied
+- first-letter: Only first letter varied
+- location: Only spatial location varied
+- size: Only semantic size varied
+- adaptive: Features adapted based on participant performance
+- reduced: Minimal feature variation
+- reduced-early: Reduced features in early lists
+- reduced-late: Reduced features in late lists
+
+Each subject studied 16 lists of 16 words. Lists 1-8 are considered "early" lists
+and lists 9-16 are considered "late" lists.
+
+We'll analyze recall performance using:
+1. Serial Position Curve (SPC) - recall probability by encoding position
+2. Probability of First Recall (PFR) - probability of recalling each position first
+3. Lag-CRP - conditional recall probability by temporal lag
+4. Memory Fingerprint - clustering by multiple features
+
+Reference:
+Heusser, A.C., Fitzpatrick, P.C., & Manning, J.R. (2018). How is experience
+transformed into memory? bioRxiv. https://doi.org/10.1101/409987
+
+"""
+
+# Code source: Contextual Dynamics Laboratory
+# License: MIT
+
+from collections import Counter
+
+import quail
+import matplotlib.pyplot as plt
+import warnings
+
+# Suppress RuntimeWarnings about empty slices
+warnings.filterwarnings('ignore', category=RuntimeWarning)
+
+# Load the FRFR dataset
+egg = quail.load_example_data('frfr')
+
+print(f"Loaded FRFR data: {egg.n_subjects} subjects, {egg.n_lists} lists, "
+      f"{egg.list_length} items per list")
+
+# Build subjgroup: map each subject to its experimental condition
+subjgroup = []
+for subj_idx in range(egg.n_subjects):
+    try:
+        sample = egg.pres.loc[(subj_idx, 0)][0]
+        if sample and 'condition' in sample:
+            subjgroup.append(sample['condition'])
+        else:
+            subjgroup.append('unknown')
+    except (KeyError, IndexError, TypeError):
+        subjgroup.append('unknown')
+
+# Count subjects per condition
+condition_counts = Counter(subjgroup)
+print("\nSubjects per condition:")
+for cond, count in sorted(condition_counts.items()):
+    print(f" {cond}: {count}")
+
+# Build per-subject listgroups: early (lists 0-7) vs late (lists 8-15)
+# Each subject has their own listgroup since we want to compare early vs late
+# within each condition
+listgroup = []
+for subj_idx in range(egg.n_subjects):
+    subj_listgroup = []
+    for list_idx in range(egg.n_lists):
+        if list_idx < 8:
+            subj_listgroup.append('early')
+        else:
+            subj_listgroup.append('late')
+    listgroup.append(subj_listgroup)
+
+# Create a listgroup for averaging all lists together (for fingerprint)
+listgroup_average = ['average'] * egg.n_lists
+
+# Create figure with 2x2 subplots
+fig, axes = plt.subplots(2, 2, figsize=(14, 12))
+
+# 1. Serial Position Curve - by condition, colored by early/late
+print("\nAnalyzing Serial Position Curves...")
+spc = egg.analyze('spc', listgroup=listgroup)
+spc.plot(ax=axes[0, 0], subjgroup=subjgroup, plot_type='subject', legend=True)
+axes[0, 0].set_title('Serial Position Curve by Condition (Early vs Late)')
+axes[0, 0].set_xlabel('Serial Position')
+axes[0, 0].set_ylabel('Recall Probability')
+axes[0, 0].set_ylim([0, 1])
+# Place a compact two-column legend in the upper-right corner of the axes
+axes[0, 0].legend(loc='upper right', fontsize=7, ncol=2)
+
+# 2. Probability of First Recall - by condition, early/late
+print("Analyzing Probability of First Recall...")
+pfr = egg.analyze('pfr', listgroup=listgroup)
+pfr.plot(ax=axes[0, 1], subjgroup=subjgroup, plot_type='subject', legend=False)
+axes[0, 1].set_title('Probability of First Recall by Condition')
+axes[0, 1].set_xlabel('Serial Position')
+axes[0, 1].set_ylabel('Probability')
+axes[0, 1].set_ylim([0, 0.25])
+
+# 3. Lag-CRP - by condition, early/late
+print("Analyzing Lag-CRP...")
+lagcrp = egg.analyze('lagcrp', listgroup=listgroup)
+lagcrp.plot(ax=axes[1, 0], subjgroup=subjgroup, plot_type='subject', legend=False)
+axes[1, 0].set_title('Lag-CRP by Condition')
+axes[1, 0].set_xlabel('Lag')
+axes[1, 0].set_ylabel('Conditional Recall Probability')
+axes[1, 0].set_xlim([-10, 10])
+axes[1, 0].axvline(x=0, color='gray', linestyle='--', alpha=0.5)
+
+# 4. Memory Fingerprint - by available features
+# Note: color and location are list-type features that require special handling
+print("Analyzing Memory Fingerprints...")
+fingerprint_features = ['category', 'size', 'wordLength', 'firstLetter', 'temporal']
+fingerprint = egg.analyze('fingerprint', features=fingerprint_features,
+                          listgroup=listgroup_average)
+fingerprint.plot(ax=axes[1, 1], subjgroup=subjgroup, plot_type='subject',
+                 title='Memory Fingerprint by Condition', ylim=[0, 1])
+axes[1, 1].set_xlabel('Feature')
+axes[1, 1].set_ylabel('Clustering Score')
+# No legend here since we already have one in SPC plot
+
+plt.tight_layout()
+plt.suptitle('Feature-Rich Free Recall (FRFR) Dataset Analysis', y=1.02, fontsize=14)
+plt.savefig('frfr_analysis.png', dpi=150, bbox_inches='tight')
+plt.show()
+
+print("\nAnalysis complete! Saved plot to frfr_analysis.png")
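
The tutorial's docstring defines SPC, PFR, and Lag-CRP only in one line each. To make those definitions concrete, here is a minimal pure-Python sketch on toy lists. It is an illustration under simplifying assumptions, not how `egg.analyze` is implemented: the function names are hypothetical, every list is assumed to have the same length with no repeated recalls, and `recall_lags` returns only the raw transition lags, without the normalization by available transitions that a full Lag-CRP requires.

```python
def serial_position_curve(presented, recalled):
    """Fraction of lists in which the item at each study position was recalled."""
    list_length = len(presented[0])
    hits = [0] * list_length
    for pres, rec in zip(presented, recalled):
        recalled_items = set(rec)
        for pos, item in enumerate(pres):
            if item in recalled_items:
                hits[pos] += 1
    return [h / len(presented) for h in hits]


def probability_of_first_recall(presented, recalled):
    """Fraction of lists in which the first recalled item came from each position."""
    list_length = len(presented[0])
    firsts = [0] * list_length
    for pres, rec in zip(presented, recalled):
        # Lists with no recalls, or an intrusion as first recall, add nothing.
        if rec and rec[0] in pres:
            firsts[pres.index(rec[0])] += 1
    return [f / len(presented) for f in firsts]


def recall_lags(pres, rec):
    """Study-position lag of each transition between successive correct recalls."""
    positions = [pres.index(item) for item in rec if item in pres]
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy data: two studied lists and the order in which items were recalled.
presented = [['cat', 'dog', 'owl', 'bee'], ['sun', 'fog', 'ice', 'dew']]
recalled = [['bee', 'cat', 'dog'], ['dew', 'ice']]

print(serial_position_curve(presented, recalled))        # [0.5, 0.5, 0.5, 1.0]
print(probability_of_first_recall(presented, recalled))  # [0.0, 0.0, 0.0, 1.0]
print(recall_lags(presented[0], recalled[0]))            # [-3, 1]
```

In the toy data both recall sequences start with the last studied item, which is why all of the first-recall probability falls on the final position.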

quail/data/frfr.egg

32.6 MB
Binary file not shown.

quail/load.py

Lines changed: 21 additions & 8 deletions
@@ -545,15 +545,25 @@ def load_example_data(dataset='automatic'):
     Conditions: LL10-2s, LL15-2s, LL20-1s, LL20-2s, LL30-1s, LL40-1s.
     Features include: item, temporal (serial position), list_length, rate, and condition.
 
+    The frfr example data contains behavioral data from a series of free recall experiments
+    investigating how different word features affect memory organization. The dataset contains
+    452 subjects across 11 experimental conditions: feature-rich (all features varied),
+    category, color, length, first-letter, location, size, adaptive, reduced, reduced-early,
+    and reduced-late. Each subject studied 16 lists of 16 words. Features include: item,
+    category, color, location, size, firstLetter, wordLength, temporal (serial position),
+    condition (experiment type), and list_type (early or late lists).
+    Reference: Heusser, A.C., Fitzpatrick, P.C., & Manning, J.R. (2018). How is experience
+    transformed into memory? bioRxiv. https://doi.org/10.1101/409987
+
     Parameters
     ----------
     dataset : str
-        The dataset to load. Can be 'automatic', 'manual', 'naturalistic', 'cmr', or 'murd62'.
-        The free recall audio recordings for the 'automatic' dataset was transcribed by Google
-        Cloud Speech and the 'manual' dataset was transcribed by humans. The 'naturalistic'
-        dataset was transcribed by humans and transformed as described above. The 'cmr'
-        dataset is from Polyn, Norman & Kahana (2009). The 'murd62' dataset is from
-        Murdock (1962).
+        The dataset to load. Can be 'automatic', 'manual', 'naturalistic', 'cmr', 'murd62',
+        or 'frfr'. The free recall audio recordings for the 'automatic' dataset was transcribed
+        by Google Cloud Speech and the 'manual' dataset was transcribed by humans. The
+        'naturalistic' dataset was transcribed by humans and transformed as described above.
+        The 'cmr' dataset is from Polyn, Norman & Kahana (2009). The 'murd62' dataset is from
+        Murdock (1962). The 'frfr' dataset is from Heusser, Fitzpatrick & Manning (2018).
 
     Returns
     ----------
@@ -562,15 +572,18 @@ def load_example_data(dataset='automatic'):
     """
 
     # can only be auto or manual
-    assert dataset in ['automatic', 'manual', 'naturalistic', 'cmr', 'murd62'], \
-        "Dataset can only be automatic, manual, naturalistic, cmr, or murd62"
+    assert dataset in ['automatic', 'manual', 'naturalistic', 'cmr', 'murd62', 'frfr'], \
+        "Dataset can only be automatic, manual, naturalistic, cmr, murd62, or frfr"
 
     if dataset == 'cmr':
         # open cmr egg (Polyn et al. 2009 data)
         egg = Egg(**joblib.load(os.path.dirname(os.path.abspath(__file__)) + '/data/cmr.egg'))
     elif dataset == 'murd62':
         # open murd62 egg (Murdock 1962 data)
         egg = Egg(**joblib.load(os.path.dirname(os.path.abspath(__file__)) + '/data/murd62.egg'))
+    elif dataset == 'frfr':
+        # open frfr egg (Heusser et al. 2018 feature-rich free recall data)
+        egg = Egg(**joblib.load(os.path.dirname(os.path.abspath(__file__)) + '/data/frfr.egg'))
     elif dataset == 'naturalistic':
         # open naturalistic egg
         egg = Egg(**joblib.load(os.path.dirname(os.path.abspath(__file__)) + '/data/' + dataset + '.egg'))
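
Each branch above differs only in the file name it passes to `joblib.load`, and the name check relies on `assert`, which Python skips when run with `-O`. One possible consolidation is sketched below. It is not part of this commit: `EXAMPLE_DATASETS` and `example_egg_path` are hypothetical names, and the sketch assumes every dataset is stored as `<name>.egg`, which the diff confirms only for cmr, murd62, frfr, and naturalistic.

```python
import os

# Dataset names accepted by load_example_data() after this commit.
EXAMPLE_DATASETS = ('automatic', 'manual', 'naturalistic', 'cmr', 'murd62', 'frfr')


def example_egg_path(dataset, data_dir):
    """Return the path to `<dataset>.egg` inside `data_dir` (hypothetical helper)."""
    if dataset not in EXAMPLE_DATASETS:
        # ValueError still fires under `python -O`, unlike an assert.
        raise ValueError(
            f"Unknown dataset {dataset!r}; expected one of: "
            f"{', '.join(EXAMPLE_DATASETS)}"
        )
    return os.path.join(data_dir, dataset + '.egg')


print(example_egg_path('frfr', 'data'))
```

With a helper like this, adding another dataset would mean extending one tuple rather than adding a new `elif` branch, provided the loading call really is identical across datasets.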
