Final Project: Framing in U.S. State of the Union Speeches

DSC 161 - Text as Data
Author: Ho Lim

Project Overview

This project analyzes how U.S. presidents frame key policy domains in State of the Union (SOTU) speeches from 1790 to 2015. The analysis focuses on five main frames:

Economy: growth, jobs, taxes, trade, inflation
Security: defense, war, terrorism, foreign policy
Welfare: healthcare, education, inequality, social programs
Governance: budget, deficit, government performance, institutional reform
Values: democracy, rights, freedom, faith, family, moral rhetoric

Research Questions

How do U.S. presidents frame the nation's economy, security, welfare, governance, and values in SOTU speeches?
How does framing differ by party (Democrat vs Republican)?
How has framing changed over time (from Washington to Obama)?

Project Structure

nlp-clustering-us-speech/
├── data/
│   ├── SOTU_WithText.csv                    # Raw SOTU data (236 speeches)
│   └── ps2_q4_handcoding_sample_labeled.csv # Hand-coded labels (20 speeches)
├── R/
│   ├── 00_sample_for_handcoding.R          # (Optional) Sample speeches for additional hand-coding
│   ├── 00_run_all.R                         # Master script (runs all steps)
│   ├── 01_load_and_segment.R                # Load data, optional paragraph segmentation
│   ├── 02_preprocess_dfm.R                  # Text preprocessing with quanteda
│   ├── 03_lda_exploration.R                 # LDA topic modeling (exploratory)
│   ├── 04_handcoding_join.R                 # Join hand-coded frame labels
│   ├── 05_frame_model_or_rules.R            # Assign frames using dictionary
│   ├── 06_descriptive_plots.R               # Create visualizations
│   ├── 07_results_analysis.R                # Additional analysis (era, presidents, wartime)
│   └── 08_focused_analyses.R                # Focused analyses (modern era, key presidents)
├── results/
│   ├── figures/                              # All plots and visualizations (8 PNG files)
│   └── tables/                               # Summary statistics and tables (14 CSV files)
└── README.md                                 # This file

Note: Large RDS files (processed data) are excluded from git via .gitignore. Run the scripts to regenerate them.

Methodology

Data

Source: SOTU_WithText.csv (236 speeches, 1790-2015)
Variables: president, year, party, sotu_type, text

Preprocessing

Tokenization (remove punctuation and numbers)
Lowercase conversion
Stopword removal (English)
Feature trimming (keep words in ≥5% of documents)

Frame Assignment

Method: Dictionary-based assignment
Dictionary: Keywords for each of the 5 frame categories
Assignment rule: Frame with highest keyword count (ties → NA)

Validation

Hand-coded sample: 20 speeches from PS2 (minimum)
Optional: Additional hand-coding recommended (30-80 more speeches)
Used to validate dictionary-based assignment
Can train supervised classifier if ≥50 hand-coded speeches available

Running the Analysis

Option 0: Interactive Analysis (R Markdown)

For an interactive analysis with code and results together:

# In RStudio, open and knit:
# analysis.Rmd

This will generate an HTML report with all analysis steps and visualizations.

Step 0: (Optional) Additional Hand-Coding

If you want to add more hand-coded speeches for better validation:

source("R/00_sample_for_handcoding.R")

This will create CSV files with sampled speeches. Open them in Excel/Google Sheets, assign frame_main to each speech, and save as *_labeled.csv. The 04_handcoding_join.R script will automatically pick them up.

Recommended: Code 30-80 additional speeches (total 50-100) for better validation.

Option 1: Run all steps at once

source("R/00_run_all.R")

Option 2: Run steps individually

source("R/01_load_and_segment.R")
source("R/02_preprocess_dfm.R")
source("R/03_lda_exploration.R")
source("R/04_handcoding_join.R")      # Uses PS2 labels + any additional labels
source("R/05_frame_model_or_rules.R")
source("R/06_descriptive_plots.R")
source("R/07_results_analysis.R")     # (Optional) Additional analysis
source("R/08_focused_analyses.R")     # (Optional) Focused analyses

Required Packages

install.packages(c(
  "tidyverse",
  "quanteda",
  "topicmodels",
  "scales"  # For percentage formatting in plots
))

Results

Key Visualizations

Frame Distribution Over Time: Stacked area chart showing frame proportions across all years
Economy Frame by Party: Time series comparing Democratic vs Republican emphasis on economy
Frame Distribution by Party: Bar chart comparing frame usage across parties
Security vs Economy Over Time: Line chart tracking these two key frames (focus on smoothed trend line)
Frame Heatmap by Decade: Heatmap showing frame distribution across decades
Modern Era Analysis (1990-2015): Party differences in recent decades
Key Presidents Comparison: Individual framing strategies of modern presidents

Output Files

results/figures/: All visualization PNG files
results/tables/: CSV files with summary statistics
data/: Processed RDS files for intermediate steps

Notes

This project builds on work from PS1 (LDA topic modeling) and PS2 (frame definition and hand-coding)
Current implementation uses speech-level analysis (can be extended to paragraph-level)
Dictionary-based assignment is a simple rule-based method; supervised classification could improve accuracy

Future Improvements

Expand hand-coded sample (currently 20 speeches)
Train supervised classifier (e.g., Naive Bayes, Logistic Regression)
Move to paragraph-level analysis for finer granularity
Implement multi-label frame assignment (allow multiple frames per speech)
Refine dictionary keywords based on validation results

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
R		R
data		data
docs		docs
results		results
.gitignore		.gitignore
GITHUB_UPLOAD.md		GITHUB_UPLOAD.md
README.md		README.md
RUN_STRATEGIC_ANALYSIS.R		RUN_STRATEGIC_ANALYSIS.R
analysis.Rmd		analysis.Rmd
create_presentation_slides.R		create_presentation_slides.R
presentation.Rmd		presentation.Rmd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Final Project: Framing in U.S. State of the Union Speeches

Project Overview

Research Questions

Project Structure

Methodology

Data

Preprocessing

Frame Assignment

Validation

Running the Analysis

Option 0: Interactive Analysis (R Markdown)

Step 0: (Optional) Additional Hand-Coding

Option 1: Run all steps at once

Option 2: Run steps individually

Required Packages

Results

Key Visualizations

Output Files

Notes

Future Improvements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Final Project: Framing in U.S. State of the Union Speeches

Project Overview

Research Questions

Project Structure

Methodology

Data

Preprocessing

Frame Assignment

Validation

Running the Analysis

Option 0: Interactive Analysis (R Markdown)

Step 0: (Optional) Additional Hand-Coding

Option 1: Run all steps at once

Option 2: Run steps individually

Required Packages

Results

Key Visualizations

Output Files

Notes

Future Improvements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages