RMACC Visualization Project

Overview

Interactive US map visualization showing RMACC (Rocky Mountain Advanced Computing Consortium) member institutions with an overlay of cross-institutional NSF OAC grants between them.

Project Files

rmacc_nsf_collector.py

Python script (stdlib only, Python 3.7+) that queries the NSF Awards API for OAC-related grants at all 41 RMACC member institutions.

Usage:

python3 rmacc_nsf_collector.py --export          # Collect + export JSON
python3 rmacc_nsf_collector.py --export-only      # Just export from existing DB
python3 rmacc_nsf_collector.py --db custom.db     # Custom DB path

What it does:

  1. Initializes SQLite DB with RMACC institution data (41 members with lat/lon coordinates)
  2. Queries NSF Awards API for each institution using OAC-related keywords: "cyberinfrastructure", "high-performance computing", "HPC"
  3. Resolves NSF awardeeName variants to RMACC abbreviations via NAME_ALIASES dict
  4. Identifies "Collaborative Research:" grants shared across institutions by matching normalized title suffixes
  5. Exports rmacc_grants.json for the visualization
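
Steps 3 and 4 above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the alias entries, the "CU Boulder" abbreviation, the `title` field name, and the helper names `resolve_abbr`, `collab_key`, and `group_collaborations` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical excerpt; the real NAME_ALIASES dict in the script is larger
# and its keys and abbreviations may differ.
NAME_ALIASES = {
    "university of colorado at boulder": "CU Boulder",
    "university of colorado boulder": "CU Boulder",
    "idaho national laboratory": "INL",
    "battelle energy alliance": "INL",
}

COLLAB_PREFIX = re.compile(r"^\s*collaborative research:\s*", re.IGNORECASE)


def resolve_abbr(awardee_name):
    """Return the RMACC abbreviation for an NSF awardeeName, or None if unknown."""
    key = re.sub(r"\s+", " ", awardee_name).strip().lower()
    return NAME_ALIASES.get(key)


def collab_key(title):
    """Return a normalized title suffix, or None if the title is not collaborative."""
    if not COLLAB_PREFIX.match(title):
        return None
    suffix = COLLAB_PREFIX.sub("", title)
    # Lowercase and collapse punctuation and whitespace so small variations match.
    return re.sub(r"[^a-z0-9]+", " ", suffix.lower()).strip()


def group_collaborations(awards):
    """Map each collaborative project key to the sorted RMACC members on it."""
    groups = defaultdict(set)
    for award in awards:
        key = collab_key(award["title"])
        abbr = resolve_abbr(award["awardeeName"])
        if key and abbr:
            groups[key].add(abbr)
    # Keep only projects shared by two or more distinct institutions.
    return {k: sorted(v) for k, v in groups.items() if len(v) >= 2}
```

Awards whose awardeeName does not resolve are silently skipped in this sketch; logging them instead would make missing NAME_ALIASES entries easier to spot.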

Database schema:

  • institutions — 41 RMACC members with is_rmacc flag (extensible for non-RMACC partners)
  • grants — with agency, office, program columns (extensible beyond NSF OAC)
  • grant_institutions — junction table with role column (awardee vs collaborator)
  • v_cross_institutional — view auto-identifying grants shared by 2+ RMACC members
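
For orientation, the schema described above might look something like the following. Only the table names, the view name, and the is_rmacc, agency, office, program, and role columns come from this README; every other column name, type, and default is an assumption, as is the `create_db` helper.

```python
import sqlite3

# Sketch only: columns other than is_rmacc, agency, office, program, and role
# are guesses and may not match the real database.
SCHEMA = """
CREATE TABLE institutions (
    id       INTEGER PRIMARY KEY,
    abbr     TEXT UNIQUE NOT NULL,
    name     TEXT NOT NULL,
    lat      REAL,
    lon      REAL,
    is_rmacc INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE grants (
    id       INTEGER PRIMARY KEY,
    award_id TEXT UNIQUE NOT NULL,
    title    TEXT NOT NULL,
    agency   TEXT DEFAULT 'NSF',
    office   TEXT DEFAULT 'OAC',
    program  TEXT
);
CREATE TABLE grant_institutions (
    grant_id       INTEGER NOT NULL REFERENCES grants(id),
    institution_id INTEGER NOT NULL REFERENCES institutions(id),
    role           TEXT NOT NULL DEFAULT 'awardee',
    PRIMARY KEY (grant_id, institution_id)
);
-- Grants linked to two or more RMACC members.
CREATE VIEW v_cross_institutional AS
SELECT g.id AS grant_id,
       g.title,
       COUNT(DISTINCT i.id) AS member_count,
       GROUP_CONCAT(DISTINCT i.name) AS members
FROM grants g
JOIN grant_institutions gi ON gi.grant_id = g.id
JOIN institutions i ON i.id = gi.institution_id
WHERE i.is_rmacc = 1
GROUP BY g.id
HAVING COUNT(DISTINCT i.id) >= 2;
"""


def create_db(path=":memory:"):
    """Create the sketch schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Because the view filters on is_rmacc, a grant shared between one RMACC member and one non-RMACC partner would not appear in it.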

Known issues (needs debugging):

  • Script has errors when run locally — needs troubleshooting
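  • GROUP_CONCAT(DISTINCT i.name) was fixed previously: SQLite rejects a custom separator argument when DISTINCT is used (a DISTINCT aggregate must take exactly one argument), so the call relies on the default comma separator. Verify that this fix is still in place and still correct.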
  • NSF API fundProgramName=OAC returns zero results, so the script uses keyword-based search instead, which is the correct approach
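
To check the GROUP_CONCAT behaviour in isolation, a small in-memory experiment like the one below may help. The table `t` and its rows are made up for the test. It shows the one-argument DISTINCT form and one possible workaround (deduplicating in a subquery) if a custom separator is ever needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (grant_id INTEGER, name TEXT);
INSERT INTO t VALUES (1, 'A'), (1, 'A'), (1, 'B'), (2, 'C');
""")

# One-argument form: DISTINCT is accepted and the separator is the default ",".
default_sep = conn.execute(
    "SELECT grant_id, GROUP_CONCAT(DISTINCT name) "
    "FROM t GROUP BY grant_id ORDER BY grant_id"
).fetchall()

# Workaround for a custom separator: remove duplicates in a subquery first,
# then aggregate without DISTINCT so the second argument is allowed.
custom_sep = conn.execute("""
    SELECT grant_id, GROUP_CONCAT(name, ' | ')
    FROM (SELECT DISTINCT grant_id, name FROM t)
    GROUP BY grant_id
    ORDER BY grant_id
""").fetchall()

print(default_sep)
print(custom_sep)
```

SQLite does not guarantee the order of the concatenated values in either form, so downstream code should not depend on it.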

rmacc_members_map.html

Interactive D3.js + TopoJSON visualization. Currently uses hardcoded grant data (16 grants found via web search).

Features:

  • D3 US map with all 41 RMACC members plotted
  • Color-coded dots by institution type (R1, University, Community College, Federal Lab, etc.)
  • Curved arc lines connecting institutions that share grants
  • Sidebar with clickable grant cards that highlight connections on the map
  • Filter buttons: All Grants, CC* Awards, OAC Core/CRII, Multi-Institution Only
  • Legend with toggleable institution types
  • Collision nudging for the dense Denver metro cluster
  • Hover tooltips on dots and grant lines

Key design decisions:

  • Columbia College is the Denver campus (lat: 39.7400, lon: -104.9870, state: "CO"), NOT Columbia, Missouri
  • Dark theme (#0a0e1a background)
  • Uses CDN-hosted D3 v7.8.5 and TopoJSON v3.0.2

rmacc_grants.db

SQLite database (generated by the collector script). May be empty or partially populated depending on script run state.

rmacc_grants.json (not yet generated)

JSON export from the collector script. Will contain institutions, grants, cross_institutional, and summary sections.

Next Steps

  1. Fix rmacc_nsf_collector.py errors — debug and get it running cleanly from terminal
  2. Run the collector: python3 rmacc_nsf_collector.py --export to populate DB and generate JSON
  3. Update rmacc_members_map.html to load from rmacc_grants.json instead of hardcoded grant data:
    • Replace the hardcoded grants array with a fetch('rmacc_grants.json') call
    • Map the JSON structure (each grant carries an institutions array of objects with abbr, name, and role) to the current format (each grant carries a flat institutions array of abbreviation strings)
    • Update stats bar to use summary from JSON
    • The members array can stay hardcoded (or be loaded from JSON institutions)
    • Note: loading from JSON requires serving the page over HTTP, because browsers generally block fetch() on file:// URLs. Consider documenting a simple python3 -m http.server command, or embedding the JSON inline
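
The structure mapping in step 3 could also be done on the Python side at export time rather than in the page. The sketch below assumes the export has a top-level `grants` list whose entries each hold an `institutions` list of objects with an `abbr` field, as described above; the function names and the `grants` JavaScript variable name are hypothetical.

```python
import json


def flatten_grant(grant):
    """Return a copy of one exported grant with institutions as abbreviation strings."""
    flat = dict(grant)
    flat["institutions"] = [inst["abbr"] for inst in grant.get("institutions", [])]
    return flat


def flatten_export(export):
    """Convert every grant in the export to the flat format the map currently uses."""
    return [flatten_grant(g) for g in export.get("grants", [])]


def to_inline_js(export, var_name="grants"):
    """Render the flattened grants as a JavaScript declaration for inline embedding."""
    payload = json.dumps(flatten_export(export), indent=2)
    # Escape "</" so the payload cannot close an enclosing <script> tag early.
    payload = payload.replace("</", "<\\/")
    return f"const {var_name} = {payload};"
```

Embedding the output of `to_inline_js` in the HTML would avoid the file:// restriction entirely, at the cost of regenerating the page whenever the data changes.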

Architecture Notes

  • Institution name resolution is critical — NSF uses variant names (e.g., "University of Colorado at Boulder" vs "University of Colorado Boulder"). The NAME_ALIASES dict maps these. New variants may appear and need to be added.
  • SEARCH_NAMES defines what to query the NSF API for each institution. Some institutions need multiple search terms (e.g., INL searches both "Idaho National Laboratory" and "Battelle Energy Alliance").
  • Collaborative Research detection groups grants by normalized title suffix (everything after "Collaborative Research:") to link awards at different institutions that are part of the same project.
  • The DB schema is designed to be extensible for future grant types beyond NSF OAC — the agency, office, and program columns support this.
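
As an illustration of how SEARCH_NAMES and the OAC keywords combine, the sketch below builds one query URL per search name and keyword. The endpoint URL and the `awardeeName` and `keyword` parameter names are assumptions that should be checked against the NSF Awards API documentation; the single SEARCH_NAMES entry is taken from the INL example above, and `build_queries` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the NSF Awards API documentation.
API_URL = "https://api.nsf.gov/services/v1/awards.json"

# Keywords listed in the collector description above.
KEYWORDS = ["cyberinfrastructure", "high-performance computing", "HPC"]

# Hypothetical excerpt; the real SEARCH_NAMES covers all 41 members.
SEARCH_NAMES = {
    "INL": ["Idaho National Laboratory", "Battelle Energy Alliance"],
}


def build_queries(abbr):
    """Return one query URL per (search name, keyword) pair for an institution."""
    urls = []
    for name in SEARCH_NAMES.get(abbr, []):
        for keyword in KEYWORDS:
            params = {"awardeeName": name, "keyword": keyword}
            urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


for url in build_queries("INL"):
    print(url)
```

An institution with two search names and three keywords yields six requests, so results need de-duplicating by award ID before they are stored.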