NPEX

NASA Project Exploration & eXtraction (NPEX) lets you search and explore thousands of NASA's scientific and technological projects. It is powered by hybrid search, which combines traditional keyword matching with AI-powered semantic understanding.

Search Engine

The core feature is a web interface for searching and exploring NASA projects. It provides:

Smart Search: Hybrid retrieval combining keyword matching and semantic similarity
Embedding Options: Choose between OpenAI (3072-dim) or MiniLM (384-dim) embeddings
Multiple Search Modes: Keyword-only, enhanced keyword, semantic, or hybrid ranking
Rich Metadata: Projects include taxonomy classification, facilities, partners, and technology readiness levels
Responsive Design: Built with Next.js and React for fast, intuitive browsing

Switching Between Embeddings

The search API (web/app/api/search/route.ts) contains one getEmbedding implementation per provider. To switch providers, uncomment the one you want and comment out the other:

// To use OpenAI: Uncomment the OpenAI section and comment out MiniLM
// To use MiniLM: Keep MiniLM active (default)

// OpenAI section (commented by default)
/*
async function getEmbedding(text: string): Promise<number[]> {
    // OpenAI implementation
}
*/

// MiniLM section (active by default)
async function getEmbedding(text: string): Promise<number[]> {
    // MiniLM implementation with Transformers.js
}

Search Modes

Keyword: Standard Solr keyword search
Keyword+: Enhanced keyword search with semantic boosts
Semantic: Pure vector similarity search
Hybrid: Application-side RRF (Reciprocal Rank Fusion) combining vector and keyword search
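
To illustrate the Hybrid mode, here is a minimal sketch of Reciprocal Rank Fusion. It is written in Python for brevity and is not the code in route.ts. The function name, the sample project IDs, and the constant k=60 (a commonly used default) are assumptions; the project may use different values or tie-breaking.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1), and the scores are summed across lists.
    k=60 is an assumed, commonly used default.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the keyword and vector searches.
keyword_hits = ["p1", "p2", "p3"]
vector_hits = ["p3", "p1", "p4"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF only looks at ranks, it does not need the keyword and vector scores to be on the same scale, which is why it is a common choice for fusing results on the application side.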

Running the Web Application

# From the project root (pri-stellar folder)
docker-compose up -d

Note

The Docker setup includes both the web application and Solr search engine. Make sure to configure the appropriate schema in docker-compose.yml and ensure the JSON data is properly loaded for the chosen embedding model (OpenAI or MiniLM).

To switch between embedding models, update the solr-init service in docker-compose.yml:

For MiniLM: Use schema-hybrid-final-MiniLM.json and output_techport_embeddings_MiniLM.json
For OpenAI: Change to schema-hybrid-final-OpenAI.json and output_techport_embeddings_OpenAI.json

Note

For OpenAI embeddings, ensure OPENAI_API_KEY is set in web/.env.local. MiniLM works locally without API keys.

Note

Before running docker-compose up -d, ensure that contacts.json and organizations.json are present in the database/extracted_data/ folder (they may need to be unzipped from an archive). These files populate the MongoDB database. Also ensure the appropriate output_techport_embeddings_*.json file is present in the data/ folder based on your chosen embedding model (MiniLM or OpenAI).
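
The file checks described in this note can be automated before starting the containers. The following is a hedged sketch, not part of the repository: the function name missing_inputs is hypothetical, and it assumes the data/ and database/ folders sit directly under the project root.

```python
from pathlib import Path
from typing import List


def missing_inputs(project_root: Path, model: str) -> List[Path]:
    """Return the required input files that are absent for the chosen model.

    `model` is "MiniLM" or "OpenAI"; the paths follow the note above and
    assume data/ and database/ are directly under the project root.
    """
    if model not in ("MiniLM", "OpenAI"):
        raise ValueError(f"unknown embedding model: {model!r}")
    required = [
        project_root / "database" / "extracted_data" / "contacts.json",
        project_root / "database" / "extracted_data" / "organizations.json",
        project_root / "data" / f"output_techport_embeddings_{model}.json",
    ]
    return [path for path in required if not path.is_file()]


if __name__ == "__main__":
    # Run from the project root; prints any file that still needs to be provided.
    for path in missing_inputs(Path("."), "MiniLM"):
        print(f"missing: {path}")
```

An empty result means the inputs named in this note are in place; it does not verify that the schema selected in docker-compose.yml matches the embeddings file.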

Pipeline (ETL)

scraping → facilities merge → taxonomies merge → final document

E (Extract) → Scrape TechPort data to JSON
T (Transform) → Merge with facilities and clean data
L (Load) → Merge with taxonomies to create Solr-ready JSON file
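
The three stages run in a fixed order, and each merge presumably consumes the output of the stage before it. As a rough sketch of that ordering (not how the Makefile is implemented), the scripts could be chained so that a failure stops the run. The helper names run_step and run_pipeline are hypothetical; the script names are the ones in this repository.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script names as they appear in the repository, in pipeline order.
PIPELINE_STEPS = ["scraping.py", "merge_facilities.py", "merge_taxonomies.py"]


def run_step(script: str) -> int:
    """Run one pipeline script with the current interpreter; return its exit code."""
    return subprocess.run([sys.executable, script]).returncode


def run_pipeline(
    steps: Sequence[str] = PIPELINE_STEPS,
    runner: Callable[[str], int] = run_step,
) -> List[str]:
    """Run the steps in order and stop at the first non-zero exit code."""
    completed: List[str] = []
    for script in steps:
        if runner(script) != 0:
            raise RuntimeError(f"{script} failed; later steps were not run")
        completed.append(script)
    return completed


if __name__ == "__main__":
    run_pipeline()
```

Stopping at the first failure avoids merging against a missing or partial intermediate file.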

How to run

Prerequisites

uv (recommended) or Python 3.8+
Chrome browser (for web scraping)
Required data files (see Data Setup below)

Quick Start (Recommended)

# Run complete pipeline (auto-setup + validation + ETL)
make all

Step-by-Step Commands

# 1. Setup environment and install dependencies
make setup

# 2. Validate data files exist
make validate-data

# 3. Run individual pipeline steps
make extract                 # Step 1: Scrape TechPort data
make transform-facilities    # Step 2: Merge facilities data  
make transform-taxonomies    # Step 3: Merge taxonomies data

# Or run all steps at once
make all

Data Setup

Option 1: Individual Downloads

Download and place these files in the data/ folder:

NASA_TechPort_rows.csv
- Download: https://catalog.data.gov/dataset/nasa-techport-3d05e
NASA_Facilities_rows.csv
- Download: https://catalog.data.gov/dataset/nasa-facilities
NASA_Taxonomies.xlsx
- Download: https://techport.nasa.gov/taxonomy/8817
- ⚠️ Important: Rename downloaded file from XXXX-XX-XX TechPort Taxonomies Export.xlsx to NASA_Taxonomies.xlsx
- The pipeline will auto-convert Excel to CSV
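
The rename step is easy to forget, so it can be scripted. This is a minimal sketch and not part of the pipeline: the function name rename_taxonomy_export is hypothetical, and it assumes the XXXX-XX-XX prefix is an ISO date, so that the latest export sorts last.

```python
from pathlib import Path


def rename_taxonomy_export(data_dir: Path) -> Path:
    """Rename the dated TechPort taxonomy export to NASA_Taxonomies.xlsx.

    If the target already exists it is left untouched. Assumes the
    downloaded file is named '<date> TechPort Taxonomies Export.xlsx'.
    """
    target = data_dir / "NASA_Taxonomies.xlsx"
    if target.exists():
        return target
    candidates = sorted(data_dir.glob("* TechPort Taxonomies Export.xlsx"))
    if not candidates:
        raise FileNotFoundError(f"no taxonomy export found in {data_dir}")
    # With ISO-dated prefixes, the last name in sorted order is the newest export.
    candidates[-1].rename(target)
    return target


if __name__ == "__main__":
    print(rename_taxonomy_export(Path("data")))
```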

Option 2: Bulk Download

If you have a datasets.zip file containing all the CSV files (version: October 2025), place it in the data/ folder and extract it:

# Place datasets.zip in data/ folder, then:
make unzip-datasets  # Extract all files
make all            # Run pipeline

Development Commands

Validation & Analysis

make status          # Show environment and file status
make stats           # Show pipeline statistics
make validate-output # Validate final JSON output

Cleanup

make clean-output    # Remove generated pipeline files
make clean-all       # Remove everything (data + environment)

Troubleshooting

# Force reinstall requirements
make reinstall-requirements

# Restart from a specific step
make restart-from-facilities   # Re-run facilities + taxonomies merge
make restart-from-taxonomies   # Re-run only taxonomies merge
make restart-solr              # Re-run Solr transformation
make restart-database          # Re-run database extraction

# Run full Solr + database extraction
make solr-pipeline             # Run both Solr and database extraction sequentially

# Check all validations
make check

Need Help?

make help  # Show all available commands and usage

Manual Setup (Alternative)

If you prefer not to use the Makefile:

Using uv (recommended)

# Create virtual environment
uv venv

# Install dependencies  
uv pip install -r requirements.txt

# Run pipeline manually
uv run scraping.py
uv run merge_facilities.py
uv run merge_taxonomies.py

Using pip

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  

# Install dependencies
pip install -r requirements.txt

# Run pipeline manually
python scraping.py
python merge_facilities.py  
python merge_taxonomies.py
