This repo contains the materials for a hands-on workshop on Hugging Face datasets and models. In the workshop you will:
- Navigate the Hugging Face ecosystem
  - Explore Spaces, tasks, datasets, and models on the Hub
  - Read dataset and model cards
  - Understand why Hugging Face formats are well suited for open source
    - Parquet for datasets
    - Safetensors for model weights
  - Work with the `datasets` and `transformers` libraries
- Use Hugging Face MCP with Claude Code or Codex
  - Access Claude through a Harvard billing account
  - Access Codex with your HUID
  - Explore MCP-powered workflows
- Run models in minutes
  - Prepare the Restor dataset with `datasets`
  - Perform inference with `transformers`
  - Compare NVIDIA SegFormer and TCD-SegFormer on a test sample
- Build an end-to-end training loop
  - Fine-tune NVIDIA SegFormer on the Restor dataset
  - Track evaluation metrics
  - Briefly explore SAM (Segment Anything Model) as a modern foundation model for segmentation
To run Claude Code through the Harvard HUIT Bedrock gateway, set the following environment variables (with your API key saved in `api_key.txt`):

```shell
export ANTHROPIC_BEDROCK_BASE_URL=https://apis.huit.harvard.edu/ais-bedrock-llm/v2
export ANTHROPIC_API_KEY=$(cat api_key.txt)
export ANTHROPIC_SMALL_FAST_MODEL=us.anthropic.claude-opus-4-5-20251101-v1:0
export CLAUDE_CODE_SKIP_BEDROCK_AUTH=1
export CLAUDE_CODE_USE_BEDROCK=1
```
If you want to run the notebooks locally:

```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```