From 0548e4ba8c435440f2700168e66f4ba1e7695c59 Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Tue, 16 Jun 2026 23:10:00 +0000
Subject: [PATCH] Add dev environment setup: requirements.txt, AGENTS.md, .gitignore

Co-authored-by: Abhishek Pandya
---
 .gitignore       |  4 ++++
 AGENTS.md        | 33 +++++++++++++++++++++++++++++++++
 requirements.txt | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 AGENTS.md
 create mode 100644 requirements.txt

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e3e31ca
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+.ipynb_checkpoints/
+__pycache__/
+*.pyc
+.venv/
diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..d169b6b
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,33 @@
+# AGENTS.md
+
+This repository is a collection of standalone data-science / ML **Jupyter notebooks**
+(originally authored for Google Colab). There is no server, database, or build step —
+the "application" is JupyterLab plus the scientific Python stack.
+
+Notebooks:
+- `CIS545FinalProject.ipynb` — EDA + sklearn models on the US Accidents dataset (pandas, geopandas, sklearn).
+- `Transformer_Exercise.ipynb` — implement GPT-2 from scratch (torch, transformers, einops).
+- `WAFChallenge.ipynb` — TF-IDF + KMeans clustering on a YouTube videos CSV (sklearn).
+
+## Cursor Cloud specific instructions
+
+- Dependencies are installed with `pip install --user -r requirements.txt` (handled by the
+  startup update script). Packages land in `~/.local`.
+- The Jupyter CLIs (`jupyter`, `jupyter-lab`) live in `~/.local/bin`, which is **not on PATH**
+  by default. Prefix commands with `export PATH="$HOME/.local/bin:$PATH"` or invoke via
+  `python3 -m jupyterlab` / `python3 -m jupyter`.
+- Run JupyterLab with:
+  `jupyter lab --no-browser --ip=0.0.0.0 --port=8888 --ServerApp.token=devtoken`
+  then open `http://localhost:8888/lab?token=devtoken`.
+- Execute a notebook headlessly (good for CI-style checks / quick validation):
+  `jupyter nbconvert --to notebook --execute --ExecutePreprocessor.timeout=120 .ipynb`
+- `requirements.txt` intentionally **omits** the Colab-only pieces that cannot run outside
+  Colab: `google.colab`, `PyDrive`, `oauth2client`, `google_drive_downloader`, and the GitHub
+  research packages `easy_transformer` / `pysvelte`. Cells that import those (used only to pull
+  input CSVs from Google Drive, or for the interpretability visualizations) will fail locally;
+  this is expected. Supply the input data files locally to run the data-dependent cells.
+- The notebooks target newer library versions than Colab pinned; some legacy calls (e.g.
+  `from sklearn.externals.six import StringIO` in `CIS545FinalProject.ipynb`) are removed in
+  current sklearn and will error. This is a notebook code issue, not an environment problem.
+- No GPU is available; PyTorch is the CPU build (`torch==*+cpu`). The Transformer notebook runs
+  on CPU but slowly.
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..542f097
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,34 @@
+# Python dependencies for the Jupyter notebooks in this repository.
+# These notebooks were authored for Google Colab; this file lets you run the
+# scientific/ML portions locally in JupyterLab. Google Colab-only pieces
+# (google.colab, PyDrive, Google Drive data ingestion) are intentionally
+# omitted since they only work inside Colab. See AGENTS.md for details.
+
+# Use CPU-only PyTorch wheels (no GPU in the dev environment).
+--extra-index-url https://download.pytorch.org/whl/cpu
+
+# Jupyter
+jupyterlab
+notebook
+ipykernel
+
+# Data-science core (all notebooks)
+numpy
+pandas
+matplotlib
+seaborn
+scikit-learn
+
+# Geospatial / decision-tree viz (CIS545FinalProject.ipynb)
+geopandas
+shapely
+pydotplus
+
+# Transformer exercise (Transformer_Exercise.ipynb)
+torch
+transformers
+datasets
+einops
+fancy_einsum
+tqdm
+plotly
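
Reviewer note (not part of the patch): AGENTS.md describes two practical workarounds, invoking Jupyter through the interpreter when `~/.local/bin` is not on PATH, and the removed `sklearn.externals.six` import in `CIS545FinalProject.ipynb`. The sketch below illustrates both under stated assumptions. The helper name `nbconvert_command` is hypothetical, the notebook filename is only an example taken from the list in AGENTS.md, and the sketch assumes `python3 -m jupyter` can dispatch to the `nbconvert` subcommand in this environment. Falling back to `io.StringIO` is a common replacement for the removed import, not something this patch applies.

```python
import sys

try:
    # Legacy import used in CIS545FinalProject.ipynb; removed in current scikit-learn.
    from sklearn.externals.six import StringIO
except ImportError:
    # Standard-library replacement with the same in-memory text buffer behavior.
    from io import StringIO


def nbconvert_command(notebook, timeout=120):
    """Build a headless-execution command for one notebook (hypothetical helper).

    Uses the current interpreter with `-m jupyter` so the command does not
    depend on ~/.local/bin being on PATH.
    """
    return [
        sys.executable, "-m", "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",
        str(notebook),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, since Jupyter may not be installed here.
    print(" ".join(nbconvert_command("WAFChallenge.ipynb")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the dependencies from `requirements.txt` are installed. Building the command as a list avoids shell quoting problems with notebook paths that contain spaces.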