Change input CSV for classification guide (#177)

rgambee · web-flow · commit cedcd2c595ac · 2026-02-20T16:45:03.000-05:00
The guide for classifying rows pointed to the same CSV as the guide for
filtering, but the classification guide's linked session and content
refer to a different input CSV. The two guides now point to separate
CSVs.
diff --git a/docs/add-column-web-lookup.md b/docs/add-column-web-lookup.md
@@ -22,7 +22,7 @@ pip install everyrow
 export EVERYROW_API_KEY=your_key_here  # Get one at everyrow.io/api-key
 ```
 
-The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
+The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv) and save the CSV file to your computer. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
 
 ```python
 import asyncio
diff --git a/docs/classify-dataframe-rows-llm.md b/docs/classify-dataframe-rows-llm.md
@@ -24,7 +24,7 @@ If you're categorizing support tickets, labeling training data, or tagging conte
 
 ## Walkthrough
 
-The `agent_map` function processes each row in parallel with structured output via Pydantic models. You define the schema, describe the task, and get back a DataFrame with your new columns. Download [hn_jobs.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs.csv) to follow along.
+The `agent_map` function processes each row in parallel with structured output via Pydantic models. You define the schema, describe the task, and get back a DataFrame with your new columns. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs_classify.csv) and save the CSV file to your computer.
 
 ```bash
 pip install everyrow
@@ -50,7 +50,7 @@ class JobClassification(BaseModel):
 
 
 async def main():
-    jobs = pd.read_csv("hn_jobs.csv")
+    jobs = pd.read_csv("hn_jobs_classify.csv")
 
     result = await agent_map(
         task="""Classify this job posting by primary role:
diff --git a/docs/data/hn_jobs_classify.csv b/docs/data/hn_jobs_classify.csv
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:00dcf82dcd04e564409fc3a352361c02841f816c5c46f8497a3864895b7a074e
+size 219165
diff --git a/docs/data/hn_jobs_screen.csv b/docs/data/hn_jobs_screen.csv
diff --git a/docs/filter-dataframe-with-llm.md b/docs/filter-dataframe-with-llm.md
@@ -5,7 +5,7 @@ description: How to screen data by criteria that require research in Python. LLM
 
 # How to Filter a DataFrame with an LLM
 
-Here we show how to filter a pandas dataframe by qualitative criteria, when normal filtering like df[df['column'] == value] won't work.
+Here we show how to filter a pandas dataframe by qualitative criteria, when normal filtering like `df[df['column'] == value]` won't work.
 
 LLMs, and LLM-web-agents, can evaluate qualitative criteria at high accuracy. But they can be very expensive and difficult to orchestrate at scale. We provide a low cost solution by handling the orchestration, batching, and consistency checking.
 
@@ -34,7 +34,7 @@ df[df['posting'].str.contains('remote', case=False)]
 
 What you need is a filter that understands: this posting explicitly allows remote work, requires senior experience, and states a specific salary number.
 
-We use a dataset of 3,616 job postings from Hacker News "Who's Hiring" threads, 10% of all posts every month since March 2020 through January 2026. Download [hn_jobs.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs.csv) to follow along.
+We use a dataset of 3,616 job postings from Hacker News "Who's Hiring" threads, 10% of all posts every month since March 2020 through January 2026. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs_screen.csv) and save the CSV file to your computer.
 
 ```bash
 pip install everyrow
@@ -47,7 +47,7 @@ import pandas as pd
 from pydantic import BaseModel, Field
 from everyrow.ops import screen
 
-jobs = pd.read_csv("hn_jobs.csv")  # 3,616 job postings
+jobs = pd.read_csv("hn_jobs_screen.csv")  # 3,616 job postings
 
 class JobScreenResult(BaseModel):
     qualifies: bool = Field(description="True if meets ALL criteria")
diff --git a/docs/resolve-entities-python.md b/docs/resolve-entities-python.md
@@ -23,7 +23,7 @@ pip install everyrow
 export EVERYROW_API_KEY=your_key_here  # Get one at everyrow.io
 ```
 
-We'll use a messy CRM dataset with 500 company records. The same companies appear multiple times with different spellings, abbreviations, and missing fields. Download [case_01_crm_data.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/case_01_crm_data.csv) to follow along.
+We'll use a messy CRM dataset with 500 company records. The same companies appear multiple times with different spellings, abbreviations, and missing fields. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/case_01_crm_data.csv) and save the CSV file to your computer.
 
 ```python
 import asyncio

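The new `docs/data/hn_jobs_classify.csv` is committed as a Git LFS pointer (the three-line `version` / `oid` / `size` stub in the diff), and the guides link through `media.githubusercontent.com`, which should serve the real file. Below is a minimal sketch, not part of this commit, of how a reader could check that a saved copy is the actual CSV and not the pointer stub. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the diff above.

```python
import hashlib
from pathlib import Path
from typing import Dict, Optional

# First line of every Git LFS pointer file (as seen in the diff above).
LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> Optional[Dict[str, str]]:
    """Return the pointer's key/value fields, or None if text is not an LFS pointer."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    fields: Dict[str, str] = {}
    for line in lines:
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    return fields


def matches_pointer(path: Path, fields: Dict[str, str]) -> bool:
    """Check a downloaded file's size and SHA-256 against the pointer fields."""
    data = path.read_bytes()
    algo, _, expected = fields["oid"].partition(":")
    return (
        algo == "sha256"
        and len(data) == int(fields["size"])
        and hashlib.sha256(data).hexdigest() == expected
    )


# Pointer contents for hn_jobs_classify.csv, copied from this commit's diff.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:00dcf82dcd04e564409fc3a352361c02841f816c5c46f8497a3864895b7a074e
size 219165
"""
fields = parse_lfs_pointer(pointer_text)
```

If `parse_lfs_pointer(Path("hn_jobs_classify.csv").read_text())` returns a dict instead of `None`, the saved file is the pointer stub and `pd.read_csv` will not see the job postings. Otherwise, `matches_pointer(Path("hn_jobs_classify.csv"), fields)` compares the download against the size and hash recorded in this commit.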