Commit cedcd2c

Change input CSV for classification guide (#177)
The guide for classifying rows pointed to the same CSV as the guide for filtering, but the classification guide's linked session and content referred to a different input CSV. The two guides now point to separate CSVs.
1 parent c9eb18e commit cedcd2c

6 files changed

Lines changed: 10 additions & 7 deletions

docs/add-column-web-lookup.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ pip install everyrow
 export EVERYROW_API_KEY=your_key_here # Get one at everyrow.io/api-key
 ```
 
-The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. Download [saas_products.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv) to follow along. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
+The dataset is a list of 246 SaaS and developer tools like Slack, Notion, Asana. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/saas_products.csv) and save the CSV file to your computer. We find the annual price of each product's lowest paid tier, which isn't available through any structured API; it requires visiting pricing pages that change frequently and present information in different formats.
 
 ```python
 import asyncio

docs/classify-dataframe-rows-llm.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ If you're categorizing support tickets, labeling training data, or tagging conte
 
 ## Walkthrough
 
-The `agent_map` function processes each row in parallel with structured output via Pydantic models. You define the schema, describe the task, and get back a DataFrame with your new columns. Download [hn_jobs.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs.csv) to follow along.
+The `agent_map` function processes each row in parallel with structured output via Pydantic models. You define the schema, describe the task, and get back a DataFrame with your new columns. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs_classify.csv) and save the CSV file to your computer.
 
 ```bash
 pip install everyrow
@@ -50,7 +50,7 @@ class JobClassification(BaseModel):
 
 
 async def main():
-    jobs = pd.read_csv("hn_jobs.csv")
+    jobs = pd.read_csv("hn_jobs_classify.csv")
 
     result = await agent_map(
         task="""Classify this job posting by primary role:
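
The mismatch this commit fixes (a guide linking one CSV while its code reads another) can be checked mechanically. Below is a minimal sketch, assuming each guide references its data through a `docs/data/<name>.csv` link and a `read_csv("<name>.csv")` call, as in the hunks above. The helper name `csv_mismatches` is hypothetical and not part of the repository.

```python
import re

# Matches the CSV filename at the end of a docs/data/ download link.
LINK_RE = re.compile(r"docs/data/([\w.-]+\.csv)")
# Matches the filename passed to read_csv("...") in a guide's code sample.
READ_RE = re.compile(r"""read_csv\(\s*["']([^"']+\.csv)["']""")


def csv_mismatches(markdown: str) -> set[str]:
    """Return filenames that appear in a link or in read_csv, but not in both."""
    linked = set(LINK_RE.findall(markdown))
    read = set(READ_RE.findall(markdown))
    return linked ^ read  # symmetric difference: present on one side only


# Text modelled on the classification guide after this commit.
guide = '''
To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs_classify.csv) and save the CSV file to your computer.
jobs = pd.read_csv("hn_jobs_classify.csv")
'''
print(csv_mismatches(guide))  # an empty set means link and code agree
```

Running such a check over every file in `docs/` would flag any guide whose download link and code sample drift apart again.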

docs/data/hn_jobs_classify.csv

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:00dcf82dcd04e564409fc3a352361c02841f816c5c46f8497a3864895b7a074e
+size 219165
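
The three added lines are a Git LFS pointer, not the CSV itself: the repository stores this small text stub, and the 219,165-byte file lives in LFS storage. That is presumably why the guides link to `media.githubusercontent.com` rather than to the raw file. Below is a minimal sketch of reading such a pointer, assuming the one `key value` pair per line layout shown in the hunk. `parse_lfs_pointer` is an illustrative helper, not part of the everyrow SDK or of Git LFS.

```python
def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each non-empty 'key value' line of an LFS pointer into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")  # split on the first space only
        fields[key] = value
    return fields


# Pointer content copied from the hunk above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:00dcf82dcd04e564409fc3a352361c02841f816c5c46f8497a3864895b7a074e
size 219165
"""

info = parse_lfs_pointer(pointer)
algo, _, digest = info["oid"].partition(":")
print(algo, int(info["size"]))  # prints: sha256 219165
```

A downloaded file that parses this way, instead of as CSV rows, suggests the pointer was fetched rather than the LFS object.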

docs/filter-dataframe-with-llm.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ description: How to screen data by criteria that require research in Python. LLM
 
 # How to Filter a DataFrame with an LLM
 
-Here we show how to filter a pandas dataframe by qualitative criteria, when normal filtering like df[df['column'] == value] won't work.
+Here we show how to filter a pandas dataframe by qualitative criteria, when normal filtering like `df[df['column'] == value]` won't work.
 
 LLMs, and LLM-web-agents, can evaluate qualitative criteria at high accuracy. But they can be very expensive and difficult to orchestrate at scale. We provide a low cost solution by handling the orchestration, batching, and consistency checking.
 
@@ -34,7 +34,7 @@ df[df['posting'].str.contains('remote', case=False)]
 
 What you need is a filter that understands: this posting explicitly allows remote work, requires senior experience, and states a specific salary number.
 
-We use a dataset of 3,616 job postings from Hacker News "Who's Hiring" threads, 10% of all posts every month since March 2020 through January 2026. Download [hn_jobs.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs.csv) to follow along.
+We use a dataset of 3,616 job postings from Hacker News "Who's Hiring" threads, 10% of all posts every month since March 2020 through January 2026. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/hn_jobs_screen.csv) and save the CSV file to your computer.
 
 ```bash
 pip install everyrow
@@ -47,7 +47,7 @@ import pandas as pd
 from pydantic import BaseModel, Field
 from everyrow.ops import screen
 
-jobs = pd.read_csv("hn_jobs.csv") # 3,616 job postings
+jobs = pd.read_csv("hn_jobs_screen.csv") # 3,616 job postings
 
 class JobScreenResult(BaseModel):
     qualifies: bool = Field(description="True if meets ALL criteria")
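
For readers who would rather fetch the file from a script than right-click the link, here is a minimal sketch using only the standard library. The base URL is copied from the links in this diff. `data_url` and `download_csv` are hypothetical helper names, and the download itself needs network access, so that call is left commented out.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Common prefix of the data links used in the guides touched by this commit.
BASE_URL = (
    "https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/"
    "refs/heads/main/docs/data/"
)


def data_url(filename: str) -> str:
    """Build the download URL for a file under docs/data/."""
    return BASE_URL + filename


def download_csv(filename: str, dest_dir: str = ".") -> Path:
    """Download the CSV into dest_dir unless a file of that name already exists."""
    dest = Path(dest_dir) / filename
    if not dest.exists():
        urlretrieve(data_url(filename), str(dest))
    return dest


print(data_url("hn_jobs_screen.csv"))
# path = download_csv("hn_jobs_screen.csv")  # uncomment to fetch (needs network)
```

Saving under the same filename the guide passes to `pd.read_csv` keeps the code sample runnable without edits.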

docs/resolve-entities-python.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ pip install everyrow
 export EVERYROW_API_KEY=your_key_here # Get one at everyrow.io
 ```
 
-We'll use a messy CRM dataset with 500 company records. The same companies appear multiple times with different spellings, abbreviations, and missing fields. Download [case_01_crm_data.csv](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/case_01_crm_data.csv) to follow along.
+We'll use a messy CRM dataset with 500 company records. The same companies appear multiple times with different spellings, abbreviations, and missing fields. To follow along, [right click this link](https://media.githubusercontent.com/media/futuresearch/everyrow-sdk/refs/heads/main/docs/data/case_01_crm_data.csv) and save the CSV file to your computer.
 
 ```python
 import asyncio
