Merged
1 change: 1 addition & 0 deletions .claude/command/gh-pr.md → .claude/commands/gh-pr.md
@@ -1 +1,2 @@
Create a PR on github with an accurate description following our naming convention for the current changes. $ARGUMENTS

84 changes: 0 additions & 84 deletions .claude/skills/create-skill.md

This file was deleted.

21 changes: 14 additions & 7 deletions .claude/skills/skill.md → .claude/skills/gh-pr.md
@@ -36,8 +36,8 @@ git log -1 --pretty=%s # Last commit message

- Check current branch: `git branch --show-current`
- If on main/master/next, create feature branch with conventional naming
- - Switch to new branch: `git checkout -b <username>/<features|fixes>/<branch-name>`
- - Current branch convention `<username>/<features|fixes>/<branch-name>`
+ - Branch convention: `<username>/<type>/<description>` (e.g., `fzuppichini/features/new-feature`)
+ - Switch to new branch: `git checkout -b <username>/<type>/<description>`

2. **Analyze & Stage**:

@@ -160,21 +160,28 @@ When updating existing PRs, use these comment templates to preserve the original
1. Create branch and make changes
2. Stage, commit, push → triggers PR creation
3. Each subsequent push triggers update comment
4. By default assume the PR is *wip* (work in progress) so open it appropriately

### Commit Message Conventions

See **[docs/GIT_STYLE.md](docs/GIT_STYLE.md)** for full guide.

- `feat:` - New features
- `fix:` - Bug fixes
- `refactor:` - Code refactoring
- `docs:` - Documentation changes
- `test:` - Test additions/modifications
- `chore:` - Maintenance tasks
- `style:` - Formatting changes
- `content:` - Content changes (blog, copy)
- `perf:` - Performance improvements

### Branch Naming Conventions

- - `feature/description` - New features
- - `fix/bug-description` - Bug fixes
- - `refactor/component-name` - Code refactoring
- - `docs/update-readme` - Documentation updates
- - `test/add-unit-tests` - Test additions
+ Always use `<username>/<type>/<description>` format:
+
+ - `username/features/description` - New features
+ - `username/fix/description` - Bug fixes
+ - `username/refactor/description` - Code refactoring
+ - `username/docs/description` - Documentation updates
+ - `username/test/description` - Test additions
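
As an illustration of the `<username>/<type>/<description>` convention above, a branch name could be checked with a small helper like the one below. This is only a sketch: `BRANCH_TYPES`, `BRANCH_PATTERN`, and `isConventionalBranch` are hypothetical names, and the allowed character set (lowercase letters, digits, hyphens) is an assumption, since the convention itself does not specify one.

```typescript
// Hypothetical helper, not part of this PR.
// The type segments are copied from the list above; the character set
// allowed for <username> and <description> is an assumption.
const BRANCH_TYPES = ["features", "fix", "refactor", "docs", "test"] as const;

const BRANCH_PATTERN = new RegExp(
  `^[a-z0-9-]+/(${BRANCH_TYPES.join("|")})/[a-z0-9-]+$`,
);

// Returns true when the name has exactly three segments and a known type.
function isConventionalBranch(name: string): boolean {
  return BRANCH_PATTERN.test(name);
}
```

With these assumptions, `fzuppichini/features/new-feature` passes, while the old two-segment style such as `feature/description` is rejected.
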
26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,26 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - run: bun install
      - run: bun run test

  lint:
    name: Lint & Typecheck
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: oven-sh/setup-bun@v2
      - run: bun install
      - run: bun run check
8 changes: 7 additions & 1 deletion .gitignore
@@ -1 +1,7 @@
- node_modules
+ node_modules
+ dist/
+ .DS_Store
+ bun.lock
+ *.tsbuildinfo
+ .env
+ doc/
40 changes: 40 additions & 0 deletions ISSUES.md
@@ -0,0 +1,40 @@
# Issues Found During Example Testing

## SDK Bugs

### ~~1. Health endpoint uses wrong base URL~~ FIXED
- **File**: `src/scrapegraphai.ts` — `checkHealth()`
- **Problem**: The health endpoint lives at `https://api.scrapegraphai.com/healthz` (no `/v1` prefix), but the SDK was sending `GET /v1/healthz` which returned 404.
- **Fix**: `checkHealth()` now uses `HEALTH_URL` (root domain, no `/v1` prefix).
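
The fix described above amounts to resolving `/healthz` against the root domain rather than the versioned base. A minimal sketch of that idea follows; `API_ROOT`, `API_BASE`, and `healthUrl` are hypothetical names used for illustration, and the SDK's actual `HEALTH_URL` constant may be defined differently.

```typescript
// Hypothetical constants; only the domain and the /healthz path come from
// the issue description above.
const API_ROOT = "https://api.scrapegraphai.com";
const API_BASE = `${API_ROOT}/v1`;

// Resolving an absolute path against any base URL discards the base's own
// path, so the /v1 prefix is dropped even when the versioned base is passed.
function healthUrl(base: string = API_BASE): string {
  return new URL("/healthz", base).toString();
}
```

Because the path starts with `/`, `healthUrl(API_BASE)` and `healthUrl(API_ROOT)` both resolve to `https://api.scrapegraphai.com/healthz`.
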

### ~~7. Crawl poll response nested inside `result` wrapper~~ FIXED
- **File**: `src/scrapegraphai.ts` — `submitAndPoll()`
- **Problem**: The crawl poll API returns `{ status: "success", result: { status: "done", pages: [...], crawled_urls: [...] } }`. The SDK was returning the outer wrapper as `data`, so `data.pages` and `data.crawled_urls` were `undefined`.
- **Fix**: Added `unwrapResult()` that detects the nested `result` object and promotes it to the top level. Also added `llm_result`, `credits_used`, `pages_processed`, `elapsed_time` to `CrawlResponse` type.
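
To make the unwrapping step concrete, here is a minimal sketch of what such a helper might look like. It is not the SDK's actual `unwrapResult()`: the `Json` type and `isPlainObject` guard are hypothetical, and the sketch assumes that "promoting" means returning the inner `result` object in place of the wrapper.

```typescript
type Json = Record<string, unknown>;

// Hypothetical guard: true for non-null, non-array objects.
function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Sketch only: if the payload has a nested `result` object, return that
// object; otherwise return the payload unchanged.
function unwrapResult(payload: unknown): unknown {
  if (isPlainObject(payload) && isPlainObject(payload["result"])) {
    return payload["result"];
  }
  return payload;
}
```

Given the poll response shape quoted above, `unwrapResult(response)` would return the inner object, so `pages` and `crawled_urls` become directly accessible. Endpoints where `result` is itself the user-facing payload would need a narrower check than this one.
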

## API-Side Issues

### 2. Agentic Scraper returns 500
- **Example**: `agenticscraper/agenticscraper_basic.ts`, `agenticscraper/agenticscraper_ai_extraction.ts`
- **Error**: `Server error — try again later` (HTTP 500)
- **Note**: Both basic and AI extraction modes fail. Likely an API deployment issue.

### 3. Generate Schema — modify existing returns empty schema
- **Example**: `schema/modify_existing_schema.ts`
- **Error**: `generated_schema` comes back as `{}`
- **Note**: Basic generation works fine. Modifying an existing schema returns empty. May be async and needs polling, or the API doesn't fully support modification yet.

### 4. Crawl markdown mode returns 0 pages
- **Example**: `crawl/crawl_markdown.ts`
- **Error**: `extraction_mode: false` returns `{ pages: [] }` despite status `success`
- **Note**: Extraction mode crawls (with prompt) work fine and return pages. Markdown-only mode seems broken on the API side.

### 5. Scrape endpoint rejects uppercase country codes
- **Example**: `scrape/scrape_stealth.ts`
- **Error**: `Invalid country code` when sending `"US"` — must be lowercase `"us"`
- **Note**: Fixed in the example. SDK could validate/lowercase this automatically.
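
The automatic lowercasing suggested in the note could be sketched as follows. `normalizeCountryCode` is a hypothetical helper, not an existing SDK function, and the two-letter check is an assumption based on the `"US"` / `"us"` example above.

```typescript
// Hypothetical helper: trims and lowercases a country code before it is
// sent to the API. The two-letter format is assumed, not confirmed.
function normalizeCountryCode(code: string): string {
  const normalized = code.trim().toLowerCase();
  if (!/^[a-z]{2}$/.test(normalized)) {
    throw new Error(`Invalid country code: "${code}"`);
  }
  return normalized;
}
```

Calling `normalizeCountryCode("US")` returns `"us"`, while malformed input fails locally instead of triggering the API's `Invalid country code` error.
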

### 6. SearchScraper markdown mode returns empty result
- **Example**: `searchscraper/searchscraper_markdown.ts`
- **Error**: `result` is `{}` when `extraction_mode: false`, though `reference_urls` are populated
- **Note**: The markdown content may be in a different response field, or the API doesn't support this mode correctly.