First off, thank you for considering contributing to FlyScrape! We value your time and effort, and we're excited to have you join our community. Whether you're fixing a bug, adding a new feature, or improving documentation, your help is appreciated.
- Getting Started
- Development Workflow
- Project Structure
- Pull Request Process
- Style Guide
- Reporting Issues
Start by forking the FlyScrape repository to your GitHub account.
Clone your forked repository to your local machine:
git clone https://github.com/YOUR_USERNAME/flyscrape.git
cd flyscrapeWe use bun for dependency management. Install the required packages:
bun installAlways work on a new branch for your changes. Use a descriptive name:
# For features
git checkout -b feature/amazing-new-feature
# For bug fixes
git checkout -b fix/nasty-bugImplement your feature or fix. Keep your changes focused and minimal.
- Follow the Single Responsibility Principle.
- Write clear, self-documenting code.
- Add comments for complex logic.
Ensure that your changes don't break existing functionality. We use Vitest for testing.
# Run all tests
bun test
# Run tests in watch mode (useful during development)
bun run test:watchWe maintain a consistent code style. Run the formatter before committing:
bun run formatUnderstanding the codebase structure will help you navigate easily:
src/core: The heart of FlyScrape. Contains the main crawler logic (crawler.ts) and orchestration.src/extraction: Strategies for extracting data (CSS selectors, LLM-based extraction).src/processing: Data processing modules, including Markdown generation, content pruning, and chunking.src/stealth: Browser stealth techniques to avoid detection (User-Agent rotation, fingerprinting).src/cache: Caching mechanisms (Memory, Disk) to optimize performance.src/utils: Shared utility functions (DOM manipulation, text processing).
When you're ready to submit your changes:
- Push your branch to your fork:
git push origin feature/amazing-new-feature
- Open a Pull Request against the
mainbranch of the originalflyrank/flyscraperepository. - Description: Provide a clear and concise description of your changes. Explain why this change is necessary and how it solves the problem.
- Link Issues: If your PR fixes an issue, link it (e.g.,
Fixes #123). - Checklist:
- My code follows the project's style guidelines.
- I have performed a self-review of my code.
- I have added tests that prove my fix is effective or that my feature works.
- New and existing unit tests pass locally.
- TypeScript: We use strict TypeScript. Avoid
anyunless absolutely necessary. - Naming: Use
camelCasefor variables/functions andPascalCasefor classes/interfaces. - Comments: Write JSDoc comments for public APIs.
Found a bug? Great! (Well, not great that it's broken, but great that you found it).
- Check the existing issues to see if it has already been reported.
- Open a new issue with a clear title and description.
- Include a reproduction code snippet or steps to reproduce the issue.
- Mention your OS, Node.js version, and FlyScrape version.
By contributing, you agree that your contributions will be licensed under the MIT License.
Happy Coding! π