@blentz commented Dec 18, 2025

Summary

Implement an OutputControl feature that lets LLM clients request truncated responses to avoid exceeding their context windows. Addresses large-response handling for the crawl4ai MCP tools.

New Features

  • content_offset/content_limit: Paginate text fields (html, markdown, etc.)
  • max_links/max_media/max_tables: Limit collection sizes
  • exclude_fields: Omit fields entirely (supports dot-notation like markdown.references_markdown)
  • _output_meta: Response metadata with truncation stats

Integration

Integrated into all endpoints:

  • /md
  • /html
  • /crawl
  • /crawl/stream
  • /execute_js
  • /screenshot
  • /pdf
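For illustration, a request to `/md` with output controls might carry a payload like the one below. The `output` wrapper key and the localhost URL in the comment are assumptions based on the summary above, not verified against the server schema:

```python
# Hypothetical request payload for /md with an "output" control block.
import json

payload = {
    "url": "https://example.com",
    "output": {
        "content_offset": 0,
        "content_limit": 4000,
        "max_links": 20,
        "exclude_fields": ["markdown.references_markdown"],
    },
}

# Would be sent as, e.g.: requests.post("http://localhost:11235/md", json=payload)
print(json.dumps(payload, indent=2))
```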

Backward Compatibility

Fully backward compatible: omitting the `output` param returns the full response, as before.

Files Changed

  • deploy/docker/output_control.py - Core pagination/control logic
  • deploy/docker/schemas.py - OutputControl schema definitions
  • deploy/docker/server.py - Endpoint integration
  • deploy/docker/tests/test_output_control.py - Comprehensive test suite
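A client can combine `content_offset`/`content_limit` with `_output_meta` to page through a long text field. The loop below is a hypothetical sketch: `fetch_page` is a stub standing in for the HTTP call, and the `truncated`/`total_length` keys are assumed from the feature description:

```python
# Hypothetical client-side pagination loop over a truncated text field.

def fetch_page(offset: int, limit: int) -> dict:
    # Stub for an HTTP call to /md with content_offset/content_limit set.
    full = "x" * 10
    sliced = full[offset:offset + limit]
    meta = {"truncated": offset + limit < len(full), "total_length": len(full)}
    return {"markdown": sliced, "_output_meta": meta}

def fetch_all(limit: int = 4) -> str:
    # Advance the offset until the server reports nothing was truncated.
    parts, offset = [], 0
    while True:
        page = fetch_page(offset, limit)
        parts.append(page["markdown"])
        if not page["_output_meta"]["truncated"]:
            break
        offset += limit
    return "".join(parts)
```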

@blentz changed the base branch from main to develop on December 26, 2025 at 14:30.