@vzucher
Contributor

@vzucher vzucher commented Dec 4, 2025

Summary

Implements a GitHub-flavored Markdown output format using the Strategy and Registry design patterns. Addresses a user request for Markdown export, with no breaking changes.

Implementation

Architecture

  • Strategy Pattern: Pluggable formatter system
  • Registry Pattern: Centralized formatter management
  • SOLID Principles: All 5 principles applied
  • Zero Breaking Changes: Fully backward compatible
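For readers unfamiliar with the two patterns, here is a minimal sketch of how a Strategy interface and a Registry can fit together. The class names `BaseFormatter` and `FormatterRegistry` come from this PR, but the method names (`format`, `register`, `get`) and the two example formatters are assumptions made for illustration and may not match the actual code in `src/brightdata/formatters/`.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Type


class BaseFormatter(ABC):
    """Strategy interface: every output format implements format()."""

    @abstractmethod
    def format(self, data: Any) -> str:
        ...


class FormatterRegistry:
    """Registry: maps a format name to the formatter class that handles it."""

    _formatters: Dict[str, Type[BaseFormatter]] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[Type[BaseFormatter]], Type[BaseFormatter]]:
        # Used as a class decorator; the method name is an assumption.
        def decorator(formatter_cls: Type[BaseFormatter]) -> Type[BaseFormatter]:
            cls._formatters[name] = formatter_cls
            return formatter_cls

        return decorator

    @classmethod
    def get(cls, name: str) -> BaseFormatter:
        # Look up the class by name and return a fresh instance.
        try:
            return cls._formatters[name]()
        except KeyError:
            known = sorted(cls._formatters)
            raise ValueError(f"Unknown format {name!r}; available: {known}") from None


@FormatterRegistry.register("json")
class JSONFormatter(BaseFormatter):
    def format(self, data: Any) -> str:
        return json.dumps(data, indent=2)


@FormatterRegistry.register("minimal")
class MinimalFormatter(BaseFormatter):
    def format(self, data: Any) -> str:
        return str(data)
```

With this shape, callers only write `FormatterRegistry.get(name).format(data)`, so adding a new format means registering one more class rather than editing any dispatch logic.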

New Formatter Infrastructure

Created:

  • src/brightdata/formatters/__init__.py - Package exports
  • src/brightdata/formatters/base.py - BaseFormatter interface
  • src/brightdata/formatters/registry.py - FormatterRegistry
  • src/brightdata/formatters/json_formatter.py - Existing JSON formatter, refactored
  • src/brightdata/formatters/pretty_formatter.py - Existing pretty formatter, refactored
  • src/brightdata/formatters/minimal_formatter.py - Existing minimal formatter, refactored
  • src/brightdata/formatters/markdown.py - NEW Markdown formatter

Features

Markdown formatter includes:

  • ✅ GitHub-flavored markdown tables
  • ✅ Status badges (✅ Success / ❌ Failed)
  • ✅ Metadata tables (cost, timing, platform, method)
  • ✅ Smart column selection (5 most important fields)
  • ✅ Row limiting (10 rows for readability)
  • ✅ Long value truncation (50-100 chars)
  • ✅ Handles lists, dicts, and strings
  • ✅ Proper markdown escaping
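The table-related features above (column and row limits, truncation, list/dict handling, escaping) can be sketched roughly as follows. This is an illustrative approximation, not the code in `markdown.py`: the helper names `escape_cell` and `to_markdown_table` are hypothetical, the column choice here is simply the first five keys rather than any "smart" selection, and the 50-character limit is the lower end of the range quoted in the list.

```python
from typing import Any, Dict, List

MAX_ROWS = 10      # row limit from the feature list
MAX_COLUMNS = 5    # column limit from the feature list
MAX_CELL_LEN = 50  # lower end of the 50-100 character truncation range


def escape_cell(value: Any, max_len: int = MAX_CELL_LEN) -> str:
    """Stringify a value, truncate it, and escape characters that break table cells."""
    if isinstance(value, (list, tuple)):
        text = ", ".join(str(v) for v in value)
    elif isinstance(value, dict):
        text = ", ".join(f"{k}: {v}" for k, v in value.items())
    else:
        text = str(value)
    text = text.replace("\n", " ")  # a table cell must stay on one line
    if len(text) > max_len:
        text = text[: max_len - 3] + "..."
    # Escape after truncating so the backslash is never cut off from its pipe.
    return text.replace("|", "\\|")


def to_markdown_table(rows: List[Dict[str, Any]]) -> str:
    """Render a list of dicts as a GitHub-flavored Markdown table."""
    if not rows:
        return "_No data_"
    columns = list(rows[0].keys())[:MAX_COLUMNS]  # naive choice: first five keys
    lines = [
        "| " + " | ".join(escape_cell(c) for c in columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows[:MAX_ROWS]:
        cells = (escape_cell(row.get(c, "")) for c in columns)
        lines.append("| " + " | ".join(cells) + " |")
    if len(rows) > MAX_ROWS:
        lines.append("")
        lines.append(f"_Showing {MAX_ROWS} of {len(rows)} rows_")
    return "\n".join(lines)
```

Escaping the pipe character matters most here, since an unescaped `|` inside a value would otherwise be read as a column separator.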

Integration

SDK Usage:

result = client.search.google(query="python")
md = result.to_markdown()  # NEW method
result.save_to_file("report.md", format="markdown")  # NEW format

CLI Usage:

brightdata search google "query" --output-format markdown
brightdata search google "query" --output-format markdown --output-file report.md

Changes

Modified:

  • src/brightdata/models.py - Added to_markdown() method to BaseResult
  • src/brightdata/models.py - Updated save_to_file() to support all formats via registry
  • src/brightdata/cli/commands/scrape.py - Added markdown to output choices
  • src/brightdata/cli/commands/search.py - Added markdown to output choices
  • src/brightdata/cli/utils.py - Refactored to use FormatterRegistry
  • README.md - Added markdown examples and documentation

Added:

  • tests/unit/test_markdown.py - 13 comprehensive tests

Testing

  • ✅ 13 new markdown-specific tests
  • ✅ All tests passing: 483/483 (100%)
  • ✅ Real API calls tested (Google, Amazon)
  • ✅ CLI integration tested
  • ✅ File saving tested
  • ✅ All formatters tested

Benefits

For Users:

  • Copy-paste into GitHub issues/PRs/docs
  • Beautiful terminal output with tables
  • Works in Jupyter notebooks
  • Commit-friendly format for git

For Architecture:

  • Extensible: Easy to add CSV, YAML, XML later
  • Testable: Each formatter isolated
  • Maintainable: Clean separation of concerns
  • Type-safe: Full type hints

Backward Compatibility

✅ All existing code works unchanged:

result.to_json()  # Still works
result.to_dict()  # Still works
result.save_to_file("file.json")  # Still works

Performance

No impact - formatters are lazy-loaded and only used when requested.
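One common way to get this kind of lazy loading is to store import paths instead of imported classes and resolve them on first use. The sketch below shows that technique under stated assumptions: `load_formatter` and `_LAZY_FORMATTERS` are hypothetical names, and standard-library callables stand in for the SDK's formatter modules so the example runs on its own. The PR's actual mechanism may differ.

```python
import importlib
from functools import lru_cache
from typing import Any, Callable

# Hypothetical mapping of format name -> "module:attribute".
# Standard-library targets stand in for the SDK's formatter modules.
_LAZY_FORMATTERS = {
    "json": "json:dumps",
    "pretty": "pprint:pformat",
}


@lru_cache(maxsize=None)
def load_formatter(name: str) -> Callable[[Any], str]:
    """Import the formatter's module on first request and cache the callable."""
    try:
        target = _LAZY_FORMATTERS[name]
    except KeyError:
        raise ValueError(f"Unknown format {name!r}") from None
    module_name, attr = target.split(":")
    module = importlib.import_module(module_name)  # deferred import happens here
    return getattr(module, attr)
```

A formatter that is never requested is never imported, and repeated requests for the same name reuse the cached callable.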

Documentation

  • Added markdown examples to README
  • Updated CLI help text
  • Added comprehensive docstrings
  • Included usage examples

@Tonel
Tonel commented Dec 4, 2025

Does this also cover the Markdown data format documented here?
https://docs.brightdata.com/scraping-automation/web-unlocker/features#scrape-as-markdown

It would be important to be able to specify markdown as the response format, rather than only raw and json, as shown below:

results = await client.scrape.generic.url_async(
    url=url,
    response_format="markdown" # <-----
)
