
feat: Add bigframes.execution_history API to track BigQuery jobs#16588

Open
shuoweil wants to merge 2 commits into main from shuowei-execution-history

Conversation

Contributor

@shuoweil shuoweil commented Apr 8, 2026

This PR promotes execution_history() to the top-level bigframes namespace and upgrades it to track rich metadata for every BigQuery job executed during your session.

Key User Benefits:

  • Easier Access: Call bigframes.execution_history() directly instead of digging into sub-namespaces.

  • Rich Metadata Tracking: Captures structured statistics for both Query Jobs and Load Jobs including:
    - job_id and a direct Google Cloud Console URL for easy debugging.
    - Performance metrics: total_bytes_processed, duration_seconds, and slot_millis.
    - Query details (a truncated preview of the SQL that was run).

  • Clean, Focused Logs: Automatically filters out internal library overhead (like schema validations and index uniqueness checks) so your history only shows the data processing steps you actually care about.

Usage Example:

    import bigframes.pandas as bpd
    import pandas as pd
    import bigframes

    # ... run some bigframes operations ...
    df = bpd.read_gbq("SELECT 1")

    # Upload some local data (triggers a Load Job)
    bpd.read_pandas(pd.DataFrame({'a': [1, 2, 3]}))

    # Get a DataFrame of all BQ jobs run in this session
    history = bigframes.execution_history()

    # Inspect recent queries, their costs, and durations
    print(history[['job_id', 'job_type', 'total_bytes_processed', 'duration_seconds', 'query']])
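Because the history is returned as an ordinary DataFrame, per-job rows can be aggregated with standard pandas operations. A minimal sketch using a mocked history frame (column names follow the PR description; the values are invented):

```python
import pandas as pd

# Mocked stand-in for the DataFrame returned by bigframes.execution_history();
# the column names come from the PR description, the values are made up.
history = pd.DataFrame({
    'job_id': ['job_1', 'job_2'],
    'job_type': ['query', 'load'],
    'total_bytes_processed': [1024, 2048],
    'duration_seconds': [0.5, 1.2],
})

# Aggregate session-level cost and latency from the per-job rows.
total_bytes = int(history['total_bytes_processed'].sum())
total_seconds = float(history['duration_seconds'].sum())
print(f"Session processed {total_bytes} bytes over {total_seconds:.2f} s")
```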

Verified at:

  1. vs code notebook: screen/8u2yhaRV9iHbDbF
  2. colab notebook: screen/9L8VrP5y9DXhnZz

More test cases and notebook updates will be checked in via separate PRs for easier review.

Fixes #<481840739> 🦕

@shuoweil shuoweil self-assigned this Apr 8, 2026
@shuoweil shuoweil requested review from a team as code owners April 8, 2026 22:01
@shuoweil shuoweil force-pushed the shuowei-execution-history branch from 35a379d to eab6cdb on April 8, 2026 22:03
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request implements an execution history feature to track and display BigQuery and local Polars jobs initiated during a session. Key changes include the addition of a JobMetadata dataclass, updates to ExecutionMetrics for job tracking, and a specialized _ExecutionHistory DataFrame for formatted output. Review feedback identifies opportunities to improve error logging in the HTML representation, remove redundant attribute assignments in the metrics logic, and ensure that bytes processed during local executions are consistently aggregated into session-level metrics.

Comment on lines +141 to +142
    except Exception:
        return super()._repr_html_()  # type: ignore

medium

Using a broad except Exception: can hide bugs in the HTML representation logic. It's better to catch specific exceptions or at least log the caught exception to aid in debugging if the formatting fails. This will provide visibility into any issues without breaking the user's interactive session.

Suggested change

    -except Exception:
    -    return super()._repr_html_()  # type: ignore
    +except Exception as e:
    +    logger.warning("Failed to generate custom HTML representation for execution history: %s", e)
    +    return super()._repr_html_()  # type: ignore
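The pattern this suggestion describes can be sketched independently of bigframes: log the exception at warning level, then fall back to a plain representation so the interactive session is never broken. The class and method names below are hypothetical stand-ins, not the real bigframes code:

```python
import logging

logger = logging.getLogger(__name__)

class SafeHtmlRepr:
    # Hypothetical stand-in for the custom-HTML DataFrame: log the
    # failure, then fall back to a plain representation instead of raising.
    def _plain_repr_html_(self) -> str:
        return "<table>fallback</table>"

    def _repr_html_(self) -> str:
        try:
            raise ValueError("formatting bug")  # stand-in for the custom formatter failing
        except Exception as e:
            logger.warning(
                "Failed to generate custom HTML representation: %s", e
            )
            return self._plain_repr_html_()

html = SafeHtmlRepr()._repr_html_()
```

The `%s`-style lazy formatting keeps the cost near zero when warnings are filtered out.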

Comment on lines +198 to +199
    metadata.total_bytes_processed = bytes_processed
    metadata.total_slot_ms = slot_millis

medium

These assignments are redundant because JobMetadata.from_job already populates total_bytes_processed and total_slot_ms from the query_job object when the job is a QueryJob. The values from get_performance_stats are sourced from the same attributes on the job object. Removing these lines will make the code cleaner.
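Why the assignments are redundant can be seen from a sketch of the `from_job` pattern. The field and attribute names below are assumptions based on the review comment, not the actual bigframes source:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class JobMetadata:
    # Hypothetical shape; the real dataclass lives inside bigframes.
    job_id: str
    total_bytes_processed: Optional[int] = None
    total_slot_ms: Optional[int] = None

    @classmethod
    def from_job(cls, job) -> "JobMetadata":
        # Stats are read straight off the job object at construction
        # time, so callers do not need to assign them again afterwards.
        return cls(
            job_id=job.job_id,
            total_bytes_processed=getattr(job, "total_bytes_processed", None),
            total_slot_ms=getattr(job, "slot_millis", None),
        )

class FakeQueryJob:
    # Minimal stand-in for a google.cloud.bigquery.QueryJob.
    job_id = "job_123"
    total_bytes_processed = 4096
    slot_millis = 250

metadata = JobMetadata.from_job(FakeQueryJob())
```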

    if isinstance(event, bigframes.core.events.ExecutionFinished):
        if event.result and isinstance(event.result, LocalExecuteResult):
            self.execution_count += 1
            bytes_processed = event.result.total_bytes_processed or 0

medium

The execution_count is being incremented for local Polars executions, but bytes_processed is not. For consistency with how other job types are handled in count_job_stats, self.bytes_processed should also be updated. This ensures that metrics like session.bytes_processed_sum are comprehensive. Note that the docstring for bytes_processed_sum might need to be updated in a separate change to reflect that it includes more than just BigQuery jobs.

Suggested change

     bytes_processed = event.result.total_bytes_processed or 0
    +self.bytes_processed += bytes_processed
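The consistency argument can be sketched as follows: if every execution path records through one helper, the count and the byte total can never drift apart. This is a simplified stand-in for `ExecutionMetrics`; the real event handling in bigframes is richer:

```python
class ExecutionMetrics:
    # Simplified stand-in: every execution path bumps both the run
    # count and the byte total, so the two metrics stay consistent.
    def __init__(self) -> None:
        self.execution_count = 0
        self.bytes_processed = 0

    def record(self, total_bytes_processed) -> None:
        self.execution_count += 1
        # `or 0` folds None (size unknown) into a zero-byte contribution.
        self.bytes_processed += total_bytes_processed or 0

metrics = ExecutionMetrics()
metrics.record(1000)  # BigQuery query job
metrics.record(None)  # local Polars execution with unknown size
metrics.record(500)   # load job
```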

