Skip to content

How to estimate query execution costs in Bigquery? #150

@DanielB280320

Description

@DanielB280320

Course

data-engineering-zoomcamp

Question

How we can estimate the cost of executed queries within a specified range of time in bigquery, and determine what users and when in time this executions were made.

Answer

The metadata table that contains all the history related with query and Jobs executions is called 'JOBS_BY_PROJECT' and we can find it in the path <your_project_id>.<your_region>.INFORMATION_SCHEMA.JOBS_BY_PROJECT; the following query must be executed within the bigquery query editor.

SELECT
DATE(creation_time) as date,
job_type,
statement_type,
user_email,
SUM(total_bytes_billed) / 1099511627776 * 6.25 as estimated_cost_usd, -- on-demand model costs are defined US$ 6.25 per TB
COUNT(*) as query_count

FROM project-0c3c5223-416f-4242-b0f.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= DATE_ADD(CURRENT_TIMESTAMP(), INTERVAL -60 DAY) --Interval can be modified
GROUP BY date, user_email, job_type, statement_type
ORDER BY estimated_cost_usd DESC -- Order by the highest cost execution based on total bytes billed
;

Output:

Image

Image
With this query we will be able to find insights about in which dates and what users did query executions with highest costs and optimize ones that are not efficient.

Checklist

  • I have searched existing FAQs and this question is not already answered
  • The answer provides accurate, helpful information
  • I have included any relevant code examples or links

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions