Conversation
There was a problem hiding this comment.
Greptile Summary
This PR introduces a new BigQuery template for the Bruin project. The template includes:
- A basic pipeline configuration with BigQuery-specific settings
- Example SQL assets demonstrating table creation with schema constraints
- A seed asset for loading CSV data into BigQuery
- Reusable Jinja macros organized into three categories:
- Aggregations (count_by, sum_by, top_n)
- Filters (date_range, recent_records, filter_null, in_list)
- Transformations (pivot_sum, deduplicate, generate_surrogate_key, safe_divide)
- Comprehensive README with setup instructions and macro usage examples
The template follows the same structure as the existing DuckDB template, adapted for BigQuery-specific SQL syntax and functions. The macros use BigQuery-native functions like SAFE_DIVIDE, DATE_SUB, and TO_HEX(MD5()) where appropriate.
| {% macro safe_divide(numerator, denominator, default=0) -%} | ||
| SAFE_DIVIDE({{ numerator }}, {{ denominator }}) | ||
| {%- endmacro %} |
There was a problem hiding this comment.
[P3] The default parameter is unused. BigQuery's SAFE_DIVIDE returns NULL on division by zero, not a configurable default. Either remove the parameter or use IFNULL(SAFE_DIVIDE(...), {{ default }}).
Prompt To Fix With AI
This is a comment left during a code review.
Path: templates/bigquery/macros/transformations.sql
Line: 35:37
Comment:
[P3] The `default` parameter is unused. BigQuery's `SAFE_DIVIDE` returns NULL on division by zero, not a configurable default. Either remove the parameter or use `IFNULL(SAFE_DIVIDE(...), {{ default }})`.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,3 @@ | |||
| name,networking_through,position,contact_date | |||
| Y,LinkedIn,SDE,2024-01-01 | |||
| B,LinkedIn,SDE 2,2024-01-01 | |||
There was a problem hiding this comment.
[P3] CSV file should end with a newline for better compatibility with some parsers.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: templates/bigquery/assets/seed.csv
Line: 3:3
Comment:
[P3] CSV file should end with a newline for better compatibility with some parsers.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
arsalann
left a comment
There was a problem hiding this comment.
nothing major, just some minor comments
|
|
||
| columns: | ||
| - name: id | ||
| type: int64 |
There was a problem hiding this comment.
if i remember correctly, int64 is not correct and it should be integer instead
| - name: name | ||
| type: string | ||
| description: Just a name | ||
| update_on_merge: true |
There was a problem hiding this comment.
is this necessary if there's no merge strategy? I don't think it would break but it might be confusing
| /* @bruin | ||
|
|
||
| materialization: | ||
| type: table |
There was a problem hiding this comment.
would be nice to also demonstrate cluster and partition
| checks: | ||
| - name: not_null |
There was a problem hiding this comment.
already demonstrated not_null check above, would be nice to demonstrate nullable tag
| description: Number of people per country | ||
|
|
There was a problem hiding this comment.
should we demonstrate a custom check here?
| {{ group_column }}, | ||
| SUM({{ sum_column }}) as total | ||
| FROM {{ table }} | ||
| GROUP BY {{ group_column }} |
There was a problem hiding this comment.
can also just be group by all 😅 sorry just being picky
No description provided.