feat(oracle): add Oracle dialect with materialization strategies#1747
feat(oracle): add Oracle dialect with materialization strategies#1747maliackgoz wants to merge 2 commits intobruin-data:mainfrom
Conversation
## Summary Adds full materialization support for the Oracle dialect, including all major strategies and Oracle-specific PL/SQL handling. ## Materialization Strategies | Strategy | Implementation | |---|---| | `create+replace` | PL/SQL block: `DROP TABLE PURGE` + `CREATE TABLE AS` via `EXECUTE IMMEDIATE` | | `append` | Standard `INSERT INTO` | | `delete+insert` | `EXISTS`-based DELETE + INSERT in PL/SQL block | | `truncate+insert` | `EXECUTE IMMEDIATE 'TRUNCATE TABLE ...'` + INSERT (TRUNCATE is DDL in Oracle) | | `merge` | Standard `MERGE INTO` with NULL-safe PK comparisons | | `time_interval` | DELETE by date/timestamp range + INSERT | | `SCD2_by_time` | UPDATE to expire + MERGE with inline view pattern (workaround for ORA-38104) | | [view](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/mssql/materialization.go:41:0-43:1) | `CREATE OR REPLACE VIEW` | ## Key Design Decisions - **`LOCALTIMESTAMP` over `CURRENT_TIMESTAMP`**: Oracle's `CURRENT_TIMESTAMP` returns `TIMESTAMP WITH TIME ZONE` (Typ=188), but SCD2 columns use plain `TIMESTAMP` (Typ=187). `LOCALTIMESTAMP` matches the column type exactly. - **`EXISTS` over `IN (SELECT DISTINCT ...)`**: Lets Oracle's optimizer choose the best semi-join strategy without forcing a full DISTINCT sort. - **NULL-safe PK comparisons**: [(a.pk = b.pk OR (a.pk IS NULL AND b.pk IS NULL))](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization.go:287:0-289:1) prevents phantom duplicate inserts when primary keys contain NULL values. - **`DROP TABLE PURGE`**: Avoids Oracle recyclebin pollution on table replacement. - **Identifier validation**: Regex-based validation rejects names with SQL-injection-prone characters before interpolation into raw SQL. - **View error guards**: Unsupported strategies (append, merge, etc.) explicitly error when materialization type is [view](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/mssql/materialization.go:41:0-43:1). - **PL/SQL blocks for DDL**: Oracle cannot execute DDL directly inside PL/SQL, so `EXECUTE IMMEDIATE` is used for DROP, CREATE, and TRUNCATE statements. ## Files Changed - [pkg/oracle/materialization.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization.go:0:0-0:0) — All materialization strategy implementations - [pkg/oracle/materialization_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization_test.go:0:0-0:0) — 28 test cases covering all strategies - [pkg/oracle/operator.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/operator.go:0:0-0:0) — Oracle task operator - [pkg/oracle/operator_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/operator_test.go:0:0-0:0) — Operator tests - [pkg/oracle/checks.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/checks.go:0:0-0:0) — Oracle check implementations - [pkg/oracle/checks_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/checks_test.go:0:0-0:0) — Check tests - [pkg/oracle/db.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/db.go:0:0-0:0) — Connection handling improvements - [pkg/oracle/db_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/db_test.go:0:0-0:0) — Connection tests - [cmd/run.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/cmd/run.go:0:0-0:0) — Register Oracle dialect ## Testing - All unit tests pass with `go test -race` (81.9% coverage) - Integration tested against Oracle Free 26ai Docker container - Verified all 6 strategies end-to-end: SCD2, create+replace, delete+insert, append, merge (composite PK), truncate+insert - `gofmt`, `gci`, `go vet` all clean
|
|
||
| createAsQuery := fmt.Sprintf(`SELECT | ||
| src.*, | ||
| CAST(src.%s AS TIMESTAMP) AS bruin_valid_from, |
There was a problem hiding this comment.
SQL injection risk: asset.Materialization.IncrementalKey is interpolated directly into SQL without validation. While asset.Name is validated, the incremental key column is not.
| CAST(src.%s AS TIMESTAMP) AS bruin_valid_from, | |
| CAST(src.%s AS TIMESTAMP) AS bruin_valid_from, |
Add validation before line 108:
if err := validateIdentifier(asset.Materialization.IncrementalKey, "incremental_key column"); err != nil {
return "", err
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 110
Comment:
SQL injection risk: `asset.Materialization.IncrementalKey` is interpolated directly into SQL without validation. While `asset.Name` is validated, the incremental key column is not.
```suggestion
CAST(src.%s AS TIMESTAMP) AS bruin_valid_from,
```
Add validation before line 108:
```
if err := validateIdentifier(asset.Materialization.IncrementalKey, "incremental_key column"); err != nil {
return "", err
}
```
How can I resolve this? If you propose a fix, please make it concise.| matchedUpdateStatements = append(matchedUpdateStatements, fmt.Sprintf("target.%s = %s", col.Name, col.MergeSQL)) | ||
| } else { | ||
| matchedUpdateStatements = append(matchedUpdateStatements, fmt.Sprintf("target.%s = source.%s", col.Name, col.Name)) |
There was a problem hiding this comment.
SQL injection risk: col.Name is directly interpolated into SQL without validation. Column names should be validated or quoted to prevent injection.
Add validation for each column name, or use an identifier quoting function similar to Postgres's QuoteIdentifier.
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 246-248
Comment:
SQL injection risk: `col.Name` is directly interpolated into SQL without validation. Column names should be validated or quoted to prevent injection.
Add validation for each column name, or use an identifier quoting function similar to Postgres's `QuoteIdentifier`.
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| insertCols = append(insertCols, "bruin_valid_from", "bruin_valid_until", "bruin_is_current") | ||
| insertValues = append(insertValues, | ||
| "CAST(source."+asset.Materialization.IncrementalKey+" AS TIMESTAMP)", |
There was a problem hiding this comment.
SQL injection risk: asset.Materialization.IncrementalKey is interpolated directly without validation in the SCD2 query.
While validation exists at line 333, it only checks if the column exists in definitions - not if it's a safe identifier. Add validateIdentifier check earlier in the function.
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 345
Comment:
SQL injection risk: `asset.Materialization.IncrementalKey` is interpolated directly without validation in the SCD2 query.
While validation exists at line 333, it only checks if the column exists in definitions - not if it's a safe identifier. Add `validateIdentifier` check earlier in the function.
How can I resolve this? If you propose a fix, please make it concise.| res := strings.Join(val, "','") | ||
| res = fmt.Sprintf("'%s'", res) | ||
|
|
||
| qq := fmt.Sprintf("SELECT COUNT(*) FROM %s WHERE CAST(%s as VARCHAR2(4000)) NOT IN (%s)", ti.GetAsset().Name, ti.Column.Name, res) |
There was a problem hiding this comment.
SQL injection risk: both ti.GetAsset().Name and ti.Column.Name are interpolated directly into SQL without validation or quoting.
Add identifier validation before constructing the query.
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/checks.go
Line: 47
Comment:
SQL injection risk: both `ti.GetAsset().Name` and `ti.Column.Name` are interpolated directly into SQL without validation or quoting.
Add identifier validation before constructing the query.
How can I resolve this? If you propose a fix, please make it concise.| "SELECT count(*) FROM %s WHERE NOT REGEXP_LIKE(%s, '%s')", | ||
| ti.GetAsset().Name, | ||
| ti.Column.Name, |
There was a problem hiding this comment.
SQL injection risk: table name and column name are interpolated without validation or quoting.
Add identifier validation for ti.GetAsset().Name and ti.Column.Name.
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/checks.go
Line: 68-70
Comment:
SQL injection risk: table name and column name are interpolated without validation or quoting.
Add identifier validation for `ti.GetAsset().Name` and `ti.Column.Name`.
How can I resolve this? If you propose a fix, please make it concise.Address code review feedback: add validateIdentifier checks for all identifier interpolations that were missing validation. - Validate IncrementalKey in buildSCD2ByTimeFullRefresh - Validate IncrementalKey in buildSCD2ByTimeQuery - Validate all column names in buildMergeQuery - Validate table name and column name in AcceptedValuesCheck - Validate table name and column name in PatternCheck
Summary
Adds full materialization support for the Oracle dialect, including all major strategies and Oracle-specific PL/SQL handling.
Materialization Strategies
create+replaceDROP TABLE PURGE+CREATE TABLE ASviaEXECUTE IMMEDIATEdelete+insertEXISTS-based DELETE + INSERT in PL/SQL blockKey Design Decisions
LOCALTIMESTAMPoverCURRENT_TIMESTAMP: Oracle'sCURRENT_TIMESTAMPreturnsTIMESTAMP WITH TIME ZONE(Typ=188), but SCD2 columns use plainTIMESTAMP(Typ=187).LOCALTIMESTAMPmatches the column type exactly.EXISTSoverIN (SELECT DISTINCT ...): Lets Oracle's optimizer choose the best semi-join strategy without forcing a full DISTINCT sort.DROP TABLE PURGE: Avoids Oracle recyclebin pollution on table replacement.EXECUTE IMMEDIATEis used for DROP, CREATE, and TRUNCATE statements.Files Changed
Testing
go test -race(81.9% coverage)gofmt,gci,go vetall clean