Skip to content

feat(oracle): add Oracle dialect with materialization strategies#1747

Open
maliackgoz wants to merge 2 commits intobruin-data:mainfrom
maliackgoz:feat/oracle-materialization
Open

feat(oracle): add Oracle dialect with materialization strategies#1747
maliackgoz wants to merge 2 commits intobruin-data:mainfrom
maliackgoz:feat/oracle-materialization

Conversation

@maliackgoz
Copy link
Copy Markdown

Summary

Adds full materialization support for the Oracle dialect, including all major strategies and Oracle-specific PL/SQL handling.

Materialization Strategies

Strategy Implementation
create+replace PL/SQL block: DROP TABLE PURGE + CREATE TABLE AS via EXECUTE IMMEDIATE
delete+insert EXISTS-based DELETE + INSERT in PL/SQL block

Key Design Decisions

  • LOCALTIMESTAMP over CURRENT_TIMESTAMP: Oracle's CURRENT_TIMESTAMP returns TIMESTAMP WITH TIME ZONE (Typ=188), but SCD2 columns use plain TIMESTAMP (Typ=187). LOCALTIMESTAMP matches the column type exactly.
  • EXISTS over IN (SELECT DISTINCT ...): Lets Oracle's optimizer choose the best semi-join strategy without forcing a full DISTINCT sort.
  • NULL-safe PK comparisons: (a.pk = b.pk OR (a.pk IS NULL AND b.pk IS NULL)) prevents phantom duplicate inserts when primary keys contain NULL values.
  • DROP TABLE PURGE: Avoids Oracle recyclebin pollution on table replacement.
  • Identifier validation: Regex-based validation rejects names with SQL-injection-prone characters before interpolation into raw SQL.
  • View error guards: Unsupported strategies (append, merge, etc.) explicitly error when materialization type is view.
  • PL/SQL blocks for DDL: Oracle cannot execute DDL directly inside PL/SQL, so EXECUTE IMMEDIATE is used for DROP, CREATE, and TRUNCATE statements.

Files Changed

  • pkg/oracle/materialization.go — All materialization strategy implementations
  • pkg/oracle/materialization_test.go — 28 test cases covering all strategies
  • pkg/oracle/operator.go — Oracle task operator
  • pkg/oracle/operator_test.go — Operator tests
  • pkg/oracle/checks.go — Oracle check implementations
  • pkg/oracle/checks_test.go — Check tests
  • pkg/oracle/db.go — Connection handling improvements
  • pkg/oracle/db_test.go — Connection tests
  • cmd/run.go — Register Oracle dialect

Testing

  • All unit tests pass with go test -race (81.9% coverage)
  • Integration tested against Oracle Free 26ai Docker container
  • Verified all 6 strategies end-to-end: SCD2, create+replace, delete+insert, append, merge (composite PK), truncate+insert
  • gofmt, gci, go vet all clean

## Summary

Adds full materialization support for the Oracle dialect, including all major strategies
and Oracle-specific PL/SQL handling.

## Materialization Strategies

| Strategy | Implementation |
|---|---|
| `create+replace` | PL/SQL block: `DROP TABLE PURGE` + `CREATE TABLE AS` via `EXECUTE IMMEDIATE` |
| `append` | Standard `INSERT INTO` |
| `delete+insert` | `EXISTS`-based DELETE + INSERT in PL/SQL block |
| `truncate+insert` | `EXECUTE IMMEDIATE 'TRUNCATE TABLE ...'` + INSERT (TRUNCATE is DDL in Oracle) |
| `merge` | Standard `MERGE INTO` with NULL-safe PK comparisons |
| `time_interval` | DELETE by date/timestamp range + INSERT |
| `SCD2_by_time` | UPDATE to expire + MERGE with inline view pattern (workaround for ORA-38104) |
| [view](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/mssql/materialization.go:41:0-43:1) | `CREATE OR REPLACE VIEW` |

## Key Design Decisions

- **`LOCALTIMESTAMP` over `CURRENT_TIMESTAMP`**: Oracle's `CURRENT_TIMESTAMP` returns
  `TIMESTAMP WITH TIME ZONE` (Typ=188), but SCD2 columns use plain `TIMESTAMP` (Typ=187).
  `LOCALTIMESTAMP` matches the column type exactly.
- **`EXISTS` over `IN (SELECT DISTINCT ...)`**: Lets Oracle's optimizer choose the best
  semi-join strategy without forcing a full DISTINCT sort.
- **NULL-safe PK comparisons**: [(a.pk = b.pk OR (a.pk IS NULL AND b.pk IS NULL))](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization.go:287:0-289:1)
  prevents phantom duplicate inserts when primary keys contain NULL values.
- **`DROP TABLE PURGE`**: Avoids Oracle recyclebin pollution on table replacement.
- **Identifier validation**: Regex-based validation rejects names with SQL-injection-prone
  characters before interpolation into raw SQL.
- **View error guards**: Unsupported strategies (append, merge, etc.) explicitly error
  when materialization type is [view](cci:1://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/mssql/materialization.go:41:0-43:1).
- **PL/SQL blocks for DDL**: Oracle cannot execute DDL directly inside PL/SQL, so
  `EXECUTE IMMEDIATE` is used for DROP, CREATE, and TRUNCATE statements.

## Files Changed

- [pkg/oracle/materialization.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization.go:0:0-0:0) — All materialization strategy implementations
- [pkg/oracle/materialization_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/materialization_test.go:0:0-0:0) — 28 test cases covering all strategies
- [pkg/oracle/operator.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/operator.go:0:0-0:0) — Oracle task operator
- [pkg/oracle/operator_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/operator_test.go:0:0-0:0) — Operator tests
- [pkg/oracle/checks.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/checks.go:0:0-0:0) — Oracle check implementations
- [pkg/oracle/checks_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/checks_test.go:0:0-0:0) — Check tests
- [pkg/oracle/db.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/db.go:0:0-0:0) — Connection handling improvements
- [pkg/oracle/db_test.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/pkg/oracle/db_test.go:0:0-0:0) — Connection tests
- [cmd/run.go](cci:7://file:///Users/maliackgoz/Documents/GitHub/bruin/cmd/run.go:0:0-0:0) — Register Oracle dialect

## Testing

- All unit tests pass with `go test -race` (81.9% coverage)
- Integration tested against Oracle Free 26ai Docker container
- Verified all 6 strategies end-to-end: SCD2, create+replace, delete+insert,
  append, merge (composite PK), truncate+insert
- `gofmt`, `gci`, `go vet` all clean
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

9 files reviewed, 5 comments

Edit Code Review Agent Settings | Greptile


createAsQuery := fmt.Sprintf(`SELECT
src.*,
CAST(src.%s AS TIMESTAMP) AS bruin_valid_from,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk: asset.Materialization.IncrementalKey is interpolated directly into SQL without validation. While asset.Name is validated, the incremental key column is not.

Suggested change
CAST(src.%s AS TIMESTAMP) AS bruin_valid_from,
CAST(src.%s AS TIMESTAMP) AS bruin_valid_from,

Add validation before line 108:

if err := validateIdentifier(asset.Materialization.IncrementalKey, "incremental_key column"); err != nil {
  return "", err
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 110

Comment:
SQL injection risk: `asset.Materialization.IncrementalKey` is interpolated directly into SQL without validation. While `asset.Name` is validated, the incremental key column is not.

```suggestion
  CAST(src.%s AS TIMESTAMP) AS bruin_valid_from,
```

Add validation before line 108:
```
if err := validateIdentifier(asset.Materialization.IncrementalKey, "incremental_key column"); err != nil {
  return "", err
}
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +246 to +248
matchedUpdateStatements = append(matchedUpdateStatements, fmt.Sprintf("target.%s = %s", col.Name, col.MergeSQL))
} else {
matchedUpdateStatements = append(matchedUpdateStatements, fmt.Sprintf("target.%s = source.%s", col.Name, col.Name))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk: col.Name is directly interpolated into SQL without validation. Column names should be validated or quoted to prevent injection.

Add validation for each column name, or use an identifier quoting function similar to Postgres's QuoteIdentifier.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 246-248

Comment:
SQL injection risk: `col.Name` is directly interpolated into SQL without validation. Column names should be validated or quoted to prevent injection.

Add validation for each column name, or use an identifier quoting function similar to Postgres's `QuoteIdentifier`.

How can I resolve this? If you propose a fix, please make it concise.


insertCols = append(insertCols, "bruin_valid_from", "bruin_valid_until", "bruin_is_current")
insertValues = append(insertValues,
"CAST(source."+asset.Materialization.IncrementalKey+" AS TIMESTAMP)",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk: asset.Materialization.IncrementalKey is interpolated directly without validation in the SCD2 query.

While validation exists at line 333, it only checks if the column exists in definitions - not if it's a safe identifier. Add validateIdentifier check earlier in the function.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/materialization.go
Line: 345

Comment:
SQL injection risk: `asset.Materialization.IncrementalKey` is interpolated directly without validation in the SCD2 query.

While validation exists at line 333, it only checks if the column exists in definitions - not if it's a safe identifier. Add `validateIdentifier` check earlier in the function.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread pkg/oracle/checks.go
res := strings.Join(val, "','")
res = fmt.Sprintf("'%s'", res)

qq := fmt.Sprintf("SELECT COUNT(*) FROM %s WHERE CAST(%s as VARCHAR2(4000)) NOT IN (%s)", ti.GetAsset().Name, ti.Column.Name, res)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk: both ti.GetAsset().Name and ti.Column.Name are interpolated directly into SQL without validation or quoting.

Add identifier validation before constructing the query.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/checks.go
Line: 47

Comment:
SQL injection risk: both `ti.GetAsset().Name` and `ti.Column.Name` are interpolated directly into SQL without validation or quoting.

Add identifier validation before constructing the query.

How can I resolve this? If you propose a fix, please make it concise.

Comment thread pkg/oracle/checks.go
Comment on lines +68 to +70
"SELECT count(*) FROM %s WHERE NOT REGEXP_LIKE(%s, '%s')",
ti.GetAsset().Name,
ti.Column.Name,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQL injection risk: table name and column name are interpolated without validation or quoting.

Add identifier validation for ti.GetAsset().Name and ti.Column.Name.

Prompt To Fix With AI
This is a comment left during a code review.
Path: pkg/oracle/checks.go
Line: 68-70

Comment:
SQL injection risk: table name and column name are interpolated without validation or quoting.

Add identifier validation for `ti.GetAsset().Name` and `ti.Column.Name`.

How can I resolve this? If you propose a fix, please make it concise.

Address code review feedback: add validateIdentifier checks for all
identifier interpolations that were missing validation.

- Validate IncrementalKey in buildSCD2ByTimeFullRefresh
- Validate IncrementalKey in buildSCD2ByTimeQuery
- Validate all column names in buildMergeQuery
- Validate table name and column name in AcceptedValuesCheck
- Validate table name and column name in PatternCheck
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant