Feature/sap hana by ness-david-dedu · Pull Request #4462 · redpanda-data/connect

ness-david-dedu · 2026-05-28T14:43:31Z

No description provided.

ness-david-dedu

Non-inline issues

internal/impl/saphana/bench/ngdbc.jar — A proprietary SAP HANA JDBC driver binary is committed to the repository. The README already documents how to obtain it from the SAP Software Downloads Center; ship the instructions, not the file. Remove it and add it to .gitignore.

docs/benchmark-results/SUMMARY.md — The SAP HANA connector adds a results file (docs/benchmark-results/sap-hana.md) but SUMMARY.md is not updated. Per docs/benchmarking.md: "When adding a new benchmark suite — update the table in this document, and update docs/benchmark-results/SUMMARY.md."

docs/benchmarking.md — The "Existing Benchmarks" table in this file does not include the SAP HANA entry. The same maintenance rule applies: add a row for the new suite.

ness-david-dedu · 2026-06-09T11:12:57Z

Commits

2ef9e7e2 — fix(sap_hana): ...: fix is not a valid system name in this repo's commit format. The pattern is system: message or system(subsystem): message where system is a known area name. Should be sap_hana: fix read ... (or the subsystem form sap_hana(input): ...).
24eb7ef3 — working bulk read: vague WIP message. Use imperative form describing the change (e.g. sap_hana: add bulk read mode).
9e36dff8 — imrpove hana rows write speed: typo ("imrpove") and missing system prefix. Should be e.g. sap_hana(output): improve bulk write throughput.
cf122214 — add bencmark results: typo ("bencmark") and missing system prefix. Should be e.g. sap_hana: add benchmark results.
4673e059 — add incrementing-mode benchmark...: leading whitespace in the message.

Review

Five issues found across security and dependency classification.

SQL injection — tableRef() (input_sap_hana.go#L283–L287): schemaName and tableName are concatenated into double-quoted SQL identifiers without escaping internal " characters. Fix: strings.ReplaceAll(name, ", "") before concatenation.
SQL injection — column names in openRows() (input_sap_hana.go#L296–L332): incrementingCol and timestampCol are interpolated into SQL identifiers without escaping internal ". Same fix.
SQL injection — output column names (output_sap_hana.go#L119–L121): same pattern, same fix.
SQL injection — table field completely unquoted (output_sap_hana.go#L125): table is interpolated raw into the INSERT SQL via %s with no quoting. Needs schema/table splitting and identifier escaping.
go.mod misclassified as indirect (go.mod#L233): github.com/SAP/go-hdb is // indirect but is directly imported. Run go mod tidy.
Proprietary binary committed (internal/impl/saphana/bench/ngdbc.jar): SAP HANA JDBC driver JAR should not be in the repo. The README documents how to obtain it — remove the file and add it to .gitignore.
docs/benchmark-results/SUMMARY.md not updated with SAP HANA entry (required per docs/benchmarking.md#L322–L326).
docs/benchmarking.md Existing Benchmarks table not updated with SAP HANA suite (same rule).

…put config Covers: quoteIdentifier helper, tableRef/openRows escaping in input, output split into schema_name+table, go.mod direct dep, ngdbc.jar removal, SUMMARY.md and benchmarking.md updates.

ness-david-dedu

docs/benchmark-results/SUMMARY.md — SAP HANA row is added to the "At a Glance" table but there is no corresponding paragraph under the "What These Numbers Mean" section. Every other connector has one. Add a short paragraph explaining what the numbers mean and what the benchmark measured (e.g. fetch_size sensitivity, Kafka overhead, recommended config).

ness-david-dedu · 2026-06-09T12:32:28Z

Commits
LGTM

Review

Three code issues and one docs gap found in the updated PR.

**Input DSN not ** (input_sap_hana.go#L69–L72): The input dsn field is missing .Secret(), exposing embedded credentials in DEBUG logs and the config API. The output connector marks its DSN Secret correctly — same fix needed here.
Nil pointer dereference in poll loop (input_sap_hana.go#L377–L397): dbMut is released during the poll sleep (line 377); Close() can acquire the lock, set s.db = nil, and release it. When ReadBatch re-acquires at line 389, s.db is nil and openRows() panics. The timerFired guard at line 390 is dead code (both other select branches return before reaching line 389). Fix: add if s.db == nil { return nil, nil, service.ErrNotConnected } after line 389.
Benchmark configs missing benchmark: processor and using wrong output sink (saphana-read/bulk/benchmark_config.yaml): All four configs route to kafka_franz instead of drop: {} and have no benchmark: processor. Per docs/benchmarking.md §3, the benchmark processor is required for msg/sec/MB/sec stats and drop: {} is required to isolate connector throughput from output overhead.
docs/benchmark-results/SUMMARY.md missing SAP HANA paragraph (SUMMARY.md#L16–L24): SAP HANA row added to the table but no paragraph under "What These Numbers Mean". Every other connector has one.

ness-david-dedu · 2026-06-09T19:17:07Z

Commits

ea5af3be — message starts with two leading spaces ( sap_hana: fix DSN secret lea…). Message must not start with whitespace per commit policy.

Review

Two previous rounds of findings have been resolved (DSN Secret field, poll-loop nil guard, benchmark config gaps, SQL injection, output identifier quoting). Three remaining issues:

quoteIdentifier has no unit test — this is the load-bearing SQL injection prevention function. It correctly doubles embedded " characters per SQL standard, but there is no test exercising the escape path. A test with an input containing a " character would confirm the doubling behaviour under regression. schema.go#L24-L26
Integration tests are missing Given-When-Then t.Log() structure — the project convention (followed by e.g. internal/impl/gcp/enterprise/bigquery/integration_test.go) uses t.Log("Given ...") / t.Log("When ...") / t.Log("Then ...") annotations to make CI log output triageable. None of the seven TestIntegration* functions in integration_test.go follow this pattern.
timerFired guard is unreachable dead code — at input_sap_hana.go#L391-L393 both the ctx.Done() and stopChan cases return before reaching the if !timerFired check, so timerFired is always true when that line executes. The guard can be removed without changing behaviour.

…fig gaps Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… GWT logs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

squiidz · 2026-06-10T14:02:02Z

timestamp+incrementing mode silently breaks after the first empty poll — input_sap_hana.go advances s.timestampHWM unconditionally at end of ReadBatch, but s.hwm only
advances when a row is scanned. After one empty poll the next query binds inc > ? with an empty string against a numeric column. Fix: don't advance timestampHWM past the first
observed row, or fall back to pure timestamp comparison while s.hwm == "".
HWM round-trips through fmt.Sprintf("%v", v) — input_sap_hana.go:451-455. For a TIMESTAMP incrementing_column (which the docs explicitly suggest), %v on time.Time produces
2024-01-01 00:00:00 +0000 UTC, which HANA cannot parse. Even for BIGINT/DECIMAL the string→column-type implicit cast is fragile. Keep the raw any and pass it directly to
QueryContext.
Output's "execMany bulk path" claim is not what the code does — output_sap_hana.go:170-189 calls db.ExecContext with a flat arg slice. That triggers go-hdb's argument-count
detection but not a true prepared/MtInsert flow. Either prepare the statement once in Connect and cache *sql.Stmt, or remove the "single MtInsert RPC" claim from
docs/benchmark-results/sap-hana.md.
go.mod has a stray blank line in the // indirect block — go mod tidy will revert it and any tidiness check on CI will fail.
public/components/all/package.go import order — saphana was inserted before salesforce. The block is alphabetised everywhere else; this will fail any goimports/lint
enforcement.

- hwm: string→any; store raw scanned value directly instead of fmt.Sprintf(%v) — prevents time.Time formatting to unparseable string when incrementing_column is a TIMESTAMP - timestamp+incrementing openRows: fall back to pure-timestamp predicate when hwm==nil (timestampHWM advanced past an empty poll but no incrementing value seen yet), preventing nil bind against a numeric column on the next query - output: prepare INSERT stmt once in Connect, use stmt.ExecContext in WriteBatch — makes go-hdb MtInsert path explicit rather than relying on undocumented argument-count detection - go.mod: remove stray blank line in // indirect block - public/components/all: restore alphabetical import order (salesforce before saphana)

Upstream added cloud_unsupported_reason column and removed version column from internal/plugins/info.csv. Updated sap_hana entries to new schema format and reinserted in correct alphabetical position.

twmb · 2026-06-12T21:29:16Z

The approach here is the right one and matches the scope we defined for SAP HANA: a dedicated polling sap_hana input (bulk / incrementing / query) plus hana driver support in sql_insert / sql_raw for writes. The schema-metadata path (from SYS.TABLE_COLUMNS, addition-only drift detection, precision-aware DECIMAL mapping) is the most important piece and it's in good shape, and the SQL is parameterized with identifier quoting throughout.

There are also a few good ideas in #4490 worth pulling in here — it grows the scope of this PR slightly, but keeps everything in one place.

A few things to address before merge:

Config surface

Decide on the connection model: separate connection_url / connection_user / connection_password vs. the single dsn here, plus table_name / incrementing_column_name naming. The spec's own sql_insert/sql_raw examples use dsn: hdb://…, so a single dsn may be the more consistent choice — let's just pick one deliberately.
poll_interval defaults to 5s; spec calls for 60s.
Missing knobs: numeric_mapping (none / best_fit / best_fit_eager_double), partition_count, max_retries.

Schema metadata (ideas from #4490)

Add primary-key discovery — DNM - experimental - feat(saphana): add SAP HANA CDC input connector (Debezium-format, trigger-based) #4490's SYS.INDEX_COLUMNS ⋈ SYS.INDEXES WHERE CONSTRAINT = 'PRIMARY KEY' query is a clean way to do it; there's no PK info in the emitted schema today.
Use real per-column nullability (IS_NULLABLE) instead of marking every field optional.
Emit DECIMAL as a canonical decimal string rather than float64. This matches every other CDC input here (oracle / mysql / mongodb / postgres all emit canonical decimal strings via sqlutil.CanonicaliseDecimal), and bare floats get rejected by downstream Avro string/decimal encoding — the current path is both lossy and inconsistent with the rest of the repo.

Behavior

The high-water mark is in-memory only, so it doesn't survive a restart — consider persisting it (a cache resource) so incrementing mode resumes without re-reading from the initial value.
The dedicated sap_hana output is outside the defined scope (writes go through sql_insert/sql_raw with driver: hana, which you've already wired up). Either drop it, or let's confirm we want it as a deliberate extra.
Metadata keys: schema is correct; table_name / database_schema were the agreed names (currently sap_hana_table / sap_hana_schema).

Tests / CI

DNM - experimental - feat(saphana): add SAP HANA CDC input connector (Debezium-format, trigger-based) #4490 has a useful type/encoding edge-case suite (Unicode, max int64, DECIMAL precision, date boundaries, sub-second timestamps, LOBs) that exercises the go-hdb round-trip — worth porting.
Run task docs so the generated component page + info.csv land in the PR, and add license headers to the bench/**/main.go helpers.

- IS_NULLABLE from SYS.TABLE_COLUMNS replaces blanket optional:true - Primary-key discovery via SYS.INDEX_COLUMNS ⋈ SYS.INDEXES; emitted as primary_key_columns JSON array metadata per message - DECIMAL → canonical string (CanonicaliseDecimal / CanonicaliseBigDecimal) instead of lossy float64; consistent with oracle/mysql/postgres/mssql - numeric_mapping field: none (default) / best_fit / best_fit_eager_double - Rename metadata keys: sap_hana_schema → database_schema, sap_hana_table → table_name (matches agreed naming) - poll_interval default 5s → 60s - max_retries and partition_count config fields added - License headers on all bench/**/main.go files

ness-david-dedu changed the title ~~[WIP] Feature/sap hana~~ Feature/sap hana Jun 4, 2026

ness-david-dedu commented Jun 9, 2026

View reviewed changes

Comment thread internal/impl/saphana/input_sap_hana.go Outdated

Comment thread internal/impl/saphana/input_sap_hana.go Outdated

Comment thread internal/impl/saphana/output_sap_hana.go Outdated

Comment thread internal/impl/saphana/output_sap_hana.go Outdated

Comment thread go.mod Outdated

ness-david-dedu added 24 commits June 9, 2026 14:25

sap_hana: add input connector

a16c0b6

sap_hana: fix Read never executing query in bulk and query modes

3fd200e

sap_hana: add bulk-read benchmark suite

496bbc3

sap_hana: add incrementing-mode benchmark and HWM skip test

c49e42a

sap_hana: add benchmark README

dae5bb8

sap_hana: add query-mode benchmark

68884be

sap_hana: add output benchmark

4bda368

sap_hana: add timestamp and timestamp+incrementing modes

33206cd

sap_hana: add bulk read mode

82a5caa

sap_hana: improve output write throughput

10b8632

sap_hana: improve bulk read throughput

f8ef87d

sap_hana: add bulk read benchmark matrix

b0e0283

sap_hana: add benchmark results

dfe9f7d

sap_hana: improve output write throughput

f3fa6db

sap_hana: add benchmark results for incrementing mode

138ecfb

sap_hana: add benchmark matrix for write and query modes

c35567c

sap_hana: add query-mode benchmark results

0b97f47

sap_hana: add benchmark results for bulk and incrementing reads

d439c90

sap_hana: fix data normalization

7c0fed8

sap_hana: add reset flag to incrementing benchmark

4889ec6

sap_hana: add license check

75489e2

sap_hana: enhance integration tests

54b0602

sap_hana: add Kafka Connect JDBC bulk bench alongside Redpanda Connect

4f7f486

sap_hana: fix SQL injection in identifier quoting and restructure out…

f301c6a

…put config Covers: quoteIdentifier helper, tableRef/openRows escaping in input, output split into schema_name+table, go.mod direct dep, ngdbc.jar removal, SUMMARY.md and benchmarking.md updates.

ness-david-dedu force-pushed the feature/sap_hana branch from 84a9f04 to f301c6a Compare June 9, 2026 11:26

ness-david-dedu commented Jun 9, 2026

View reviewed changes

Comment thread internal/impl/saphana/input_sap_hana.go Outdated

Comment thread internal/impl/saphana/input_sap_hana.go Outdated

Comment thread internal/impl/saphana/bench/saphana-read/bulk/benchmark_config.yaml

ness-david-dedu and others added 2 commits June 9, 2026 23:07

sap_hana: fix DSN secret leak, poll-loop nil panic, and benchmark con…

8eaa6b6

…fig gaps Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

sap_hana: remove dead timerFired guard, add quoteIdentifier test, add…

9eac8f0

… GWT logs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ness-david-dedu force-pushed the feature/sap_hana branch from ea5af3b to 9eac8f0 Compare June 9, 2026 20:09

ness-david-dedu added 2 commits June 11, 2026 12:33

sap_hana: merge upstream/main, resolve info.csv conflict

26133c2

Upstream added cloud_unsupported_reason column and removed version column from internal/plugins/info.csv. Updated sap_hana entries to new schema format and reinserted in correct alphabetical position.

twmb mentioned this pull request Jun 12, 2026

DNM - experimental - feat(saphana): add SAP HANA CDC input connector (Debezium-format, trigger-based) #4490

Draft

3 tasks

ness-david-dedu added 2 commits June 15, 2026 23:12

sap_hana: remove output component and write benchmarks

adc36a1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/sap hana#4462

Feature/sap hana#4462
ness-david-dedu wants to merge 30 commits into
redpanda-data:mainfrom
ness-david-dedu:feature/sap_hana

ness-david-dedu commented May 28, 2026

Uh oh!

ness-david-dedu left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

ness-david-dedu left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

squiidz commented Jun 10, 2026

Uh oh!

twmb commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ness-david-dedu commented May 28, 2026

Uh oh!

ness-david-dedu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

ness-david-dedu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

ness-david-dedu commented Jun 9, 2026

Uh oh!

squiidz commented Jun 10, 2026

Uh oh!

twmb commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants