Skip to content

Feature/sap hana#4462

Open
ness-david-dedu wants to merge 30 commits into
redpanda-data:mainfrom
ness-david-dedu:feature/sap_hana
Open

Feature/sap hana#4462
ness-david-dedu wants to merge 30 commits into
redpanda-data:mainfrom
ness-david-dedu:feature/sap_hana

Conversation

@ness-david-dedu

Copy link
Copy Markdown
Contributor

No description provided.

@ness-david-dedu ness-david-dedu changed the title [WIP] Feature/sap hana Feature/sap hana Jun 4, 2026

@ness-david-dedu ness-david-dedu left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-inline issues

internal/impl/saphana/bench/ngdbc.jar — A proprietary SAP HANA JDBC driver binary is committed to the repository. The README already documents how to obtain it from the SAP Software Downloads Center; ship the instructions, not the file. Remove it and add it to .gitignore.

docs/benchmark-results/SUMMARY.md — The SAP HANA connector adds a results file (docs/benchmark-results/sap-hana.md) but SUMMARY.md is not updated. Per docs/benchmarking.md: "When adding a new benchmark suite — update the table in this document, and update docs/benchmark-results/SUMMARY.md."

docs/benchmarking.md — The "Existing Benchmarks" table in this file does not include the SAP HANA entry. The same maintenance rule applies: add a row for the new suite.

Comment thread internal/impl/saphana/input_sap_hana.go Outdated
Comment thread internal/impl/saphana/input_sap_hana.go Outdated
Comment thread internal/impl/saphana/output_sap_hana.go Outdated
Comment thread internal/impl/saphana/output_sap_hana.go Outdated
Comment thread go.mod Outdated
@ness-david-dedu

Copy link
Copy Markdown
Contributor Author

Commits

  1. 2ef9e7e2fix(sap_hana): ...: fix is not a valid system name in this repo's commit format. The pattern is system: message or system(subsystem): message where system is a known area name. Should be sap_hana: fix read ... (or the subsystem form sap_hana(input): ...).
  2. 24eb7ef3working bulk read: vague WIP message. Use imperative form describing the change (e.g. sap_hana: add bulk read mode).
  3. 9e36dff8imrpove hana rows write speed: typo ("imrpove") and missing system prefix. Should be e.g. sap_hana(output): improve bulk write throughput.
  4. cf122214add bencmark results: typo ("bencmark") and missing system prefix. Should be e.g. sap_hana: add benchmark results.
  5. 4673e059 add incrementing-mode benchmark...: leading whitespace in the message.

Review

Five issues found across security and dependency classification.

  1. SQL injection — tableRef() (input_sap_hana.go#L283–L287): schemaName and tableName are concatenated into double-quoted SQL identifiers without escaping internal " characters. Fix: strings.ReplaceAll(name, ", "") before concatenation.
  2. SQL injection — column names in openRows() (input_sap_hana.go#L296–L332): incrementingCol and timestampCol are interpolated into SQL identifiers without escaping internal ". Same fix.
  3. SQL injection — output column names (output_sap_hana.go#L119–L121): same pattern, same fix.
  4. SQL injection — table field completely unquoted (output_sap_hana.go#L125): table is interpolated raw into the INSERT SQL via %s with no quoting. Needs schema/table splitting and identifier escaping.
  5. go.mod misclassified as indirect (go.mod#L233): github.com/SAP/go-hdb is // indirect but is directly imported. Run go mod tidy.
  6. Proprietary binary committed (internal/impl/saphana/bench/ngdbc.jar): SAP HANA JDBC driver JAR should not be in the repo. The README documents how to obtain it — remove the file and add it to .gitignore.
  7. docs/benchmark-results/SUMMARY.md not updated with SAP HANA entry (required per docs/benchmarking.md#L322–L326).
  8. docs/benchmarking.md Existing Benchmarks table not updated with SAP HANA suite (same rule).

…put config

  Covers: quoteIdentifier helper, tableRef/openRows escaping in input, output split into schema_name+table, go.mod direct dep, ngdbc.jar removal, SUMMARY.md and benchmarking.md updates.

@ness-david-dedu ness-david-dedu left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/benchmark-results/SUMMARY.md — SAP HANA row is added to the "At a Glance" table but there is no corresponding paragraph under the "What These Numbers Mean" section. Every other connector has one. Add a short paragraph explaining what the numbers mean and what the benchmark measured (e.g. fetch_size sensitivity, Kafka overhead, recommended config).

Comment thread internal/impl/saphana/input_sap_hana.go Outdated
Comment thread internal/impl/saphana/input_sap_hana.go Outdated
Comment thread internal/impl/saphana/bench/saphana-read/bulk/benchmark_config.yaml
@ness-david-dedu

Copy link
Copy Markdown
Contributor Author

Commits
LGTM

Review

Three code issues and one docs gap found in the updated PR.

  1. **Input DSN not ** (input_sap_hana.go#L69–L72): The input dsn field is missing .Secret(), exposing embedded credentials in DEBUG logs and the config API. The output connector marks its DSN Secret correctly — same fix needed here.
  2. Nil pointer dereference in poll loop (input_sap_hana.go#L377–L397): dbMut is released during the poll sleep (line 377); Close() can acquire the lock, set s.db = nil, and release it. When ReadBatch re-acquires at line 389, s.db is nil and openRows() panics. The timerFired guard at line 390 is dead code (both other select branches return before reaching line 389). Fix: add if s.db == nil { return nil, nil, service.ErrNotConnected } after line 389.
  3. Benchmark configs missing benchmark: processor and using wrong output sink (saphana-read/bulk/benchmark_config.yaml): All four configs route to kafka_franz instead of drop: {} and have no benchmark: processor. Per docs/benchmarking.md §3, the benchmark processor is required for msg/sec/MB/sec stats and drop: {} is required to isolate connector throughput from output overhead.
  4. docs/benchmark-results/SUMMARY.md missing SAP HANA paragraph (SUMMARY.md#L16–L24): SAP HANA row added to the table but no paragraph under "What These Numbers Mean". Every other connector has one.

@ness-david-dedu

Copy link
Copy Markdown
Contributor Author

Commits

  1. ea5af3be — message starts with two leading spaces ( sap_hana: fix DSN secret lea…). Message must not start with whitespace per commit policy.

Review

Two previous rounds of findings have been resolved (DSN Secret field, poll-loop nil guard, benchmark config gaps, SQL injection, output identifier quoting). Three remaining issues:

  1. quoteIdentifier has no unit test — this is the load-bearing SQL injection prevention function. It correctly doubles embedded " characters per SQL standard, but there is no test exercising the escape path. A test with an input containing a " character would confirm the doubling behaviour under regression. schema.go#L24-L26

  2. Integration tests are missing Given-When-Then t.Log() structure — the project convention (followed by e.g. internal/impl/gcp/enterprise/bigquery/integration_test.go) uses t.Log("Given ...") / t.Log("When ...") / t.Log("Then ...") annotations to make CI log output triageable. None of the seven TestIntegration* functions in integration_test.go follow this pattern.

  3. timerFired guard is unreachable dead code — at input_sap_hana.go#L391-L393 both the ctx.Done() and stopChan cases return before reaching the if !timerFired check, so timerFired is always true when that line executes. The guard can be removed without changing behaviour.

ness-david-dedu and others added 2 commits June 9, 2026 23:07
…fig gaps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… GWT logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@squiidz

squiidz commented Jun 10, 2026

Copy link
Copy Markdown
Contributor
  1. timestamp+incrementing mode silently breaks after the first empty poll — input_sap_hana.go advances s.timestampHWM unconditionally at end of ReadBatch, but s.hwm only
    advances when a row is scanned. After one empty poll the next query binds inc > ? with an empty string against a numeric column. Fix: don't advance timestampHWM past the first
    observed row, or fall back to pure timestamp comparison while s.hwm == "".

  2. HWM round-trips through fmt.Sprintf("%v", v) — input_sap_hana.go:451-455. For a TIMESTAMP incrementing_column (which the docs explicitly suggest), %v on time.Time produces
    2024-01-01 00:00:00 +0000 UTC, which HANA cannot parse. Even for BIGINT/DECIMAL the string→column-type implicit cast is fragile. Keep the raw any and pass it directly to
    QueryContext.

  3. Output's "execMany bulk path" claim is not what the code does — output_sap_hana.go:170-189 calls db.ExecContext with a flat arg slice. That triggers go-hdb's argument-count
    detection but not a true prepared/MtInsert flow. Either prepare the statement once in Connect and cache *sql.Stmt, or remove the "single MtInsert RPC" claim from
    docs/benchmark-results/sap-hana.md.

  4. go.mod has a stray blank line in the // indirect block — go mod tidy will revert it and any tidiness check on CI will fail.

  5. public/components/all/package.go import order — saphana was inserted before salesforce. The block is alphabetised everywhere else; this will fail any goimports/lint
    enforcement.

  - hwm: string→any; store raw scanned value directly instead of
    fmt.Sprintf(%v) — prevents time.Time formatting to unparseable
    string when incrementing_column is a TIMESTAMP
  - timestamp+incrementing openRows: fall back to pure-timestamp
    predicate when hwm==nil (timestampHWM advanced past an empty
    poll but no incrementing value seen yet), preventing nil bind
    against a numeric column on the next query
  - output: prepare INSERT stmt once in Connect, use stmt.ExecContext
    in WriteBatch — makes go-hdb MtInsert path explicit rather than
    relying on undocumented argument-count detection
  - go.mod: remove stray blank line in // indirect block
  - public/components/all: restore alphabetical import order
    (salesforce before saphana)
Upstream added cloud_unsupported_reason column and removed version
column from internal/plugins/info.csv. Updated sap_hana entries to
new schema format and reinserted in correct alphabetical position.
@twmb

twmb commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

The approach here is the right one and matches the scope we defined for SAP HANA: a dedicated polling sap_hana input (bulk / incrementing / query) plus hana driver support in sql_insert / sql_raw for writes. The schema-metadata path (from SYS.TABLE_COLUMNS, addition-only drift detection, precision-aware DECIMAL mapping) is the most important piece and it's in good shape, and the SQL is parameterized with identifier quoting throughout.

There are also a few good ideas in #4490 worth pulling in here — it grows the scope of this PR slightly, but keeps everything in one place.

A few things to address before merge:

Config surface

  • Decide on the connection model: separate connection_url / connection_user / connection_password vs. the single dsn here, plus table_name / incrementing_column_name naming. The spec's own sql_insert/sql_raw examples use dsn: hdb://…, so a single dsn may be the more consistent choice — let's just pick one deliberately.
  • poll_interval defaults to 5s; spec calls for 60s.
  • Missing knobs: numeric_mapping (none / best_fit / best_fit_eager_double), partition_count, max_retries.

Schema metadata (ideas from #4490)

  • Add primary-key discovery — DNM - experimental - feat(saphana): add SAP HANA CDC input connector (Debezium-format, trigger-based) #4490's SYS.INDEX_COLUMNSSYS.INDEXES WHERE CONSTRAINT = 'PRIMARY KEY' query is a clean way to do it; there's no PK info in the emitted schema today.
  • Use real per-column nullability (IS_NULLABLE) instead of marking every field optional.
  • Emit DECIMAL as a canonical decimal string rather than float64. This matches every other CDC input here (oracle / mysql / mongodb / postgres all emit canonical decimal strings via sqlutil.CanonicaliseDecimal), and bare floats get rejected by downstream Avro string/decimal encoding — the current path is both lossy and inconsistent with the rest of the repo.

Behavior

  • The high-water mark is in-memory only, so it doesn't survive a restart — consider persisting it (a cache resource) so incrementing mode resumes without re-reading from the initial value.
  • The dedicated sap_hana output is outside the defined scope (writes go through sql_insert/sql_raw with driver: hana, which you've already wired up). Either drop it, or let's confirm we want it as a deliberate extra.
  • Metadata keys: schema is correct; table_name / database_schema were the agreed names (currently sap_hana_table / sap_hana_schema).

Tests / CI

  - IS_NULLABLE from SYS.TABLE_COLUMNS replaces blanket optional:true
  - Primary-key discovery via SYS.INDEX_COLUMNS ⋈ SYS.INDEXES;
    emitted as primary_key_columns JSON array metadata per message
  - DECIMAL → canonical string (CanonicaliseDecimal / CanonicaliseBigDecimal)
    instead of lossy float64; consistent with oracle/mysql/postgres/mssql
  - numeric_mapping field: none (default) / best_fit / best_fit_eager_double
  - Rename metadata keys: sap_hana_schema → database_schema,
    sap_hana_table → table_name (matches agreed naming)
  - poll_interval default 5s → 60s
  - max_retries and partition_count config fields added
  - License headers on all bench/**/main.go files
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants