
[ATTRIBUTES] Refactor tabular write path to single-pass infer+validate in StoreTabularData #479

@zaeema-n

Summary

In opengin/core-api/db/repository/postgres/data_handler.go, StoreTabularData() currently uses a two-pass approach for new tables (schema inference + row validation), and a separate validation path for existing tables.
This issue proposes consolidating this into a single scanner pass that can both infer (when needed) and validate rows, while keeping behavior consistent.

Current Flow (with exact code locations)

File: opengin/core-api/db/repository/postgres/data_handler.go

  • Main entrypoint: StoreTabularData(ctx, entityID, attrName, value)
  • Current helper calls involved:
    • schema.GenerateSchema(...) (new-table path)
    • validateRowsAgainstSchema(...) (both paths)
    • hasNullOnlyColumns(...) (new-table path)
    • isDateTime(...) and isStructpbNull(...) (validation/type helpers)
    • schemaToColumns(...) (DDL generation)

Current behavior inside StoreTabularData(...)

  • If table exists:
    1. Load persisted schema from attribute_schemas
    2. Call validateRowsAgainstSchema(&tabularStruct, &existingSchema)
  • If table does not exist:
    1. Call schema.GenerateSchema(value.Value)
    2. Call hasNullOnlyColumns(schemaInfo)
    3. Call validateRowsAgainstSchema(&tabularStruct, schemaInfo)
    4. Create table via schemaToColumns(schemaInfo)

This means new-table writes may traverse the row data more than once (inference, then validation).

Problem

  • Repeated scans of the same tabular payload in StoreTabularData(...).
  • Type inference rules and validation rules are distributed across helpers, increasing chance of drift.
  • Date/datetime/null handling policy should be consistent between inference and validation.

Proposed Refactor (specific functions/files)

1) Add combined scanner in:

  • opengin/core-api/db/repository/postgres/data_handler.go

Proposed function:

  • scanTabularRows(data *structpb.Struct, existingSchema *schema.SchemaInfo) (*schema.SchemaInfo, error)

Responsibilities in one pass:

  • validate row shape,
  • classify cell type,
  • existing-schema mode: validate each cell against existingSchema,
  • inference mode: infer each column's type from its first non-null cell, then validate later rows for compatibility.
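
To make the single-pass idea concrete, here is a minimal sketch of the scanner loop. It is an illustration under stated assumptions, not the proposed implementation: it uses plain Go values ([][]any plus a column list) in place of *structpb.Struct, a map in place of schema.SchemaInfo, and the names ColType, classify and scanRows are hypothetical. Numbers are modelled as float64 because that is how structpb represents numeric values.

```go
package main

import "fmt"

// ColType is a hypothetical stand-in for the type names kept in schema.SchemaInfo.
type ColType string

const (
	TypeNull   ColType = "null" // column not resolved yet (only nulls seen so far)
	TypeNumber ColType = "number"
	TypeString ColType = "string"
	TypeBool   ColType = "bool"
)

// classify maps a decoded cell to a ColType; unknown Go types are an error.
func classify(v any) (ColType, error) {
	switch v.(type) {
	case nil:
		return TypeNull, nil
	case float64:
		return TypeNumber, nil
	case string:
		return TypeString, nil
	case bool:
		return TypeBool, nil
	default:
		return "", fmt.Errorf("unsupported cell type %T", v)
	}
}

// scanRows walks the rows exactly once. With existing == nil it infers each
// column's type from its first non-null cell and checks later cells against
// it; otherwise it validates every cell against the existing schema.
func scanRows(columns []string, rows [][]any, existing map[string]ColType) (map[string]ColType, error) {
	inferring := existing == nil
	schema := existing
	if inferring {
		schema = make(map[string]ColType, len(columns))
		for _, c := range columns {
			schema[c] = TypeNull
		}
	}
	for r, row := range rows {
		// Row shape check: every row must have one cell per column.
		if len(row) != len(columns) {
			return nil, fmt.Errorf("row %d: got %d cells, want %d", r, len(row), len(columns))
		}
		for i, cell := range row {
			col := columns[i]
			got, err := classify(cell)
			if err != nil {
				return nil, fmt.Errorf("row %d, column %q: %w", r, col, err)
			}
			if got == TypeNull {
				continue // assumption: null is accepted for any column type
			}
			want, ok := schema[col]
			if !ok {
				return nil, fmt.Errorf("row %d: column %q is not in the schema", r, col)
			}
			if inferring && want == TypeNull {
				schema[col] = got // first non-null cell fixes the column type
				continue
			}
			if got != want {
				return nil, fmt.Errorf("row %d, column %q: got %s, want %s", r, col, got, want)
			}
		}
	}
	return schema, nil
}
```

In this sketch a column that still maps to TypeNull after the scan saw only nulls, so the caller can detect null-only columns from the returned schema without a second pass.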

2) Centralize shared type logic in same file

Use/introduce helpers in data_handler.go such as:

  • inferCellType(...)
  • compatibility helper (e.g. areInferredTypesCompatible(...))
  • value-vs-schema checker (e.g. valueMatchesType(...))
  • one canonical date/datetime policy (currently tied to isDateTime(...) semantics)
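
One way the shared helpers could fit together is sketched below. The helper names come from the list above, but their signatures, the CellType constants, and the date/datetime policy are assumptions made for illustration: this sketch treats RFC 3339 strings as datetime and YYYY-MM-DD strings as date, which may differ from what isDateTime(...) accepts today.

```go
package main

import "time"

// CellType is a hypothetical classification shared by inference and validation.
type CellType string

const (
	CellNull     CellType = "null"
	CellBool     CellType = "bool"
	CellNumber   CellType = "number"
	CellString   CellType = "string"
	CellDate     CellType = "date"
	CellDateTime CellType = "datetime"
	CellUnknown  CellType = "unknown"
)

// inferCellType is the single place where a cell is classified, so the
// date/datetime policy lives in exactly one function.
func inferCellType(v any) CellType {
	switch x := v.(type) {
	case nil:
		return CellNull
	case bool:
		return CellBool
	case float64:
		return CellNumber
	case string:
		// Assumed policy: RFC 3339 means datetime, YYYY-MM-DD means date.
		if _, err := time.Parse(time.RFC3339, x); err == nil {
			return CellDateTime
		}
		if _, err := time.Parse("2006-01-02", x); err == nil {
			return CellDate
		}
		return CellString
	default:
		return CellUnknown
	}
}

// areInferredTypesCompatible reports whether two inferred types may share a
// column: identical types, or either side null. Unknown never matches.
func areInferredTypesCompatible(a, b CellType) bool {
	if a == CellUnknown || b == CellUnknown {
		return false
	}
	return a == b || a == CellNull || b == CellNull
}

// valueMatchesType checks one value against a persisted column type.
// Assumption: a date-shaped string is still acceptable in a string column.
func valueMatchesType(v any, want CellType) bool {
	got := inferCellType(v)
	if want == CellString && (got == CellDate || got == CellDateTime) {
		return true
	}
	return areInferredTypesCompatible(got, want)
}
```

Because valueMatchesType is built on inferCellType, a change to the date policy would apply to both inference and validation at once, which is the drift the issue is trying to prevent.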

3) Rewire StoreTabularData(...)

In opengin/core-api/db/repository/postgres/data_handler.go:

  • existing table path: fetch schema -> scanTabularRows(..., &existingSchema) -> insert
  • new table path: scanTabularRows(..., nil) -> fail if the inferred schema has any null-only column -> schemaToColumns(...) -> create table + persist schema -> insert
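
The two paths above can be sketched as a small control-flow skeleton. Everything here is hypothetical scaffolding for illustration: the store interface, the in-memory memStore, the Schema map and the scanner function type stand in for the real Postgres repository, schema.SchemaInfo and scanTabularRows, and DDL generation via schemaToColumns(...) is folded into CreateTable.

```go
package main

import (
	"errors"
	"fmt"
)

// Schema maps column name to type name; "null" marks a column whose type
// could not be inferred. Hypothetical stand-in for schema.SchemaInfo.
type Schema map[string]string

// scanner stands in for scanTabularRows: existing == nil means inference mode.
type scanner func(rows [][]any, existing Schema) (Schema, error)

// store is a hypothetical abstraction over the table and schema persistence.
type store interface {
	LoadSchema(table string) (Schema, bool) // false when the table does not exist
	CreateTable(table string, s Schema) error
	Insert(table string, rows [][]any) error
}

var errNullOnlyColumn = errors.New("column has only null values; cannot infer type")

// storeTabular shows the rewired flow: one scan per write on either path.
func storeTabular(st store, scan scanner, table string, rows [][]any) error {
	if existing, found := st.LoadSchema(table); found {
		// Existing table: validate against the persisted schema, then insert.
		if _, err := scan(rows, existing); err != nil {
			return fmt.Errorf("validate rows: %w", err)
		}
		return st.Insert(table, rows)
	}
	// New table: infer and validate in the same pass.
	inferred, err := scan(rows, nil)
	if err != nil {
		return fmt.Errorf("infer schema: %w", err)
	}
	for col, typ := range inferred {
		if typ == "null" {
			// Fail before anything is created or persisted.
			return fmt.Errorf("column %q: %w", col, errNullOnlyColumn)
		}
	}
	if err := st.CreateTable(table, inferred); err != nil {
		return fmt.Errorf("create table: %w", err)
	}
	return st.Insert(table, rows)
}

// memStore is an in-memory store used only to exercise the sketch.
type memStore struct {
	schemas map[string]Schema
	rows    map[string][][]any
}

func (m *memStore) LoadSchema(t string) (Schema, bool) { s, ok := m.schemas[t]; return s, ok }
func (m *memStore) CreateTable(t string, s Schema) error { m.schemas[t] = s; return nil }
func (m *memStore) Insert(t string, rows [][]any) error {
	m.rows[t] = append(m.rows[t], rows...)
	return nil
}
```

Taking the scanner as a parameter keeps the flow testable without a database; in the real code StoreTabularData(...) would presumably call scanTabularRows directly.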

4) Remove redundant helpers if no longer needed

From data_handler.go, remove/simplify:

  • validateRowsAgainstSchema(...)
  • hasNullOnlyColumns(...)
  • any compatibility helper that is used only by the old flow

Explicit Requirement

For new table creation in StoreTabularData(...): if any column is null across all rows (cannot infer concrete type), return a clear error and do not create table/schema.
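
As one possible shape for that "clear error", the sketch below collects every null-only column from an inferred schema and reports them together in sorted order, so the message is deterministic. The function name rejectNullOnlyColumns, the map-based schema, the "null" marker and the message wording are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rejectNullOnlyColumns returns an error naming every column whose inferred
// type is still "null" (no non-null cell was seen), or nil if there is none.
// The caller is expected to check this before creating the table or schema.
func rejectNullOnlyColumns(inferred map[string]string) error {
	var unresolved []string
	for col, typ := range inferred {
		if typ == "null" {
			unresolved = append(unresolved, col)
		}
	}
	if len(unresolved) == 0 {
		return nil
	}
	// Map iteration order is random, so sort for a stable message.
	sort.Strings(unresolved)
	return fmt.Errorf("cannot infer type for null-only column(s): %s",
		strings.Join(unresolved, ", "))
}
```

Listing all offending columns at once saves the caller from fixing one column per failed write.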

Tests to Update (specific file/functions)

File: opengin/core-api/db/repository/postgres/data_handler_test.go

Add/update tests for:

  • existing schema valid rows accepted,
  • existing schema invalid rows rejected with row/column context,
  • new schema inferred correctly,
  • new schema fails when any column is all-null,
  • mixed-type incompatibility behavior under unified rules.

File: opengin/core-api/db/repository/postgres/postgres_client_test.go

Adjust integration-style tabular write tests to align with new scanner-driven flow in StoreTabularData(...).

Validation

  • go test ./db/repository/postgres -run ^$ (compile check)
  • go test ./engine -run ^$ (downstream compile check)
  • Run targeted postgres tabular tests (scanner/inference/validation cases)
