Skip to content

Bug triage results: 2026-05-18 #4359

@andygrove

Description

@andygrove

Bug triage results: 2026-05-18

Triage pass over the open requires-triage queue, per the project Bug Triage Guide.

  • Total issues processed: 20
  • Labels applied to: 19
  • Skipped: 1
  • priority:high: 2
  • priority:medium: 10
  • priority:low: 7

Labels have already been applied and requires-triage removed from each issue listed under "Triaged". A reviewer should spot-check the calls and close this issue when satisfied. To correct a label, edit the affected issue directly.

Triaged

priority:high

  • AbstractMethodError: CometBroadcastExchangeExec missing sparkContext() from BroadcastExchangeLike (#4318)
    • Area labels: none
    • Rationale: AbstractMethodError thrown on a supported code path (Comet 0.16 + Spark 3.5.6 broadcast joins); per the guide, an unhandled exception on a supported path is priority:high.
  • Windows crash if frame overflow (#4307)
    • Area labels: area:expressions
    • Rationale: Native engine throws CometNativeException on a supported window-function query (the 18446744073709551615 index points to a u64 underflow in frame computation); a native crash on a supported path is priority:high.

priority:medium

  • Allocate Comet's parquet reader buffers from ArrowUtils.rootAllocator to enable zero-copy PyArrow UDF runner (#4294)
    • Area labels: area:scan, area:ffi
    • Rationale: Performance optimization for the columnar Python runner with a working bulk-copy fallback today; matches the guide's "performance regression with workaround" criterion.
  • [FEATURE] Native scan support for VariantType columns (Iceberg + Spark 4.0) (#4295)
    • Area labels: area:scan, native_iceberg_compat, spark 4.0
    • Rationale: Missing native VariantType support causes whole-query fallback; functional gap with Spark fallback workaround is priority:medium.
  • Implement JVM UDFs for all date/time expressions (#4311)
    • Area labels: area:expressions
    • Rationale: Compatibility-feature gap: replace native date/time expressions with JVM UDFs for full Spark parity; functional gap with workaround is priority:medium.
  • Add support for native custom scalar UDFs (#4312)
  • Implement JVM UDFs for JSON expressions (#4313)
    • Area labels: area:expressions
    • Rationale: Adds full Spark-compatible JSON expression support via JVM UDFs; missing-feature gap with workaround is priority:medium.
  • Writes to Apache Iceberg Tables (#4322)
    • Area labels: area:writer, native_iceberg_compat
    • Rationale: New Iceberg write path is a major feature gap with the existing Spark write path as workaround; matches priority:medium.
  • Frequent CI failures for Spark 4.0.2 / JDK 21 (#4327)
    • Area labels: area:ci, spark 4.0
    • Rationale: A flaky CI test would normally be priority:low, but the title says "frequent" failures on the standard Spark 4.0.2 / JDK 21 build; per the guide's escalation rule for CI consistently blocking merges, escalated to priority:medium.
  • Credential Provider Support (#4332)
    • Area labels: area:scan, native_iceberg_compat
    • Rationale: Missing pluggable credential provider for Iceberg scans (only static creds today); functional gap with workaround is priority:medium.
  • Comet JVM UDF implementations cannot be created in spark module (#4336)
    • Area labels: area:expressions
    • Rationale: Module / shading structure prevents implementing UDFs that need spark-module access; broken feature with workaround (place UDFs in common) is priority:medium.
  • Implement TimeType support (#4288)
    • Area labels: area:expressions (existing: EPIC, spark 4.1)
    • Rationale: Issue already carried priority:medium from a prior reviewer; this pass added area:expressions and removed requires-triage.

priority:low

  • Make CI run on the contributor forks (#4289)
    • Area labels: area:ci
    • Rationale: CI infrastructure rework with no functional impact; matches the guide's priority:low "tooling" example.
  • [DISCUSS] Simplify regex engine + incompatibility config model (#4310)
    • Area labels: area:expressions
    • Rationale: Refactor / config-UX discussion with no underlying functional bug; user experience polish is priority:low.
  • Drop support for Spark 3.4 (#4329)
    • Area labels: none
    • Rationale: Project-policy / versioning discussion; tooling-and-process item maps to priority:low.
  • Enable spark.comet.exec.localTableScan.enabled when running Spark SQL tests (#4347)
    • Area labels: spark sql tests
    • Rationale: Test-infrastructure tweak so SQL suites exercise more of Comet; test-only / tooling change is priority:low.
  • native_datafusion: tests asserting parquet-mr's permissive overflow/narrowing behavior cannot be made to pass (#4352)
    • Area labels: area:scan, spark sql tests (existing: native_datafusion)
    • Rationale: Architectural test-only mismatch; the workaround is to re-ignore the affected Spark tests. Test-only with workaround is priority:low.
  • native_datafusion (Spark 3.x): shim's ParquetSchemaConvert translation produces an extra SparkException cause-chain layer (#4354)
    • Area labels: area:scan, native_datafusion
    • Rationale: Behavior difference visible only in Spark SQL test cause-chain assertions; tests stay ignored as a workaround. Test-only failure is priority:low.
  • Change UDF signature to use ColumnarValue rather than raw Arrow types (#4358)
    • Area labels: area:expressions
    • Rationale: Internal API refactor with no user-facing functional bug; matches priority:low for tooling/internal cleanup.

Escalations to consider

  • Frequent CI failures for Spark 4.0.2 / JDK 21 (#4327)
    • Escalated from priority:low (CI flake) to priority:medium per the guide's rule "A priority:low CI flake is blocking PR merges consistently → escalate to priority:medium". The reviewer should confirm whether these failures are in fact blocking merges; if not, downgrade to priority:low.

Skipped — needs more info

  • Bug triage results: 2026-05-11 (#4287)
    • This is the previous triage's summary issue. It is a meta issue, not a bug or feature request, and per this skill's rules ("Do not add labels to the summary issue itself") it should not carry a priority. The reviewer should close it when finished spot-checking the prior pass; requires-triage was left in place since this skill does not modify summary issues.

Notes on label availability

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions