-
Notifications
You must be signed in to change notification settings - Fork 296
docs: add bug triage guide for prioritizing open issues #3812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+203
−0
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| # Licensed to the Apache Software Foundation (ASF) under one | ||
| # or more contributor license agreements. See the NOTICE file | ||
| # distributed with this work for additional information | ||
| # regarding copyright ownership. The ASF licenses this file | ||
| # to you under the Apache License, Version 2.0 (the | ||
| # "License"); you may not use this file except in compliance | ||
| # with the License. You may obtain a copy of the License at | ||
| # | ||
| # http://www.apache.org/licenses/LICENSE-2.0 | ||
| # | ||
| # Unless required by applicable law or agreed to in writing, | ||
| # software distributed under the License is distributed on an | ||
| # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| # KIND, either express or implied. See the License for the | ||
| # specific language governing permissions and limitations | ||
| # under the License. | ||
|
|
||
| name: Label new issues with requires-triage | ||
|
|
||
| on: | ||
| issues: | ||
| types: [opened] | ||
|
|
||
| permissions: | ||
| issues: write | ||
|
|
||
| jobs: | ||
| add-triage-label: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - uses: actions/github-script@v7 | ||
| with: | ||
| script: | | ||
| await github.rest.issues.addLabels({ | ||
| owner: context.repo.owner, | ||
| repo: context.repo.repo, | ||
| issue_number: context.issue.number, | ||
| labels: ['requires-triage'] | ||
| }) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| <!--- | ||
| Licensed to the Apache Software Foundation (ASF) under one | ||
| or more contributor license agreements. See the NOTICE file | ||
| distributed with this work for additional information | ||
| regarding copyright ownership. The ASF licenses this file | ||
| to you under the Apache License, Version 2.0 (the | ||
| "License"); you may not use this file except in compliance | ||
| with the License. You may obtain a copy of the License at | ||
|
|
||
| http://www.apache.org/licenses/LICENSE-2.0 | ||
|
|
||
| Unless required by applicable law or agreed to in writing, | ||
| software distributed under the License is distributed on an | ||
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
| KIND, either express or implied. See the License for the | ||
| specific language governing permissions and limitations | ||
| under the License. | ||
| --> | ||
|
|
||
| # Bug Triage Guide | ||
|
|
||
| This guide describes how we prioritize and triage bugs in the Comet project. The goal is to ensure | ||
| that the most impactful bugs — especially correctness issues that produce wrong results — are | ||
| identified and addressed before less critical issues. | ||
|
|
||
| ## Priority Labels | ||
|
|
||
| Every bug should have exactly one priority label. When filing or triaging a bug, apply the | ||
| appropriate label from the table below. | ||
|
|
||
| | Label | Color | Description | Examples | | ||
| | ------------------- | ------ | ------------------------------------------------------------------------------------ | --------------------------------------------------------------------- | | ||
| | `priority:critical` | Red | Data corruption, silent wrong results, security vulnerabilities | Wrong aggregation results, FFI data corruption, incorrect cast output | | ||
| | `priority:high` | Orange | Crashes, panics, segfaults, major functional breakage affecting production workloads | Native engine panic, JVM segfault, NPE on supported code path | | ||
| | `priority:medium` | Yellow | Functional bugs, performance regressions, broken features that have workarounds | Missing expression support, writer feature gaps, excessive spilling | | ||
| | `priority:low` | Green | Minor issues, test-only failures, tooling, CI flakes, cosmetic issues | Flaky CI test, build script edge case, documentation generator bug | | ||
|
|
||
| ### How to Choose a Priority | ||
|
|
||
| Use this decision tree: | ||
|
|
||
| 1. **Can this bug cause silent wrong results?** If yes → `priority:critical`. These are the most | ||
| dangerous bugs because users may not notice the incorrect output. | ||
| 2. **Does this bug crash the JVM or native engine?** If yes → `priority:high`. Crashes are | ||
| disruptive but at least visible to the user. | ||
| 3. **Does this bug break a feature or cause significant performance degradation?** If yes → | ||
| `priority:medium`. The user can work around it (e.g., falling back to Spark) but it impacts | ||
| the value of Comet. | ||
| 4. **Everything else** → `priority:low`. Test failures, CI issues, tooling, and cosmetic problems. | ||
|
|
||
| ### Escalation Triggers | ||
|
|
||
| A bug should be escalated to a higher priority if: | ||
|
|
||
| - A `priority:high` crash is discovered to also produce wrong results silently in some cases → | ||
| escalate to `priority:critical` | ||
| - A `priority:medium` bug is reported by multiple users or affects a common workload → consider | ||
| escalating to `priority:high` | ||
| - A `priority:low` CI flake is blocking PR merges consistently → escalate to `priority:medium` | ||
|
|
||
| ## Area Labels | ||
|
|
||
| Area labels indicate which subsystem is affected. A bug may have multiple area labels. These | ||
| help contributors find bugs in their area of expertise. | ||
|
|
||
| | Label | Description | | ||
| | ------------------ | -------------------------------------- | | ||
| | `area:writer` | Native Parquet writer | | ||
| | `area:shuffle` | Shuffle (JVM and native) | | ||
| | `area:aggregation` | Hash aggregates, aggregate expressions | | ||
| | `area:scan` | Parquet scan / data reading | | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar question, since we already support CSV and Iceberg for scans. |
||
| | `area:expressions` | Expression evaluation | | ||
| | `area:ffi` | Arrow FFI / JNI boundary | | ||
| | `area:ci` | CI/CD, GitHub Actions, build tooling | | ||
|
|
||
| The following pre-existing labels also serve as area indicators: `native_datafusion`, | ||
| `native_iceberg_compat`, `spark 4`, `spark sql tests`. | ||
|
|
||
| ## Triage Process | ||
|
|
||
| Every new issue is automatically labeled with `requires-triage` when it is opened. This makes it | ||
| easy to find issues that have not yet been triaged by filtering on that label. Once an issue has | ||
| been triaged, remove the `requires-triage` label and apply the appropriate priority and area labels. | ||
|
|
||
| ### For New Issues | ||
|
|
||
| When a new bug is filed: | ||
|
|
||
| 1. **Reproduce or verify** the issue if possible. If the report lacks reproduction steps, ask | ||
| the reporter for more details. | ||
| 2. **Assess correctness impact first.** Ask: "Could this produce wrong results silently?" This | ||
| is more important than whether it crashes. | ||
| 3. **Apply a priority label** using the decision tree above. | ||
| 4. **Apply area labels** to indicate the affected subsystem(s). | ||
| 5. **Apply `good first issue`** if the fix is likely straightforward and well-scoped. | ||
| 6. **Remove the `requires-triage` label** to indicate triage is complete. | ||
|
|
||
| ### For Existing Bugs | ||
|
|
||
| Periodically review open bugs to ensure priorities are still accurate: | ||
|
|
||
| - Has a `priority:medium` bug been open for a long time with user reports? Consider escalating. | ||
| - Has a `priority:high` bug been fixed by a related change? Close it. | ||
| - Are there clusters of related bugs that should be tracked under an EPIC? | ||
|
|
||
| ### Prioritization Principles | ||
|
|
||
| 1. **Correctness over crashes.** A bug that silently returns wrong results is worse than one that | ||
| crashes, because crashes are at least visible. | ||
| 2. **User-reported over test-only.** A bug hit by a real user on a real workload takes priority | ||
| over one found only in test suites. | ||
| 3. **Core path over experimental.** Bugs in the default scan mode (`native_comet`) or widely-used | ||
| expressions take priority over bugs in experimental features like `native_datafusion` or | ||
| `native_iceberg_compat`. | ||
| 4. **Production safety over feature completeness.** Fixing a data corruption bug is more important | ||
| than adding support for a new expression. | ||
|
|
||
| ## Common Bug Categories | ||
|
|
||
| ### Correctness Bugs (`priority:critical`) | ||
|
|
||
| These are bugs where Comet produces different results than Spark without any error or warning. | ||
| Examples include: | ||
|
|
||
| - Incorrect cast behavior (e.g., negative zero to string) | ||
| - Aggregate functions ignoring configuration (e.g., `ignoreNulls`) | ||
| - Data corruption in FFI boundary (e.g., boolean arrays with non-zero offset) | ||
| - Type mismatches between partial and final aggregation stages | ||
|
|
||
| When fixing correctness bugs, always add a regression test that verifies the output matches Spark. | ||
|
|
||
| ### Crash Bugs (`priority:high`) | ||
|
|
||
| These are bugs where the native engine panics, segfaults, or throws an unhandled exception. | ||
| Common patterns include: | ||
|
|
||
| - **All-scalar inputs:** Some expressions assume at least one columnar input and panic when all | ||
| inputs are literals (e.g., when `ConstantFolding` is disabled) | ||
| - **Type mismatches:** Downcasting to the wrong Arrow array type | ||
| - **Memory safety:** FFI boundary issues, unaligned arrays, GlobalRef lifecycle | ||
|
|
||
| ### Aggregate Planning Bugs | ||
|
|
||
| Several bugs relate to how Comet plans hash aggregates across stage boundaries. The key issue is | ||
| that Spark's AQE may materialize a Comet partial aggregate but then run the final aggregate in | ||
| Spark (or vice versa), and the intermediate formats may not be compatible. See the | ||
| [EPIC #2892](https://github.com/apache/datafusion-comet/issues/2892) for the full picture. | ||
|
|
||
| ### Native Writer Bugs | ||
|
|
||
| The native Parquet writer has a cluster of known test failures tracked as individual issues | ||
| (#3417–#3430). These are lower priority since the native writer is still maturing, but they | ||
| should be addressed before the writer is promoted to production-ready status. | ||
|
|
||
| ## How to Help with Triage | ||
|
|
||
| Triage is a valuable contribution that doesn't require writing code. You can help by: | ||
|
|
||
| - Reviewing new issues and suggesting a priority label | ||
| - Reproducing reported bugs and adding details | ||
| - Identifying duplicate issues | ||
| - Linking related issues together | ||
| - Testing whether old bugs have been fixed by recent changes | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this include other data formats (that we might not support yet) like CSV and Iceberg?