Skip to content

sql: deflake TestStatusAPIContentionEvents#162596

Open
dhartunian wants to merge 1 commit intocockroachdb:masterfrom
dhartunian:davidh/push-ytovwxomqzuo
Open

sql: deflake TestStatusAPIContentionEvents#162596
dhartunian wants to merge 1 commit intocockroachdb:masterfrom
dhartunian:davidh/push-ytovwxomqzuo

Conversation

@dhartunian
Copy link
Collaborator

Add detection for duplicate span errors to the test and ignore them since this test only cares about the contention information that's captured while tracing, and not the trace itself.

Also added notes to the two callsites I found that were causing this error to trigger. I confirmed that the test would stop flaking if I commented out the ImportRemoteRecording line in distsql_running.go.

I've chosen for now not to disrupt the application logic because even if I disabled duplicate span
detection, I'd still want it on during tests, and I haven't observed this error in production so it
doesn't seem to be urgent.

Resolves: #160544
Resolves: #160876
Epic: none
Release note: None

Add detection for duplicate span errors to the test
and ignore them since this test only cares about the
contention information that's captured while tracing,
and not the trace itself.

Also added notes to the two callsites I found that
were causing this error to trigger. I confirmed that
the test would stop flaking if I commented out the
`ImportRemoteRecording` line in distsql_running.go.

I've chosen for now not to disrupt the application
logic because even if I disabled duplicate span
detection, I'd still want it on during tests, and I
haven't observed this error in production so it
doesn't seem to be urgent.

Resolves: cockroachdb#160544
Resolves: cockroachdb#160876
Epic: none
Release note: None
@dhartunian dhartunian requested review from a team as code owners February 6, 2026 15:54
@dhartunian dhartunian requested review from yuzefovich and removed request for a team February 6, 2026 15:54
@trunk-io
Copy link
Contributor

trunk-io bot commented Feb 6, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cockroach-teamcity
Copy link
Member

⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric Old Commit New Commit Delta Note
sec/op 9.882m ±1% 9.881m ±1% ~ p=0.838 n=15
allocs/op 7.973k ±1% 7.975k ±0% ~ p=0.340 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/8c18043/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/8c18043009b0206bcf9df9e9ef7a7279d4950c81/bin/pkg_sql_tests benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/f275471/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/f2754717ca1ec136f609fdfa5ca779e0fbf120e1/bin/pkg_sql_tests benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=f275471 --new=8c18043 ./pkg/sql/tests
🔴 Sysbench [KV, 3node, oltp_read_only]
Metric Old Commit New Commit Delta Note
🔴 sec/op 2.919m ±1% 2.948m ±0% +1.01% p=0.000 n=15
allocs/op 2.101k ±1% 2.102k ±1% ~ p=0.168 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/8c18043/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/8c18043009b0206bcf9df9e9ef7a7279d4950c81/bin/pkg_sql_tests benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/f275471/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/f2754717ca1ec136f609fdfa5ca779e0fbf120e1/bin/pkg_sql_tests benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=f275471 --new=8c18043 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
Metric Old Commit New Commit Delta Note
sec/op 2.666m ±1% 2.671m ±2% ~ p=0.217 n=15
allocs/op 4.145k ±0% 4.145k ±1% ~ p=0.894 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/8c18043/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/8c18043009b0206bcf9df9e9ef7a7279d4950c81/bin/pkg_sql_tests benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/8c18043/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/f275471/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/f2754717ca1ec136f609fdfa5ca779e0fbf120e1/bin/pkg_sql_tests benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/f275471/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=f275471 --new=8c18043 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/8c18043009b0206bcf9df9e9ef7a7279d4950c81/21756806326-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/f2754717ca1ec136f609fdfa5ca779e0fbf120e1/21756806326-1/\* old/

built with commit: 8c18043009b0206bcf9df9e9ef7a7279d4950c81

@cockroach-teamcity cockroach-teamcity added the X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked label Feb 6, 2026
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich reviewed 2 files and all commit messages, and made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @dhartunian).


pkg/sql/distsql_running.go line 1685 at r1 (raw file):

	if len(meta.TraceData) > 0 {
		if span := tracing.SpanFromContext(r.ctx); span != nil {
			// Note(davidh): I believe this import can be triggered multiple times

Yes, if the query has a distributed plan, then each remote flow might send meta.TraceData multiple times (depending on what operators we have in that flow), and then we can also have multiple remote flows. This has been the case since forever. (The gateway flows are handled in a special way as of #98831.)

Copy link
Collaborator Author

@dhartunian dhartunian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhartunian made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @yuzefovich).


pkg/sql/distsql_running.go line 1685 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Yes, if the query has a distributed plan, then each remote flow might send meta.TraceData multiple times (depending on what operators we have in that flow), and then we can also have multiple remote flows. This has been the case since forever. (The gateway flows are handled in a special way as of #98831.)

Do you think I should remove the dupe trace error and log it + continue when writing the trace? Seems like it could get triggered easily even if I haven't seen any evidence.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich made 1 comment.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @alyshanjahani-crl and @dhartunian).


pkg/sql/distsql_running.go line 1685 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Do you think I should remove the dupe trace error and log it + continue when writing the trace? Seems like it could get triggered easily even if I haven't seen any evidence.

Hm, I dunno. Is the problem that we're trying to import the same span multiple times?

I could definitely see that happening, but via children spans - e.g. if we have colrpc.Outbox -> colfetcher.ColBatchScan on the remote node, then the tracing span for ColBatchScan is a child of that for Outbox, and both operators will emit TraceData (the ColBatchScan only for itself while the Outbox for its own span with the ColBatchScan's span as a child). Is this setup problematic?

We could definitely refactor the code on the execution side to guarantee that only remote "root" execution spans are propagated as TraceData if you think it's worth cleaning this up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server/application_api: TestStatusAPIContentionEvents failed server/application_api: TestStatusAPIContentionEvents failed

3 participants