
Improved symbol handling #1942

Closed
threeseed wants to merge 1 commit into sonos:main from threeseed:sym

Conversation

@threeseed

Based on benchmarks, this change yields the following performance improvements:

- LinearClassifier / 100,000 runs: 94.7%
- LinearClassifier / 1,000,000 runs: 13.7%
- LinearRegressor / 100,000 runs: 48.3%
- LinearRegressor / 1,000,000 runs: 7.07%

@kali
Collaborator

kali commented Jan 12, 2026

If I get this right, you "just" replaced the locking structure, right? The previous iteration was adding an extra cache, but this PR does not, correct?

@threeseed
Author

> If I get this right, you "just" replaced the locking structure, right? The previous iteration was adding an extra cache, but this PR does not, correct?

Yes. We did have a PR for a symbol cache, which we are still looking into.

But this one is about locking and the string-interning micro-optimisation.

@kali
Collaborator

kali commented Jan 12, 2026

Please make sure clippy is happy, and we'll merge this.

@kali
Collaborator

kali commented Jan 14, 2026

I think we still have a problem though: some expressions will trigger a deadlock (which put the CI in trouble, but that's on me, I need to make it more resilient, I guess).

In `tract/data`, `cargo test dim::assertion::tests::low_bound_1` shows the deadlock (a couple of other tests in data do as well).

kali added a commit that referenced this pull request Apr 23, 2026
prove_positive_or_zero_inner was filter_map'ing as_known_positive over
every assertion on each call. The result is stable for base assertions;
cache the positive form at add_assertion time. Only the scenario-extra
assertions are still filtered per-call.

Note: as_known_positive() recurses through TDim simplify() which
re-enters the scope lock, so it must be called before the lock is
acquired.

Cherry-picked from PR #1942.
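The pattern in this commit message can be sketched with std types only. All names here (`Scope`, `positives_cache`, the string-based stand-in for `as_known_positive`) are illustrative, not tract's real API; the point is moving the conversion cost from every proof query to the one-time `add_assertion` call:

```rust
// Sketch: cache the positive form of each base assertion at insertion
// time instead of filter_map'ing over all assertions on every query.
struct Scope {
    assertions: Vec<String>,
    positives_cache: Vec<String>, // positive forms, computed once
}

impl Scope {
    fn new() -> Self {
        Scope { assertions: vec![], positives_cache: vec![] }
    }

    // Hypothetical stand-in for as_known_positive(); in tract this must
    // run *before* taking the scope lock because it re-enters it.
    fn as_known_positive(a: &str) -> Option<String> {
        a.strip_suffix(" >= 0").map(|e| e.to_string())
    }

    fn add_assertion(&mut self, a: &str) {
        if let Some(p) = Self::as_known_positive(a) {
            self.positives_cache.push(p); // pay the conversion cost here
        }
        self.assertions.push(a.to_string());
    }

    fn prove_positive(&self, expr: &str) -> bool {
        // Hot path: scan the precomputed cache, no per-call conversion.
        self.positives_cache.iter().any(|p| p == expr)
    }
}

fn main() {
    let mut s = Scope::new();
    s.add_assertion("n >= 0");
    s.add_assertion("n == m");
    assert!(s.prove_positive("n"));
    assert!(!s.prove_positive("m"));
}
```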
kali added a commit that referenced this pull request Apr 23, 2026
SymbolValues::get is called on every TDim::eval; hashing a Symbol (which
hashes its narrow SymbolU32 id) is dominant. FxHashMap's faster hash
pays off in the hot path.

Cherry-picked from PR #1942.
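Swapping the hasher on a hot map is a one-line type change. A minimal sketch using std only, with a hypothetical multiply-based hasher standing in for fxhash's `FxHasher` (not the real implementation) and a plain `u32` standing in for the narrow `SymbolU32` id:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Illustrative fast hasher for tiny keys; NOT the real FxHasher.
#[derive(Default)]
struct TinyFxHasher(u64);

impl Hasher for TinyFxHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // rotate-xor-multiply: cheap mixing, good enough for small ids
            self.0 = (self.0.rotate_left(5) ^ b as u64)
                .wrapping_mul(0x517c_c1b7_2722_0a95);
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

// Same HashMap API, different (cheaper) hash in the hot lookup path.
type FastMap<V> = HashMap<u32, V, BuildHasherDefault<TinyFxHasher>>;

fn main() {
    let mut values: FastMap<i64> = FastMap::default();
    values.insert(7, 100_000); // symbol id -> concrete value
    assert_eq!(values.get(&7), Some(&100_000));
    assert_eq!(values.get(&8), None);
}
```

With the fxhash or rustc-hash crate the type alias is all that changes; SipHash's DoS resistance is not needed for internal symbol ids.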
kali added a commit that referenced this pull request Apr 23, 2026
Previously cloned into a fresh Vec on every call; with the proof cache
hitting all_assertions from multiple places during inference, Arc's
shared ownership avoids repeated heap churn.

Cherry-picked from PR #1942.
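The ownership change can be sketched as follows; `Scope`, `Assertion`, and `all_assertions` are illustrative stand-ins for the tract types named in the message. Cloning an `Arc<[T]>` bumps a refcount instead of copying the slice:

```rust
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
struct Assertion(String); // stand-in for tract's Assertion

struct Scope {
    assertions: Arc<[Assertion]>,
}

impl Scope {
    // Previously this returned a fresh Vec (heap copy) on every call;
    // Arc<[T]>::clone is a refcount increment sharing one allocation.
    fn all_assertions(&self) -> Arc<[Assertion]> {
        self.assertions.clone()
    }
}

fn main() {
    let scope = Scope {
        assertions: vec![Assertion("n >= 0".into())].into(),
    };
    let a = scope.all_assertions();
    let b = scope.all_assertions();
    // Both handles point at the same allocation: no heap churn per call.
    assert!(Arc::ptr_eq(&a, &b));
}
```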
kali added a commit that referenced this pull request Apr 23, 2026
Tiny allocation reduction: the retry loop no longer builds a fresh
String on every iteration.

Cherry-picked from PR #1942.
kali added a commit that referenced this pull request Apr 23, 2026
SimplePlan::resolve calls guess_scenario on every set_input; models
without scenarios (the common case) still paid a full scope lock just
to observe that scenarios is empty. Under 32-way parallel inference
that acquire/release was the dominant bottleneck (100% of scope-lock
acquisitions counted during a LinearClassifier benchmark).

Wrap the scope's Arc<ReentrantMutex<...>> in a SymbolScopeInner that
also carries a lock-free `has_scenarios` AtomicBool, updated by
add_scenario / add_scenario_assertion. guess_scenario checks it first
and returns immediately when the scope has no scenarios. Deref on the
wrapper keeps the `scope.0.lock()` call pattern working unchanged.

LinearClassifier @ 100K runs, 32 threads: 100ms → 26ms.
Now beats the PR #1942 RwLock approach (36ms) without the deadlock,
since no lock at all beats a concurrent-reader lock.
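The fast path described above can be sketched with std types; `std::sync::Mutex` stands in for the real `ReentrantMutex`, and the struct mirrors the commit message's names but is otherwise illustrative (the real wrapper also implements `Deref`, omitted here):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

struct SymbolScopeInner {
    scenarios: Mutex<Vec<String>>,
    has_scenarios: AtomicBool, // lock-free fast-path flag
}

impl SymbolScopeInner {
    fn add_scenario(&self, s: &str) {
        self.scenarios.lock().unwrap().push(s.to_string());
        // Set only after the scenario is visible under the lock.
        self.has_scenarios.store(true, Ordering::Release);
    }

    fn guess_scenario(&self) -> Option<String> {
        // Common case: no scenarios. Return without touching the lock.
        if !self.has_scenarios.load(Ordering::Acquire) {
            return None;
        }
        self.scenarios.lock().unwrap().first().cloned()
    }
}

fn main() {
    let scope = SymbolScopeInner {
        scenarios: Mutex::new(vec![]),
        has_scenarios: AtomicBool::new(false),
    };
    assert_eq!(scope.guess_scenario(), None); // no lock acquired here
    scope.add_scenario("streaming");
    assert_eq!(scope.guess_scenario(), Some("streaming".to_string()));
}
```

The flag only ever goes from false to true, so a stale false is impossible once `add_scenario` has returned, and a relaxed read would likely also be safe; Acquire/Release keeps the ordering argument simple.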
@kali
Collaborator

kali commented Apr 23, 2026

Thanks for this PR and the accompanying benchmark — the bench itself was invaluable and is what made the underlying contention tractable to track down. I'm going to close this in favour of a narrower fix, but wanted to share what we landed and why.

Root cause. On the LinearClassifier bench at high thread counts, every set_input was hitting two locked paths in SymbolScope:

1. `TDim::to_i64()` was being used inside `plan.rs::resolve` as an "is this concrete?" probe. On a symbolic TDim it constructed an error containing `self.to_string()`, which formats through `Symbol::Display` and takes the scope lock. (This was regressed by 106161a in January; swapping to the already-existing `as_i64() -> Option` avoids any allocation and any lock.)
2. `guess_scenario()` acquired the scope lock just to observe that scenarios is empty — the common case.
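The first probe swap can be illustrated with a toy `TDim` (two variants only, a stand-in for tract's real type): `to_i64()` pays for an error `String` on every symbolic dim, while `as_i64()` just peeks.

```rust
// Illustrative TDim stand-in: to_i64() builds an error string (which in
// tract formats symbols under the scope lock); as_i64() only inspects.
#[derive(Debug)]
enum TDim {
    Val(i64),
    Sym(&'static str),
}

impl TDim {
    fn as_i64(&self) -> Option<i64> {
        match self {
            TDim::Val(v) => Some(*v),
            TDim::Sym(_) => None,
        }
    }
    fn to_i64(&self) -> Result<i64, String> {
        // Allocates (and, in tract, locks) just to say "not concrete".
        self.as_i64()
            .ok_or_else(|| format!("{:?} is not a concrete dim", self))
    }
}

fn main() {
    let dims = [TDim::Val(4), TDim::Sym("N")];
    // "Is this concrete?" probe: as_i64 costs nothing either way.
    let concrete: Vec<i64> = dims.iter().filter_map(TDim::as_i64).collect();
    assert_eq!(concrete, vec![4]);
    assert!(dims[1].to_i64().is_err()); // this path paid for a String
}
```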

Numbers (100K parallel runs, 32 threads, Ryzen 9 9950X3D):

| Configuration | ms |
|---|---|
| main today | 197 |
| this PR if it didn't deadlock | 36 |
| with the two fixes above | 26 |

The lock-free fast path actually edges out the RwLock approach — no lock beats a concurrent-reader lock when the contended work is trivial.

Why not this PR directly.

- The `ReentrantMutex<RefCell<_>>` → `parking_lot::RwLock` swap deadlocks on any path that re-enters the scope lock while a guard is held (`dim::assertion::tests::low_bound_1` is the reproducer). RwLock isn't reentrant, and several hot paths — TDim arithmetic calling `simplify()`, `Symbol::Display`, the proof-cache path — re-enter transitively. That's fundamental to the approach rather than a localised bug.
- For the rest of the PR I benched each change individually against the bench suite included here. `DefaultStringInterner` → `StringInterner` is a no-op (both resolve to the same type in string-interner 0.19 and 0.15). FxHashMap on SymbolValues, the positives_cache, `Arc<[Assertion]>`, and the `String::with_capacity` buffer reuse all stayed within ±1–3 ms on every thread count. They may well pay off in other workloads, but I didn't want to carry them in without a workload where they pull their weight.
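The non-reentrancy at the heart of the deadlock can be demonstrated with `std::sync::RwLock` (parking_lot's behaves the same way on this point): a thread holding a guard can never acquire the same lock again, which `try_read`/`try_write` make observable without actually hanging.

```rust
use std::sync::RwLock;

fn main() {
    let lock = RwLock::new(0i32);

    // A reentrant mutex lets the same thread lock twice; RwLock does not.
    // While the write guard is held, a second acquisition from the same
    // thread cannot succeed: try_* reports WouldBlock, and a blocking
    // read() here would deadlock (or panic, depending on the platform).
    let guard = lock.write().unwrap();
    assert!(lock.try_read().is_err());
    assert!(lock.try_write().is_err());
    drop(guard);

    // Once the guard is released, acquisition works again.
    assert!(lock.try_read().is_ok());
}
```

This is why any path that re-enters the scope lock transitively (e.g. formatting a symbol while holding a guard) hangs under the RwLock swap.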

The replacement is three commits on #2151:

  • plan: use as_i64 probe instead of to_i64 in shape resolve
  • deps: workspace-ize parking_lot
  • sym: lock-free fast path for guess_scenario on empty scope

Thanks again — the problem this PR exposed was real and worth fixing.

@kali kali closed this Apr 23, 2026