Skip to content

Comments

Trying to predict band splitting#15

Open
jmvalin wants to merge 1 commit intojmvalin/theta_cleanupfrom
jmvalin/split_mem2
Open

Trying to predict band splitting#15
jmvalin wants to merge 1 commit intojmvalin/theta_cleanupfrom
jmvalin/split_mem2

Conversation

@jmvalin
Copy link
Contributor

@jmvalin jmvalin commented Feb 13, 2026

No description provided.

@jmvalin jmvalin self-assigned this Feb 13, 2026
@jmvalin jmvalin force-pushed the jmvalin/theta_cleanup branch from 226eb18 to c2ccc64 Compare February 17, 2026 22:50
@thirv thirv self-requested a review February 18, 2026 21:42
@thirv
Copy link

thirv commented Feb 19, 2026

I'm bit confused about the branches. You want a review on this PR in how it affects jmvalin/theta_cleanup, while that branch is used in another PR to main? Should PRs be layered this way because it seems difficult to interpret the final effects to main...?

@jmvalin
Copy link
Contributor Author

jmvalin commented Feb 19, 2026

I just thought it made more sense to already have the changes be one on top of each other, i.e.

main -> A -> B -> C -> D

rather than

       /-->A
main -/--> B
      \--> C
       \-> D

It's more logical in terms of development, it reduces the amount of conflicts, and it's necessary in some cases where PRs depend on earlier PRs. For example, among the current PRs, #17 requires #16 to work.

@thirv
Copy link

thirv commented Feb 19, 2026

If you do a PR on top of an ongoing PR, doesn't this assume that the first PR works, or at least possible changes to the first PR during review would not affect the latter? Is it better to wait for the first PR to merge or you think this is not an issue?

@jmvalin
Copy link
Contributor Author

jmvalin commented Feb 19, 2026

Well, I'm assuming there's more than a 50% chance the PR lands in one form or another. Having all PRs based on main would also mean having to adapt as soon as something lands in main. There's no way to avoid rebase unless you only ever allow a single open PR to exist at any given time. In this case, I stacked the PRs to minimize the expected amount of complications. PRs #14, #15 and #16 are mostly independent (any of them can be applied separately), but they all change the same files, so I think this is simpler. As for PR #17 , that one just cannot exist without #16, so it couldn't be based on main even if I wanted to.

@thirv
Copy link

thirv commented Feb 20, 2026

It is clear multiple open PRs must be allowed. Was just trying to learn why stacking is beneficial compared to making PRs to main and merging main changes as they appear. But no problem so far if this way is preferred.

Copy link

@thirv thirv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help if at least a brief description of the logic and PR motivation is included. Please correct if any of this is wrong:

The transient detector decides to split to short frames before MDCT, both for mono and stereo with B = num of subframes. theta = generic rotation angle between two splits. Meaning changes whether split happens in channel, time or freg domain, and different PDFs are used depending.

Mode B Theta Use Domain split_mem Used? PDF Type
Mono 1 Frequency (sub-band split) NO Triangular (B0>1, balanced) or Uniform (B0>1, biased) or Uniform (B0=1)
Mono >1 Time (MDCT temporal split) YES Triangular (balanced, split_mem=3) or Uniform (biased, split_mem=1/2)
Stereo 1 Channel (mid/side spatial) NO Special stereo PDF (p0=3 for bias, 1 for rest)
Stereo >1 Both: Time (MDCT) + Channel (mid/side) NO Special stereo PDF (p0=3 for bias, 1 for rest)

Assuming this, using split_mem to select PDF in the one case seems like it saves bits (given typical audio and the PDFs used). However, should it be used for other cases? Do you disagree with:

Case Currently Uses split_mem? Should Use? Expected Benefit Implementation Complexity
Mono B=1 (freq) NO NO <0.1% High (need per-band memory)
Mono B>1 (time) YES YES 0.1-0.3% Low (already implemented)
Stereo B=1 (mid/side) NO MAYBE 0.5-1.0% Medium (need stereo memory)
Stereo B>1 (time+channel) NO YES 0.2-0.6% Low (extend existing code)

@jmvalin
Copy link
Contributor Author

jmvalin commented Feb 20, 2026

Just to clarify one thing, the "stereo" flag is set when the angle itself corresponds to mid/side (exactly one split per band). So the commit still applies to stereo audio when coding the mid and side channels. Basically, when B=1 it means we're splitting along frequency, so there's no reason for one band to have the same split as the previous band. That's why B=1 is already using a triangular pdf. That leaves the stereo (mid/side) split itself. That one may eventually benefit from inter-band memory, but I haven't looked into that yet.

@thirv
Copy link

thirv commented Feb 20, 2026

Are you saying that in this PR the case Stereo B>1 is already using split_mem for temporal + stereo PDF for spatial?

Did not understand what does "angle itself corresponds to mid/side" mean.

@jmvalin
Copy link
Contributor Author

jmvalin commented Feb 21, 2026

So if you have a stereo input, each band will be coded with oaci_quant_band_stereo(), which computes the mid and side signals. It then calls oaci_compute_theta() with stereo=1 to compute/encode an "angle" theta=atan(|side|/|mid|) as explained in Section 4.5.1 of https://jmvalin.ca/papers/aes135_opus_celt.pdf .
From there, the mid and side signals are normalized and each of them is coded with oaci_quant_band() (which in turns calls oaci_quant_partition()) like any mono signal. Now oaci_quant_partition() decide to split any band in half (recursively) if it requires too many bits. To do that, it also uses oaci_compute_theta() but with stereo=0 this time. In this case, the angle is between the magnitude of the left and right part of the vector.
Now when it comes to coding the angle, the probability of any given angle depends on the context of the split. stereo=1 means that we're splitting mid and side. B==1 means we're splitting along different frequencies, and B>0 means we're splitting along time.

Copy link

@thirv thirv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks, I was bit confused by this but now it seems clear. In the future it would be interesting to look at split memory for the channel coding theta, if it is not meaningful now.

@jmvalin
Copy link
Contributor Author

jmvalin commented Feb 23, 2026

Thanks for the review. When you say "split memory for the channel coding theta", do you mean adding memory for the theta that's used for mid/side splitting? If so, then yes, it's definitely something to look at in the future. The potential gains are likely a bit smaller just because there's fewer of those angles, but it's probably still worth doing something simple.

@thirv
Copy link

thirv commented Feb 23, 2026

When you say "split memory for the channel coding theta", do you mean adding memory for the theta that's used for mid/side splitting?

Yes that was the meaning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants