Trying to predict band splitting#15
Conversation
226eb18 to
c2ccc64
Compare
085456e to
5595048
Compare
5595048 to
413ed98
Compare
|
I'm bit confused about the branches. You want a review on this PR in how it affects jmvalin/theta_cleanup, while that branch is used in another PR to main? Should PRs be layered this way because it seems difficult to interpret the final effects to main...? |
|
I just thought it made more sense to already have the changes be one on top of each other, i.e. rather than It's more logical in terms of development, it reduces the amount of conflicts, and it's necessary in some cases where PRs depend on earlier PRs. For example, among the current PRs, #17 requires #16 to work. |
|
If you do a PR on top of an ongoing PR, doesn't this assume that the first PR works, or at least possible changes to the first PR during review would not affect the latter? Is it better to wait for the first PR to merge or you think this is not an issue? |
|
Well, I'm assuming there's more than a 50% chance the PR lands in one form or another. Having all PRs based on main would also mean having to adapt as soon as something lands in main. There's no way to avoid rebase unless you only ever allow a single open PR to exist at any given time. In this case, I stacked the PRs to minimize the expected amount of complications. PRs #14, #15 and #16 are mostly independent (any of them can be applied separately), but they all change the same files, so I think this is simpler. As for PR #17 , that one just cannot exist without #16, so it couldn't be based on main even if I wanted to. |
|
It is clear multiple open PRs must be allowed. Was just trying to learn why stacking is beneficial compared to making PRs to main and merging main changes as they appear. But no problem so far if this way is preferred. |
There was a problem hiding this comment.
It would help if at least a brief description of the logic and PR motivation is included. Please correct if any of this is wrong:
The transient detector decides to split to short frames before MDCT, both for mono and stereo with B = num of subframes. theta = generic rotation angle between two splits. Meaning changes whether split happens in channel, time or freg domain, and different PDFs are used depending.
| Mode | B | Theta Use Domain | split_mem Used? | PDF Type |
|---|---|---|---|---|
| Mono | 1 | Frequency (sub-band split) | NO | Triangular (B0>1, balanced) or Uniform (B0>1, biased) or Uniform (B0=1) |
| Mono | >1 | Time (MDCT temporal split) | YES | Triangular (balanced, split_mem=3) or Uniform (biased, split_mem=1/2) |
| Stereo | 1 | Channel (mid/side spatial) | NO | Special stereo PDF (p0=3 for bias, 1 for rest) |
| Stereo | >1 | Both: Time (MDCT) + Channel (mid/side) | NO | Special stereo PDF (p0=3 for bias, 1 for rest) |
Assuming this, using split_mem to select PDF in the one case seems like it saves bits (given typical audio and the PDFs used). However, should it be used for other cases? Do you disagree with:
| Case | Currently Uses split_mem? | Should Use? | Expected Benefit | Implementation Complexity |
|---|---|---|---|---|
| Mono B=1 (freq) | NO | NO | <0.1% | High (need per-band memory) |
| Mono B>1 (time) | YES | YES | 0.1-0.3% | Low (already implemented) |
| Stereo B=1 (mid/side) | NO | MAYBE | 0.5-1.0% | Medium (need stereo memory) |
| Stereo B>1 (time+channel) | NO | YES | 0.2-0.6% | Low (extend existing code) |
|
Just to clarify one thing, the "stereo" flag is set when the angle itself corresponds to mid/side (exactly one split per band). So the commit still applies to stereo audio when coding the mid and side channels. Basically, when B=1 it means we're splitting along frequency, so there's no reason for one band to have the same split as the previous band. That's why B=1 is already using a triangular pdf. That leaves the stereo (mid/side) split itself. That one may eventually benefit from inter-band memory, but I haven't looked into that yet. |
|
Are you saying that in this PR the case Stereo B>1 is already using split_mem for temporal + stereo PDF for spatial? Did not understand what does "angle itself corresponds to mid/side" mean. |
|
So if you have a stereo input, each band will be coded with oaci_quant_band_stereo(), which computes the mid and side signals. It then calls oaci_compute_theta() with stereo=1 to compute/encode an "angle" theta=atan(|side|/|mid|) as explained in Section 4.5.1 of https://jmvalin.ca/papers/aes135_opus_celt.pdf . |
thirv
left a comment
There was a problem hiding this comment.
OK thanks, I was bit confused by this but now it seems clear. In the future it would be interesting to look at split memory for the channel coding theta, if it is not meaningful now.
|
Thanks for the review. When you say "split memory for the channel coding theta", do you mean adding memory for the theta that's used for mid/side splitting? If so, then yes, it's definitely something to look at in the future. The potential gains are likely a bit smaller just because there's fewer of those angles, but it's probably still worth doing something simple. |
Yes that was the meaning. |
No description provided.