Optimize DCS performance with batch processing#19640
Conversation
|
This PR is a lot spicier than #19639. I think I did it correctly, but it's probably better to wait until after we released the next WT version. |
| // intentionally, it's far more likely now that we're looking at a broken up sequence. | ||
| // The most common win32-input-mode is ConPTY itself after all, and we never emit | ||
| // \x1b{some char} once it's enabled. | ||
| if (_isEngineForInput) |
There was a problem hiding this comment.
in my work on OSC sequences i realized... I don't think caching actually does anything for the output engine any longer. The cached sequence is never emitted because OutputStateMachineEngine::ActionPassThroughString does nothing.
Which makes sense, actually. Since ConPTY became passthrough mode, there's no reason to cache. We just yeet everything through.
Together with #19640 this bumps the performance on my HW by around 60%. You can now stream 4K images at >60 FPS on a high-end CPU. ## Validation Steps Performed * Used imagemagick to dump large images ✅
src/terminal/parser/stateMachine.cpp
Outdated
| bool StateMachine::IsProcessingLastCharacter() const noexcept | ||
| { | ||
| return _processingLastCharacter; | ||
| return _runEnd == _currentString.size(); | ||
| } |
There was a problem hiding this comment.
I can't do a full review, but I just wanted to note that this part of the code likely needs more work. When the SixelParser calls this method, it needs to know that it's processing the very last character, and not just a buffer that includes the last character.
So I expect that state now needs to be tracked in the SixelParser itself. Where we're looping over the buffer, we may need something like this:
const auto lastCharacterBuffer = _stateMachine.IsProcessingLastCharacter();
for (auto = 0; i < str.length(); i++)
{
_processingLastCharacter = lastCharacterBuffer && (i == str.length() - 1);
_parseCommandChar(str[i]);
}And wherever we're calling IsProcessingLastCharacter we would instead use this local _processingLastCharacter flag instead.
I'm not positive, but I think the SixelParser is the only place that is still using IsProcessingLastCharacter.
There was a problem hiding this comment.
I came back to this today to address your comment. Is tracking lastCharacterBuffer = _stateMachine.IsProcessingLastCharacter() necessary at all at that point? Wouldn't just i == str.length() - 1 alone be a good enough heuristic? It'd be neat, because it would let us fully decouple the sixel- from the VT-parser.
There was a problem hiding this comment.
I've pushed a commit based on this. Since the comments hinted at animations, I tested it with mpv --really-quiet --vo=sixel and it seemed to work fine without stutter and hitches.
Although... this doesn't test animations with the palette color. Do you have a pre-existing test for that? Otherwise, I'd try and write my own.
There was a problem hiding this comment.
That's what I had hoped initially, but I got the impression that you were not actually sending through the full buffer all the time (I may have misunderstood how it now works though). And the risk is that it may trigger an unwanted update that could produce flickering in things like video sequences (this is exacerbated by the fact that a lot of image libraries request an unnecessary background fill, so if we flush before the full image has been received you can end up seeing a large block of some random background color).
That said, it is just a heuristic, and there was always a risk of flickering, so I'm happy to go with whatever is easiest and see if anyone complains. Modern apps can always work around flickering with synchronized update mode if necessary.
There was a problem hiding this comment.
Although... this doesn't test animations with the palette color. Do you have a pre-existing test for that?
Sorry, I didn't initially see this message. There's a palette animation test in my initial PR here (plasma.py): #17421 (comment)
You can also confirm that progressive output is working with this script:
https://github.com/hackerb9/vt340test/blob/main/jerch/endless.sh
It's not that smooth on Windows Terminal, I think because of conpty buffering, but you should still see it updating continuously.
There was a problem hiding this comment.
Thanks for the tests!
https://github.com/hackerb9/vt340test/blob/main/jerch/endless.sh
Hmm weird, that script doesn't work for me at all.
This bumps performance of sixels by around 20%. More importantly, this bumps performance of any future optimized DCS code (e.g. tmux control mode) by practically infinite%, because we can pass through payloads unmodified.
Validation Steps Performed