(way) faster startup on Windows #21089

Open
jsmucr wants to merge 1 commit into darktable-org:master from jsmucr:win32-fast-startup

@jsmucr
Contributor

jsmucr commented May 21, 2026

I'd like to propose integration of a set of changes aiming at startup speed improvements on Windows. In my case, the result is 4 seconds from file execution to lighttable with an empty database, as opposed to 16-18 seconds before the change.

It's a set of changes, so I intentionally left the commits unsquashed so that you can review each of them separately.

The 5x1ms sleep loop in _process_all_gui_events() exists to give
X11/Wayland compositors time to redraw asynchronously. On Windows,
the GDK Win32 backend redraws synchronously and gdk_display_sync()
(called immediately after) flushes the GDI pipeline, making the
sleeps pure waste.
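
The idea can be sketched as follows. This is a minimal, self-contained model and not the actual darktable code: `pump_pending_events`, `sleep_about_1ms`, `process_all_gui_events` (with these parameters) and `default_backend_redraws_async` are hypothetical names, and the real function would iterate the GLib main context and call gdk_display_sync() where the comments say so.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical stand-in for the toolkit's "process pending events" step. */
int events_handled = 0;
void pump_pending_events(void)
{
  events_handled++;
}

/* Sleep for about one millisecond (the OS may sleep longer). */
void sleep_about_1ms(void)
{
#ifdef _WIN32
  Sleep(1);
#else
  const struct timespec ts = { 0, 1000000L };
  nanosleep(&ts, NULL);
#endif
}

/* Runs `rounds` event rounds and sleeps before each one only when the
   backend redraws asynchronously. Returns the number of sleeps performed. */
int process_all_gui_events(int rounds, bool backend_redraws_async)
{
  assert(rounds >= 0);
  int sleeps = 0;
  for(int i = 0; i < rounds; i++)
  {
    if(backend_redraws_async)
    {
      sleep_about_1ms();
      sleeps++;
    }
    pump_pending_events();
  }
  /* Assumption: the real code would call gdk_display_sync() here. */
  return sleeps;
}

/* Compile-time default: skip the sleeps on Windows only. */
bool default_backend_redraws_async(void)
{
#ifdef _WIN32
  return false;
#else
  return true;
#endif
}
```

Calling `process_all_gui_events(5, default_backend_redraws_async())` keeps the five sleeps on X11/Wayland builds and drops them on Windows, while the event rounds themselves run on every platform.
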
@ralfbrown added the "scope: performance" and "scope: windows support" labels May 21, 2026
@ralfbrown
Collaborator

It seems to me that the big win here is the database transaction. How much time do the prefetching and bypassing Gtk for symbol loading actually save? darktable tries to avoid platform-specific code since we have so few Windows and macOS contributors and someone will have to maintain that code in the future.

The prefetching really just moves the disk reads on a cold cache from one place in the code to another....

@jsmucr
Contributor Author

jsmucr commented May 21, 2026

I'll measure that for you separately tomorrow.

As for the Windows-specific optimizations: I understand your concerns. But now you're actively discouraging a potential contributor who's actually using the app on that strange platform nobody cares about (definitely not the people trying to free themselves from the Adobe ecosystem, desperately looking for an alternative). And also suggesting dumping an up to 80% speed improvement. That's crazy.

Sarcasm is my way of dealing with heavy disappointment.

@wpferguson
Member

But now you're actively discouraging a potential contributor who's actually using the app

He's trying to understand what each improvement gives. Also, we are at feature freeze, so we need to evaluate whether to merge this now or hold off. Also, someone has to maintain this code if you can't or won't.

How are we supposed to ask questions so that you don't take offense?

@jsmucr
Contributor Author

jsmucr commented May 22, 2026

I'm very sorry if it sounded harsh; it wasn't meant to be. My point was that refusing win32 code (thus strictly sticking to principles from the Linux world) may well be the reason why there are so few DT devs focused on that platform, effectively creating a vicious circle.

Not the best of my days, though. I'm sorry.

I'll get back to you with the results soon. I'm fighting with caches and limitations of my work setup. Also it seems that the DLL tweak has introduced a race condition.

@jsmucr
Contributor Author

jsmucr commented May 22, 2026

So... The benchmark kind of surprised me. Here on the work machine, I can't use DT exactly the way I use it at home -- that is entirely from a portable SSD -- so here it was DT on the internal drive and only the data on the portable drive.

Anyway, here it is:

Patch                         1          2         3
----------------------------------------------------
win32-fast-startup~0    17,8603     9,2402    9,0916
win32-fast-startup~1    Crash
win32-fast-startup~2    13,4841     9,2025    9,2473
win32-fast-startup~3    13,3013     9,4312    9,7235
win32-fast-startup~4    25,8784    22,3902   26,3804

So you were correct to question the changes. The biggest culprit was the loop of sleep calls. My apologies. I'm gonna dump the other changes.
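
To compare rows of the table, the relative change can be worked out with a few lines of C. This is only a sketch under assumptions the thread does not spell out: that the figures are seconds written with decimal commas, and that each row is one revision of the branch (`~N` being Git's notation for the N-th ancestor of the branch tip). `parse_decimal_comma` and `relative_reduction` are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parses a number written with a decimal comma, e.g. "9,4312".
   Relies on the default "C" locale, where strtod expects a dot. */
double parse_decimal_comma(const char *text)
{
  char buf[32];
  assert(strlen(text) < sizeof buf);
  strcpy(buf, text);
  for(char *p = buf; *p; p++)
    if(*p == ',') *p = '.';
  return strtod(buf, NULL);
}

/* Fraction of time saved going from `before` to `after`
   (0.5 means the run takes half as long). */
double relative_reduction(double before, double after)
{
  assert(before > 0.0);
  return (before - after) / before;
}
```

For example, `relative_reduction(parse_decimal_comma("22,3902"), parse_decimal_comma("9,4312"))` compares the second column of the `~4` and `~3` rows.
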

@jsmucr jsmucr force-pushed the win32-fast-startup branch from ebf361e to 642c83c Compare May 22, 2026 07:41
@ralfbrown
Collaborator

It's a general principle of software engineering to minimize alternative code paths when possible to reduce overall maintenance (and testing!). A big enough performance win makes the extra overhead worthwhile. I spent a lot of time a few years ago getting the compiler to autovectorize code so that the hand-written SSE code was no longer faster and could thus be removed, eliminating one of the three separate processing implementations in IOPs (and one of the two requiring some specialized knowledge, the other being OpenCL).

I'm surprised that the sleep calls are the biggest overhead, given that they nominally add up to 5 milliseconds. Since the display_sync call was added after that loop had been in the codebase for a while, we should revisit whether the loop is still needed on Linux as well.
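
One way to check the nominal 5 ms figure is to time the sleeps directly. The sketch below is an illustration and not code from this PR; `now_ms`, `nominal_1ms_sleep` and `measure_sleeps_ms` are hypothetical helper names. A sleep call only guarantees a minimum wait. On Windows, `Sleep(1)` is commonly rounded up to the system timer tick (often around 15.6 ms by default), which could be one reason the loop costs more than expected, though nothing in this thread confirms that.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Monotonic time in milliseconds. */
double now_ms(void)
{
#ifdef _WIN32
  LARGE_INTEGER freq, count;
  QueryPerformanceFrequency(&freq);
  QueryPerformanceCounter(&count);
  return 1000.0 * (double)count.QuadPart / (double)freq.QuadPart;
#else
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return 1000.0 * (double)ts.tv_sec + (double)ts.tv_nsec / 1e6;
#endif
}

/* Requests a 1 ms sleep; the OS decides how long it really takes. */
void nominal_1ms_sleep(void)
{
#ifdef _WIN32
  Sleep(1);
#else
  const struct timespec ts = { 0, 1000000L };
  nanosleep(&ts, NULL);
#endif
}

/* Returns how long `count` nominal 1 ms sleeps actually took, in ms. */
double measure_sleeps_ms(int count)
{
  assert(count >= 0);
  const double start = now_ms();
  for(int i = 0; i < count; i++) nominal_1ms_sleep();
  return now_ms() - start;
}
```

Printing `measure_sleeps_ms(5)` on each platform shows how far the real cost of one pass through the loop is from 5 ms; the total startup cost would also depend on how often the function is called, which this sketch does not measure.
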
