-
Notifications
You must be signed in to change notification settings - Fork 160
lockfile: add holder info file for debugging stale locks #2011
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as spam.
This comment was marked as spam.
hcmaATshopify
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would argue for using pid instead of holder everywhere.
Pros of pid
- Precise: The only payload in the file is the PID. Using “pid” makes it 100% clear and avoids ambiguity.
- Conventional: Many tools use
.pidfiles or similar conventions, so the meaning is familiar. - Unambiguous Code: Variables like
pid_fileorread_lock_pid()are explicitly about PIDs, not other info.
lockfile.c
Outdated
| return NULL; | ||
|
|
||
| strbuf_addf(&holder_path, "%s%s", lock_path, LOCK_HOLDER_SUFFIX); | ||
| fd = open(holder_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure mode is restrictive enough (match whatever the .lock has and is readable by the creator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does use the same mode parameter as the lock file:
// In lock_file():
lk->tempfile = create_tempfile_mode(filename.buf, mode); // lock file
lk->pid_tempfile = create_lock_pid_file(filename.buf, mode); // PID file - same mode
lockfile.c
Outdated
|
|
||
| if (lock_holder_info_enabled() && | ||
| !read_lock_holder_pid(lock_path.buf, &holder_pid)) | ||
| strbuf_addf(buf, _("Lock is held by process %"PRIuMAX".\n\n"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lock (acquired via file '%s') is being held by process .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this will print after the "Unable to create '/path/to/.git/index.lock': File exists." message, I think it would be redundant.
lockfile.c
Outdated
| strbuf_addf(buf, _("Lock is held by process %"PRIuMAX".\n\n"), | ||
| holder_pid); | ||
|
|
||
| strbuf_addstr(buf, _("Another git process seems to be running in this repository, e.g.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the feature is enabled, we can actually improve this now and be assertive (for instance, if the pid is no longer around, we can say this "lock file is stale and can be removed" or "lock file is valid and held by process "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out kill was already implement and cross platform. Added a liveness check!
| return 0; | ||
| } | ||
|
|
||
| int rollback_lock_file(struct lock_file *lk) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend renaming rollback_lock_file to abort_lock_file for clarity and consistency with Git’s conventions. abort_lock_file is likely the best fit for the Git codebase, as this matches the standard “commit/abort” terminology for transactional semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is not new, I just needed to change it from being an inline function to add new functionality.
lockfile.c
Outdated
| strbuf_addf(&holder_path, "%s%s", lock_path, LOCK_HOLDER_SUFFIX); | ||
| fd = open(holder_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode); | ||
| if (fd >= 0) { | ||
| strbuf_addf(&content, "%"PRIuMAX"\n", (uintmax_t)getpid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe strbuf_addf(&content, "%ld\n", (long)getpid()); since pid_t is generally int or long; cast to long for portability
Are we sure this will work for non-Unix platforms that might be supported by git? In other words, do we need to ifdef this whole thing so we don't break non-Unix builds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking around the codebase it seems (uintmax_t)getpid() + PRIuMAX pattern is the established Git convention for portable PID handling:
Examples:
-
daemon.c:1458- writes PID to file:write_file(pid_file, "%"PRIuMAX, (uintmax_t) getpid());
-
builtin/gc.c:789-790- writes PID to lockfile:strbuf_addf(&sb, "%"PRIuMAX" %s", (uintmax_t) getpid(), my_host);
In terms of cross platform compatibility, I think I got lucky because it was already a first class citizen.
Did a little research and it works across the board.
- getpid() is available on all Git-supported platforms (provided by libc on Unix, emulated on Windows)
- pid_t is defined everywhere (native on Unix, typedef int on Windows via mingw-posix.h)
5452ae9 to
51dbf28
Compare
|
Thanks for doing this @pcasaretto - LGTM! |
51dbf28 to
7068ec0
Compare
|
/preview |
|
Preview email sent as pull.2011.git.1764687711531.gitgitgadget@gmail.com |
|
/submit |
|
Submitted as pull.2011.git.1764688047077.gitgitgadget@gmail.com To fetch this version into To fetch this version to local tag |
|
On the Git mailing list, "D. Ben Knoble" wrote (reply to this): On Tue, Dec 2, 2025 at 10:07 AM Paulo Casaretto via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Paulo Casaretto <pcasaretto@gmail.com>
>
> When a lock file is held, it can be helpful to know which process owns
> it, especially when debugging stale locks left behind by crashed
> processes. Add an optional feature that creates a companion .lock.pid
> file alongside each lock file, containing the PID of the lock holder.
>
> The .lock.pid file is created when a lock is acquired (if enabled), and
> automatically cleaned up when the lock is released (via commit or
> rollback). The file is registered as a tempfile so it gets cleaned up
> by signal and atexit handlers if the process terminates abnormally.
>
> When a lock conflict occurs, the code checks if the PID from the .pid
> file is still running using kill(pid, 0). This allows providing
> context-aware error messages. With PID info enabled:
>
> Lock is held by process 12345. Wait for it to finish, or remove
> the lock file to continue.
>
> Or for a stale lock:
>
> Lock was held by process 12345, which is no longer running.
> Remove the stale lock file to continue.
>
> Without PID info (default):
>
> Another git process seems to be running in this repository.
> Wait for it to finish, or remove the lock file to continue.
>
> The feature is opt-in via GIT_LOCK_PID_INFO=1 environment variable.
>
> Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com>
Sounds interesting. I think by the time I wish I knew what else was
using the lockfile, it's too late for me to alter my environment.
Perhaps (in addition to allowing the environment opt-in) we could
opt-in via configuration? Or is this really only useful, say, on the
server side where the environment is carefully controlled? I don't
relish putting this variable into my environment to take advantage of
something that looks very useful.
Are there downsides that make it necessary to be opt-in? I also
imagine this could be a useful default; occasionally folks at work hit
something similar and ask "what's up with that?"
Only other thing is: just because a process X is running doesn't mean
it was the one holding the lock, right? Since PIDs can be reused.
--
D. Ben Knoble |
|
User |
|
On the Git mailing list, Torsten Bögershausen wrote (reply to this): On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:
> From: Paulo Casaretto <pcasaretto@gmail.com>
>
> When a lock file is held, it can be helpful to know which process owns
> it, especially when debugging stale locks left behind by crashed
> processes. Add an optional feature that creates a companion .lock.pid
> file alongside each lock file, containing the PID of the lock holder.
>
> The .lock.pid file is created when a lock is acquired (if enabled), and
> automatically cleaned up when the lock is released (via commit or
> rollback). The file is registered as a tempfile so it gets cleaned up
> by signal and atexit handlers if the process terminates abnormally.
>
> When a lock conflict occurs, the code checks if the PID from the .pid
> file is still running using kill(pid, 0). This allows providing
> context-aware error messages. With PID info enabled:
>
> Lock is held by process 12345. Wait for it to finish, or remove
> the lock file to continue.
>
> Or for a stale lock:
>
> Lock was held by process 12345, which is no longer running.
> Remove the stale lock file to continue.
>
> Without PID info (default):
>
> Another git process seems to be running in this repository.
> Wait for it to finish, or remove the lock file to continue.
>
> The feature is opt-in via GIT_LOCK_PID_INFO=1 environment variable.
[]
I think that this makes sense.
However, as a frequent user of Git repos hosted on an NFS server
(without any problems in my setup):
Does it make sense to add the hostname here ?
We already have xgethostname() in Git, so that we can diagnose
who/which machine really left a lock.
[] |
|
User |
|
On the Git mailing list, Jeff King wrote (reply to this): On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:
> The .lock.pid file is created when a lock is acquired (if enabled), and
> automatically cleaned up when the lock is released (via commit or
> rollback). The file is registered as a tempfile so it gets cleaned up
> by signal and atexit handlers if the process terminates abnormally.
I'm sympathetic to the goal of this series, and the implementation looks
cleanly done. But I wonder if there might be some system-level side
effects that make these .pid files awkward.
Temporarily having an extra .git/index.lock.pid file is probably not a
big deal. But for other namespaces, like refs, we're colliding with
names that have other meanings. So if we want to update refs/heads/foo,
for example, we'll create refs/heads/foo.lock now. And after your patch,
also refs/heads/foo.lock.pid.
The ".lock" suffix is special, in that we disallow it in a refname and
know to skip it when iterating over loose refs. But for the ".pid"
variant, we run the risk of colliding with a real branch named
"foo.lock.pid", both for reading and writing.
On the writing side, creating foo.lock.pid may accidentally overwrite an
existing branch. This can be mitigated by using O_EXCL when creating the
pid file.
But we can see the writes in the opposite order, which I think can also
lead to data loss. Something like:
- process A wants to write branch "foo", so it holds
refs/heads/foo.lock and now also the matching foo.lock.pid
- process B wants to write branch "foo.lock.pid", so it holds
refs/heads/foo.lock.pid.lock (and the matching pid)
- process B finishes its write and atomically renames
foo.lock.pid.lock into foo.lock.pid. It's expected to overwrite
the existing file there, so now process A's pid file is gone.
- process A finishes its write and tries to clean up its pid file. So
it deletes foo.lock.pid, which contains the actual data from process
B's write.
And now process B's write has been lost (and maybe even the entire
existence of the foo.lock.pid branch, if it was not also present in the
packed-refs file).
On the reading side, anybody iterating refs/heads/ may racily see the
foo.lock.pid file and think it is supposed to be an actual ref. So they
open it, find it contains garbage, and then flag it as an error.
I think both could be mitigated if we disallowed ".lock.pid" as a suffix
in refnames, but that is a big user-facing change.
In the long run, alternate ref stores like reftable won't suffer from
this issue. It's possible there are other spots where we use lockfiles
alongside arbitrarily-named files, though I couldn't think of any.
So I dunno what that means for your patch. I notice that the user has to
enable the feature manually. But it feels more like it should be
selective based on which subsystem is using the lockfile (so refs would
never want it, but other lockfiles/tempfiles might).
-Peff |
|
User |
|
On the Git mailing list, Junio C Hamano wrote (reply to this): Jeff King <peff@peff.net> writes:
> So I dunno what that means for your patch. I notice that the user has to
> enable the feature manually. But it feels more like it should be
> selective based on which subsystem is using the lockfile (so refs would
> never want it, but other lockfiles/tempfiles might).
Or perhaps the way to opt into the feature is to create an empty
file $GIT_DIR/lockfile-audit, and the lockfile subsystem will append
to it every time a lock is taken? We need to ensure that a PID and
pathname formatted into a single record is small enough and O_APPEND
would relieve us from worrying about multi writer races, which may
introduce different kind of complications, though.
|
|
On the Git mailing list, Jeff King wrote (reply to this): On Wed, Dec 03, 2025 at 02:21:11PM -0800, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > So I dunno what that means for your patch. I notice that the user has to
> > enable the feature manually. But it feels more like it should be
> > selective based on which subsystem is using the lockfile (so refs would
> > never want it, but other lockfiles/tempfiles might).
>
> Or perhaps the way to opt into the feature is to create an empty
> file $GIT_DIR/lockfile-audit, and the lockfile subsystem will append
> to it every time a lock is taken? We need to ensure that a PID and
> pathname formatted into a single record is small enough and O_APPEND
> would relieve us from worrying about multi writer races, which may
> introduce different kind of complications, though.
I like a single log much better from a management perspective. I agree
that atomicity is a potential issue, though. I think that even if we
kept it small, network filesystems like NFS do not provide great
guarantees for atomic appends. Something like flock() can work there,
but that's not something we've relied on before.
It also raises questions about reading (do we find pid files in the log
in order to provide more directed advice?) and maintenance (do we ever
clean it up, or just let it grow without bound?).
-Peff |
|
On the Git mailing list, Taylor Blau wrote (reply to this): On Wed, Dec 03, 2025 at 04:16:10PM -0500, Jeff King wrote:
> On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:
>
> > The .lock.pid file is created when a lock is acquired (if enabled), and
> > automatically cleaned up when the lock is released (via commit or
> > rollback). The file is registered as a tempfile so it gets cleaned up
> > by signal and atexit handlers if the process terminates abnormally.
>
> I'm sympathetic to the goal of this series, and the implementation looks
> cleanly done. But I wonder if there might be some system-level side
> effects that make these .pid files awkward.
>
> Temporarily having an extra .git/index.lock.pid file is probably not a
> big deal. But for other namespaces, like refs, we're colliding with
> names that have other meanings. So if we want to update refs/heads/foo,
> for example, we'll create refs/heads/foo.lock now. And after your patch,
> also refs/heads/foo.lock.pid.
>
> The ".lock" suffix is special, in that we disallow it in a refname and
> know to skip it when iterating over loose refs. But for the ".pid"
> variant, we run the risk of colliding with a real branch named
> "foo.lock.pid", both for reading and writing.
Good point. I don't have a strong opinion on whether or not we should
use an append-only log of which PIDs grabbed which lockfiles when versus
tracking them on a per-lock basis. But I wonder if this would be
mitigated by either:
- Keeping the ".lock" suffix as-is, so that holding a lockfile at path
"$GIT_DIR/index.lock" would create "$GIT_DIR/index-pid.lock" or
something similar.
- Introducing a new reference name constraint that treats ".lock.pid"
as a reserved in a manner identical to how we currently treat
".lock".
Between the two, I vastly prefer the former, but see below for more on
why.
> But we can see the writes in the opposite order, which I think can also
> lead to data loss. Something like:
>
> - process A wants to write branch "foo", so it holds
> refs/heads/foo.lock and now also the matching foo.lock.pid
>
> - process B wants to write branch "foo.lock.pid", so it holds
> refs/heads/foo.lock.pid.lock (and the matching pid)
Changing the naming scheme as above would cause us to hold
"foo.pid.lock" in addition to "foo.lock". That would allow process B
here to write branch "foo.lock.pid" (as is the case today). But if the
scenario were instead "process B wants to write branch foo.pid.lock", it
would fail immediately since the ".lock" suffix is reserved.
> I think both could be mitigated if we disallowed ".lock.pid" as a suffix
> in refnames, but that is a big user-facing change.
Yeah, I don't think that we should change the refname constraints here,
especially in a world where reftable deployments are more common. In
that world I think we should err on the side of removing constraints,
not adding them ;-).
> So I dunno what that means for your patch. I notice that the user has to
> enable the feature manually. But it feels more like it should be
> selective based on which subsystem is using the lockfile (so refs would
> never want it, but other lockfiles/tempfiles might).
Yeah, I think that something similar to the "which files do we fsync()
and how?" configuration we have today would be a nice complement here.
(As an aside, I wonder if that interface, too, could be slightly
improved. Right now we have a comma-separated list of values in the
"core.fsync" configuration for listing different "components", and then
a global core.fsyncMethod to either issue a fsync(), or a pagecache
writeback, or writeout-only flushes in batches. It might be nice to have
something like:
[fsync "loose-object"]
method = fsync
[fsync "packfile"]
method = writeout
, so the analog here would be something like:
[lockfile "refs"]
pidfile = false
[lockfile "index"]
pidfile = true
or similar. That could also be represented as core.lockfile=index,
omitting "refs" to avoid tracking it. It may be that people don't really
care to ever use different fsync methods for different fsync-able
components, so perhaps the analogy doesn't hold up perfectly.)
Thanks,
Taylor |
|
On the Git mailing list, Taylor Blau wrote (reply to this): Hi Paulo,
On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:
> The .lock.pid file is created when a lock is acquired (if enabled), and
> automatically cleaned up when the lock is released (via commit or
> rollback). The file is registered as a tempfile so it gets cleaned up
> by signal and atexit handlers if the process terminates abnormally.
(For the purposes of this review, I'll ignore the naming conventions
that are discussed elsewhere in the thread, which I think can be
resolved separately of any technical concerns.)
> diff --git a/Documentation/git.adoc b/Documentation/git.adoc
> index ce099e78b8..6fdd509d34 100644
> --- a/Documentation/git.adoc
> +++ b/Documentation/git.adoc
> @@ -1010,6 +1010,16 @@ be in the future).
> the background which do not want to cause lock contention with
> other operations on the repository. Defaults to `1`.
>
> +`GIT_LOCK_PID_INFO`::
> + If this Boolean environment variable is set to `1`, Git will create
> + a `.lock.pid` file alongside each lock file containing the PID of the
> + process that created the lock. This information is displayed in error
> + messages when a lock conflict occurs, making it easier to identify
> + stale locks or debug locking issues. The PID files are automatically
> + cleaned up via signal and atexit handlers; however, if a process is
> + terminated abnormally (e.g., SIGKILL), the file may remain as a stale
> + indicator. Disabled by default.
Regardless of whether or not we expose this functionality behind an
environment variable or configuration, I think it would be nice to be
able to turn PID tracking on and off for different components (e.g., for
scenarios where you care about who is holding open, say, $GIT_DIR/index,
but not who is creating a lock during ref creation).
If we determined this through an environment variable, I think it would
be reasonable to adopt the convention from the "core.fsync"
configuration and use a comma-separated list. Alternatively, we could
adopt that same convention for a configuration variable, say,
"core.lockfile".
But I think that having this exposed as a per-component setting via
configuration is preferable than a global switch, since callers don't
have to remember to set this variable in their environment to get the
desired effect. Callers that want to opt-in or out of this feature on a
one-off basis can still override the configuration via the top-level
"-c" flag.
> `GIT_REDIRECT_STDIN`::
> `GIT_REDIRECT_STDOUT`::
> `GIT_REDIRECT_STDERR`::
> diff --git a/lockfile.c b/lockfile.c
> index 1d5ed01682..4a694b9c3d 100644
> --- a/lockfile.c
> +++ b/lockfile.c
> @@ -6,6 +6,9 @@
> #include "abspath.h"
> #include "gettext.h"
> #include "lockfile.h"
> +#include "parse.h"
> +#include "strbuf.h"
> +#include "wrapper.h"
>
> /*
> * path = absolute or relative path name
> @@ -71,6 +74,62 @@ static void resolve_symlink(struct strbuf *path)
> strbuf_reset(&link);
> }
>
> +/*
> + * Lock PID file functions - write PID to a .lock.pid file alongside
> + * the lock file for debugging stale locks. The PID file is registered
> + * as a tempfile so it gets cleaned up by signal/atexit handlers.
> + */
> +
> +static int lock_pid_info_enabled(void)
> +{
With the above suggestion, I think this function would get a little more
complicated to instead take a component argument. That's not a huge deal
in and of itself, but callers that *create* a lockfile will have to
somehow pass in the component name when acquiring the lock.
> + return git_env_bool(GIT_LOCK_PID_INFO_ENVIRONMENT, 0);
> +}
> +
> +static struct tempfile *create_lock_pid_file(const char *lock_path, int mode)
> +{
> + struct strbuf pid_path = STRBUF_INIT;
> + struct strbuf content = STRBUF_INIT;
> + struct tempfile *pid_tempfile = NULL;
> + int fd;
> +
> + if (!lock_pid_info_enabled())
> + return NULL;
> +
> + strbuf_addf(&pid_path, "%s%s", lock_path, LOCK_PID_SUFFIX);
> + fd = open(pid_path.buf, O_WRONLY | O_CREAT | O_TRUNC, mode);
I'm not sure whether or not we should pass O_EXCL here, but I think it
depends on the naming convention we pick.
> + if (fd >= 0) {
This is a style preference, but I'd suggest handling the happy path
outside of this conditional if possible by inverting it. Perhaps
something like:
if (fd < 0)
goto out;
strbuf_addf(&content, ...);
if (write_in_full(fd, content.buf, content.len) < 0)
warning_errno(...);
close(fd);
pid_tempfile = register_tempfile(pid_path.buf);
out:
strbuf_release(...);
return pid_tempfile;
> +static int read_lock_pid(const char *lock_path, uintmax_t *pid_out)
> +{
> + struct strbuf pid_path = STRBUF_INIT;
> + struct strbuf content = STRBUF_INIT;
> + int ret = -1;
> +
> + strbuf_addf(&pid_path, "%s%s", lock_path, LOCK_PID_SUFFIX);
> + if (strbuf_read_file(&content, pid_path.buf, LOCK_PID_MAXLEN) > 0) {
> + char *endptr;
> + *pid_out = strtoumax(content.buf, &endptr, 10);
> + if (*pid_out > 0 && (*endptr == '\n' || *endptr == '\0'))
> + ret = 0;
Same note here. I'd suggest setting "ret = 0" during initialization, and
inverting this conditional to:
if (*pid_out <= 0 || (*endptr != '\n' && *endptr != '\0')) {
warning(...);
ret = -1;
goto out;
}
> + else
> + warning(_("malformed lock pid file '%s'"), pid_path.buf);
> + }
> + strbuf_release(&pid_path);
> + strbuf_release(&content);
> + return ret;
> +}
> +
> /* Make sure errno contains a meaningful value on error */
> static int lock_file(struct lock_file *lk, const char *path, int flags,
> int mode)
> @@ -80,9 +139,12 @@ static int lock_file(struct lock_file *lk, const char *path, int flags,
> strbuf_addstr(&filename, path);
> if (!(flags & LOCK_NO_DEREF))
> resolve_symlink(&filename);
> -
> strbuf_addstr(&filename, LOCK_SUFFIX);
> +
These look like stray whitespace changes that were left behind from
development.
> + if (pid_status == 1)
> + strbuf_addf(buf, _("Lock is held by process %"PRIuMAX". "
> + "Wait for it to finish, or remove the lock file to continue"),
> + pid);
> + else if (pid_status == -1)
> + strbuf_addf(buf, _("Lock was held by process %"PRIuMAX", "
> + "which is no longer running. Remove the stale lock file to continue"),
> + pid);
> + else
> + strbuf_addstr(buf, _("Another git process seems to be running in this repository. "
> + "Wait for it to finish, or remove the lock file to continue"));
On one hand I prefer the new "Another git process" message for when we
don't have a PID lockfile. But on the other hand, I think the "If it
still fails, a git process may have crashed..." message is useful for
users who may not be immediately aware of the consequences of simply
removing the lockfile to continue.
I do think the original message is somewhat verbose, so maybe the change
here in the non-PID case is worth doing. What are your thoughts?
> +
> + strbuf_release(&lock_path);
> } else
> strbuf_addf(buf, _("Unable to create '%s.lock': %s"),
> absolute_path(path), strerror(err));
> @@ -207,6 +292,8 @@ int commit_lock_file(struct lock_file *lk)
> {
> char *result_path = get_locked_file_path(lk);
>
> + delete_tempfile(&lk->pid_tempfile);
> +
Do we want to wait to delete the PID file until after we know that we
successfully committed the lockfile?
> +/* Maximum length for PID file content */
> +#define LOCK_PID_MAXLEN 32
Makes sense ;-). This should be plenty of space, since on my system
maximum PID value is 2^22:
$ cat /proc/sys/kernel/pid_max
4194304
> +test_expect_success 'PID info file cleaned up on successful operation when enabled' '
> + git init repo4 &&
> + (
> + cd repo4 &&
> + echo content >file &&
> + env GIT_LOCK_PID_INFO=1 git add file &&
> + # After successful add, no lock or PID files should exist
> + ! test -f .git/index.lock &&
> + ! test -f .git/index.lock.pid
These should be:
test_path_is_missing .git/index.lock &&
test_path_is_missing .git/index.lock.pid
instead of bare "! test -f"'s.
Thanks,
Taylor |
When a lock file is held, it can be helpful to know which process owns
it, especially when debugging stale locks left behind by crashed
processes. Add an optional feature that creates a companion PID file
alongside each lock file, containing the PID of the lock holder.
For a lock file "foo.lock", the PID file is named "foo-pid.lock".
The PID file is created when a lock is acquired (if enabled), and
automatically cleaned up when the lock is released (via commit or
rollback). The file is registered as a tempfile so it gets cleaned up
by signal and atexit handlers if the process terminates abnormally.
When a lock conflict occurs, the code checks for an existing PID file
and, if found, uses kill(pid, 0) to determine if the process is still
running. This allows providing context-aware error messages:
Lock is held by process 12345. Wait for it to finish, or remove
the lock file to continue.
Or for a stale lock:
Lock was held by process 12345, which is no longer running.
Remove the stale lock file to continue.
The feature is controlled via core.lockfilePid configuration, which
accepts per-component values similar to core.fsync:
- none: Disable for all components (default)
- all: Enable for all components
- index, config, refs, commit-graph, midx, shallow, gc, other:
Enable for specific components
Multiple components can be combined with commas:
git config core.lockfilePid index,config
Each lock file call site specifies which component it belongs to,
allowing users fine-grained control over which locks create PID files.
Existing PID files are always read when displaying lock errors,
regardless of the core.lockfilePid setting. This ensures helpful
diagnostics even when the feature was previously enabled and later
disabled.
Signed-off-by: Paulo Casaretto <pcasaretto@gmail.com>
|
/preview |
|
Preview email sent as pull.2011.v2.git.1764871702690.gitgitgadget@gmail.com |
|
/submit |
|
Error: 7068ec0 was already submitted |
7068ec0 to
cfbf3a8
Compare
|
On the Git mailing list, Patrick Steinhardt wrote (reply to this): On Wed, Dec 03, 2025 at 06:19:46PM -0500, Taylor Blau wrote:
> On Wed, Dec 03, 2025 at 04:16:10PM -0500, Jeff King wrote:
> > On Tue, Dec 02, 2025 at 03:07:27PM +0000, Paulo Casaretto via GitGitGadget wrote:
> >
> > > The .lock.pid file is created when a lock is acquired (if enabled), and
> > > automatically cleaned up when the lock is released (via commit or
> > > rollback). The file is registered as a tempfile so it gets cleaned up
> > > by signal and atexit handlers if the process terminates abnormally.
> >
> > I'm sympathetic to the goal of this series, and the implementation looks
> > cleanly done. But I wonder if there might be some system-level side
> > effects that make these .pid files awkward.
> >
> > Temporarily having an extra .git/index.lock.pid file is probably not a
> > big deal. But for other namespaces, like refs, we're colliding with
> > names that have other meanings. So if we want to update refs/heads/foo,
> > for example, we'll create refs/heads/foo.lock now. And after your patch,
> > also refs/heads/foo.lock.pid.
> >
> > The ".lock" suffix is special, in that we disallow it in a refname and
> > know to skip it when iterating over loose refs. But for the ".pid"
> > variant, we run the risk of colliding with a real branch named
> > "foo.lock.pid", both for reading and writing.
>
> Good point. I don't have a strong opinion on whether or not we should
> use an append-only log of which PIDs grabbed which lockfiles when versus
> tracking them on a per-lock basis. But I wonder if this would be
> mitigated by either:
>
> - Keeping the ".lock" suffix as-is, so that holding a lockfile at path
> "$GIT_DIR/index.lock" would create "$GIT_DIR/index-pid.lock" or
> something similar.
>
> - Introducing a new reference name constraint that treats ".lock.pid"
> as a reserved in a manner identical to how we currently treat
> ".lock".
>
> Between the two, I vastly prefer the former, but see below for more on
> why.
Yeah, I agree that this is not really a feasible change for the "files"
reference backend. Besides all the issues mentioned in this thread, we
also have to consider alternative implementations of Git and old
versions. Those wouldn't know that the ".lock.pid" files are special, so
they will misbehave if Git started to write them now.
We could of course work around that issue by introducing a new
repository extension. But I doubt that this is something we want to
pursue in this context. The provided benefit just isn't high enough.
The overall idea still has merit though. So if we still want to pursue
it we can likely work around the issue by introducing a toggle that
allows specific callers to opt out of creating the PID files.
That'd raise the question though when we are most likely to need the PID
files for debugging stuff. From my own experience I only ever had issues
with stale locks in the ref subsystem, so if we disable the mechanism in
exactly that subsystem it may be way less useful.
If references are the main culprit one could also think about a slightly
ugly alternative approach: loose refs handle it just fine if they
contain additional lines. So in theory, we could write a second line for
each lock file that contains the PID. We do have an fsck check that
warns about this, but in theory this should just work. Whether we want
to go there is a different question though.
Patrick |
|
User |
|
On the Git mailing list, Jeff King wrote (reply to this): On Wed, Dec 03, 2025 at 06:19:46PM -0500, Taylor Blau wrote:
> Changing the naming scheme as above would cause us to hold
> "foo.pid.lock" in addition to "foo.lock". That would allow process B
> here to write branch "foo.lock.pid" (as is the case today). But if the
> scenario were instead "process B wants to write branch foo.pid.lock", it
> would fail immediately since the ".lock" suffix is reserved.
I agree that this gets rid of any corruption or race issues, since all
versions understand how to handle ".lock" specially (and assuming we
create foo.pid.lock with O_EXCL). But there is still a namespace
collision; I cannot write refs "foo" and "foo.pid" at the same time.
> > So I dunno what that means for your patch. I notice that the user has to
> > enable the feature manually. But it feels more like it should be
> > selective based on which subsystem is using the lockfile (so refs would
> > never want it, but other lockfiles/tempfiles might).
>
> Yeah, I think that something similar to the "which files do we fsync()
> and how?" configuration we have today would be a nice complement here.
I'm not sure it makes sense for the user to configure this. I more meant
that the ref files-backend code would set a flag for "no, do not create
a pid file for me ever" (or inversely, other bits of the code would
add a flag for "yes, it's OK to do so").
-Peff |
Changes in v2
Changed file naming: From
foo.lock.pidtofoo-pid.lockto avoidcollision with refs namespace (where
refs/foo.lock.pidwould be valid)Replaced environment variable with config: Now uses
core.lockfilePidconfiguration instead of
GIT_LOCK_PID_INFOenvironment variableAdded per-component configuration: Similar to
core.fsync, users canenable PID files for specific components:
none(default),allindex,config,refs,commit-graph,midx,shallow,gc,otherCaller-specified components: Each call site specifies which component it
belongs to, giving users fine-grained control. Please help by reviewing the categories
specially the catch all OTHER
Always read existing PID files: When displaying lock errors, existing
PID files are read regardless of config setting, ensuring helpful diagnostics
PID-only for now: Not adding hostname for simplicity.
Should we maybe make the file contents extensible?
CC: Taylor Blau me@ttaylorr.com
cc: "D. Ben Knoble" ben.knoble@gmail.com
cc: Torsten Bögershausen tboegi@web.de
cc: Jeff King peff@peff.net
cc: "Paulo Casaretto (Shopify)" paulo.casaretto@shopify.com
cc: Patrick Steinhardt ps@pks.im