Commit 6912d80

Merge branch 'je/doc-data-model'
Add a new manual that describes the data model.

* je/doc-data-model:
  doc: add an explanation of Git's data model
2 parents 3b212a8 + dee8094 commit 6912d80

4 files changed

+311
-2
lines changed

Documentation/Makefile

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ MAN7_TXT += gitcli.adoc
 MAN7_TXT += gitcore-tutorial.adoc
 MAN7_TXT += gitcredentials.adoc
 MAN7_TXT += gitcvs-migration.adoc
+MAN7_TXT += gitdatamodel.adoc
 MAN7_TXT += gitdiffcore.adoc
 MAN7_TXT += giteveryday.adoc
 MAN7_TXT += gitfaq.adoc

Documentation/gitdatamodel.adoc

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
+gitdatamodel(7)
+===============
+
+NAME
+----
+gitdatamodel - Git's core data model
+
+SYNOPSIS
+--------
+gitdatamodel
+
+DESCRIPTION
+-----------
+
+It's not necessary to understand Git's data model to use Git, but it's
+very helpful when reading Git's documentation so that you know what it
+means when the documentation says "object", "reference", or "index".
+
+Git's core operations use 4 kinds of data:
+
+1. <<objects,Objects>>: commits, trees, blobs, and tag objects
+2. <<references,References>>: branches, tags,
+remote-tracking branches, etc.
+3. <<index,The index>>, also known as the staging area
+4. <<reflogs,Reflogs>>: logs of changes to references ("ref log")
+
+[[objects]]
+OBJECTS
+-------
+
+All of the commits and files in a Git repository are stored as "Git objects".
+Git objects never change after they're created, and every object has an ID,
+like `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
+
+This means that if you have an object's ID, you can always recover its
+exact contents as long as the object hasn't been deleted.
+
+Every object has:
+
+[[object-id]]
+1. an *ID* (aka "object name"), which is a cryptographic hash of its
+type and contents.
+It's fast to look up a Git object using its ID.
+This is usually represented in hexadecimal, like
+`1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`.
+2. a *type*. There are 4 types of objects:
+<<commit,commits>>, <<tree,trees>>, <<blob,blobs>>,
+and <<tag-object,tag objects>>.
+3. *contents*. The structure of the contents depends on the type.
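To make "a cryptographic hash of its type and contents" concrete, here is a minimal Python sketch. It assumes a repository that uses SHA-1 (Git can also be configured to use SHA-256), and it relies on a detail this page does not spell out: Git hashes a short `<type> <size>\0` header followed by the contents. The function name `git_object_id` is made up for this example.

```python
import hashlib

def git_object_id(object_type: str, contents: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given type."""
    # Assumption: the hashed data is "<type> <size in bytes>\0" + contents.
    header = f"{object_type} {len(contents)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + contents).hexdigest()

# The same (empty) contents get different IDs for different types,
# because the type is part of what gets hashed.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Because the ID is derived from the type and contents, changing either one produces a different ID, which is why an existing object can never be edited in place.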
+
+Here's how each type of object is structured:
+
+[[commit]]
+commit::
+A commit contains these required fields
+(though there are other optional fields):
++
+1. The full directory structure of all the files in that version of the
+repository and each file's contents, stored as the *<<tree,tree>>* ID
+of the commit's top-level directory
+2. Its *parent commit ID(s)*. The first commit in a repository has 0 parents,
+regular commits have 1 parent, merge commits have 2 or more parents
+3. An *author* and the time the commit was authored
+4. A *committer* and the time the commit was committed
+5. A *commit message*
++
+Here's how an example commit is stored:
++
+----
+tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
+parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
+author Maya <maya@example.com> 1759173425 -0400
+committer Maya <maya@example.com> 1759173425 -0400
+
+Add README
+----
++
+Like all other objects, commits can never be changed after they're created.
+For example, "amending" a commit with `git commit --amend` creates a new
+commit with the same parent.
++
+Git does not store the diff for a commit: when you ask Git to show
+the commit with linkgit:git-show[1], it calculates the diff from its
+parent on the fly.
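The commit layout shown above (header lines, a blank line, then the message) can be read with a few lines of Python. This is only a sketch: `parse_commit` is a hypothetical helper, and it simply skips any optional headers instead of interpreting them.

```python
def parse_commit(text: str) -> dict:
    """Split `git cat-file -p <commit-id>` output into its fields."""
    headers, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message.strip()}
    for line in headers.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # zero, one, or several lines
        elif key in ("tree", "author", "committer"):
            commit[key] = value
        # Any other (optional) header is ignored in this sketch.
    return commit

example = """\
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
"""
commit = parse_commit(example)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

Note that nothing in the parsed result is a diff: the commit only names a tree and its parents.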
+
+[[tree]]
+tree::
+A tree is how Git represents a directory.
+It can contain files or other trees (which are subdirectories).
+It lists, for each item in the tree:
++
+1. The *filename*, for example `hello.py`
+2. The *file type*, which must be one of these five types:
+- *regular file*
+- *executable file*
+- *symbolic link*
+- *directory*
+- *gitlink* (for use with submodules)
+3. The <<object-id,*object ID*>> with the contents of the file, directory,
+or gitlink.
++
+For example, this is how a tree containing one directory (`src`) and one file
+(`README.md`) is stored:
++
+----
+100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
+040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
+----
+
+NOTE: In the output above, Git displays the file type of each tree entry
+using a format that's loosely modelled on Unix file modes (`100644` is
+"regular file", `100755` is "executable file", `120000` is "symbolic
+link", `040000` is "directory", and `160000` is "gitlink"). It also
+displays the object's type: `blob` for files and symlinks, `tree` for
+directories, and `commit` for gitlinks.
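As a rough illustration of the note above, this Python sketch maps the mode shown for each tree entry back to its file type. The names `FILE_TYPES` and `parse_tree_entry` are invented for this example, and the parser assumes filenames without leading whitespace.

```python
# Modes as listed in the note above.
FILE_TYPES = {
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "040000": "directory",
    "160000": "gitlink",
}

def parse_tree_entry(line: str) -> dict:
    """Parse one line of `git cat-file -p <tree-id>` output."""
    # Split into at most 4 fields so that spaces inside the name survive.
    mode, object_type, object_id, name = line.split(None, 3)
    return {
        "name": name,
        "file_type": FILE_TYPES.get(mode, "unknown"),
        "object_type": object_type,
        "object_id": object_id,
    }

listing = """\
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
"""
entries = [parse_tree_entry(line) for line in listing.splitlines()]
for entry in entries:
    print(entry["name"], "->", entry["file_type"], f"({entry['object_type']})")
```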
+
+[[blob]]
+blob::
+A blob object contains a file's contents.
++
+When you make a commit, Git stores the full contents of each file that
+you changed as a blob.
+For example, if you have a commit that changes 2 files in a repository
+with 1000 files, that commit will create 2 new blobs, and use the
+previous blob ID for the other 998 files.
+This means that commits can use relatively little disk space even in a
+very large repository.
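The 1000-file example can be simulated with a toy content-addressed store. This is a simplified model, not how Git is implemented: the `store` dict stands in for the object database, `snapshot` is a made-up helper, and the sketch assumes SHA-1 IDs and ignores trees, compression, and packing.

```python
import hashlib

def blob_id(contents: bytes) -> str:
    # Assumption: SHA-1 over a "blob <size>\0" header plus the contents.
    return hashlib.sha1(b"blob %d\x00" % len(contents) + contents).hexdigest()

def snapshot(files: dict, store: dict) -> dict:
    """Store each file as a blob and return {path: blob ID}."""
    listing = {}
    for path, contents in files.items():
        object_id = blob_id(contents)
        store.setdefault(object_id, contents)  # identical contents are reused
        listing[path] = object_id
    return listing

store = {}
files = {f"file{i}.txt": f"contents {i}\n".encode() for i in range(1000)}
first = snapshot(files, store)
blobs_before = len(store)

# Change 2 of the 1000 files, then take a second snapshot.
files["file0.txt"] = b"edited\n"
files["file1.txt"] = b"also edited\n"
second = snapshot(files, store)

new_blobs = len(store) - blobs_before
unchanged = sum(first[path] == second[path] for path in files)
print(new_blobs, unchanged)  # 2 998
```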
+
+[[tag-object]]
+tag object::
+Tag objects contain these required fields
+(though there are other optional fields):
++
+1. The *ID* of the object it references
+2. The *type* of the object it references
+3. The *tagger* and tag date
+4. A *tag message*, similar to a commit message
+
+Here's how an example tag object is stored:
+
+----
+object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3
+type commit
+tag v1.0.0
+tagger Maya <maya@example.com> 1759927359 -0400
+
+Release version 1.0.0
+----
+
+NOTE: All of the examples in this section were generated with
+`git cat-file -p <object-id>`.
+
+[[references]]
+REFERENCES
+----------
+
+References are a way to give a name to a commit.
+It's easier to remember "the changes I'm working on are on the `turtle`
+branch" than "the changes are in commit bb69721404348e".
+Git often uses "ref" as shorthand for "reference".
+
+References can either refer to:
+
+1. An object ID, usually a <<commit,commit>> ID
+2. Another reference. This is called a "symbolic reference"
+
+References are stored in a hierarchy, and Git handles references
+differently based on where they are in the hierarchy.
+Most references are under `refs/`. Here are the main types:
+
+[[branch]]
+branches: `refs/heads/<name>`::
+A branch refers to a commit ID.
+That commit is the latest commit on the branch.
++
+To get the history of commits on a branch, Git will start at the commit
+ID the branch references, and then look at the commit's parent(s),
+the parent's parent, etc.
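The walk described above can be sketched in Python over a small hand-written commit graph. The commit names (`c1`, `b1`, and so on) and the `history` helper are hypothetical, and the visiting order here is just depth-first; real Git commands such as `git log` choose their own ordering.

```python
def history(tip: str, parents: dict) -> list:
    """Return every commit found by following parent links from `tip`."""
    seen, order, stack = set(), [], [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # already visited through another parent
        seen.add(commit)
        order.append(commit)
        # Reverse so that the first parent is visited before the others.
        stack.extend(reversed(parents.get(commit, [])))
    return order

parents = {
    "c4": ["c3", "b1"],  # merge commit: 2 parents
    "c3": ["c2"],
    "b1": ["c2"],
    "c2": ["c1"],
    "c1": [],            # first commit: 0 parents
}
print(history("c4", parents))  # ['c4', 'c3', 'c2', 'c1', 'b1']
```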
+
+[[tag]]
+tags: `refs/tags/<name>`::
+A tag refers to a commit ID, tag object ID, or other object ID.
+There are two types of tags:
+1. "Annotated tags", which reference a <<tag-object,tag object>> ID
+which contains a tag message
+2. "Lightweight tags", which reference a commit, blob, or tree ID
+directly
++
+Even though branches and tags both refer to a commit ID, Git
+treats them very differently.
+Branches are expected to change over time: when you make a commit, Git
+will update your <<HEAD,current branch>> to point to the new commit.
+Tags are usually not changed after they're created.
+
+[[HEAD]]
+HEAD: `HEAD`::
+`HEAD` is where Git stores your current <<branch,branch>>,
+if there is a current branch. `HEAD` can either be:
++
+1. A symbolic reference to your current branch, for example `ref:
+refs/heads/main` if your current branch is `main`.
+2. A direct reference to a commit ID. In this case there is no current branch.
+This is called "detached HEAD state"; see the DETACHED HEAD section
+of linkgit:git-checkout[1] for more.
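The two forms of `HEAD` can be told apart by looking for the `ref: ` prefix. Here is a minimal Python sketch of that check. It assumes you already have the value of `HEAD` as text (for example, the contents of a `.git/HEAD` file in a repository that stores references as plain files), and `read_head` is a made-up name.

```python
def read_head(head_contents: str) -> dict:
    """Classify HEAD as a symbolic reference or a detached commit ID."""
    value = head_contents.strip()
    if value.startswith("ref: "):
        ref = value[len("ref: "):]
        prefix = "refs/heads/"
        branch = ref[len(prefix):] if ref.startswith(prefix) else None
        return {"detached": False, "ref": ref, "branch": branch}
    # Anything else is treated as a commit ID: there is no current branch.
    return {"detached": True, "commit": value}

on_branch = read_head("ref: refs/heads/main\n")
detached = read_head("4ccb6d7b8869a86aae2e84c56523f8705b50c647\n")
print(on_branch)
print(detached)
```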
+
+[[remote-tracking-branch]]
+remote-tracking branches: `refs/remotes/<remote>/<branch>`::
+A remote-tracking branch refers to a commit ID.
+It's how Git stores the last-known state of a branch in a remote
+repository. `git fetch` updates remote-tracking branches. When
+`git status` says "you're up to date with origin/main", it's looking at
+this.
++
+`refs/remotes/<remote>/HEAD` is a symbolic reference to the remote's
+default branch. This is the branch that `git clone` checks out by default.
+
+[[other-refs]]
+Other references::
+Git tools may create references anywhere under `refs/`.
+For example, linkgit:git-stash[1], linkgit:git-bisect[1],
+and linkgit:git-notes[1] all create their own references
+in `refs/stash`, `refs/bisect`, etc.
+Third-party Git tools may also create their own references.
++
+Git may also create references other than `HEAD` at the base of the
+hierarchy, like `ORIG_HEAD`.
+
+NOTE: Git may delete objects that aren't "reachable" from any reference
+or <<reflogs,reflog>>.
+An object is "reachable" if we can find it by following tags to whatever
+they tag, commits to their parents or trees, and trees to the trees or
+blobs that they contain.
+For example, if you amend a commit with `git commit --amend`,
+there will no longer be a branch that points at the old commit.
+The old commit is recorded in the current branch's <<reflogs,reflog>>,
+so it is still "reachable", but when the reflog entry expires it may
+become unreachable and get deleted.
+
+Reachable objects will never be deleted.
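The reachability rule in the note above can be sketched as a graph search. The object names below (`commit-old`, `tree-new`, and so on) are placeholders rather than real object IDs, and `reachable` is a hypothetical helper; the example models an amended commit that is still listed in a reflog.

```python
def reachable(start_ids, edges: dict) -> set:
    """Collect every object found by following links from the start IDs."""
    seen, stack = set(), list(start_ids)
    while stack:
        object_id = stack.pop()
        if object_id not in seen:
            seen.add(object_id)
            stack.extend(edges.get(object_id, []))
    return seen

# Each object links to what it refers to: commits to their tree and
# parents, trees to the trees or blobs they contain.
edges = {
    "commit-new": ["tree-new", "commit-base"],
    "commit-old": ["tree-old", "commit-base"],  # replaced by an amend
    "commit-base": ["tree-base"],
    "tree-new": ["blob-readme-v2"],
    "tree-old": ["blob-readme-v1"],
    "tree-base": [],
}
all_objects = set(edges) | {"blob-readme-v1", "blob-readme-v2"}

refs = {"refs/heads/main": "commit-new"}
reflog = ["commit-new", "commit-old"]

with_reflog = reachable(list(refs.values()) + reflog, edges)
without_reflog = reachable(refs.values(), edges)

print(sorted(all_objects - with_reflog))
# []
print(sorted(all_objects - without_reflog))
# ['blob-readme-v1', 'commit-old', 'tree-old']
```

While the reflog entry exists, nothing is unreachable. Once it is gone, the old commit and the objects only it refers to become candidates for deletion.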
+
+[[index]]
+THE INDEX
+---------
+The index, also known as the "staging area", is a list of files and
+the contents of each file, stored as a <<blob,blob>>.
+You can add files to the index or update the contents of a file in the
+index with linkgit:git-add[1]. This is called "staging" the file for commit.
+
+Unlike a <<tree,tree>>, the index is a flat list of files.
+When you commit, Git converts the list of files in the index to a
+directory <<tree,tree>> and uses that tree in the new <<commit,commit>>.
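To picture the conversion from a flat list to a directory tree, here is a small Python sketch that groups paths by directory using nested dicts. It is only an illustration of the idea: `build_tree` is an invented helper, and real Git writes a tree object for each directory rather than building dicts.

```python
def build_tree(index: dict) -> dict:
    """Turn {"dir/file": blob ID} into nested dicts, one per directory."""
    root = {}
    for path, blob in sorted(index.items()):
        *directories, filename = path.split("/")
        node = root
        for directory in directories:
            node = node.setdefault(directory, {})  # create subtree on demand
        node[filename] = blob
    return root

index = {
    "README.md": "8728a858d9d21a8c78488c8b4e70e531b659141f",
    "src/hello.py": "665c637a360874ce43bf74018768a96d2d4d219a",
}
tree = build_tree(index)
print(tree)
```

The result has a top-level entry for `README.md` and a nested dict for `src`, mirroring the file and directory entries of a tree.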
+
+Each index entry has 4 fields:
+
+1. The *file type*, which must be one of:
+- *regular file*
+- *executable file*
+- *symbolic link*
+- *gitlink* (for use with submodules)
+2. The *<<blob,blob>>* ID of the file,
+or (rarely) the *<<commit,commit>>* ID of the submodule
+3. The *stage number*, either 0, 1, 2, or 3. This is normally 0, but if
+there's a merge conflict there can be multiple versions of the same
+filename in the index.
+4. The *file path*, for example `src/hello.py`
+
+It's extremely uncommon to look at the index directly: normally you'd
+run `git status` to see a list of changes between the index and <<HEAD,HEAD>>.
+But you can use `git ls-files --stage` to see the index.
+Here's the output of `git ls-files --stage` in a repository with 2 files:
+
+----
+100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md
+100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py
+----
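The four fields can be pulled out of `git ls-files --stage` output with a short Python sketch. It assumes the path is separated from the other fields by a tab (falling back to plain whitespace otherwise); `parse_stage_line` and `conflicted_paths` are hypothetical helpers, and the object IDs in the conflict example are made up.

```python
def parse_stage_line(line: str) -> dict:
    """Parse one line of `git ls-files --stage` output."""
    meta, tab, path = line.partition("\t")
    if tab:
        mode, object_id, stage = meta.split()
    else:
        # Fallback for listings where the tab was replaced by a space.
        mode, object_id, stage, path = line.split(None, 3)
    return {"mode": mode, "object_id": object_id,
            "stage": int(stage), "path": path}

def conflicted_paths(entries: list) -> list:
    """A path with any entry at stage 1, 2, or 3 has a merge conflict."""
    return sorted({entry["path"] for entry in entries if entry["stage"] != 0})

lines = [
    "100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0\tREADME.md",
    # Made-up IDs: three versions of the same file during a conflict.
    "100644 " + "1" * 40 + " 1\tsrc/hello.py",
    "100644 " + "2" * 40 + " 2\tsrc/hello.py",
    "100644 " + "3" * 40 + " 3\tsrc/hello.py",
]
entries = [parse_stage_line(line) for line in lines]
print(conflicted_paths(entries))  # ['src/hello.py']
```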
+
+[[reflogs]]
+REFLOGS
+-------
+
+Every time a branch, remote-tracking branch, or HEAD is updated, Git
+updates a log called a "reflog" for that <<references,reference>>.
+This means that if you make a mistake and "lose" a commit, you can
+generally recover the commit ID by running `git reflog <reference>`.
+
+A reflog is a list of log entries. Each entry has:
+
+1. The *commit ID*
+2. *Timestamp* when the change was made
+3. *Log message*, for example `pull: Fast-forward`
+
+Reflogs only log changes made in your local repository.
+They are not shared with remotes.
+
+You can view a reflog with `git reflog <reference>`.
+For example, here's the reflog for a `main` branch which has changed twice:
+
+----
+$ git reflog main --date=iso --no-decorate
+750b4ea main@{2025-09-29 15:17:05 -0400}: commit: Add README
+4ccb6d7 main@{2025-09-29 15:16:48 -0400}: commit (initial): Initial commit
+----
+
+GIT
+---
+Part of the linkgit:git[1] suite

Documentation/glossary-content.adoc

Lines changed: 2 additions & 2 deletions
@@ -297,8 +297,8 @@ This commit is referred to as a "merge commit", or sometimes just a
 identified by its <<def_object_name,object name>>. The objects usually
 live in `$GIT_DIR/objects/`.
 
-[[def_object_identifier]]object identifier (oid)::
-Synonym for <<def_object_name,object name>>.
+[[def_object_identifier]]object identifier, object ID, oid::
+Synonyms for <<def_object_name,object name>>.
 
 [[def_object_name]]object name::
 The unique identifier of an <<def_object,object>>. The

Documentation/meson.build

Lines changed: 1 addition & 0 deletions
@@ -193,6 +193,7 @@ manpages = {
 'gitcore-tutorial.adoc' : 7,
 'gitcredentials.adoc' : 7,
 'gitcvs-migration.adoc' : 7,
+'gitdatamodel.adoc' : 7,
 'gitdiffcore.adoc' : 7,
 'giteveryday.adoc' : 7,
 'gitfaq.adoc' : 7,
