Internet Engineering Task Force J. Moore
Internet-Draft July 18, 2012
Intended status: Standards Track
Expires: January 19, 2013
The text/diff Media Type
draft-moore-text-diff-00
Abstract
The "unified diff format" is commonly used to represent changes from
one version of a file to another or to represent a comparison of two
files. This document captures the definition of the unified diff
format as the text/diff media type for the purpose of registering
this format formally in the IANA standards tree for media types.
Status of this Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 19, 2013.
Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Moore Expires January 19, 2013 [Page 1]
Internet-Draft Abbreviated Title July 2012
Table of Contents
1. Introduction
1.1. Requirements Language
2. Definition
2.1. Permitted Character Sets
2.2. Diff Contexts
2.3. Grammar
3. Security Considerations
3.1. Out-of-scope attacks
3.2. In-scope attacks
4. IANA Considerations
5. Acknowledgements
6. References
6.1. Normative References
6.2. Informative References
Author's Address
1. Introduction
The "unified diff format" is a widely used file format for exchanging
patch changesets to a set of documents--most notably, source code
trees--or for comparing the contents of two distinct files. This
format is generated and understood by a variety
of open source tools including diff, patch, svn, and git. However,
at the time of this writing, no definitive, formal definition of this
file format exists, although there are numerous informal definitions,
including a Wikipedia article about diff [WIKIPEDIA], a blog post
[VANROSSUM] by Guido van Rossum, and the documentation for GNU
diffutils [GNU].
At the same time, the HTTP PATCH method has been standardized as RFC
5789 [RFC5789], for which this file format would be an ideal media
type. There are additionally other Internet-based services that
trade regularly in patches or diffs that might benefit specifically
from having participants explicitly declare that exchanged patchsets
conform to a specific file format.
Therefore, the goals of this document are first, to record a formal
definition of the file format for reference purposes, and second, to
request the IANA registration of a standard media type for this
format.
1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
2. Definition
As a member of the "text/*" media type family, the text/diff format
follows the requirements for textual messages laid out in RFC 2046
[RFC2046], including the use of a "charset" parameter with a default
value of US-ASCII and the use of CRLF as a linebreak.
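As a rough illustration of the defaulting rule above, a recipient
might determine the effective character set from a Content-Type
header value as sketched below. The function name effective_charset
is hypothetical, and the use of Python's standard email.message
module is an assumption of this example rather than part of this
specification.

```python
from email.message import Message


def effective_charset(content_type):
    """Return the charset of a text/diff Content-Type value.

    Falls back to "us-ascii" when no charset parameter is present.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "text/diff":
        raise ValueError("not a text/diff entity")
    # get_content_charset() lower-cases the value and returns the
    # fallback argument when the parameter is absent.
    return msg.get_content_charset("us-ascii")
```

For example, the value "text/diff" alone yields "us-ascii", while
'text/diff; charset="UTF-8"' yields "utf-8".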
2.1. Permitted Character Sets
The unified diff format relies on the use of certain US-ASCII
characters to delineate change hunks and context. Therefore, the
value of the "charset" parameter MUST reference a character set which
is a superset of US-ASCII. Permitted character sets specifically
include:
o US-ASCII
o those in the ISO-8859-* family
o UTF-8
Other character sets that are supersets of US-ASCII MAY be specified
as values of the "charset" parameter. If the parameter is omitted
then a default value of US-ASCII MAY be assumed.
Generally speaking, this format is useful for describing changesets
against files whose original and modified character set encodings are
both supersets of US-ASCII. Implementations generating text/diff
SHOULD follow the guidelines of RFC 2046 [RFC2046] and use the
"lowest common denominator" for the "charset" parameter value. For
example, it is possible that the original and modified versions of a
file might use the ISO-8859-1 character set, but the changed portions
only use US-ASCII characters: in this case the resulting diff should
be marked as using the "us-ascii" character set. An implementation
MUST NOT include characters in a text/diff file that are not
compatible with the value of the "charset" parameter.
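The "lowest common denominator" guideline can be sketched as follows,
assuming the generator holds the diff as decoded text and considers
only the three character sets listed above. The candidate ordering
and the name lowest_common_charset are assumptions of this sketch;
the specification itself does not define an ordering.

```python
# Candidate charsets, ordered from most to least restrictive. This
# ordering is an assumption of the sketch, not something the
# specification defines.
CANDIDATES = ("us-ascii", "iso-8859-1", "utf-8")


def lowest_common_charset(diff_text, candidates=CANDIDATES):
    """Return the first candidate able to encode every character."""
    for name in candidates:
        try:
            diff_text.encode(name)
        except UnicodeEncodeError:
            continue
        return name
    raise ValueError("no candidate charset can represent this diff")
```

A diff whose changed and context lines are pure US-ASCII is labelled
"us-ascii" even if the underlying files are stored as ISO-8859-1; a
diff containing U+00E9 is labelled "iso-8859-1".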
The requirement to have a single character set for a text/diff file
means that there are certain change operations this format cannot
express, for example the conversion of a file from ISO-8859-1 to
UTF-8 where non-US-ASCII characters are used in both the old and new
versions. In particular, this means that not all output of the
common UN*X diff utility--even when generating the unified diff
format--will be a valid text/diff file. This media type is aimed
instead at the very common case where the character sets of the old
and new file are compatible with these requirements and can thus be
treated as a text/* type.
2.2. Diff Contexts
Diffs are typically generated from a specific context; for example,
from a specific directory on a file system, or based against a
particular version of a source control repository. Diffs are
likewise applied against a specific context, and often this context
is different from the generation context. For example, the diff may
be applied against a different directory on the same machine, or
against a directory on an entirely different filesystem, or against a
different branch of a codebase. Indeed, the main purpose of a diff
is to provide a portable description of a set of document changes.
The intended application context for diffs is carried out-of-band
from the diffs themselves in common usage, and in fact is sometimes
completely implicit, as with patchsets sent on a developer mailing
list for an open source project. As such, this media type
description will reference a particular context without specifying it
explicitly; instead, a recipient fixes the context when attempting to
apply the diff to that context. Insofar as the same diff can be
applied to different contexts--albeit to varying degrees of success
and to varying results--the meaning of the diff itself is not
dependent on a particular context.
For the remainder of this specification, we will assume that this
"diff context" can be specified as a URI [RFC3986].
2.3. Grammar
Formal syntax in this document is governed by the ABNF (Augmented
Backus-Naur Form) notation [RFC5234], including the following core
ABNF syntax rules defined by that specification: DIGIT (decimal
digits), HTAB (horizontal tab), SP (space), CR (carriage return),
LF (linefeed), and CRLF (Internet standard newline). This syntax
also incorporates by reference the "URI-reference" ABNF production
from RFC 3986 [RFC3986].
A "diff file" captures a set of instructions describing a particular
set of document changes, often known as a "patchset" or a
"changeset". It consists of one or more "document diffs", each
document diff describing changes to a single document. A document
diff consists of a "header" identifying the target document, usually
relative to the overall diff context, plus one or more "change hunks"
or "chunks" specifying additions, deletions, or edits of particular
lines in the target document.
year = 4DIGIT
month = 2DIGIT
day = 2DIGIT
date = year "-" month "-" day
hour = 2DIGIT
minute = 2DIGIT
second = 2DIGIT
nanosecond = 9DIGIT
time = hour ":" minute ":" second [ "." nanosecond ]
timezone = ("+" / "-") hour minute
resource-modification = URI-reference HTAB date SP time SP timezone
old-indicator = "---"
new-indicator = "+++"
old-file-header = old-indicator SP resource-modification
new-file-header = new-indicator SP resource-modification
file-headers = old-file-header CRLF new-file-header CRLF
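The header productions above, together with the notion of a diff
context expressed as a URI, might be exercised with a sketch such as
the following. It assumes the header line has already been stripped
of its CRLF, approximates URI-reference as any run of non-TAB
characters, and uses hypothetical names (HEADER_RE,
parse_file_header) and example.org URIs purely for illustration.

```python
import re
from datetime import datetime
from urllib.parse import urljoin

# URI-reference is approximated as "any run of non-TAB characters";
# a full RFC 3986 parser is outside the scope of this sketch.
HEADER_RE = re.compile(
    r"(?P<indicator>---|\+\+\+) "
    r"(?P<uri>[^\t]*)\t"
    r"(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})(?:\.(?P<nanos>\d{9}))? "
    r"(?P<zone>[+-]\d{4})"
)


def parse_file_header(line, context_uri):
    """Parse one header line and resolve it against a diff context."""
    m = HEADER_RE.fullmatch(line)
    if m is None:
        raise ValueError("not an old-file-header or new-file-header")
    # datetime carries only microseconds, so the nanosecond field is
    # returned separately. strptime also rejects out-of-range values
    # (such as hour 99) that the grammar alone would accept.
    stamp = datetime.strptime(
        f"{m['date']} {m['time']} {m['zone']}", "%Y-%m-%d %H:%M:%S %z"
    )
    return {
        "side": "old" if m["indicator"] == "---" else "new",
        "target": urljoin(context_uri, m["uri"]),
        "modified": stamp,
        "nanosecond": int(m["nanos"] or 0),
    }
```

Parsing the same header against two different context URIs yields
two different targets, which reflects the point that the recipient,
not the diff, fixes the context.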
After the header, a diff file contains a set of chunks. Each chunk
begins with a header identifying the relevant selection of lines from
the old and new files.
line-number = 1*DIGIT
range = line-number [ "," line-number ]
chunk-header = "@@" SP "-" range SP "+" range SP "@@"
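A chunk header matching the production above could be parsed along
the following lines. The grammar does not say what the optional
second number in a range means; this sketch assumes the convention
documented for GNU diffutils [GNU], in which the range is
"start,count" and an omitted count stands for a single line. The
names CHUNK_HEADER_RE and parse_chunk_header are hypothetical.

```python
import re

CHUNK_HEADER_RE = re.compile(
    r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@"
)


def parse_chunk_header(line):
    """Return ((old_start, old_count), (new_start, new_count))."""
    m = CHUNK_HEADER_RE.fullmatch(line)
    if m is None:
        raise ValueError("not a valid chunk-header")
    old_start, old_count, new_start, new_count = m.groups()
    # Assumption borrowed from GNU diffutils: a missing count means 1.
    return (
        (int(old_start), int(old_count) if old_count is not None else 1),
        (int(new_start), int(new_count) if new_count is not None else 1),
    )
```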
After the chunk header, the chunk includes a set of lines that are
marked as being either common, only in the old file, or only in the
new file.
common = SP
old-only = "-"
new-only = "+"
line-affinity = common / old-only / new-only
non-linebreak = %x00-09 / %x0B-0C / %x0E-FF
non-linefeed = %x00-09 / %x0B-FF
content = non-linebreak / (CR non-linefeed)
chunk-line = line-affinity 1*content CRLF
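To illustrate how the line-affinity markers partition a chunk, the
sketch below classifies chunk lines (with CRLF already removed) and
rebuilds the old-file and new-file views of the chunk. It follows the
grammar literally, so a marker with no content after it is rejected.
The names AFFINITY, classify_chunk_line, and split_sides are
hypothetical.

```python
AFFINITY = {" ": "common", "-": "old-only", "+": "new-only"}


def classify_chunk_line(line):
    """Split a chunk line into (affinity, content)."""
    # The grammar requires 1*content, hence at least two characters.
    if len(line) < 2 or line[0] not in AFFINITY:
        raise ValueError("not a valid chunk-line")
    return AFFINITY[line[0]], line[1:]


def split_sides(chunk_lines):
    """Return (old_lines, new_lines) described by a chunk."""
    old, new = [], []
    for line in chunk_lines:
        affinity, content = classify_chunk_line(line)
        if affinity != "new-only":
            old.append(content)  # common and old-only lines
        if affinity != "old-only":
            new.append(content)  # common and new-only lines
    return old, new
```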
Thus, a chunk consists of a chunk header followed by at least one
chunk line.
chunk = chunk-header CRLF 1*chunk-line
Therefore, a single file diff consists of the file headers followed by
at least one chunk.
diff = file-headers 1*chunk
And lastly, a patch consists of a non-empty set of single file diffs.
A file matching the "patch" production is a valid text/diff file.
patch = 1*diff
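The productions in this section can be composed into a single
regular expression as a quick structural check. This is only a
sketch under stated assumptions: it operates on already-decoded text,
it approximates URI-reference as any run of characters other than
TAB, CR, and LF, and it reads the rule name used in the "diff"
production as referring to "file-headers". All Python names are
hypothetical.

```python
import re

DATE = r"\d{4}-\d{2}-\d{2}"
TIME = r"\d{2}:\d{2}:\d{2}(?:\.\d{9})?"
ZONE = r"[+-]\d{4}"
# Approximation of URI-reference: no TAB, CR, or LF.
RESOURCE = rf"[^\t\r\n]*\t{DATE} {TIME} {ZONE}"
FILE_HEADERS = rf"--- {RESOURCE}\r\n\+\+\+ {RESOURCE}\r\n"
RANGE = r"\d+(?:,\d+)?"
CHUNK_HEADER = rf"@@ -{RANGE} \+{RANGE} @@"
# content = non-linebreak / (CR non-linefeed)
CONTENT = r"(?:[^\r\n]|\r[^\n])"
CHUNK_LINE = rf"[ +-]{CONTENT}+\r\n"
CHUNK = rf"{CHUNK_HEADER}\r\n(?:{CHUNK_LINE})+"
DIFF = rf"{FILE_HEADERS}(?:{CHUNK})+"
PATCH_RE = re.compile(rf"(?:{DIFF})+")


def is_valid_patch(text):
    """Return True if text matches the "patch" production."""
    return PATCH_RE.fullmatch(text) is not None
```

Note that such a check accepts or rejects a file as a whole; it does
not report where one document diff ends and the next begins.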
3. Security Considerations
3.1. Out-of-scope attacks
Since this document describes a media type and not a protocol,
network-based attacks are out of scope for this specification. In
particular, eavesdropping, replay, message insertion, deletion,
modification, and man-in-the-middle attacks are the concern of the
containing or transporting protocol and not of the media type itself.
This specification makes no claims about and does not provide
facilities for authentication, authorization, file integrity, or non-
repudiation.
3.2. In-scope attacks
As a subtype of the "text/*" media type, text/diff shares many of the
security concerns related to processing any arbitrary text body. In
particular, implementers SHOULD be aware that this specification
does not impose a maximum line length, and that implementations that
assume a fixed maximum line length may be subject to buffer overrun
attacks. Implementers should also be aware of denial-of-service
or buffer overrun attacks involving extraordinarily large text/diff
files. Again, however, file framing would be a concern of the
containing or transporting protocol.
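One defensive approach, sketched below, is for a recipient to impose
its own limits while reading. The limits shown (8192 octets per
line, including the line terminator, and 1 MiB in total) and the name
read_bounded_lines are arbitrary assumptions of this example; this
specification defines no such values.

```python
import io


def read_bounded_lines(stream, max_line=8192, max_total=1_048_576):
    """Yield lines from a binary stream, enforcing local size limits."""
    total = 0
    while True:
        # readline(n) returns at most n octets, so an over-long line
        # is detected without buffering it in full.
        line = stream.readline(max_line + 1)
        if not line:
            return
        if len(line) > max_line:
            raise ValueError("line exceeds maximum length")
        total += len(line)
        if total > max_total:
            raise ValueError("diff exceeds maximum size")
        yield line


# Usage: wrap any binary stream; an in-memory one is used here.
for raw in read_bounded_lines(io.BytesIO(b"+added\r\n")):
    print(raw)
```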
In addition, text/diff files may contain control characters that may
alter an operating environment when displayed to a terminal (or
otherwise). RFC 3023 [RFC3023] contains a list of possible attacks
and recommended defenses against these attacks in its Security
Considerations section.
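As one possible mitigation, a viewer might make control characters
visible before writing diff content to a terminal. The sketch below
escapes C0 controls other than TAB, DEL, and C1 controls; that
choice of ranges and the name escape_for_terminal are assumptions of
this example, not requirements of this specification.

```python
def escape_for_terminal(line):
    """Replace control characters with visible \\xNN escapes."""
    out = []
    for ch in line:
        code = ord(ch)
        # C0 controls (except TAB), DEL, and C1 controls are escaped.
        if (code < 0x20 and ch != "\t") or 0x7F <= code <= 0x9F:
            out.append(f"\\x{code:02x}")
        else:
            out.append(ch)
    return "".join(out)
```

With this in place, an ESC character embedded in an added line is
shown as the four characters "\x1b" rather than being interpreted by
the terminal.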
The contents of a text/diff file describe a set of changes that may
be applied to a file to alter its contents, and hence implementations
that process text/diff patches SHOULD ensure that the sender is
properly authenticated and authorized before applying patches. To
the extent that patches are automatically applied, recipients of
text/diff entities may be at risk. In general, it may be possible to
specify changesets that perform unauthorized file changes and/or make
changes to the recipient's system that may affect subsequent
processing.
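One way a recipient might reduce this risk is to refuse targets that
would fall outside the context it has chosen. The following sketch
shows such a local policy; it is an assumption of this example, not a
rule of this specification. It accepts only relative references
whose normalized path stays beneath the context, does not consider
platform-specific separators such as backslashes, and uses the
hypothetical name is_confined.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def is_confined(uri_reference):
    """Return True if the reference stays beneath the diff context."""
    parts = urlsplit(uri_reference)
    # Absolute URIs and network-path references name a location of
    # the sender's choosing, so this policy rejects them outright.
    if parts.scheme or parts.netloc:
        return False
    # Decode percent-escapes so that "%2e%2e" cannot hide "..".
    path = unquote(parts.path)
    if not path or path.startswith("/"):
        return False
    normalized = posixpath.normpath(path)
    return normalized != ".." and not normalized.startswith("../")
```

Under this policy "src/main.c" is accepted, while "../../etc/passwd",
"/etc/passwd", and "a/%2e%2e/%2e%2e/b" are all refused.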
Some security considerations will vary by application domain. For
example, text/diff patches intended to update financial records may
have more stringent privacy, security, and auditing requirements than
a publicly disseminated file showing the source code differences
between two releases of an open source software project.
4. IANA Considerations
This memo includes a request to IANA to register the text/diff media
type.
Type name: text
Subtype name: diff
Required parameters: none
Optional parameters:
charset
This parameter has identical semantics to the "charset"
parameter of the "text/plain" media type as specified in RFC
2046 [RFC2046]. Values are restricted to charsets which are
supersets of US-ASCII.
Encoding considerations:
The syntax of the text/diff format is expressed over 8-bit octets.
The encoding of the file is as specified by the charset parameter,
which defaults to US-ASCII if not provided.
Security considerations: see Section 3.
Interoperability considerations: TBD
Published specification: this document.
Applications that use this media type: diff, patch, svn, git.
Additional information:
Magic number(s): none
File extension(s): .patch, .diff
Macintosh file type code(s): TEXT
Person & email address to contact for more information: authors of
this document.
Intended usage: COMMON
Restrictions on usage: none
Author: authors of this document.
Change controller: authors of this document.
5. Acknowledgements
The author wishes to thank those who have done the hard work of
developing the algorithms and tools that form the de facto standard
of patchset exchange on the Internet today. In particular, he would
like to recognize Douglas McIlroy and Larry Wall for the initial diff
and patch utilities as well as the numerous contributors that have
maintained and updated them over the years.
6. References
6.1. Normative References
[RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet Mail
Extensions (MIME) Part Two: Media Types", RFC 2046,
November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
Resource Identifier (URI): Generic Syntax", STD 66,
RFC 3986, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008.
6.2. Informative References
[GNU] GNU, Free Software Foundation, "GNU diffutils manual,
Detailed Description of Unified Format", 2012, <http://
www.gnu.org/software/diffutils/manual/html_node/
Detailed-Unified.html#Detailed-Unified>.
[RFC3023] Murata, M., St. Laurent, S., and D. Kohn, "XML Media
Types", RFC 3023, January 2001.
[RFC5789] Dusseault, L. and J. Snell, "PATCH Method for HTTP",
RFC 5789, March 2010.
[VANROSSUM]
van Rossum, G., "Unified Diff Format", June 2006,
<http://www.artima.com/weblogs/
viewpost.jsp?thread=164293>.
[WIKIPEDIA]
Wikipedia, "diff", 2012,
<http://en.wikipedia.org/wiki/Diff#Unified_format>.
Author's Address
Jonathan Moore
Philadelphia, PA
USA
Email: jonm@jjmoore.net