fix: line endings are set based on file #161#339
fix: line endings are set based on file #161#339thompson-tomo wants to merge 6 commits intothlorenz:masterfrom
Conversation
Added a function to detect line endings in content.
| if (lf > crlf && lf > cr) return '\n'; | ||
| if (cr > crlf && cr > lf) return '\r'; | ||
|
|
||
| return '\n'; |
There was a problem hiding this comment.
Should this default be user definable? Cc @AndrewSouthpaw
There was a problem hiding this comment.
That would ease some of what I mention above. Something like --line-ending-default or something, which we can fall back to (and not throw) in cases of mixed line endings.
| if (crlf > lf && crlf > cr) return '\r\n'; | ||
| if (lf > crlf && lf > cr) return '\n'; | ||
| if (cr > crlf && cr > lf) return '\r'; |
There was a problem hiding this comment.
I am not sure about how we should handle files with mixed endings. Perhaps we should do.
| if (crlf > lf && crlf > cr) return '\r\n'; | |
| if (lf > crlf && lf > cr) return '\n'; | |
| if (cr > crlf && cr > lf) return '\r'; | |
| if (lf + cr === 0 && crlf > 0) return '\r\n'; | |
| if (crlf + cr === 0 && lf > 0) return '\n'; | |
| if (crlf + lf === 0 && cr > 0) return '\r'; |
Thoughts @AndrewSouthpaw
There was a problem hiding this comment.
Hmmm this could be a major change, as I mentioned here.
IMO it'd make sense for this to be behind a flag for v2, and change the default to automatically detect for v3.
For files with mixed line endings, I'm leaning towards throwing, because there's not really a reasonable thing to do. Whatever we choose, it could be the wrong behavior for the user, and quite subtle. We can give a flag for them to choose the override (below).
Printing a warning would be another option, but I'm generally a fan of failing fast, especially given that mixed line endings is probably an indication that something is weird and wrong about their file, and it'd be helpful for them to know about it.
Thoughts?
| if (crlf > lf && crlf > cr) return '\r\n'; | ||
| if (lf > crlf && lf > cr) return '\n'; | ||
| if (cr > crlf && cr > lf) return '\r'; |
There was a problem hiding this comment.
Hmmm this could be a major change, as I mentioned here.
IMO it'd make sense for this to be behind a flag for v2, and change the default to automatically detect for v3.
For files with mixed line endings, I'm leaning towards throwing, because there's not really a reasonable thing to do. Whatever we choose, it could be the wrong behavior for the user, and quite subtle. We can give a flag for them to choose the override (below).
Printing a warning would be another option, but I'm generally a fan of failing fast, especially given that mixed line endings is probably an indication that something is weird and wrong about their file, and it'd be helpful for them to know about it.
Thoughts?
| const crlf = (content.match(/\r\n/g) || []).length; | ||
|
|
||
| // Count all \n, then subtract CRLF occurrences | ||
| const lf = (content.match(/\n/g) || []).length - crlf; | ||
|
|
||
| // Count all \r, then subtract CRLF occurrences | ||
| const cr = (content.match(/\r/g) || []).length - crlf; | ||
|
|
||
| if (crlf > lf && crlf > cr) return '\r\n'; | ||
| if (lf > crlf && lf > cr) return '\n'; |
There was a problem hiding this comment.
In the spirit of optimization for large files, we could avoid 3 scans of regex matches like this:
function detectLineEnding(content) {
var crlf = 0, lf = 0, cr = 0;
for (var i = 0; i < content.length; i++) {
if (content[i] === '\r') {
if (content[i + 1] === '\n') { crlf++; i++; }
else { cr++; }
} else if (content[i] === '\n') { lf++; }
}As much as I hate for loops 😅
| if (lf > crlf && lf > cr) return '\n'; | ||
| if (cr > crlf && cr > lf) return '\r'; | ||
|
|
||
| return '\n'; |
There was a problem hiding this comment.
That would ease some of what I mention above. Something like --line-ending-default or something, which we can fall back to (and not throw) in cases of mixed line endings.
| if (options?.toc?.footer?.content) { tocL.push(options.toc.footer.content); } | ||
|
|
||
| toc = tocL.join('\n'); | ||
| wrappedToc = start + '\n' + toc + '\n' + end; |
There was a problem hiding this comment.
I'm guessing this isn't yet done b/c it's still in draft form, but lines like these need to be updated, otherwise \r\n files still get marked as modified:
test('up-to-date toc in \\r\\n file is not marked as transformed', function (t) {
var mdCrlf = [
'<!-- START doctoc generated TOC please keep comment here to allow auto update -->'
, '<!-- DON\'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->'
, '**Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)*'
, ''
, '- [Header](#header)'
, ''
, '<!-- END doctoc generated TOC please keep comment here to allow auto update -->'
, ''
, '## Header'
, 'some content'
, ''
].join('\r\n')
var resCrlf = transform(mdCrlf)
t.same(resCrlf.transformed, false, '\\r\\n version should also be up to date')
t.end()
})
Closes: #161
Closes: #216
Tool now detects what line endings have been used in the content and uses that in the generated toc to strive for consistency.
Blocked by: #337 & #338