2 changes: 2 additions & 0 deletions NEWS.md
@@ -354,6 +354,8 @@ See [#2611](https://github.com/Rdatatable/data.table/issues/2611) for details. T

27. `dogroups()` no longer reads beyond the resized end of over-allocated data.table list columns, [#7486](https://github.com/Rdatatable/data.table/issues/7486). While this didn't crash in practice, it is now explicitly checked for in recent R versions (r89198+). Thanks @TimTaylor and @aitap for the report and @aitap for the fix.

28. `fread()` with `skip=0` and `(header=TRUE|FALSE)` no longer skips the first row when it has fewer fields than subsequent rows, [#7463](https://github.com/Rdatatable/data.table/issues/7463). Thanks @emayerhofer for the report and @ben-schwen for the fix.

### NOTES

1. The following in-progress deprecations have proceeded:
7 changes: 7 additions & 0 deletions inst/tests/tests.Rraw
@@ -21896,3 +21896,10 @@ DT = data.table(x = strings)
setorder(DT, x)
test(2350, DT[["x"]], sort.int(strings, method='radix'))
rm(DT, strings)

# fread doesn't skip the first row when skip=0, #7463
txt = 'a1;a2\nb1;b2;b3\nc1;c2;c3'
test(2351.1, fread(txt, skip=0), data.table(V1 = c("b1", "c1"), a1 = c("b2", "c2"), a2 = c("b3", "c3")), warning="Added an extra default column name")
test(2351.2, fread(txt, skip=0, header=TRUE), data.table(V1 = c("b1", "c1"), a1 = c("b2", "c2"), a2 = c("b3", "c3")), warning="Added an extra default column name")
test(2351.3, fread(txt, skip=0, header=FALSE), data.table(V1=character(), V2=character(), V3=character()), warning="Consider fill=TRUE")
test(2351.4, fread(txt, skip=0, fill=TRUE), data.table(V1 = c("a1", "b1", "c1"), V2 = c("a2", "b2", "c2"), V3 = c("", "b3", "c3")))
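The test input above is deliberately ragged: its first line has two `;`-separated fields and the next two lines have three. A minimal C sketch of that per-line count follows. `count_fields_simple` is a hypothetical helper written for illustration; it is not fread's `countfields()` and it ignores quoting and whitespace handling.

```c
// Hypothetical helper (not part of fread.c): counts sep-delimited fields on the
// line starting at *ch, then advances *ch past the trailing '\n' or to the NUL.
// Quoting rules are ignored, and an empty line counts as one (empty) field.
int count_fields_simple(const char **ch, char sep)
{
  const char *p = *ch;
  int n = 1;
  while (*p != '\0' && *p != '\n') {
    if (*p == sep) n++;
    p++;
  }
  if (*p == '\n') p++;  // step over the line ending so the next call starts on the next line
  *ch = p;
  return n;
}
```

Called three times on `"a1;a2\nb1;b2;b3\nc1;c2;c3"` with `sep = ';'`, this yields the 2, 3, 3 pattern that sends `fread()` into the branch changed in `src/fread.c`.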
6 changes: 3 additions & 3 deletions src/fread.c
@@ -2190,15 +2190,15 @@ int freadMain(freadMainArgs _args)
}
}

- if (args.header == NA_BOOL8 && prevStart != NULL) {
+ if (prevStart != NULL && (args.header == NA_BOOL8 || args.skipNrow >= 0)) {
// The first data row matches types in the row after that, and user didn't override default auto detection.
// Maybe previous line (if there is one, prevStart!=NULL) contains column names but there are too few (which is why it didn't become the first data row).
ch = prevStart;
int tt = countfields(&ch);
if (tt == ncol) INTERNAL_STOP("row before first data row has the same number of fields but we're not using it"); // # nocov
if (ch != pos) INTERNAL_STOP("ch!=pos after counting fields in the line before the first data row"); // # nocov
if (verbose) DTPRINT(_("Types in 1st data row match types in 2nd data row but previous row has %d fields. Taking previous row as column names."), tt);
- if (tt < ncol) {
+ if (tt < ncol && args.header != false) {
autoFirstColName = (ncol - tt == 1);
if (autoFirstColName) {
DTWARN(_("Detected %d column names but the data has %d columns (i.e. invalid file). Added an extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.\n"),
@@ -2216,7 +2216,7 @@ int freadMain(freadMainArgs _args)
for (int j = ncol; j < tt; j++) { tmpType[j] = type[j] = type0; }
ncol = tt;
}
- args.header = true;
+ if (args.header == NA_BOOL8) args.header = true;
pos = prevStart;
row1line--;
}
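The three changed conditions in this hunk can be read as one small decision function. The sketch below is a standalone model of that logic, not code from `fread.c`: `header_t`, `prev_line_decision` and `decide_prev_line` are hypothetical names, the tri-state enum stands in for `NA_BOOL8`/`true`/`false`, and a negative `skipNrow` is assumed to mean that `skip` was not supplied. The `tt > ncol` handling and the warnings themselves are left out.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical stand-in for fread's tri-state header argument (NA_BOOL8 == auto).
typedef enum { HDR_FALSE = 0, HDR_TRUE = 1, HDR_AUTO = 2 } header_t;

typedef struct {
  bool usePrevLine;       // step back so parsing starts at the shorter previous line
  bool autoFirstColName;  // exactly one name missing, so a default first name is added
  header_t header;        // header setting after the decision
} prev_line_decision;

// tt is the field count of the previous line, ncol the detected column count.
prev_line_decision decide_prev_line(bool hasPrevLine, header_t header,
                                    long skipNrow, int tt, int ncol)
{
  prev_line_decision d = { false, false, header };
  // New outer condition: an explicit skip (assumed >= 0) now also enters the branch.
  if (!(hasPrevLine && (header == HDR_AUTO || skipNrow >= 0))) return d;
  assert(tt != ncol);  // mirrors the INTERNAL_STOP on equal field counts
  d.usePrevLine = true;
  // The column-name fix-up is skipped when the user asked for header=FALSE.
  if (tt < ncol && header != HDR_FALSE) d.autoFirstColName = (ncol - tt == 1);
  // Only an undecided header is promoted; an explicit TRUE or FALSE is preserved.
  if (header == HDR_AUTO) d.header = HDR_TRUE;
  return d;
}
```

Under these assumptions, `decide_prev_line(true, HDR_FALSE, 0, 2, 3)` steps back to the previous line but leaves `header` as `HDR_FALSE` and adds no column name, which appears to correspond to test 2351.3. With `HDR_TRUE` and `skipNrow = 0` the previous line is used and a default first column name is added, as in test 2351.2.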