-
Notifications
You must be signed in to change notification settings - Fork 30
Description
I believe this library is deprecated, but wanted to leave this info for future reference in case someone else runs into this problem.
It seems as though ifeof() does not return true when the last line in the file fills the buffer exactly (to length DEFAULT_BUFFER_SIZE).
If you read through a file using String::readLine():
libStatGen/general/StringBasics.cpp
Lines 744 to 776 in fae4fca
| int String::ReadLine(IFILE & f) | |
| { | |
| int ch; | |
| char *ptr = buffer; | |
| char *endBuffer = buffer + size; | |
| len = 0; | |
| while ( ((ch = f->ifgetc()) != EOF) && (ch != '\n')) | |
| { | |
| if (ptr >= endBuffer - 1) | |
| { | |
| // resize: 1 byte for the next character, 1 byte | |
| // for the NUL at the end. | |
| Grow(len + 2); | |
| endBuffer = buffer + size; | |
| ptr = buffer + len; | |
| } | |
| *ptr++ = ch; | |
| len++; | |
| } | |
| // NB: assumes that buffer is always allocated. | |
| buffer[len] = 0; | |
| if ((ch == EOF) && (len == 0)) | |
| { | |
| // Indicate error/EOF if nothing was read. | |
| return -1; | |
| } | |
| return 1; | |
| } | |
| #endif |
It calls ifgetc() repeatedly:
libStatGen/general/InputFile.h
Lines 324 to 341 in fae4fca
| inline int ifgetc() | |
| { | |
| if (myBufferIndex >= myCurrentBufferSize) | |
| { | |
| // at the last index, read a new buffer. | |
| myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize); | |
| myBufferIndex = 0; | |
| // If the buffer index is still greater than or equal to the | |
| // myCurrentBufferSize, then we failed to read the file - return EOF. | |
| // NB: This only needs to be checked when myCurrentBufferSize | |
| // is changed. Simplify check - readFromFile returns zero on EOF | |
| if (myCurrentBufferSize == 0) | |
| { | |
| return(EOF); | |
| } | |
| } | |
| return(myFileBuffer[myBufferIndex++]); | |
| } |
When the last line fills the buffer exactly, myBufferIndex == myCurrentBufferSize == DEFAULT_BUFFER_SIZE.
But when ifeof() checks to see if we have arrived at EOF:
libStatGen/general/InputFile.h
Lines 386 to 404 in fae4fca
| inline int ifeof() const | |
| { | |
| // Not EOF if we are not at the end of the buffer. | |
| if (myBufferIndex < myCurrentBufferSize) | |
| { | |
| // There are still available bytes in the buffer, so NOT EOF. | |
| return false; | |
| } | |
| else | |
| { | |
| if (myFileTypePtr == NULL) | |
| { | |
| // No myFileTypePtr, so not eof (return 0). | |
| return 0; | |
| } | |
| // exhausted our buffer, so check the file for eof. | |
| return myFileTypePtr->eof(); | |
| } | |
| } |
It checks if myBufferIndex < myCurrentBufferSize, which will be false, and it thinks the file should continue being read from.
This can result in a seg fault depending on what you try to do with the buffer at that point, after you've tried to read another line (that doesn't exist) from the file.
Code sample:
#include "InputFile.h"
#include "StringBasics.h"
#include "StringArray.h"
#include <iostream>
#include <string>
int main(int argc, char *argv[]) {
std::string bad_filepath = argv[1];
IFILE test = ifopen(bad_filepath.c_str(), "r");
// This loop is roughly what RAREMETAL does when reading in a covariance file
while (!ifeof(test)) {
String buffer;
buffer.ReadLine(test);
StringArray tokens;
tokens.AddTokens(buffer, "\t ");
int p = tokens[0].Find("chr");
}
std::cout << "Finished";
return 0;
}Attached file, run with above code, will cause a seg fault.
The simple solution to this (if you're a user looking to get around this bug) is just add a zero somewhere it doesn't matter at the end of the file, or an extra space, and the file will be read successfully.