Skip to content

Incorrect behavior with ifeof() when file length matches default buffer size #29

@welchr

Description

@welchr

I believe this library is deprecated, but wanted to leave this info for future reference in case someone else runs into this problem.

It seems as though ifeof() does not return true when the last line in the file fills the buffer exactly (to length DEFAULT_BUFFER_SIZE).

If you read through a file using String::readLine():

int String::ReadLine(IFILE & f)
{
int ch;
char *ptr = buffer;
char *endBuffer = buffer + size;
len = 0;
while ( ((ch = f->ifgetc()) != EOF) && (ch != '\n'))
{
if (ptr >= endBuffer - 1)
{
// resize: 1 byte for the next character, 1 byte
// for the NUL at the end.
Grow(len + 2);
endBuffer = buffer + size;
ptr = buffer + len;
}
*ptr++ = ch;
len++;
}
// NB: assumes that buffer is always allocated.
buffer[len] = 0;
if ((ch == EOF) && (len == 0))
{
// Indicate error/EOF if nothing was read.
return -1;
}
return 1;
}
#endif

It calls ifgetc() repeatedly:

inline int ifgetc()
{
if (myBufferIndex >= myCurrentBufferSize)
{
// at the last index, read a new buffer.
myCurrentBufferSize = readFromFile(myFileBuffer, myAllocatedBufferSize);
myBufferIndex = 0;
// If the buffer index is still greater than or equal to the
// myCurrentBufferSize, then we failed to read the file - return EOF.
// NB: This only needs to be checked when myCurrentBufferSize
// is changed. Simplify check - readFromFile returns zero on EOF
if (myCurrentBufferSize == 0)
{
return(EOF);
}
}
return(myFileBuffer[myBufferIndex++]);
}

When the last line fills the buffer exactly, myBufferIndex == myCurrentBufferSize == DEFAULT_BUFFER_SIZE.

But when ifeof() checks to see if we have arrived at EOF:

inline int ifeof() const
{
// Not EOF if we are not at the end of the buffer.
if (myBufferIndex < myCurrentBufferSize)
{
// There are still available bytes in the buffer, so NOT EOF.
return false;
}
else
{
if (myFileTypePtr == NULL)
{
// No myFileTypePtr, so not eof (return 0).
return 0;
}
// exhausted our buffer, so check the file for eof.
return myFileTypePtr->eof();
}
}

It checks if myBufferIndex < myCurrentBufferSize, which will be false, and it thinks the file should continue being read from.

This can result in a seg fault depending on what you try to do with the buffer at that point, after you've tried to read another line (that doesn't exist) from the file.

Code sample:

#include "InputFile.h"
#include "StringBasics.h"
#include "StringArray.h"
#include <iostream>
#include <string>

int main(int argc, char *argv[]) {
  std::string bad_filepath = argv[1];
  IFILE test = ifopen(bad_filepath.c_str(), "r");

  // This loop is roughly what RAREMETAL does when reading in a covariance file
  while (!ifeof(test)) {
    String buffer;
    buffer.ReadLine(test);

    StringArray tokens;
    tokens.AddTokens(buffer, "\t ");

    int p = tokens[0].Find("chr");
  }

  std::cout << "Finished";
  return 0;
}

Attached file, run with above code, will cause a seg fault.

The simple solution to this (if you're a user looking to get around this bug) is just add a zero somewhere it doesn't matter at the end of the file, or an extra space, and the file will be read successfully.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions