I found out that reading the entire content of a 2+α GiB file from S3 fails with an "OverflowError: signed integer is greater than maximum" exception raised from the Python SSL library.
Here is a minimal reproduction.
import pfio
import os
path = 's3://<bucket>/foo.dat'
# size = 2**31 + 7 * 1024 # No error
size = 2**31 + 8 * 1024 # Get error
# Create test data of _size_ bytes
bs = 128 * 1024 * 1024
with pfio.v2.open_url(path, 'wb') as f:
    while 0 < size:
        s = min(bs, size)
        print('ss={}, s={}'.format(size, s))
        f.write(bytearray(os.urandom(s)))
        size -= s
# Read the entire content
with pfio.v2.open_url(path, 'rb') as f:
    assert len(f.read(-1))
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/s3.py", line 149, in readall
    return self.read(-1)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/s3.py", line 82, in read
    data = body.read()
  File "/usr/local/lib/python3.8/site-packages/botocore/response.py", line 95, in read
    chunk = self._raw_stream.read(amt)
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 515, in read
    data = self._fp.read() if not fp_closed else b""
  File "/usr/local/lib/python3.8/http/client.py", line 468, in read
    s = self._safe_read(self.length)
  File "/usr/local/lib/python3.8/http/client.py", line 609, in _safe_read
    data = self.fp.read(amt)
  File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: signed integer is greater than maximum
From the error message, I expected reading a file of 2^31 bytes to succeed and 2^31+1 bytes to fail, but the actual threshold is slightly different: it lies somewhere between 2147490816 (2^31 + 7 KiB) and 2147491840 (2^31 + 8 KiB) bytes.
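The two sizes can be double-checked with a few lines of Python. This only verifies the arithmetic from the reproduction; it does not explain why the cutoff sits 7 to 8 KiB above 2^31 rather than exactly at it. The names INT32_MAX, ok_size and bad_size are introduced here for illustration.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

ok_size = 2**31 + 7 * 1024   # size that was read without error
bad_size = 2**31 + 8 * 1024  # size that raised OverflowError

print(ok_size, bad_size)     # 2147490816 2147491840
print(ok_size - INT32_MAX)   # 7169: already above the 32-bit limit, yet no error
print(bad_size - ok_size)    # 1024: the threshold lies within this 1 KiB window
```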
I think the S3 API itself supports reading a file this large, so the problem seems to be in the Python SSL library layer. If so, it may be worth trying Python 3.9 or 3.10.
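In the meantime, a possible workaround on the caller side is to avoid a single read(-1) and read in bounded pieces instead. The following is a minimal sketch, assuming that each bounded read(n) keeps the underlying SSL read below the signed 32-bit limit; I have not verified that against pfio's S3 backend. read_all_chunked and its default chunk_size are hypothetical names for this example, not part of pfio.

```python
import io


def read_all_chunked(f, chunk_size=1 << 30):
    """Read a binary file-like object to EOF using bounded read() calls.

    Hypothetical helper: chunk_size must stay below 2**31 so that no single
    read() asks for more bytes than a signed 32-bit integer can express.
    """
    if not 0 < chunk_size < 2**31:
        raise ValueError('chunk_size must be in (0, 2**31)')
    buf = bytearray()
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # an empty result means EOF
            break
        buf += chunk
    return bytes(buf)


# Local demonstration with an in-memory stream instead of S3.
demo = io.BytesIO(b'0123456789')
print(len(read_all_chunked(demo, chunk_size=3)))  # 10
```

With pfio, the object returned by pfio.v2.open_url(path, 'rb') would be passed as f in place of the in-memory stream. Note that the whole content still ends up in memory, so this only changes how the bytes are requested, not how much is held.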
Here is my environment:
% python --version
Python 3.8.10
% python -c "import pfio; print(pfio.__version__)"
2.2.0