Hi, I'm training Faster R-CNN on 4 GPUs with the COCO dataset converted to LMDB.
I used num_workers=4 for the DataLoader and found that memory usage is almost 60 GB.
I suspect the whole dataset is being read into memory, but per your description in the README:

> Here I chose LMDB because
> 2. hdf5, pth, n5, though with a straightforward json-like API, require putting the whole file into memory. This is not practical when you play with a large dataset like ImageNet.

LMDB shouldn't behave like this. Any thoughts on what is going on?

I can share part of my dataset code:
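For context on why the numbers may be misleading: LMDB memory-maps its data file, so pages are faulted in on demand and resident mapped pages show up in a process's memory figure even though the kernel can reclaim them at any time. A minimal stdlib `mmap` sketch (using a hypothetical temporary file as a stand-in for an LMDB data file) illustrates the mechanism:

```python
import mmap
import os
import tempfile

# Create a 64 MB file to stand in for an LMDB data file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(64 * 1024 * 1024 - 1)
    f.write(b"\0")
    path = f.name

with open(path, "rb") as f:
    # Mapping is cheap: no data is read from disk until a page is touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_page = mm[:4]  # touching this slice faults in only one page
    mm.close()

os.remove(path)
assert first_page == b"\0\0\0\0"
```

If the 60 GB you see is mostly shared/file-backed pages rather than anonymous memory, it is page cache, not the dataset being deserialized into RAM.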
```python
import os

import lmdb
import numpy as np
import pyarrow as pa
import six
from PIL import Image
from torch.utils.data import Dataset


class LMDBWrapper(object):
    def __init__(self, lmdb_path):
        self.env = lmdb.open(lmdb_path, max_readers=1,
                             subdir=os.path.isdir(lmdb_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin(write=False) as txn:
            self.length = pa.deserialize(txn.get(b'__len__'))
            self.keys = pa.deserialize(txn.get(b'__keys__'))

    def get_image(self, image_key):
        with self.env.begin(write=False) as txn:
            byteflow = txn.get(u'{}'.format(image_key).encode('ascii'))
        imgbuf = pa.deserialize(byteflow)
        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        image = Image.open(buf).convert('RGB')
        return np.asarray(image)


class LMDBDataset(Dataset):
    def __init__(self, lmdb_path):
        # Opened lazily in __getitem__ so each DataLoader worker
        # creates its own environment after fork.
        self.lmdb = None
        self.lmdb_path = lmdb_path

    def init_lmdb(self):
        self.lmdb = LMDBWrapper(self.lmdb_path)

    def __len__(self):
        if self.lmdb is None:
            self.init_lmdb()
        return self.lmdb.length

    def __getitem__(self, idx):
        if self.lmdb is None:
            self.init_lmdb()


class CocoInstanceLMDBDataset(LMDBDataset):
    def __init__(self, lmdb_path):
        super().__init__(lmdb_path=lmdb_path)

    def __getitem__(self, idx):
        super().__getitem__(idx)  # ensures the env is open in this worker
        ann = self.filtered_anns[idx]  # filtered_anns populated elsewhere (not shown)
        data = dict()
        # transforms
        return data
```
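The key property of the dataset above is that the LMDB environment is opened on first `__getitem__`, not in `__init__`, so each forked DataLoader worker ends up with its own handle. That pattern can be sketched without torch or lmdb; `LazyDataset` and `DummyEnv` below are hypothetical stand-ins, not part of the actual code:

```python
class DummyEnv:
    """Stand-in for an lmdb environment; counts how often it is opened."""
    opened = 0

    def __init__(self):
        DummyEnv.opened += 1

    def get(self, idx):
        return idx * 2


class LazyDataset:
    """Mimics LMDBDataset: the heavy handle is created on first access,
    not in __init__, so a forked worker re-opens it in its own process."""

    def __init__(self):
        self.env = None  # nothing opened yet

    def __getitem__(self, idx):
        if self.env is None:  # first access in this process
            self.env = DummyEnv()
        return self.env.get(idx)


ds = LazyDataset()
assert DummyEnv.opened == 0  # constructing the dataset opens nothing
assert ds[3] == 6
assert ds[4] == 8
assert DummyEnv.opened == 1  # env is opened exactly once, lazily
```

With num_workers=4, four such environments exist (one per worker), but each only maps the file; the mapping itself should not copy the dataset into anonymous memory.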