Large memory occupation #17

@gathierry

Hi, I'm training Faster R-CNN on 4 GPUs with the COCO dataset converted to LMDB.
I use num_workers=4 for the DataLoader, and memory usage climbs to almost 60 GB.
I suspect that the whole dataset is being read into memory. But according to your description in the README:

Here I choose lmdb because
2. hdf5, pth, n5, though with a straightforward json-like API, require putting the whole file into memory. This is not practical when you play with a large dataset like ImageNet.

LMDB shouldn't behave like this. Any thoughts on what could cause it?
Here is part of my dataset code:

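# Imports inferred from the names used below (pa is assumed to be pyarrow).
import os

import lmdb
import numpy as np
import pyarrow as pa
import six
from PIL import Image
from torch.utils.data import Dataset


class LMDBWrapper(object):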
    def __init__(self, lmdb_path):
        self.env = lmdb.open(lmdb_path, max_readers=1, 
                             subdir=os.path.isdir(lmdb_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        with self.env.begin(write=False) as txn:
            self.length = pa.deserialize(txn.get(b'__len__'))
            self.keys = pa.deserialize(txn.get(b'__keys__'))

    def get_image(self, image_key):
        env = self.env
        with env.begin(write=False) as txn:
            byteflow = txn.get(u'{}'.format(image_key).encode('ascii'))
        imgbuf = pa.deserialize(byteflow)
        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        image = Image.open(buf).convert('RGB')

        return np.asarray(image)


class LMDBDataset(Dataset):
    def __init__(self, lmdb_path):
        self.lmdb = None
        self.lmdb_path = lmdb_path

    def init_lmdb(self):
        self.lmdb = LMDBWrapper(self.lmdb_path)

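    def __getitem__(self, idx):
        # Open the LMDB environment lazily on first access, so each
        # DataLoader worker process opens its own environment.
        if self.lmdb is None:
            self.init_lmdb()


class CocoInstanceLMDBDataset(LMDBDataset):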
    def __init__(self, lmdb_path):
        super().__init__(lmdb_path=lmdb_path)

    def __getitem__(self, idx):
        super().__getitem__(idx)
        ann = self.filtered_anns[idx]
        data = dict()
        # transforms
        return data
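
To narrow down where the memory goes, per-process usage could be logged from the main process and from each worker. Below is a minimal sketch using only the standard library; `peak_rss_mb` and `log_memory` are hypothetical helper names, not part of my dataset code or of this repo. One caveat, as far as I understand it: pages of a memory-mapped LMDB file that a process has touched count toward that process's RSS even though they are shared page cache, so adding up RSS over 4 workers may overstate the real footprint.

```python
import os
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(tag):
    """Build a one-line report with the process id and its peak RSS."""
    return "[pid {}] {}: peak RSS {:.1f} MiB".format(
        os.getpid(), tag, peak_rss_mb()
    )


if __name__ == "__main__":
    print(log_memory("startup"))
```

Calling `log_memory` every few hundred samples inside `__getitem__` would show whether each worker keeps growing or levels off after the first pass over the data.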
