Large memory occupation

Hi, I'm training Faster R-CNN on 4 GPUs with the COCO dataset converted to LMDB.
I used `num_workers=4` for the DataLoader and found that memory usage is almost 60 GB.
I suspect that the whole dataset is being read into memory. But according to your description in the README, 

> Here I choose lmdb because
> 2. hdf5 pth n5, though with a straightforward json-like API, require to put the whole file into memory. This is not practicle when you play with large dataset like imagenet.
> 

LMDB shouldn't behave like this. Any thoughts? 
Here is part of my dataset code:


```python
import os

import lmdb
import numpy as np
import pyarrow as pa
import six
from PIL import Image
from torch.utils.data import Dataset


class LMDBWrapper(object):
    def __init__(self, lmdb_path):
        self.env = lmdb.open(lmdb_path, max_readers=1,
                             subdir=os.path.isdir(lmdb_path),
                             readonly=True, lock=False,
                             readahead=False, meminit=False)
        # Only the entry count and the key list are loaded eagerly.
        with self.env.begin(write=False) as txn:
            self.length = pa.deserialize(txn.get(b'__len__'))
            self.keys = pa.deserialize(txn.get(b'__keys__'))

    def get_image(self, image_key):
        env = self.env
        with env.begin(write=False) as txn:
            byteflow = txn.get(u'{}'.format(image_key).encode('ascii'))
        # The stored value is a serialized, encoded image; decode it to RGB.
        imgbuf = pa.deserialize(byteflow)
        buf = six.BytesIO()
        buf.write(imgbuf)
        buf.seek(0)
        image = Image.open(buf).convert('RGB')

        return np.asarray(image)


class LMDBDataset(Dataset):
    def __init__(self, lmdb_path):
        # The environment is opened lazily, on the first __getitem__ call.
        self.lmdb = None
        self.lmdb_path = lmdb_path

    def init_lmdb(self):
        self.lmdb = LMDBWrapper(self.lmdb_path)

    def __getitem__(self, idx):
        if self.lmdb is None:
            self.init_lmdb()
```
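
The wrapper above keeps the full key list in `self.keys` in every process that opens the database. A rough way to check whether that list could account for tens of gigabytes is sketched below. The helper name `approx_keys_bytes`, the key format, and the count of 118,287 entries (the size of COCO train2017) are illustrative assumptions; the actual keys in the database may look different.

```python
import sys


def approx_keys_bytes(keys):
    """Rough footprint of a list of keys: the list object plus each element."""
    return sys.getsizeof(keys) + sum(sys.getsizeof(k) for k in keys)


# Hypothetical keys shaped like COCO file names; real keys may differ.
keys = [f"{i:012d}.jpg".encode("ascii") for i in range(118287)]
print(f"{approx_keys_bytes(keys) / 2**20:.1f} MiB")
```

Under these assumptions the list is on the order of megabytes per process, so it would not explain the observed usage on its own.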

```python
class CocoInstanceLMDBDataset(LMDBDataset):
    def __init__(self, lmdb_path):
        super().__init__(lmdb_path=lmdb_path)

    def __getitem__(self, idx):
        super().__getitem__(idx)
        ann = self.filtered_anns[idx]
        data = dict()
        # transforms
        return data
```
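
One thing that may be worth checking is how the 60 GB figure is measured. LMDB reads through a memory-mapped file, so pages a process has touched count toward its resident set size (RSS) even though they live in the page cache shared by all workers. Summing RSS over every worker can therefore count the same pages several times. A minimal sketch for comparing RSS with proportional set size (PSS) on Linux follows. `parse_smaps_rollup` and `memory_report` are hypothetical helper names, and `/proc/<pid>/smaps_rollup` is only available on reasonably recent kernels, with the `Pss_File` and `Pss_Anon` fields missing on older ones.

```python
import os


def parse_smaps_rollup(text):
    """Parse the text of /proc/<pid>/smaps_rollup into {field: kB}."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Data lines look like "Rss:  123456 kB"; the header line does not.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def memory_report(pid="self"):
    """Return selected fields in kB, or None if smaps_rollup is unavailable."""
    path = f"/proc/{pid}/smaps_rollup"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        fields = parse_smaps_rollup(f.read())
    # Fields the kernel does not report are returned as None.
    return {k: fields.get(k) for k in ("Rss", "Pss", "Pss_File", "Pss_Anon")}


if __name__ == "__main__":
    print(memory_report())
```

Calling `memory_report()` from inside `__getitem__` (or passing a worker's PID) would show whether most of a worker's footprint is file-backed mapped pages (`Pss_File`) or anonymous memory (`Pss_Anon`) such as decoded images and annotations.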


