[Feature request] Better control over size of different caches #382

@leburgel

Description

There is currently very little infrastructure for controlling the size of the different caches stored by TensorKit. All caches are LRU caches with $10^4$ entries by default, which leads to serious memory issues when using product symmetries. Simply reducing the maxsize of all caches is not straightforward, and doesn't easily resolve the memory issues either.

For context, running PEPS simulations with an $f\mathbb{Z}_2 \times \mathrm{U}(1) \times \mathrm{SU}(3)$ symmetry and a maximum cache size of $10^3$ entries results in the following output from TensorKit.global_cache_info():

GLOBAL__FSTRANSPOSE_CACHE:      CacheInfo(; hits=0, misses=56, currentsize=56, maxsize=1000)
GLOBAL__FSBRAID_CACHE:  CacheInfo(; hits=0, misses=168233779, currentsize=1000, maxsize=1000)
GLOBAL_FSTRANSPOSE_CACHE:       CacheInfo(; hits=0, misses=0, currentsize=0, maxsize=1000)
GLOBAL_FSBRAID_CACHE:   CacheInfo(; hits=41137188, misses=79569782, currentsize=1000, maxsize=1000)
GLOBAL_FUSIONBLOCKSTRUCTURE_CACHE:      CacheInfo(; hits=2215295638, misses=866311, currentsize=1000, maxsize=1000)
GLOBAL_TREEBRAIDER_CACHE:       CacheInfo(; hits=0, misses=0, currentsize=0, maxsize=1000)
GLOBAL_TREEPERMUTER_CACHE:      CacheInfo(; hits=818451, misses=130315, currentsize=1000, maxsize=1000)
GLOBAL_TREETRANSPOSER_CACHE:    CacheInfo(; hits=1, misses=5, currentsize=5, maxsize=1000)

As far as I can tell, the only way to set these cache sizes is to loop over TensorKit.GLOBAL_CACHES and manually call resize! on each cache. This leads to:
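For reference, the manual workaround might look something like the sketch below. This assumes TensorKit.GLOBAL_CACHES is a name => cache mapping of LRU caches from LRUCache.jl, whose resize! accepts a maxsize keyword; the exact structure is undocumented, which is part of the problem.

```julia
using TensorKit

# Hypothetical workaround: shrink every global cache to 10^3 entries.
# Assumes GLOBAL_CACHES maps cache names to LRUCache.jl LRU objects.
for cache in values(TensorKit.GLOBAL_CACHES)
    resize!(cache; maxsize = 10^3)
end
```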

Issue 1: there is no list of the different caches that are present in TensorKit.jl, nor any documentation on what they are used for

This means that, right now, the only option is to set some maximum size for all caches without knowing what one is actually setting. Given the vastly differing numbers of queries the different caches receive, a single maxsize for all of them does not seem like a good idea. This is something that should be taken into account:

Issue 2: the default maxsize doesn't take into account the typical number of queries a cache receives

Furthermore, the currentsize of a cache gives very little indication of how much memory it uses. For example, calling Base.summarysize on the caches of the above example gives:

[ Info: TensorKit cache 'GLOBAL__FSTRANSPOSE_CACHE': 0.102691650390625 MB
[ Info: TensorKit cache 'GLOBAL__FSBRAID_CACHE': 4.004486083984375 MB
[ Info: TensorKit cache 'GLOBAL_FSTRANSPOSE_CACHE': 0.000457763671875 MB
[ Info: TensorKit cache 'GLOBAL_FSBRAID_CACHE': 8.690292358398438 MB
[ Info: TensorKit cache 'GLOBAL_FUSIONBLOCKSTRUCTURE_CACHE': 4257.241668701172 MB
[ Info: TensorKit cache 'GLOBAL_TREEBRAIDER_CACHE': 0.000457763671875 MB
[ Info: TensorKit cache 'GLOBAL_TREEPERMUTER_CACHE': 807.445198059082 MB
[ Info: TensorKit cache 'GLOBAL_TREETRANSPOSER_CACHE': 0.02800750732421875 MB
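A minimal sketch of how per-cache memory usage like the above can be gathered, again under the assumption that GLOBAL_CACHES is a name => cache mapping (the structure is undocumented):

```julia
using TensorKit

# Report the memory footprint of each global cache in MB.
for (name, cache) in pairs(TensorKit.GLOBAL_CACHES)
    mb = Base.summarysize(cache) / 2^20  # bytes -> MiB
    @info "TensorKit cache '$name': $mb MB"
end
```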

Clearly, caches with a similar number of entries can have vastly different memory footprints. In some settings it would therefore make much more sense to specify a maximum memory size for a cache, rather than a maximum number of entries. This is:

Issue 3: there is no way of specifying a maximum memory size for the different TensorKit.jl caches
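As far as I know, LRUCache.jl (which TensorKit uses for its caches) already supports this in principle through its by keyword: when by measures the size of a value, maxsize is interpreted in those units rather than as an entry count. A minimal, standalone sketch (the cache name and types here are illustrative, not TensorKit's):

```julia
using LRUCache

# LRU bounded by total memory: `by = Base.summarysize` makes `maxsize`
# a byte budget (8 MiB here) instead of an entry count.
memcache = LRU{Int,Vector{Float64}}(maxsize = 8 * 2^20, by = Base.summarysize)

memcache[1] = rand(10^5)  # each such entry is roughly 0.8 MB
```

Exposing a memory-bounded variant like this for the TensorKit caches would address this issue directly.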

Finally, the GLOBAL_FUSIONBLOCKSTRUCTURE_CACHE seems to take up a disproportionate amount of memory. Possibly, entries that differ only in their degeneracies duplicate information about the same fusion tree structure. So maybe:

Issue 4: the GLOBAL_FUSIONBLOCKSTRUCTURE_CACHE could reuse more information between its different entries, reducing its memory size
