Currently, if a working directory is given, all cache files are deleted from memory after each stage execution.
We could accelerate the whole execution of a pipeline by avoiding re-loading results that have been used or produced during the preceding stage.
This is how it would work:
We would use a cache map for all stage executions. At each stage execution, this cache is filled with dependency results/caches and, in the end, with the stage result.
This is already the case when no working directory is given. So technically, we could use the same cache.
Then, at the beginning of each stage execution, we delete the cache entries that will not be used by any subsequent stage.
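To make the idea concrete, here is a minimal sketch of the loop I have in mind. All names (`run_pipeline`, `run_stage`, `load_from_disk`, the `stages`/`dependencies` shapes) are hypothetical, not the project's actual API:

```python
def run_pipeline(stages, dependencies, run_stage, load_from_disk):
    """Run `stages` (already in topological order) sharing one cache map.

    `dependencies[stage]` lists the stages whose results `stage` consumes;
    `run_stage` and `load_from_disk` stand in for the real execution and
    working-directory deserialization logic.
    """
    cache = {}  # stage name -> in-memory result, shared across executions
    for i, stage in enumerate(stages):
        # Evict entries that no remaining stage will read again.
        still_needed = {dep for later in stages[i:] for dep in dependencies[later]}
        for name in list(cache):
            if name not in still_needed:
                del cache[name]
        # Hit the working directory only on a cache miss.
        inputs = {}
        for dep in dependencies[stage]:
            if dep not in cache:
                cache[dep] = load_from_disk(dep)
            inputs[dep] = cache[dep]
        cache[stage] = run_stage(stage, inputs)
    return cache
```

In a linear pipeline a → b → c, each stage finds its dependency already in the cache, so `load_from_disk` is never called for results produced during the same run.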
This reduces the whole pipeline's execution time significantly, depending on the cache file sizes. A likely explanation is that the execution order comes from a topological sort: each stage's result is probably consumed by the next stage.
There is one major drawback: the cache is mutable, so any modification of a retrieved cache entry will be passed on to the following stages that use that same entry (this does not happen currently, and we generally do not want it to). I do not know how to prevent that, because efficiently detecting a modification of an object, or making it immutable, depends on the kind of object. A heavy-handed way is to compare the serialization before and after execution, but then we may lose most of the speedup, depending on the cost of serialization.
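For reference, the "compare the serialization" detection could look like this sketch. I use `pickle` here only for illustration; a real implementation would use the project's own serializer, and byte comparison can give false positives for objects without deterministic serialization:

```python
import pickle


def snapshot(cache):
    """Serialize every cached object so it can be compared later."""
    return {name: pickle.dumps(obj) for name, obj in cache.items()}


def mutated_entries(before, cache):
    """Return the names of cached objects whose bytes changed since `before`."""
    return [name for name, blob in before.items()
            if name in cache and pickle.dumps(cache[name]) != blob]
```

Running `snapshot` before each stage and `mutated_entries` after it would flag in-place modifications, but at the cost of one extra serialization per cached object per stage, which is exactly the overhead we were trying to avoid.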
Let me know if you see any solution apart from warning the developer or if you note any other drawbacks.
Maybe we could make my suggestion an optional behavior?
I already implemented it in my fork (diffs). If you agree with my suggestion, should I wait for #81 to be merged before opening a new PR? Or would you rather implement it yourself?