Remove /var/cache/yum/ and merge steps to minimize size of container #120

@thvasilo

After inspecting the sagemaker-spark-processing:3.1-cpu-py37-v1.1 image with the dive tool, I noticed that the package installation caches were not being cleaned up, which unnecessarily increases the image size.

In particular, line 13 of the Dockerfile has no effect, because the layer that installs the yum packages on line 6 is already immutable at that point. As a result, around 30-40% of the image size is taken up by caches:

image

Adding the cleanup at the end of the same layer definition makes it actually take effect, significantly reducing the size of the image:

image

By merging a couple of layers and doing the cleanup, we are able to shrink the image size from 4.4GB to 2.5GB, which should lead to faster container spin-up. The changes are available in my fork; if the maintainers agree, I can try opening a PR with these changes for the Dockerfiles that could benefit.
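
To illustrate the pattern described above, here is a minimal sketch. It is not the actual Dockerfile from this repository: the base image and the package names are placeholders I picked for illustration. It only shows why a cleanup in a separate `RUN` step does not shrink the image, while the same cleanup chained into the installing `RUN` step does.

```dockerfile
# Sketch only: base image and package names are assumptions, not the repo's real Dockerfile.
FROM amazonlinux:2

# Ineffective pattern (commented out): each RUN creates its own layer.
# The first layer already contains /var/cache/yum, and deleting the files
# in a later layer only hides them; the earlier layer keeps its full size.
# RUN yum install -y python3 tar
# RUN yum clean all && rm -rf /var/cache/yum

# Effective pattern: install and clean up in the same RUN step, so the
# cache is removed before the layer is committed.
RUN yum install -y python3 tar \
    && yum clean all \
    && rm -rf /var/cache/yum
```

Comparing per-layer sizes before and after such a change, for example with `docker history <image>` or dive, should show whether the cache is still being stored in an earlier layer.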
