chore: [cache-dir-size-fix] Part 3: Add disk size calculator for file cache #4416

Draft
gargnitingoogle wants to merge 2 commits into gargnitin/cache-dir-size-fix/v2 from gargnitin/cache-dir-size-fix/v3

Conversation

@gargnitingoogle
Contributor

@gargnitingoogle gargnitingoogle commented Feb 25, 2026

Description

Link to the issue in case of a bug fix.

b/477828938

Testing details

  1. Manual - NA
  2. Unit tests - NA
  3. Integration tests - NA

Any backward incompatible change? If so, please explain.

@gargnitingoogle
Contributor Author

/gemini summary

@gemini-code-assist
Contributor

Summary of Changes

This pull request enhances the caching mechanism by introducing a dedicated utility for calculating disk utilization in file caches and refactoring the core LRU cache to support pluggable size calculation strategies. This separation of concerns allows for more accurate and efficient management of disk space for file-backed caches, addressing the complexities of actual disk usage versus logical file sizes.

Highlights

  • New Disk Size Calculator: Introduced a new FileCacheDiskUtilizationCalculator in internal/cache/file to specifically handle disk space calculation for file-based caches. This utility supports periodic scanning of the cache directory, removal of empty directories, and accurate calculation of file sizes on disk, considering block sizes and sparse files.
  • LRU Cache Refactoring: The generic lru.Cache has been refactored to use a SizeCalculator interface instead of directly managing currentSize. This change externalizes the logic for calculating and updating cache entry sizes, making the lru.Cache more flexible and extensible for different types of cached data, such as file caches where disk utilization might differ from logical size.
  • Custom Size Calculation: A defaultSizeCalculator is provided for the lru.Cache to maintain existing behavior, while the new FileCacheDiskUtilizationCalculator implements the SizeCalculator interface, allowing the lru.Cache to leverage its specialized disk size accounting for file caches.
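
To illustrate the difference between logical size and disk utilization mentioned in the highlights, here is a minimal sketch. It is not the code from this PR: the helper names `diskUsage` and `sparseFileUsage` are hypothetical, and it assumes a Unix-like platform where `os.FileInfo.Sys()` yields a `*syscall.Stat_t` whose `Blocks` field counts 512-byte units.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// diskUsage returns a file's logical size and the bytes allocated for it on disk.
// Assumption: Unix-like platform; st_blocks is counted in 512-byte units.
func diskUsage(path string) (uint64, uint64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("stat data unavailable for %s", path)
	}
	return uint64(info.Size()), uint64(st.Blocks) * 512, nil
}

// sparseFileUsage creates a sparse file of the given logical size in a
// temporary directory, measures it, and removes the directory again.
func sparseFileUsage(size int64) (uint64, uint64, error) {
	dir, err := os.MkdirTemp("", "cache-size-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "sparse")
	f, err := os.Create(path)
	if err != nil {
		return 0, 0, err
	}
	// Truncate extends the file without writing any data.
	if err := f.Truncate(size); err != nil {
		f.Close()
		return 0, 0, err
	}
	if err := f.Close(); err != nil {
		return 0, 0, err
	}
	return diskUsage(path)
}

func main() {
	logical, onDisk, err := sparseFileUsage(1 << 20)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("logical=%d bytes, on disk=%d bytes\n", logical, onDisk)
}
```

On filesystems that support sparse files, the on-disk figure is typically far smaller than the logical size, which is why a cache limit based on logical sizes can misjudge real disk usage.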

Changelog

  • internal/cache/file/disk_size_calculator.go
    • Added FileCacheDiskUtilizationCalculator struct with fields for tracking file and scanned sizes, cache directory, scan frequency, and configuration for including files or deleting empty directories.
    • Implemented NewFileCacheDiskUtilizationCalculator to initialize the calculator and start a background goroutine for periodic size scanning.
    • Provided methods SetSharedDirLocker, periodicSizeScan, clearEmptyDirsAndRescanSize, Stop, GetCurrentSize, SizeOf, EvictEntry, InsertEntry, and AddDelta for managing and calculating disk utilization.
  • internal/cache/file/disk_size_calculator_test.go
    • Added unit tests for FileCacheDiskUtilizationCalculator to verify GetCurrentSize, ClearEmptyDirsAndRescanSize, FullScan, AddDelta, and SizeOf methods, including scenarios for sparse and non-sparse files.
  • internal/cache/lru/lru.go
    • Introduced SizeCalculator interface with methods GetCurrentSize, EvictEntry, InsertEntry, AddDelta, and SizeOf.
    • Added defaultSizeCalculator struct implementing the SizeCalculator interface to provide the original size calculation logic.
    • Replaced currentSize field in Cache struct with sizeCalculator of type SizeCalculator.
    • Modified NewCache to initialize with defaultSizeCalculator.
    • Added NewCacheWithCustomSizeCalculator to allow injecting a custom SizeCalculator.
    • Updated checkInvariants, evictOne, Insert, Erase, and UpdateSize methods to use the sizeCalculator interface for all size-related operations.

Activity

  • gargnitingoogle requested a Gemini summary.
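
The refactoring summarized above can be sketched roughly as follows. Only the five method names come from the changelog; the signatures, the stand-in `ValueType`, the `blob` test type, and the clamping in `AddDelta` are assumptions made for illustration and may differ from the actual code in `internal/cache/lru/lru.go`.

```go
package main

import "fmt"

// ValueType is a stand-in for lru.ValueType (assumed to expose Size()).
type ValueType interface {
	Size() uint64
}

// SizeCalculator lists the five methods named in the changelog.
// The signatures here are assumptions.
type SizeCalculator interface {
	GetCurrentSize() uint64
	SizeOf(v ValueType) uint64
	InsertEntry(v ValueType)
	EvictEntry(v ValueType)
	AddDelta(delta int64)
}

// defaultSizeCalculator keeps a running total of logical sizes.
// It is not synchronized; this sketch assumes the cache's lock guards it.
type defaultSizeCalculator struct {
	currentSize uint64
}

func (d *defaultSizeCalculator) GetCurrentSize() uint64    { return d.currentSize }
func (d *defaultSizeCalculator) SizeOf(v ValueType) uint64 { return v.Size() }
func (d *defaultSizeCalculator) InsertEntry(v ValueType)   { d.AddDelta(int64(v.Size())) }
func (d *defaultSizeCalculator) EvictEntry(v ValueType)    { d.AddDelta(-int64(v.Size())) }

// AddDelta applies a signed change and clamps the total at zero.
func (d *defaultSizeCalculator) AddDelta(delta int64) {
	if delta < 0 {
		dec := uint64(-delta)
		if dec > d.currentSize {
			d.currentSize = 0
			return
		}
		d.currentSize -= dec
		return
	}
	d.currentSize += uint64(delta)
}

// blob is a hypothetical value whose size is its numeric value.
type blob uint64

func (b blob) Size() uint64 { return uint64(b) }

func main() {
	var calc SizeCalculator = &defaultSizeCalculator{}
	calc.InsertEntry(blob(100))
	calc.InsertEntry(blob(50))
	calc.EvictEntry(blob(100))
	fmt.Println(calc.GetCurrentSize()) // 50
}
```

A file-cache implementation of the same interface could then report disk utilization from `SizeOf` without the cache itself changing.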

@gargnitingoogle
Contributor Author

/gemini review

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@gargnitingoogle
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

The pull request introduces a FileCacheDiskUtilizationCalculator to abstract cache size calculations, which is a good architectural improvement. However, the implementation has critical issues related to thread safety and correctness in atomic operations, which can produce incorrect sizes under concurrency and when a subtraction underflows. I've provided suggestions to fix these issues using a compare-and-swap loop for atomic updates. Additionally, I've suggested some improvements in lru.go to use the new SizeCalculator interface in a way that states intent more directly, enhancing code readability.

Comment on lines +190 to +201
func (c *FileCacheDiskUtilizationCalculator) AddDelta(delta int64) {
	if delta < 0 {
		negDelta := uint64(-delta)
		if negDelta > c.filesSize.Load() {
			c.filesSize.Store(0)
		} else {
			c.filesSize.Add(-negDelta)
		}
	} else {
		c.filesSize.Add(uint64(delta))
	}
}

critical

The current implementation of AddDelta has two issues. First, c.filesSize.Add(-negDelta) depends on unsigned wraparound: negating a uint64 variable compiles in Go and is evaluated modulo 2^64, so the call does subtract negDelta, but the intent is easy to misread. Second, the function is susceptible to a race condition: the value of c.filesSize can change between c.filesSize.Load() and the subsequent c.filesSize.Store(0) or c.filesSize.Add(...), potentially leading to incorrect size calculations (e.g., wrapping around to a large positive number instead of clamping at zero). A compare-and-swap (CAS) loop should be used so that the whole read-modify-write is atomic.

func (c *FileCacheDiskUtilizationCalculator) AddDelta(delta int64) {
	if delta == 0 {
		return
	}
	for {
		currentVal := c.filesSize.Load()
		var newVal uint64
		if delta < 0 {
			negDelta := uint64(-delta)
			if negDelta > currentVal {
				newVal = 0
			} else {
				newVal = currentVal - negDelta
			}
		} else {
			newVal = currentVal + uint64(delta)
		}
		if c.filesSize.CompareAndSwap(currentVal, newVal) {
			return
		}
	}
}
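
The CAS approach above can be tried in isolation with a minimal sketch like the one below. The free function `addClamped` and the driver `runDemo` are hypothetical names, not part of the PR; the sketch assumes Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// addClamped applies a signed delta to an unsigned counter and clamps at zero.
// The loop retries whenever another goroutine changed the counter between
// the Load and the CompareAndSwap.
func addClamped(counter *atomic.Uint64, delta int64) uint64 {
	for {
		cur := counter.Load()
		var next uint64
		if delta < 0 {
			dec := uint64(-delta)
			if dec > cur {
				next = 0
			} else {
				next = cur - dec
			}
		} else {
			next = cur + uint64(delta)
		}
		if counter.CompareAndSwap(cur, next) {
			return next
		}
	}
}

// runDemo starts from 1000 so that no interleaving can underflow,
// which keeps the concurrent result deterministic.
func runDemo() (afterConcurrent, afterClamp uint64) {
	var size atomic.Uint64
	size.Store(1000)

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); addClamped(&size, 10) }()
		go func() { defer wg.Done(); addClamped(&size, -3) }()
	}
	wg.Wait()
	afterConcurrent = size.Load()

	// A subtraction larger than the current value clamps to zero.
	afterClamp = addClamped(&size, -1_000_000)
	return afterConcurrent, afterClamp
}

func main() {
	a, b := runDemo()
	fmt.Println(a, b) // 1700 0
}
```

Because each update either commits against the exact value it read or retries, no increment or decrement is lost, and the clamp decision is always made on a current value.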

Comment on lines +178 to +180
func (c *FileCacheDiskUtilizationCalculator) EvictEntry(evictedEntry lru.ValueType) {
	c.filesSize.Add(-c.SizeOf(evictedEntry))
}

high

The expression -c.SizeOf(evictedEntry) negates the uint64 returned by c.SizeOf(). Go evaluates this modulo 2^64, so the Add wraps around to a very large total if the entry is larger than the tracked size, instead of clamping at zero. You should use the AddDelta method to correctly handle size subtraction.

Suggested change
- func (c *FileCacheDiskUtilizationCalculator) EvictEntry(evictedEntry lru.ValueType) {
- 	c.filesSize.Add(-c.SizeOf(evictedEntry))
- }
+ func (c *FileCacheDiskUtilizationCalculator) EvictEntry(evictedEntry lru.ValueType) {
+ 	c.AddDelta(-int64(c.SizeOf(evictedEntry)))
+ }
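
For context on why routing the subtraction through a clamping helper matters, here is a minimal sketch of what an unclamped subtraction on an unsigned atomic does. The names `subtractWrapping` and `wrapDemo` are hypothetical; the `^(n - 1)` form is the decrement idiom described in the `sync/atomic` documentation.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// subtractWrapping decrements counter by n with a plain atomic Add.
// ^(n - 1) is the two's-complement negation of n. Nothing clamps the result.
func subtractWrapping(counter *atomic.Uint64, n uint64) uint64 {
	return counter.Add(^(n - 1))
}

// wrapDemo returns the counter after an in-range and an oversized subtraction.
func wrapDemo() (inRange, underflowed uint64) {
	var size atomic.Uint64
	size.Store(100)
	inRange = subtractWrapping(&size, 30)      // 100 - 30
	underflowed = subtractWrapping(&size, 100) // 70 - 100 wraps modulo 2^64
	return inRange, underflowed
}

func main() {
	inRange, underflowed := wrapDemo()
	fmt.Println(inRange)                    // 70
	fmt.Println(underflowed == ^uint64(29)) // true: the value is 2^64 - 30
}
```

A size tracker that wraps like this would report a near-maximal cache size after a single oversized eviction, which is the failure mode the clamped AddDelta avoids.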

Comment on lines +182 to +185
// EvictEntry adds the size for the given entry.
func (c *FileCacheDiskUtilizationCalculator) InsertEntry(insertedEntry lru.ValueType) {
	c.filesSize.Add(c.SizeOf(insertedEntry))
}

medium

The comment for InsertEntry is incorrect; it says "EvictEntry". Also, for consistency with EvictEntry and to centralize size update logic, this function should use AddDelta.

Suggested change
- // EvictEntry adds the size for the given entry.
- func (c *FileCacheDiskUtilizationCalculator) InsertEntry(insertedEntry lru.ValueType) {
- 	c.filesSize.Add(c.SizeOf(insertedEntry))
- }
+ // InsertEntry adds the size for the given entry.
+ func (c *FileCacheDiskUtilizationCalculator) InsertEntry(insertedEntry lru.ValueType) {
+ 	c.AddDelta(int64(c.SizeOf(insertedEntry)))
+ }

Comment thread internal/cache/lru/lru.go
Comment on lines +228 to +229
c.sizeCalculator.AddDelta(-int64(c.sizeCalculator.SizeOf(e.Value.(entry).Value)))
c.sizeCalculator.AddDelta(int64(valueSize))

medium

Instead of using two AddDelta calls to update the size when an entry is replaced, it would be clearer and more readable to use EvictEntry and InsertEntry, which state the intent of the code directly.

Suggested change
- c.sizeCalculator.AddDelta(-int64(c.sizeCalculator.SizeOf(e.Value.(entry).Value)))
- c.sizeCalculator.AddDelta(int64(valueSize))
+ c.sizeCalculator.EvictEntry(e.Value.(entry).Value)
+ c.sizeCalculator.InsertEntry(value)

Comment thread internal/cache/lru/lru.go
  e := c.entries.PushFront(entry{key, value})
  c.index[key] = e
- c.currentSize += valueSize
+ c.sizeCalculator.AddDelta(int64(valueSize))

medium

For clarity and consistency, InsertEntry should be used here instead of AddDelta when adding a new entry to the cache. This makes the code's intent more explicit.

Suggested change
- c.sizeCalculator.AddDelta(int64(valueSize))
+ c.sizeCalculator.InsertEntry(value)
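
The two lru.go suggestions can be seen together in a minimal sketch of an insert path that accounts for sizes through EvictEntry and InsertEntry. This is not the real lru.Cache: the `cache`, `sizeTracker`, `simpleTracker`, and `blob` types are hypothetical, and capacity-based eviction and locking are omitted.

```go
package main

import (
	"container/list"
	"fmt"
)

// ValueType is a stand-in for lru.ValueType (assumed to expose Size()).
type ValueType interface{ Size() uint64 }

// sizeTracker is a hypothetical subset of SizeCalculator that Insert needs.
type sizeTracker interface {
	InsertEntry(v ValueType)
	EvictEntry(v ValueType)
	GetCurrentSize() uint64
}

// simpleTracker keeps a running total and clamps at zero on eviction.
type simpleTracker struct{ total uint64 }

func (t *simpleTracker) InsertEntry(v ValueType) { t.total += v.Size() }
func (t *simpleTracker) EvictEntry(v ValueType) {
	if s := v.Size(); s > t.total {
		t.total = 0
	} else {
		t.total -= s
	}
}
func (t *simpleTracker) GetCurrentSize() uint64 { return t.total }

type entry struct {
	Key   string
	Value ValueType
}

type cache struct {
	entries *list.List
	index   map[string]*list.Element
	sizes   sizeTracker
}

func newCache() *cache {
	return &cache{
		entries: list.New(),
		index:   make(map[string]*list.Element),
		sizes:   &simpleTracker{},
	}
}

// Insert adds or replaces a value and moves it to the front of the list.
func (c *cache) Insert(key string, value ValueType) {
	if e, ok := c.index[key]; ok {
		// Replacement: drop the old value's size before adding the new one.
		c.sizes.EvictEntry(e.Value.(entry).Value)
		e.Value = entry{key, value}
		c.entries.MoveToFront(e)
	} else {
		c.index[key] = c.entries.PushFront(entry{key, value})
	}
	c.sizes.InsertEntry(value)
}

// blob is a hypothetical value whose size is its numeric value.
type blob uint64

func (b blob) Size() uint64 { return uint64(b) }

func main() {
	c := newCache()
	c.Insert("a", blob(100))
	c.Insert("b", blob(50))
	c.Insert("a", blob(30)) // replaces the 100-byte value
	fmt.Println(c.sizes.GetCurrentSize()) // 80
}
```

Passing the value itself lets the calculator decide what an entry costs, so a disk-aware implementation can apply its own SizeOf instead of trusting a precomputed logical size.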

@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 4b48d6e to 4a6bbb9 Compare February 25, 2026 10:46
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from b0703f9 to 1b337ac Compare February 25, 2026 10:46
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 4a6bbb9 to 3938291 Compare February 25, 2026 11:30
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from 1b337ac to 19b157e Compare February 25, 2026 11:30
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 3938291 to 9e7e9c4 Compare February 26, 2026 06:11
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from 19b157e to 8296b90 Compare February 26, 2026 06:11
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 9e7e9c4 to 1d5a9ce Compare March 12, 2026 05:14
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from 8296b90 to 8ca046f Compare March 12, 2026 05:14
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 1d5a9ce to 30075a4 Compare March 12, 2026 07:42
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from 8ca046f to 95670d6 Compare March 12, 2026 07:42
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v2 branch from 30075a4 to d754d78 Compare March 20, 2026 08:00
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/cache-dir-size-fix/v3 branch from 95670d6 to 14c98fc Compare March 20, 2026 08:00

1 participant