Skip to content

Fix cache and print#258

Open
fsx950223 wants to merge 3 commits intomainfrom
feat/shared-bt-cache-key
Open

Fix cache and print#258
fsx950223 wants to merge 3 commits intomainfrom
feat/shared-bt-cache-key

Conversation

@fsx950223
Copy link
Contributor

@fsx950223 fsx950223 commented Mar 21, 2026

Motivation

Technical Details

Fix cache won't refresh after function body is modified and python print always output ?

Test Plan

Test Result

Submission Checklist

fsx950223 and others added 2 commits March 21, 2026 09:16
1. kernels/pa_decode_sw_fp8.py: FP8 PA decode kernel with shared K/V
   block number optimization. For large blocks (bs=1024), single BT
   lookup per block shared between K and V loads. Removes 2 unnecessary
   gpu.barrier() calls per partition.

2. jit_function.py: Include closure-captured kernel function bodies in
   cache key hash. Scans co_freevars/__closure__ for KernelFunction and
   JitFunction objects, ensuring kernel body changes invalidate cache.

3. arith.py: ArithValue.__str__ now prints metadata (type, op name)
   instead of "?". E.g. "ArithValue(type=i32, op=arith.constant)"
   or "ArithValue(type=f32, block_arg)".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 21, 2026 09:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves FlyDSL’s JIT caching stability by expanding the dependency fingerprinting used for cache keys, and tweaks ArithValue stringification for better diagnostics.

Changes:

  • Enhance ArithValue.__str__ to print type + owner context (block arg vs op name).
  • Extend _collect_dependency_sources to include closure-captured callable dependencies when computing JIT cache keys.
  • (Unintentionally) removes SPDX/copyright headers from two Python modules.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
python/flydsl/expr/utils/arith.py Improves ArithValue string formatting for debugging; header removal needs revert.
python/flydsl/compiler/jit_function.py Adds closure dependency scanning for cache key construction; header removal + filtering/tests need follow-up.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

continue
underlying = _get_underlying_func(val)
if underlying is None or id(underlying) in visited:
continue
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new closure dependency scan does not apply the same _is_user_function(...) filter used for global dependencies. This can pull in sources for non-user callables captured in closures (e.g., imported helpers assigned in an outer scope), causing noisy cache invalidations and potentially expensive inspect.getsource calls. Apply the same user-function filtering here (and consider including the function's file path in the key if cross-file dependencies are intended).

Suggested change
continue
continue
if not _is_user_function(underlying, rootFile):
continue

Copilot uses AI. Check for mistakes.
Comment on lines +146 to +164
# 2) Scan closure variables (co_freevars → __closure__) for callable
# dependencies. This catches @flyc.kernel functions defined in an
# enclosing scope and captured by the @flyc.jit launcher via closure.
if func.__code__.co_freevars and getattr(func, "__closure__", None):
for name, cell in zip(func.__code__.co_freevars, func.__closure__):
try:
val = cell.cell_contents
except ValueError:
continue
underlying = _get_underlying_func(val)
if underlying is None or id(underlying) in visited:
continue
visited.add(id(underlying))
try:
src = inspect.getsource(underlying)
except OSError:
src = underlying.__code__.co_code.hex()
sources.append(f"closure:{name}:{src}")
sources.extend(_collect_dependency_sources(underlying, rootFile, visited))
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change alters JIT cache-key behavior by incorporating closure-captured callables. There don’t appear to be tests asserting cache-key stability/invalidations for closure-captured kernels/functions; adding a small regression test (nested @flyc.kernel captured by a @flyc.jit closure) would help prevent future cache key regressions.

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 3
import builtins
from functools import partialmethod

Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file-level SPDX license header and copyright notice were removed. Most Python modules in this repo keep these headers (e.g., python/flydsl/expr/utils/init.py), so this should be restored here to maintain consistent licensing metadata.

Copilot uses AI. Check for mistakes.
Comment on lines 1 to 3
import ctypes
import hashlib
import inspect
Copy link

Copilot AI Mar 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file-level SPDX license header and copyright notice were removed. Other modules in python/flydsl/compiler/* retain these headers, so they should remain here as well for consistent licensing metadata.

Copilot uses AI. Check for mistakes.
@fsx950223 fsx950223 changed the title Feat/shared bt cache key Fix cache and print Mar 21, 2026
# SPDX-License-Identifier: Apache-2.0
# Copyright (c) 2025 FlyDSL Project Contributors

import ctypes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove licence header?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants