PRISM: O(1) Photonic Block Selection for Long-Context LLM Inference — eliminates the O(N) KV-cache scan via a photonic broadcast-and-weight similarity engine on TFLN
pytorch attention photonics similarity-search memory-optimization kv-cache photonic-computing long-context llm-inference microring-resonator
Updated Mar 25, 2026 · Python
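The core idea — choosing which KV-cache blocks to attend to by comparing the query against per-block summaries instead of scanning every cached key — can be sketched in plain NumPy. This is a hypothetical software analogue: the function name `select_blocks`, the mean-key block summary, and all parameters are illustrative assumptions, not the repository's actual API; in PRISM the summary-vs-query comparison would be performed by the photonic broadcast-and-weight engine rather than a matrix product.

```python
import numpy as np

def select_blocks(query, keys, block_size=16, top_k=2):
    """Pick the KV-cache blocks most similar to the query.

    Hypothetical software analogue of photonic block selection:
    each block is summarized by its mean key, and the query is
    scored against all summaries at once (the step a photonic
    broadcast-and-weight engine would evaluate in the optical
    domain instead of electronically).
    """
    d = keys.shape[1]
    n = keys.shape[0] - keys.shape[0] % block_size  # drop a ragged tail block
    blocks = keys[:n].reshape(-1, block_size, d)
    summaries = blocks.mean(axis=1)        # one summary key per block
    scores = summaries @ query             # similarity of query to each block
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(0)
keys = rng.normal(size=(64, 8))            # 64 cached keys, 4 blocks of 16
query = rng.normal(size=8)
block_ids = select_blocks(query, keys)     # indices of the best-matching blocks
```

Attention would then be computed only over the keys inside the selected blocks, so the per-token cost depends on `top_k * block_size` rather than the full cache length.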