RubyCloud225/Weak_sindy_compression

Streaming Weak-SINDy: Sparse Identification of Non-Linear Dynamics for High-Dimensional Data Compression

This repository implements a proprietary framework for the Sparse Identification of Non-Linear Dynamics (SINDy) that uses a weak formulation to reduce the dimensionality of streaming data.

Project Structure

Weak_sindy_compression/
├── src/
│   └── Reduction_with_POD/
│       └── Sample_Data.py      # Core logic for loading data and performing POD
├── sample_data.csv            # Input dataset (n variables × m time steps)
└── README.md                  # Project documentation

The Problem: HBM and Data Egress Bottlenecks

In frontier AI systems, the primary bottleneck is often data movement between high-bandwidth memory (HBM) and the compute cores. Traditional lossy compression, such as quantization, can sacrifice numerical stability and "physical" fidelity.

The Solution: Governing Dynamics as a Compression Layer

Instead of treating data as a collection of bits, this project treats data as the output of a dynamic physical system. By applying the Weak Form of SINDy, we recover the underlying governing equations $\dot{\mathbf{x}} = \Theta(\mathbf{x})\Xi$ in an integral form that is inherently robust to the noise found in high-frequency data streams.

Key Technical Innovations:

- Weak Formulation Integration: Utilizing the integral form of the SINDy equation to eliminate the need for numerical differentiation of noisy data.

- Sparse Regression via STLSQ: Implementing Sequentially Thresholded Least Squares to identify the parsimonious model that represents the "Physics" of the data stream.

- Streaming Optimization: Designed for low-latency execution, allowing for real-time dimensionality reduction of model activations or KV-cache states.
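One common way to make a least-squares fit streaming-friendly is to accumulate the normal equations chunk by chunk, so that rows can be discarded once they have been folded in. The sketch below illustrates that idea only; the class name `StreamingLeastSquares` and its methods are hypothetical and are not taken from this repository, whose streaming update is still a roadmap item.

```python
import numpy as np

class StreamingLeastSquares:
    """Hypothetical helper: accumulate G^T G and G^T b over chunks of rows."""

    def __init__(self, n_features, n_targets):
        self.GtG = np.zeros((n_features, n_features))
        self.Gtb = np.zeros((n_features, n_targets))

    def update(self, G_chunk, b_chunk):
        # Fold a new chunk into the running sums; the chunk can then be dropped.
        self.GtG += G_chunk.T @ G_chunk
        self.Gtb += G_chunk.T @ b_chunk

    def solve(self):
        # lstsq tolerates a rank-deficient GtG early in the stream.
        return np.linalg.lstsq(self.GtG, self.Gtb, rcond=None)[0]

# Synthetic regression problem, processed 50 rows at a time.
rng = np.random.default_rng(0)
G = rng.standard_normal((300, 4))
Xi_true = rng.standard_normal((4, 2))
b = G @ Xi_true

model = StreamingLeastSquares(n_features=4, n_targets=2)
for start in range(0, 300, 50):
    model.update(G[start:start + 50], b[start:start + 50])

Xi_stream = model.solve()
Xi_batch = np.linalg.lstsq(G, b, rcond=None)[0]  # reference: all rows at once
```

Memory use depends only on the number of library terms, not on the stream length. The trade-off is that forming $G^T G$ squares the condition number, so a QR-based update may be preferable for ill-conditioned libraries.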

Mathematical Foundation

The framework implements a two-stage pipeline to compress high-dimensional data by identifying its latent physical manifolds.

Phase 1: Spatial Compression via POD (SVD)

To handle high-dimensional AI state data, we first project the raw data $\mathbf{X}$ onto a low-rank subspace using Proper Orthogonal Decomposition (POD). We compute the Singular Value Decomposition:

$$\mathbf{X} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$

By retaining the $r$ largest (most energetic) singular values, we define a reduced-order basis $\mathbf{U}_r$ from the first $r$ columns of $\mathbf{U}$. The high-dimensional state is then represented by reduced coefficients $\mathbf{a}(t) = \mathbf{U}_r^T \mathbf{x}(t)$:

$$\mathbf{x}(t) \approx \mathbf{U}_r \mathbf{a}(t)$$
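A minimal NumPy sketch of this projection step, under the assumption that the snapshot matrix has rows as variables and columns as time steps. The function name `pod_reduce` and the synthetic data are illustrative and may not match what `Sample_Data.py` does; no mean-centering is applied here.

```python
import numpy as np

def pod_reduce(X, r):
    """Project snapshots X (n variables x m time steps) onto the r leading POD modes."""
    # Thin SVD: U is (n, k), s is (k,), Vt is (k, m), with k = min(n, m).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_r = U[:, :r]                                  # reduced-order basis
    A = U_r.T @ X                                   # reduced coefficients, shape (r, m)
    energy = np.sum(s[:r] ** 2) / np.sum(s ** 2)    # retained share of squared singular values
    return U_r, A, energy

# Synthetic rank-2 data: 50 variables driven by two latent signals.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
latent = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
X = rng.standard_normal((50, 2)) @ latent

U_r, A, energy = pod_reduce(X, r=2)
X_hat = U_r @ A                                     # reconstruction U_r a(t)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because the synthetic data has exact rank 2, two modes reproduce it to machine precision. On real data, `energy` gives a quick way to judge how many modes to keep.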

Phase 2: Temporal Identification via Weak-SINDy

Once in the reduced space, we identify the governing equations for the coefficients $\mathbf{a}(t)$. We utilize the Weak Formulation to ensure robustness against noise and sampling artifacts:

$$\int_{I_k} \mathbf{a}(t) \dot{g}(t) dt = - \int_{I_k} \Theta(\mathbf{a}(t)) \Xi g(t) dt$$

Where:

  • $g(t)$ is a compactly supported test function (e.g., a bell-shaped polynomial).
  • $\Theta(\mathbf{a}(t))$ is a library of candidate nonlinearities (monomials, interaction terms).
  • $\Xi$ is the sparse matrix of coefficients that represents the "Physics" of the stream.
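The weak form can be assembled into a linear system with one row per test-function window: $b_k = -\int_{I_k} \mathbf{a}\,\dot{g}\,dt$ on the left and $G_k = \int_{I_k} \Theta(\mathbf{a})\,g\,dt$ on the right, so that $b \approx G\,\Xi$. The sketch below is one possible realization, assuming uniform sampling, trapezoid quadrature, a polynomial bump $g(s) = (1 - s^2)^p$, and a quadratic monomial library. The names `bump`, `library`, `weak_system`, `half_width`, and `stride` are hypothetical. Note that `A` here has time along rows, i.e. the transpose of the POD coefficient matrix.

```python
import numpy as np

def bump(s, p=4):
    """Polynomial bump g(s) = (1 - s^2)^p on [-1, 1] and its derivative dg/ds."""
    g = (1.0 - s ** 2) ** p
    dg = -2.0 * p * s * (1.0 - s ** 2) ** (p - 1)
    return g, dg

def library(A):
    """Candidate terms for states A of shape (m, r): constant, linear, quadratic monomials."""
    m, r = A.shape
    cols = [np.ones(m)] + [A[:, i] for i in range(r)]
    cols += [A[:, i] * A[:, j] for i in range(r) for j in range(i, r)]
    return np.column_stack(cols)

def weak_system(A, t, half_width, stride):
    """Assemble G and b with b approximately equal to G @ Xi, one row per window."""
    dt = t[1] - t[0]                                # assumes uniform sampling
    Theta = library(A)
    G_rows, b_rows = [], []
    for c in range(half_width, len(t) - half_width, stride):
        idx = slice(c - half_width, c + half_width + 1)
        s = (t[idx] - t[c]) / (half_width * dt)     # map the window to [-1, 1]
        g, dg_ds = bump(s)
        dg_dt = dg_ds / (half_width * dt)           # chain rule
        w = np.full(s.size, dt)
        w[[0, -1]] = dt / 2                         # trapezoid weights
        G_rows.append((w * g) @ Theta[idx])         # integral of Theta(a) g
        b_rows.append(-(w * dg_dt) @ A[idx])        # minus integral of a dg/dt
    return np.array(G_rows), np.array(b_rows)

# Noise-free check on da/dt = -2a, whose solution is a(t) = exp(-2t).
t = np.linspace(0.0, 2.0, 2001)
A = np.exp(-2.0 * t)[:, None]
G, b = weak_system(A, t, half_width=100, stride=50)
Xi = np.linalg.lstsq(G, b, rcond=None)[0]           # library columns: [1, a, a^2]
```

The derivative of the data never appears: it has been moved onto the test function by integration by parts, and the boundary term vanishes because $g$ is zero at the window ends.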

Sparsity via STLSQ

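We approximate $\Xi$ using Sequentially Thresholded Least Squares (STLSQ), which alternates least-squares fits with pruning of small coefficients. It serves as a heuristic for the sparsity-penalized problem below and yields a parsimonious model:

$$\min_{\Xi} \| \dot{\mathbf{A}} - \mathbf{\Theta}(\mathbf{A})\Xi \|_2^2 + \lambda \| \Xi \|_0$$

In the weak formulation, $\dot{\mathbf{A}}$ and $\mathbf{\Theta}(\mathbf{A})$ are replaced by their test-function-integrated counterparts from the weak-form equation above, so no numerical derivative is computed.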
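A compact sketch of the STLSQ iteration, assuming a dense library matrix and a fixed magnitude cutoff. The function name `stlsq` and the parameter `threshold` are illustrative. The cutoff plays a role related to $\lambda$ but is not numerically identical to it.

```python
import numpy as np

def stlsq(Theta, dA, threshold, max_iter=10):
    """Sequentially thresholded least squares for dA approximately Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, dA, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                              # prune weak library terms
        for j in range(dA.shape[1]):                 # refit each state equation separately
            keep = ~small[:, j]
            if keep.any():
                Xi[keep, j] = np.linalg.lstsq(Theta[:, keep], dA[:, j], rcond=None)[0]
    return Xi

# Synthetic test: 5 candidate terms, 2 state equations, 3 active coefficients.
rng = np.random.default_rng(1)
Theta = rng.standard_normal((200, 5))
Xi_true = np.zeros((5, 2))
Xi_true[1, 0] = -2.0
Xi_true[3, 1] = 1.5
Xi_true[0, 1] = 0.5
dA = Theta @ Xi_true + 0.01 * rng.standard_normal((200, 2))

Xi = stlsq(Theta, dA, threshold=0.1)
```

The same routine applies unchanged to a weak-form system by passing the integrated library and integrated left-hand side in place of `Theta` and `dA`.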

This combined POD-SINDy approach allows us to represent millions of parameters as a small system of differential equations, providing a path toward low-latency KV-cache reconstruction.
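To illustrate how a state could be recovered from the compressed form, the sketch below integrates $\dot{\mathbf{a}} = \Theta(\mathbf{a})\Xi$ with a fixed-step RK4 scheme and lifts the result back through $\mathbf{U}_r$. It assumes $\Xi$, $\mathbf{U}_r$, and the initial coefficients are already available and that the library ordering matches the one used for identification. The names `library_row` and `integrate` and the harmonic-oscillator example are hypothetical.

```python
import numpy as np

def library_row(a):
    """Constant, linear, and quadratic monomials of one state vector a (length r)."""
    r = a.size
    quad = [a[i] * a[j] for i in range(r) for j in range(i, r)]
    return np.concatenate(([1.0], a, quad))

def integrate(Xi, a0, dt, n_steps):
    """Fixed-step RK4 for da/dt = Theta(a) @ Xi; returns coefficients of shape (r, n_steps)."""
    def f(a):
        return library_row(a) @ Xi

    a = np.asarray(a0, dtype=float)
    A = np.empty((a.size, n_steps))
    for k in range(n_steps):
        A[:, k] = a
        k1 = f(a)
        k2 = f(a + 0.5 * dt * k1)
        k3 = f(a + 0.5 * dt * k2)
        k4 = f(a + dt * k3)
        a = a + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return A

# Example model: da1/dt = a2, da2/dt = -a1 (library order: 1, a1, a2, a1^2, a1*a2, a2^2).
Xi = np.zeros((6, 2))
Xi[2, 0] = 1.0
Xi[1, 1] = -1.0

dt, n_steps = 0.01, 629
A = integrate(Xi, a0=[1.0, 0.0], dt=dt, n_steps=n_steps)

# Lift back to a 20-dimensional state with an illustrative orthonormal basis.
rng = np.random.default_rng(0)
U_r = np.linalg.qr(rng.standard_normal((20, 2)))[0]
X_hat = U_r @ A
stored = U_r.size + Xi.size + 2                      # basis + model + initial condition
```

In this toy case 54 stored numbers regenerate a 20 x 629 trajectory. Accuracy over long horizons depends on how well the identified model tracks the true dynamics.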

Prerequisites

•	Python 3.8+
•	NumPy

Run the Analysis

python3 src/Reduction_with_POD/Sample_Data.py

Ensure your sample_data.csv is formatted with rows as variables and columns as time steps.
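A quick way to check that a file matches this layout before running the analysis is sketched below. It assumes a comma-delimited, purely numeric file without a header row, which the repository does not state explicitly. The helper name `load_snapshots` is hypothetical, and an in-memory buffer stands in for `sample_data.csv`.

```python
import io
import numpy as np

def load_snapshots(source):
    """Load a comma-separated numeric table: rows are variables, columns are time steps."""
    X = np.loadtxt(source, delimiter=",", ndmin=2)
    if not np.isfinite(X).all():
        raise ValueError("data contains NaN or infinite values")
    return X

# Two variables sampled at four time steps; a file path works the same way.
demo = io.StringIO("0.0,0.1,0.2,0.3\n1.0,0.9,0.8,0.7\n")
X = load_snapshots(demo)
n_variables, n_steps = X.shape
```

If the file is stored with time steps as rows instead, transposing with `X.T` after loading restores the expected orientation.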

Roadmap

•	Integrate symbolic test functions $\psi(t)$
•	Construct feature library $\Theta(\mathbf{a})$
•	Implement streaming regression update logic
•	Reconstruct original system state from compressed form

References

•	Russo et al., Streaming Compression of Scientific Data via Weak-SINDy, arXiv:2308.14962

Catherine Earl

MIT-style license © 2026
