perf: optimize meshgrid for reduced allocations and improved type stability by ChrisRackauckas-Claude · Pull Request #87 · SciML/NeuralOperators.jl

ChrisRackauckas-Claude · 2026-01-07T22:02:50Z

Summary

This PR optimizes the `meshgrid` function that `GridEmbedding` uses to generate coordinate grids. The gains, measured in the benchmarks below, come from:

- Pre-allocating the output array with `similar()` instead of building it with `stack()`
- Broadcasting in place into `selectdim()` views instead of calling `repeat()`
- Adding a type parameter `T` for better type inference
- Computing shapes with `ntuple()` instead of a mutable array
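
As a rough illustration of how these four changes fit together, here is a minimal sketch. It is not the code from this PR: the name `meshgrid_sketch`, the requirement that all inputs share one element type, and the output layout (coordinates stacked along the last dimension) are all assumptions made for the example.

```julia
# Hypothetical sketch: `meshgrid_sketch` and its output layout are assumptions,
# not the implementation from this pull request.
function meshgrid_sketch(args::Vararg{AbstractVector{T}, N}) where {T, N}
    # Grid shape as an immutable tuple, so its length is known to the compiler.
    dims = ntuple(i -> length(args[i]), Val(N))
    # Single allocation for the whole grid; the last dimension indexes the coordinate.
    out = similar(first(args), T, dims..., N)
    for i in 1:N
        # View of the i-th coordinate channel (no copy).
        channel = selectdim(out, N + 1, i)
        # Shape that is length(args[i]) along dimension i and 1 elsewhere.
        shape = ntuple(j -> j == i ? dims[i] : 1, Val(N))
        # In-place broadcast expands the reshaped vector across the other dimensions.
        channel .= reshape(args[i], shape)
    end
    return out
end

x = Float32[1, 2, 3]
y = Float32[10, 20]
grid = meshgrid_sketch(x, y)
size(grid)  # (3, 2, 2)
```

Under these assumptions, `grid[:, :, 1]` holds the `x` values repeated across columns and `grid[:, :, 2]` holds the `y` values repeated across rows, with one array allocation for the result.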

Benchmarks

meshgrid 1D (128 points):

| Metric | Before | After | Improvement |
|---|---|---|---|
| Time | ~2.85 μs | ~79 ns | ~36x faster |
| Memory | 1.48 KiB | 624 bytes | 58% less |
| Allocations | 15 | 3 | 80% fewer |

meshgrid 2D (64x64 points):

| Metric | Before | After | Improvement |
|---|---|---|---|
| Time | ~19.8 μs | ~4.4 μs | ~4.5x faster |
| Memory | 65 KiB | 33 KiB | 49% less |
| Allocations | 29 | 31 | 2 more (~7%) |

GridEmbedding 1D (128 points, 4 channels, batch=32):

| Metric | Before | After | Improvement |
|---|---|---|---|
| Time | ~46 μs | ~42 μs | ~9% faster |
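
The "Improvement" columns can be rechecked from the reported before/after figures. The snippet below is only that arithmetic; the helper names `speedup` and `reduction_pct` are made up for this example and do not come from the PR.

```julia
# Ratios recomputed from the figures reported in the benchmark tables.
# `speedup` and `reduction_pct` are hypothetical helper names.
speedup(before, after) = before / after
reduction_pct(before, after) = 100 * (1 - after / before)

# meshgrid 1D: 2.85 μs -> 79 ns, 1.48 KiB -> 624 bytes, 15 -> 3 allocations
speedup_1d = speedup(2850, 79)             # about 36.1
mem_1d = reduction_pct(1.48 * 1024, 624)   # about 58.8
alloc_1d = reduction_pct(15, 3)            # about 80

# meshgrid 2D: 19.8 μs -> 4.4 μs, 65 KiB -> 33 KiB, 29 -> 31 allocations
speedup_2d = speedup(19.8, 4.4)            # about 4.5
mem_2d = reduction_pct(65, 33)             # about 49.2
alloc_2d = reduction_pct(29, 31)           # about -6.9, i.e. a small increase

# GridEmbedding 1D: 46 μs -> 42 μs
time_ge = reduction_pct(46, 42)            # about 8.7
```

The one metric that moves in the wrong direction is the 2D allocation count, which rises from 29 to 31 even though total memory drops by about half.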

Type Stability

For the original implementation, `@code_warntype` reported a return type of `Any`, which this PR attributes to calling `stack()` with a closure. The new implementation returns concrete types (`Matrix{T}`, `Array{T,3}`, etc.), so callers further down the call chain can be inferred as well.
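
The same kind of check can be made non-interactively with `Base.return_types`. The sketch below is a generic illustration rather than a diagnosis of the original `meshgrid`: both functions are hypothetical stand-ins, showing how a shape held in a mutable `Vector` hides the number of dimensions from the compiler while an `ntuple` with `Val(N)` keeps it in the type.

```julia
# Hypothetical stand-ins for illustration; neither function is from the pull request.

# Shape built in a mutable Vector: the number of dimensions is a runtime value,
# so the compiler cannot infer a concrete array type for the result.
function column_via_vector(x::AbstractVector, n::Int)
    shape = ones(Int, n)
    shape[1] = length(x)
    return reshape(x, shape...)
end

# Shape built with ntuple and Val(N): the tuple length is part of the type.
function column_via_tuple(x::AbstractVector, ::Val{N}) where {N}
    shape = ntuple(j -> j == 1 ? length(x) : 1, Val(N))
    return reshape(x, shape)
end

# Ask the compiler what it infers for each method.
unstable = only(Base.return_types(column_via_vector, (Vector{Float32}, Int)))
stable = only(Base.return_types(column_via_tuple, (Vector{Float32}, Val{3})))

isconcretetype(unstable)  # false
isconcretetype(stable)    # true
```

Both functions return the same values at run time; only the inferred return type differs.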

Test plan

- Verified that the basic functionality tests pass
- Verified that `@code_warntype` shows concrete return types
- Benchmarked before and after for both the 1D and 2D cases
- Tested with FNO and GridEmbedding to verify correctness

cc @ChrisRackauckas

🤖 Generated with Claude Code

…bility

The meshgrid function is used by GridEmbedding to generate coordinate grids. This optimization provides significant performance improvements:

## Benchmarks

### meshgrid 1D (128 points):
- BEFORE: ~2.85 μs, 1.48 KiB allocated, 15 allocations
- AFTER: ~79 ns, 624 bytes allocated, 3 allocations
- Improvement: ~36x faster, 58% less memory, 80% fewer allocations

### meshgrid 2D (64x64 points):
- BEFORE: ~19.8 μs, 65.04 KiB allocated, 29 allocations
- AFTER: ~4.4 μs, 33 KiB allocated, 31 allocations
- Improvement: ~4.5x faster, 49% less memory

## Changes
- Pre-allocate output array with `similar()` instead of using `stack()`
- Use in-place broadcasting with `selectdim()` views instead of `repeat()`
- Add type parameter `T` for better type inference
- Use `ntuple()` instead of mutable array for shape computation

## Type Stability

The original implementation returned `Any` from @code_warntype. The new implementation returns concrete types (Matrix{T}, Array{T,3}, etc.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ChrisRackauckas closed this Jan 9, 2026

ChrisRackauckas-Claude wants to merge 1 commit into SciML:main from ChrisRackauckas-Claude:perf-improvements-20260107-162248


3 participants
