perf: optimize meshgrid for reduced allocations and improved type stability #87

Closed
ChrisRackauckas-Claude wants to merge 1 commit into SciML:main from ChrisRackauckas-Claude:perf-improvements-20260107-162248

@ChrisRackauckas-Claude

Summary

Optimizes the meshgrid function used by GridEmbedding to generate coordinate grids. In the benchmarks below, meshgrid runs roughly 36x faster in 1D and 4.5x faster in 2D, with about half the memory. The changes are:

  • Pre-allocating the output array with similar() instead of using stack()
  • Using in-place broadcasting into selectdim() views instead of repeat()
  • Adding a type parameter T for better type inference
  • Using ntuple() instead of a mutable array for shape computation
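
For readers who want to see how these four changes fit together, here is a minimal sketch. It is not the code from this PR: the name `meshgrid_sketch` is hypothetical, and the output layout (one coordinate slab per input, indexed along the last dimension) is an assumption made for illustration.

```julia
# Hypothetical sketch, not the PR's implementation.
# Assumption: the result has size (length.(args)..., N), coordinate index last.
function meshgrid_sketch(args::Vararg{AbstractVector{T}, N}) where {T, N}
    dims = map(length, args)                      # NTuple{N, Int}
    # Allocate the whole output once instead of stacking per-axis arrays.
    out = similar(first(args), T, (dims..., N))
    for i in 1:N
        # Shape with length(args[i]) at position i and 1 everywhere else,
        # built as a tuple so the number of dimensions is known to the compiler.
        shape = ntuple(j -> j == i ? length(args[i]) : 1, Val(N))
        # View of the i-th coordinate slab; broadcasting fills it in place.
        slab = selectdim(out, N + 1, i)
        slab .= reshape(args[i], shape)
    end
    return out
end

grid = meshgrid_sketch([1.0, 2.0, 3.0], [10.0, 20.0])
size(grid)  # (3, 2, 2)
```

Broadcasting the reshaped vector into the view does the work that repeat() would otherwise do, without materializing an intermediate array per axis.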

Benchmarks

meshgrid 1D (128 points):

| Metric      | Before   | After     | Improvement |
|-------------|----------|-----------|-------------|
| Time        | ~2.85 μs | ~79 ns    | ~36x faster |
| Memory      | 1.48 KiB | 624 bytes | 58% less    |
| Allocations | 15       | 3         | 80% fewer   |

meshgrid 2D (64x64 points):

| Metric      | Before   | After   | Improvement  |
|-------------|----------|---------|--------------|
| Time        | ~19.8 μs | ~4.4 μs | ~4.5x faster |
| Memory      | 65 KiB   | 33 KiB  | 49% less     |
| Allocations | 29       | 31      | Similar      |

GridEmbedding 1D (128 points, 4 channels, batch=32):

| Metric | Before | After  | Improvement |
|--------|--------|--------|-------------|
| Time   | ~46 μs | ~42 μs | ~9% faster  |

Type Stability

For the original implementation, @code_warntype reported an inferred return type of Any, due to the use of stack() with a closure. The new implementation infers concrete return types (Matrix{T} for 1D, Array{T,3} for 2D, etc.), which improves type inference throughout the call chain.
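
As a small illustration of why the shape computation matters for inference, the sketch below compares a shape built in a mutable Vector with one built by ntuple(). Both helper functions are hypothetical and are not taken from this PR; they show one general mechanism, not necessarily the exact cause of the Any reported above.

```julia
# Hypothetical helpers for illustration only.

# Shape stored in a Vector: the number of dimensions is only known at run time,
# so the compiler cannot infer a concrete array type for the result.
function reshape_vector_shape(x::AbstractVector, i::Int, n::Int)
    shape = ones(Int, n)
    shape[i] = length(x)
    return reshape(x, shape...)
end

# Shape built with ntuple and Val: the number of dimensions is part of the type.
function reshape_tuple_shape(x::AbstractVector, i::Int, ::Val{N}) where {N}
    shape = ntuple(j -> j == i ? length(x) : 1, Val(N))
    return reshape(x, shape)
end

# Ask the compiler for the inferred return type of each version.
rt_vec = only(Base.return_types(reshape_vector_shape, (Vector{Float64}, Int, Int)))
rt_tup = only(Base.return_types(reshape_tuple_shape, (Vector{Float64}, Int, Val{3})))

println("Vector shape: ", rt_vec, " (concrete: ", isconcretetype(rt_vec), ")")
println("Tuple shape:  ", rt_tup, " (concrete: ", isconcretetype(rt_tup), ")")
```

Both functions return the same array at run time; only the information available to inference differs.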

Test plan

  • Verified basic functionality tests pass
  • Verified @code_warntype shows concrete return types
  • Benchmarked before and after for both 1D and 2D cases
  • Tested with FNO and GridEmbedding to verify correctness

cc @ChrisRackauckas

🤖 Generated with Claude Code

…bility

The meshgrid function is used by GridEmbedding to generate coordinate grids.
This optimization provides significant performance improvements:

## Benchmarks

### meshgrid 1D (128 points):
- BEFORE: ~2.85 μs, 1.48 KiB allocated, 15 allocations
- AFTER:  ~79 ns, 624 bytes allocated, 3 allocations
- Improvement: ~36x faster, 58% less memory, 80% fewer allocations

### meshgrid 2D (64x64 points):
- BEFORE: ~19.8 μs, 65.04 KiB allocated, 29 allocations
- AFTER:  ~4.4 μs, 33 KiB allocated, 31 allocations
- Improvement: ~4.5x faster, 49% less memory

## Changes

- Pre-allocate output array with `similar()` instead of using `stack()`
- Use in-place broadcasting with `selectdim()` views instead of `repeat()`
- Add type parameter `T` for better type inference
- Use `ntuple()` instead of mutable array for shape computation

## Type Stability

The original implementation returned `Any` from @code_warntype.
The new implementation returns concrete types (Matrix{T}, Array{T,3}, etc.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>