I have a model expressed as a sum of many (roughly 10 to 40) SHO kernels, and I have been playing around with tinygp and celerite2 (JAX implementation). In my tests, celerite2 is faster than tinygp (see figure below) when using a sum of multiple semi-separable kernels.
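For context, here is a minimal sketch of the kind of comparison I am running. The data, the number of terms, and the SHO parameter values are placeholders rather than the exact setup behind the figure, and the mapping between tinygp's `SHO` parameters and celerite2's `SHOTerm` parameters is only approximate:

```python
import time

import jax
import numpy as np

import tinygp
from tinygp.kernels import quasisep
import celerite2.jax
from celerite2.jax import terms

jax.config.update("jax_enable_x64", True)

# Placeholder data and SHO parameters (not the exact values behind the figure)
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 100.0, 1000))
yerr = np.full_like(t, 0.1)
y = np.sin(t) + yerr * rng.normal(size=len(t))

n_terms = 20  # somewhere in my 10-40 range
omegas = np.linspace(0.5, 5.0, n_terms)
qualities = np.full(n_terms, 3.0)
sigmas = np.full(n_terms, 0.1)


def tinygp_loglike(t, y, yerr):
    # Sum of quasi-separable SHO kernels in tinygp
    kernel = quasisep.SHO(omega=omegas[0], quality=qualities[0], sigma=sigmas[0])
    for w, q, s in zip(omegas[1:], qualities[1:], sigmas[1:]):
        kernel += quasisep.SHO(omega=w, quality=q, sigma=s)
    gp = tinygp.GaussianProcess(kernel, t, diag=yerr**2)
    return gp.log_probability(y)


def celerite2_loglike(t, y, yerr):
    # Sum of SHOTerms in the celerite2 JAX backend
    # (the S0 <-> sigma parameter mapping here is only approximate)
    term = terms.SHOTerm(S0=sigmas[0] ** 2, w0=omegas[0], Q=qualities[0])
    for w, q, s in zip(omegas[1:], qualities[1:], sigmas[1:]):
        term += terms.SHOTerm(S0=s**2, w0=w, Q=q)
    gp = celerite2.jax.GaussianProcess(term, mean=0.0)
    gp.compute(t, yerr=yerr)
    return gp.log_likelihood(y)


def bench(fn, *args, repeats=20):
    # Warm up (triggers JIT compilation), then time repeated evaluations
    fn(*args).block_until_ready()
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args).block_until_ready()
    return (time.perf_counter() - start) / repeats


f_tinygp = jax.jit(tinygp_loglike)
f_celerite2 = jax.jit(celerite2_loglike)
print("tinygp:   ", bench(f_tinygp, t, y, yerr))
print("celerite2:", bench(f_celerite2, t, y, yerr))
```

Both log-likelihood functions are JIT-compiled and only timed after a warm-up call, so the difference I am seeing should come from the evaluation itself rather than from compilation.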
Could you give me some insight into why there is such a difference in runtime between the two libraries?
Also, would it be possible to reach celerite2's speed by modifying the tinygp implementation? I am currently reading the tinygp code to try to understand what could explain such a difference.

Thanks,