I think there are some bits here that are being handled manually that should be handled in a more automated fashion.
With the hand-written tuples recording which producer can correctly encode which query, we have the means to track regressions (e.g., DuckDB suddenly fails to run `logb`) but not improvements (Isthmus can now encode `logb`).
This is a highly non-trivial problem, because the outcomes of the producer tests are essentially the test fixtures for the consumers.
We've been using pytest-snapshot to test that Ibis produces "good" or "golden" SQL for various expressions (https://pypi.org/project/pytest-snapshot/) and I wonder if that would be of help here.
Testing producers would mean generating substrait blobs, then comparing them to known good / valid snapshots of those blobs.
Testing consumers would consist of loading the snapshot blobs and attempting to execute them.
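As a rough sketch of that workflow (pytest-snapshot's `snapshot` fixture does essentially this via `snapshot.assert_match` and `--snapshot-update`; the helper below is a hand-rolled stand-in, and the file layout is just an assumption):

```python
from pathlib import Path


def assert_matches_snapshot(blob: bytes, path: Path, update: bool = False) -> None:
    """Compare a produced Substrait blob against its stored golden copy.

    With update=True (or on first run), write the snapshot instead of
    comparing -- the same record/verify split pytest-snapshot exposes
    through its --snapshot-update flag.
    """
    if update or not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(blob)
    else:
        assert path.read_bytes() == blob, f"snapshot {path} is stale"
```

A consumer test then never talks to a producer at all: it just does `path.read_bytes()` on the checked-in snapshot and hands the blob to the engine.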
I know I'm not covering everything that needs covering in the test matrix here, but I think it would be a very good idea to start sketching out more sustainable patterns.
Having said all of ^^^^that^^^^, I don't think that should block this PR.
I do think that we should be attempting to run all producer tests on all SQL snippets, and not manually filtering them down pre-test. If isthmus is going to fail one of those tests because it uses a different SQL dialect, so be it -- we can get creative in the xfail markers and distinguish between "tests that fail that should pass in the future" and "tests that fail that will always fail".
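One way to encode that distinction in pytest markers (the `run_producer` helper and the queries are made up for illustration): `xfail(strict=True)` turns an unexpected pass into a hard failure, so when a producer improves someone is forced to delete the marker, while `skip` covers the permanently-incompatible cases.

```python
import pytest


def run_producer(producer: str, sql: str) -> bytes:
    raise NotImplementedError  # stand-in for the real producer invocation


# "Fails today, should pass in the future": strict=True reports an
# unexpected pass (XPASS) as a failure, so improvements can't go unnoticed.
@pytest.mark.xfail(reason="isthmus cannot encode logb yet", strict=True)
def test_isthmus_logb():
    run_producer("isthmus", "SELECT logb(a, 2) FROM t")


# "Will always fail": dialect differences we never expect to resolve.
@pytest.mark.skip(reason="duckdb-specific syntax, no isthmus equivalent")
def test_isthmus_duckdb_syntax():
    run_producer("isthmus", "SELECT a FROM t USING SAMPLE 10%")
```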
Alternatively, we might make use of `sqlglot` to translate SQL strings between dialects -- it's very good at that.
Originally posted by @gforsyth in #6 (review)