feat: implement case-insensitive support #119
The CI test Lance datasets were created with mixed-case filenames (e.g., Person.lance, FRIEND_OF.lance), but providers were stored in the map under their original-case names. Fix the …
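One common way to make such lookups case-insensitive is to normalize the key on both insert and lookup, so `Person`, `person`, and `PERSON` all resolve to the same entry. The sketch below is a minimal illustration under that assumption; `ProviderRegistry` and `Provider` are hypothetical stand-ins, not lance-graph's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a table provider; the real lance-graph type differs.
#[derive(Debug, Clone, PartialEq)]
pub struct Provider {
    pub dataset_path: String,
}

/// Hypothetical registry that stores providers under lowercase keys,
/// so lookups succeed regardless of the case used by the caller.
#[derive(Default)]
pub struct ProviderRegistry {
    providers: HashMap<String, Provider>,
}

impl ProviderRegistry {
    /// Inserts a provider, lowercasing the name before using it as the key.
    pub fn insert(&mut self, name: &str, provider: Provider) {
        self.providers.insert(name.to_lowercase(), provider);
    }

    /// Looks up a provider by name in any case by lowercasing the query too.
    pub fn get(&self, name: &str) -> Option<&Provider> {
        self.providers.get(&name.to_lowercase())
    }
}
```

The key point is that normalization happens in exactly one place on each path; normalizing only on insert (or only on lookup) reproduces the original mismatch.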
beinan left a comment
lgtm, just a minor comment
```rust
/// This creates a new RecordBatch with a normalized schema while
/// preserving all the data arrays.
fn normalize_record_batch(batch: &RecordBatch) -> Result<RecordBatch> {
    let normalized_schema = normalize_schema(batch.schema())?;
    // ... (remainder of the function not shown in the review hunk)
```
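As the doc comment on `normalize_record_batch` says, the idea is to rewrite only the schema (field names) and carry the data arrays over unchanged. The sketch below illustrates that shape with a simplified, std-only `Batch` type (hypothetical; the real code works on Arrow `RecordBatch`/`Schema`), assuming normalization means lowercasing field names.

```rust
/// Hypothetical, simplified stand-in for a record batch:
/// one name per column plus the column data itself.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub field_names: Vec<String>,
    pub columns: Vec<Vec<i64>>,
}

/// Returns a new batch whose field names are lowercased, keeping column
/// order and data unchanged. (With Arrow, columns are reference-counted
/// arrays, so carrying them over is cheap; here the Vec clone copies data.)
pub fn normalize_batch(batch: &Batch) -> Batch {
    Batch {
        field_names: batch.field_names.iter().map(|n| n.to_lowercase()).collect(),
        columns: batch.columns.clone(),
    }
}
```

Note that this mapping is not injective: two names differing only by case end up identical, which is exactly the case raised in the review below.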
If the input RecordBatch has two columns that differ only by case (e.g., fullName and Fullname), normalize_schema will produce a schema with duplicate column names, am I right?
That's a good point! Yes, the schema will have duplicate column names in the scenario you mentioned. However, the problem is caught later, during DataFusion table registration, so I don't think we need to validate it again here.
I ran a test locally to verify that lance-graph throws an exception (from DataFusion) "PlanError: Schema error: Schema contains duplicate qualified field name person.fullname".
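For comparison, an explicit check before registration could report which original names collide, rather than only the lowercased duplicate. The sketch below is hypothetical (`case_collisions` is not part of the PR); it groups names by their lowercased form and returns every group with more than one member.

```rust
use std::collections::HashMap;

/// Groups the original names by their lowercased form and returns only the
/// groups with more than one member, e.g. "fullname" -> ["fullName", "Fullname"].
pub fn case_collisions(names: &[&str]) -> Vec<(String, Vec<String>)> {
    let mut groups: HashMap<String, Vec<String>> = HashMap::new();
    for name in names {
        // Original names are kept in input order within each group.
        groups.entry(name.to_lowercase()).or_default().push(name.to_string());
    }
    let mut collisions: Vec<(String, Vec<String>)> = groups
        .into_iter()
        .filter(|(_, originals)| originals.len() > 1)
        .collect();
    // HashMap iteration order is unspecified; sort for stable error messages.
    collisions.sort();
    collisions
}
```

The trade-off matches the discussion above: DataFusion already rejects the duplicate, so a check like this only adds value if a message naming both original columns is worth the extra pass over the schema.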
Core Feature:
Implementation Details:
Note that there's also a TODO about the GraphConfig API.