
Table index creation overhaul #93

Open
sl-at-ibm wants to merge 10 commits into main from SL-index-rebasing

Conversation

@sl-at-ibm
Collaborator

@sl-at-ibm sl-at-ibm commented Mar 17, 2026

Fixes #36.
Fixes #37.
Fixes #38.
Fixes #39.
(#41 seems fixed already)

This PR changes the table index creation API significantly, as outlined in #36. Some guidance for reviewers is in order:

  • All tests about index creation require manual verification of the payload (/table status on DB) for full control. Adding deep checks on the ListIndexesAsync(...) response after creation is left to a future task.
  • These methods highlight an underlying ambiguity between "options for payload" and "options for how to issue the command" (currently all grouped into CreateIndexCommandOptions). This PR has not changed the behaviour to "manually" extract IfNotExists from this object and inject it into the payload, which might need to be revisited (but it would be a breaking change). See CreateGenericIndexAsync.
  • It seems that, contrary to what #41 ("CreateIndex needs a way to specify $keys/$values") claims, indexes on maps are fully functional. Four CreateIndexTests_MapIndex_* tests pass and their payloads were verified while completing this PR, so #41 should be good to close IMO.
  • TableIndexDefinition exposes the Ascii/CaseSensitive/Normalize options as flat properties, nesting them under options when serializing and flattening them back when deserializing. While this offers a leaner API to the user, it distorts the underlying payload (which has them under options). It might be worth revisiting this logic, as it would become unwieldy should the Data API ever enrich the options tree (admittedly unlikely, but...). Just drawing attention to this item, since changing it would potentially be a breaking change.
  • Along the way, this PR fixes a bug whereby text index options never made their way into the payload.
  • Also added a "freeform" override of the text index options (because clients should not dictate which arbitrary JSON can be sent as an analyzer configuration).
  • Before splitting the methods, an important rework of the index definition class hierarchy was done: namely, the "column" and the general handling of "options" have been moved to a TableIndexBaseDefinition class, which TableIndexDefinition inherits from, just like its vector and text counterparts. This was critical to rule out Frankenstein options (and payloads) combining, e.g., ascii/normalize with vector settings. Note that the naming pattern, with TableIndexDefinition as a subclass, was chosen for alignment with the other clients and, more importantly, with the command names the Data API exposes.
  • Minor fix: for vector indexes, an empty options: {} is no longer sent when no such settings are provided.
  • Resolving #37 ("CreateIndex should be split into three separate methods"), i.e. splitting the methods, is of course a heavily breaking change (cc @skedwards88), with a big impact on the docs as well (and quite a proliferation of methods in Table.cs).
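For reviewers, a minimal sketch of the class split described above may help. It assumes System.Text.Json.Nodes (.NET 6+) for building the definition payload; the members Metric, SourceModel and Analyzer and the ToDefinition() helper are hypothetical illustrations, not the client's actual API, and the camelCase option names are assumed to match the Data API payload. Each subclass contributes only its own options, so a regular index cannot carry vector settings, and an empty options object is simply omitted.

```csharp
using System.Text.Json.Nodes;
#nullable enable

// Hypothetical sketch of the hierarchy; not the client's real implementation.
public abstract class TableIndexBaseDefinition
{
    // Shared by all index kinds: the indexed column.
    public string Column { get; set; } = "";

    // Each subclass builds only its own options; null or empty means "omit options".
    protected abstract JsonObject? BuildOptions();

    public JsonObject ToDefinition()
    {
        var definition = new JsonObject { ["column"] = Column };
        var options = BuildOptions();
        if (options is not null && options.Count > 0)
            definition["options"] = options;
        return definition;
    }
}

// Regular index: flat properties in the API, nested under "options" in the payload.
public class TableIndexDefinition : TableIndexBaseDefinition
{
    public bool? Ascii { get; set; }
    public bool? CaseSensitive { get; set; }
    public bool? Normalize { get; set; }

    protected override JsonObject? BuildOptions()
    {
        var o = new JsonObject();
        if (Ascii.HasValue) o["ascii"] = Ascii.Value;
        if (CaseSensitive.HasValue) o["caseSensitive"] = CaseSensitive.Value;
        if (Normalize.HasValue) o["normalize"] = Normalize.Value;
        return o;
    }
}

// Vector index: has no Ascii/Normalize members, so mixed payloads cannot be built.
public class TableVectorIndexDefinition : TableIndexBaseDefinition
{
    public string? Metric { get; set; }
    public string? SourceModel { get; set; }

    protected override JsonObject? BuildOptions()
    {
        var o = new JsonObject();
        if (Metric is not null) o["metric"] = Metric;
        if (SourceModel is not null) o["sourceModel"] = SourceModel;
        return o;
    }
}

// Text index: the analyzer is freeform JSON (a name or an object), passed through as-is.
public class TableTextIndexDefinition : TableIndexBaseDefinition
{
    public JsonNode? Analyzer { get; set; }

    // Re-parse to get a fresh node each time (a JsonNode can only have one parent).
    protected override JsonObject? BuildOptions() =>
        Analyzer is null ? null : new JsonObject { ["analyzer"] = JsonNode.Parse(Analyzer.ToJsonString()) };
}
```

For example, new TableVectorIndexDefinition { Column = "embedding" }.ToDefinition() serializes to {"column":"embedding"}, with no options key at all.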

@sl-at-ibm sl-at-ibm requested a review from toptobes March 17, 2026 14:24
@sl-at-ibm
Collaborator Author

@a-random-steve FYI this is not small but I think the way it's structured makes sense with the rest 🙏

@sl-at-ibm
Collaborator Author

Note.

A detail that could be added is constraining the column type to the index type (when applicable), for typed column arguments:

  • CreateTextIndex would only accept text and ascii columns;
  • CreateVectorIndex would only accept vector columns;
  • CreateIndex would accept everything except vector columns, I guess.

However, I think this can be added later: it trades an API exception for a safer typing error, so I wouldn't consider doing it post-GA a problem. Also, I'm not 100% sure this fits with the tenet that clients should not "take initiative".
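The compile-time variant of this idea could look roughly like the following sketch. It assumes typed overloads that take a property selector expression; TypedIndexColumns, TextIndexColumn, VectorIndexColumn, the Article row type and the float[] vector representation are all hypothetical, chosen only to show how the selector's return type turns a Data API error into a compiler error.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helpers: the client's real typed overloads may look different.
public static class TypedIndexColumns
{
    // Text indexes: only string-valued (text/ascii) properties compile.
    public static string TextIndexColumn<TRow>(Expression<Func<TRow, string>> column) =>
        MemberName(column);

    // Vector indexes: only float[]-valued (vector) properties compile.
    public static string VectorIndexColumn<TRow>(Expression<Func<TRow, float[]>> column) =>
        MemberName(column);

    // Extracts the property name from a selector such as r => r.Title.
    private static string MemberName(LambdaExpression column) =>
        column.Body is MemberExpression member
            ? member.Member.Name
            : throw new ArgumentException("Expected a simple property access, e.g. r => r.Title");
}

// Hypothetical row type used only for illustration.
public class Article
{
    public string Title { get; set; } = "";
    public float[] Embedding { get; set; } = Array.Empty<float>();
}
```

With this shape, TypedIndexColumns.VectorIndexColumn<Article>(a => a.Title) fails to compile. The limits are visible too: string cannot tell text from ascii columns, and "everything except vector" for CreateIndex has no direct expression as a C# generic constraint, which is one more argument for deferring this.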

@sl-at-ibm
Collaborator Author

Hold on: I have to investigate a glitchy case @skedwards88 reported today. Stay tuned...

```csharp
        return _column;
    }
    internal set => _column = value is JsonElement je
        ? DeserializationUtils.UnwrapJsonElement(je)
```
Contributor


@sl-at-ibm This solution for handling the challenges of the variable value makes sense. Currently the client doesn't handle any of these sorts of SerDes issues directly in the class, so for consistency I might slightly prefer another JsonConverter in SerDes instead. Thoughts?
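A rough sketch of what such a SerDes-layer converter might look like, assuming System.Text.Json. IndexColumnConverter and IndexDefinitionDto are hypothetical names, and the two accepted shapes (a plain column name, or a map selector such as {"tags": "$keys"}) are assumed from the map-index discussion in the PR description. The property then holds a string or a Dictionary<string, string> directly, so no JsonElement unwrapping is needed inside the class.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;
#nullable enable

// Hypothetical converter: reads/writes "column" as a name or a map selector object.
public class IndexColumnConverter : JsonConverter<object>
{
    public override object? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        switch (reader.TokenType)
        {
            case JsonTokenType.String:
                return reader.GetString();
            case JsonTokenType.StartObject:
                // e.g. { "tags": "$keys" } for an index on a map column's keys.
                return JsonSerializer.Deserialize<Dictionary<string, string>>(ref reader, options);
            default:
                throw new JsonException($"Unexpected token {reader.TokenType} for an index column.");
        }
    }

    public override void Write(Utf8JsonWriter writer, object value, JsonSerializerOptions options)
    {
        switch (value)
        {
            case string name:
                writer.WriteStringValue(name);
                break;
            case IDictionary<string, string> map:
                writer.WriteStartObject();
                foreach (var kv in map)
                    writer.WriteString(kv.Key, kv.Value);
                writer.WriteEndObject();
                break;
            default:
                throw new JsonException($"Unsupported index column type {value.GetType()}.");
        }
    }
}

// Hypothetical DTO showing where the converter would be attached.
public class IndexDefinitionDto
{
    [JsonPropertyName("column")]
    [JsonConverter(typeof(IndexColumnConverter))]
    public object? Column { get; set; }
}
```

Round-tripping {"column":{"tags":"$keys"}} through IndexDefinitionDto yields the same JSON, with the column held as a plain dictionary in between.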

Collaborator Author


I agree that this belongs more to the SerDes layer.
However, I lifted this part as-is from TableIndexDefinition.cs, where it already looked exactly like this.
Here is what I'd do: we merge as is and open a "code tidy" issue (post-GA) for this item. This way this already pretty big PR is not blocked any further. (As I understand it, the improvement has no externally visible effects.)

Contributor


Makes sense. Agreed. And WHO put that in TableIndexDefinition? 😮 lol

@sl-at-ibm
Collaborator Author

@a-random-steve I changed my mind and will go with the proper SerDes refactoring. Stay tuned.

The reason is that I found a few more things to change/add which make sense to do within this PR:

  1. adapt the deserialization of the listIndexes response to the slightly restructured index definition classes
  2. correspondingly, add deeper tests (i.e. checking that, after an index is created, listIndexes returns the right structures)
  3. handle the fourth, "UNKNOWN" (i.e. unsupported) index type, which cannot be created but can still be returned by the Data API

@toptobes so please hold off and do not merge this one yet :|
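For point 3, one way to keep listIndexes deserialization total is to dispatch on the index type and fall back to a catch-all record. This is a sketch under assumptions: the "indexType" field name and its "regular"/"vector"/"text" values are assumed, and ListedIndex, UnknownIndex and ListedIndexParser are hypothetical names, not the client's types.

```csharp
using System.Text.Json;
#nullable enable

// Hypothetical model for one entry of a listIndexes response.
public abstract record ListedIndex(string Name, JsonElement Definition);
public record RegularIndex(string Name, JsonElement Definition) : ListedIndex(Name, Definition);
public record VectorIndex(string Name, JsonElement Definition) : ListedIndex(Name, Definition);
public record TextIndex(string Name, JsonElement Definition) : ListedIndex(Name, Definition);

// Unsupported indexes cannot be created by the client but must still deserialize.
public record UnknownIndex(string Name, JsonElement Definition, string? RawType) : ListedIndex(Name, Definition);

public static class ListedIndexParser
{
    public static ListedIndex Parse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        var name = root.GetProperty("name").GetString() ?? "";
        // Clone so the element stays valid after the JsonDocument is disposed.
        var definition = root.TryGetProperty("definition", out var d) ? d.Clone() : default;
        var type = root.TryGetProperty("indexType", out var t) && t.ValueKind == JsonValueKind.String
            ? t.GetString()
            : null;
        return type switch
        {
            "regular" => new RegularIndex(name, definition),
            "vector" => new VectorIndex(name, definition),
            "text" => new TextIndex(name, definition),
            _ => new UnknownIndex(name, definition, type), // anything unrecognized, incl. "UNKNOWN"
        };
    }
}
```

The fallback branch means a new or unsupported server-side index type never breaks listing; callers can pattern-match on UnknownIndex and inspect the raw definition.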


Labels

None yet

Projects

None yet

2 participants