Design spec for duckdb general index for dascore 0.2 #648
d-chambers
started this conversation in
Ideas
DuckDB Spool Indexer
Schema
meta_data
Index-level metadata and compatibility information.
Columns:
- what_is_this: dascore_duckdb_index.
- index_version
- dascore_version
- last_indexed_ns

sources
One row per indexed source. A source is one FiberIO scan unit. It will usually be a file, but may also be a directory or another non-overlapping FiberIO-backed entity.
Columns:
- source_id
- base_uri
- source_path
- source_format
- format_version
- mtime_ns
- last_indexed_ns

Notes:
- source_path, interpreted relative to base_uri when base_uri is present.
- update() in v1 is only defined for source types that can provide a meaningful source-level mtime_ns.

patches
One row per patch summary emitted by a source.
Columns:
- patch_id
- source_id: sources.
- source_patch_id
- n_dims
- sample_count_total
- dims
- shape
- station
- network
- channel: PatchAttrs.
- tag
- data_type
- data_category
- time_min: time coord.
- time_max: time coord.
- time_step: time coord when available.
- distance_min: distance coord.
- distance_max: distance coord.
- distance_step: distance coord when available.

Constraint:
- (source_id, source_patch_id).

Notes:
- (source_id, source_patch_id).
- patches; they are excluded from attr_index.

attr_index
Typed key/value index for non-promoted patch attrs.
Columns:
- patch_id: patches.
- attr_name
- value_kind
- units
- value_str
- value_int
- value_num
- value_bool
- value_time_ns
- value_duration_ns

Constraint:
- (patch_id, attr_name).

Notes:
- attr_index stores only non-promoted, non-private, non-history attrs.
- str -> value_kind = str, value_str
- bool -> value_kind = bool, value_bool
- int -> value_kind = int, value_int
- float -> value_kind = float, value_num
- value_kind = time, value_time_ns
- value_kind = duration, value_duration_ns
- value_kind = unit, value_str and units
- value_kind = quantity, value_num and units

coord_index
Dispatch table for coord summaries.
Columns:
- coord_entry_id
- patch_id: patches.
- coord_name: time, distance, lag_time.
- coord_dtype
- coord_dims
- coord_len: CoordSummary.
- coord_hash
- units
- payload_table: coord_time, coord_numeric, or coord_str.
- payload_id

Constraint:
- (patch_id, coord_name, payload_table).

Notes:
- coord_hash is optional and not required for v1 query behavior.
- payload_table is a physical dispatch field, not an extra semantic type system.

coord_time
Summary-only payload table for time-like coords.
Columns:
- id: coord_index.payload_id.
- min
- max
- step
- count
- is_monotonic
- is_relative: false for absolute epoch-based time, true for duration-like relative time.

coord_numeric
Summary-only payload table for numeric coords.
Columns:
- id: coord_index.payload_id.
- min
- max
- step
- count
- is_monotonic

coord_str
Summary-only payload table for string coords.
Columns:
- id: coord_index.payload_id.
- min
- max
- count

Relations
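As a rough illustration of how the tables above relate, the sketch below walks from a patch to a coord summary: coord_index is looked up by (patch_id, coord_name), and its payload_table and payload_id then select a row in coord_time, coord_numeric, or coord_str. It uses Python's standard-library sqlite3 only as a stand-in so it runs without DuckDB installed. The column types, the primary and foreign key declarations, the reading of the listed constraints as UNIQUE, the sample rows, and the helper name coord_summary are all assumptions for illustration, not part of the spec.

```python
import sqlite3

# Assumption: types, key declarations, and UNIQUE constraints are illustrative;
# the spec lists column names only. sqlite3 stands in for DuckDB here, and only
# the columns needed for the lookup are created.
DDL = """
CREATE TABLE sources (source_id INTEGER PRIMARY KEY, source_path TEXT);
CREATE TABLE patches (
    patch_id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES sources (source_id),
    source_patch_id INTEGER,
    UNIQUE (source_id, source_patch_id)
);
CREATE TABLE coord_index (
    coord_entry_id INTEGER PRIMARY KEY,
    patch_id INTEGER REFERENCES patches (patch_id),
    coord_name TEXT,
    payload_table TEXT,
    payload_id INTEGER,
    UNIQUE (patch_id, coord_name, payload_table)
);
CREATE TABLE coord_time (id INTEGER PRIMARY KEY, "min" INTEGER, "max" INTEGER);
CREATE TABLE coord_numeric (id INTEGER PRIMARY KEY, "min" REAL, "max" REAL);
CREATE TABLE coord_str (id INTEGER PRIMARY KEY, "min" TEXT, "max" TEXT);
"""

PAYLOAD_TABLES = ("coord_time", "coord_numeric", "coord_str")


def coord_summary(con, patch_id, coord_name):
    """Hypothetical helper: resolve one coord's (min, max) via coord_index."""
    row = con.execute(
        "SELECT payload_table, payload_id FROM coord_index "
        "WHERE patch_id = ? AND coord_name = ?",
        (patch_id, coord_name),
    ).fetchone()
    if row is None:
        return None
    payload_table, payload_id = row
    # Table names cannot be bound as parameters, so check against a whitelist
    # before interpolating the name into the query.
    if payload_table not in PAYLOAD_TABLES:
        raise ValueError(f"unknown payload table: {payload_table!r}")
    return con.execute(
        f'SELECT "min", "max" FROM {payload_table} WHERE id = ?', (payload_id,)
    ).fetchone()


con = sqlite3.connect(":memory:")
con.executescript(DDL)
con.execute("INSERT INTO sources VALUES (1, 'a.h5')")  # made-up source path
con.execute("INSERT INTO patches VALUES (10, 1, 0)")
con.execute("INSERT INTO coord_time VALUES (5, 0, 1000)")
con.execute("INSERT INTO coord_numeric VALUES (5, 0.0, 99.5)")
con.executemany(
    "INSERT INTO coord_index VALUES (?, ?, ?, ?, ?)",
    [(100, 10, "time", "coord_time", 5), (101, 10, "distance", "coord_numeric", 5)],
)
```

In this sketch both coords carry payload_id 5, which only works because the id is interpreted together with payload_table; whether real payload ids are scoped per table or shared across tables is not stated above.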
Notes
- dc.scan(..., full_coords=True) is the cleaner future extension point for exact coord extraction.
- patches columns where possible.
- attr_index.
- coord_index plus the resolved payload table.
- str selector -> Unix glob semantics
- re.Pattern selector -> regex semantics
- DirectorySpool.update() should stay cheap and should not perform full archive reconciliation.
- reconcile() is a distinct operation:
  - sources table
  - mtime_ns changed
  - source_id.
- update(), reconcile(), and rebuild() assume exclusive access to the index.
- source_path, resolve against current spool root, do not rely on persisted local base_uri
- base_uri, resolve source_path relative to it
- source_path, base_uri = null

TODO
- Add channel to the standard PatchAttrs defaults and treat it as a first-class attr, like station, network, and tag.
- channel participates consistently in scan summaries, querying, and promoted patches columns once added to PatchAttrs.
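The selector rule from the Notes (a str selector means Unix glob semantics, a re.Pattern selector means regex semantics) could be sketched in plain Python as below. The helper name selector_matches is hypothetical. Two details are assumptions, since the notes do not specify them: glob matching is case-sensitive (fnmatch.fnmatchcase), and a regex matches if it is found anywhere in the value (re.Pattern.search) rather than having to match the whole string.

```python
import fnmatch
import re


def selector_matches(selector, value):
    """Hypothetical helper: dispatch on selector type.

    str        -> Unix glob semantics (case-sensitive here, by assumption)
    re.Pattern -> regex semantics (search, by assumption)
    """
    if isinstance(selector, re.Pattern):
        return selector.search(value) is not None
    if isinstance(selector, str):
        return fnmatch.fnmatchcase(value, selector)
    raise TypeError(f"unsupported selector type: {type(selector).__name__}")


# Example values are made up for illustration.
glob_hit = selector_matches("DAS*", "DAS01")
glob_miss = selector_matches("DAS", "DAS01")  # a glob must match the whole value
regex_hit = selector_matches(re.compile(r"^DAS\d+$"), "DAS01")
```

An index-backed implementation would presumably translate these into SQL predicates instead of filtering in Python; that translation is not shown here.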
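The attr_index typing rules in the schema (str, bool, int, and float map to value_str, value_bool, value_int, and value_num, with time and duration stored as nanoseconds) can be illustrated with a small classifier. This is only a sketch: the function name classify_attr is hypothetical, and the use of datetime.datetime and datetime.timedelta as the time and duration inputs is an assumption made to keep the example standard-library only. The unit and quantity kinds are left out because they would need a units library.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MICROSECOND = timedelta(microseconds=1)


def classify_attr(value):
    """Hypothetical helper: return (value_kind, column_name, stored_value)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool", "value_bool", value
    if isinstance(value, int):
        return "int", "value_int", value
    if isinstance(value, float):
        return "float", "value_num", value
    if isinstance(value, str):
        return "str", "value_str", value
    if isinstance(value, datetime):
        # Assumption: naive datetimes are treated as UTC.
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        # Integer arithmetic avoids float rounding; datetime resolves to 1 us.
        return "time", "value_time_ns", ((value - EPOCH) // MICROSECOND) * 1000
    if isinstance(value, timedelta):
        return "duration", "value_duration_ns", (value // MICROSECOND) * 1000
    raise TypeError(f"unsupported attr type: {type(value).__name__}")
```

Checking bool first matters: without it, True would be stored as the integer 1 under value_kind = int.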