Integrating tabix-based index files for faster subgraph extraction #484

jmonlong · 2025-06-14T14:37:45Z

With this change, sequenceTubeMap can work with tabix-indexed files representing the pangenome and use a Python script to extract a subgraph quickly. For the HPRC MC v1.1 pangenome, querying a region takes less than a second on average, versus ~30s currently with vg chunk. All haplotypes in the pangenome can be queried.
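As a rough illustration of what a region query against a tabix-indexed file looks like from Python, here is a minimal sketch. It only wraps the standard `tabix FILE REGION` command line; the function names and the file name `pos.bed.gz` are hypothetical and are not taken from the script in this PR.

```python
import subprocess
from typing import List


def build_tabix_command(indexed_file: str, contig: str, start: int, end: int) -> List[str]:
    """Build the argument list for `tabix FILE CONTIG:START-END`."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return ["tabix", indexed_file, f"{contig}:{start}-{end}"]


def query_region(indexed_file: str, contig: str, start: int, end: int) -> List[List[str]]:
    """Run tabix and return matching records as lists of tab-separated fields.

    Assumes the `tabix` binary is on PATH and that the index file
    sits next to `indexed_file`.
    """
    cmd = build_tabix_command(indexed_file, contig, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return [line.split("\t") for line in result.stdout.splitlines() if line]
```

For example, `query_region("pos.bed.gz", "chr1", 100, 200)` would return only the records overlapping that interval, which is why the lookup cost stays small regardless of the size of the whole pangenome.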

More details on the tabix indexing and this subgraph extraction are available at https://github.com/jmonlong/manu-vggafannot

I've tried to document what the new index files are, how to use them, and how to make them in a new README.tabix.md file.

This branch also contains other minor changes, such as:

- Fixing a problem when toggling node transparency
- Making tracks transparent according to their mapq value

adamnovak · 2025-06-23T15:47:59Z

@jmonlong I did the back-merge from master, but it looks like, as with #460, the tests (npm run test -- --watchAll=false) don't pass for TrackPickerDisplay, and the schema change from there is still needed.

It also looks like the tabix-backed codepath can't mix at all with anything on the vg-backed codepath, so I can't use a tabix-backed graph to view a GAM or a small unindexed GFA, and I can't use the simplify switch that calls vg simplify. What's the right way to communicate that to users in the UI?

I also don't think this will work right with the file upload feature, since we can only upload one file per track but Tabix needs two (data and index). We might need to adjust the way that works to allow uploading multiple files per track.

It should be possible to make this work great with remote data URLs, by doing range reads on them directly, but I'm not sure if the tabix command line tool can do web requests itself.
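To make the "range reads" idea concrete, here is a minimal sketch of fetching a byte range from a remote file with the Python standard library. It shows only the HTTP mechanism; it does not parse the tabix index or decompress BGZF blocks, and the function names are hypothetical.

```python
import urllib.request


def make_range_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build a request for the inclusive byte range [first_byte, last_byte]."""
    if first_byte < 0 or last_byte < first_byte:
        raise ValueError("expected 0 <= first_byte <= last_byte")
    return urllib.request.Request(
        url, headers={"Range": f"bytes={first_byte}-{last_byte}"}
    )


def read_range(url: str, first_byte: int, last_byte: int) -> bytes:
    """Fetch the requested bytes, failing if the server ignores the Range header."""
    request = make_range_request(url, first_byte, last_byte)
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honoured the range.
        if response.status != 206:
            raise RuntimeError(f"server did not honour range request: {response.status}")
        return response.read()
```

A server that answers with 200 instead of 206 is sending the whole file, so the sketch treats that as an error rather than silently downloading everything.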

adamnovak · 2025-06-23T15:57:11Z

Instead of a "node track" and a "graph track", it might make more sense to present this feature as a graph track that consists of four files: the positions and nodes files and their indexes.

Is it reasonable to imagine drawing a view with only those four files, and no haplotypes? It looks like that is locked out right now, probably because without the haplotypes file there are no paths/edges at all, and the tube map needs those to draw anything.

To support that we might need to get rid of the 1 to 1 connection between having a haplotypes track file and displaying the haplotype paths. Someone might want to look at a tabix-backed graph but only see the reference paths in a particular view, but the haplotype database still needs to be included to see anything.

GBZ is a little like this because it has the haplotype data in the same file as the graph, and we fake it by having the GBZ file provide the graph track and also having it separately as the source of a haplotype track that can turn on and off.

Maybe we need to change the track model so that tracks come from databases, and databases are sets of n files that offer m tracks that you can toggle on and off.
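One way to picture the proposed model is sketched below, purely as an illustration: a database owns n files and exposes m tracks, and toggling a track off does not remove the files it depends on. The class names, track kinds, and file names are all hypothetical, and sequenceTubeMap itself is not written in Python.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Track:
    name: str
    kind: str  # hypothetical kinds, e.g. "graph" or "haplotype"
    enabled: bool = True


@dataclass
class Database:
    """A set of files that together offer one or more toggleable tracks."""
    files: List[str]
    tracks: Dict[str, Track] = field(default_factory=dict)

    def add_track(self, name: str, kind: str, enabled: bool = True) -> None:
        self.tracks[name] = Track(name, kind, enabled)

    def toggle(self, name: str) -> bool:
        """Flip a track's visibility and return the new state."""
        track = self.tracks[name]
        track.enabled = not track.enabled
        return track.enabled

    def visible_tracks(self) -> List[str]:
        return [t.name for t in self.tracks.values() if t.enabled]


# Hypothetical tabix-backed database: data files plus their indexes.
db = Database(files=[
    "nodes.tsv.gz", "nodes.tsv.gz.tbi",
    "pos.bed.gz", "pos.bed.gz.tbi",
    "haps.gaf.gz", "haps.gaf.gz.tbi",
])
db.add_track("graph", "graph")
db.add_track("haplotypes", "haplotype")
db.toggle("haplotypes")  # hide haplotype paths; the files stay attached
```

After the toggle, `db.visible_tracks()` lists only the graph track while `db.files` is unchanged, which is the decoupling between "file is loaded" and "paths are displayed" that the comment above asks for.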

jmonlong and others added 12 commits August 28, 2024 20:34

make nodes fully transparent again

25bf649

add a checkbox to have read transparency by mapq

90cb8d4

update docker config

93db207

Merge branch 'master' into color

5cf9051

integration of tabix-based index files

7371f81

document usage of new tabix-based index files

af05806

syntax polish

df8195d

fix a warning in the terminal

129868d

fix order=-1 issue causing white screen error

7c1ba2d

polish doc and error messages

ef460db

Merge branch 'tabix' of github.com:jmonlong/sequenceTubeMap into tabix

48926a9

Merge remote-tracking branch 'origin/master' into tabix

917db5f

adamnovak force-pushed the tabix branch from c4f20d8 to 917db5f Compare June 23, 2025 15:24

adamnovak added 2 commits July 15, 2025 14:49

Fix config files to add new field and pass tests

e6bd510

Document schema changes in types file

9bd8dbc

adamnovak merged commit ad62f98 into master Jul 15, 2025
1 check passed

3 participants
