Strange results #15

Hello,

Thanks for this amazing tool. I am using fastv in a perhaps unusual way: I want to detect the presence or absence of homologous gene clusters in metagenomic data. I started with ~17 million ORFs from our contigs and clustered them (mmseqs2, 95% identity) into almost 4 million clusters. I want to detect each cluster by detecting one or more unique k-mers from its representative sequence.

I initially used `unique_kmer` to identify unique 24-mers, but this took a long time, generated millions of files, and found no unique 24-mers for the majority of sequences. I fell back on jellyfish with 32-mers, plus a convoluted pipeline that aligns the k-mers against the cluster representatives and then filters the sorted SAM file to keep only non-overlapping k-mers. This way, the vast majority of cluster representatives had one or more unique k-mers (a mean of 3, and up to hundreds).
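To make the selection step concrete, here is a minimal in-memory sketch of the idea, not the actual jellyfish/aligner/SAM pipeline. It assumes the representatives fit in a Python dict, counts canonical k-mers (a k-mer and its reverse complement are treated as the same, which is an assumption about how uniqueness should be defined), and keeps unique k-mers greedily from left to right so that none overlap. The function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID_BASES = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def iter_kmers(seq, k):
    """Yield (position, canonical k-mer) for every k-mer made only of A/C/G/T."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= VALID_BASES:
            yield i, canonical(kmer)


def unique_nonoverlapping_kmers(representatives, k=32):
    """Map each sequence name to its non-overlapping k-mers that occur exactly once
    (as canonical k-mers) across all representatives."""
    counts = Counter()
    for seq in representatives.values():
        counts.update(km for _, km in iter_kmers(seq.upper(), k))

    chosen = {}
    for name, seq in representatives.items():
        next_free = 0  # first position not covered by an already chosen k-mer
        picks = []
        for pos, km in iter_kmers(seq.upper(), k):
            if counts[km] == 1 and pos >= next_free:
                picks.append((pos, km))
                next_free = pos + k
        if picks:
            chosen[name] = picks
    return chosen
```

For millions of representatives a disk-backed counter such as jellyfish is still the practical choice; the sketch only illustrates the uniqueness and non-overlap logic.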

I ran `fastv` with minimal filtering and the lowest thresholds (`-A -G -Q -L -p 0.001 -d 0.001`), but only ~100k of the ~3.4 million cluster representatives are _ever_ identified across all >200 samples.

I tested further by extracting only the unique k-mers from one sample and testing them against that sample's reads: no hits!

Yet, when I search with `seqkit`, I find that the sequence file does indeed contain this k-mer three times: "ATGAAATTCCATGGAATGGAATGGAATGGAAA". In the read below, the asterisks mark the k-mer and are not part of the read itself.

```
e.g.
@K00150:405:HGVM5BBXY:6:2119:16741:38381/1
ATGAAATTCCATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGA*ATGAAATTCCATGGAATGGAATGGAATGGAAA*AGAATGGAATGGAATGAAATTCCATGGAATGGAATGGAATGGAATGGAATGAAATTCC
+
AAAFFJJJJJFAJJJJFJAFAAFJJFAFFJJAJJFF-FJAJ<JFJFJFFJFJJJFFAJJJFFJJJJJJJJFFJFFAAFJJFAFJFA-FJJFFJJJJFFFJFFJJJJFJFJJFJJJJFJJJF7FJFFFJJJF7FJJAAAFJF7AFJJFFAF
```
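As an independent cross-check of the `seqkit` result, a small script can count exact occurrences of a k-mer in the reads on both strands. This is only a sketch: it assumes plain 4-line FASTQ records (optionally gzipped), does exact string matching, and says nothing about how `fastv` itself matches k-mers. The function names are hypothetical.

```python
import gzip

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def count_overlapping(seq, kmer):
    """Count occurrences of kmer in seq, including overlapping ones."""
    count = 0
    start = seq.find(kmer)
    while start != -1:
        count += 1
        start = seq.find(kmer, start + 1)
    return count


def count_kmer_in_reads(reads, kmer):
    """Return (forward_hits, reverse_complement_hits) over an iterable of reads."""
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    forward = reverse = 0
    for read in reads:
        read = read.upper()
        forward += count_overlapping(read, kmer)
        if rc != kmer:  # avoid double counting palindromic k-mers
            reverse += count_overlapping(read, rc)
    return forward, reverse


def fastq_sequences(path):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()
```

Usage would look like `count_kmer_in_reads(fastq_sequences("sample_R1.fastq.gz"), "ATGAAATTCCATGGAATGGAATGGAATGGAAA")`, where the file name is a placeholder. Reporting the strands separately shows whether the hits are on the forward strand, the reverse strand, or both.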

Can you advise why `fastv` does not seem to detect it?

Thanks!
