Samples missing from clinical data cause prevalence to be calculated incorrectly #1103

@sainadfensi

Describe the issue

Samples in which no mutations were found are dropped from the clinical data. Because the denominator then counts only samples with at least one variant, the reported prevalence is higher than the true value.
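To illustrate the size of the effect, here is a minimal sketch in base R. The sample counts 90 and 107 come from the code further down in this report; `n_mutated` is a made-up count for a hypothetical gene, used only to show how the denominator changes the result.

```r
# Hypothetical example: a gene mutated in 30 samples (made-up number).
n_mutated <- 30
n_with_variants <- 90   # samples that remain in the MAF object
n_cohort <- 107         # samples in the full clinical table

# Prevalence with the shrunken denominator vs. the full cohort.
prevalence_reported <- n_mutated / n_with_variants
prevalence_expected <- n_mutated / n_cohort

# Print both values as percentages for comparison.
print(round(100 * c(reported = prevalence_reported,
                    expected = prevalence_expected), 1))
```

The gap grows with the number of samples that have no variants at all.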

Command

library(maftools)

# Load the MAF file and set the genome build
maf = data.table::fread("xx")
maf$NCBI_Build = as.character(maf$NCBI_Build)
maf$NCBI_Build = "GRCh38"

# Restrict the clinical table to the group of interest
dat = dat[dat$column == Group, ]  # for my data, this should be 107 samples
# Keep only variants from samples in that group
maf = maf[Tumor_Sample_Barcode %in% dat$tmid, ]  # only 90 samples have variants

laml = read.maf(maf = maf,
                removeDuplicatedVariants = FALSE,
                vc_nonSyn = c("Frame_Shift_Del", "Frame_Shift_Ins", "Splice_Site",
                              "Translation_Start_Site", "Nonsense_Mutation",
                              "Nonstop_Mutation", "In_Frame_Del", "In_Frame_Ins",
                              "Missense_Mutation", "Non-coding"),
                verbose = FALSE,
                cnTable = cn_table,
                clinicalData = dat)


After loading, although clinicalData is set to dat (107 samples), the clinical data in the resulting object contains only 90 samples:

nrow(laml@clinical.data)
[1] 90
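The dropped samples can be listed by comparing the clinical table with the barcodes present in the MAF, and prevalence can be recomputed against the full cohort. This is only a sketch in base R with toy data: the sample IDs and gene rows are invented, and `dat_toy` and `maf_toy` are hypothetical stand-ins for `dat` and `maf`. It uses no maftools functions.

```r
# Toy stand-ins for the real tables; sample IDs and mutations are invented.
dat_toy <- data.frame(tmid = c("S1", "S2", "S3", "S4"),
                      stringsAsFactors = FALSE)
maf_toy <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S3"),
  Hugo_Symbol = c("TP53", "KRAS", "TP53"),
  stringsAsFactors = FALSE
)

# Samples present in the clinical table but with no variant in the MAF.
missing_samples <- setdiff(dat_toy$tmid, unique(maf_toy$Tumor_Sample_Barcode))

# Prevalence of one gene, using the full cohort as the denominator.
n_cohort <- length(unique(dat_toy$tmid))
is_tp53 <- maf_toy$Hugo_Symbol == "TP53"
n_tp53 <- length(unique(maf_toy$Tumor_Sample_Barcode[is_tp53]))
prev_tp53 <- n_tp53 / n_cohort

print(missing_samples)
print(prev_tp53)
```

On the real data, the same `setdiff` call on `dat$tmid` and `maf$Tumor_Sample_Barcode` should return the 17 samples without variants.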

I'm using version 2.22.0.
