junctionCounts version 1.0.0
The dpsi.csv output file from DEXSeq_comparison.R seems to be using the default R write.csv() function.
Unfortunately, for numerical fields, this has some drawbacks for human readers:
- excessive (and variable) digits to the right of the decimal point
- mix of exponential and floating point formats for the same field depending on value
I note that the format of outputs in the psi.tsv files from junctionCounts.py is much nicer.
I propose the following C-style formats for the fields in dpsi.csv:
%s event_id
%4.3f dpsi
%4.3e event_qval
%4.3f cond_mean_psi
%d cond_mean_ijc
%d cond_mean_ejc
%s event_type
%s chr
%d start
%d end
%s strand
%s gene
%d sig
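As a rough illustration of the formats listed above, here is a minimal Python sketch that applies them to one field at a time. The names FORMATS and format_field are hypothetical, and two behaviours are assumptions on my part rather than anything junctionCounts does: values destined for %d are rounded to the nearest integer, and missing values written by R as NA are passed through unchanged.

```python
# C-style formats from the list above, keyed by column name.
FORMATS = {
    "event_id": "%s",
    "dpsi": "%4.3f",
    "event_qval": "%4.3e",
    "cond_mean_psi": "%4.3f",
    "cond_mean_ijc": "%d",
    "cond_mean_ejc": "%d",
    "event_type": "%s",
    "chr": "%s",
    "start": "%d",
    "end": "%d",
    "strand": "%s",
    "gene": "%s",
    "sig": "%d",
}


def format_field(name, raw):
    """Format one raw string value from dpsi.csv according to FORMATS."""
    fmt = FORMATS[name]
    if fmt == "%s" or raw in ("", "NA"):
        # Strings and missing values are left as they are (assumption).
        return raw
    value = float(raw)  # also accepts exponential notation such as 1e+05
    if fmt == "%d":
        # Assumption: round to the nearest integer before printing.
        return "%d" % round(value)
    return fmt % value
```

For example, format_field("dpsi", "0.123456789") yields "0.123", and a start coordinate that R wrote as "1e+05" comes back as "100000".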
I have implemented this after converting the dpsi.csv file to a dpsi.tsv file. I also added an "abs_dpsi" field to make it easy to sort on the absolute value of dpsi, and I rearranged the order of the fields, putting "gene" at the end because it is inherently variable length. This makes everything more readable. Finally, I sorted the lines by event_qval (primary) and abs_dpsi (secondary), because that is of most interest to the biologists. (I did not include the sig field, so that a user can decide later what parameters to use to determine significance.)
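For illustration only, the post-processing described above could be sketched in Python roughly as follows. This is not the script used to produce the attached sample. The function name csv_to_sorted_tsv is hypothetical, and several details are assumptions: the input has a plain header row containing dpsi, event_qval and gene columns, event_qval is sorted ascending, abs_dpsi is sorted descending, and abs_dpsi is placed immediately after dpsi.

```python
import csv
import io


def csv_to_sorted_tsv(csv_text):
    """Convert dpsi.csv text to tab-separated text, sorted for readers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    # Drop sig, move gene to the end, and add abs_dpsi right after dpsi.
    columns = [c for c in reader.fieldnames if c not in ("sig", "gene")]
    columns.insert(columns.index("dpsi") + 1, "abs_dpsi")
    columns.append("gene")

    for row in rows:
        row["abs_dpsi"] = abs(float(row["dpsi"]))

    # Primary key: smallest q-value first. Secondary key: largest |dpsi| first.
    rows.sort(key=lambda r: (float(r["event_qval"]), -r["abs_dpsi"]))

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        row["abs_dpsi"] = "%4.3f" % row["abs_dpsi"]
        writer.writerow([row[c] for c in columns])
    return out.getvalue()
```

The other numeric columns are passed through untouched here; per-field formatting would be applied just before each row is written.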
Here is a sample:
SE_control_u2muca_dpsi.tsv
Sol Katzman
UC Santa Cruz Genomics Institute