Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
66 commits
Select commit Hold shift + click to select a range
3e8adbc
Merge pull request #1 from orenavram/master
shacharmo Feb 6, 2020
6f9af3b
Added support for local run (Linux)
shacharmo Feb 6, 2020
7507e8d
Fixed bug where the wrong faa file is passed as input to upper_case_s…
TsabarM Feb 13, 2020
95ab1c8
Added docker support including AWS upload
shacharmo Feb 14, 2020
8f4decb
Changed random forest running to support local run (previous code ass…
shacharmo Feb 14, 2020
e9b0a0e
Fixed docker file
shacharmo Feb 16, 2020
5d9d021
Added support for distributed (multi servers) run using celery (submi…
shacharmo Feb 17, 2020
622d3cf
Added prefetch multiplier of 1 to worker entrypoint
shacharmo Feb 17, 2020
1f84bd4
Added sleep with retries in verify_file_is_not_empty for late synchro…
shacharmo Feb 21, 2020
0308971
Changed batch sizes for better concurrency
shacharmo Feb 29, 2020
e5aecf7
Added support for setting meme_split_size
shacharmo Mar 2, 2020
5a9cc63
Added support for setting run script in docker.
shacharmo Mar 23, 2020
243e5b5
Calculating hits faster (base for td-idf)
shacharmo Mar 25, 2020
ea6736f
Completed hits
shacharmo Mar 27, 2020
8f55b7b
Implemented TF-IDF (not yet connected to wrapper)
shacharmo Mar 27, 2020
9b48eab
Completed TF-IDF incluing pipeline parameters
shacharmo Mar 27, 2020
e199b70
Fixed wrong faa input to scan peptides (sometimes unique instead of r…
shacharmo Mar 28, 2020
6d362c3
Skipping completed run to allow resuming the pipeline
shacharmo Apr 3, 2020
b6a340c
fix argument passing
shacharmo Apr 3, 2020
38ee982
Added tfidf output retaining original memes order
shacharmo Apr 3, 2020
1dc48ed
fixed correct pvalues merge for > 100 memes
shacharmo Apr 3, 2020
57c5c8a
Added calculate tf-idf score for BC sequences only (default)
shacharmo Apr 4, 2020
6efc0a2
fixed sorted output of pvalues
shacharmo Apr 4, 2020
4153e96
Added controlled shuffles generator
shacharmo Apr 7, 2020
c921beb
Fixed support for windows \r\n in C++ code
shacharmo Apr 24, 2020
d202fbd
Added hpp output to shuffles generator (python)
shacharmo Apr 24, 2020
a2c063f
Added whether to output sequeneces in hits
shacharmo Apr 25, 2020
25756be
Added shuffle hits (no rating yet)
shacharmo Apr 25, 2020
9bec78a
Added meme rating vs shuffles in hits
shacharmo Apr 25, 2020
e869259
Fixed log
shacharmo Apr 25, 2020
3f72aed
Added shuffle to pipeline
shacharmo Apr 25, 2020
da7ab40
Updated README
shacharmo May 7, 2020
5a6c331
Fixed running executable directly in cluster pipeline
shacharmo Jun 14, 2020
6d91370
added mapitope encoding
jonathanAK Jul 21, 2020
0408d12
removed break
jonathanAK Jul 21, 2020
33aa3b9
code review fixes
jonathanAK Jul 22, 2020
74888d6
distributed mapitope encoding
jonathanAK Jul 22, 2020
757a079
added mapitope encoding to phase 3
jonathanAK Jul 22, 2020
e53b506
added an option to set the minimum length of sequence to read in read…
TsabarM Aug 20, 2020
a215367
Added - option to set minimum sequence length to read in the main pip…
TsabarM Aug 20, 2020
61da5df
updated mapitope_conversion.py:
TsabarM Aug 30, 2020
78af65d
correction of small errors after local tests.
TsabarM Sep 27, 2020
279755c
fix PR+ extract n_splits to an argument
TsabarM Oct 5, 2020
f80b158
change n_splits to positinal argument
TsabarM Oct 5, 2020
496c59a
missing seed value in sample_configuration
TsabarM Oct 5, 2020
d119e0e
use only cv_num_splits parameter, no need for n_splits
TsabarM Oct 6, 2020
f813cf8
change the calculate of p-value
yael1994 Jan 26, 2021
48d2c60
Change following the sorting direction of the array
yael1994 Jan 26, 2021
6788490
change the pvalue, change normalization color RF and change mistake o…
yael1994 Feb 11, 2021
d761a25
changes PR
yael1994 Feb 11, 2021
306366b
changes the variable name from use_mapitop to use_mapitope
yael1994 Feb 11, 2021
f5dabaf
Merge pull request #16 from Webiks/pvalue_polynomial_fit
shacharmo Feb 11, 2021
b8e85e0
fix the variable names
yael1994 Feb 15, 2021
1218d3b
Merge pull request #17 from Webiks/fix_run_model_fitting
shacharmo Feb 15, 2021
8e60227
change the variable of df to train_data
yael1994 Mar 10, 2021
bb26b5c
Merge pull request #19 from Webiks/hits_color_heatmap
shacharmo Mar 14, 2021
a5a321a
change cd hit to 0.5 similarity
sandravoels Apr 12, 2021
7c356ef
paired end
sandravoels Apr 18, 2021
3f7188c
reads_filt_paired_end
sandravoels Apr 25, 2021
c499cd6
add commit to file
tehilayehudai Nov 22, 2021
684fd6b
add commit to file
tehilayehudai Nov 22, 2021
50ebad3
add commit to file
tehilayehudai Nov 22, 2021
b3828bf
add commit to file
tehilayehudai Nov 22, 2021
bfd21d8
add commit to file
tehilayehudai Nov 22, 2021
3349316
add commit to file
tehilayehudai Nov 23, 2021
8028278
check for frameshifts
tehilayehudai Dec 10, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
.*
__pycache_
docker
test*
output*
# mock_data
10 changes: 10 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
.venv
.vscode
__pycache__
test*
output*

UnitePSSMs/UnitePSSMs
PSSM_score_Peptide/PSSM_score_Peptide
hits_cpp/hits
tfidf/tfidf
42 changes: 42 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
FROM ubuntu:18.04

RUN apt update -y && \
apt install -y python3 python3-pip python3-venv && \
apt install -y g++ && \
apt install -y zlib1g-dev && \
apt install -y wget

RUN wget https://github.com/weizhongli/cdhit/releases/download/V4.8.1/cd-hit-v4.8.1-2019-0228.tar.gz && \
tar xzf cd-hit-v4.8.1-2019-0228.tar.gz && \
rm -rf cd-hit-v4.8.1-2019-0228.tar.gz && \
cd cd-hit-v4.8.1-2019-0228 && \
make && \
make install && \
cd .. && \
rm -rf cd-hit-v4.8.1-2019-0228

RUN wget https://mafft.cbrc.jp/alignment/software/mafft_7.450-1_amd64.deb && \
dpkg -i mafft_7.450-1_amd64.deb && \
rm -rf mafft_7.450-1_amd64.deb

RUN mkdir /app
WORKDIR /app

COPY requirements.txt /app
RUN python3 -m venv .venv && \
. .venv/bin/activate && \
pip install -r requirements.txt

COPY . /app
RUN cd UnitePSSMs && \
g++ *.cpp -std=c++11 -O3 -o UnitePSSMs && \
cd ../PSSM_score_Peptide && \
g++ *.cpp -std=c++11 -O3 -o PSSM_score_Peptide && \
cd ../hits_cpp && \
g++ *.cpp -std=c++11 -O3 -o hits && \
cd ../tfidf && \
g++ *.cpp -std=c++11 -O3 -o tfidf

ENV APP_FILE IgOmeProfiling_pipeline.py
ENTRYPOINT ["./entrypoint.sh"]
CMD ["-h"]
3 changes: 3 additions & 0 deletions DockerfileWorker
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
FROM webiks/igome-profile:latest

ENTRYPOINT ["./worker_entrypoint.sh"]
53 changes: 41 additions & 12 deletions IgOmeProfiling_pipeline.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,20 @@
import sys
if os.path.exists('/groups/pupko/orenavr2/'):
src_dir = '/groups/pupko/orenavr2/igomeProfilingPipeline/src'
else:
elif os.path.exists('/Users/Oren/Dropbox/Projects/'):
src_dir = '/Users/Oren/Dropbox/Projects/gershoni/src'
else:
src_dir = '.'
sys.path.insert(0, src_dir)

from auxiliaries.pipeline_auxiliaries import *

def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondition_path, analysis_dir, logs_dir,
left_construct, right_construct, max_mismatches_allowed, min_sequencing_quality, gz,
left_construct, right_construct, max_mismatches_allowed, min_sequencing_quality, minimal_length_required, gz,
max_msas_per_sample, max_msas_per_bc,
max_number_of_cluster_members_per_sample, max_number_of_cluster_members_per_bc,
allowed_gap_frequency, number_of_random_pssms,
allowed_gap_frequency, concurrent_cutoffs, meme_split_size, use_mapitope, number_of_random_pssms,
rank_method, tfidf_method, tfidf_factor, shuffles, shuffles_percent, shuffles_digits,
run_summary_path, error_path, queue, verbose, argv):

os.makedirs(os.path.split(run_summary_path)[0], exist_ok=True)
Expand All @@ -39,8 +42,8 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi

module_parameters = [fastq_path, first_phase_output_path, first_phase_logs_path,
barcode2samplename_path, left_construct, right_construct,
max_mismatches_allowed, min_sequencing_quality, first_phase_done_path,
'--gz' if gz else '', f'--error_path {error_path}', '-v' if verbose else '']
max_mismatches_allowed, min_sequencing_quality, minimal_length_required,first_phase_done_path,
'--gz' if gz else '', f'--error_path {error_path}', '-v' if verbose else '', '-m' if use_mapitope else '']
cmd = submit_pipeline_step(f'{src_dir}/reads_filtration/module_wraper.py',
[module_parameters],
logs_dir, f'{exp_name}_reads_filtration',
Expand All @@ -61,7 +64,10 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi
samplename2biologicalcondition_path, max_msas_per_sample, max_msas_per_bc,
max_number_of_cluster_members_per_sample, max_number_of_cluster_members_per_bc,
allowed_gap_frequency, second_phase_done_path,
f'--error_path {error_path}', '-v' if verbose else '', f'-q {queue}']
f'--meme_split_size {meme_split_size}',
f'--error_path {error_path}', '-v' if verbose else '', f'-q {queue}','-m' if use_mapitope else '']
if concurrent_cutoffs:
module_parameters.append('--concurrent_cutoffs')
cmd = submit_pipeline_step(f'{src_dir}/motif_inference/module_wraper.py',
[module_parameters],
logs_dir, f'{exp_name}_motif_inference',
Expand All @@ -71,7 +77,7 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi
error_file_path=error_path, suffix='motif_inference_done.txt')
else:
logger.info(f'{datetime.datetime.now()}: skipping motif inference. Done file exists at:\n{second_phase_done_path}')

third_phase_done_path = f'{logs_dir}/model_fitting_done.txt'
if not os.path.exists(third_phase_done_path):
os.makedirs(third_phase_output_path, exist_ok=True)
Expand All @@ -80,8 +86,16 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi

module_parameters = [first_phase_output_path, second_phase_output_path, third_phase_output_path,
third_phase_logs_path, samplename2biologicalcondition_path, number_of_random_pssms,
third_phase_done_path, f'--error_path {error_path}', '-v' if verbose else '',
f'-q {queue}']
third_phase_done_path, f'--shuffles_percent {shuffles_percent}', f'--shuffles_digits {shuffles_digits}' ,f'--rank_method {rank_method}', f'--error_path {error_path}',
'-v' if verbose else '', f'-q {queue}','-m' if use_mapitope else '']
if rank_method == 'tfidf':
if tfidf_method:
module_parameters += ['--tfidf_method', tfidf_method]
if tfidf_factor:
module_parameters += ['--tfidf_factor', str(tfidf_factor)]
elif rank_method == 'shuffles':
if shuffles:
module_parameters += ['--shuffles', shuffles]
cmd = submit_pipeline_step(f'{src_dir}/model_fitting/module_wraper.py',
[module_parameters],
logs_dir, f'{exp_name}_model_fitting',
Expand Down Expand Up @@ -123,6 +137,8 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi
parser.add_argument('--min_sequencing_quality', type=int, default=38,
help='Minimum average sequencing threshold allowed after filtration'
'for more details, see: https://en.wikipedia.org/wiki/Phred_quality_score')
parser.add_argument('--minimal_length_required', default=3, type=int,
help='Shorter peptides will be discarded')
parser.add_argument('--gz', action='store_true', help='gzip fastq, filtration_log, fna, and faa files')

# optional parameters for the motif inference
Expand All @@ -138,9 +154,20 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi
help='Maximal gap frequency allowed in msa (higher frequency columns are removed)',
type=lambda x: float(x) if 0 < float(x) < 1
else parser.error(f'The threshold of the maximal gap frequency allowed per column should be between 0 to 1'))
parser.add_argument('--concurrent_cutoffs', action='store_true',
help='Use new method which splits meme before cutoffs and runs cutoffs concurrently')
parser.add_argument('--meme_split_size', type=int, default=1, # TODO default of 1, 5 or 10?
help='Split size, how many meme per files for calculations')
parser.add_argument('-m', '--mapitope', action='store_true', help='use mapitope encoding')

# optional parameters for the modelling step
parser.add_argument('--number_of_random_pssms', default=100, type=int, help='Number of pssm permutations')
parser.add_argument('--rank_method', choices=['pval', 'tfidf', 'shuffles'], default='pval', help='Motifs ranking method')
parser.add_argument('--tfidf_method', choices=['boolean', 'terms', 'log', 'augmented'], default='boolean', help='TF-IDF method')
parser.add_argument('--tfidf_factor', type=float, default=0.5, help='TF-IDF augmented method factor (0-1)')
parser.add_argument('--shuffles', default=5, type=int, help='Number of controlled shuffles permutations')
parser.add_argument('--shuffles_percent', default=0.2, type=float, help='Percent from shuffle with greatest number of hits (0-1)')
parser.add_argument('--shuffles_digits', default=2, type=int, help='Number of digits after the point to print in scanning files.')

# general optional parameters
parser.add_argument('--run_summary_path', type=str,
Expand All @@ -161,11 +188,13 @@ def run_pipeline(fastq_path, barcode2samplename_path, samplename2biologicalcondi
run_summary_path = args.error_path if args.error_path else os.path.join(args.analysis_dir, 'run_summary_path.txt')
error_path = args.error_path if args.error_path else os.path.join(args.logs_dir, 'error.txt')

concurrent_cutoffs = True if args.concurrent_cutoffs else False

run_pipeline(args.fastq_path, args.barcode2samplename_path, args.samplename2biologicalcondition_path,
args.analysis_dir.rstrip('/'), args.logs_dir.rstrip('/'),
args.left_construct, args.right_construct, args.max_mismatches_allowed, args.min_sequencing_quality, True if args.gz else False,
args.left_construct, args.right_construct, args.max_mismatches_allowed, args.min_sequencing_quality, args.minimal_length_required, True if args.gz else False,
args.max_msas_per_sample, args.max_msas_per_bc,
args.max_number_of_cluster_members_per_sample, args.max_number_of_cluster_members_per_bc,
args.allowed_gap_frequency, args.number_of_random_pssms,
args.allowed_gap_frequency, concurrent_cutoffs, args.meme_split_size, args.mapitope, args.number_of_random_pssms,
args.rank_method, args.tfidf_method, args.tfidf_factor, args.shuffles, args.shuffles_percent, args.shuffles_digits,
run_summary_path, error_path, args.queue, True if args.verbose else False, sys.argv)

5 changes: 3 additions & 2 deletions PSSM_score_Peptide/PSSM.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
#include "SEQ.h"

#include <algorithm>
#include <random>
using namespace std;

void PSSM::setMatrix(const vector<string> & PSSMLines)
Expand Down Expand Up @@ -179,8 +180,8 @@ double PSSM::computeScoreExacrPos(size_t posInPSSM, const size_t charInSeq) cons
}// end of function
*/

PSSM PSSM::randomize() {
PSSM PSSM::randomize(default_random_engine &gen) {
PSSM res(*this);
random_shuffle(res.PSSMmatrix.begin(), res.PSSMmatrix.end());
shuffle(res.PSSMmatrix.begin(), res.PSSMmatrix.end(), gen);
return res;
}
3 changes: 2 additions & 1 deletion PSSM_score_Peptide/PSSM.h
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
#include <algorithm> // std::min_element, std::max_element
#include <iostream>
#include <cmath> // because of unix
#include <random>

using namespace std;
#include "alphabet.h"
Expand Down Expand Up @@ -43,7 +44,7 @@ class PSSM {
return PSSMmatrix.size();

}
PSSM randomize();
PSSM randomize(default_random_engine &gen);
//int integerOfChar(const char s) const;

//members:
Expand Down
11 changes: 9 additions & 2 deletions PSSM_score_Peptide/computePSSM_cutoffs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,13 @@ void PSSM_scoresFromSeqVector(const PSSM& PSSM1, const vector<vector <size_t>> &
computePSSM_cutoffs::computePSSM_cutoffs(vector<PSSM> & PSSM_array,
size_t TotalNumberOfRandoSeq,
alphabet& alph,
const string & CutofsPerPSSM_FileName) : _PSSM_array(PSSM_array), _totalNumberOfRandoSeq(TotalNumberOfRandoSeq), _alph(alph), _CutofsPerPSSM_FileName(CutofsPerPSSM_FileName)
const string & CutofsPerPSSM_FileName,
int totalMemes) :
_PSSM_array(PSSM_array),
_totalNumberOfRandoSeq(TotalNumberOfRandoSeq),
_alph(alph),
_CutofsPerPSSM_FileName(CutofsPerPSSM_FileName),
_totalMemes(totalMemes)
{
generateRandomPeptides();
computecCutoffsBasedOnRandomPeptides();
Expand All @@ -23,6 +29,7 @@ string SizetToString(size_t sz) {

void computePSSM_cutoffs::generateRandomPeptides() {
size_t NumberOfRandoSeq = _totalNumberOfRandoSeq;
srand(931); // Set srand for generating random pepties // TODO set srand from input argument
map<string, randomPeptides>::iterator it = _randomPeptideDataSet.begin(); // use iteration and insert to add values to map
for (size_t length = 5; length <= 12; length++)
{
Expand Down Expand Up @@ -73,7 +80,7 @@ void computePSSM_cutoffs::computecCutoffsBasedOnRandomPeptides() {
ofstream PSSM_Scores_Cutoff;
PSSM_Scores_Cutoff.open(_CutofsPerPSSM_FileName);

double PercentOfAcceptedPeptidesPerType = PercentOfRandomHitsPerPSSM / _PSSM_array.size(); // for each seq type the cutoff will be the percent of hits accepted for all PSSMs devided by the number of PSSMs
double PercentOfAcceptedPeptidesPerType = PercentOfRandomHitsPerPSSM / _totalMemes; // for each seq type the cutoff will be the percent of hits accepted for all PSSMs devided by the number of PSSMs
for (size_t i = 0; i<_PSSM_array.size(); ++i)
{
PSSM_Scores_Cutoff << "###\t" << _PSSM_array[i].PSSM_name << "\t";
Expand Down
10 changes: 6 additions & 4 deletions PSSM_score_Peptide/computePSSM_cutoffs.h
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,11 @@ using namespace std;

class computePSSM_cutoffs{
public:
computePSSM_cutoffs( vector<PSSM> & PSSM_array,
size_t TotalNumberOfRandoSeq,
alphabet & alph,
const string & CutofsPerPSSM_FileName);
computePSSM_cutoffs(vector<PSSM> & PSSM_array,
size_t TotalNumberOfRandoSeq,
alphabet & alph,
const string & CutofsPerPSSM_FileName,
int totalMemes);

private:
void generateRandomPeptides();
Expand All @@ -32,6 +33,7 @@ class computePSSM_cutoffs{
string const & _CutofsPerPSSM_FileName;
const double PercentOfRandomHitsPerPSSM = 0.05;
alphabet& _alph;
int _totalMemes;



Expand Down
20 changes: 13 additions & 7 deletions PSSM_score_Peptide/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
#include <limits>
#include <cfloat> // eli because of unix
#include <cmath> // because of unix
#include <random>
using namespace std;

#include "PSSM.h"
Expand Down Expand Up @@ -67,7 +68,7 @@ size_t get_running_mode(int argc, char *argv[]){
int min_required_params = 2;
if ((argc > (max_required_params * 2) + 2) || (argc < (min_required_params * 2) + 2)) {// each with its flag and mode flag, check the value of argc. If not enough parameters have been passed, inform user and exit.
cout << "Usage is in one of few modes: <<endl";
cout << "[1: CalcPSSM_Cutoff] " << argv[0] << " -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs> -CalcPSSM_Cutoff" << endl; // Inform the user of how to use the program
cout << "[1: CalcPSSM_Cutoff] " << argv[0] << " -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs> -CalcPSSM_Cutoff -total_memes <total memes, used if input is splitted otherwise 0>" << endl; // Inform the user of how to use the program
cout << "[2: CalcPSSM_Pval] "<<argv[0]<<" -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs> -seq <input_seq_FASTA> -out <out> -NrandPSSM <number_of_random_PSSMs> -CalcPSSM_Pval" << endl; // Inform the user of how to use the program
cout << "[3: CalcPSSM_Hits] " << argv[0] << " -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs> -seq <input_seq_FASTA> -out <out> -CalcPSSM_Hits " << endl; // Inform the user of how to use the program
cout << "\n\nThe number of provided arguments is "<<argc<<endl;
Expand Down Expand Up @@ -95,11 +96,11 @@ size_t get_running_mode(int argc, char *argv[]){
return mode;
}

void getFileNamesFromArgv(int argc, char *argv[], string & PSSM_FileName, string & CutofsPerPSSM_FileName) {
void getFileNamesFromArgv(int argc, char *argv[], string & PSSM_FileName, string & CutofsPerPSSM_FileName, int & totalMemes) {
// parse ARGV arguments
size_t num_required_params = 2;
size_t num_required_params = 3;
if (argc != (num_required_params * 2)+2) {// each with its flag and mode_flag, check the value of argc. If not enough parameters have been passed, inform user and exit.
cout << "Usage is -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs>\n"; // Inform the user of how to use the program
cout << "Usage is -pssm <PSSMs_in_MAST_Format> -pssm_cutoffs <filename_for PSSM_cutoffs> -total_memes <total memes, used if input is splitted otherwise 0>\n"; // Inform the user of how to use the program
exit(17);
}
cout << argv[0] <<" ";
Expand All @@ -110,6 +111,7 @@ void getFileNamesFromArgv(int argc, char *argv[], string & PSSM_FileName, string
* name of the program, which is stored in argv[0] */
if (string(argv[i]) == "-pssm") PSSM_FileName = string(argv[i + 1]);
else if (string(argv[i]) == "-pssm_cutoffs") CutofsPerPSSM_FileName = string(argv[i + 1]);
else if (string(argv[i]) == "-total_memes") totalMemes = stoi(string(argv[i + 1]));
}
}

Expand Down Expand Up @@ -165,9 +167,12 @@ int computeCutoffsOfPssmMain(int argc, char *argv[])
size_t TotalNumberOfRandoSeq=100000;
string PSSM_FileName = "";
string CutofsPerPSSM_FileName = "";
getFileNamesFromArgv(argc,argv,PSSM_FileName, CutofsPerPSSM_FileName);
int totalMemes = 0;
getFileNamesFromArgv(argc,argv,PSSM_FileName, CutofsPerPSSM_FileName, totalMemes);
readPSSM_info_from_file rpif(PSSM_FileName);
computePSSM_cutoffs cpc1(rpif._PSSM_array, TotalNumberOfRandoSeq, rpif._alph, CutofsPerPSSM_FileName);
if (totalMemes == 0)
totalMemes = rpif._PSSM_array.size();
computePSSM_cutoffs cpc1(rpif._PSSM_array, TotalNumberOfRandoSeq, rpif._alph, CutofsPerPSSM_FileName, totalMemes);
return 0;
}

Expand Down Expand Up @@ -461,8 +466,9 @@ int assignPvalueToPSSMaRRAY(int argc, char *argv[])
//for (size_t i = 0; i < 1; ++i) {
double numberOfHitsInRealPSSM = numberOfTotalHitsPerPSSM(rpif._PSSM_array[i], Seq_array,1);
vector<double> numSigPeptides;
default_random_engine gen(483); // TODO seed should be fro input
for (size_t j = 0; j < numberOfRandomPSSM; ++j) {
PSSM randomPSSM = rpif._PSSM_array[i].randomize(); //1 generate a random PPSM.
PSSM randomPSSM = rpif._PSSM_array[i].randomize(gen); //1 generate a random PPSM.
double sum = numberOfTotalHitsPerPSSM(randomPSSM, Seq_array,0);
numSigPeptides.push_back(sum);// store the number
}
Expand Down
11 changes: 7 additions & 4 deletions PSSM_score_Peptide/randomPeptides.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10,13 +10,16 @@
using namespace std;

#include "randomPeptides.h"
randomPeptides::randomPeptides(vector<double> &characterFrequencies, size_t numberOfSeqToSimulate, size_t sequenceLength) : _characterFrequencies(characterFrequencies), _numberOfSeqToSimulate(numberOfSeqToSimulate), _sequenceLength(sequenceLength),_CysLoop (0) { // deafualt constroctor - NoCysLoop
srand(static_cast<unsigned int>(time(NULL)));
randomPeptides::randomPeptides(vector<double> &characterFrequencies, size_t numberOfSeqToSimulate, size_t sequenceLength) :
_characterFrequencies(characterFrequencies), _numberOfSeqToSimulate(numberOfSeqToSimulate), _sequenceLength(sequenceLength),_CysLoop (0) { // deafualt constroctor - NoCysLoop
// srand(static_cast<unsigned int>(time(NULL))); // srand is set using seed at a higher level
}

randomPeptides::randomPeptides(vector<double> &characterFrequencies, size_t numberOfSeqToSimulate, size_t sequenceLength,bool CysLoop) : _characterFrequencies(characterFrequencies), _numberOfSeqToSimulate(numberOfSeqToSimulate), _sequenceLength(sequenceLength),_CysLoop (CysLoop) {
srand(static_cast<unsigned int>(time(NULL)));
randomPeptides::randomPeptides(vector<double> &characterFrequencies, size_t numberOfSeqToSimulate, size_t sequenceLength,bool CysLoop) :
_characterFrequencies(characterFrequencies), _numberOfSeqToSimulate(numberOfSeqToSimulate), _sequenceLength(sequenceLength),_CysLoop (CysLoop) {
// srand(static_cast<unsigned int>(time(NULL))); // srand is set using seed at a higher level
}

void randomPeptides::generateRandomSequences() {
// std::mt19937 Seq_eng; // a core engine class
// std::random_device dev_random;
Expand Down
Loading