-
Notifications
You must be signed in to change notification settings - Fork 0
Bidirectional LSTM recurrent neural network python program designed to utilize FASTA sequence data to predict miRNA molecules for specified areas.
License
SephDavis/miRNAPredictionAI
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Here's a breakdown of how the program works: 1. Import necessary libraries and modules, such as NumPy, Keras, and scikit-learn, which provide functionalities for working with arrays, deep learning models, and data preprocessing. 2. Define the parameters for the model, such as sequence length, number of hidden units, number of output units, batch size, and number of epochs. 3. Load the data from two FASTA files, one containing precursor miRNA sequences and the other containing X chromosome sequences. Preprocess the data by removing any characters that are not A, C, G, U, or T, and then tokenize the sequences to map nucleotides to numerical values. 4. Encode the sequences by converting the nucleotide characters to their corresponding numerical values using a tokenizer. Pad the sequences to a fixed length. 5. Define a bidirectional LSTM model with an embedding layer, three bidirectional LSTM layers, and a time-distributed dense layer with a softmax activation function. 6. Compile the model using categorical crossentropy loss, the Adam optimizer, and accuracy as a performance metric. 7. Prepare the input and output data for training by converting the numerical values of the sequences to one-hot encoded arrays. 8. Split the input and output data into training and validation sets using train_test_split. 9. Train the model on the training data for the specified number of epochs, using the validation data to evaluate the model's performance during training. 10. Generate predictions for the X chromosome sequences using a sliding window approach, where each window of the specified sequence length is input into the model. 11. Convert the predicted sequences back to their original nucleotide representations. 12. Save the predicted sequences to a text file and print a confirmation message. By following these steps, the program trains a deep learning model to predict potential miRNA precursor sequences in the X chromosome using precursor miRNA sequences as training data. It should be noted that this program can work with any set of miRNA molecules for prediction across any chromosome.
About
Bidirectional LSTM recurrent neural network python program designed to utilize FASTA sequence data to predict miRNA molecules for specified areas.
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published