Hi, thanks for providing such a cool framework to the community. I have a few questions about generating 5' UTRs. For my use case, I would like to avoid generating sequences that contain the `ATG` start codon motif, and I used `Optimus50` (for 50 nt sequences) rather than the evaluation model shown in the tutorial (for 54 nt sequences).
In both scenarios, however, it always generated sequences with `ATG`s, and in the original BMC Bioinfo paper the designed 5' UTRs also always contain `ATG` motifs.
I also noticed the punish ATG function:
def get_punish_atg(pwm_start, pwm_end) :
    def punish(pwm) :
        atg_score = K.sum(pwm[..., pwm_start:pwm_end-2, 0, 0] * pwm[..., pwm_start+1:pwm_end-1, 3, 0] * pwm[..., pwm_start+1:pwm_end-1, 2, 0], axis=-1)
        return atg_score
    return punish
but I don't quite follow the position slice for the third nucleotide, G: `pwm[..., pwm_start+1:pwm_end-1, 2, 0]`. Shouldn't it be `pwm[..., pwm_start+2:pwm_end, 2, 0]`?
As written, would it penalize `ATG` only at the end of the sequence, or all possible `ATG` motifs along the sequence? I also tried adding a high weight to the `seq_loss` term corresponding to the ATG penalty, but it still generates sequences with ATGs.
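To make the indexing question concrete, here is a minimal NumPy sketch of the two slicing variants. It is not the framework's code: `one_hot`, `atg_score_as_posted`, and `atg_score_shifted` are hypothetical helpers, and it assumes a PWM of shape `(length, 4, 1)` with channel order A=0, C=1, G=2, T=3, which is what the indices in `punish` seem to imply.

```python
import numpy as np

def one_hot(seq):
    # Assumption: channel order A=0, C=1, G=2, T=3; PWM shape (length, 4, 1).
    idx = {"A": 0, "C": 1, "G": 2, "T": 3}
    pwm = np.zeros((len(seq), 4, 1))
    for pos, nt in enumerate(seq):
        pwm[pos, idx[nt], 0] = 1.0
    return pwm

def atg_score_as_posted(pwm, pwm_start, pwm_end):
    # Same slices as the quoted punish(): T and G are read from the SAME position.
    a = pwm[..., pwm_start:pwm_end - 2, 0, 0]
    t = pwm[..., pwm_start + 1:pwm_end - 1, 3, 0]
    g = pwm[..., pwm_start + 1:pwm_end - 1, 2, 0]
    return np.sum(a * t * g, axis=-1)

def atg_score_shifted(pwm, pwm_start, pwm_end):
    # Proposed variant: G is read one position after T, so A, T, G are consecutive.
    a = pwm[..., pwm_start:pwm_end - 2, 0, 0]
    t = pwm[..., pwm_start + 1:pwm_end - 1, 3, 0]
    g = pwm[..., pwm_start + 2:pwm_end, 2, 0]
    return np.sum(a * t * g, axis=-1)

seq = "CCATGCCATGAA"  # contains ATG at positions 2 and 7
pwm = one_hot(seq)
print(atg_score_as_posted(pwm, 0, len(seq)))  # 0.0
print(atg_score_shifted(pwm, 0, len(seq)))    # 2.0
```

Under these assumptions, the slices as posted multiply the T and G channels at the same position, so on a one-hot sequence the product is zero everywhere and no `ATG` is counted, wherever it occurs. The shifted slice counts every `ATG` between `pwm_start` and `pwm_end`. On a relaxed (non-one-hot) PWM the posted version is not exactly zero, but it would still be scoring positions that are simultaneously T-like and G-like rather than a consecutive A, T, G.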
Thanks in advance for your time.