content/dscode.md: 45 additions & 46 deletions
@@ -31,50 +31,49 @@ However, they do not address the single or double strandedness of DNA.
The dscode alphabet is a superset of the IUPAC alphabet. The symbols take on a different meaning, as each symbol represents a base pair (a base in one DNA strand and its complementary base on the other strand) instead of a single base.

-The alphabet uses an additional ten symbols to represent single stranded regions where there is no complementary base (see table below). Dscode remains 100% backward compatible with the IUPAC alphabet. [link](https://docs.google.com/document/d/1QAjGeCByWemjVZnva7ap3sg7e8wIwVUK4-pIjt4GIZ0/edit?usp=sharing)
-| Alphabet | Symbol | Complement | Bases | dsIUPAC extended meaning |
|**dscode**| U | O | U in top strand, A in complementary strand | U/A |
+| " | O | U | A in top strand, U in complementary strand | A/U |
+| " | E | F | A in top strand, complementary strand empty | A/◻ |
+|**"**| I | J | C " | C/◻ |
+|**"**| P | Q | G " | G/◻ |
+|**"**| X | Z | T " | T/◻ |
+|**"**| Z | X | A in complementary strand, top strand empty | ◻/A |
+|**"**| Q | P | C " | ◻/C |
+|**"**| J | I | G " | ◻/G |
+|**"**| F | E | T " | ◻/T |
+| " | ! | A | A in upper strand A in lower strand | A/A |
+| " | # | C | A in upper strand C in lower strand | A/C |
+|**"**| $ | G | A in upper strand G in lower strand | A/G |
+|**"**| % | A | C in upper strand A in lower strand | C/A |
+|**"**| & | C | C in upper strand C in lower strand | C/C |
+| " | * | T | C in upper strand T in lower strand | C/T |
+| " | ( | A | G in upper strand A in lower strand | G/A |
+|**"**| ) | G | G in upper strand G in lower strand | G/G |
+|**"**| < | T | G in upper strand T in lower strand | G/T |
+|**"**| > | C | T in upper strand C in lower strand | T/C |
+|**"**| @ | G | T in upper strand G in lower strand | T/G |
+|**"**| : | T | T in upper strand T in lower strand | T/T |
+|**"**| ? | G | U in upper strand G in lower strand | U/G |
+|**"**|[| C | U in upper strand C in lower strand | U/C |
+|**"**|]| T | U in upper strand T in lower strand | U/T |

The symbols PEXI and QFZJ, which are not occupied by the extended IUPAC alphabet, were adopted to denote single-stranded DNA on either strand, where no complementary base exists.
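
The mapping in the table can be sketched in Python. This is only an illustration under stated assumptions: the function name `to_dscode`, the `-` gap character, and the requirement that both strands are given pre-aligned and left to right are hypothetical choices made here, not part of any dscode implementation. It covers only the Watson-Crick pairs, the U/A and A/U pairs, and the eight single-stranded symbols; the mismatch symbols are left out.

```python
# Illustrative sketch only; names and the "-" gap convention are assumptions.

# Top strand base present, complementary strand empty (rows E, I, P, X).
TOP_ONLY = {"A": "E", "C": "I", "G": "P", "T": "X"}
# Complementary strand base present, top strand empty (rows Z, Q, J, F).
BOTTOM_ONLY = {"A": "Z", "C": "Q", "G": "J", "T": "F"}
# Regular Watson-Crick pairs keep the IUPAC symbol of the top strand.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
# Uracil-containing pairs from the table (U/A -> U, A/U -> O).
URACIL = {("U", "A"): "U", ("A", "U"): "O"}


def to_dscode(top: str, bottom: str, gap: str = "-") -> str:
    """Encode two aligned strands as one dscode string."""
    if len(top) != len(bottom):
        raise ValueError("strands must be aligned to the same length")
    out = []
    for t, b in zip(top, bottom):
        pair = (t.upper(), b.upper())
        if t == gap and b == gap:
            raise ValueError("a position cannot be empty on both strands")
        if b == gap:
            if pair[0] not in TOP_ONLY:
                raise ValueError(f"no single-stranded symbol for {t!r}")
            out.append(TOP_ONLY[pair[0]])
        elif t == gap:
            if pair[1] not in BOTTOM_ONLY:
                raise ValueError(f"no single-stranded symbol for {b!r}")
            out.append(BOTTOM_ONLY[pair[1]])
        elif pair in URACIL:
            out.append(URACIL[pair])
        elif pair in WATSON_CRICK:
            out.append(t)  # keep the case of the top-strand base
        else:
            raise ValueError(f"mismatch {t}/{b} is not handled in this sketch")
    return "".join(out)


print(to_dscode("GATCaUaAa----", "----tAtUtCTAG"))
```

With the staggered duplex `GATCaUaAa` over `tAtUtCTAG`, padded with gaps as above, the sketch yields `PEXIaUaOaQFZJ`.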
@@ -84,7 +83,7 @@ GATCaUaAa ad-hoc representation
tAtUtCTAG

-PEXIaUaOaQFZJ representation using dsIUPAC
+PEXIaUaOaQFZJ representation using dscode
```
The choice of symbols for the dscode extension facilitates intuitive recognition of compatible single-stranded regions, i.e. sticky ends. The symbols that can anneal are adjacent in the alphabet, e.g. `Q-P`, `E-F`, `I-J`; the only exception is `X-Z`, which is necessary because Y is part of the IUPAC alphabet.
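
The annealing rule can be checked mechanically. The sketch below is a hedged illustration, not a reference implementation: `can_anneal` and `anneal` are hypothetical names, and it assumes the two overhangs are already written position by position in the same left-to-right orientation.

```python
# Illustrative sketch only; function names and alignment are assumptions.

# Annealing partners, taken from the Complement column of the table.
PARTNER = {"E": "F", "F": "E", "I": "J", "J": "I",
           "P": "Q", "Q": "P", "X": "Z", "Z": "X"}

# Base-pair symbol (top-strand base) formed when a symbol meets its partner.
PAIRED = {"E": "A", "F": "A", "I": "C", "J": "C",
          "P": "G", "Q": "G", "X": "T", "Z": "T"}


def can_anneal(a: str, b: str) -> bool:
    """True if overhangs a and b pair up symbol by symbol."""
    return len(a) == len(b) and all(PARTNER.get(x) == y for x, y in zip(a, b))


def anneal(a: str, b: str) -> str:
    """Return the double-stranded symbols formed by two compatible overhangs."""
    if not can_anneal(a, b):
        raise ValueError("overhangs are not compatible")
    return "".join(PAIRED[x] for x in a)


print(can_anneal("PEXI", "QFZJ"))
print(anneal("PEXI", "QFZJ"))
```

Here `PEXI` (top strand `GATC`, bottom empty) pairs with `QFZJ` (bottom strand `CTAG`, top empty), so the annealed region reads `GATC` in plain IUPAC symbols.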