|
| 1 | +# RNA / DNA Design in RFdiffusion3 |
| 2 | + |
| 3 | +This guide describes extensions to RFdiffusion3 for nucleic acid and hybrid RNA–protein design, including: |
| 4 | + |
| 5 | +- RNA/DNA-aware contigs (`R` / `D` suffix) |
| 6 | +- Ligand-conditioned aptamer design |
| 7 | +- Secondary structure (SS) conditioning |
| 8 | +- Base-pair constraints (region- and position-level) |
| 9 | +- Partial structure fixing and unindexing |
| 10 | + |
| 11 | +--- |
| 12 | + |
| 13 | +## 1. Contig Syntax for RNA/DNA |
| 14 | + |
| 15 | +Contigs now support nucleic acid specification: |
| 16 | + |
| 17 | +- `R` → RNA segment |
| 18 | +- `D` → DNA segment |
| 19 | +- No suffix → protein (default) |
| 20 | + |
| 21 | +### Example |
| 22 | + |
| 23 | +```json |
| 24 | +{ |
| 25 | + "contig": "40-50R,/0,10-20D,/0,80-110" |
| 26 | +} |
| 27 | +``` |
| 28 | +This corresponds to: 40–50 nt RNA, chain break, 10–20 nt DNA, chain break, 80–110 aa protein. |
| 29 | + |
| 30 | +### Multipolymer Design |
| 31 | + |
| 32 | +```json |
| 33 | + |
| 34 | +{ |
| 35 | + "multipolymer": { |
| 36 | + "contig": "40-50R,/0,10-20D,/0,80-110", |
| 37 | + "length": "130-180", |
| 38 | + "input": "../input_pdbs/AMP.pdb" |
| 39 | + } |
| 40 | +} |
| 41 | +``` |
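The contig grammar described above can be sanity-checked before a run. The following is a minimal sketch, not the parser RFdiffusion3 itself uses: the helper names `parse_contig` and `length_range` are hypothetical, and it assumes that a trailing `R`/`D` marks a designed nucleic acid segment while a leading chain letter (for example `C1-4`) marks residues copied from the input structure.

```python
import re

# Designed segment: "40-50R" (RNA), "10-20D" (DNA), "80-110" (protein).
DESIGNED = re.compile(r"^(\d+)-(\d+)([RD]?)$")
# Assumed form of a fixed segment taken from the input file: "C1-4" or "B321".
FIXED = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")
POLYMER = {"R": "rna", "D": "dna", "": "protein"}


def parse_contig(contig):
    """Split a contig string into a list of segment dictionaries."""
    segments = []
    for token in contig.split(","):
        token = token.strip()
        designed = DESIGNED.match(token)
        fixed = FIXED.match(token)
        if token == "/0":
            segments.append({"kind": "break"})
        elif designed:
            lo, hi = int(designed.group(1)), int(designed.group(2))
            if lo > hi:
                raise ValueError(f"empty length range in {token!r}")
            segments.append({"kind": "designed",
                             "polymer": POLYMER[designed.group(3)],
                             "min": lo, "max": hi})
        elif fixed:
            start = int(fixed.group(2))
            end = int(fixed.group(3) or start)
            count = end - start + 1
            segments.append({"kind": "fixed", "chain": fixed.group(1),
                             "min": count, "max": count})
        else:
            raise ValueError(f"unrecognised contig token: {token!r}")
    return segments


def length_range(contig):
    """Return (min_total, max_total) over all non-break segments."""
    segs = [s for s in parse_contig(contig) if s["kind"] != "break"]
    return sum(s["min"] for s in segs), sum(s["max"] for s in segs)


print(length_range("40-50R,/0,10-20D,/0,80-110"))  # (130, 180)
```

The result `(130, 180)` agrees with the `"length": "130-180"` field in the multipolymer example, which suggests `length` is expected to bracket the summed segment lengths.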
| 42 | + |
| 43 | +## 2. Secondary Structure Conditioning |
| 44 | +### 2.1 Dot-Bracket Notation (Global) |
| 45 | +```json |
| 46 | +{ |
| 47 | + "W05": { |
| 48 | + "ss_dbn": ".(((((((((((((((((((..[[[[[[.)))))(((....)))(((....)))))))))))))))))((((((..]]]]]].)))))).", |
| 49 | + "select_fixed_atoms": false, |
| 50 | + "contig": "90-90R", |
| 51 | + "length": "90-90", |
| 52 | + "input": "../input_pdbs/AMP.pdb" |
| 53 | + } |
| 54 | +} |
| 55 | +``` |
| 56 | +`ss_dbn` specifies the full RNA secondary structure in dot-bracket notation. |
| 57 | + |
| 58 | +It is applied to the first L tokens, where L is the length of the `ss_dbn` string. |
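A dot-bracket string is easy to get wrong by one character, so it can help to check it before use. The sketch below is illustrative only: `dbn_to_pairs` is a hypothetical helper, not part of RFdiffusion3. It assumes `(`/`)` mark nested pairs and `[`/`]` mark a second, pseudoknotted level, as in the example above, and it reports 0-based positions.

```python
# Closing bracket -> matching opening bracket. Only the two bracket
# types that appear in this guide are handled.
CLOSE_TO_OPEN = {")": "(", "]": "["}


def dbn_to_pairs(dbn):
    """Return sorted 0-based (i, j) base pairs encoded by a dot-bracket string."""
    stacks = {opener: [] for opener in CLOSE_TO_OPEN.values()}
    pairs = []
    for index, char in enumerate(dbn):
        if char in stacks:
            # Opening bracket: remember its position.
            stacks[char].append(index)
        elif char in CLOSE_TO_OPEN:
            stack = stacks[CLOSE_TO_OPEN[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at position {index}")
            pairs.append((stack.pop(), index))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {index}")
    unclosed = sorted(i for stack in stacks.values() for i in stack)
    if unclosed:
        raise ValueError(f"unclosed brackets at positions {unclosed}")
    return sorted(pairs)


print(dbn_to_pairs("((..))"))  # [(0, 5), (1, 4)]
```

Because each bracket type has its own stack, crossing pairs such as `(([[..))]]` are accepted, while a string with a missing closing bracket raises `ValueError`. Comparing `len(dbn)` with the contig length is a useful extra check, since the constraint covers only the first L tokens.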
| 59 | + |
| 60 | +### 2.2 Dictionary-Based SS Input |
| 61 | + |
| 62 | +Specify secondary structure for subsections: |
| 63 | +```json |
| 64 | +{ |
| 65 | + "ss_dbn_dict": { |
| 66 | + "A6-25": "(((..)))....(((..)))", |
| 67 | + "B1-20": "((((..))))...((...))" |
| 68 | + } |
| 69 | +} |
| 70 | +``` |
| 71 | +Used in: |
| 72 | +```json |
| 73 | +{ |
| 74 | + "dict_input_ss": { |
| 75 | + "ss_dbn_dict": { |
| 76 | + "A6-25": "(((..)))....(((..)))", |
| 77 | + "B1-20": "((((..))))...((...))" |
| 78 | + }, |
| 79 | + "contig": "30-30R,/0,30-30R", |
| 80 | + "length": "60-60", |
| 81 | + "input": "../input_pdbs/AMP.pdb" |
| 82 | + } |
| 83 | +} |
| 84 | +``` |
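Each key in `ss_dbn_dict` names a residue range, and in the examples above the dot-bracket string has exactly one character per residue in that range. The sketch below checks that property. It is an assumption-laden illustration: `check_ss_dbn_dict` is a hypothetical helper, and the rule that key span and string length must match is inferred from the examples rather than stated by the source.

```python
import re

# Assumed key format: chain letter followed by an inclusive range, e.g. "A6-25".
REGION = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")


def check_ss_dbn_dict(ss_dbn_dict):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for region, dbn in ss_dbn_dict.items():
        match = REGION.match(region)
        if not match:
            problems.append(f"{region}: not of the form <chain><start>-<end>")
            continue
        start, end = int(match.group(2)), int(match.group(3))
        expected = end - start + 1
        if len(dbn) != expected:
            problems.append(
                f"{region}: spans {expected} residues but the string has "
                f"{len(dbn)} characters"
            )
    return problems


for problem in check_ss_dbn_dict({"A1-6": "((..))", "B1-5": "((..))"}):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to fix several regions in a single pass.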
| 85 | +## 3. Base-Pair Region Conditioning |
| 86 | +### 3.1 Paired Regions |
| 87 | + |
| 88 | +Define paired and loop regions: |
| 89 | +```json |
| 90 | +{ |
| 91 | + "paired_region_list": ["A20-25,B10-15"], |
| 92 | + "loop_region_list": ["A10-19","B20-30"] |
| 93 | +} |
| 94 | +``` |
| 95 | +Enforces pairing between the paired residue ranges, and loop propensity within the loop ranges, during sampling. |
| 96 | + |
| 97 | +Used in: |
| 98 | +```json |
| 99 | +{ |
| 100 | + "paired_region_input_ss": { |
| 101 | + "paired_region_list": ["A20-25,B10-15"], |
| 102 | + "loop_region_list": ["A10-19","B20-30"], |
| 103 | + "contig": "50-50R,/0,50-50R", |
| 104 | + "length": "100-100", |
| 105 | + "input": "../input_pdbs/AMP.pdb" |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
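A paired region such as `A20-25,B10-15` joins two ranges of equal length. The sketch below expands one region entry into explicit position pairs written in the `"A20,B15"` style. It is only an illustration: `expand_paired_region` is a hypothetical helper, and the source does not say how positions inside a region are registered against each other, so the antiparallel default is an assumption that can be switched off.

```python
import re

# Assumed range format: chain letter plus inclusive range, e.g. "A20-25".
RANGE = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")


def parse_range(text):
    """Return (chain, [positions]) for a range such as 'A20-25'."""
    match = RANGE.match(text.strip())
    if not match:
        raise ValueError(f"not a residue range: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"empty residue range: {text!r}")
    return match.group(1), list(range(start, end + 1))


def expand_paired_region(entry, antiparallel=True):
    """Expand 'A20-25,B10-15' into a list of 'A<i>,B<j>' position pairs.

    antiparallel=True pairs the first residue of the left range with the
    last residue of the right range (an assumption, see the lead-in).
    """
    left, right = entry.split(",")
    chain_a, pos_a = parse_range(left)
    chain_b, pos_b = parse_range(right)
    if len(pos_a) != len(pos_b):
        raise ValueError(f"ranges differ in length: {entry!r}")
    if antiparallel:
        pos_b = pos_b[::-1]
    return [f"{chain_a}{i},{chain_b}{j}" for i, j in zip(pos_a, pos_b)]


print(expand_paired_region("A20-25,B10-15"))
```

The equal-length check is the part most likely to catch real mistakes, independent of which registration the model applies.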
| 109 | + |
| 110 | +### 3.2 Explicit Base Pair Positions |
| 111 | + |
| 112 | +Fine-grained base pairing control: |
| 113 | + |
| 114 | +```json |
| 115 | +{ |
| 116 | + "paired_position_list": [ |
| 117 | + "A3,B3","A5,B5","A7,B7","A9,B9","A11,B11", |
| 118 | + "A13,B13","A15,B15","A17,B17","A19,B19" |
| 119 | + ] |
| 120 | +} |
| 121 | +``` |
| 122 | +Used in: |
| 123 | +```json |
| 124 | +{ |
| 125 | + "paired_position_input_ss": { |
| 126 | + "paired_position_list": [ |
| 127 | + "A3,B3","A5,B5","A7,B7","A9,B9","A11,B11", |
| 128 | + "A13,B13","A15,B15","A17,B17","A19,B19" |
| 129 | + ], |
| 130 | + "contig": "20-20R,/0,20-20R", |
| 131 | + "length": "40-40", |
| 132 | + "input": "../input_pdbs/AMP.pdb" |
| 133 | + } |
| 134 | +} |
| 135 | +``` |
| 136 | +### Note: Most of the JSON examples above do not actually read the `input` field; it is kept as a dummy value for the `inference3_engine`. |
| 137 | + |
| 138 | +## 4. Ligand-Conditioned Aptamer Design |
| 139 | + |
| 140 | +Supports the design of RNA that binds small molecules. |
| 141 | + |
| 142 | +### AMP Aptamer Example |
| 143 | +```json |
| 144 | +{ |
| 145 | + "AMP_aptamer": { |
| 146 | + "input": "../input_pdbs/AMP.pdb", |
| 147 | + "ligand": "AMP", |
| 148 | + "contig": "40-50R", |
| 149 | + "length": "40-50", |
| 150 | + "ori_jitter": 1, |
| 151 | + "select_buried": {"AMP": "ALL"}, |
| 152 | + "select_hbond_acceptor": { |
| 153 | + "AMP": "N7,O4',O1P,O2P,O3P,N3,N1" |
| 154 | + }, |
| 155 | + "select_hbond_donor": { |
| 156 | + "AMP": "N6,O3',O2'" |
| 157 | + } |
| 158 | + } |
| 159 | +} |
| 160 | +``` |
| 161 | +### Key Options |
| 162 | + |
| 163 | +- `ligand`: ligand name in the input PDB |
| 164 | + |
| 165 | +- `select_buried`: enforce burial of ligand atoms |
| 166 | + |
| 167 | +- `select_hbond_acceptor` / `select_hbond_donor`: suggest ligand atoms that should act as hydrogen-bond acceptors / donors |
| 168 | + |
| 169 | +- `ori_jitter`: small random perturbation of the ori token (from the ligand center of mass) |
| 170 | + |
| 171 | + |
| 172 | +## 5. Hybrid RNA–Protein Design with Constraints |
| 173 | +### RNase P Active Site Example |
| 174 | + |
| 175 | +```json |
| 176 | +{ |
| 177 | + "unindexed_rnasep": { |
| 178 | + "input": "../input_pdbs/rnase_p_3q1q_active_site_small.pdb", |
| 179 | + "contig": "50-80R,/0,100-120,/0,C1-4,C79-86", |
| 180 | + "length": "162-212", |
| 181 | + "ligand": "MG,PO4", |
| 182 | + "unindex": "B49,B50,B51,B52,B321,/0,A56-58,/0", |
| 183 | + "select_fixed_atoms": { |
| 184 | + "B49": "ALL", |
| 185 | + "B50": "ALL", |
| 186 | + "B51": "ALL", |
| 187 | + "B52": "ALL", |
| 188 | + "B321": "ALL", |
| 189 | + "A56-58": "ALL", |
| 190 | + "C1-4": "ALL", |
| 191 | + "C79-86": "ALL" |
| 192 | + } |
| 193 | + } |
| 194 | +} |
| 195 | +``` |
| 196 | +### Key Features |
| 197 | + |
| 198 | +- Mixed RNA + protein + fixed fragments |
| 199 | + |
| 200 | +- `unindex`: removes residues from positional indexing |
| 201 | + |
| 202 | +- `select_fixed_atoms`: freezes specified atoms |
| 203 | + |
| 204 | +- Ligands (MG, PO4) included in design context |
| 205 | + |
| 206 | +- Useful for catalytic residues or structural motifs |
| 207 | + |
| 208 | +## 6. Summary of New Features |
| 209 | + |
| 210 | +- `R` / `D` suffix → RNA / DNA specification in contigs |
| 211 | + |
| 212 | +- `ss_dbn` → global secondary structure constraint (optional) |
| 213 | + |
| 214 | +- `ss_dbn_dict` → local secondary structure constraints (optional) |
| 215 | + |
| 216 | +- `paired_region_list` → helix-level pairing constraints (optional) |
| 217 | + |
| 218 | +- `paired_position_list` → base-level pairing constraints (optional) |
| 219 | + |
| 220 | +- `ligand` + selection options → aptamer design |
| 221 | + |
| 222 | +- `unindex` → remove residues from indexing |
| 223 | + |
| 224 | +- `select_fixed_atoms` → freeze structural elements |
| 225 | + |
| 226 | + |
| 227 | +--- |
| 228 | + |