docs/finetuning.md (3 additions, 2 deletions)
@@ -9,6 +9,7 @@ Standard supervised fine-tuning using conversation data. This is the most basic
 ```python
 from openweights import OpenWeights
+from openweights.jobs import unsloth  # importing this has the side effect of making ow.fine_tuning available
 client = OpenWeights()

 # Upload a conversations dataset
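The hunk above is part of the conversations-dataset walkthrough. As a minimal sketch of the JSONL shape that walkthrough describes (the message roles and content shown here are illustrative examples, not taken from this diff, and the upload call itself is omitted since the diff does not show it), one line of such a dataset can be built and round-tripped like this:

```python
import json

# One training example in the chat-messages shape: each JSONL line
# holds a single {"messages": [...]} object (example content is made up).
example = {
    "messages": [
        {"role": "user", "content": "What is fine-tuning?"},
        {"role": "assistant", "content": "Adapting a pretrained model to new data."},
    ]
}

# Write a one-line JSONL dataset: one JSON object per line.
with open("conversations.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Read it back to confirm the round trip.
with open("conversations.jsonl") as f:
    rows = [json.loads(line) for line in f]

print(rows[0]["messages"][0]["role"])  # prints "user"
```

A file in this shape is what the subsequent "Upload a conversations dataset" step would consume.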
@@ -34,7 +35,7 @@ The conversations dataset should be in JSONL format with each line containing a
 ]}
 ```
 
-### 2. Direct Preference Optimization (DPO)
+### 2. DPO
 DPO is a method for fine-tuning language models from preference data without using reward modeling. It directly optimizes the model to prefer chosen responses over rejected ones.
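Since DPO trains on chosen-versus-rejected pairs, a preference dataset record is typically one JSON object per line. As an illustrative sketch only (the field names `prompt`, `chosen`, and `rejected` are assumptions; this diff does not show the exact schema), such a record might be serialized like this:

```python
import json

# One DPO preference example: the model is optimized to prefer "chosen"
# over "rejected" for the same prompt (field names are assumed, not
# taken from the docs diff).
pair = {
    "prompt": "Summarize the report in one sentence.",
    "chosen": "The report finds revenue grew 12% year over year.",
    "rejected": "idk, it is long.",
}

line = json.dumps(pair)      # one JSONL record
decoded = json.loads(line)   # round-trip check
```

A DPO dataset would then be a file of many such lines, analogous to the conversations JSONL above.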