
Commit 240903d

Merge branch 'main' of https://github.com/longtermrisk/openweights into main
2 parents: 2d1d384 + 44e01e3

1 file changed: docs/finetuning.md (3 additions, 2 deletions)
````diff
@@ -9,6 +9,7 @@ Standard supervised fine-tuning using conversation data. This is the most basic
 
 ```python
 from openweights import OpenWeights
+from openweights.jobs import unsloth  # the import has the side effect of making ow.fine_tuning available
 client = OpenWeights()
 
 # Upload a conversations dataset
@@ -34,7 +35,7 @@ The conversations dataset should be in JSONL format with each line containing a
 ]}
 ```
 
-### 2. Direct Preference Optimization (DPO)
+### 2. DPO
 DPO is a method for fine-tuning language models from preference data without using reward modeling. It directly optimizes the model to prefer chosen responses over rejected ones.
 
 ```python
@@ -53,7 +54,7 @@ job = client.fine_tuning.create(
 )
 ```
 
-### 3. Offline Rejection Preference Optimization (ORPO)
+### 3. ORPO
 ORPO is similar to DPO but uses a different loss function that has been shown to be more stable in some cases.
 
 ```python
````
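The added import line works by side effect: loading `openweights.jobs.unsloth` registers the fine-tuning job type so that `client.fine_tuning` becomes available. A minimal sketch of the resulting setup, combining only the lines visible in this diff; the arguments to `create(...)` are hypothetical, since the diff truncates them:

```python
from openweights import OpenWeights
from openweights.jobs import unsloth  # side-effect import: makes client.fine_tuning available

client = OpenWeights()

# Hypothetical arguments for illustration only; the real parameters of
# client.fine_tuning.create are not shown in this diff.
job = client.fine_tuning.create(
    model="unsloth/Meta-Llama-3.1-8B",    # assumed model identifier
    training_file="conversations.jsonl",  # assumed parameter name
)
```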
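The DPO section's description implies pairwise preference data: each record pairs a prompt with a chosen and a rejected response. A hedged sketch of building one such JSONL record in Python; the field names `prompt`/`chosen`/`rejected` follow a common convention and are not confirmed anywhere in this diff:

```python
import json

# Hypothetical preference record for DPO-style training.
# Field names are a common convention, not confirmed by this diff.
record = {
    "prompt": [{"role": "user", "content": "Summarize the report."}],
    "chosen": [{"role": "assistant", "content": "Here is a concise summary..."}],
    "rejected": [{"role": "assistant", "content": "I can't help with that."}],
}

# Write one record per line, as in the JSONL format the docs describe.
with open("preferences.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```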
