Commit bc8c7e5: Added cmb build process
Added the community build process doc Signed-off-by: JJ Asghar <awesome@ibm.com>

docs/cmb/build_process.md (151 lines added)
!!! note
    This document describes the Community Build Process: the general steps to get the CMB built.
## Add the PRs to the local tree

Add the PRs you want built into this run, and tag each of those PRs with "cmb-running". For example, for a synonyms skill:

```bash
mkdir -p compositional_skills/general/synonyms
vi compositional_skills/general/synonyms/attribution.txt
vi compositional_skills/general/synonyms/qna.yaml
```
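The authoritative `qna.yaml` schema is defined by the taxonomy repository; as an illustrative sketch only, a minimal skeleton for a hypothetical synonyms skill could be created with a heredoc (the field names below are assumptions, so check the taxonomy docs before submitting):

```bash
# Illustrative skeleton only; field names are assumptions, the real
# schema lives in the taxonomy repository documentation.
mkdir -p compositional_skills/general/synonyms
cat > compositional_skills/general/synonyms/qna.yaml <<'EOF'
version: 2
task_description: Provide synonyms for common words
created_by: your-github-handle
seed_examples:
  - question: What is a synonym for "happy"?
    answer: Joyful is a synonym for "happy".
EOF
```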
## Verify changes

```bash
ilab taxonomy diff
```
!!! warning
    `~/.local/share/instructlab/datasets` should be empty before starting.
    Every GPU should be idle (`0%` utilization); check with `nvidia-smi`.
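To make the GPU check scriptable, a small helper (a sketch, assuming `nvidia-smi` is installed) can assert that every GPU reports 0% utilization:

```bash
# gpu_all_idle: succeed only if every utilization value on stdin is 0.
# Feed it nvidia-smi's machine-readable utilization output.
gpu_all_idle() {
  awk '{ if ($1 != 0) exit 1 }'
}

# Example (requires nvidia-smi):
# nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | gpu_all_idle \
#   && echo "all GPUs idle" || echo "a GPU is busy"
```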
## Create the data

```bash
ilab data generate
```
## Run the training after the generate is complete

```bash
ilab model train --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/knowledge_train_msgs_XXXXXXX.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/skills_train_msgs_XXXXXXX.jsonl
```
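The `XXXXXXX` portions of the dataset filenames differ per run; a small helper (a sketch, not part of `ilab`) can pick the newest matching file so you don't have to copy the names by hand:

```bash
# newest_match DIR GLOB: print the most recently modified file in DIR
# matching GLOB. The glob is intentionally unquoted inside so it expands.
newest_match() {
  ls -t "$1"/$2 2>/dev/null | head -n 1
}

# Example:
# newest_match ~/.local/share/instructlab/datasets 'knowledge_train_msgs_*.jsonl'
```

The train command above could then take `--phased-phase1-data "$(newest_match ...)"` instead of a hand-copied filename.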
## Post training evaluation steps

If you want a quick sanity check, you can set these two variables to evaluate on a subset:

```bash
export INSTRUCTLAB_EVAL_FIRST_N_QUESTIONS=10 # mt_bench
export INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true # mmlu
```
(Optional) To sanity-check a specific sample-model checkpoint:

```bash
ilab model evaluate --benchmark mt_bench --model ~/.local/share/instructlab/checkpoints/hf_format/samples_XXXXXX
```

!!! tip
    We should do the re-evaluation because we want to re-verify the numbers before going any further.
## General Benchmarking

- `mmlu`: general model knowledge, general facts; a knowledge score out of 100
- `mt_bench`: skill-based (extraction, etc.); a score out of 10

!!! note
    We want around a 7.1 `mt_bench` average for a model candidate.
## Specific Benchmarking

`mmlu_branch`: specific to the added general knowledge

```bash
ilab model evaluate --benchmark mmlu_branch --model ~/.local/share/checkpoints/hf_format/<checkpoint> --tasks-dir ~/.local/share/instructlab/datasets/<node-dataset> --base-model ~/.cache/instructlab/models/granite-7b-redhat-lab
```

`mt_bench_branch`: specific to the added skills

```bash
ilab model evaluate --benchmark mt_bench_branch --model ~/.local/share/checkpoints/hf_format/<checkpoint> --taxonomy-path ~/.local/share/instructlab/taxonomy --judge-model ~/.cache/instructlab/models/prometheus-8x7b-v2-0 --base-model ~/.cache/instructlab/models/granite-7b-redhat-lab --base-branch main --branch main
```
## Hosting the release candidates

rsync over the files:

```bash
mkdir $(date +%F)
cd $(date +%F)
rsync --info=progress2 -avz -e ssh <USERNAME>@<REMOTE>:~/.local/share/checkpoints/hf_format/samples_xxxxx ./
```
Set up the serving environment (if needed):

```bash
python3.11 -m venv venv
source venv/bin/activate
pip install vllm
./run.sh <checkpoint-directory>
```
`run.sh`:

```bash
#!/bin/bash

# Directory of the checkpoint to serve, passed as the only argument
DIRECTORY=$1

DATE=$(date +%F)
RANDOM_STRING=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 10; echo)
RANDOM_PORT=$(shuf -i 8001-8800 -n 1)
API_KEY=$RANDOM_STRING-$DATE

# Record what is being hosted where
echo "$DIRECTORY,$API_KEY,$RANDOM_PORT" >> model_hosting.csv

# Save the matching client command to share with the PR author
echo "ilab model chat --endpoint-url http://cmb-staging.DOMAIN.xx:$RANDOM_PORT/v1 --api-key $API_KEY --model $DIRECTORY" >> model_ilab_scripting.sh

python -m vllm.entrypoints.openai.api_server --model $DIRECTORY --api-key $API_KEY --host 0.0.0.0 --port $RANDOM_PORT --tensor-parallel-size 2
```
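Since every invocation of `run.sh` appends a row to `model_hosting.csv`, a small helper (a sketch built on the CSV layout above) can summarize what is currently hosted and where:

```bash
# hosted_models FILE: print "directory on port N" for each row of the
# model_hosting.csv that run.sh appends to (columns: directory,api_key,port).
hosted_models() {
  awk -F, '{ print $1 " on port " $3 }' "$1"
}

# Example:
# hosted_models model_hosting.csv
```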
Find the `ilab` command that was generated for the hosted model, and send it along after the PR form letter:

```bash
cat model_ilab_scripting.sh
```
## Form letter for PRs

Hi! 👋

Thank you for submitting this PR. We are ready to do some validation now, and we have a few candidates to see if they improve the model.

We have some resources to run these release candidates, but we need _you_ to help us. Can you reach out to me either on Slack (@awesome) or email me at awesomeATinstructlab.ai so I can get you access via `ilab model chat`?

We can only run these models for a "week" or so, so please reach out as soon as possible and tell me which one is best for you on this PR.
## With confirmed success

With confirmed success, tag the PR with "ready-for-merge" and remove the "community-build-ready" tag. Wait until the "week" is up before shutting down the staging instance, then merge in all the PRs that have been tagged.
## Steps to Merge and Release

After you have merged the PRs into the taxonomy, you need to push the model to Hugging Face. If you don't have access to the Hugging Face organization, you will need to find someone to add you to it ;).

1) Clone down the repository on the staging box if you haven't already:

```bash
git clone https://huggingface.co/instructlab/granite-7b-lab
cd granite-7b-lab
vi .git/config
# url = git@hf.co:instructlab/granite-7b-lab
# verify you can authenticate with hf.co: ssh -T git@hf.co
```
2) Copy the `samples_xxxx` checkpoint into granite-7b-lab
3) `git add . && git commit`
4) Write up a good commit message
5) Tag and push:

```bash
git tag cmb-run-XXXXX
git push origin main
git push origin cmb-run-XXXXX
```
## Convert to `gguf`

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp/
pip install -r requirements.txt
make -j8
./convert_hf_to_gguf.py ../granite-7b-lab --outfile granite-7b-fp16.gguf
./llama-quantize granite-7b-fp16.gguf granite-7b-Q4_K_M.gguf Q4_K_M
./llama-cli -m granite-7b-Q4_K_M.gguf -p "who is batman?" -n 128
```
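As a quick sanity check that the conversion produced a valid GGUF container, a small helper can test for the format's 4-byte ASCII magic, `GGUF`, at the start of the file (a sketch; the full header has more fields this check ignores):

```bash
# is_gguf FILE: succeed only if FILE starts with the 4-byte GGUF magic.
is_gguf() {
  [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example:
# is_gguf granite-7b-fp16.gguf && echo "looks like a GGUF file"
```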