docs/taxonomy/knowledge/contribution_details.md
+7 -7 (7 additions & 7 deletions)
@@ -4,22 +4,22 @@ description: The overview of 🐶 InstructLab's Knowledge contribution guideline
4
4
logo: images/ilab_dog.png
5
5
---
6
6
7
-
You can create a Git repository to host your knowledge contributions anywhere (GitLab, Gerrit, etc.) but it may be favorable to create one on GitHub. The following instructions show you how to create a knowledge repository in GitHub and contribute to the taxonomy.
7
+
You can create a Git repository to host your knowledge contributions anywhere (GitLab, Gerrit, etc.) but it might be favorable to create one on GitHub. The following instructions show you how to create a knowledge repository in GitHub and contribute to the taxonomy.
8
8
9
9
## Prerequisites
10
10
11
11
- You have a GitHub account
12
12
- You have a forked copy of the [taxonomy](https://github.com/instructlab/taxonomy/tree/main) repository
13
-
- Verify that the model does not already know the knowledge you want to submit
13
+
- You have verified that the model does not already know the knowledge you want to submit
14
14
15
15
## Creating your own knowledge repository
16
16
17
17
To create a new GitHub repository, follow the GitHub documentation in [Creating a new repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-new-repository).
18
18
19
19
The specific steps are listed as follows:
20
20
21
-
1. In your GitHub profile page, navigate to the repositories tab. You will see a search bar where you can search your repositories, or create a new one.
22
-
2. This takes you to a page titled “Create a new repository”. Create a custom name for your repository and add a README.md file. For example, “knowledge_contributions” could be a good name for your repository.
21
+
1. In your GitHub profile page, navigate to the repositories tab. You will see a search bar where you can search your repositories or create a new one.
22
+
2. This takes you to a page titled “Create a new repository”. Create a custom name for your repository and add a `README.md` file. For example, “knowledge_contributions” could be a good name for your repository.
23
23
3. Click “Create” when you are all set.
24
24
25
25
## Convert your knowledge documentation to markdown
@@ -40,15 +40,15 @@ The specific steps are listed as follows:
40
40
3. You can then see your new content in your repository.
41
41
42
42
!!! important
43
-
Make a note of your commit SHA; you need it for your `qna.yaml`.
43
+
Make a note of your commit SHA; you'll need it for your `qna.yaml`.
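As a rough illustration of the note above, the following sketch checks that a value you copied looks like a full commit SHA before you paste it into your `qna.yaml`. It assumes the repository uses Git's default SHA-1 object format (40 hexadecimal characters) and that you want the full SHA rather than an abbreviated one; `is_full_commit_sha` is a hypothetical helper, not part of InstructLab. Running `git rev-parse HEAD` in your knowledge repository prints the full SHA of the most recent commit.

```python
import re

# Assumption: the repository uses Git's default SHA-1 object format,
# so a full commit ID is exactly 40 hexadecimal characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(value: str) -> bool:
    """Return True if value looks like a full 40-character commit SHA.

    Surrounding whitespace is ignored and uppercase hex digits are accepted.
    This only checks the format; it does not confirm the commit exists.
    """
    return FULL_SHA_RE.fullmatch(value.strip().lower()) is not None
```

An abbreviated SHA such as `abc1234` fails this check, which can be a useful reminder to copy the complete value.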
44
44
45
45
## Create a pull request in the taxonomy repository
46
46
47
47
Navigate to your forked taxonomy repository and ensure it is up-to-date.
48
48
49
49
There are a few ways you can create a pull request:
50
50
51
-
- For details on the local process, check out [The GitHub Workflow Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md) in the kubernetes documentation and the [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) in the GitHub documentation.
51
+
- For details on the local process, check out [The GitHub Workflow Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md) in the Kubernetes documentation and the [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) in the GitHub documentation.
52
52
- For details on contributing using the GitHub webpage UI, see [Contributing using the GH UI](https://github.com/instructlab/taxonomy/docs/contributing_via_GH_UI.md) or [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request?tool=webui) in the GitHub documentation.
53
53
54
54
## Verification
@@ -61,7 +61,7 @@ Here are a few things to check before seeking reviews for your contribution:
61
61
62
62
## PR Upstream Workflow
63
63
64
-
The following table outlines the expected timing for the PR(s) you have put in. The PRs go through a few steps, and checks, but you should be able to map your `label` to
64
+
The following table outlines the expected timing for the PRs you have submitted. The PRs go through a few steps, and checks, but you should be able to map your `label` to
description: The overview of 🐶 InstructLab's knowledge
3
+
description: An overview of 🐶 InstructLab's knowledge
4
4
logo: images/ilab_dog.png
5
5
---
6
6
# What is "Knowledge"?
@@ -11,7 +11,7 @@ Knowledge contributions in this project contain a few things.
11
11
12
12
- A file in a git repository that holds your information. For example, these repositories can include markdown versions of information on: Oscar 2024 winners, Law books, Shakespeare, Sports, Chemistry, etc.
13
13
- A `qna.yaml` file that asks and answers questions about the information in the git repository.
14
-
- A `attribution.txt` that includes the sources for the information used in the `qna.yaml`.
14
+
- An `attribution.txt` file that includes the sources for the information used in the `qna.yaml`.
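The pieces listed above can be checked with a short script before you open a pull request. This is a minimal sketch, assuming that `qna.yaml` and `attribution.txt` sit side by side in one contribution directory; that layout and the helper name `missing_contribution_files` are illustrative assumptions, not InstructLab tooling.

```python
from pathlib import Path

# File names come from the list above; placing them in one directory
# is an assumption made for this illustration.
REQUIRED_FILES = ("qna.yaml", "attribution.txt")


def missing_contribution_files(contribution_dir: Path) -> list[str]:
    """Return the required file names that are absent from contribution_dir."""
    return [
        name
        for name in REQUIRED_FILES
        if not (contribution_dir / name).is_file()
    ]
```

An empty result means both files are present. The check says nothing about whether their contents are valid.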
15
15
16
16
You can learn more about the knowledge structure in [Getting Started with Knowledge contributions](https://github.com/instructlab/taxonomy/blob/main/README.md#getting-started-with-knowledge-contributions).
17
17
@@ -20,97 +20,39 @@ You can learn more about the knowledge structure in [Getting Started with Knowle
20
20
!!! important
21
21
We are currently only accepting knowledge contributions as a limited private beta and sources will be limited to articles from Wikipedia.
22
22
23
-
There are a few domains of knowledge that we are currently accepting. For a full list of knowledge fields, see [Knowledge domains](https://github.com/instructlab/taxonomy/blob/main/knowledge/knowledge_domains.md) in the taxonomy documentation
24
-
25
-
A few examples are as follows:
26
-
27
-
### STEM fields
28
-
29
-
- Physics
30
-
- Astronomy and Astrophysics
31
-
- Quantum Mechanics
32
-
- Special Relativity and General Relativity
33
-
34
-
- Chemistry & Chemical Engineering
35
-
- Organic Chemistry
36
-
- Inorganic Chemistry
37
-
- Chemical engineering
38
-
- Biotechnology
39
-
40
-
- Earth & Environmental Science
41
-
- Geology
42
-
- Geography
43
-
44
-
- Biology & Life Sciences
45
-
- Plants (Botany)
46
-
- Medicine & health
47
-
48
-
- Electrical Engineering
49
-
- Bioengineering
50
-
- Civil Engineering
51
-
- Industrial Engineering
52
-
53
-
### Legal and regulatory
54
-
55
-
- Intellectual Property
56
-
- Criminal Law
57
-
- Civil Rights
58
-
- Healthcare compliance
59
-
60
-
### Economy and Business
61
-
62
-
- Economy and Businesses
63
-
- Accounting and Finance
64
-
- Marketing
65
-
- Human Resource
66
-
- Management
67
-
68
-
### Philosophy
69
-
70
-
- Philosophy
71
-
- Metaphysics
72
-
- Epistemology
73
-
- Ethics
74
-
- Parapsychology & occultism
75
-
- Philosophical schools of thought
76
-
77
-
### Literature
78
-
79
-
- Literature, rhetoric & criticism
80
-
- American literature in English
81
-
- Other literatures
23
+
These are the main knowledge domains that we are currently accepting knowledge contributions for: arts, engineering, geography, history, linguistics, mathematics, philosophy, religion, science, and technology.
82
24
83
25
## Avoid These Topics
84
26
85
-
While the tuning process may eventually benefit from being used to help the models work with complex social topics, at this time this is an area of active research we do not want to take lightly. Therefore please keep your submissions clear of the following topics:
27
+
While the tuning process may eventually benefit from being used to help the models work with complex social topics, at this time this is an area of active research we do not want to take lightly. Therefore, please keep your submissions clear of the following topics:
86
28
87
29
- PII (personally identifiable information) or any content invasive of individual privacy rights
88
-
- Violence including self-harm
89
-
- Cyber Bullying
90
-
- Internal documentation or other that is confidential to your employer or organization, e.g. trade secrets
30
+
- Violence, including self-harm
31
+
- Cyber bullying
32
+
- Internal documentation or other information that is confidential to your employer or organization, such as trade secrets
91
33
- Discrimination
92
34
- Religion
93
-
- Facts such as, "[Christianity is, according to the 2011 census, the fifth most practiced religion in Nepal, with 375,699 adherents, or 1.4% of the population](https://en.wikipedia.org/wiki/Christianity_in_Nepal)", are fine as a knowledge contribution. Advocating in favor of or against any religious faith is not acceptable.
35
+
- Facts such as, "[Christianity is, according to the 2011 census, the fifth most practiced religion in Nepal, with 375,699 adherents, or 1.4% of the population](https://en.wikipedia.org/wiki/Christianity_in_Nepal)", are fine as a knowledge contribution. However, advocating in favor of or against any religious faith is not acceptable.
94
36
- Medical or health information
95
-
- Facts such as, "[In mammals, pulmonary ventilation occurs via inhalation (breathing)](https://opentextbc.ca/biology/chapter/11-3-circulatory-and-respiratory-systems/)," are fine as a knowledge contribution. Tailored medical/health advice is not acceptable.
37
+
- Facts such as, "[In mammals, pulmonary ventilation occurs via inhalation (breathing)](https://opentextbc.ca/biology/chapter/11-3-circulatory-and-respiratory-systems/)," are fine as a knowledge contribution. However, tailored medical/health advice is not acceptable.
96
38
- Financial information
97
-
- Facts such as "[laissez-faire economics ... argues that market forces alone should drive the economy and that governments should refrain from direct intervention in or moderation of the economic system](https://openstax.org/books/world-history-volume-2/pages/6-3-capitalism-and-the-first-industrial-revolution)," are fine as a knowledge contribution. Tailored financial advice is not acceptable.
98
-
- Legal settlements/mitigations
99
-
- Gender Bias
100
-
- Hostile Language, threats, slurs, derogatory or insensitive jokes or comments
39
+
- Facts such as "[laissez-faire economics ... argues that market forces alone should drive the economy and that governments should refrain from direct intervention in or moderation of the economic system](https://openstax.org/books/world-history-volume-2/pages/6-3-capitalism-and-the-first-industrial-revolution)," are fine as a knowledge contribution. However, tailored financial advice is not acceptable.
40
+
- Legal settlements or mitigations
41
+
- Gender bias
42
+
- Hostile language, threats, slurs, and derogatory or insensitive jokes or comments
101
43
- Profanity
102
44
- Pornography and sexually explicit or suggestive content
103
-
- Any contributions that would allow for automated decision making that affect an individual's rights or well-being, e.g. social scoring
45
+
- Any contributions that would allow for automated decision making that affect an individual's rights or well-being, such as social scoring
104
46
- Any contributions that engage in political campaigning or lobbying
105
47
106
48
We are also not accepting submissions of the following content:
107
49
108
50
- Code
109
-
- Anything code-related that can be traced back to code for a computer. Not limited to `sed` or `bash` but `yaml`s for OpenShift or Kubernetes, to `python` snippets to `Java` suggestions. There are specific models focused on this space and this isn't for this model for the time being.
51
+
- Anything code-related that can be traced back to code for a computer. Not limited to `sed` or `bash` or `yaml`s for OpenShift or Kubernetes, to `python` snippets to `Java` suggestions. There are specific models focused on this space and this isn't for this model for the time being.
110
52
- Jokes
111
53
- Poems
112
54
113
-
We received many joke and poem submissions at the beginning of the project, and with jokes being "in the eye of the beholder" and puns requiring nuance for native English speakers, we realized we were possibly unconsciously biasing our model. We have discovered that working with both topics has its own challenges, and if we want something generalized, finding consensus was unsuccessful. For now, we're not accepting additional submissions of jokes and poems.
55
+
We received many joke and poem submissions at the beginning of the project, and with jokes and poems being "in the eye of the beholder" and puns requiring nuance for native English speakers, we realized we were possibly unconsciously biasing our model. We have discovered that working with both topics has its own challenges, and if we want something generalized, finding consensus was unsuccessful. For now, we're not accepting additional submissions of jokes and poems.
114
56
115
57
## Building Your LLM Intuition
116
58
@@ -130,28 +72,34 @@ With a few of these qna's, the model will learn the periodic table because it ha
130
72
131
73
### LLMs are great at
132
74
133
-
For these, however, it's common for LLMs to already have excellent performance. Try 3-5 examples in `lab chat` to confirm a deficit in the model before you build your submission, and share the examples in your Pull Request (PR).
75
+
LLMs are great at these:
134
76
135
77
- Brainstorming
136
78
- Creativity
137
79
- Connecting information
138
80
- Cross-lingual behavior
139
81
82
+
For these, however, it's common for LLMs to already have excellent performance. Try 3-5 examples in `lab chat` to confirm a deficit in the model before you build your submission, and then share the examples in your Pull Request (PR).
83
+
140
84
### LLMs need help with
141
85
142
-
LLM behavior in these sorts of topics are very difficult for the model to get right. Try several examples to understand the nuances of the model's ability to do these sorts of tasks, and consider using corrections to the results you get in your tuning process.
86
+
LLMs need help with these:
143
87
144
88
- Chains of reasoning
145
89
- Analysis
146
90
- Story plots
147
91
- Reassembling information
148
92
- Effective and succinct summaries
149
93
94
+
LLM behavior in these sorts of topics are very difficult for the model to get right. Try several examples to understand the nuances of the model's ability to do these sorts of tasks, and then consider using corrections to the results you get in your tuning process.
95
+
150
96
### LLMs are not so great at
151
97
152
-
LLMs may struggle with solving math and computation. That said, improving some of these foundational skills may be something this work tackles in the future, but not at this time.
98
+
LLMs are not so great at these:
153
99
154
100
- Math
155
101
- Computation
156
102
- "Turing-complete" type tasks
157
103
- Generating only true real-world information (they're prone to hallucinations)
104
+
105
+
LLMs may struggle with solving math and computation problems. That said, improving some of these foundational skills may be something this work tackles in the future, but not at this time.