Commit 0330db7

mcorbin-ibm, jjasghar, and juliadenham committed
Edits to the knowledge docs
- edited 3 knowledge docs files
- removed broken links and list of knowledge domains

Signed-off-by: Michelle Corbin <corbinm@us.ibm.com>
Co-Authored-By: JJ Asghar <awesome@ibm.com>
Co-Authored-By: Julia Denham <jdenham@redhat.com>
1 parent ad22bb3 commit 0330db7

3 files changed (+55, -104 lines)

docs/taxonomy/knowledge/contribution_details.md

Lines changed: 7 additions & 7 deletions
@@ -4,22 +4,22 @@ description: The overview of 🐶 InstructLab's Knowledge contribution guideline
 logo: images/ilab_dog.png
 ---

-You can create a Git repository to host your knowledge contributions anywhere (GitLab, Gerrit, etc.) but it may be favorable to create one on GitHub. The following instructions show you how to create a knowledge repository in GitHub and contribute to the taxonomy.
+You can create a Git repository to host your knowledge contributions anywhere (GitLab, Gerrit, etc.) but it might be favorable to create one on GitHub. The following instructions show you how to create a knowledge repository in GitHub and contribute to the taxonomy.

 ## Prerequisites

 - You have a GitHub account
 - You have a forked copy of the [taxonomy](https://github.com/instructlab/taxonomy/tree/main) repository
-- Verify that the model does not already know the knowledge you want to submit
+- You have verified that the model does not already know the knowledge you want to submit

 ## Creating your own knowledge repository

 To create a new GitHub repository, follow the GitHub documentation in [Creating a new repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-new-repository).

 The specific steps are listed as follows:

-1. In your GitHub profile page, navigate to the repositories tab. You will see a search bar where you can search your repositories, or create a new one.
-2. This takes you to a page titled “Create a new repository”. Create a custom name for your repository and add a README.md file. For example, “knowlege_contributions” could be a good name for your repository.
+1. In your GitHub profile page, navigate to the repositories tab. You will see a search bar where you can search your repositories or create a new one.
+2. This takes you to a page titled “Create a new repository”. Create a custom name for your repository and add a `README.md` file. For example, “knowlege_contributions” could be a good name for your repository.
 3. Click “Create” when you are all set.

 ## Convert your knowledge documentation to markdown
@@ -40,15 +40,15 @@ The specific steps are listed as follows:
 3. You can then see your new content in your repository.

 !!! important
-Make a note of your commit SHA; you need it for your `qna.yaml`.
+Make a note of your commit SHA; you'll need it for your `qna.yaml`.

 ## Create a pull request in the taxonomy repository

 Navigate to your forked taxonomy repository and ensure it is up-to-date.

 There are a few ways you can create a pull request:

-- For details on the local process, check out [The GitHub Workflow Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md) in the kubernetes documentation and the [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) in the GitHub documentation.
+- For details on the local process, check out [The GitHub Workflow Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md) in the Kubernetes documentation and the [GitHub flow](https://docs.github.com/en/get-started/using-github/github-flow) in the GitHub documentation.
 - For details on contributing using the GitHub webpage UI, see [Contributing using the GH UI](https://github.com/instructlab/taxonomy/docs/contributing_via_GH_UI.md) or [Creating a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request?tool=webui) in the GitHub documentation.

 ## Verification
@@ -61,7 +61,7 @@ Here are a few things to check before seeking reviews for your contribution:

 ## PR Upstream Workflow

-The following table outlines the expected timing for the PR(s) you have put in. The PRs go through a few steps, and checks, but you should be able to map your `label` to
+The following table outlines the expected timing for the PRs you have submitted. The PRs go through a few steps, and checks, but you should be able to map your `label` to
 the place that it is in.

 | Label | Actor | Action | Duration |
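The `!!! important` note in the `contribution_details.md` diff asks contributors to record their commit SHA for `qna.yaml`. The Python below is a minimal sketch and is not part of this commit or of InstructLab's own tooling. It reads the SHA of `HEAD` with `git rev-parse HEAD` and checks that the result is a full 40-character ID rather than an abbreviation such as `0330db7`. The helper names `current_commit_sha` and `is_full_commit_sha` are hypothetical. The sketch assumes `git` is on your `PATH` and that the repository uses the default SHA-1 object format.

```python
import re
import subprocess

# A full SHA-1 commit ID is 40 lowercase hex characters.
# Assumption: the repository uses SHA-1, not the newer SHA-256 object format.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_full_commit_sha(value: str) -> bool:
    """Return True if value looks like a full (non-abbreviated) SHA-1 commit ID."""
    return _FULL_SHA1.fullmatch(value) is not None


def current_commit_sha(repo_dir: str = ".") -> str:
    """Return the SHA of HEAD in repo_dir (hypothetical helper).

    Runs `git rev-parse HEAD`; raises CalledProcessError if repo_dir
    is not inside a Git repository.
    """
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    sha = result.stdout.strip()  # drop the trailing newline git prints
    if not is_full_commit_sha(sha):
        raise ValueError(f"unexpected commit ID: {sha!r}")
    return sha
```

For example, calling `current_commit_sha()` from inside a clone of your knowledge repository, after you commit and push your markdown files, returns an ID you could copy into `qna.yaml`. Recording the full ID rather than a short prefix avoids ambiguity if the repository later gains another commit with the same prefix.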

docs/taxonomy/knowledge/guide.md

Lines changed: 25 additions & 77 deletions
@@ -1,6 +1,6 @@
 ---
 title: Knowledge Guide
-description: The overview of 🐶 InstructLab's knowledge
+description: An overview of 🐶 InstructLab's knowledge
 logo: images/ilab_dog.png
 ---
 # What is "Knowledge"?
@@ -11,7 +11,7 @@ Knowledge contributions in this project contain a few things.

 - A file in a git repository that holds your information. For example, these repositories can include markdown versions of information on: Oscar 2024 winners, Law books, Shakespeare, Sports, Chemistry, etc.
 - A `qna.yaml` file that asks and answers questions about the information in the git repository.
-- A `attribution.txt` that includes the sources for the information used in the `qna.yaml`.
+- An `attribution.txt` file that includes the sources for the information used in the `qna.yaml`.

 You can learn more about the knowledge structure in [Getting Started with Knowledge contributions](https://github.com/instructlab/taxonomy/blob/main/README.md#getting-started-with-knowledge-contributions).

@@ -20,97 +20,39 @@ You can learn more about the knowledge structure in [Getting Started with Knowle
 !!! important
 We are currently only accepting knowledge contributions as a limited private beta and sources will be limited to articles from Wikipedia.

-There are a few domains of knowledge that we are currently accepting. For a full list of knowledge fields, see [Knowledge domains](https://github.com/instructlab/taxonomy/blob/main/knowledge/knowledge_domains.md) in the taxonomy documentation
-
-A few examples are as follows:
-
-### STEM fields
-
-- Physics
-- Astronomy and Astrophysics
-- Quantum Mechanics
-- Special Relativity and General Relativity
-
-- Chemistry & Chemical Engineering
-- Organic Chemistry
-- Inorganic Chemistry
-- Chemical engineering
-- Biotechnology
-
-- Earth & Environmental Science
-- Geology
-- Geography
-
-- Biology & Life Sciences
-- Plants (Botany)
-- Medicine & health
-
-- Electrical Engineering
-- Bioengineering
-- Civil Engineering
-- Industrial Engineering
-
-### Legal and regulatory
-
-- Intellectual Property
-- Criminal Law
-- Civil Rights
-- Healthcare compliance
-
-### Economy and Business
-
-- Economy and Businesses
-- Accounting and Finance
-- Marketing
-- Human Resource
-- Management
-
-### Philosophy
-
-- Philosophy
-- Metaphysics
-- Epistemology
-- Ethics
-- Parapsychology & occultism
-- Philosophical schools of thought
-
-### Literature
-
-- Literature, rhetoric & criticism
-- American literature in English
-- Other literatures
+These are the main knowledge domains that we are currently accepting knowledge contributions for: arts, engineering, geography, history, linguistics, mathematics, philosophy, religion, science, and technology.

 ## Avoid These Topics

-While the tuning process may eventually benefit from being used to help the models work with complex social topics, at this time this is an area of active research we do not want to take lightly. Therefore please keep your submissions clear of the following topics:
+While the tuning process may eventually benefit from being used to help the models work with complex social topics, at this time this is an area of active research we do not want to take lightly. Therefore, please keep your submissions clear of the following topics:

 - PII (personally identifiable information) or any content invasive of individual privacy rights
-- Violence including self-harm
-- Cyber Bullying
-- Internal documentation or other that is confidential to your employer or organization, e.g. trade secrets
+- Violence, including self-harm
+- Cyber bullying
+- Internal documentation or other information that is confidential to your employer or organization, such as trade secrets
 - Discrimination
 - Religion
-- Facts such as, "[Christianity is, according to the 2011 census, the fifth most practiced religion in Nepal, with 375,699 adherents, or 1.4% of the population](https://en.wikipedia.org/wiki/Christianity_in_Nepal)", are fine as a knowledge contribution. Advocating in favor of or against any religious faith is not acceptable.
+- Facts such as, "[Christianity is, according to the 2011 census, the fifth most practiced religion in Nepal, with 375,699 adherents, or 1.4% of the population](https://en.wikipedia.org/wiki/Christianity_in_Nepal)", are fine as a knowledge contribution. However, advocating in favor of or against any religious faith is not acceptable.
 - Medical or health information
-- Facts such as, "[In mammals, pulmonary ventilation occurs via inhalation (breathing)](https://opentextbc.ca/biology/chapter/11-3-circulatory-and-respiratory-systems/)," are fine as a knowledge contribution. Tailored medical/health advice is not acceptable.
+- Facts such as, "[In mammals, pulmonary ventilation occurs via inhalation (breathing)](https://opentextbc.ca/biology/chapter/11-3-circulatory-and-respiratory-systems/)," are fine as a knowledge contribution. However, tailored medical/health advice is not acceptable.
 - Financial information
-- Facts such as "[laissez-faire economics ... argues that market forces alone should drive the economy and that governments should refrain from direct intervention in or moderation of the economic system](https://openstax.org/books/world-history-volume-2/pages/6-3-capitalism-and-the-first-industrial-revolution)," are fine as a knowledge contribution. Tailored financial advice is not acceptable.
-- Legal settlements/mitigations
-- Gender Bias
-- Hostile Language, threats, slurs, derogatory or insensitive jokes or comments
+- Facts such as "[laissez-faire economics ... argues that market forces alone should drive the economy and that governments should refrain from direct intervention in or moderation of the economic system](https://openstax.org/books/world-history-volume-2/pages/6-3-capitalism-and-the-first-industrial-revolution)," are fine as a knowledge contribution. However, tailored financial advice is not acceptable.
+- Legal settlements or mitigations
+- Gender bias
+- Hostile language, threats, slurs, and derogatory or insensitive jokes or comments
 - Profanity
 - Pornography and sexually explicit or suggestive content
-- Any contributions that would allow for automated decision making that affect an individual's rights or well-being, e.g. social scoring
+- Any contributions that would allow for automated decision making that affect an individual's rights or well-being, such as social scoring
 - Any contributions that engage in political campaigning or lobbying

 We are also not accepting submissions of the following content:

 - Code
-- Anything code-related that can be traced back to code for a computer. Not limited to `sed` or `bash` but `yaml`s for OpenShift or Kubernetes, to `python` snippets to `Java` suggestions. There are specific models focused on this space and this isn't for this model for the time being.
+- Anything code-related that can be traced back to code for a computer. Not limited to `sed` or `bash` or `yaml`s for OpenShift or Kubernetes, to `python` snippets to `Java` suggestions. There are specific models focused on this space and this isn't for this model for the time being.
 - Jokes
 - Poems

-We received many joke and poem submissions at the beginning of the project, and with jokes being "in the eye of the beholder" and puns requiring nuance for native English speakers, we realized we were possibly unconsciously biasing our model. We have discovered that working with both topics has its own challenges, and if we want something generalized, finding consensus was unsuccessful. For now, we're not accepting additional submissions of jokes and poems.
+We received many joke and poem submissions at the beginning of the project, and with jokes and poems being "in the eye of the beholder" and puns requiring nuance for native English speakers, we realized we were possibly unconsciously biasing our model. We have discovered that working with both topics has its own challenges, and if we want something generalized, finding consensus was unsuccessful. For now, we're not accepting additional submissions of jokes and poems.

 ## Building Your LLM Intuition

@@ -130,28 +72,34 @@ With a few of these qna's, the model will learn the periodic table because it ha

 ### LLMs are great at

-For these, however, it's common for LLMs to already have excellent performance. Try 3-5 examples in `lab chat` to confirm a deficit in the model before you build your submission, and share the examples in your Pull Request (PR).
+LLMs are great at these:

 - Brainstorming
 - Creativity
 - Connecting information
 - Cross-lingual behavior

+For these, however, it's common for LLMs to already have excellent performance. Try 3-5 examples in `lab chat` to confirm a deficit in the model before you build your submission, and then share the examples in your Pull Request (PR).
+
 ### LLMs need help with

-LLM behavior in these sorts of topics are very difficult for the model to get right. Try several examples to understand the nuances of the model's ability to do these sorts of tasks, and consider using corrections to the results you get in your tuning process.
+LLMs need help with these:

 - Chains of reasoning
 - Analysis
 - Story plots
 - Reassembling information
 - Effective and succinct summaries

+LLM behavior in these sorts of topics are very difficult for the model to get right. Try several examples to understand the nuances of the model's ability to do these sorts of tasks, and then consider using corrections to the results you get in your tuning process.
+
 ### LLMs are not so great at

-LLMs may struggle with solving math and computation. That said, improving some of these foundational skills may be something this work tackles in the future, but not at this time.
+LLMs are not so great at these:

 - Math
 - Computation
 - "Turing-complete" type tasks
 - Generating only true real-world information (they're prone to hallucinations)
+
+LLMs may struggle with solving math and computation problems. That said, improving some of these foundational skills may be something this work tackles in the future, but not at this time.
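The updated `guide.md` says that a knowledge contribution includes a `qna.yaml` file and an `attribution.txt` file. It also names the accepted domains in a single sentence. The Python below is a minimal, hypothetical pre-submission check built only from those two statements; it is not part of InstructLab. The names `missing_files`, `domain_of`, and `is_accepted_domain` are illustrative. The sketch also assumes, without verification here, that each domain corresponds to the first directory under `knowledge/` in your taxonomy fork.

```python
from __future__ import annotations

from pathlib import Path

# Files the guide lists for every knowledge contribution, besides the
# source document hosted in your own Git repository.
REQUIRED_FILES = ("qna.yaml", "attribution.txt")

# Domains named in the updated guide.md text.
ACCEPTED_DOMAINS = frozenset({
    "arts", "engineering", "geography", "history", "linguistics",
    "mathematics", "philosophy", "religion", "science", "technology",
})


def missing_files(contribution_dir: str) -> list[str]:
    """Return the required file names that are absent from contribution_dir."""
    root = Path(contribution_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def domain_of(taxonomy_path: str) -> str | None:
    """Return the domain folder of a path such as 'knowledge/science/...'.

    Hypothetical layout: the domain is assumed to be the first directory
    under 'knowledge/'. Returns None for paths outside 'knowledge/'.
    """
    parts = Path(taxonomy_path).parts
    if len(parts) >= 2 and parts[0] == "knowledge":
        return parts[1]
    return None


def is_accepted_domain(taxonomy_path: str) -> bool:
    """Return True if the path's domain folder is one of ACCEPTED_DOMAINS."""
    return domain_of(taxonomy_path) in ACCEPTED_DOMAINS
```

A call such as `missing_files("knowledge/science/my_topic")` returns the required files that still need to be added (the path is only an example). A file-existence check like this says nothing about whether the `qna.yaml` content itself is valid.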
