Closed
165 commits
f8725ca
Merge pull request #209 from Learnware-LAMDA/liuht_check_docs
GeneLiuXe Jan 17, 2024
f2ddd26
[DOC] change repo url
GeneLiuXe Jan 23, 2024
113a8a6
Merge pull request #210 from Learnware-LAMDA/mnt_repo_url
GeneLiuXe Jan 23, 2024
c964226
[DOC] change html tags
GeneLiuXe Jan 24, 2024
ed1a856
Merge pull request #211 from Learnware-LAMDA/fix_html
GeneLiuXe Jan 24, 2024
408d13e
Merge pull request #212 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
1fb0759
[FIX, DOC] fix readthedocs config bug, modify framework svg
bxdd Jan 24, 2024
9c97754
Merge pull request #213 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
e356b3a
[FIX] update readthedocs yaml
bxdd Jan 24, 2024
3d5e0eb
Merge pull request #214 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
a022728
[MNT] update readthedocs config
bxdd Jan 24, 2024
676852b
Merge pull request #215 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
ca3f03f
[MNT] add sphinx_book_theme req
bxdd Jan 24, 2024
d7b1ec5
Merge pull request #216 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
79b53e7
[FIX] fix readthedocs req with pip
bxdd Jan 24, 2024
09c2e4f
Merge pull request #217 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
06913b3
[FIX] fix readthedocs req to install all
bxdd Jan 24, 2024
2c7b1f0
Merge pull request #218 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
23544ed
[DOC, FIX] update badge, update autodoc config, fix no logo in doc bug
bxdd Jan 24, 2024
33191ce
Merge pull request #219 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
f2024fd
[MNT] publish 0.3.2 version
bxdd Jan 24, 2024
c6030a6
[MNT] fix autodoc bug with List
bxdd Jan 24, 2024
d4bf97f
[FIX] pass flake8 test
bxdd Jan 24, 2024
a2a91d4
Merge pull request #220 from Learnware-LAMDA/rel1
bxdd Jan 24, 2024
4af1118
[DOC] polish contents
GeneLiuXe Jan 24, 2024
a109612
Merge pull request #221 from Learnware-LAMDA/polish_contents
GeneLiuXe Jan 24, 2024
8e4bbe7
[DOC] modify details
GeneLiuXe Jan 24, 2024
cbadd7a
Merge pull request #222 from Learnware-LAMDA/polish_contents
GeneLiuXe Jan 24, 2024
4155e36
[DOC] polish contents of README
GeneLiuXe Jan 24, 2024
a2dc5f7
[DOC] polish contents of project docs
GeneLiuXe Jan 24, 2024
ff7c4dc
Merge pull request #223 from Learnware-LAMDA/polish_docs_content
GeneLiuXe Jan 24, 2024
69bc1cd
[DOC] fix details
GeneLiuXe Jan 24, 2024
6fb9707
[DOC] fix issue about line break
GeneLiuXe Jan 24, 2024
45cf66b
Merge pull request #224 from Learnware-LAMDA/polish_docs_content
GeneLiuXe Jan 24, 2024
9531603
Update CHANGES.rst
bxdd Jan 25, 2024
5af7bc9
Merge pull request #225 from Learnware-LAMDA/bxdd-patch-10
bxdd Jan 25, 2024
7c816e3
[DOCS] polish contents in README and docs
liuht-0807 Jan 28, 2024
bf663f7
[DOC] polish details
GeneLiuXe Jan 28, 2024
f38d40c
[DOC] update system features doc
liuht-0807 Jan 29, 2024
7042ad7
[DOC] add the arXiv reference
liuht-0807 Jan 29, 2024
5dc310e
[DOC] modify auto workflow example in quick.rst
liuht-0807 Jan 29, 2024
002fe33
[DOC] update references
liuht-0807 Jan 29, 2024
bbc4b26
Update README.md
bxdd Jan 30, 2024
6a8ef91
Merge pull request #228 from Learnware-LAMDA/bxdd-patch-11
bxdd Jan 30, 2024
d832c3f
Update Citation
bxdd Jan 30, 2024
2b99bcf
Merge pull request #229 from Learnware-LAMDA/bxdd-patch-12
bxdd Jan 30, 2024
d422bca
[DOC, FIX] Update Citation
bxdd Jan 30, 2024
606bbd3
Merge pull request #230 from Learnware-LAMDA/bxdd-patch-13
bxdd Jan 30, 2024
61469ee
[DOC, FIX] Update Citation
bxdd Jan 30, 2024
234978a
Merge pull request #231 from Learnware-LAMDA/bxdd-patch-14
bxdd Jan 30, 2024
39c9fe9
[FIX, DOC] Update ZH Citation
bxdd Jan 30, 2024
c6b8691
Merge pull request #232 from Learnware-LAMDA/bxdd-patch-15
bxdd Jan 30, 2024
b5dc682
Merge pull request #227 from Learnware-LAMDA/polish_content
GeneLiuXe Feb 1, 2024
2c4a00a
[DOC] update citation in readme
GeneLiuXe Feb 1, 2024
2028725
Merge pull request #233 from Learnware-LAMDA/modify_citation
GeneLiuXe Feb 1, 2024
f9e7365
[DOC] add the github link
GeneLiuXe Feb 29, 2024
e27211f
Merge pull request #234 from Learnware-LAMDA/github_link
GeneLiuXe Feb 29, 2024
2ec441f
[ENH] build the framework of llm market
Asymptotez Nov 25, 2024
58f3c87
[ENH] add llm specifications
liuht-0807 Nov 25, 2024
c3ca7c9
[FIX] add import
liuht-0807 Nov 26, 2024
7b67d49
Merge remote-tracking branch 'origin/feature/llm' into feature/llm_ma…
Asymptotez Nov 28, 2024
a7c8f97
[MNT] add checker for TaskVectorSpecification
Asymptotez Nov 28, 2024
43597ff
[ENH] add llm searcher
liuht-0807 Nov 29, 2024
5f57283
[MNT | ENH] refactor searchers into BasicSearcher and CombinedSearcher
liuht-0807 Dec 6, 2024
098481a
[FIX] fix bugs in market module
liuht-0807 Dec 6, 2024
71aa76e
[MNT] add LLMBenchmark; extend LearnwareBenchmark to LearnwareBenchma…
liuht-0807 Dec 12, 2024
5bfa023
[ENH] Add "Model Type" for semantic specification and modify checker
Asymptotez Dec 16, 2024
5571633
[FIX] fix bugs
liuht-0807 Dec 17, 2024
0a4fa12
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
liuht-0807 Dec 17, 2024
1497553
[MNT] Add MODEL_TYPE in SemanticSpecificationKey
Asymptotez Dec 18, 2024
1e9de39
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
Asymptotez Dec 18, 2024
2b1572a
[MNT] add LLMEasyOrganizer, modify LLMGeneralCapabilitySpecification
liuht-0807 Dec 19, 2024
6a6ee9a
[MNT] modify general capability specification details
liuht-0807 Dec 19, 2024
aaf80da
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
Asymptotez Dec 19, 2024
ed9c94c
[ENH] automatically download required learnware
zouxiaochuan Dec 20, 2024
a0d3994
Merge commit 'ed9c94c3362663e6d5a242056d18f561f9d6f4a1' into feature/…
Googol2002 Dec 20, 2024
44d701f
[ENH] Add new Specification for Large Language Model (LLM).
Googol2002 Dec 20, 2024
f21e8df
Merge pull request #235 from Learnware-LAMDA/add_dependency
zouxiaochuan Dec 20, 2024
0458788
Add Test for LLM Specification.
Googol2002 Dec 20, 2024
c1f5994
[MIT] Remove some redundant codes.
Googol2002 Dec 23, 2024
c21ed43
[ENH | FIX] complete llm benchmark, general capability specification …
SYCzzc Dec 24, 2024
f98ff08
Merge remote-tracking branch 'origin/feature/llm_benchmark_and_genera…
Googol2002 Dec 25, 2024
ac3b71b
[ENH] Add _search_by_taskvector_spec_single in LLMStatSearcher
Googol2002 Jan 1, 2025
82d5461
[MNT] Releasing commented out test code
Googol2002 Jan 1, 2025
95cad2f
[MNT] add check for learnware.yaml's new fields
Asymptotez Jan 2, 2025
e3916ba
[MNT] modify details in general_capability_spec
SYCzzc Jan 2, 2025
e7402d9
[MNT] modify class annotations
SYCzzc Jan 7, 2025
51cca25
Merge remote-tracking branch 'origin/feature/llm_benchmark_and_genera…
Googol2002 Jan 9, 2025
0e6b96c
[MNT] modify test_text_generative.py
SYCzzc Jan 9, 2025
8b40ed1
Merge branch 'feature/add_taskvector_spec' of https://github.com/Lear…
Googol2002 Jan 9, 2025
4636e99
[FIX] Work around trl package bug with multi-GPU parallelism
Googol2002 Jan 9, 2025
63a7079
[MNT] Adding Dependencies
Googol2002 Jan 10, 2025
f220087
Merge pull request #236 from Learnware-LAMDA/feature/add_taskvector_spec
Asymptotez Jan 10, 2025
3392585
[FIX] add `trust_remote_code` parameter to fix dataset loading
Asymptotez Jan 10, 2025
4ba0f98
[FIX] fix bug in func `parse_specification_type`
Asymptotez Jan 10, 2025
3015307
[FIX] fix variable name conflicts bug
Asymptotez Jan 10, 2025
9eec902
[FIX] add text generation task type
zouxiaochuan Jan 11, 2025
46a468e
Merge pull request #237 from Learnware-LAMDA/fix_add_text_gen
zouxiaochuan Jan 11, 2025
8a86dba
[FIX] bug fix: add comma
zouxiaochuan Jan 11, 2025
f788a66
[FIX] update package versions in setup.py for python3.8 compatibility
Asymptotez Jan 11, 2025
e43c203
[FIX] simplify applicability checks in `EasyExactSemanticSearcher` an…
Asymptotez Jan 15, 2025
a515318
[FIX] initial implementation of `__call__` func of `LLMStatSearcher`
Asymptotez Jan 15, 2025
6b73f93
Merge pull request #238 from Learnware-LAMDA/fix_add_text_gen2
zouxiaochuan Jan 16, 2025
a730f3e
[FIX] Fix the generation logic of `remain_config_list` to ensure that…
Asymptotez Jan 16, 2025
0c91b29
[FIX] add model type to default semantic specification
zouxiaochuan Jan 16, 2025
70e59e5
Merge commit 'a730f3e354856895bded227b42422afd21366ae1' into feature/…
Googol2002 Jan 16, 2025
cd7cf16
[ENH] Download models from beimingwu
Googol2002 Jan 16, 2025
ace3702
[FIX] fix typo.
Googol2002 Jan 16, 2025
5b6445a
[MNT] add general specification test
SYCzzc Jan 17, 2025
89d5338
[MNT] Modify interfaces and add corresponding tests.
Googol2002 Jan 17, 2025
5dbbeda
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
SYCzzc Jan 17, 2025
a42471a
[FIX] update `sentence_transformers` version to 3.2.1
Asymptotez Jan 18, 2025
07a180a
[MNT | FIX] adjust logic for checking base model
Asymptotez Jan 18, 2025
f3735b0
[ENH] add `get_model` method to retrieve the `nn.Module` object in `B…
Asymptotez Jan 18, 2025
c3ea59f
[MNT] add general_spec test of checker and organizer and modify some …
SYCzzc Jan 19, 2025
b9ae1c1
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
SYCzzc Jan 19, 2025
ab9550d
[FIX] fix CUDA OOM bug
SYCzzc Jan 20, 2025
8f02405
[FIX] get backend url from env in client
zouxiaochuan Jan 20, 2025
32f51b3
[FIX] Modify type checking in generate_generative_model_spec.
Googol2002 Jan 21, 2025
bff7c25
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
Googol2002 Jan 21, 2025
c38f461
[FIX] Modify type checking in generate_generative_model_spec.
Googol2002 Jan 21, 2025
8b54ce9
Merge pull request #239 from Learnware-LAMDA/add_model_type
zouxiaochuan Jan 21, 2025
07a26e7
[MNT] add dist to TaskVectorSpecification
Googol2002 Jan 22, 2025
1fa823d
[ENH] Add _convert_similarity_to_score
Googol2002 Jan 23, 2025
9a7cc62
Merge commit '1fa823d73b1f9c4e2383d0e105bcfcd9fc53b022' into HEAD
Googol2002 Jan 23, 2025
3516d70
[MNT] add exception detection in general_spec generation
SYCzzc Feb 15, 2025
de70332
Merge branch 'main' into feature/llm_market
zouxiaochuan Feb 21, 2025
3824e24
final test of llm_market
zouxiaochuan Mar 14, 2025
6da4aa3
[ENH] Complete workflow of skipping evaluation
SYCzzc Mar 17, 2025
56758b0
[MNT] update some hyperparameter in llm generative specification and …
SYCzzc Mar 20, 2025
598d429
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
SYCzzc Mar 21, 2025
077101a
final test of llm_market
zouxiaochuan Mar 14, 2025
8b28749
Merge pull request #241 from Learnware-LAMDA/feature/llm_market
zouxiaochuan Mar 21, 2025
77b00f1
[MNT] Complete workflow loading local learnwares
SYCzzc Mar 21, 2025
4229aa7
[MNT] modify details
SYCzzc Mar 21, 2025
b45b88b
Merge branch 'feature/llm_market' of https://github.com/Learnware-LAM…
SYCzzc Mar 21, 2025
1622738
[MNT] Complete llm workflow, modify details and init the Readme.
SYCzzc Mar 22, 2025
ec0cc62
[MNT] modify details and update README in llm workflow
SYCzzc Mar 25, 2025
e89bcef
[MNT] do not return learnware that has semantic not in user query
zouxiaochuan Mar 25, 2025
a04cafc
Merge pull request #242 from Learnware-LAMDA/bugfix_semantic_search
zouxiaochuan Mar 25, 2025
23f3e82
[MNT] add Optional type in semantic search
zouxiaochuan Mar 25, 2025
61d43c5
Merge pull request #243 from Learnware-LAMDA/bugfix_add_optional_search
zouxiaochuan Mar 26, 2025
06d4f59
[MNT] modify learnware_ids in llm_workflow config
SYCzzc Mar 26, 2025
9f4a531
[MNT] modify generative.py on loading base and adapter model from bei…
SYCzzc Mar 26, 2025
47df488
Merge pull request #244 from Learnware-LAMDA/feature/llm_workflow
Googol2002 Mar 26, 2025
3d569cc
[MNT] modify workflow details
SYCzzc May 20, 2025
fbaa570
Merge pull request #245 from Learnware-LAMDA/feature/llm_workflow
SYCzzc May 20, 2025
75e09c6
[MNT] update version to 0.4.0 and enhance changelog
Asymptotez May 20, 2025
be4d7d3
[FIX] update datasets version to 2.16.0
Asymptotez May 24, 2025
5a7e4a7
[DOC] update llm-related readme
SYCzzc May 24, 2025
9f60651
[MNT] update version to 0.4.0.post1 and add changelog entry for bugfi…
Asymptotez May 25, 2025
54b29c3
Merge branch 'main' into feature/llm
Asymptotez May 25, 2025
233c185
[MNT] fix the code style
SYCzzc May 25, 2025
6b9cd91
[FIX] Update github actions ubuntu version
Asymptotez May 25, 2025
49878d0
[FIX] Update github actions ubuntu version
Asymptotez May 25, 2025
703ecb5
Merge branch 'feature/llm' of https://github.com/Learnware-LAMDA/Lear…
Asymptotez May 25, 2025
c5b4199
[MNT] Remove redundant comments.
Googol2002 May 25, 2025
b98853e
[MNT] Refactor imports and clean up code structure across multiple mo…
Asymptotez May 25, 2025
b0af2a0
[MNT] Update torch and torchvision dependencies to latest compatible …
Asymptotez May 25, 2025
ed029f3
Merge branch 'feature/llm' of https://github.com/Learnware-LAMDA/Lear…
Asymptotez May 25, 2025
2e9bce1
[MNT] Remove unused imports and clean up code for F401 & F541.
Asymptotez May 25, 2025
bd9d778
Merge branch 'main' of https://github.com/Learnware-LAMDA/Learnware-P…
Asymptotez May 25, 2025
d5e4834
Merge branch 'feature/llm' of https://github.com/Learnware-LAMDA/Lear…
Asymptotez May 25, 2025
361f8a4
[MNT] Another fix for F401.
Asymptotez May 25, 2025
c86ebb0
Merge branch 'main' of https://github.com/Learnware-LAMDA/Learnware-P…
Asymptotez May 25, 2025
2 changes: 1 addition & 1 deletion .github/workflows/test_learnware_with_pip.yaml
@@ -13,7 +13,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-20.04]
+        os: [ubuntu-22.04]
         python-version: [3.9]

     steps:
4 changes: 2 additions & 2 deletions .github/workflows/test_learnware_with_source.yaml
@@ -13,7 +13,7 @@ jobs:
     runs-on: ${{ matrix.os }}
     strategy:
       matrix:
-        os: [ubuntu-20.04]
+        os: [ubuntu-22.04]
         python-version: [3.9]

     steps:
@@ -50,4 +50,4 @@ jobs:

       - name: Test workflow
         run: |
-          conda run -n learnware python -m pytest tests/test_workflow/test_hetero_workflow.py
+          conda run -n learnware python -m pytest tests/test_workflow/test_hetero_workflow.py
6 changes: 5 additions & 1 deletion .gitignore
@@ -14,6 +14,9 @@ dist/
 *.pkl
 *.hd5
 *.csv
+!/examples/dataset_llm_workflow/model_performance/medical.csv
+!/examples/dataset_llm_workflow/model_performance/math.csv
+!/examples/dataset_llm_workflow/model_performance/finance.csv
 *.out
 *.html
 *.dot
@@ -45,4 +48,5 @@ learnware_pool/
 PFS/
 data/
 examples/results/
-examples/*/results/
+examples/*/results/
+examples/*/user_specs/
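
Git evaluates ignore rules from top to bottom and the last matching rule wins, which is why the three `!` lines can re-include specific CSV files after the blanket `*.csv` rule. The Python sketch below is a deliberately simplified model of that behaviour, not Git's implementation: `is_ignored` and `RULES` are hypothetical names, and the matcher handles only basename globs and root-anchored paths, which is all these particular rules need.

```python
from fnmatch import fnmatch
from posixpath import basename

# The CSV-related rules from the .gitignore hunk, in file order.
RULES = [
    "*.csv",
    "!/examples/dataset_llm_workflow/model_performance/medical.csv",
    "!/examples/dataset_llm_workflow/model_performance/math.csv",
    "!/examples/dataset_llm_workflow/model_performance/finance.csv",
]


def is_ignored(path, rules=RULES):
    """Simplified gitignore check: the last matching rule decides."""
    ignored = False
    for rule in rules:
        negated = rule.startswith("!")
        pattern = rule[1:] if negated else rule
        if pattern.startswith("/"):
            # Leading "/" anchors the pattern to the repository root.
            matched = fnmatch(path, pattern[1:])
        else:
            # A pattern without "/" matches the file name at any depth.
            matched = fnmatch(basename(path), pattern)
        if matched:
            ignored = not negated
    return ignored


for candidate in [
    "examples/dataset_llm_workflow/model_performance/math.csv",
    "examples/dataset_llm_workflow/model_performance/other.csv",
    "results.csv",
]:
    print(candidate, "->", "ignored" if is_ignored(candidate) else "tracked")
```

The negations work here because `*.csv` excludes files only; Git cannot re-include a file whose parent directory is itself excluded.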
15 changes: 12 additions & 3 deletions CHANGES.rst
@@ -1,7 +1,16 @@
 Changelog
 =========
-Here you can see the full list of changes between ``learnware`` release.
+Here you can see the full list of changes between ``learnware`` releases.

-Version 0.3.2
+Version 0.4.0.post1 (2025-05-25)
 ---------------
-This is the first public release of ``learnware`` package.
+* Bugfix release.
+
+Version 0.4.0 (2025-05-20)
+---------------
+* Added support for 7B level language model learnwares.
+* Added two new specifications, specifically designed for language model learnwares.
+
+Version 0.3.2 (2024-01-24)
+---------------
+* First public release of ``learnware`` package.
45 changes: 45 additions & 0 deletions README.md
@@ -392,6 +392,51 @@ The results are depicted in the following table and figure. Similarly, even when
<img src="./docs/_static/img/text_labeled_curves.svg" width="500" height="auto" style="max-width: 100%;"/>
</div>

# LLM Experimental Results (New)

This section refers to Section 4 of our paper [*Learnware of Language Models: Specialized Small Language Models Can Do Big*](https://arxiv.org/abs/2505.13425). We simulate a learnware system comprising approximately 100 learnwares of specialized SLMs with 8B parameters, fine-tuned across finance, healthcare, and mathematics domains.

Experimental results demonstrate promising performance: by selecting one suitable learnware for each task-specific inference, the system outperforms the base SLMs on all benchmarks. Compared to LLMs, the system outperforms Qwen1.5-110B, Qwen2.5-72B, and Llama3.1-70B-Instruct by at least 14% in finance domain tasks. Additionally, it surpasses Flan-PaLM-540B (ranked 7th on the [Open Medical LLM Leaderboard](https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard)) in medical domain tasks.

The figure and table below show the performance scores in the finance scenario.

<div align=center>
<img src="./docs/_static/img/llm-finance.svg" width="800" height="auto" style="max-width: 100%;"/>
</div>

<div align=center>

| User | Qwen2.5-7B | Llama3.1-8B-Instruct | Llama3.1-8B | Qwen1.5-110B | Qwen2.5-72B | Llama3.1-70B-Instruct | Random | Learnware | Best-single | Oracle |
|:-------------------------|:-------------|:-----------------------|:--------------|:---------------|:--------------|:------------------------|:---------|:------------|:--------------|:---------|
| australian | 43.17 | 44.6 | 43.17 | 43.17 | 43.17 | 47.48 | 44.45 | 56.83 | 42.21 | 56.83 |
| cra_lendingclub | 80.82 | 76.33 | 57.34 | 80.82 | 47.01 | 53.07 | 81.52 | 92.07 | 80.82 | 92.07 |
| fiqasa | 38.3 | 40.43 | 56.17 | 63.4 | 64.26 | 68.51 | 46.53 | 76.38 | 32.06 | 76.38 |
| fpb | 76.08 | 32.78 | 30.72 | 70.72 | 78.35 | 78.04 | 67.95 | 84.25 | 77.73 | 84.25 |
| german | 65.0 | 49.5 | 66.0 | 66.0 | 66.5 | 43.5 | 51.5 | 67.06 | 65.33 | 67.06 |
| headlines | 74.81 | 59.95 | 59.95 | 62.96 | 77.84 | 77.53 | 72.43 | 95.61 | 95.61 | 95.61 |
| ner | 21.75 | 0.62 | 9.01 | 17.89 | 9.36 | 9.52 | 24.99 | 52.79 | 23.98 | 52.79 |
| sm_acl | 51.1 | 51.4 | 51.34 | 49.3 | 51.56 | 49.38 | 51.42 | 52.82 | 50.71 | 53.63 |
| sm_bigdata | 55.3 | 55.57 | 52.79 | 51.02 | 50.27 | 47.76 | 53.86 | 52.4 | 55.52 | 55.88 |
| sm_cikm | 58.44 | 54.24 | 54.07 | 44.01 | 58.27 | 47.86 | 55.89 | 55.99 | 57.98 | 58.52 |
| causal20_sc | 65.14 | 88.48 | 79.45 | 83.75 | 76.17 | 87.16 | 74.71 | 84.17 | 88.61 | 88.61 |
| finarg_ecc_arc | 64.78 | 46.67 | 60.0 | 62.32 | 63.04 | 44.64 | 62.27 | 64.31 | 57.87 | 68.36 |
| finarg_ecc_auc | 48.3 | 51.81 | 49.85 | 55.01 | 61.71 | 65.02 | 52.08 | 58.08 | 48.68 | 58.08 |
| fomc | 60.48 | 29.44 | 34.68 | 58.47 | 57.66 | 66.13 | 56.05 | 62.7 | 61.36 | 62.7 |
| ma | 79.2 | 56.4 | 51.0 | 81.4 | 84.6 | 83.2 | 73.64 | 79.81 | 79.27 | 79.81 |
| mlesg | 35.67 | 32.67 | 20.0 | 34.67 | 38.67 | 42.33 | 31.99 | 33.42 | 38.33 | 38.33 |
| multifin_en | 60.99 | 31.32 | 28.39 | 65.38 | 63.55 | 68.5 | 54.96 | 63.46 | 58.61 | 63.46 |
| Avg. | 57.61 | 47.19 | 47.29 | 58.25 | 58.35 | 57.63 | 56.25 | 66.6 | 59.69 | 67.79 |
| Avg. rank | 5.94 | 7.35 | 7.82 | 5.94 | 4.71 | 5.24 | 6.47 | 2.88 | 5.47 | 1.65 |
| Learnware (win/tie/loss) | 13/0/4 | 15/0/2 | 16/0/1 | 14/0/3 | 12/0/5 | 11/0/6 | 16/0/1 | nan | 12/1/4 | 0/11/6 |
| Oracle (win/tie/loss) | 17/0/0 | 17/0/0 | 17/0/0 | 15/0/2 | 13/0/4 | 12/0/5 | 17/0/0 | 6/11/0 | 14/3/0 | nan |

</div>

Our system performs strongly across financial tasks: it achieves the highest average score among all methods except Oracle, a relative improvement of nearly 14\% over the best large-scale model, Qwen2.5-72B (66.6 vs. 58.35). Among the strategies that use specialized SLMs, excluding Oracle, it ranks first on 13 of the 17 tasks; it identifies the optimal learnware (tying with Oracle) on 11 tasks and outperforms all contenders on 8.
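
The headline figures in this paragraph can be re-derived from the table. The sketch below is not part of the repository: it copies the Learnware and Qwen2.5-72B columns by hand (the variable and function names are illustrative) and recomputes the two averages, the relative improvement, and the win/tie/loss tally.

```python
# Scores copied from the finance table, in row order (australian ... multifin_en).
learnware = [56.83, 92.07, 76.38, 84.25, 67.06, 95.61, 52.79, 52.82, 52.4,
             55.99, 84.17, 64.31, 58.08, 62.7, 79.81, 33.42, 63.46]
qwen25_72b = [43.17, 47.01, 64.26, 78.35, 66.5, 77.84, 9.36, 51.56, 50.27,
              58.27, 76.17, 63.04, 61.71, 57.66, 84.6, 38.67, 63.55]


def mean(scores):
    """Arithmetic mean of a list of per-task scores."""
    return sum(scores) / len(scores)


def win_tie_loss(ours, theirs):
    """Count the tasks on which `ours` is higher than, equal to, or lower than `theirs`."""
    wins = sum(a > b for a, b in zip(ours, theirs))
    ties = sum(a == b for a, b in zip(ours, theirs))
    losses = sum(a < b for a, b in zip(ours, theirs))
    return wins, ties, losses


avg_learnware = mean(learnware)
avg_qwen = mean(qwen25_72b)
# Relative improvement: the gap expressed as a fraction of the baseline average.
relative_gain = (avg_learnware - avg_qwen) / avg_qwen

print(f"Learnware average:    {avg_learnware:.2f}")
print(f"Qwen2.5-72B average:  {avg_qwen:.2f}")
print(f"Relative improvement: {relative_gain:.1%}")
print("Learnware win/tie/loss vs. Qwen2.5-72B:", win_tie_loss(learnware, qwen25_72b))
```

Read this way, the 14% figure is a relative gain over Qwen2.5-72B's average score; the absolute gap between the two averages is a little over 8 points.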

These results show that, under the Task-Level evaluation setting, our system can match or surpass large-scale models with over 70B parameters while requiring only the memory needed for models of at most 8B parameters.

**For more scenarios (medical and math) and details, please see [here](./examples/dataset_llm_workflow/README.md).**

# Citation

45 changes: 45 additions & 0 deletions README_zh.md
@@ -398,6 +398,51 @@ feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
<img src="./docs/_static/img/text_labeled_curves.svg" width="500" height="auto" style="max-width: 100%;"/>
</div>

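# LLM Experimental Results (New)

This section corresponds to Section 4 of our paper [*Learnware of Language Models: Specialized Small Language Models Can Do Big*](https://arxiv.org/abs/2505.13425). We simulate a learnware dock system containing approximately 100 learnwares of specialized 8B-level SLMs, covering three domains: finance, healthcare, and mathematics.

The experimental results demonstrate the strong performance of our system: by selecting one suitable learnware for each domain-specific task, the system outperforms the base SLMs and the baseline methods on the benchmarks of all scenarios. Compared with large language models of more than 70B parameters, the system uses far less GPU memory while outperforming Qwen1.5-110B, Qwen2.5-72B, and Llama3.1-70B-Instruct by at least 14% in the finance domain. In addition, in the medical domain it surpasses Flan-PaLM-540B (ranked 7th on the [Open Medical LLM Leaderboard](https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard)).

The figure and table below show the performance scores of different methods and models in the finance evaluation scenario: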

<div align=center>
<img src="./docs/_static/img/llm-finance.svg" width="800" height="auto" style="max-width: 100%;"/>
</div>

<div align=center>

| User | Qwen2.5-7B | Llama3.1-8B-Instruct | Llama3.1-8B | Qwen1.5-110B | Qwen2.5-72B | Llama3.1-70B-Instruct | Random | Learnware | Best-single | Oracle |
|:-------------------------|:-------------|:-----------------------|:--------------|:---------------|:--------------|:------------------------|:---------|:------------|:--------------|:---------|
| australian | 43.17 | 44.6 | 43.17 | 43.17 | 43.17 | 47.48 | 44.45 | 56.83 | 42.21 | 56.83 |
| cra_lendingclub | 80.82 | 76.33 | 57.34 | 80.82 | 47.01 | 53.07 | 81.52 | 92.07 | 80.82 | 92.07 |
| fiqasa | 38.3 | 40.43 | 56.17 | 63.4 | 64.26 | 68.51 | 46.53 | 76.38 | 32.06 | 76.38 |
| fpb | 76.08 | 32.78 | 30.72 | 70.72 | 78.35 | 78.04 | 67.95 | 84.25 | 77.73 | 84.25 |
| german | 65.0 | 49.5 | 66.0 | 66.0 | 66.5 | 43.5 | 51.5 | 67.06 | 65.33 | 67.06 |
| headlines | 74.81 | 59.95 | 59.95 | 62.96 | 77.84 | 77.53 | 72.43 | 95.61 | 95.61 | 95.61 |
| ner | 21.75 | 0.62 | 9.01 | 17.89 | 9.36 | 9.52 | 24.99 | 52.79 | 23.98 | 52.79 |
| sm_acl | 51.1 | 51.4 | 51.34 | 49.3 | 51.56 | 49.38 | 51.42 | 52.82 | 50.71 | 53.63 |
| sm_bigdata | 55.3 | 55.57 | 52.79 | 51.02 | 50.27 | 47.76 | 53.86 | 52.4 | 55.52 | 55.88 |
| sm_cikm | 58.44 | 54.24 | 54.07 | 44.01 | 58.27 | 47.86 | 55.89 | 55.99 | 57.98 | 58.52 |
| causal20_sc | 65.14 | 88.48 | 79.45 | 83.75 | 76.17 | 87.16 | 74.71 | 84.17 | 88.61 | 88.61 |
| finarg_ecc_arc | 64.78 | 46.67 | 60.0 | 62.32 | 63.04 | 44.64 | 62.27 | 64.31 | 57.87 | 68.36 |
| finarg_ecc_auc | 48.3 | 51.81 | 49.85 | 55.01 | 61.71 | 65.02 | 52.08 | 58.08 | 48.68 | 58.08 |
| fomc | 60.48 | 29.44 | 34.68 | 58.47 | 57.66 | 66.13 | 56.05 | 62.7 | 61.36 | 62.7 |
| ma | 79.2 | 56.4 | 51.0 | 81.4 | 84.6 | 83.2 | 73.64 | 79.81 | 79.27 | 79.81 |
| mlesg | 35.67 | 32.67 | 20.0 | 34.67 | 38.67 | 42.33 | 31.99 | 33.42 | 38.33 | 38.33 |
| multifin_en | 60.99 | 31.32 | 28.39 | 65.38 | 63.55 | 68.5 | 54.96 | 63.46 | 58.61 | 63.46 |
| Avg. | 57.61 | 47.19 | 47.29 | 58.25 | 58.35 | 57.63 | 56.25 | 66.6 | 59.69 | 67.79 |
| Avg. rank | 5.94 | 7.35 | 7.82 | 5.94 | 4.71 | 5.24 | 6.47 | 2.88 | 5.47 | 1.65 |
| Learnware (win/tie/loss) | 13/0/4 | 15/0/2 | 16/0/1 | 14/0/3 | 12/0/5 | 11/0/6 | 16/0/1 | nan | 12/1/4 | 0/11/6 |
| Oracle (win/tie/loss) | 17/0/0 | 17/0/0 | 17/0/0 | 15/0/2 | 13/0/4 | 12/0/5 | 17/0/0 | 6/11/0 | 14/3/0 | nan |

</div>

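Our system performs strongly on financial tasks, achieving the highest average score among all methods other than Oracle and improving on the best-performing large-scale model, Qwen2.5-72B, by 14\%. On 13 of the 17 tasks it scores higher than the other specialized-SLM selection methods (excluding Oracle); on 11 tasks it finds the optimal learnware (matching Oracle's performance); and on 8 tasks it beats all other methods and models.

These results show that, under the task-level evaluation setting, by searching for and using only small language models at the 8B level, the learnware dock system can match or even surpass large models with more than 70B parameters overall, while substantially reducing GPU memory usage at inference time.

**For experimental results and details on more scenarios (medical and math), please see [here](./examples/dataset_llm_workflow/README.md).**

# Citation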
