Releases: Multi-Agent-LLMs/mallm
v1.0.5
What's Changed
- Feat/new response generators by @lkaesberg in #149
- vllm fixes by @jonas-becker in #150
- Debate fix by @jonas-becker in #151
- summary renamed to judge by @jonas-becker in #152
- Nicer charts by @lkaesberg in #153
Full Changelog: v1.0.4...v1.0.5
v1.0.4
What's Changed
- Aqua rat by @jonas-becker in #145
- fix fstring by @jonas-becker in #146
- toml PEP 621 compliance + ifeval by @jonas-becker in #147
- Judge Agent by @jonas-becker in #148
Full Changelog: v1.0.3...v1.0.4
v1.0.3
Full Changelog: v1.0.2...v1.0.3
v1.0.2
Full Changelog: v1.0.1...v1.0.2
v1.0.1
What's Changed
- ifeval evaluation (instruction following) by @jonas-becker in #139
- no duplicate answer choices by @jonas-becker in #143
- Feat/challenge results by @lkaesberg in #144
Full Changelog: v1.0.0...v1.0.1
Added release on PyPI
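With this release on PyPI, MALLM can be installed directly with pip (a minimal sketch, assuming the package is published under the name `mallm`, matching the repository):

```shell
# Install the released package from PyPI
pip install mallm
```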
Public Release
What's Changed
- Create update_readme.yml by @jonas-becker in #72
- add workflow py script by @jonas-becker in #73
- Added QA metric for GPQA, MMLU, etc. by @ItsNiklas in #74
- Small Code Style Improvements by @ItsNiklas in #77
- Fix readme updater by @jonas-becker in #78
- feat: added batch executor for mallm by @lkaesberg in #81
- ResponseGenerators: Handling prompts, extraction, and agreements by @jonas-becker in #79
- More robust multichoice metric + fixed datasets by @jonas-becker in #82
- Prompt improvements (+ majority consensus fix) by @jonas-becker in #85
- Fix/no unlimited voting by @lkaesberg in #88
- Extensive evaluation by @jonas-becker in #92
- 1) sort output file after finishing 2) comparable moderator by @jonas-becker in #91
- feat: When there is a dataset issue, print exactly what the issue is by @jpwahle in #98
- Support hf datasets by @jpwahle in #93
- fix: Fix a bug where forgetting trailing slash in memory bucket leads to undesired behaviour by @jpwahle in #100
- Ablation by @jonas-becker in #99
- Refactor/evaluator by @lkaesberg in #90
- fix: Fix a bug where the HF dataset is not sorted properly. by @jpwahle in #102
- Fix/data load and debug output by @lkaesberg in #105
- Feat/plotting by @lkaesberg in #106
- squad metric enhancements by @jonas-becker in #107
- Paraphrase types agent generator by @jonas-becker in #103
- fix out full path by @jonas-becker in #109
- Feat/rich output by @lkaesberg in #110
- fix: unanimity is unreachable because of faulty condition by @lkaesberg in #108
- Prompt changes and minor adjustments by @jonas-becker in #114
- Distinct-N metric by @jonas-becker in #116
- Remove dbm memory by @jonas-becker in #115
- feat: added instruction templates by @lkaesberg in #118
- batch executor refinements by @jonas-becker in #113
- More expressive param names + flexible number of neutral agents by @jonas-becker in #119
- feat: all agents generate a first draft and after that improve by @lkaesberg in #120
- feat: added mmlu pro dataset by @lkaesberg in #123
- feat: add musr dataset by @lkaesberg in #125
- feat: add math lvl 5 dataset downloader by @lkaesberg in #126
- Freely combine persona types + NoPersonaGenerator by @jonas-becker in #122
- feat: added prompts for new datasets by @lkaesberg in #127
- feat: add mallm command line scripts by @lkaesberg in #128
- Feat/evaluator with alterations by @lkaesberg in #129
- feat: added metric that checks if answer is included in response by @lkaesberg in #131
- feat: added discord webhook to batch processing by @lkaesberg in #130
- feat: add mmlu dataset by @lkaesberg in #132
- BBQ, MoCa, MoralExceptQA Datasets + metadata field by @jonas-becker in #133
- Feat/summarize decision protocol by @lkaesberg in #124
- policy feedback agent by @jonas-becker in #134
- Persona diversity index by @jonas-becker in #136
- WinoGrande, ETHICS datasets by @jonas-becker in #137
- Consensus Voting by @jonas-becker in #138
- feat: add new commandline script to execute mallm batch mode by @lkaesberg in #140
- feat: add challenge of final answer to test consistency by @lkaesberg in #141
- update readme by @jonas-becker in #142
Full Changelog: v0.1.0-alpha...v1.0.0
v0.1.0-alpha
This is the first release of MALLM (alpha).
What's Changed
- Fix/setup by @lkaesberg in #1
- memory redesign, more extensive output logs, refactoring, bug fixes by @jonas-becker in #2
- Feat/new build system by @lkaesberg in #3
- refactor: add abstract class for datasets by @lkaesberg in #4
- feat: improve readme by @lkaesberg in #6
- Feat/logging by @lkaesberg in #7
- Feat/formatting by @lkaesberg in #8
- Added GPQA Dataset, cffi dependency for linux by @ItsNiklas in #10
- Tgi implementation by @jonas-becker in #9
- reorganized files by @jonas-becker in #16
- Multi source datasets by @jonas-becker in #19
- Create unit test by @jonas-becker in #21
- Fix/discussion by @lkaesberg in #17
- Refactor/coordinator by @lkaesberg in #18
- installable package by @jonas-becker in #25
- small fixes by @jonas-becker in #26
- WMT19, paraphrase types, and fixes by @jonas-becker in #28
- Feat/btvote by @lkaesberg in #27
- fix etpc and context by @jonas-becker in #35
- fix json stringify by @jonas-becker in #36
- Feat/decision protocol by @lkaesberg in #30
- feat: added agent history as a chat format by @lkaesberg in #38
- samples left logging and type hints by @jonas-becker in #37
- stream answers to reduce memory usage by @jonas-becker in #40
- Strict types by @lkaesberg in #42
- Refactor/GitHub action by @lkaesberg in #45
- Openai support by @jonas-becker in #47
- feat: added tests for coordinator by @lkaesberg in #46
- Refactor/moderator by @lkaesberg in #44
- Evaluation framework by @jonas-becker in #50
- Feat: fixed number of turns by @jonas-becker in #53
- Improve prompts by @jonas-becker in #55
- Feat baseline by @jonas-becker in #56
- Added IPIP Persona Generator by @ItsNiklas in #59
- fix missing hf_token, btvote not shuffled, max_samples by @jonas-becker in #57
- Discussion length by @jonas-becker in #58
- add pytest pre commit hook by @jonas-becker in #60
- Added more decision protocols by @lkaesberg in #49
- Split agree and answer by @lkaesberg in #54
- refactor: introduce config dataclass to remove duplicate code and mak… by @lkaesberg in #64
- Improve extraction reliability by @jonas-becker in #63
- failed samples logging by @jonas-becker in #65
- Stability fixes by @jonas-becker in #66
New Contributors
- @lkaesberg made their first contribution in #1
- @jonas-becker made their first contribution in #2
Full Changelog: https://github.com/Multi-Agent-LLMs/mallm/commits/v0.1.0-alpha