README.md: 3 additions, 0 deletions

@@ -158,6 +158,8 @@ The Dataset column links to publicly available datasets (e.g., on HuggingFace).
| Arc Agi | knowledge | Solve puzzles designed to test intelligence. See https://arcprize.org/arc-agi. | Improve puzzle-solving capabilities. | - | ✓ | - | <a href='resources_servers/arc_agi/configs/arc_agi.yaml'>arc_agi.yaml</a> | - |
| Aviary | agent | Multi-hop question answering on the HotpotQA dataset with Wikipedia search | Improve knowledge and agentic capabilities | ✓ | ✓ | Apache 2.0 | <a href='resources_servers/aviary/configs/hotpotqa_aviary.yaml'>hotpotqa_aviary.yaml</a> | - |
| Aviary | math | GSM8K benchmark with a calculator tool | Test math and agentic capabilities | ✓ | ✓ | Apache 2.0 | <a href='resources_servers/aviary/configs/gsm8k_aviary.yaml'>gsm8k_aviary.yaml</a> | - |
| Base Gymnasium | other | Base class for Gymnasium-style servers. Not a standalone server. | Reusable base class for step/reset style environments | - | - | - | <a href='resources_servers/base_gymnasium/configs/base_gymnasium.yaml'>base_gymnasium.yaml</a> | - |
| Blackjack | games | Blackjack. The model chooses to hit or stand. Reward: +1 for a win, 0 for a draw, -1 for a loss or bust. | Example Gymnasium-style multi-step environment | - | - | - | <a href='resources_servers/blackjack/configs/blackjack.yaml'>blackjack.yaml</a> | - |
| Calendar | agent | Multi-turn calendar scheduling dataset. User states events and constraints in natural language; model schedules events to satisfy all constraints. | Improve multi-turn instruction following capabilities | ✓ | ✓ | Apache 2.0 | <a href='resources_servers/calendar/configs/calendar.yaml'>calendar.yaml</a> | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-calendar_scheduling'>Nemotron-RL-agent-calendar_scheduling</a> |
| Calendar | agent | Multi-turn calendar scheduling dataset. User states events and constraints in natural language; model schedules events to satisfy all constraints. | Improve multi-turn instruction following capabilities | ✓ | ✓ | Creative Commons Attribution 4.0 International | <a href='resources_servers/calendar/configs/calendar_v2.yaml'>calendar_v2.yaml</a> | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-Instruction-Following-Calendar-v2'>Nemotron-RL-Instruction-Following-Calendar-v2</a> |
| Circle Click | other | Click on circles in images | Improve visual grounding and spatial reasoning | - | - | - | <a href='resources_servers/circle_click/configs/circle_click.yaml'>circle_click.yaml</a> | - |
@@ -171,6 +173,7 @@ The Dataset column links to publicly available datasets (e.g., on HuggingFace).
| Genrm Compare | rlhf | GenRM pairwise comparison for RLHF training | Compare multiple candidate responses using GenRM model | - | - | - | <a href='resources_servers/genrm_compare/configs/genrm_compare.yaml'>genrm_compare.yaml</a> | - |
| Google Search | agent | Multiple-choice question answering problems with integrated search tools | Improve knowledge-related benchmarks with search tools | ✓ | - | Apache 2.0 | <a href='resources_servers/google_search/configs/google_search.yaml'>google_search.yaml</a> | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa</a> |
| Gpqa Diamond | knowledge | GPQA Diamond multiple-choice question answering problems | Evaluate graduate-level scientific reasoning via MCQ verification | ✓ | - | MIT | <a href='resources_servers/gpqa_diamond/configs/gpqa_diamond.yaml'>gpqa_diamond.yaml</a> | - |
| Grl Sokoban | games | Single-box Sokoban in Gymnasium API style. | Model emits one move per turn until the puzzle is solved. | - | - | - | <a href='resources_servers/grl_sokoban/configs/grl_sokoban.yaml'>grl_sokoban.yaml</a> | - |
| Ifbench | instruction_following | IFBench instruction following evaluation using AllenAI's IFBench library (57 instruction types) | Improve IFBench instruction following | - | - | - | <a href='resources_servers/ifbench/configs/ifbench.yaml'>ifbench.yaml</a> | - |
| Instruction Following | instruction_following | Instruction-following datasets targeting IFEval- and IFBench-style capabilities | Improve IFEval and IFBench | ✓ | - | Apache 2.0 | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>instruction_following.yaml</a> | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following'>Nemotron-RL-instruction_following</a> |
| Jailbreak Detection | safety | Jailbreak detection with Nemotron judge + combined reward | - | - | ✓ | - | <a href='resources_servers/jailbreak_detection/configs/jailbreak_detection_nemotron_combined_reward_tp8.yaml'>jailbreak_detection_nemotron_combined_reward_tp8.yaml</a> | - |