41 changes: 41 additions & 0 deletions environments/ether0/README.md
@@ -0,0 +1,41 @@
# ether0 benchmark environment

[Benchmark](https://huggingface.co/datasets/futurehouse/ether0-benchmark) and [paper](https://arxiv.org/pdf/2506.17238).

325 chemistry reasoning questions across 14 task types, with around 25 questions per task. Every answer is a molecule. Tasks include:

- Completing SMILES fragments
- Designing molecules adhering to molecular formula and functional group constraints
- Predicting reaction outcomes
- Proposing one-step synthesis pathways
- Editing the solubility of a molecule
- Converting IUPAC name to SMILES
- Answering multiple-choice questions about safety, ADME properties, BBB permeability, toxicity, scent, and pKa

The retro-synthesis and oracle-solubility tasks require an additional verifier server (see `ether0-serve` in the [ether0 repo](https://github.com/Future-House/ether0/)).
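
Because every answer is a molecule, the final SMILES string has to be separated from the model's free-form reasoning. The system prompt in `data/example.jsonl` asks for the answer inside `<answer></answer>` tags. The following is a minimal sketch of that extraction step, not the verifier's actual code; the function name `extract_answer` and the choice to take the last tagged block are assumptions made for illustration.

```python
import re
from typing import Optional

# Matches the <answer>...</answer> convention requested by the example system prompt.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(text: str) -> Optional[str]:
    """Return the content of the last <answer> block, or None if there is none.

    Taking the last block is an assumption: a model may repeat the prompt's
    example tag before giving its own final answer.
    """
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].strip()
```

A response without tags yields `None`, which a caller would presumably treat as an incorrect answer.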

## Quickstart

Create `env.yaml`:
```yaml
policy_base_url: http://localhost:8000/v1
policy_api_key: EMPTY
policy_model_name: futurehouse/ether0
```

Start the servers and collect rollouts:
```bash
# start the vLLM and NeMo Gym servers in the background
vllm serve futurehouse/ether0 &
ng_run "+config_paths=[environments/ether0/config.yaml,responses_api_models/vllm_model/configs/vllm_model.yaml]" &

# wait until both servers are ready before collecting rollouts
ng_collect_rollouts \
+agent_name=ether0_simple_agent \
+input_jsonl_fpath=environments/ether0/data/example.jsonl \
+output_jsonl_fpath=environments/ether0/data/ether0_rollouts.jsonl

tail -n 1 environments/ether0/data/ether0_rollouts.jsonl | jq . | less
```
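
The last command pretty-prints the final rollout. Where `jq` is not installed, the same inspection can be sketched in Python. This assumes only that the output file holds one JSON object per line; the helper name `last_record` is hypothetical, and no particular rollout fields are assumed.

```python
import json
from pathlib import Path


def last_record(path):
    """Parse the last non-empty line of a JSONL file and return it."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError(f"{path} contains no records")
    return json.loads(lines[-1])


# Example usage, once rollouts have been collected:
# record = last_record("environments/ether0/data/ether0_rollouts.jsonl")
# print(json.dumps(record, indent=2))
```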

See `prepare.py` to prepare the full dataset.
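
Each line of the bundled `data/example.jsonl` carries the prompt under `responses_create_params.input` and the grading information under `verifier_metadata`. In those five examples, the `solution` string has three parts joined by `!:!`: an evaluator name (such as `str_eval` or `formula_eval`), a payload, and the problem type. The sketch below reads that layout back. It is inferred from the example records only, not from a documented format, and the helper names `split_solution` and `describe` are hypothetical.

```python
import json
from typing import Tuple

SEPARATOR = "!:!"  # delimiter observed in the example `solution` strings


def split_solution(solution: str) -> Tuple[str, str, str]:
    """Split 'evaluator!:!payload!:!problem_type' into its three parts."""
    evaluator, rest = solution.split(SEPARATOR, 1)
    # Split from the right so the payload is kept whole.
    payload, problem_type = rest.rsplit(SEPARATOR, 1)
    return evaluator, payload, problem_type


def describe(line: str) -> dict:
    """Summarize one JSONL record: id, evaluator, problem type, message count."""
    record = json.loads(line)
    meta = record["verifier_metadata"]
    evaluator, _payload, problem_type = split_solution(meta["solution"])
    return {
        "id": meta["id"],
        "evaluator": evaluator,
        "problem_type": problem_type,
        "n_messages": len(record["responses_create_params"]["input"]),
    }
```

Running `describe` over every line of a prepared file gives a quick check that each record has the expected keys before it is passed to `ng_collect_rollouts`.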
Empty file added environments/ether0/__init__.py
Empty file.
26 changes: 26 additions & 0 deletions environments/ether0/config.yaml
@@ -0,0 +1,26 @@
ether0:
  resources_servers:
    ether0:
      entrypoint: app.py
      domain: knowledge
      verified: false
      description: ether0 chemistry benchmark verifiers
      value: Evaluate chemistry knowledge and reasoning with ether0 benchmark
ether0_simple_agent:
  responses_api_agents:
    simple_agent:
      entrypoint: app.py
      resources_server:
        type: resources_servers
        name: ether0
      model_server:
        type: responses_api_models
        name: policy_model
      datasets:
      - name: example
        type: example
        jsonl_fpath: environments/ether0/data/example.jsonl
      - name: val
        type: validation
        jsonl_fpath: environments/ether0/data/val.jsonl
        license: Creative Commons Attribution 4.0 International
6 changes: 6 additions & 0 deletions environments/ether0/data/.gitignore
@@ -0,0 +1,6 @@
*train.jsonl
*validation.jsonl
*val.jsonl
*train_prepare.jsonl
*validation_prepare.jsonl
*example_prepare.jsonl
5 changes: 5 additions & 0 deletions environments/ether0/data/example.jsonl
@@ -0,0 +1,5 @@
{"responses_create_params": {"input": [{"role": "system", "content": "You are a scientific reasoning agent. Think step by step, then place your final answer inside <answer></answer> tags. For example: <answer>CCO</answer>"}, {"role": "user", "content": "Generate a SMILES representation for a molecule containing groups: charged and nitro. It should also have formula C13H12N6O5."}]}, "verifier_metadata": {"solution": "functional_group_eval!:!('C13H12N6O5', ['charged', 'nitro'])!:!functional-group", "problem_type": "functional-group", "ideal": "Cc1ncc([N+](=O)[O-])n1CC(=O)N/N=C/c1ccc([N+](=O)[O-])cc1", "id": "00c8bc2d-0bb3-53c2-8bdf-cd19616d4536"}, "agent_ref": {"type": "responses_api_agents", "name": "ether0_simple_agent"}}
{"responses_create_params": {"input": [{"role": "system", "content": "You are a scientific reasoning agent. Think step by step, then place your final answer inside <answer></answer> tags. For example: <answer>CCO</answer>"}, {"role": "user", "content": "Among the following, which molecule is predicted to have a permeability in MDCK cells in MDR1-MDCK efflux ratio B-A/A-B close to 1.04?\nFC(C1=NC(C(NC2C(OCC)=CC3=NN(CCC(O)(C)C)C=C3C=2)=O)=CC=C1)F\nC(NC1=CC2=CN(N=C2C=C1C(C)(O)C)CCC(C)(O)C)(=O)C1N=C(C=CC=1)C(F)(F)F\nC12C=C(NC(=O)C3C=CN(C(F)F)N=3)C(OC)=CC1=NN(C=2)CCC(C)(C)O"}]}, "verifier_metadata": {"solution": "str_eval!:!FC(C1=NC(C(NC2C(OCC)=CC3=NN(CCC(O)(C)C)C=C3C=2)=O)=CC=C1)F!:!property-regression-adme/log_mdr1-mdck_er", "problem_type": "property-regression-adme/log_mdr1-mdck_er", "ideal": "CCOc1cc2nn(CCC(C)(C)O)cc2cc1NC(=O)c1cccc(C(F)F)n1", "id": "066b28c7-c991-5095-8045-a5da176c150a"}, "agent_ref": {"type": "responses_api_agents", "name": "ether0_simple_agent"}}
{"responses_create_params": {"input": [{"role": "system", "content": "You are a scientific reasoning agent. Think step by step, then place your final answer inside <answer></answer> tags. For example: <answer>CCO</answer>"}, {"role": "user", "content": "A compound with formula C30H44O7 was isolated from Nerium oleander L.. What is a plausible SMILES for it given this organism?\nspecies: Nerium oleander L.\ntaxonomicGroup: Angiosperms\nhabitat: Temperate and subtropical areas, along river banks, stream beds in river valleys, roadsides, parks, coastal gardens\nlifestyle: Free-living\nmetabolicType: Photoautotrophic\ncellularOrganization: Multicellular\npresenceOfOrganelles: Mitochondria, chloroplasts\ncellWallComposition: Cellulose"}]}, "verifier_metadata": {"solution": "formula_eval!:!CO[C@@H]1C[C@H](O[C@H]2CC[C@@]3(C)[C@H](CC[C@]45CC[C@H](C6=CC(=O)OC6)[C@@](C)(CC[C@H]34)C5=O)C2)O[C@H](C)[C@@H]1O!:!molecule-formula", "problem_type": "molecule-formula", "ideal": "CO[C@@H]1C[C@H](O[C@H]2CC[C@@]3(C)[C@H](CC[C@]45CC[C@H](C6=CC(=O)OC6)[C@@](C)(CC[C@H]34)C5=O)C2)O[C@H](C)[C@@H]1O", "id": "a0e5657a-901a-5888-af3b-87c8c8471ea8"}, "agent_ref": {"type": "responses_api_agents", "name": "ether0_simple_agent"}}
{"responses_create_params": {"input": [{"role": "system", "content": "You are a scientific reasoning agent. Think step by step, then place your final answer inside <answer></answer> tags. For example: <answer>CCO</answer>"}, {"role": "user", "content": "Identify which of the following molecules will most likely have a rat LD50 oral in mg/kg of 6.09:\nC(CS)(=O)O.C(O)CN\nFCC(=O)O\nClCC(=O)O"}]}, "verifier_metadata": {"solution": "str_eval!:!FCC(=O)O!:!property-regression-ld50", "problem_type": "property-regression-ld50", "ideal": "FCC(=O)O", "id": "6af247d8-aaec-5047-8b57-af1b42f9d38a"}, "agent_ref": {"type": "responses_api_agents", "name": "ether0_simple_agent"}}
{"responses_create_params": {"input": [{"role": "system", "content": "You are a scientific reasoning agent. Think step by step, then place your final answer inside <answer></answer> tags. For example: <answer>CCO</answer>"}, {"role": "user", "content": "Given that molecule [N+](=O)([O-])C1=CC=CC2=CC=CC=C12 is toxic, select from below the molecule most expected to not have this characteristic:\n[N+](=O)([O-])C1=C(C)C(=CC=C1)[N+](=O)[O-]\n[N+](=O)([O-])C1=CC=CC2=N[Se]N=C21\nIC1=C(C=CC=C1[N+](=O)[O-])[N+](=O)[O-]\nBrC=1C=C(C2=CC=CC=C2C1)[N+](=O)[O-]"}]}, "verifier_metadata": {"solution": "str_eval!:!BrC=1C=C(C2=CC=CC=C2C1)[N+](=O)[O-]!:!property-cat-safety/delta-toxic", "problem_type": "property-cat-safety/delta-toxic", "ideal": "BrC=1C=C(C2=CC=CC=C2C1)[N+](=O)[O-]", "id": "ea5a9ab5-7207-5016-8465-4634f7db5437"}, "agent_ref": {"type": "responses_api_agents", "name": "ether0_simple_agent"}}
38 changes: 38 additions & 0 deletions environments/ether0/data/example_metrics.json
@@ -0,0 +1,38 @@
{
  "name": "example",
  "type": "example",
  "jsonl_fpath": "resources_servers/ether0/data/example.jsonl",
  "num_repeats": 1,
  "gitlab_identifier": null,
  "huggingface_identifier": null,
  "license": null,
  "Number of examples": 5,
  "Number of tools": {
    "Total # non-null values": 0,
    "Average": 0.0,
    "Min": 0.0,
    "Max": 0.0,
    "Standard deviation": 0.0
  },
  "Json-dumped number of words (proxy for token count)": {
    "Total # non-null values": 5,
    "Average": 52.6,
    "Min": 46.0,
    "Max": 75.0,
    "Standard deviation": 12.64
  },
  "Number of turns": {
    "Total # non-null values": 5,
    "Average": 1.0,
    "Min": 1.0,
    "Max": 1.0,
    "Standard deviation": 0.0
  },
  "Temperature": {
    "Total # non-null values": 0,
    "Average": 0.0,
    "Min": 0.0,
    "Max": 0.0,
    "Standard deviation": 0.0
  }
}