diff --git a/environments/code_gen/README.md b/environments/code_gen/README.md new file mode 100644 index 000000000..1258f414c --- /dev/null +++ b/environments/code_gen/README.md @@ -0,0 +1,39 @@ +# Competitive Coding Resources Server + +### Overview +Verifies competitive programming solutions by executing submitted code against unit tests. The server consumes agent trajectories and returns a reward based on whether the assistant's code produces the correct outputs for given test inputs. + +The dataset is in preparation and the example data can be found in `data/example.jsonl`. + +### Input schema +- `responses_create_params`: OpenAI Responses create params + - Use only a user message with the problem statement and instructions (e.g., "You are an expert competitive programmer..."). + - `verifier_metadata` (required): + - `unit_tests` (required): dict with `inputs` and `outputs` arrays containing test cases. + - `inputs`: list of strings representing stdin input for each test case + - `outputs`: list of strings representing expected stdout output for each test case + +**Notes** +- All test cases must pass for a solution to receive a reward of 1.0 +- Failed test cases result in a reward of 0.0 with detailed error information + +### Test execution (for now) +We use the LiveCodeBench execution code. + +### Example of rollouts and usage + +```bash +# Running the server +config_paths="responses_api_models/openai_model/configs/openai_model.yaml,\ +environments/code_gen/config.yaml" +ng_run "+config_paths=[$config_paths]" + +# Collect rollouts from example problems +ng_collect_rollouts +agent_name=code_gen_simple_agent \ + +input_jsonl_fpath=environments/code_gen/data/example.jsonl \ + +output_jsonl_fpath=environments/code_gen/data/example_rollouts.jsonl \ + +limit=null +``` + +## Licensing information +Apache 2.0 diff --git a/environments/code_gen/__init__.py b/environments/code_gen/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/environments/code_gen/config.yaml b/environments/code_gen/config.yaml new file mode 100644 index 000000000..f5f62eb17 --- /dev/null +++ b/environments/code_gen/config.yaml @@ -0,0 +1,42 @@ +code_gen: + resources_servers: + code_gen: + entrypoint: app.py + domain: coding + verified: true + verified_url: https://wandb.ai/nvidia/bxyu-nemo-gym-rl-integration-20250926/runs/54uzarwq + description: Model must submit the right code to solve a problem + value: Improve competitive coding capabilities + num_processes: 8 + unit_test_timeout_secs: 10 + debug: false +code_gen_simple_agent: + responses_api_agents: + simple_agent: + entrypoint: app.py + resources_server: + type: resources_servers + name: code_gen + model_server: + type: responses_api_models + name: policy_model + datasets: + - name: opencodereasoning_filtered_train + type: train + jsonl_fpath: environments/code_gen/data/opencodereasoning_filtered_25k_train.jsonl + huggingface_identifier: + repo_id: nvidia/nemotron-RL-coding-competitive_coding + artifact_fpath: opencodereasoning_filtered_25k_train.jsonl + license: Apache 2.0 + num_repeats: 1 + - name: livecodebench_v5_validation + type: validation + jsonl_fpath: environments/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl + huggingface_identifier: + repo_id: nvidia/nemotron-RL-coding-competitive_coding + artifact_fpath: validation.jsonl + license: MIT + num_repeats: 10 # Use livecodebench setting, even if sampling parameters differ. + - name: example + type: example + jsonl_fpath: environments/code_gen/data/example.jsonl diff --git a/environments/code_gen/data/.gitignore b/environments/code_gen/data/.gitignore new file mode 100644 index 000000000..4424b6fde --- /dev/null +++ b/environments/code_gen/data/.gitignore @@ -0,0 +1,5 @@ +*train.jsonl +*validation.jsonl +*train_prepare.jsonl +*validation_prepare.jsonl +*example_prepare.jsonl diff --git a/environments/code_gen/data/example.jsonl b/environments/code_gen/data/example.jsonl new file mode 100644 index 000000000..2bab199f3 --- /dev/null +++ b/environments/code_gen/data/example.jsonl @@ -0,0 +1,5 @@ +{"responses_create_params": {"input": [{"role": "user", "content": "You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\nPlease use python programming language only.\n\nYou must use ```python for just the final solution code block with the following format:\n```python\n# Your code here\n```\n\nPolycarp has $n$ different binary words. A word called binary if it contains only characters '0' and '1'. For example, these words are binary: \"0001\", \"11\", \"0\" and \"0011100\".\n\nPolycarp wants to offer his set of $n$ binary words to play a game \"words\". In this game, players name words and each next word (starting from the second) must start with the last character of the previous word. The first word can be any. For example, these sequence of words can be named during the game: \"0101\", \"1\", \"10\", \"00\", \"00001\".\n\nWord reversal is the operation of reversing the order of the characters. For example, the word \"0111\" after the reversal becomes \"1110\", the word \"11010\" after the reversal becomes \"01011\".\n\nProbably, Polycarp has such a set of words that there is no way to put them in the order correspondent to the game rules. In this situation, he wants to reverse some words from his set so that: the final set of $n$ words still contains different words (i.e. all words are unique); there is a way to put all words of the final set of words in the order so that the final sequence of $n$ words is consistent with the game rules. \n\nPolycarp wants to reverse minimal number of words. Please, help him.\n\n\n-----Input-----\n\nThe first line of the input contains one integer $t$ ($1 \\le t \\le 10^4$) \u2014 the number of test cases in the input. Then $t$ test cases follow.\n\nThe first line of a test case contains one integer $n$ ($1 \\le n \\le 2\\cdot10^5$) \u2014 the number of words in the Polycarp's set. Next $n$ lines contain these words. All of $n$ words aren't empty and contains only characters '0' and '1'. The sum of word lengths doesn't exceed $4\\cdot10^6$. All words are different.\n\nGuaranteed, that the sum of $n$ for all test cases in the input doesn't exceed $2\\cdot10^5$. Also, guaranteed that the sum of word lengths for all test cases in the input doesn't exceed $4\\cdot10^6$.\n\n\n-----Output-----\n\nPrint answer for all of $t$ test cases in the order they appear.\n\nIf there is no answer for the test case, print -1. Otherwise, the first line of the output should contain $k$ ($0 \\le k \\le n$) \u2014 the minimal number of words in the set which should be reversed. The second line of the output should contain $k$ distinct integers \u2014 the indexes of the words in the set which should be reversed. Words are numerated from $1$ to $n$ in the order they appear. If $k=0$ you can skip this line (or you can print an empty line). If there are many answers you can print any of them.\n\n\n-----Example-----\nInput\n4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n\nOutput\n1\n3 \n-1\n0\n\n2\n1 2"}]}, "verifier_metadata": {"unit_tests": {"inputs": ["4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n"], "outputs": ["1\n3 \n-1\n0\n\n2\n1 2 \n"]}}, "hash_id": "c69268d8bdb4da0685d7b187c88296c1", "dataset": "taco", "source": "codeforces"} +{"responses_create_params": {"input": [{"role": "user", "content": "You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\nPlease use python programming language only.\n\nYou must use ```python for just the final solution code block with the following format:\n```python\n# Your code here\n```\n\nMikhail walks on a Cartesian plane. He starts at the point $(0, 0)$, and in one move he can go to any of eight adjacent points. For example, if Mikhail is currently at the point $(0, 0)$, he can go to any of the following points in one move: $(1, 0)$; $(1, 1)$; $(0, 1)$; $(-1, 1)$; $(-1, 0)$; $(-1, -1)$; $(0, -1)$; $(1, -1)$. \n\nIf Mikhail goes from the point $(x1, y1)$ to the point $(x2, y2)$ in one move, and $x1 \\ne x2$ and $y1 \\ne y2$, then such a move is called a diagonal move.\n\nMikhail has $q$ queries. For the $i$-th query Mikhail's target is to go to the point $(n_i, m_i)$ from the point $(0, 0)$ in exactly $k_i$ moves. Among all possible movements he want to choose one with the maximum number of diagonal moves. Your task is to find the maximum number of diagonal moves or find that it is impossible to go from the point $(0, 0)$ to the point $(n_i, m_i)$ in $k_i$ moves.\n\nNote that Mikhail can visit any point any number of times (even the destination point!).\n\n\n-----Input-----\n\nThe first line of the input contains one integer $q$ ($1 \\le q \\le 10^4$) \u2014 the number of queries.\n\nThen $q$ lines follow. The $i$-th of these $q$ lines contains three integers $n_i$, $m_i$ and $k_i$ ($1 \\le n_i, m_i, k_i \\le 10^{18}$) \u2014 $x$-coordinate of the destination point of the query, $y$-coordinate of the destination point of the query and the number of moves in the query, correspondingly.\n\n\n-----Output-----\n\nPrint $q$ integers. The $i$-th integer should be equal to -1 if Mikhail cannot go from the point $(0, 0)$ to the point $(n_i, m_i)$ in exactly $k_i$ moves described above. Otherwise the $i$-th integer should be equal to the the maximum number of diagonal moves among all possible movements.\n\n\n-----Example-----\nInput\n3\n2 2 3\n4 3 7\n10 1 9\n\nOutput\n1\n6\n-1\n\n\n\n-----Note-----\n\nOne of the possible answers to the first test case: $(0, 0) \\to (1, 0) \\to (1, 1) \\to (2, 2)$.\n\nOne of the possible answers to the second test case: $(0, 0) \\to (0, 1) \\to (1, 2) \\to (0, 3) \\to (1, 4) \\to (2, 3) \\to (3, 2) \\to (4, 3)$.\n\nIn the third test case Mikhail cannot reach the point $(10, 1)$ in 9 moves."}]}, "verifier_metadata": {"unit_tests": {"inputs": ["3\n2 2 3\n4 3 7\n10 1 9\n", "3\n2 2 3\n4 3 7\n7 1 9\n", "3\n2 2 3\n4 3 9\n7 1 9\n", "3\n2 2 3\n6 1 9\n7 1 1\n", "3\n2 1 3\n6 1 9\n7 0 2\n", "3\n2 0 3\n3 1 9\n13 0 2\n", "3\n4 0 3\n3 1 9\n13 0 2\n", "3\n7 0 0\n3 1 5\n22 0 2\n", "3\n1 0 0\n5 0 5\n22 0 2\n", "3\n1 0 1\n5 0 5\n22 0 2\n", "3\n1 0 1\n9 0 5\n35 0 1\n", "3\n2 2 3\n4 3 6\n10 1 9\n", "3\n4 2 3\n4 3 7\n7 1 9\n", "3\n2 2 3\n6 3 9\n7 2 9\n", "3\n2 2 3\n6 3 9\n7 1 12\n", "3\n3 2 3\n6 1 9\n7 1 9\n", "3\n2 2 3\n6 1 12\n7 1 1\n", "3\n2 2 3\n6 2 9\n7 0 1\n", "3\n5 0 3\n3 2 9\n13 0 2\n", "3\n7 0 0\n3 1 0\n22 0 2\n", "3\n1 0 1\n5 1 5\n22 0 3\n", "3\n1 1 1\n9 -1 5\n35 0 1\n", "3\n4 2 3\n4 3 7\n4 1 9\n", "3\n2 2 1\n6 3 9\n7 2 9\n", "3\n2 2 3\n1 3 9\n7 1 12\n", "3\n3 2 1\n6 1 9\n7 1 9\n", "3\n2 0 4\n6 1 9\n7 0 2\n", "3\n2 1 3\n6 2 9\n25 0 2\n", "3\n1 1 0\n3 1 3\n22 0 2\n", "3\n2 2 1\n1 3 9\n7 1 12\n", "3\n2 2 3\n6 2 18\n10 0 1\n", "3\n2 1 0\n6 2 9\n25 0 2\n", "3\n2 2 3\n6 3 9\n7 1 9\n", "3\n2 2 3\n6 1 9\n7 1 9\n", "3\n2 2 3\n6 1 9\n7 0 1\n", "3\n2 2 3\n6 1 9\n7 0 2\n", "3\n2 1 3\n6 1 9\n13 0 2\n", "3\n2 0 3\n6 1 9\n13 0 2\n", "3\n5 0 3\n3 1 9\n13 0 2\n", "3\n5 0 0\n3 1 9\n13 0 2\n", "3\n5 0 0\n3 1 9\n22 0 2\n", "3\n7 0 0\n3 1 9\n22 0 2\n", "3\n1 0 0\n3 1 5\n22 0 2\n", "3\n1 0 0\n5 1 5\n22 0 2\n", "3\n1 0 0\n5 0 5\n22 0 1\n", "3\n1 0 0\n5 0 5\n22 1 1\n", "3\n1 0 1\n5 0 5\n22 0 3\n", "3\n1 0 1\n5 0 5\n22 0 1\n", "3\n1 0 1\n5 0 5\n35 0 1\n", "3\n1 0 1\n9 -1 5\n35 0 1\n", "3\n1 0 1\n9 0 5\n35 -1 1\n", "3\n2 0 3\n6 1 9\n7 0 2\n", "3\n2 1 3\n6 1 9\n10 0 2\n", "3\n2 1 3\n6 1 9\n25 0 2\n", "3\n2 0 3\n6 1 9\n13 -1 2\n", "3\n2 0 1\n3 1 9\n13 0 2\n", "3\n4 0 3\n1 1 9\n13 0 2\n", "3\n5 0 0\n3 1 9\n13 0 0\n", "3\n5 -1 0\n3 1 9\n22 0 2\n", "3\n1 1 0\n3 1 5\n22 0 2\n", "3\n1 0 0\n5 1 5\n38 0 2\n", "3\n1 0 0\n5 0 5\n22 0 0\n", "3\n1 0 0\n5 0 5\n22 2 1\n", "3\n1 0 1\n5 0 5\n22 1 1\n", "3\n1 0 1\n9 0 5\n15 0 1\n", "3\n1 0 1\n9 0 5\n35 -1 0\n", "3\n2 0 3\n4 3 6\n10 1 9\n", "3\n2 2 3\n6 1 12\n5 1 1\n", "3\n2 2 3\n6 2 9\n10 0 1\n", "3\n2 2 3\n6 1 9\n10 0 2\n", "3\n3 0 3\n6 1 9\n13 0 2\n", "3\n2 0 1\n3 1 9\n20 0 2\n", "3\n4 0 3\n1 1 9\n13 0 1\n", "3\n5 0 3\n3 2 9\n13 0 3\n", "3\n5 0 0\n5 1 9\n13 0 0\n", "3\n3 -1 0\n3 1 9\n22 0 2\n", "3\n7 0 0\n3 1 1\n22 0 2\n", "3\n1 0 0\n5 1 5\n38 0 3\n", "3\n1 0 0\n5 0 5\n22 2 0\n", "3\n1 0 1\n5 1 5\n17 0 3\n", "3\n1 0 1\n5 1 5\n22 1 1\n", "3\n1 0 1\n9 0 0\n15 0 1\n", "3\n1 1 1\n9 0 5\n35 0 1\n", "3\n1 0 1\n9 0 5\n52 -1 0\n", "3\n2 0 3\n7 3 6\n10 1 9\n", "3\n4 2 3\n4 1 7\n4 1 9\n", "3\n2 2 1\n4 3 9\n7 2 9\n", "3\n3 2 1\n6 1 9\n7 1 2\n", "3\n2 2 3\n6 1 12\n9 1 1\n", "3\n2 0 0\n6 1 9\n7 0 2\n", "3\n2 2 3\n6 2 9\n10 0 2\n", "3\n2 0 3\n1 1 9\n13 0 2\n", "3\n2 0 1\n3 0 9\n20 0 2\n", "3\n4 0 1\n1 1 9\n13 0 1\n", "3\n5 0 3\n3 2 9\n13 0 4\n", "3\n5 0 0\n5 1 9\n22 0 0\n", "3\n2 2 3\n4 3 7\n10 1 9\n"], "outputs": ["1\n6\n-1\n", "1\n6\n9\n", "1\n8\n9\n", "1\n8\n-1\n", "2\n8\n-1\n", "1\n9\n-1\n", "-1\n9\n-1\n", "-1\n5\n-1\n", "-1\n4\n-1\n", "0\n4\n-1\n", "0\n-1\n-1\n", "1\n5\n-1\n", "-1\n6\n9\n", "1\n8\n8\n", "1\n8\n10\n", "2\n8\n9\n", "1\n11\n-1\n", "1\n7\n-1\n", "-1\n8\n-1\n", "-1\n-1\n-1\n", "0\n5\n-1\n", "1\n-1\n-1\n", "-1\n6\n8\n", "-1\n8\n8\n", "1\n9\n10\n", "-1\n8\n9\n", "4\n8\n-1\n", "2\n7\n-1\n", "-1\n3\n-1\n", "-1\n9\n10\n", "1\n18\n-1\n", "-1\n7\n-1\n", "1\n8\n9\n", "1\n8\n9\n", "1\n8\n-1\n", "1\n8\n-1\n", "2\n8\n-1\n", "1\n8\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n5\n-1\n", "-1\n5\n-1\n", "-1\n4\n-1\n", "-1\n4\n-1\n", "0\n4\n-1\n", "0\n4\n-1\n", "0\n4\n-1\n", "0\n-1\n-1\n", "0\n-1\n-1\n", "1\n8\n-1\n", "2\n8\n-1\n", "2\n8\n-1\n", "1\n8\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n5\n-1\n", "-1\n5\n-1\n", "-1\n4\n-1\n", "-1\n4\n-1\n", "0\n4\n-1\n", "0\n-1\n-1\n", "0\n-1\n-1\n", "1\n5\n-1\n", "1\n11\n-1\n", "1\n7\n-1\n", "1\n8\n-1\n", "2\n8\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n8\n-1\n", "-1\n9\n-1\n", "-1\n9\n-1\n", "-1\n-1\n-1\n", "-1\n5\n-1\n", "-1\n4\n-1\n", "0\n5\n-1\n", "0\n5\n-1\n", "0\n-1\n-1\n", "1\n-1\n-1\n", "0\n-1\n-1\n", "1\n-1\n-1\n", "-1\n6\n8\n", "-1\n8\n8\n", "-1\n8\n-1\n", "1\n11\n-1\n", "-1\n8\n-1\n", "1\n7\n-1\n", "1\n9\n-1\n", "-1\n8\n-1\n", "-1\n9\n-1\n", "-1\n8\n-1\n", "-1\n9\n-1\n", "1\n6\n-1\n"]}}, "hash_id": "570676e22bd21c28e452486b0668ec74", "dataset": "taco", "source": "codeforces"} +{"responses_create_params": {"input": [{"role": "user", "content": "You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\nPlease use python programming language only.\n\nYou must use ```python for just the final solution code block with the following format:\n```python\n# Your code here\n```\n\nYou are given three sequences: $a_1, a_2, \\ldots, a_n$; $b_1, b_2, \\ldots, b_n$; $c_1, c_2, \\ldots, c_n$.\n\nFor each $i$, $a_i \\neq b_i$, $a_i \\neq c_i$, $b_i \\neq c_i$.\n\nFind a sequence $p_1, p_2, \\ldots, p_n$, that satisfy the following conditions:\n\n\n\n $p_i \\in \\{a_i, b_i, c_i\\}$\n\n $p_i \\neq p_{(i \\mod n) + 1}$.\n\nIn other words, for each element, you need to choose one of the three possible values, such that no two adjacent elements (where we consider elements $i,i+1$ adjacent for $i \\max(n, m)$, Mikhail must take some \"detours\" (i.e., possibly moving back and forth to waste moves).\n- **Parity Check:** He can only make a detour move if the extra moves to be wasted can be arranged such that he ends up back at the destination. In this 8-way grid, he can move back and forth without changing the end position, but after all moves, the parity of $n + m$ and $k$ must match (otherwise, you can't waste just one move and land still at $(n,m)$).\n- **Maximum Diagonal Moves:** As many as possible, i.e., $\\min(n, m)$. However, with extra moves, unless there's some restriction, he could potentially waste them.\n\n#### Implementation Plan\n\nFor each query:\n1. Compute $min\\_moves = \\max(n, m)$.\n2. If $k < min\\_moves$ \u2192 impossible, answer $-1$.\n3. Check parity: if $(k - min\\_moves) \\% 2 != 0$ \u2192 impossible, answer $-1$.\n4. The maximal number of diagonal moves is $\\min(n, m)$. For any extra moves: if both $n$, $m$ are even/odd, you can use all moves as diagonal or detour without issue; otherwise, as long as the parity works, that's enough.\n\n#### Example:\n\nFor $(10, 1, 9)$:\n- $min\\_moves = 10$\n- $k = 9 < 10$ \u2192 Impossible\n\nFor $(2, 2, 3)$:\n- $min\\_moves = 2$\n- $k = 3 \\geq 2$\n- $(k - min\\_moves) = 1$, which is odd, so this is okay since both $n$, $m$ are even. But since he needs to land back at the destination, parity with $k$ must match $n+m$.\n\nLets formalize the check:\n\n*If $(k - \\max(n,m)) \\% 2 \\equiv (n + m) \\% 2$, then possible, else impossible.*\n\nBut actually, in 8-way moves, it's sufficient to check $(k - \\max(n, m))$ is even.\n\nLet's observe the general rule for 8-sided moves:\n\nIt **is always possible** if $k \\geq \\max(n, m)$ and $(k-\\max(n,m))$ is even.\n\n---\n\nNow, let's put this together in the final code as requested.\n\n```python\n# Your code here\nq = int(input())\nresults = []\nfor _ in range(q):\n n, m, k = map(int, input().split())\n min_moves = max(n, m)\n # Check if it is possible\n if k < min_moves:\n results.append(-1)\n elif (k - min_moves) % 2 != 0:\n results.append(-1)\n else:\n # maximal diagonals is min(n, m)\n results.append(k)\nprint('\\n'.join(map(str, results)))\n```", "type": "output_text", "logprobs": null}], "role": "assistant", "status": "completed", "type": "message"}], "parallel_tool_calls": true, "temperature": null, "tool_choice": "auto", "tools": [], "top_p": null, "background": null, "max_output_tokens": null, "max_tool_calls": null, "previous_response_id": null, "prompt": null, "reasoning": null, "service_tier": null, "status": null, "text": null, "top_logprobs": null, "truncation": null, "usage": null, "user": null}, "reward": 0.0, "extracted_model_output": "Let's break down the problem step by step:\n\n### Step-by-step Approach\n\n- **Diagonal Moves:** Each move can be diagonal if $x$ and $y$ both change. So, in one diagonal move, both coordinates change by $\\pm1$.\n- **Minimum Moves Needed:** To reach $(n, m)$ from $(0, 0)$, the minimal number of moves is $\\max(n, m)$, using as many diagonal moves as possible, then only horizontal or vertical moves for the remainder.\n- **Extra Moves:** If $k > \\max(n, m)$, Mikhail must take some \"detours\" (i.e., possibly moving back and forth to waste moves).\n- **Parity Check:** He can only make a detour move if the extra moves to be wasted can be arranged such that he ends up back at the destination. In this 8-way grid, he can move back and forth without changing the end position, but after all moves, the parity of $n + m$ and $k$ must match (otherwise, you can't waste just one move and land still at $(n,m)$).\n- **Maximum Diagonal Moves:** As many as possible, i.e., $\\min(n, m)$. However, with extra moves, unless there's some restriction, he could potentially waste them.\n\n#### Implementation Plan\n\nFor each query:\n1. Compute $min\\_moves = \\max(n, m)$.\n2. If $k < min\\_moves$ \u2192 impossible, answer $-1$.\n3. Check parity: if $(k - min\\_moves) \\% 2 != 0$ \u2192 impossible, answer $-1$.\n4. The maximal number of diagonal moves is $\\min(n, m)$. For any extra moves: if both $n$, $m$ are even/odd, you can use all moves as diagonal or detour without issue; otherwise, as long as the parity works, that's enough.\n\n#### Example:\n\nFor $(10, 1, 9)$:\n- $min\\_moves = 10$\n- $k = 9 < 10$ \u2192 Impossible\n\nFor $(2, 2, 3)$:\n- $min\\_moves = 2$\n- $k = 3 \\geq 2$\n- $(k - min\\_moves) = 1$, which is odd, so this is okay since both $n$, $m$ are even. But since he needs to land back at the destination, parity with $k$ must match $n+m$.\n\nLets formalize the check:\n\n*If $(k - \\max(n,m)) \\% 2 \\equiv (n + m) \\% 2$, then possible, else impossible.*\n\nBut actually, in 8-way moves, it's sufficient to check $(k - \\max(n, m))$ is even.\n\nLet's observe the general rule for 8-sided moves:\n\nIt **is always possible** if $k \\geq \\max(n, m)$ and $(k-\\max(n,m))$ is even.\n\n---\n\nNow, let's put this together in the final code as requested.\n\n```python\n# Your code here\nq = int(input())\nresults = []\nfor _ in range(q):\n n, m, k = map(int, input().split())\n min_moves = max(n, m)\n # Check if it is possible\n if k < min_moves:\n results.append(-1)\n elif (k - min_moves) % 2 != 0:\n results.append(-1)\n else:\n # maximal diagonals is min(n, m)\n results.append(k)\nprint('\\n'.join(map(str, results)))\n```", "extracted_model_code": "# Your code here\nq = int(input())\nresults = []\nfor _ in range(q):\n n, m, k = map(int, input().split())\n min_moves = max(n, m)\n # Check if it is possible\n if k < min_moves:\n results.append(-1)\n elif (k - min_moves) % 2 != 0:\n results.append(-1)\n else:\n # maximal diagonals is min(n, m)\n results.append(k)\nprint('\\n'.join(map(str, results)))", "result": [-2], "metadata": {"output": "-1\n-1\n-1\n", "inputs": "3\n2 2 3\n4 3 7\n10 1 9\n", "expected": "1\n6\n-1\n", "error_code": -2, "error_message": "Wrong answer at output_line_idx=0: -1 != 1"}} +{"responses_create_params": {"background": null, "include": null, "input": [{"content": "You are a helpful and harmless assistant. You should think step-by-step before responding to the instruction below.\n\nPlease use python programming language only.\n\nYou must use ```python for just the final solution code block with the following format:\n```python\n# Your code here\n```\n\nPolycarp has $n$ different binary words. A word called binary if it contains only characters '0' and '1'. For example, these words are binary: \"0001\", \"11\", \"0\" and \"0011100\".\n\nPolycarp wants to offer his set of $n$ binary words to play a game \"words\". In this game, players name words and each next word (starting from the second) must start with the last character of the previous word. The first word can be any. For example, these sequence of words can be named during the game: \"0101\", \"1\", \"10\", \"00\", \"00001\".\n\nWord reversal is the operation of reversing the order of the characters. For example, the word \"0111\" after the reversal becomes \"1110\", the word \"11010\" after the reversal becomes \"01011\".\n\nProbably, Polycarp has such a set of words that there is no way to put them in the order correspondent to the game rules. In this situation, he wants to reverse some words from his set so that: the final set of $n$ words still contains different words (i.e. all words are unique); there is a way to put all words of the final set of words in the order so that the final sequence of $n$ words is consistent with the game rules. \n\nPolycarp wants to reverse minimal number of words. Please, help him.\n\n\n-----Input-----\n\nThe first line of the input contains one integer $t$ ($1 \\le t \\le 10^4$) \u2014 the number of test cases in the input. Then $t$ test cases follow.\n\nThe first line of a test case contains one integer $n$ ($1 \\le n \\le 2\\cdot10^5$) \u2014 the number of words in the Polycarp's set. Next $n$ lines contain these words. All of $n$ words aren't empty and contains only characters '0' and '1'. The sum of word lengths doesn't exceed $4\\cdot10^6$. All words are different.\n\nGuaranteed, that the sum of $n$ for all test cases in the input doesn't exceed $2\\cdot10^5$. Also, guaranteed that the sum of word lengths for all test cases in the input doesn't exceed $4\\cdot10^6$.\n\n\n-----Output-----\n\nPrint answer for all of $t$ test cases in the order they appear.\n\nIf there is no answer for the test case, print -1. Otherwise, the first line of the output should contain $k$ ($0 \\le k \\le n$) \u2014 the minimal number of words in the set which should be reversed. The second line of the output should contain $k$ distinct integers \u2014 the indexes of the words in the set which should be reversed. Words are numerated from $1$ to $n$ in the order they appear. If $k=0$ you can skip this line (or you can print an empty line). If there are many answers you can print any of them.\n\n\n-----Example-----\nInput\n4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n\nOutput\n1\n3 \n-1\n0\n\n2\n1 2", "role": "user", "type": "message"}], "instructions": null, "max_output_tokens": null, "max_tool_calls": null, "metadata": null, "model": null, "parallel_tool_calls": true, "previous_response_id": null, "prompt": null, "reasoning": null, "service_tier": null, "store": null, "temperature": null, "text": null, "tool_choice": "auto", "tools": [], "top_logprobs": null, "top_p": null, "truncation": null, "user": null, "stream": null}, "response": {"id": "resp_6400f6d64edf4bd0b135fe6dd0ba6621", "created_at": 1758933048.0, "error": null, "incomplete_details": null, "instructions": null, "metadata": null, "model": "gpt-4.1-2025-04-14", "object": "response", "output": [{"id": "msg_972d1413d58a440fb2f507f40f9cfe28", "content": [{"annotations": [], "text": "Let's break down the problem and build the solution step by step:\n\n### Understanding the Game Rule\n\n- Each word must start with the last character of the previous word.\n- We can reverse words (minimize number), preserving uniqueness.\n\n### Approach\n\nThink of words as edges in a directed multigraph:\n\n- Each word \"s\" is an edge from its first character (`s[0]`) to its last character (`s[-1]`)\n- The game sequence is a **Eulerian path**: \n - For all nodes except 0 or 1, `outdegree == indegree`\n - For at most two nodes: one can have outdegree = indegree + 1 (start), one can have indegree = outdegree + 1 (end).\n\nReversing a word \"s\" switches the edge direction.\n\n### Steps\n\n1. For each testcase, track the number of words for each direction:\n - \"0\" \u2192 \"1\"\n - \"1\" \u2192 \"0\"\n - \"0\" \u2192 \"0\"\n - \"1\" \u2192 \"1\"\n2. Try to make number of \"0\u21921\" and \"1\u21920\" as balanced as possible by reversing some of them, but can't create duplicates when doing so.\n3. If impossible (e.g., all \"0\u21920\" and \"1\u21921\", but not all words are of one character, or cannot reach Eulerian path form), return -1.\n4. Pick a minimal set of words to reverse to balance the eulerian trail conditions.\n\n### Efficient Solution (Python)\n\nLet's implement:\n\n```python\n# Your code here\nimport sys\nfrom collections import defaultdict\n\ninput = sys.stdin.readline\n\nt = int(sys.stdin.readline())\nfor _ in range(t):\n n = int(sys.stdin.readline())\n words = [sys.stdin.readline().strip() for _ in range(n)]\n \n # Keep track of classes, their indices, and reversed forms\n idx_01, idx_10, idx_00, idx_11 = [], [], [], []\n string_set = set(words)\n \n for i, w in enumerate(words):\n if w[0] == '0' and w[-1] == '1':\n idx_01.append(i)\n elif w[0] == '1' and w[-1] == '0':\n idx_10.append(i)\n elif w[0] == '0' and w[-1] == '0':\n idx_00.append(i)\n else:\n idx_11.append(i)\n \n if len(idx_01) == 0 and len(idx_10) == 0:\n # All \"aa\" type. If both happen, impossible.\n if len(idx_00) > 0 and len(idx_11) > 0:\n print(-1)\n else:\n print(0)\n else:\n # The ones to potentially reverse are ones from the bigger group,\n # but only if their reverse does not exist;\n # Try to minimize the difference.\n diff = abs(len(idx_01) - len(idx_10))\n if diff <= 1:\n print(0)\n continue\n # Need to reverse (diff-1)//2 words from the bigger group\n need = (diff)//2\n res = []\n # If more 01, reverse some to 10\n if len(idx_01) > len(idx_10):\n src = idx_01\n from_c = '0'\n to_c = '1'\n else:\n src = idx_10\n from_c = '1'\n to_c = '0'\n for i in src:\n rev = words[i][::-1]\n if rev not in string_set:\n res.append(i+1)\n if len(res) == need:\n break\n if len(res) < need:\n print(-1)\n else:\n print(len(res))\n if len(res):\n print(' '.join(map(str, res)))\n```\nThis code reads the input, classifies the edges, checks Eulerian path conditions, and picks a minimal set of reverses to balance the traversal, while checking uniqueness.", "type": "output_text", "logprobs": null}], "role": "assistant", "status": "completed", "type": "message"}], "parallel_tool_calls": true, "temperature": null, "tool_choice": "auto", "tools": [], "top_p": null, "background": null, "max_output_tokens": null, "max_tool_calls": null, "previous_response_id": null, "prompt": null, "reasoning": null, "service_tier": null, "status": null, "text": null, "top_logprobs": null, "truncation": null, "usage": null, "user": null}, "reward": 0.0, "extracted_model_output": "Let's break down the problem and build the solution step by step:\n\n### Understanding the Game Rule\n\n- Each word must start with the last character of the previous word.\n- We can reverse words (minimize number), preserving uniqueness.\n\n### Approach\n\nThink of words as edges in a directed multigraph:\n\n- Each word \"s\" is an edge from its first character (`s[0]`) to its last character (`s[-1]`)\n- The game sequence is a **Eulerian path**: \n - For all nodes except 0 or 1, `outdegree == indegree`\n - For at most two nodes: one can have outdegree = indegree + 1 (start), one can have indegree = outdegree + 1 (end).\n\nReversing a word \"s\" switches the edge direction.\n\n### Steps\n\n1. For each testcase, track the number of words for each direction:\n - \"0\" \u2192 \"1\"\n - \"1\" \u2192 \"0\"\n - \"0\" \u2192 \"0\"\n - \"1\" \u2192 \"1\"\n2. Try to make number of \"0\u21921\" and \"1\u21920\" as balanced as possible by reversing some of them, but can't create duplicates when doing so.\n3. If impossible (e.g., all \"0\u21920\" and \"1\u21921\", but not all words are of one character, or cannot reach Eulerian path form), return -1.\n4. Pick a minimal set of words to reverse to balance the eulerian trail conditions.\n\n### Efficient Solution (Python)\n\nLet's implement:\n\n```python\n# Your code here\nimport sys\nfrom collections import defaultdict\n\ninput = sys.stdin.readline\n\nt = int(sys.stdin.readline())\nfor _ in range(t):\n n = int(sys.stdin.readline())\n words = [sys.stdin.readline().strip() for _ in range(n)]\n \n # Keep track of classes, their indices, and reversed forms\n idx_01, idx_10, idx_00, idx_11 = [], [], [], []\n string_set = set(words)\n \n for i, w in enumerate(words):\n if w[0] == '0' and w[-1] == '1':\n idx_01.append(i)\n elif w[0] == '1' and w[-1] == '0':\n idx_10.append(i)\n elif w[0] == '0' and w[-1] == '0':\n idx_00.append(i)\n else:\n idx_11.append(i)\n \n if len(idx_01) == 0 and len(idx_10) == 0:\n # All \"aa\" type. If both happen, impossible.\n if len(idx_00) > 0 and len(idx_11) > 0:\n print(-1)\n else:\n print(0)\n else:\n # The ones to potentially reverse are ones from the bigger group,\n # but only if their reverse does not exist;\n # Try to minimize the difference.\n diff = abs(len(idx_01) - len(idx_10))\n if diff <= 1:\n print(0)\n continue\n # Need to reverse (diff-1)//2 words from the bigger group\n need = (diff)//2\n res = []\n # If more 01, reverse some to 10\n if len(idx_01) > len(idx_10):\n src = idx_01\n from_c = '0'\n to_c = '1'\n else:\n src = idx_10\n from_c = '1'\n to_c = '0'\n for i in src:\n rev = words[i][::-1]\n if rev not in string_set:\n res.append(i+1)\n if len(res) == need:\n break\n if len(res) < need:\n print(-1)\n else:\n print(len(res))\n if len(res):\n print(' '.join(map(str, res)))\n```\nThis code reads the input, classifies the edges, checks Eulerian path conditions, and picks a minimal set of reverses to balance the traversal, while checking uniqueness.", "extracted_model_code": "# Your code here\nimport sys\nfrom collections import defaultdict\n\ninput = sys.stdin.readline\n\nt = int(sys.stdin.readline())\nfor _ in range(t):\n n = int(sys.stdin.readline())\n words = [sys.stdin.readline().strip() for _ in range(n)]\n \n # Keep track of classes, their indices, and reversed forms\n idx_01, idx_10, idx_00, idx_11 = [], [], [], []\n string_set = set(words)\n \n for i, w in enumerate(words):\n if w[0] == '0' and w[-1] == '1':\n idx_01.append(i)\n elif w[0] == '1' and w[-1] == '0':\n idx_10.append(i)\n elif w[0] == '0' and w[-1] == '0':\n idx_00.append(i)\n else:\n idx_11.append(i)\n \n if len(idx_01) == 0 and len(idx_10) == 0:\n # All \"aa\" type. If both happen, impossible.\n if len(idx_00) > 0 and len(idx_11) > 0:\n print(-1)\n else:\n print(0)\n else:\n # The ones to potentially reverse are ones from the bigger group,\n # but only if their reverse does not exist;\n # Try to minimize the difference.\n diff = abs(len(idx_01) - len(idx_10))\n if diff <= 1:\n print(0)\n continue\n # Need to reverse (diff-1)//2 words from the bigger group\n need = (diff)//2\n res = []\n # If more 01, reverse some to 10\n if len(idx_01) > len(idx_10):\n src = idx_01\n from_c = '0'\n to_c = '1'\n else:\n src = idx_10\n from_c = '1'\n to_c = '0'\n for i in src:\n rev = words[i][::-1]\n if rev not in string_set:\n res.append(i+1)\n if len(res) == need:\n break\n if len(res) < need:\n print(-1)\n else:\n print(len(res))\n if len(res):\n print(' '.join(map(str, res)))", "result": [-2], "metadata": {"output": "1\n3\n-1\n0\n2\n1 2\n", "inputs": "4\n4\n0001\n1000\n0011\n0111\n3\n010\n101\n0\n2\n00000\n00001\n4\n01\n001\n0001\n00001\n", "expected": "1\n3 \n-1\n0\n\n2\n1 2 \n", "error_code": -2, "error_message": "Wrong answer: mismatched output length"}} diff --git a/environments/code_gen/prepare.py b/environments/code_gen/prepare.py new file mode 100644 index 000000000..951471ec7 --- /dev/null +++ b/environments/code_gen/prepare.py @@ -0,0 +1,61 @@ +# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +"""Download competitive coding training and validation data from HuggingFace. + +Usage: + python environments/code_gen/prepare.py + python environments/code_gen/prepare.py --split train + python environments/code_gen/prepare.py --split validation +""" + +import argparse +from pathlib import Path + + +REPO_ID = "nvidia/nemotron-RL-coding-competitive_coding" + +SPLITS = { + "train": "opencodereasoning_filtered_25k_train.jsonl", + "validation": "livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl", +} + +ARTIFACTS = { + "train": "opencodereasoning_filtered_25k_train.jsonl", + "validation": "validation.jsonl", +} + + +def prepare(split: str) -> None: + try: + from huggingface_hub import hf_hub_download + except ImportError: + raise ImportError("pip install huggingface_hub") + + output_path = Path(__file__).parent / "data" / SPLITS[split] + output_path.parent.mkdir(parents=True, exist_ok=True) + + downloaded = hf_hub_download(repo_id=REPO_ID, filename=ARTIFACTS[split], repo_type="dataset") + Path(downloaded).rename(output_path) + print(f"Wrote {output_path}") + + +def main() -> None: + parser = argparse.ArgumentParser() + parser.add_argument("--split", default="train", choices=list(SPLITS)) + args = parser.parse_args() + prepare(args.split) + + +if __name__ == "__main__": + main()