feat: A tool to run R code #126

Merged
gadenbuie merged 50 commits into main from feat/evaluate-tool on Dec 15, 2025
@gadenbuie (Collaborator) commented Nov 26, 2025

Closes #118

Summary

This PR adds btw_tool_run_r(), a tool that executes R code in the global environment and returns the results to the LLM. I've marked the tool Experimental for now.

What it captures

The tool captures and returns:

  • Text output from print(), cat(), etc.
  • Plots as inline images
  • Messages from message()
  • Warnings from warning()
  • Errors from stop()

When an error occurs, all output up to the error is returned.
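As a rough illustration of this capture behavior, here is a minimal base R sketch. It is not the implementation in this PR (which relies on evaluate); `run_and_capture()` is a hypothetical helper name, and it only handles text, messages, warnings, and errors, not plots.

```r
# Hypothetical helper, not part of btw: evaluate code one top-level
# expression at a time and record typed results in order.
run_and_capture <- function(code, envir = new.env(parent = globalenv())) {
  results <- list()
  add <- function(type, text) {
    results[[length(results) + 1L]] <<- list(type = type, text = text)
  }
  strip_newline <- function(x) sub("\n$", "", x)

  for (expr in parse(text = code)) {
    error <- NULL
    # The error is caught inside capture.output(), so text printed
    # before the error is still collected.
    out <- utils::capture.output(
      invisible(tryCatch(
        withCallingHandlers(
          {
            res <- withVisible(eval(expr, envir))
            if (res$visible) print(res$value)
          },
          message = function(m) {
            add("message", strip_newline(conditionMessage(m)))
            invokeRestart("muffleMessage")
          },
          warning = function(w) {
            add("warning", conditionMessage(w))
            invokeRestart("muffleWarning")
          }
        ),
        error = function(e) error <<- e
      ))
    )
    # Simplification: within one expression, conditions are recorded
    # before that expression's text output.
    if (length(out) > 0) add("output", paste(out, collapse = "\n"))
    if (!is.null(error)) {
      add("error", conditionMessage(error))
      break # stop at the first error, keeping everything so far
    }
  }
  results
}

res <- run_and_capture('cat("hi\\n")\nmessage("note")\nstop("boom")\ncat("never runs\\n")')
vapply(res, function(x) x$type, character(1))
```

The sketch stops at the first failing expression and returns whatever was collected up to that point, which mirrors the behavior described above at the granularity of top-level expressions.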

The tool makes use of recent changes in ellmer v0.4.0 that allow tools to return lists of Content, including ContentImage types from plots. Each of the above output types is given a btw-local content type, which is also used to customize its display in shinychat.

Security

This tool is disabled by default. It executes arbitrary code in the global environment without sandboxing or review. We recommend:

  • Only enable in trusted, non-public environments
  • Avoid prompting the model to take destructive actions
  • Understand the security risks before enabling

To enable the tool

The tool can be enabled via an R option, set in a session or in an .Rprofile:

options(btw.run_r.enabled = TRUE)

Or in the front matter of btw.md:

---
options:
  run_r:
    enabled: true
---

Or equivalently via environment variable:

BTW_RUN_R_ENABLED=true

When this option is set, btw_tools() will include the btw_tool_run_r() tool; otherwise the tool is excluded from btw_tools().
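For readers wondering how an option and an environment variable can feed one switch, here is a minimal sketch. The helper name `is_run_r_enabled()` and the precedence (R option first, then environment variable) are assumptions for illustration; btw's actual lookup may differ.

```r
# Hypothetical helper, not part of btw. Assumes the R option wins
# over the environment variable when both are set.
is_run_r_enabled <- function() {
  opt <- getOption("btw.run_r.enabled")
  if (!is.null(opt)) {
    return(isTRUE(as.logical(opt)))
  }
  # Unset or unparseable values are treated as "disabled"
  env <- Sys.getenv("BTW_RUN_R_ENABLED", unset = "")
  isTRUE(as.logical(env))
}

is_run_r_enabled()
```

Treating anything other than an explicit true value as disabled keeps the default safe, in line with the tool being off unless opted in.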

In btw_tools(), you can also explicitly include the "run", "run_r" or "btw_tool_run_r" tool in tools:

btw_tools(tools = "run_r")

Or in btw.md:

---
tools:
  - run_r
---

Dependencies

This feature adds a few additional suggested dependencies.

  • We use evaluate for running and evaluating the LLM-written code
  • If fansi is available, we use it to translate ANSI colors to HTML
  • If ragg is available, we use it as the plot rendering device. The plot device can be customized by providing a function via the R option btw.run_r.graphics_device.
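The optional-dependency pattern described in this list could look roughly like the sketch below. `choose_graphics_device()` is a hypothetical name, the fallback to `grDevices::png` when ragg is missing is an assumption, and the arguments btw passes to a custom device function are not specified in this PR description.

```r
# Hypothetical helper, not part of btw: pick a plot device function.
choose_graphics_device <- function() {
  # A user-supplied function via the documented option takes priority
  custom <- getOption("btw.run_r.graphics_device")
  if (is.function(custom)) {
    return(custom)
  }
  # Prefer ragg when it is installed
  if (requireNamespace("ragg", quietly = TRUE)) {
    return(ragg::agg_png)
  }
  # Assumed fallback for this sketch only
  grDevices::png
}

device <- choose_graphics_device()
is.function(device)
```

Checking with `requireNamespace()` at call time lets the packages stay in Suggests rather than Imports.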

@gadenbuie gadenbuie marked this pull request as ready for review December 9, 2025 13:54
@simonpcouch (Collaborator) left a comment

So sharp. Interleaving that output is so nice. This looks great!

Models' "default" Amount Of Code Written when calling this tool still feels too long to me. This leads to issues like that below, where the model hallucinates column names in data it hasn't seen yet because it didn't stop to examine the output of glimpse(forested). We've prompted PA/Databot/side::kick() in the same way in their analogous tools for this reason.

[Image: Screenshot 2025-12-12 at 8 52 27 AM]

If you still disagree, okay with me that maybe this is a matter of preference best resolved in our btw.mds. :)

@simonpcouch (Collaborator) commented

Is there any way we could suppress package startup messages by default in the tool UI?

@gadenbuie (Collaborator, Author) commented

> Models' "default" Amount Of Code Written when calling this tool still feels too long to me. This leads to issues like that below, where the model hallucinates column names in data it hasn't seen yet because it didn't stop to examine the output of glimpse(forested). We've prompted PA/Databot/side::kick() in the same way in their analogous tools for this reason.

@simonpcouch ah that's a great point; thanks for showing me the example. I didn't take those lines initially because there are some shinychat limitations we need to fix around how the tool UI works when you're streaming in results. So I wasn't sure if I wanted to create a situation where the model tries to run code in too small of a chunk and ends up making that problem worse. I think I'll go back and add those lines in though after seeing your example.

I'll look into suppressing package startup messages too!
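One possible way to do this in base R, sketched here as an assumption rather than what was eventually implemented, is to muffle only conditions of class `packageStartupMessage` while letting ordinary messages through. `run_quietly()` is a hypothetical name.

```r
# Hypothetical helper, not part of btw: drop package startup
# messages but keep every other message.
run_quietly <- function(expr) {
  withCallingHandlers(
    expr,
    packageStartupMessage = function(m) invokeRestart("muffleMessage")
  )
}

run_quietly({
  packageStartupMessage("Attaching package: example") # muffled
  message("Model fit complete")                       # still shown
})
```

This is the same mechanism that `suppressPackageStartupMessages()` uses, and it works because packages are expected to emit their startup text with `packageStartupMessage()` rather than plain `message()`.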

@gadenbuie gadenbuie mentioned this pull request Dec 15, 2025
@gadenbuie gadenbuie merged commit ad93e2d into main Dec 15, 2025
11 checks passed
@gadenbuie gadenbuie deleted the feat/evaluate-tool branch December 15, 2025 15:52
@gadenbuie gadenbuie restored the feat/evaluate-tool branch December 15, 2025 15:53
@gadenbuie gadenbuie deleted the feat/evaluate-tool branch January 5, 2026 14:18
Development

Successfully merging this pull request may close these issues.

tool for running R code in the R session?

2 participants