RemoteRolloutProcessor use data loader and pulling based on tags #217

Merged: xzrderek merged 12 commits into main from derekx/data_loader_and_tags on Sep 25, 2025

@xzrderek
Contributor

2025-09-24 18:25:45.285 | WARNING  | vendor.tau2.utils.utils:<module>:14 - No .env file found
2025-09-24 18:25:46.092 | WARNING  | vendor.tau2.utils.llm_utils:<module>:77 - Sonnet thinking is disabled
INFO:datasets:PyTorch version 2.7.1 available.
==================================================================================== test session starts =====================================================================================
platform darwin -- Python 3.13.5, pytest-8.4.1, pluggy-1.6.0
rootdir: /Users/derekxu/Documents/code/python-sdk
configfile: pytest.ini
plugins: anyio-4.9.0, syrupy-4.9.1, asyncio-1.1.0, xdist-3.8.0, logfire-4.5.0, cov-6.2.1, pytest_httpserver-1.1.3, langsmith-0.4.8, eval-protocol-0.2.23+31.gfde105b, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                             

tests/chinook/langfuse/test_remote_langfuse_chinook.py 2025-09-24 18:25:49.085 | WARNING  | vendor.tau2.utils.utils:<module>:14 - No .env file found
2025-09-24 18:25:49.641 | WARNING  | vendor.tau2.utils.llm_utils:<module>:77 - Sonnet thinking is disabled
INFO:datasets:PyTorch version 2.7.1 available.
INFO:     Started server process [33508]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:7077 (Press CTRL+C to quit)
INFO:     127.0.0.1:58415 - "GET /status?rollout_id=ping HTTP/1.1" 404 Not Found
Runs (Parallel):   0%|          | 0/1 [00:00<?, ?run/s]
INFO:     127.0.0.1:58416 - "POST /init HTTP/1.1" 200 OK
INFO:     127.0.0.1:58417 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58418 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58419 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58420 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58423 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58424 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58425 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58426 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58428 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58430 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58432 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58434 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58435 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58437 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58441 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58442 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58444 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58446 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
INFO:     127.0.0.1:58449 - "GET /status?rollout_id=entire-time-50 HTTP/1.1" 200 OK
WARNING:eval_protocol.adapters.langfuse:Retrying in 2s (attempt 1/3): Empty results - indexing delay
WARNING:eval_protocol.adapters.langfuse:Retrying in 4s (attempt 2/3): Empty results - indexing delay
INFO:eval_protocol.adapters.langfuse:Successfully processed 2 selected traces into 2 evaluation rows
✅ Successfully received row from Langfuse with invocation_id: loud-conference-71
✅ Successfully received row from Langfuse with invocation_id: loud-conference-71
Runs (Parallel): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:32<00:00, 32.32s/run]
.
================================================================================                                                                                                              
🔥 FIREWORKS EXPERIMENT LINKS
================================================================================
🔗 Experiment rational-part-33: https://app.fireworks.ai/dashboard/evaluation-jobs/test-remote-rollout-and-fetch-langfuse-rational-part-33-job
================================================================================

================================================================================
📊 LOCAL UI EVALUATION RESULTS
================================================================================
📊 Invocation loud-conference-71:
  📊 Aggregate scores: http://localhost:8000/pivot?filterConfig=%5B%7B%22logic%22%3A%20%22AND%22%2C%20%22filters%22%3A%20%5B%7B%22field%22%3A%20%22%24.execution_metadata.invocation_id%22%2C%20%22operator%22%3A%20%22%3D%3D%22%2C%20%22value%22%3A%20%22loud-conference-71%22%2C%20%22type%22%3A%20%22text%22%7D%5D%7D%5D
  📋 Trajectories: http://localhost:8000/table?filterConfig=%5B%7B%22logic%22%3A%20%22AND%22%2C%20%22filters%22%3A%20%5B%7B%22field%22%3A%20%22%24.execution_metadata.invocation_id%22%2C%20%22operator%22%3A%20%22%3D%3D%22%2C%20%22value%22%3A%20%22loud-conference-71%22%2C%20%22type%22%3A%20%22text%22%7D%5D%7D%5D
================================================================================


===================================================================================== 1 passed in 44.91s ===================================================================================
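
The two `Retrying in 2s (attempt 1/3)` and `Retrying in 4s (attempt 2/3)` warnings in the log above suggest the Langfuse adapter waits out the indexing delay with a doubling backoff. A minimal sketch of that pattern follows. It is not the adapter's actual code: `fetch_with_retry`, its parameters, and the injected `sleep` are hypothetical names chosen for illustration, and the only things taken from the log are the 2s/4s delays and the limit of 3 attempts.

```python
import time
from typing import Callable, List


def fetch_with_retry(
    fetch: Callable[[], List[dict]],
    max_retries: int = 3,          # matches the "attempt N/3" seen in the log
    base_delay: float = 2.0,       # first wait is 2s, then 4s, then 8s
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call `fetch` until it returns a non-empty list, doubling the wait each time.

    Hypothetical helper: an empty result is treated as "not indexed yet".
    """
    for attempt in range(1, max_retries + 1):
        rows = fetch()
        if rows:
            return rows
        delay = base_delay * 2 ** (attempt - 1)
        print(f"Retrying in {delay:g}s (attempt {attempt}/{max_retries}): empty results")
        sleep(delay)
    # One last try after the final wait; may still be empty.
    return fetch()
```

Passing `sleep` as a parameter keeps the helper testable without real waiting, for example by passing `sleeps.append` and checking the recorded delays.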

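The local UI links in the log carry a URL-encoded `filterConfig` query parameter that selects rows by `$.execution_metadata.invocation_id`. The sketch below shows one way such a link could be built and decoded with the standard library. `build_results_url` and `decode_filter_config` are hypothetical helper names, not functions from eval-protocol; the filter structure is simply what the logged URL decodes to.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit


def build_results_url(base_url: str, view: str, invocation_id: str) -> str:
    """Build a link to `view` ("pivot" or "table") filtered to one invocation."""
    filter_config = [
        {
            "logic": "AND",
            "filters": [
                {
                    "field": "$.execution_metadata.invocation_id",
                    "operator": "==",
                    "value": invocation_id,
                    "type": "text",
                }
            ],
        }
    ]
    # json.dumps with default separators, then percent-encode everything
    # except unreserved characters.
    return f"{base_url}/{view}?filterConfig={quote(json.dumps(filter_config))}"


def decode_filter_config(url: str) -> list:
    """Recover the filter list from a results URL."""
    query = urlsplit(url).query
    return json.loads(parse_qs(query)["filterConfig"][0])
```

Decoding one of the logged links this way makes the filter readable: a single `AND` group with one text filter where the invocation ID equals `loud-conference-71`.
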
xzrderek and others added 8 commits September 25, 2025 11:35
* add typescript simple example

* publish npm package for eval protocol (#219)

* publish typescript SDK

* add createLangfuseConfigTags function and update version to 0.1.1

* use eval-protocol npm dependency

* refactor statusInfoSchema to use a record type and update version to 0.1.2

* add eval_metadata to langfuse_row in RemoteRolloutProcessor

* Refactor data generator function name and update eval-protocol version to 0.1.2

* done
@xzrderek xzrderek merged commit 2dbd868 into main Sep 25, 2025
7 checks passed
@xzrderek xzrderek deleted the derekx/data_loader_and_tags branch September 25, 2025 22:55