
Implement farm caching#9

Merged
kalisp merged 50 commits into develop from
feature/AY-7952_support-of-RR-for-farm-caching
Jan 30, 2026

Conversation

@kalisp
Member

@kalisp kalisp commented Sep 2, 2025

Changelog Description

Extracting and publishing of pointcaches (for example) is slow and should be offloaded to the farm if possible.

Additional review information

Reworked from a pre-script to a dynamic approach: extractenvironments now runs on each render node, the resulting env vars are written to a .bat/.sh file, and that file is applied to the rendering session on each node separately.
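
A minimal sketch of the "store env vars into .bat/.sh" step described above. It is not the code from this PR: the function name `write_env_script` and the `ayon_env.*` file names are hypothetical, and it assumes values contain no `%` or `"` characters on Windows.

```python
from __future__ import annotations

import shlex
from pathlib import Path


def write_env_script(env: dict[str, str], target_dir: Path, windows: bool) -> Path:
    """Write env vars to a .bat (Windows) or .sh (POSIX) file; return its path."""
    if windows:
        # Quoting the whole assignment keeps spaces and '&' literal in cmd.exe.
        lines = ["@echo off"]
        lines += [f'set "{key}={value}"' for key, value in sorted(env.items())]
        path, eol = target_dir / "ayon_env.bat", "\r\n"  # hypothetical file name
    else:
        lines = ["#!/bin/sh"]
        lines += [f"export {key}={shlex.quote(value)}" for key, value in sorted(env.items())]
        path, eol = target_dir / "ayon_env.sh", "\n"  # hypothetical file name
    # newline="" disables newline translation so the chosen EOL is written as-is.
    with open(path, "w", encoding="utf-8", newline="") as stream:
        stream.write(eol.join(lines) + eol)
    return path
```

Because the file is generated on the node that renders, a Windows node and a Linux node working on the same job each get a script in their own shell syntax.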

Testing notes:

  1. prepare Maya workfile with pointcache instance (check on farm toggle)
  2. try to publish it via RR

@kalisp kalisp linked an issue Sep 2, 2025 that may be closed by this pull request
@h-schoenberger

h-schoenberger commented Sep 2, 2025

About "Current issues:"
rrPythonconsole is an .exe and therefore blocking.
But:
RR reads the rrEnvFile and creates the current render .bat file with it.
That .bat execution is what you see in the render log.
Which means any changes to the rrEnvFile have no effect in that session.
There are workarounds if required, but as far as I see you use a pre-script and therefore it should be available for the render stage.

About "Error jobList_GetInfo: Job not found (1 0 %3)"
I can see that there is an issue with the error message, it should not state "%3", it should state the result of the last TCP connection. Same info as stated if you run tcp.connectionStats()

kalisp added 15 commits October 13, 2025 17:02
rrEnv.env file on job is static and used for all render nodes which will cause an issue on multiplatform farms.
`ayon_inject_envvar` could be triggered dynamically by each of the nodes: it prepares a .bat/.sh file, and the extracted environment variables are applied only to that particular rendering session.
Previously it would trigger only on the job itself, which is a problem on multiplatform farms.
Script file could be run for each render node, rrEnv.rrEnv is only for whole job
Without this nothing will get published as all instances will be disabled.
rrEnv.rrEnv only for whole job, this way is dynamic for all render nodes.
Without it staging mode wouldn't work on publish
Command line process could be configured here too, it doesn't contain '='
…s://github.com/ynput/ayon-royalrender into feature/AY-7952_support-of-RR-for-farm-caching

# Conflicts:
#	client/ayon_royalrender/plugins/publish/create_publish_royalrender_job.py
@kalisp kalisp self-assigned this Oct 20, 2025
@kalisp kalisp added the sponsored label Oct 20, 2025
@antirotor
Member

Looks good, but what's with the blacklist of env vars? This feels wrong to me, more like a hack than a solution. Shouldn't we pass only the specific environment variables that are needed? In fact, a client reported to me today that some environment variables started to break their submission to RR (they are on Linux). Something that was not filtered suddenly passed through and broke the jobs. Maybe this was already discussed somewhere else; in that case, sorry for bringing it up.

This might be a bigger topic on AYON's end: explicitly tracking the env vars we set ourselves, which is something we might solve.

@h-schoenberger

h-schoenberger commented Dec 19, 2025

> It was in fact reported to me today by client that some environments started to break their submission to RR

That's one of the reasons I had sent you the blacklist.
I assume they use an AYON version from before it was implemented, as that was done recently.

> Looks good, but what's with the blacklist of env vars. I feel this is wrong - more like a hack than solution.
> Shouldn't we pass only specific environments needed?

From the meeting we had 2(?) months ago about issues, I was under the impression that it should copy any env var that is present on the system, and that you therefore do not know which env vars are required for a job.

Therefore I collected all the OS-specific env vars that I had found on various operating systems, plus the ones that RR uses,
so that it at least doesn't break in a clean installation (without custom, pipeline- or machine-specific env vars).
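
To make the blacklist approach concrete, here is a rough sketch. The pattern list is illustrative only and is not the list that was actually sent; `drop_blacklisted` is a hypothetical helper name.

```python
import fnmatch

# Illustrative patterns only; NOT the real blacklist discussed in this thread.
EXAMPLE_BLACKLIST = ["PATH", "HOME", "TEMP", "TMP", "USERPROFILE", "RR_*"]


def drop_blacklisted(env, blacklist=EXAMPLE_BLACKLIST):
    """Return a copy of env without variables matching any blacklist pattern."""
    return {
        key: value
        for key, value in env.items()
        # Compare upper-cased names because Windows env var names are case-insensitive.
        if not any(fnmatch.fnmatchcase(key.upper(), pattern) for pattern in blacklist)
    }
```

Note that any variable the list does not anticipate still passes through, which is the failure mode reported above.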

@h-schoenberger

h-schoenberger commented Jan 16, 2026

"Error jobList_GetInfo: Job not found (1 0 %3)"

The issue was fixed for the upcoming RoyalRender release 9.1.23 in Feb 2026

@BigRoy
Contributor

BigRoy commented Jan 16, 2026

> From the meeting we had 2(?) months ago about issues I was in the impression that it should copy any env var that is present on the system. And therefore you do not know which env vars are required for a job.
>
> Therefore I collected all env vars that I had found on various OS which are OS specific. And the ones that RR used. So that it at least doesn't break in a clean installation (without custom pipeline machine specific env vars).

Preferably we would have a way for the worker to build up the environment for the process itself, using some output from AYON on that worker. In Deadline we use GlobalJobPreLoad.py, which takes only a few AYON context env vars submitted with the job and uses them to build up the rest of the environment locally on the worker.

@antirotor or did you discuss/approach this differently with Royal Render?

@antirotor
Member

While testing I've encountered:

S177| Script raised an exception
S178| 
S179| Traceback (most recent call last):
S180|   File "C:\RR_localdata\renderscripts\ayon_inject_envvar.py", line 520, in <module>
S181|   File "C:\RR_localdata\renderscripts\ayon_inject_envvar.py", line 82, in inject
S182|   File "C:\RR_localdata\renderscripts\ayon_inject_envvar.py", line 206, in _extract_environments
S183| RuntimeError: Extract failed with b'Traceback (most recent call last):\r\n  File "start.py", line 1063, in main_cli\r\n  File "E:\\projects\\ynput\\repos/ayon-core/client\\ayon_core\\cli.py", line 338, in main\r\n    _cleanup_project_args()\r\n  File "E:\\projects\\ynput\\repos/ayon-core/client\\ayon_core\\cli.py", line 294, in _cleanup_project_args\r\n    cmd_name, cmd, rem_args = parent_cmd.resolve_command(\r\n  File "C:\\Users\\annat\\AppData\\Local\\Ynput\\AYON\\dependency_packages\\ayon_2512081759_windows.zip\\runtime\\click\\core.py", line 1755, in resolve_command\r\n    ctx.fail(_("No such command {name!r}.").format(name=original_cmd_name))\r\n  File "C:\\Users\\annat\\AppData\\Local\\Ynput\\AYON\\dependency_packages\\ayon_2512081759_windows.zip\\runtime\\click\\core.py", line 691, in fail\r\n    raise UsageError(message, self)\r\nclick.exceptions.UsageError: No such command \'extractenvironments\'.\r\n'
S184| 
S185| During handling of the above exception, another exception occurred:
S186| 
S187| Traceback (most recent call last):
S188|   File "C:\RR_localdata\renderscripts\ayon_inject_envvar.py", line 524, in <module>
S189| RuntimeError: Error happened::Extract failed with b'Traceback (most recent call last):\r\n  File "start.py", line 1063, in main_cli\r\n  File "E:\\projects\\ynput\\repos/ayon-core/client\\ayon_core\\cli.py", line 338, in main\r\n    _cleanup_project_args()\r\n  File "E:\\projects\\ynput\\repos/ayon-core/client\\ayon_core\\cli.py", line 294, in _cleanup_project_args\r\n    cmd_name, cmd, rem_args = parent_cmd.resolve_command(\r\n  File "C:\\Users\\annat\\AppData\\Local\\Ynput\\AYON\\dependency_packages\\ayon_2512081759_windows.zip\\runtime\\click\\core.py", line 1755, in resolve_command\r\n    ctx.fail(_("No such command {name!r}.").format(name=original_cmd_name))\r\n  File "C:\\Users\\annat\\AppData\\Local\\Ynput\\AYON\\dependency_packages\\ayon_2512081759_windows.zip\\runtime\\click\\core.py", line 691, in fail\r\n    raise UsageError(message, self)\r\nclick.exceptions.UsageError: No such command \'extractenvironments\'.\r\n'
S190| 
S191| 
S192| 

Might be completely unrelated to this PR: I updated from RR7 to RR9 beforehand, so things might not be configured properly. I need to check further.

@h-schoenberger

> Preferably we have a means to, on the worker, have it build up the environment for the process using some output from AYON through that worker. In Deadline we use GlobalJobPreLoad.py that uses some AYON context env vars submitted with the job only, to build up more of the environment on the worker locally.
>
> @antirotor or did you discuss/approach this differently with Royal Render?

To clarify/understand:
So with Deadline you just add AYON env vars to the job and not all env vars of the machine?
And if this is the case and it works fine, why not do that for RR as well.
I do not see any problem doing that with RR as well.
Then we do not need an env var blacklist.

(Not completely) unrelated question:
Why does AYON inject env vars after the job was submitted? (Deadline and RR)
Don't you have all information at submission time? What's missing at submission?

@BigRoy
Contributor

BigRoy commented Jan 22, 2026

> (Not completely) unrelated question:
> Why does AYON inject env vars after the job was submitted? (Deadline and RR)
> Don't you have all information at submission time? What's missing at submission?

The environment variables may differ based on the worker: for example, paths may be at X on Linux and at Y on Windows. We also support multiple sites, so one site may keep certain things in a different folder than another site. (Admittedly this is very rare on the farm in practice, and I'd recommend avoiding paths that differ to that extent, since not everything can be path-remapped in DCCs, etc.)

That explains why, on the farm, we need to evaluate the accurate environment for the 'site' (AYON_SITE_ID) that the worker machine runs in, which may be entirely different from or unrelated to the submission machine.
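
A small sketch of why the value has to be resolved on the worker. The root paths and the `resolve_root` helper are made up for illustration and do not reflect AYON's actual root or site resolution.

```python
import platform

# Hypothetical per-platform project roots, for illustration only.
EXAMPLE_ROOTS = {
    "windows": "P:/projects",
    "linux": "/mnt/projects",
    "darwin": "/Volumes/projects",
}


def resolve_root(roots, system=None):
    """Pick the root for the machine this code runs on (or for an explicit system name)."""
    name = (system or platform.system()).lower()
    if name not in roots:
        raise ValueError(f"No root configured for platform {name!r}")
    return roots[name]
```

Evaluated at submission time on a Windows workstation, this yields the Windows root, which would be wrong for a Linux render node picking up the same job.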

Does that explain it @h-schoenberger ?

> So with Deadline you just add AYON env vars to the job and not all env vars of the machine?
> And if this is the case and it works fine, why not do that for RR as well.

How do I do this?

  1. Submit a very limited set of env vars (usually only some user env for third-party tools like ShotGrid or the like).
  2. On the worker, as the job starts, run some custom logic that can add more data to the process, such as setting the env vars for the job (similar to what we do in ayon-deadline's GlobalJobPreLoad.py).
    • Preferably this logic runs per job, not per task: if a machine stays in the same job, moves on to a different task, and shares the same sandboxed environment, we would not need to build the environment again, only when it first enters the job.
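
The "per job, not per task" idea above could be sketched roughly as follows. `environment_for_job` and the `build` callback are hypothetical names, and the in-memory cache assumes all tasks of a job run in the same long-lived process on the worker.

```python
from typing import Callable, Dict

# Cache of built environments, keyed by job id (lives as long as the worker process).
_env_cache: Dict[str, Dict[str, str]] = {}


def environment_for_job(job_id: str, build: Callable[[], Dict[str, str]]) -> Dict[str, str]:
    """Build the environment once per job and reuse it for later tasks of that job."""
    if job_id not in _env_cache:
        # Expensive step, e.g. asking AYON for the environment of this worker's site.
        _env_cache[job_id] = build()
    # Return a copy so one task cannot mutate the environment seen by the next.
    return dict(_env_cache[job_id])
```

If tasks run in separate processes, the same idea would need a cache on disk instead, for example a file named after the job id.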

@h-schoenberger

> Does that explain it @h-schoenberger ?

I understand that you want to inject env vars on a specific machine.

But I do not understand why all env vars of one machine (minus the blacklist) must be added.
That requirement is the reason I had sent you a blacklist.

@BigRoy
Contributor

BigRoy commented Jan 23, 2026

> But not that all env vars of one machine (minus the black list) must be added.
> Which is the reason that I had send you a black list.

I haven't been involved with earlier releases of ayon-royalrender, so I lack a lot of the background, but I see no reason for passing along all env vars. We should avoid passing anything along, except for the few variables we explicitly want to pass to the job. That's it.
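
A sketch of the explicit approach, for contrast with filtering by blacklist. `pick_explicit` is a hypothetical helper; of the names used in the usage line below, only AYON_SITE_ID appears in this thread, and the other is an assumed placeholder.

```python
def pick_explicit(env, allowed):
    """Return only the explicitly requested variables that are actually set."""
    return {key: env[key] for key in allowed if key in env}


# Usage: everything not named explicitly stays on the submitting machine.
# "EXAMPLE_TRACKER_USER" is a made-up name standing in for a third-party user env var.
job_env = pick_explicit(
    {"AYON_SITE_ID": "abc", "PATH": "/usr/bin", "EXAMPLE_TRACKER_USER": "jane"},
    allowed=("AYON_SITE_ID", "EXAMPLE_TRACKER_USER", "NOT_SET"),
)
```

Here an unexpected variable can never leak into the job, at the cost of having to know up front which variables the job needs.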

@kalisp
Member Author

kalisp commented Jan 26, 2026

> While testing I've encountered:

Please redeploy ayon-royalrender/client/ayon_royalrender/rr_root/render_apps/scripts to RR_ROOT/render_apps and try again.

@antirotor

Member

@antirotor antirotor left a comment

I've tested it and it successfully published caches.

Note for the future: we should drop attrs for dataclasses and think more about the environments :)

@kalisp kalisp merged commit 02d769c into develop Jan 30, 2026
1 check failed
@kalisp kalisp deleted the feature/AY-7952_support-of-RR-for-farm-caching branch January 30, 2026 11:03

Labels

bump minor
sponsored (This is directly sponsored by a client or community member)
type: feature (Adding something new and exciting to the product)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AY-7952_support of RR for farm caching
AYON version mismatch and tiny enhancements

4 participants