Description
(base) [root@recom-pricing-2 app]# softlearning run_example_local examples.multi_goal --algorithm SAC --universe gym --domain Default-v0 --task MultiGoal --policy gaussian
2021-01-04 00:59:21.753062: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
example_module_name= examples.multi_goal example_argv= ('--algorithm', 'SAC', '--universe', 'gym', '--domain', 'Default-v0', '--task', 'MultiGoal', '--policy', 'gaussian', '--mode=local')
INFO:absl:MUJOCO_GL is not set, so an OpenGL backend will be chosen automatically.
/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/glfw/init.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
warnings.warn(message, GLFWError)
INFO:absl:Successfully imported OpenGL backend: glfw
INFO:absl:MuJoCo library version is: 200
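The absl messages above appear to come from dm_control: with `MUJOCO_GL` unset it picks a backend automatically and lands on `glfw`, which expects an X display, hence the `DISPLAY environment variable is missing` warning on this headless host. The helper below is a hypothetical sketch, not part of softlearning. It assumes that a headless backend (`egl` or `osmesa`) is wanted whenever no display is present, and that it runs before dm_control is imported. Exporting the variable in the shell before launching (for example `MUJOCO_GL=egl softlearning run_example_local ...`) should have the same effect, since locally started Ray workers inherit the launching environment.

```python
import os

# Headless rendering backends understood by MUJOCO_GL (assumed target values).
# "egl" needs GPU drivers with EGL support; "osmesa" is a software fallback.
HEADLESS_BACKENDS = ("egl", "osmesa")


def choose_mujoco_gl(preferred="egl"):
    """Set MUJOCO_GL for a headless host unless the user already chose one.

    Returns the value MUJOCO_GL ends up with (None if it stays unset because
    an X display is available). Must run before dm_control is imported,
    since the backend is read from the environment at import time.
    """
    if preferred not in HEADLESS_BACKENDS:
        raise ValueError(f"expected one of {HEADLESS_BACKENDS}, got {preferred!r}")
    # Only fill in a default when there is no X display; never override
    # an explicit user choice.
    if "DISPLAY" not in os.environ:
        os.environ.setdefault("MUJOCO_GL", preferred)
    return os.environ.get("MUJOCO_GL")
```

This only silences the GLFW warnings; it is unrelated to the `AttributeError` that stops the trial further down.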
2021-01-04 00:59:26,522 WARNING tune.py:396 -- Tune detects GPUs, but no trials are using GPUs. To enable trials to use GPUs, set tune.run(resources_per_trial={'gpu': 1}...) which allows Tune to expose 1 GPU to each trial. You can also override Trainable.default_resource_request if using the Trainable API.
== Status ==
Memory usage on this node: 8.5/117.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 32/32 CPUs, 0/4 GPUs, 0.0/68.65 GiB heap, 0.0/23.05 GiB objects (0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/gym/Default-v0/MultiGoal/2021-01-04T00-59-25-2021-01-04T00-59-24
Number of trials: 1 (1 RUNNING)
+-----------------------+----------+-------+
| Trial name | status | loc |
|-----------------------+----------+-------|
| id=17850_00000-seed=1 | RUNNING | |
+-----------------------+----------+-------+
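The Tune warning and the status line `0/4 GPUs` show that the single trial requests no GPU while claiming all 32 CPUs. `tune.run` accepts a `resources_per_trial` dict, and the warning itself suggests `{'gpu': 1}`. The helper below is hypothetical (it is not part of softlearning or Ray) and only computes such a dict from node totals, assuming concurrently running trials should split the node evenly. How softlearning forwards resource settings to `tune.run` is not visible in this log, and this does not explain the `AttributeError` below; it only addresses the warning.

```python
def trial_resources(total_cpus, total_gpus, num_trials=1, gpus_per_trial=1):
    """Build a resources_per_trial-style dict by splitting a node evenly.

    Hypothetical helper: each trial gets an equal integer share of CPUs
    (at least one) and up to `gpus_per_trial` GPUs. Ray accepts fractional
    GPU requests, so the GPU share may be a float when trials outnumber GPUs.
    """
    if num_trials < 1:
        raise ValueError("num_trials must be >= 1")
    cpus = max(1, total_cpus // num_trials)
    # No GPUs on the node means the trial should not request any.
    gpus = min(gpus_per_trial, total_gpus / num_trials) if total_gpus else 0
    return {"cpu": cpus, "gpu": gpus}
```

For the node in this log, `trial_resources(32, 4)` gives `{"cpu": 32, "gpu": 1}`, which could be passed as `tune.run(trainable, resources_per_trial=trial_resources(32, 4))`.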
(pid=10011) 2021-01-04 00:59:27.064132: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
(pid=10011) /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/glfw/init.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
(pid=10011) warnings.warn(message, GLFWError)
(pid=10011) /home/app/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=10011) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=10011) 2021-01-04 00:59:30.873847: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
(pid=10011) 2021-01-04 00:59:30.898979: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=10011) 2021-01-04 00:59:30.899038: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: recom-pricing-2
(pid=10011) 2021-01-04 00:59:30.899050: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: recom-pricing-2
(pid=10011) 2021-01-04 00:59:30.899141: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.27.4
(pid=10011) 2021-01-04 00:59:30.899188: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.27.4
(pid=10011) 2021-01-04 00:59:30.899199: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.27.4
(pid=10011) 2021-01-04 00:59:30.899546: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
(pid=10011) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=10011) 2021-01-04 00:59:30.912801: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
(pid=10011) 2021-01-04 00:59:30.916704: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f365444e1f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=10011) 2021-01-04 00:59:30.916731: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
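The `CUDA_ERROR_NO_DEVICE` line is probably a side effect of the trial requesting 0 GPUs rather than a driver fault: libcuda and the kernel driver report matching versions (460.27.4), and the status table lists 4 GPUs on the node. Assuming Ray hides GPUs from trials that request none by setting `CUDA_VISIBLE_DEVICES` to an empty string, TensorFlow would find no devices and fall back to the CPU, which matches only `Host` being listed as an XLA device. The function below is a hypothetical diagnostic that interprets that variable; printing `visible_cuda_devices()` from inside the trial would confirm or rule out this explanation.

```python
import os


def visible_cuda_devices(env=None):
    """Return the device IDs implied by CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (the CUDA runtime then sees
    every device on the host), and a possibly empty list otherwise.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # An empty string hides all devices, so cuInit reports
    # CUDA_ERROR_NO_DEVICE even though the driver is installed.
    return [d.strip() for d in value.split(",") if d.strip()]
```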
(pid=10011) 2021-01-04 00:59:31,401 ERROR function_runner.py:233 -- Runner Thread raised error.
(pid=10011) Traceback (most recent call last):
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10011) self._entrypoint()
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10011) self._status_reporter.get_checkpoint())
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10011) output = train_func(config, reporter)
(pid=10011) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10011) for train_result in algorithm.train():
(pid=10011) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10011) self._do_sampling(timestep=self._total_timestep)
(pid=10011) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10011) self.sampler.sample()
(pid=10011) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10011) self.reset()
(pid=10011) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10011) self._current_observation = self.environment.reset()
(pid=10011) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10011) Exception in thread Thread-2:
(pid=10011) Traceback (most recent call last):
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=10011) self.run()
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=10011) raise e
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10011) self._entrypoint()
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10011) self._status_reporter.get_checkpoint())
(pid=10011) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10011) output = train_func(config, reporter)
(pid=10011) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10011) for train_result in algorithm.train():
(pid=10011) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10011) self._do_sampling(timestep=self._total_timestep)
(pid=10011) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10011) self.sampler.sample()
(pid=10011) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10011) self.reset()
(pid=10011) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10011) self._current_observation = self.environment.reset()
(pid=10011) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10011)
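Every traceback in this log ends at the same place: `simple_sampler.py`, line 28, calls `self.environment.reset()` while `self.environment` is still `None`. The sampler was apparently never given an environment before `algorithm.train()` started sampling, and Tune's automatic restarts cannot fix that, which is why each retry fails identically. The sketch below is hypothetical: `SketchSampler`, `initialize()` and `DummyEnv` are illustrative names loosely modeled on the attributes in the traceback, not softlearning's actual API. It shows how an explicit check turns the bare `AttributeError` into a message that names the missing setup step.

```python
class SketchSampler:
    """Hypothetical stand-in for a sampler; not softlearning's real class."""

    def __init__(self):
        self.environment = None  # expected to be attached via initialize()
        self._current_observation = None

    def initialize(self, environment):
        """Attach the environment the sampler should step through."""
        self.environment = environment

    def reset(self):
        # Fail with an explicit message instead of
        # "'NoneType' object has no attribute 'reset'".
        if self.environment is None:
            raise RuntimeError(
                "sampler.environment is None: initialize() must be called "
                "with an environment before sample()")
        self._current_observation = self.environment.reset()

    def sample(self):
        """Reset on first use, then return the current observation."""
        if self._current_observation is None:
            self.reset()
        return self._current_observation


class DummyEnv:
    """Minimal environment whose reset() returns a fixed observation."""

    def reset(self):
        return 0.0
```

Calling `SketchSampler().sample()` raises the `RuntimeError`; after `initialize(DummyEnv())` it returns `0.0`. Where softlearning expects the environment to be attached (for example in `examples/multi_goal/main.py` before the training loop) cannot be determined from this log alone.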
2021-01-04 00:59:31,459 ERROR trial_runner.py:567 -- Trial id=17850_00000-seed=1: Error processing event.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/worker.py", line 1428, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=10011, ip=10.22.134.202)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py", line 336, in train
result = self.step()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 340, in step
self._report_thread_runner_error(block=True)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 459, in _report_thread_runner_error
.format(err_tb_str)))
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=10011, ip=10.22.134.202)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
self._entrypoint()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
self._status_reporter.get_checkpoint())
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
output = train_func(config, reporter)
File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
for train_result in algorithm.train():
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
self._do_sampling(timestep=self._total_timestep)
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
self.sampler.sample()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
self.reset()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
self._current_observation = self.environment.reset()
AttributeError: 'NoneType' object has no attribute 'reset'
2021-01-04 00:59:31,463 INFO trial_runner.py:690 -- Trial id=17850_00000-seed=1: Attempting to restore trial state from last checkpoint.
(pid=10004) 2021-01-04 00:59:31.956211: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
(pid=10004) /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/glfw/init.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
(pid=10004) warnings.warn(message, GLFWError)
(pid=10004) /home/app/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=10004) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=10004) 2021-01-04 00:59:35.780731: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
(pid=10004) 2021-01-04 00:59:35.805621: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=10004) 2021-01-04 00:59:35.805675: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: recom-pricing-2
(pid=10004) 2021-01-04 00:59:35.805685: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: recom-pricing-2
(pid=10004) 2021-01-04 00:59:35.805785: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.27.4
(pid=10004) 2021-01-04 00:59:35.805831: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.27.4
(pid=10004) 2021-01-04 00:59:35.805840: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.27.4
(pid=10004) 2021-01-04 00:59:35.806175: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
(pid=10004) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=10004) 2021-01-04 00:59:35.818986: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
(pid=10004) 2021-01-04 00:59:35.822967: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fa43444e1f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=10004) 2021-01-04 00:59:35.822996: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
(pid=10004) 2021-01-04 00:59:36,308 ERROR function_runner.py:233 -- Runner Thread raised error.
(pid=10004) Traceback (most recent call last):
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10004) self._entrypoint()
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10004) self._status_reporter.get_checkpoint())
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10004) output = train_func(config, reporter)
(pid=10004) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10004) for train_result in algorithm.train():
(pid=10004) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10004) self._do_sampling(timestep=self._total_timestep)
(pid=10004) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10004) self.sampler.sample()
(pid=10004) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10004) self.reset()
(pid=10004) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10004) self._current_observation = self.environment.reset()
(pid=10004) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10004) Exception in thread Thread-2:
(pid=10004) Traceback (most recent call last):
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=10004) self.run()
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=10004) raise e
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10004) self._entrypoint()
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10004) self._status_reporter.get_checkpoint())
(pid=10004) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10004) output = train_func(config, reporter)
(pid=10004) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10004) for train_result in algorithm.train():
(pid=10004) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10004) self._do_sampling(timestep=self._total_timestep)
(pid=10004) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10004) self.sampler.sample()
(pid=10004) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10004) self.reset()
(pid=10004) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10004) self._current_observation = self.environment.reset()
(pid=10004) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10004)
2021-01-04 00:59:36,367 ERROR trial_runner.py:567 -- Trial id=17850_00000-seed=1: Error processing event.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/worker.py", line 1428, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=10004, ip=10.22.134.202)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py", line 336, in train
result = self.step()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 340, in step
self._report_thread_runner_error(block=True)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 459, in _report_thread_runner_error
.format(err_tb_str)))
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=10004, ip=10.22.134.202)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
self._entrypoint()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
self._status_reporter.get_checkpoint())
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
output = train_func(config, reporter)
File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
for train_result in algorithm.train():
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
self._do_sampling(timestep=self._total_timestep)
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
self.sampler.sample()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
self.reset()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
self._current_observation = self.environment.reset()
AttributeError: 'NoneType' object has no attribute 'reset'
2021-01-04 00:59:36,370 INFO trial_runner.py:690 -- Trial id=17850_00000-seed=1: Attempting to restore trial state from last checkpoint.
== Status ==
Memory usage on this node: 8.8/117.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 32/32 CPUs, 0/4 GPUs, 0.0/68.65 GiB heap, 0.0/23.05 GiB objects (0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/gym/Default-v0/MultiGoal/2021-01-04T00-59-25-2021-01-04T00-59-24
Number of trials: 1 (1 RUNNING)
+-----------------------+----------+-------+
| Trial name | status | loc |
|-----------------------+----------+-------|
| id=17850_00000-seed=1 | RUNNING | |
+-----------------------+----------+-------+
Number of errored trials: 1
+-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------|
| id=17850_00000-seed=1 | 2 | /root/ray_results/gym/Default-v0/MultiGoal/2021-01-04T00-59-25-2021-01-04T00-59-24/id=17850_00000-seed=1_0_2021-01-04_00-59-26/error.txt |
+-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------+
(pid=10037) 2021-01-04 00:59:36.868421: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
(pid=10037) /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/glfw/init.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
(pid=10037) warnings.warn(message, GLFWError)
(pid=10037) /home/app/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=10037) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=10037) 2021-01-04 00:59:40.709361: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
(pid=10037) 2021-01-04 00:59:40.735021: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=10037) 2021-01-04 00:59:40.735079: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: recom-pricing-2
(pid=10037) 2021-01-04 00:59:40.735090: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: recom-pricing-2
(pid=10037) 2021-01-04 00:59:40.735179: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.27.4
(pid=10037) 2021-01-04 00:59:40.735224: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.27.4
(pid=10037) 2021-01-04 00:59:40.735234: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.27.4
(pid=10037) 2021-01-04 00:59:40.735539: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
(pid=10037) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=10037) 2021-01-04 00:59:40.749026: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
(pid=10037) 2021-01-04 00:59:40.753493: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f488844e1f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=10037) 2021-01-04 00:59:40.753522: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
(pid=10037) 2021-01-04 00:59:41,236 ERROR function_runner.py:233 -- Runner Thread raised error.
(pid=10037) Traceback (most recent call last):
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10037) self._entrypoint()
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10037) self._status_reporter.get_checkpoint())
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10037) output = train_func(config, reporter)
(pid=10037) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10037) for train_result in algorithm.train():
(pid=10037) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10037) self._do_sampling(timestep=self._total_timestep)
(pid=10037) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10037) self.sampler.sample()
(pid=10037) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10037) self.reset()
(pid=10037) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10037) self._current_observation = self.environment.reset()
(pid=10037) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10037) Exception in thread Thread-2:
(pid=10037) Traceback (most recent call last):
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=10037) self.run()
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=10037) raise e
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10037) self._entrypoint()
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10037) self._status_reporter.get_checkpoint())
(pid=10037) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10037) output = train_func(config, reporter)
(pid=10037) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10037) for train_result in algorithm.train():
(pid=10037) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10037) self._do_sampling(timestep=self._total_timestep)
(pid=10037) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10037) self.sampler.sample()
(pid=10037) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10037) self.reset()
(pid=10037) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10037) self._current_observation = self.environment.reset()
(pid=10037) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10037)
2021-01-04 00:59:41,293 ERROR trial_runner.py:567 -- Trial id=17850_00000-seed=1: Error processing event.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/worker.py", line 1428, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=10037, ip=10.22.134.202)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py", line 336, in train
result = self.step()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 340, in step
self._report_thread_runner_error(block=True)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 459, in _report_thread_runner_error
.format(err_tb_str)))
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=10037, ip=10.22.134.202)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
self._entrypoint()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
self._status_reporter.get_checkpoint())
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
output = train_func(config, reporter)
File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
for train_result in algorithm.train():
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
self._do_sampling(timestep=self._total_timestep)
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
self.sampler.sample()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
self.reset()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
self._current_observation = self.environment.reset()
AttributeError: 'NoneType' object has no attribute 'reset'
2021-01-04 00:59:41,296 INFO trial_runner.py:690 -- Trial id=17850_00000-seed=1: Attempting to restore trial state from last checkpoint.
(pid=10015) 2021-01-04 00:59:41.792255: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
(pid=10015) /opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/glfw/init.py:834: GLFWError: (65544) b'X11: The DISPLAY environment variable is missing'
(pid=10015) warnings.warn(message, GLFWError)
(pid=10015) /home/app/gym/gym/logger.py:30: UserWarning: WARN: Box bound precision lowered by casting to float32
(pid=10015) warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))
(pid=10015) 2021-01-04 00:59:45.619478: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
(pid=10015) 2021-01-04 00:59:45.645539: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
(pid=10015) 2021-01-04 00:59:45.645598: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: recom-pricing-2
(pid=10015) 2021-01-04 00:59:45.645609: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: recom-pricing-2
(pid=10015) 2021-01-04 00:59:45.645700: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.27.4
(pid=10015) 2021-01-04 00:59:45.645748: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.27.4
(pid=10015) 2021-01-04 00:59:45.645759: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.27.4
(pid=10015) 2021-01-04 00:59:45.646126: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
(pid=10015) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=10015) 2021-01-04 00:59:45.659372: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2300000000 Hz
(pid=10015) 2021-01-04 00:59:45.663806: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fd15844e1f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
(pid=10015) 2021-01-04 00:59:45.663840: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-01-04 00:59:46,200 ERROR trial_runner.py:567 -- Trial id=17850_00000-seed=1: Error processing event.
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 515, in _process_trial
result = self.trial_executor.fetch_result(trial)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 488, in fetch_result
result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/worker.py", line 1428, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(TuneError): ray::ImplicitFunc.train() (pid=10015, ip=10.22.134.202)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/trainable.py", line 336, in train
result = self.step()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 340, in step
self._report_thread_runner_error(block=True)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 459, in _report_thread_runner_error
.format(err_tb_str)))
ray.tune.error.TuneError: Trial raised an exception. Traceback:
ray::ImplicitFunc.train() (pid=10015, ip=10.22.134.202)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
self._entrypoint()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
self._status_reporter.get_checkpoint())
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
output = train_func(config, reporter)
File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
for train_result in algorithm.train():
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
self._do_sampling(timestep=self._total_timestep)
File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
self.sampler.sample()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
self.reset()
File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
self._current_observation = self.environment.reset()
AttributeError: 'NoneType' object has no attribute 'reset'
== Status ==
Memory usage on this node: 8.8/117.9 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/32 CPUs, 0/4 GPUs, 0.0/68.65 GiB heap, 0.0/23.05 GiB objects (0/1.0 accelerator_type:T4)
Result logdir: /root/ray_results/gym/Default-v0/MultiGoal/2021-01-04T00-59-25-2021-01-04T00-59-24
Number of trials: 1 (1 ERROR)
+-----------------------+----------+-------+
| Trial name | status | loc |
|-----------------------+----------+-------|
| id=17850_00000-seed=1 | ERROR | |
+-----------------------+----------+-------+
Number of errored trials: 1
+-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------+
| Trial name | # failures | error file |
|-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------|
| id=17850_00000-seed=1 | 4 | /root/ray_results/gym/Default-v0/MultiGoal/2021-01-04T00-59-25-2021-01-04T00-59-24/id=17850_00000-seed=1_0_2021-01-04_00-59-26/error.txt |
+-----------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------+
Traceback (most recent call last):
File "/opt/rh/rh-python36/root/usr/bin/softlearning", line 11, in <module>
load_entry_point('softlearning', 'console_scripts', 'softlearning')()
File "/home/app/softlearning/softlearning/scripts/console_scripts.py", line 207, in main
return cli()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/app/softlearning/softlearning/scripts/console_scripts.py", line 73, in run_example_local_cmd
return run_example_local(example_module_name, example_argv)
File "/home/app/softlearning/examples/instrument.py", line 245, in run_example_local
reuse_actors=True)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/tune.py", line 427, in run
raise TuneError("Trials did not complete", incomplete_trials)
ray.tune.error.TuneError: ('Trials did not complete', [id=17850_00000-seed=1])
(pid=10015) 2021-01-04 00:59:46,155 ERROR function_runner.py:233 -- Runner Thread raised error.
(pid=10015) Traceback (most recent call last):
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10015) self._entrypoint()
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10015) self._status_reporter.get_checkpoint())
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10015) output = train_func(config, reporter)
(pid=10015) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10015) for train_result in algorithm.train():
(pid=10015) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10015) self._do_sampling(timestep=self._total_timestep)
(pid=10015) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10015) self.sampler.sample()
(pid=10015) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10015) self.reset()
(pid=10015) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10015) self._current_observation = self.environment.reset()
(pid=10015) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10015) Exception in thread Thread-2:
(pid=10015) Traceback (most recent call last):
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
(pid=10015) self.run()
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 246, in run
(pid=10015) raise e
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 227, in run
(pid=10015) self._entrypoint()
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 290, in entrypoint
(pid=10015) self._status_reporter.get_checkpoint())
(pid=10015) File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/ray/tune/function_runner.py", line 501, in _trainable_func
(pid=10015) output = train_func(config, reporter)
(pid=10015) File "/home/app/softlearning/examples/multi_goal/main.py", line 70, in run_experiment
(pid=10015) for train_result in algorithm.train():
(pid=10015) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 177, in _train
(pid=10015) self._do_sampling(timestep=self._total_timestep)
(pid=10015) File "/home/app/softlearning/softlearning/algorithms/rl_algorithm.py", line 334, in _do_sampling
(pid=10015) self.sampler.sample()
(pid=10015) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 54, in sample
(pid=10015) self.reset()
(pid=10015) File "/home/app/softlearning/softlearning/samplers/simple_sampler.py", line 28, in reset
(pid=10015) self._current_observation = self.environment.reset()
(pid=10015) AttributeError: 'NoneType' object has no attribute 'reset'
(pid=10015)
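The innermost frames show the failure path. `algorithm.train()` reaches `SimpleSampler.sample()`, which calls `self.reset()`, which in turn calls `self.environment.reset()` while `self.environment` is still `None`. Below is a minimal sketch of that failure mode. It uses hypothetical classes (`SketchSampler`, `DummyEnv`) that only mimic the names in the traceback; it is not softlearning's code. It also assumes the environment is meant to be attached to the sampler by a separate initialization step before any sampling happens.

```python
# Minimal sketch (hypothetical classes, NOT softlearning's real code) of the
# failure in the traceback: sample() -> reset() -> self.environment.reset()
# when no environment was ever attached to the sampler.


class SketchSampler:
    def __init__(self):
        # Assumption: the environment is attached later by initialize().
        self.environment = None
        self._current_observation = None

    def initialize(self, environment):
        # Hypothetical setup step that must run before sample().
        self.environment = environment

    def reset(self):
        if self.environment is None:
            # Fail with an explicit message instead of the opaque
            # "'NoneType' object has no attribute 'reset'".
            raise RuntimeError(
                "Sampler has no environment; call initialize(environment) "
                "before sample().")
        self._current_observation = self.environment.reset()

    def sample(self):
        # Reset on the first call, mirroring the sample() -> reset() frame.
        if self._current_observation is None:
            self.reset()
        return self._current_observation


class DummyEnv:
    def reset(self):
        # Stand-in initial observation.
        return [0.0, 0.0]


sampler = SketchSampler()
try:
    sampler.sample()  # no environment attached yet
except RuntimeError as error:
    print(error)

sampler.initialize(DummyEnv())
print(sampler.sample())
```

Without the `None` check, the first `sample()` call raises exactly the `AttributeError` shown in the log. That suggests checking whether the multi_goal example ever hands an environment to its sampler before `algorithm.train()` starts.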