
[RLlib] Minor doc fixes (ray-project#35675)
Signed-off-by: Rohan Potdar <[email protected]>
Rohan138 committed May 24, 2023
1 parent 5acf41e commit 806b633
Showing 3 changed files with 11 additions and 10 deletions.
doc/source/rllib/index.rst: 2 changes (1 addition, 1 deletion)
@@ -83,7 +83,7 @@ The `framework` config lets you choose between "tf2", "tf" and "torch" for execu
 You can also tweak RLlib's default `model` config, and set up a separate config for `evaluation`.
 
 If you want to learn more about the RLlib training API,
-`you can learn more about it here <rllib-training-api>`_.
+`you can learn more about it here <rllib-training.html#using-the-python-api>`_.
 Also, see `here for a simple example on how to write an action inference loop after training. <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training.py>`_
 
 If you want to get a quick preview of which **algorithms** and **environments** RLlib supports,
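
As a quick, hedged illustration (not part of this commit) of the config areas the hunk above refers to, the `framework`, `model`, and `evaluation` settings can be set through RLlib's AlgorithmConfig builder API; the env name and option values below are arbitrary assumptions:

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")                    # assumed example env
        .framework("torch")                            # "tf2" and "tf" are the other options
        .training(model={"fcnet_hiddens": [64, 64]})   # tweak the default `model` config
        .evaluation(evaluation_interval=1)             # separate `evaluation` setup
    )
    algo = config.build()
    result = algo.train()
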
doc/source/rllib/rllib-env.rst: 17 changes (9 additions, 8 deletions)
@@ -25,17 +25,18 @@ Custom env classes passed directly to the algorithm must take a single ``env_con
 
 .. code-block:: python
 
-    import gym, ray
+    import gymnasium as gym
+    import ray
     from ray.rllib.algorithms import ppo
 
     class MyEnv(gym.Env):
         def __init__(self, env_config):
             self.action_space = <gym.Space>
             self.observation_space = <gym.Space>
-        def reset(self):
-            return <obs>
+        def reset(self, seed, options):
+            return <obs>, <info>
         def step(self, action):
-            return <obs>, <reward: float>, <done: bool>, <info: dict>
+            return <obs>, <reward: float>, <terminated: bool>, <truncated: bool>, <info: dict>
 
     ray.init()
     algo = ppo.PPO(env=MyEnv, config={
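
The snippet in the hunk above intentionally uses placeholders such as `<gym.Space>` and `<obs>`. Purely as a hedged illustration (not part of the commit), a concrete env matching the new gymnasium-style signatures could look like the following; the spaces, rewards, and episode length are made-up assumptions:

    import gymnasium as gym
    import numpy as np

    class MyToyEnv(gym.Env):
        """Illustrative env showing gymnasium-style reset()/step() returns."""

        def __init__(self, env_config=None):
            self.action_space = gym.spaces.Discrete(2)
            self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            self._t = 0

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._t = 0
            return np.zeros(4, dtype=np.float32), {}  # (observation, info)

        def step(self, action):
            self._t += 1
            obs = self.observation_space.sample()
            terminated = False              # natural episode end
            truncated = self._t >= 100      # artificial time limit
            return obs, 1.0, terminated, truncated, {}
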
@@ -61,7 +62,7 @@ For a full runnable code example using the custom environment API, see `custom_e
 
 .. warning::
 
-    The gym registry is not compatible with Ray. Instead, always use the registration flows documented above to ensure Ray workers can access the environment.
+    The gymnasium registry is not compatible with Ray. Instead, always use the registration flows documented above to ensure Ray workers can access the environment.
 
 In the above example, note that the ``env_creator`` function takes in an ``env_config`` object.
 This is a dict containing options passed in through your algorithm.
@@ -77,8 +78,8 @@ This can be useful if you want to train over an ensemble of different environmen
                 choose_env_for(env_config.worker_index, env_config.vector_index))
             self.action_space = self.env.action_space
             self.observation_space = self.env.observation_space
-        def reset(self):
-            return self.env.reset()
+        def reset(self, seed, options):
+            return self.env.reset(seed, options)
         def step(self, action):
             return self.env.step(action)
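
`choose_env_for` itself is not shown in this diff; as a hedged sketch only, such a helper could map the worker and vector indices onto an ensemble of env names (the names below are arbitrary assumptions):

    # Hypothetical helper for the MultiEnv snippet above: deterministically spread
    # an ensemble of env names across rollout workers and vectorized sub-envs.
    ENSEMBLE = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1"]

    def choose_env_for(worker_index, vector_index):
        return ENSEMBLE[(worker_index + vector_index) % len(ENSEMBLE)]
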
@@ -186,7 +187,7 @@ Here is an example of an env, in which all agents always step simultaneously:
     # ... {"car_2": True, "__all__": False}
 
-An another example, where agents step one after the other (turn-based game):
+And another example, where agents step one after the other (turn-based game):
 
 .. code-block:: python
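
For the turn-based case mentioned above, here is a loose, hedged sketch (not part of the commit, and glossing over observation/action space setup) of an RLlib MultiAgentEnv in which two hypothetical players act in alternation; all names and values are illustrative:

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class TurnBasedSketchEnv(MultiAgentEnv):
        """Illustrative two-player env where agents step one after the other."""

        def __init__(self, env_config=None):
            super().__init__()
            self._agent_ids = {"player_0", "player_1"}
            self._turn = 0
            self._moves = 0

        def reset(self, *, seed=None, options=None):
            self._turn, self._moves = 0, 0
            # Only the agent whose turn it is receives an observation.
            return {"player_0": 0}, {}

        def step(self, action_dict):
            self._moves += 1
            self._turn = 1 - self._turn
            next_player = f"player_{self._turn}"
            obs = {next_player: 0}
            rewards = {next_player: 0.0}
            terminateds = {"__all__": self._moves >= 10}
            truncateds = {"__all__": False}
            return obs, rewards, terminateds, truncateds, {}
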
rllib/algorithms/maddpg/README.md: 2 changes (1 addition, 1 deletion)
@@ -1,4 +1,4 @@
 # Implementation of MADDPG in RLlib
 
-Please check [justinkterry/maddpg-rllib](https://github.com/justinkterry/maddpg-rllib) for more information.
+Please check [jkterry1/maddpg-rllib](https://github.com/jkterry1/maddpg-rllib) for more information.
