
[RLlib] Minor doc fixes (ray-project#35675)
Signed-off-by: Rohan Potdar <[email protected]>
Rohan138 committed May 24, 2023
1 parent 5acf41e commit 806b633
Showing 3 changed files with 11 additions and 10 deletions.
doc/source/rllib/index.rst: 2 changes (1 addition, 1 deletion)
@@ -83,7 +83,7 @@ The `framework` config lets you choose between "tf2", "tf" and "torch" for execu
 You can also tweak RLlib's default `model` config, and set up a separate config for `evaluation`.
 
 If you want to learn more about the RLlib training API,
-`you can learn more about it here <rllib-training-api>`_.
+`you can learn more about it here <rllib-training.html#using-the-python-api>`_.
 Also, see `here for a simple example on how to write an action inference loop after training. <https://github.com/ray-project/ray/blob/master/rllib/examples/inference_and_serving/policy_inference_after_training.py>`_
 
 If you want to get a quick preview of which **algorithms** and **environments** RLlib supports,
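
As a quick, hedged illustration (not part of this commit) of the config areas the hunk above refers to, the `framework`, `model`, and `evaluation` settings can be set through RLlib's AlgorithmConfig builder API; the env name and option values below are arbitrary assumptions:

    from ray.rllib.algorithms.ppo import PPOConfig

    config = (
        PPOConfig()
        .environment("CartPole-v1")                    # assumed example env
        .framework("torch")                            # "tf2" and "tf" are the other options
        .training(model={"fcnet_hiddens": [64, 64]})   # tweak the default `model` config
        .evaluation(evaluation_interval=1)             # separate `evaluation` setup
    )
    algo = config.build()
    result = algo.train()
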
doc/source/rllib/rllib-env.rst: 17 changes (9 additions, 8 deletions)
@@ -25,17 +25,18 @@ Custom env classes passed directly to the algorithm must take a single ``env_con
 
 .. code-block:: python
 
-    import gym, ray
+    import gymnasium as gym
+    import ray
     from ray.rllib.algorithms import ppo
 
     class MyEnv(gym.Env):
         def __init__(self, env_config):
             self.action_space = <gym.Space>
             self.observation_space = <gym.Space>
-        def reset(self):
-            return <obs>
+        def reset(self, seed, options):
+            return <obs>, <info>
         def step(self, action):
-            return <obs>, <reward: float>, <done: bool>, <info: dict>
+            return <obs>, <reward: float>, <terminated: bool>, <truncated: bool>, <info: dict>
 
     ray.init()
     algo = ppo.PPO(env=MyEnv, config={
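
The snippet in the hunk above intentionally uses placeholders such as `<gym.Space>` and `<obs>`. Purely as a hedged illustration (not part of the commit), a concrete env matching the new gymnasium-style signatures could look like the following; the spaces, rewards, and episode length are made-up assumptions:

    import gymnasium as gym
    import numpy as np

    class MyToyEnv(gym.Env):
        """Illustrative env showing gymnasium-style reset()/step() returns."""

        def __init__(self, env_config=None):
            self.action_space = gym.spaces.Discrete(2)
            self.observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
            self._t = 0

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self._t = 0
            return np.zeros(4, dtype=np.float32), {}  # (observation, info)

        def step(self, action):
            self._t += 1
            obs = self.observation_space.sample()
            terminated = False              # natural episode end
            truncated = self._t >= 100      # artificial time limit
            return obs, 1.0, terminated, truncated, {}
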
@@ -61,7 +62,7 @@ For a full runnable code example using the custom environment API, see `custom_e
 
 .. warning::
 
-    The gym registry is not compatible with Ray. Instead, always use the registration flows documented above to ensure Ray workers can access the environment.
+    The gymnasium registry is not compatible with Ray. Instead, always use the registration flows documented above to ensure Ray workers can access the environment.
 
 In the above example, note that the ``env_creator`` function takes in an ``env_config`` object.
 This is a dict containing options passed in through your algorithm.
@@ -77,8 +78,8 @@ This can be useful if you want to train over an ensemble of different environmen
                 choose_env_for(env_config.worker_index, env_config.vector_index))
             self.action_space = self.env.action_space
             self.observation_space = self.env.observation_space
-        def reset(self):
-            return self.env.reset()
+        def reset(self, seed, options):
+            return self.env.reset(seed, options)
         def step(self, action):
             return self.env.step(action)
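
`choose_env_for` itself is not shown in this diff; as a hedged sketch only, such a helper could map the worker and vector indices onto an ensemble of env names (the names below are arbitrary assumptions):

    # Hypothetical helper for the MultiEnv snippet above: deterministically spread
    # an ensemble of env names across rollout workers and vectorized sub-envs.
    ENSEMBLE = ["CartPole-v1", "MountainCar-v0", "Acrobot-v1"]

    def choose_env_for(worker_index, vector_index):
        return ENSEMBLE[(worker_index + vector_index) % len(ENSEMBLE)]
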
@@ -186,7 +187,7 @@ Here is an example of an env, in which all agents always step simultaneously:
     # ... {"car_2": True, "__all__": False}
 
-An another example, where agents step one after the other (turn-based game):
+And another example, where agents step one after the other (turn-based game):
 
 .. code-block:: python
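
For the turn-based case mentioned above, here is a loose, hedged sketch (not part of the commit, and glossing over observation/action space setup) of an RLlib MultiAgentEnv in which two hypothetical players act in alternation; all names and values are illustrative:

    from ray.rllib.env.multi_agent_env import MultiAgentEnv

    class TurnBasedSketchEnv(MultiAgentEnv):
        """Illustrative two-player env where agents step one after the other."""

        def __init__(self, env_config=None):
            super().__init__()
            self._agent_ids = {"player_0", "player_1"}
            self._turn = 0
            self._moves = 0

        def reset(self, *, seed=None, options=None):
            self._turn, self._moves = 0, 0
            # Only the agent whose turn it is receives an observation.
            return {"player_0": 0}, {}

        def step(self, action_dict):
            self._moves += 1
            self._turn = 1 - self._turn
            next_player = f"player_{self._turn}"
            obs = {next_player: 0}
            rewards = {next_player: 0.0}
            terminateds = {"__all__": self._moves >= 10}
            truncateds = {"__all__": False}
            return obs, rewards, terminateds, truncateds, {}
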
rllib/algorithms/maddpg/README.md: 2 changes (1 addition, 1 deletion)
@@ -1,4 +1,4 @@
 # Implementation of MADDPG in RLlib
 
-Please check [justinkterry/maddpg-rllib](https://github.com/justinkterry/maddpg-rllib) for more information.
+Please check [jkterry1/maddpg-rllib](https://github.com/jkterry1/maddpg-rllib) for more information.
