[Setup] [Usage] [Demo] [Citation]
This package provides browsergym, a gym environment for web task automation in the Chromium browser.
4x4.grid.mp4
Example of a GPT-4V agent executing open-ended tasks (top row, interactive chat), as well as WebArena and WorkArena tasks (bottom row).
BrowserGym includes the following benchmarks by default: MiniWoB, WebArena, VisualWebArena and WorkArena.
Designing new web benchmarks with BrowserGym is easy: simply inherit from the AbstractBrowserTask class.
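As a rough illustration of that inheritance pattern, here is a minimal sketch. The base class below is a hypothetical stand-in defined locally so the snippet runs on its own; it is not the real AbstractBrowserTask, and its method names and return shapes (setup, validate, teardown) are assumptions, so check the package source for the actual interface. SearchBoxTask and its goal text are also invented for the example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class AbstractBrowserTask(ABC):
    """Hypothetical stand-in for BrowserGym's base class (NOT the real one)."""

    def __init__(self, seed: int) -> None:
        self.seed = seed

    @abstractmethod
    def setup(self, page) -> tuple[str, dict]:
        """Prepare the browser page and return (goal, info)."""

    @abstractmethod
    def validate(self, page, chat_messages: list) -> tuple[float, bool, str, dict]:
        """Check progress and return (reward, done, message, info)."""

    def teardown(self) -> None:
        """Optional cleanup hook; does nothing by default."""


class SearchBoxTask(AbstractBrowserTask):
    """Hypothetical task: done once the page URL contains the search query."""

    def setup(self, page) -> tuple[str, dict]:
        # `page` is expected to behave like a Playwright Page (goto, url).
        page.goto("https://www.google.com/")
        return "Search for 'browsergym'.", {}

    def validate(self, page, chat_messages: list) -> tuple[float, bool, str, dict]:
        done = "q=browsergym" in page.url
        reward = 1.0 if done else 0.0
        return reward, done, "", {}
```

The task only touches the page through `goto` and `url`, so it can be exercised with a small fake page object before wiring it into a real browser.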
To install browsergym, you can either install one of the browsergym-miniwob, browsergym-webarena, browsergym-visualwebarena or browsergym-workarena packages, or simply install browsergym, which includes all of these by default.
pip install browsergym
Then set up Playwright, which is a required step, by running:
playwright install chromium
Finally, each benchmark has its own setup that requires additional steps:
- for miniwob, see miniwob/README.md
- for webarena, see webarena/README.md
- for visualwebarena, see visualwebarena/README.md
- for workarena, see WorkArena
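To confirm which of the packages above ended up in the current environment, a small check like the following can help. It is only a sketch: it uses the standard library's importlib.metadata and the distribution names mentioned in this README, and the helper name installed_versions is made up for the example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as given in the installation instructions.
PACKAGES = [
    "browsergym",
    "browsergym-miniwob",
    "browsergym-webarena",
    "browsergym-visualwebarena",
    "browsergym-workarena",
    "playwright",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

This only reports installed Python packages; it does not verify that the Chromium browser binary was downloaded by Playwright.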
To install browsergym locally for development, use the following commands:
git clone https://github.com/ServiceNow/BrowserGym.git
cd BrowserGym
make install
Boilerplate code to run an agent on an interactive, open-ended task:
import gymnasium as gym
import browsergym.core # register the openended task as a gym environment
env = gym.make(
    "browsergym/openended",
    task_kwargs={"start_url": "https://www.google.com/"},  # starting URL
    wait_for_user_message=True,  # wait for a user message after each agent message sent to the chat
)
obs, info = env.reset()
done =