In the current implementation, some actions take time to complete and the agent doesn't know how to reason about that delay. The canonical example is clicking the Chrome icon and Chrome taking time to boot. The agent then thinks it needs to click the icon again, but by the time it zooms in, Chrome has loaded, and it often clicks in the wrong location.
I have tried giving the agent the ability to bail out at each zoom step, but this has unintended consequences: the model often chooses to bail out when it shouldn't.
Some options are:
- Have a secondary manager agent that keeps watching the screen and can readjust the worker agent. This may be pricey and could introduce race conditions.
- Add a reflection step after every zoom, but this is costly and slow.
- Find a way to incorporate time into the prompts.
- Tune a model to be better at bailing out.
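
As a rough illustration of the "incorporate time into the prompts" option, here is a minimal sketch that prepends a note about how long ago the last action was taken. Everything in it is hypothetical and not taken from the current implementation: the `LastAction` record, the `time_context` and `build_prompt` helpers, and the 5-second `settle_seconds` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LastAction:
    """Hypothetical record of the most recent action the agent took."""
    description: str
    started_at: float  # seconds, e.g. a time.monotonic() reading


def time_context(last: Optional[LastAction], now: float,
                 settle_seconds: float = 5.0) -> str:
    """Describe how long ago the last action happened.

    settle_seconds is an assumed threshold below which the screen is
    treated as possibly still changing (e.g. an app still booting).
    """
    if last is None:
        return "No action has been taken yet."
    elapsed = max(0.0, now - last.started_at)
    note = f"Your last action was '{last.description}', {elapsed:.1f}s ago."
    if elapsed < settle_seconds:
        note += (" The screen may still be changing in response;"
                 " consider waiting and re-checking before repeating it.")
    return note


def build_prompt(task: str, last: Optional[LastAction], now: float) -> str:
    """Prepend the time note to the task description sent to the model."""
    return f"{time_context(last, now)}\n\nTask: {task}"
```

The caller would pass something like `time.monotonic()` as `now` on each zoom step. Whether the model actually uses this hint to avoid the double click is untested; it only makes the elapsed time visible in the prompt.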