Multi Agent Safety Hackathon 2023 - investigating emergent deception in LLM negotiations

Note: Experiments for the hackathon were run on the andy_6pm_exps branch.

We chose the following set of metrics. Since rewards and items differ between rounds, we normalize all metrics to the maximum total utility possible in that round.

Welfare [aka "Net Utility"]: The sum of all agents' utility. How much value did we capture in total compared to what could have been captured?
Inequality [aka "Fairness"]: on average, how big was the utility gap between agents? i.e. utility_agent_one - utility_agent_two [Gandhi et a1., 2023]
Reward: On average, how much utility did each agent end up with? We experiment with agents with different initial moral mappings: baseline agents are not given a specific set of values, but simply prompted to think strategically. Machiavellian agents are prompted to maximize their own rewards by any means possible. Prosocial agents are prompted to look for a fair compromise, while deceptive agents are specifically encouraged that they can misrepresent their own rewards.

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
__pycache__		__pycache__
data		data
results		results
analyze_logfile.py		analyze_logfile.py
experiment.py		experiment.py
negotiation_agent.py		negotiation_agent.py
negotiation_environment.py		negotiation_environment.py
plot.py		plot.py
readme.md		readme.md
strategic-strategic.csv		strategic-strategic.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi Agent Safety Hackathon 2023 - investigating emergent deception in LLM negotiations

About

Releases

Packages

Languages

anwang427/deal-or-no-deal

Folders and files

Latest commit

History

Repository files navigation

Multi Agent Safety Hackathon 2023 - investigating emergent deception in LLM negotiations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages