
UCB_MARL

Note: All code and demonstrations in this repository accompany the following submitted paper:

Ruyu Luo, Wanli Ni, Hui Tian, Julian Cheng, and Kwang-Cheng Chen, "Joint Trajectory and Radio Resource Optimization by Multi-Agent Reinforcement Learning for Autonomous Mobile Robots in Industrial Internet of Things", submitted to IEEE TCom, Dec. 2022.

This repository provides the simulation codes of a provably efficient multi-agent reinforcement learning (MARL) algorithm with upper-confidence bound (UCB) exploration, which achieves a near-optimal regret bound for industrial data collection.
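For readers who want a concrete picture of the learning rule, the snippet below sketches a tabular Q-learning update with a count-based UCB bonus in the spirit of Jin et al. [3]. The class and variable names (UCBQAgent, q, n) and the bonus constant are illustrative assumptions, not the exact implementation in RL_brain.py.

```python
import numpy as np
from collections import defaultdict

# Minimal sketch of tabular Q-learning with a count-based UCB exploration
# bonus, in the spirit of Jin et al. [3].  Class/variable names and the
# bonus constant are illustrative assumptions, not the code in RL_brain.py.

class UCBQAgent:
    def __init__(self, n_actions, gamma=0.9, horizon=100, c=1.0):
        self.n_actions = n_actions
        self.gamma = gamma        # discount factor
        self.horizon = horizon    # episode length H
        self.c = c                # scale of the UCB bonus
        # Optimistic initialization: unvisited state-action pairs look attractive.
        self.q = defaultdict(lambda: np.full(n_actions, float(horizon)))
        self.n = defaultdict(lambda: np.zeros(n_actions))  # visit counts N(s, a)

    def choose_action(self, state):
        # Greedy with respect to the optimistic (UCB-augmented) Q-values.
        return int(np.argmax(self.q[state]))

    def learn(self, s, a, r, s_next, done):
        self.n[s][a] += 1
        t = self.n[s][a]
        alpha = (self.horizon + 1) / (self.horizon + t)  # learning rate used in [3]
        bonus = self.c * np.sqrt(self.horizon / t)       # simplified bonus ~ sqrt(H / N(s, a))
        target = r + bonus + (0.0 if done else self.gamma * np.max(self.q[s_next]))
        self.q[s][a] += alpha * (target - self.q[s][a])
```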

Parameter Settings

Here are the settings used in our simulations.

| Notation | Simulation Value | Physical Meaning |
| --- | --- | --- |
| $M_k$ | $3$ | the number of SNs in each group |
| $x_{\max}$ | $30 \ {\rm m}$ | the maximal x-axis size of the moving area |
| $y_{\max}$ | $30 \ {\rm m}$ | the maximal y-axis size of the moving area |
| $H_0$ | $1 \ {\rm m}$ | the antenna height of robots |
| $H_m$ | $\{0, 1, 2\} \ {\rm m}$ | the antenna height of SNs |
| $\sigma^2$ | $-100 \ {\rm dBm}$ | the power of the AWGN |
| $P_{\max}$ | $23 \ {\rm dBm}$ | the maximum transmit power |
| $\beta_{0}$ | $-30 \ {\rm dB}$ | the large-scale channel power gain at the reference distance $d_0 = 1 \ {\rm m}$ |
| $\alpha$ | $2.2$ | the path loss exponent |
| $G$ | $10 \ {\rm dB}$ | the Rician factor |
| $\Delta_{s}$ | $1.5 \ {\rm m}$ | the grid size |
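To make the roles of these parameters concrete, the sketch below shows how they could enter a simple large-scale path-loss and achievable-rate computation (reference gain $\beta_0$ at $d_0 = 1 \ {\rm m}$, path-loss exponent $\alpha$, noise power $\sigma^2$, transmit power $P_{\max}$). The Rician small-scale component with factor $G$, the bandwidth value, and all names are assumptions; the paper's exact channel model may differ.

```python
import numpy as np

# Illustrative use of the parameters above in a large-scale path-loss and
# achievable-rate computation.  Rician small-scale fading (factor G), the
# bandwidth value, and all names are assumptions, not the paper's model.

BETA_0_DB = -30.0        # large-scale gain at d0 = 1 m [dB]
ALPHA = 2.2              # path-loss exponent
SIGMA2_DBM = -100.0      # AWGN power [dBm]
P_MAX_DBM = 23.0         # maximum transmit power [dBm]
H_ROBOT, H_SN = 1.0, 2.0 # antenna heights [m] (H_SN is drawn from {0, 1, 2})

def db_to_linear(x_db):
    return 10.0 ** (x_db / 10.0)

def channel_gain(robot_xy, sn_xy):
    # 3-D distance between a robot and an SN, then beta_0 * d^(-alpha).
    d = np.sqrt(np.sum((np.asarray(robot_xy) - np.asarray(sn_xy)) ** 2)
                + (H_ROBOT - H_SN) ** 2)
    return db_to_linear(BETA_0_DB) * d ** (-ALPHA)

def achievable_rate(robot_xy, sn_xy, bandwidth_hz=1e6):
    # Interference-free Shannon rate at maximum transmit power (dBm -> W).
    p_w = db_to_linear(P_MAX_DBM - 30.0)
    noise_w = db_to_linear(SIGMA2_DBM - 30.0)
    snr = p_w * channel_gain(robot_xy, sn_xy) / noise_w
    return bandwidth_hz * np.log2(1.0 + snr)

# Example: an SN at (15 m, 20 m) while the robot is at the origin.
print(achievable_rate((0.0, 0.0), (15.0, 20.0)))
```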

Representative visualization results

The visualization of the proposed MARL can be seen in Visual_MARL.

  • Here are four demonstrations of different stages in the MARL training process:
    • the beginning of training
    • after 800 rounds of training
    • after 1600 rounds of training
    • the end of training

Source Codes

Here is a brief introduction to the codes used in our paper; a minimal sketch of the experience-exchange step used by several of the main.py scripts is given after the list.

  • Figure_1_reward_comparison (Reward comparison between different algorithms)

    • Centralized_QL

      • RL_brain.py: Centralized tabular Q-learning agent with $\epsilon$-greedy exploration
      • main.py: Main code for two robots that train with global information; builds the connections between the environment and the learning agents, without experience exchange
    • UCB_MARL_environment1

      • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration
      • main.py: Main code for four robots; builds the connections between the environment and the learning agents, where two interference-free robots operating in exactly the same environment exchange experience.
    • UCB_MARL_environment2

      • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration

      • main.py: Main code for four robots; builds the connections between the environment and the learning agents, where two nearby robots that interfere with each other under the same SN deployment exchange experience.

  • Figure_2_convergence_comparison (Convergence comparison between different $H$)

    • UCB_MARL

      • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration
      • main.py: Main code for six robots; builds the connections between the environment and the learning agents
    • e-greedy_MARL

      • RL_brain.py: Tabular Q-learning agent with $\epsilon$-greedy exploration

      • main.py: Main code for two robots that train locally without experience exchange; builds the connections between the environment and the learning agents

  • Figure_3_robot_trajectory (Robot trajectory under different $\kappa$)

  • Figure_4_relation_between_R_T (Relation between average sum rate and arrival time)

    • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration

    • main.py: Main code for two robots with experience exchange; builds the connections between the environment and the learning agents

  • Figure_5_R_versus_P (Average sum rate $R$ versus maximum transmit power $P$ under NOMA and OMA)

    • NOMA

      • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration
      • main.py: Main code for robots with experience exchange; builds the connections between the environment and the learning agents, which communicate using non-orthogonal multiple access (NOMA)
    • OMA

      • RL_brain.py:  One learning agent with upper-confidence bound (UCB) exploration

      • main.py: Main code for robots with experience exchange; builds the connections between the environment and the learning agents, which communicate using orthogonal multiple access (OMA)
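Several of the main.py scripts above let pairs of robots exchange experience during training. The sketch below shows one minimal way such an exchange could be realized between two tabular agents, with each agent replaying the transitions collected by its peer; the buffer-based mechanism and the function name are assumptions rather than the repository's actual logic.

```python
# Minimal sketch of an experience-exchange step between two learning agents.
# The buffer-based sharing and the function name are assumptions about the
# mechanism, not the exact logic of the main.py scripts above.

def exchange_experience(agent_a, agent_b, buffer_a, buffer_b):
    """Each agent replays the transitions (s, a, r, s_next, done) collected by its peer."""
    for transition in buffer_b:
        agent_a.learn(*transition)
    for transition in buffer_a:
        agent_b.learn(*transition)
    buffer_a.clear()
    buffer_b.clear()

# Typical use inside a training loop (illustrative):
#   buffer_a.append((s_a, act_a, r_a, s_a_next, done_a))
#   buffer_b.append((s_b, act_b, r_b, s_b_next, done_b))
#   if episode % exchange_interval == 0:
#       exchange_experience(agent_a, agent_b, buffer_a, buffer_b)
```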

References

[1] R. Luo, W. Ni, and H. Tian, "Visualizing Multi-Agent Reinforcement Learning for Robotic Communication in Industrial IoT Networks," accepted by IEEE INFOCOM (Demo), Mar. 2022.

[2] R. Luo, H. Tian, and W. Ni, "Communication-Aware Path Design for Indoor Robots Exploiting Federated Deep Reinforcement Learning," in Proc. IEEE PIMRC, Helsinki, Finland, Sept. 2021, pp. 1197-1202.

[3] C. Jin et al., "Is Q-learning Provably Efficient?" in Proc. NeurIPS, Montréal, Canada, Dec. 2018, pp. 4868-4878.
