Concept MARL

This repo implements Concept Bottleneck Policies, as described in Grupen et al. [1]. The algorithm extends the MultiAgent-PPO algorithm from Acme by adding a concept loss term to the PPO objective. Custom policy networks that incorporate the concept bottleneck architecture are provided in helpers.py.
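As a rough illustration of how a concept term can be folded into the PPO objective, the sketch below adds a weighted concept-prediction error to the standard clipped surrogate. It is not the code in concept_ppo/learning.py: it assumes scalar concepts regressed with mean squared error, omits the value and entropy terms, and the weight `concept_coef` and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def importance_ratios(new_logps: Sequence[float], old_logps: Sequence[float]) -> List[float]:
    """pi_new(a|s) / pi_old(a|s), computed from log-probabilities."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def ppo_clipped_surrogate(
    ratios: Sequence[float], advantages: Sequence[float], clip_eps: float = 0.2
) -> float:
    """Negative clipped PPO surrogate (a loss to minimise), averaged over samples."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped objectives.
        total += min(r * a, clipped_r * a)
    return -total / len(ratios)


def concept_mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between predicted and ground-truth concept values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def concept_ppo_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    predicted_concepts: Sequence[float],
    true_concepts: Sequence[float],
    concept_coef: float = 1.0,  # hypothetical weighting of the concept term
    clip_eps: float = 0.2,
) -> float:
    """PPO policy loss plus a weighted concept-prediction loss."""
    return ppo_clipped_surrogate(ratios, advantages, clip_eps) + concept_coef * concept_mse(
        predicted_concepts, true_concepts
    )
```

In training, `ratios` would come from `importance_ratios` applied to the current and behaviour policy log-probabilities, and the concept targets from the environment observations.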

This repo also includes custom implementations of the following environments from DeepMind Melting Pot:

Collaborative Cooking
Clean Up
Capture the Flag

Each environment is extended to support concept extraction, and miniature versions of the original collaborative cooking environments are also provided. The custom concept bottleneck policy networks include an encoder that processes Melting Pot's default multi-modal observations (RGB, position, orientation).
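The concept bottleneck idea can be summarised in a few lines: an encoder maps observations to hidden features, a concept layer predicts concept values from those features, and the policy head acts on the predicted concepts alone. The sketch below is a plain-Python illustration rather than the networks in helpers.py; it assumes the multi-modal observation (RGB, position, orientation) has already been flattened into a single feature vector, and the parameter names (`enc_w`, `concept_w`, `policy_w`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def linear(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Dense layer: one row of `weights` per output unit, computing W @ x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def concept_bottleneck_policy(
    features: Sequence[float], params: Dict[str, list]
) -> Tuple[List[float], List[float]]:
    """Encoder -> concept layer -> policy head; returns (concepts, action_probs)."""
    hidden = relu(linear(features, params["enc_w"], params["enc_b"]))
    concepts = linear(hidden, params["concept_w"], params["concept_b"])
    # The policy head receives only the predicted concepts, never `hidden`,
    # so every action decision has to pass through the concept bottleneck.
    logits = linear(concepts, params["policy_w"], params["policy_b"])
    return concepts, softmax(logits)
```

The returned `concepts` are what a concept loss would compare against the ground-truth concept values, while `action_probs` feed the usual PPO term.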

This repo also provides the following wrappers, which are used for training concept bottleneck policies:

1. meltingpot_wrapper.py: Converts Melting Pot environment specs to the dm_env specs used by Acme.
2. ma_concept_extraction_wrapper.py: Parses concept values for each agent from environment observations using a common prefix. The concept values parsed here are used to compute the concept loss term in concept_ppo/learning.py.
3. meltingpot_cooking_dense_rewards_wrapper.py: Implements pseudo-rewards specific to the collaborative cooking task.
4. meltingpot_pixels_wrapper.py: Implements RGB resizing and grayscaling (similar to Acme's Atari wrapper).
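Prefix-based concept parsing can be sketched as follows. This is not ma_concept_extraction_wrapper.py itself: it assumes that, as in Melting Pot, each timestep carries a list with one observation dict per agent, and the prefix `CONCEPT_` and all function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Sequence, Tuple

# Hypothetical prefix used only for this sketch.
CONCEPT_PREFIX = "CONCEPT_"


def split_observation(
    observation: Mapping[str, Any], prefix: str = CONCEPT_PREFIX
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separates one agent's observation into policy inputs and concept values."""
    policy_obs: Dict[str, Any] = {}
    concepts: Dict[str, Any] = {}
    for key, value in observation.items():
        if key.startswith(prefix):
            # Drop the prefix so concepts are keyed by their bare names.
            concepts[key[len(prefix):]] = value
        else:
            policy_obs[key] = value
    return policy_obs, concepts


def concept_vector(concepts: Mapping[str, Any]) -> List[float]:
    """Orders concepts by name so every agent's vector has the same layout."""
    return [float(concepts[name]) for name in sorted(concepts)]


def extract_all_agents(
    observations: Sequence[Mapping[str, Any]], prefix: str = CONCEPT_PREFIX
) -> Tuple[List[Dict[str, Any]], List[List[float]]]:
    """Applies split_observation to each agent's observation dict."""
    policy_inputs, concept_targets = [], []
    for obs in observations:
        policy_obs, concepts = split_observation(obs, prefix)
        policy_inputs.append(policy_obs)
        concept_targets.append(concept_vector(concepts))
    return policy_inputs, concept_targets
```

Stripping the concept keys from the policy inputs keeps the policy from reading ground-truth concepts directly, so it has to rely on its own predicted bottleneck; whether the repository's wrapper makes the same choice is an assumption here.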

Training

To train Concept PPO agents in the Collaborative Cooking environment, run:

python -m experiments.run_meltingpot --env_name='cooking_basic' \
--checkpoint_dir=/tmp/cooking_basic

To train Concept PPO agents in the Clean Up environment, run:

python -m experiments.run_meltingpot --env_name='clean_up_mod' \
--checkpoint_dir=/tmp/clean_up_mod

To train Concept PPO agents in the Capture the Flag environment, run:

python -m experiments.run_meltingpot --env_name='capture_the_flag_mod' \
--checkpoint_dir=/tmp/capture_the_flag_mod

Other Notes

This project requires manual installation of DeepMind Melting Pot. Installation instructions can be found in the Melting Pot repository (and also in run.sh).

References

[1] N. Grupen, N. Jaques, B. Kim, S. Omidshafiei, "Concept-based Understanding of Emergent Multi-Agent Behavior," NeurIPS Deep RL Workshop 2022 (paper link coming soon).
