
Learn to Follow: Lifelong Multi-agent Pathfinding with Decentralized Replanning

This study addresses the challenging problem of decentralized lifelong multi-agent pathfinding. The proposed Follower approach utilizes a combination of a planning algorithm for constructing a long-term plan and reinforcement learning for resolving local conflicts.

Paper: Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

Installation:

pip3 install -r docker/requirements.txt

Installation of ONNX runtime:

wget https://github.com/microsoft/onnxruntime/releases/download/v1.14.1/onnxruntime-linux-x64-1.14.1.tgz \
&& tar -xf onnxruntime-linux-x64-1.14.1.tgz \
&& cp onnxruntime-linux-x64-1.14.1/lib/* /usr/lib/ && cp onnxruntime-linux-x64-1.14.1/include/* /usr/include/

Optionally, you can use the Dockerfile to build the image:

cd docker && sh build.sh

Inference Example:

To execute the Follower algorithm and produce an animation using pre-trained weights, use the following command:

python3 example.py

The animation will be stored in the renders folder.

It is recommended to set the following environment variables to restrict NumPy to a single CPU thread, which avoids performance issues:

export OMP_NUM_THREADS="1"
export MKL_NUM_THREADS="1"
export OPENBLAS_NUM_THREADS="1"
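
If you prefer to set these limits from a launcher script instead of the shell, a minimal sketch is shown below; the only assumption is that the variables are set before NumPy is imported for the first time:

import os

# Restrict BLAS/OpenMP thread pools to a single thread per process.
# Must happen before the first import of numpy.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"

import numpy as np  # imported only after the limits are in place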

You can adjust the environment and algorithm parameters using command-line arguments. For example:

python3 example.py --map_name wfi_warehouse --num_agents 128
python3 example.py --map_name pico_s00_od20_na32 --num_agents 32 --algorithm FollowerLite
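
To run several such configurations in a row (for example, scanning over the number of agents), one option is a small wrapper script. The sketch below simply shells out to example.py with the flags shown above; the chosen map and agent counts are illustrative:

import subprocess

# Illustrative sweep over the number of agents on the warehouse map.
# Only --map_name, --num_agents and --algorithm are documented above; adjust as needed.
for num_agents in (32, 64, 128):
    subprocess.run(
        ["python3", "example.py",
         "--map_name", "wfi_warehouse",
         "--num_agents", str(num_agents)],
        check=True,
    )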

We offer a Google Colab example that simplifies the process: Open In Colab

Training:

To train Follower from scratch, use the following command:

python3 main.py --actor_critic_share_weights=True --batch_size=16384 --env=PogemaMazes-v0 --exploration_loss_coeff=0.023 --extra_fc_layers=1 --gamma=0.9756 --hidden_size=512 --intrinsic_target_reward=0.01 --learning_rate=0.00022 --lr_schedule=constant --network_input_radius=5 --num_filters=64 --num_res_blocks=8 --num_workers=8 --optimizer=adam --ppo_clip_ratio=0.2 --train_for_env_steps=1000000000 --use_rnn=True

To train FollowerLite from scratch, use the following command:

python3 main.py --actor_critic_share_weights=True --batch_size=16384 --env=PogemaMazes-v0 --exploration_loss_coeff=0.0156 --extra_fc_layers=0 --gamma=0.9716 --hidden_size=16 --intrinsic_target_reward=0.01 --learning_rate=0.00013 --lr_schedule=kl_adaptive_minibatch --network_input_radius=3 --num_filters=8 --num_res_blocks=1 --num_workers=4 --optimizer=adam --ppo_clip_ratio=0.2 --train_for_env_steps=20000000 --use_rnn=False

The parameters are set to the values used in the paper.
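
For a quick side-by-side view, the flags that differ between the two commands can be summarized as follows; this is only a restatement of the values above, not an additional configuration file:

# Hyperparameters that differ between the two training commands above,
# given as (Follower, FollowerLite) pairs.
FOLLOWER_VS_FOLLOWERLITE = {
    "hidden_size":            (512, 16),
    "num_filters":            (64, 8),
    "num_res_blocks":         (8, 1),
    "extra_fc_layers":        (1, 0),
    "network_input_radius":   (5, 3),
    "use_rnn":                (True, False),
    "num_workers":            (8, 4),
    "gamma":                  (0.9756, 0.9716),
    "exploration_loss_coeff": (0.023, 0.0156),
    "learning_rate":          (0.00022, 0.00013),
    "lr_schedule":            ("constant", "kl_adaptive_minibatch"),
    "train_for_env_steps":    (1_000_000_000, 20_000_000),
}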

Testing and Results Visualization

To reproduce the main results of Follower and FollowerLite using pogema-toolbox, use the following command:

python3 eval.py

This script runs all the experiments whose configurations are located in the experiments folder. The raw data (including plots) is saved to the corresponding folders and can optionally be logged to wandb.

Example Configuration:

environment:
  name: Pogema-v0
  on_target: restart
  max_episode_steps: 512
  observation_type: POMAPF
  collision_system: soft
  map_name: wfi_warehouse
  num_agents:
    grid_search: [ 32, 64, 96, 128, 160, 192 ]
  seed:
    grid_search: [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

algorithms:
  Follower:
    name: Follower
    num_process: 4
    parallel_backend: 'balanced_dask'

  No dynamic cost:
    name: Follower
    num_process: 4
    parallel_backend: 'balanced_dask'
    override_config:
      preprocessing:
        use_dynamic_cost: False

  No static cost:
    name: Follower
    num_process: 4
    num_threads: 4
    parallel_backend: 'balanced_dask'
    override_config:
      preprocessing:
        use_static_cost: False

results_views:
  TabularResults:
    type: tabular
    drop_keys: [ seed ]
    print_results: True

  05-warehouse:
    type: plot
    x: num_agents
    y: avg_throughput
    name: Warehouse $46 \times 33$

Description of Configuration:

The configuration defines the environment settings and the algorithms used for the experiments. It specifies the following:

  • Environment: Includes the parameters of the POGEMA environment: behavior on target (restart, corresponding to the LifeLong setting), maximum episode steps (512), observation type, collision system, etc. It also sets up grid searches over the number of agents and the seed values; grid_search can be applied to any environment parameter.
  • Algorithms: Details the algorithms to be tested. The primary algorithm is Follower. The "No dynamic cost" and "No static cost" variants override specific preprocessing options. All algorithms are configured to use 4 processes and the balanced_dask backend for parallelization.
  • Results Views: Defines how the results will be presented, including tabular and plot views.

This example configuration demonstrates how to set up experiments for the Pogema-v0 environment, varying the number of agents and seeds, and comparing different versions of the Follower algorithm.
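
The grid_search expansion itself is handled by pogema-toolbox. As a rough illustration of what it amounts to (this is not the toolbox's actual loader, and the file name experiments/05-warehouse.yaml is hypothetical), the swept parameters can be enumerated as a Cartesian product:

import itertools
import yaml  # PyYAML

# Illustrative only: expand grid_search entries of the environment section into
# concrete parameter combinations; pogema-toolbox performs this step internally.
with open("experiments/05-warehouse.yaml") as f:  # hypothetical file name
    config = yaml.safe_load(f)

env = config["environment"]
swept = {k: v["grid_search"] for k, v in env.items()
         if isinstance(v, dict) and "grid_search" in v}
fixed = {k: v for k, v in env.items() if k not in swept}

for values in itertools.product(*swept.values()):
    run_config = {**fixed, **dict(zip(swept.keys(), values))}
    print(run_config)  # e.g. {'name': 'Pogema-v0', ..., 'num_agents': 32, 'seed': 0}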

Raw Data

The raw data, comprising the results of our experiments for Follower and FollowerLite, can be downloaded from the following link: Download Raw Data

Citation:

@inproceedings{skrynnik2024learn,
  title={Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning},
  author={Skrynnik, Alexey and Andreychuk, Anton and Nesterova, Maria and Yakovlev, Konstantin and Panov, Aleksandr},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={16},
  pages={17541--17549},
  year={2024}
}
