Imitation Learning via Off-Policy Distribution Matching

Ilya Kostrikov, Ofir Nachum, Jonathan Tompson

Source code to accompany Imitation Learning via Off-Policy Distribution Matching.

If you use this code for your research, please consider citing the paper:

@inproceedings{
  Kostrikov2020Imitation,
  title={Imitation Learning via Off-Policy Distribution Matching},
  author={Ilya Kostrikov and Ofir Nachum and Jonathan Tompson},
  booktitle={International Conference on Learning Representations},
  year={2020},
  url={https://openreview.net/forum?id=Hyg-JC4FDr}
}

Install Dependencies

pip install -r requirements.txt

You will also need to install MuJoCo and have a valid license. Follow the install instructions here.

Datasets are stored in Google Cloud Storage and should be downloaded by running:

wget -P value_dice/datasets/ https://storage.googleapis.com/gresearch/value_dice/datasets/Ant-v2.npz
wget -P value_dice/datasets/ https://storage.googleapis.com/gresearch/value_dice/datasets/HalfCheetah-v2.npz
wget -P value_dice/datasets/ https://storage.googleapis.com/gresearch/value_dice/datasets/Hopper-v2.npz
wget -P value_dice/datasets/ https://storage.googleapis.com/gresearch/value_dice/datasets/Walker2d-v2.npz
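
To check that a download completed, a small script can help. The sketch below relies only on the fact that an .npz file is a zip archive of .npy arrays; it lists the array names and their uncompressed sizes without assuming anything about what the datasets contain. The helper name list_npz_arrays is hypothetical and is not part of this repository.

```python
import zipfile


def list_npz_arrays(path):
    """Return {array_name: uncompressed_size_in_bytes} for an .npz archive.

    Hypothetical helper, not part of value_dice. It treats the .npz file
    as a plain zip archive and never loads the arrays themselves.
    """
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the name of the first member with a bad CRC,
        # or None when every member reads back cleanly.
        corrupt = archive.testzip()
        if corrupt is not None:
            raise ValueError(f"corrupt member in {path}: {corrupt}")

        sizes = {}
        for info in archive.infolist():
            name = info.filename
            if name.endswith(".npy"):
                name = name[: -len(".npy")]  # drop the extension numpy adds
            sizes[name] = info.file_size
        return sizes
```

For example, list_npz_arrays("value_dice/datasets/HalfCheetah-v2.npz") returns a dictionary of the arrays stored in that file. A truncated download typically fails earlier, when zipfile.ZipFile raises zipfile.BadZipFile on opening the file.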

Expert Trajectories:

Expert trajectories are generated using the GAIL code.

Running Training

From the root google_research directory, run:

wget -P value_dice/datasets/ https://storage.googleapis.com/gresearch/value_dice/datasets/HalfCheetah-v2.npz
python -m value_dice.train_eval \
  --expert_dir ./value_dice/datasets/ \
  --save_dir ./save/ \
  --algo value_dice \
  --env_name HalfCheetah-v2 \
  --seed 42 \
  --num_trajectories 1 \
  --alsologtostderr
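
To launch the same command for several environments or seeds, the invocation can be assembled programmatically. The following is a minimal sketch that uses only the flags shown above; the function names build_train_command and sweep_commands are hypothetical, the seed values are up to you, and this is not a substitute for run_experiments.sh, whose contents are not reproduced here.

```python
import itertools


def build_train_command(env_name, seed, expert_dir, save_dir="./save/",
                        num_trajectories=1):
    """Build the argv list for one value_dice.train_eval run.

    Hypothetical helper: it only mirrors the flags from the command above.
    """
    return [
        "python", "-m", "value_dice.train_eval",
        "--expert_dir", expert_dir,
        "--save_dir", save_dir,
        "--algo", "value_dice",
        "--env_name", env_name,
        "--seed", str(seed),
        "--num_trajectories", str(num_trajectories),
        "--alsologtostderr",
    ]


def sweep_commands(env_names, seeds, expert_dir, **kwargs):
    """Yield one command per (environment, seed) pair."""
    for env_name, seed in itertools.product(env_names, seeds):
        yield build_train_command(env_name, seed, expert_dir, **kwargs)
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the root google_research directory. Building the command as a list rather than a single string avoids shell quoting issues.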

To reproduce results run:

sh value_dice/run_experiments.sh
