This project version is mostly usable (aside from the two bugs listed in the issues) and will not be developed further. I have verified that it can be installed in a fresh environment with the requirements listed below (taking care to use the old PySC2 fork) and that it trains to similar results with ~8 parallel environments (you can run roughly 2 per CPU core).
I am actively working on a complete rewrite and will publish it in this repository when it is ready. The current rewrite status is alpha-ish: the core functionality is written and I am ironing out the kinks while verifying that it can train on the various minigames. ETA: 1-2 weeks.
The aim of this project is two-fold:
a.) Reproduce baseline DeepMind results by implementing an RL agent (A2C) with a neural network model architecture as close as possible to the one described in [1]. This includes embedding categorical (spatial) features into continuous space with 1x1 convolutions and a multi-head policy supporting actions with variable arguments (both spatial and non-spatial); a sketch of the embedding step follows below.
b.) Improve the results and/or sample efficiency of the baseline solution, either with alternative algorithms (such as PPO [2]), with a reduced set of features (unified across all mini-games), or with alternative approaches such as HRL [3] or auxiliary tasks [4].
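As an illustration of point a.), here is a minimal sketch (not the repository's actual code) of embedding a categorical spatial feature layer into continuous space with a 1x1 convolution in TensorFlow 1.x; the function name and embedding size are illustrative assumptions.

```python
import tensorflow as tf

def embed_categorical(feature_layer, num_categories, embed_dim=8):
    """Embed a categorical spatial feature layer into continuous space.

    feature_layer: int32 tensor of category ids, shape [batch, height, width].
    Returns a float tensor of shape [batch, height, width, embed_dim].
    """
    # One-hot encode the category ids along a new channel dimension.
    one_hot = tf.one_hot(feature_layer, depth=num_categories)
    # A 1x1 convolution acts as a learned per-pixel embedding of the one-hot channels.
    return tf.layers.conv2d(one_hot, filters=embed_dim, kernel_size=1)
```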
A video of the trained agent on all minigames can be seen here: https://youtu.be/gEyBzcPU5-w
- To train an agent, execute `python main.py --envs=1 --map=MoveToBeacon`.
- To resume training from the last checkpoint, specify the `--restore` flag.
- To run in inference mode, specify the `--test` flag.
- To change the number of rendered environments, specify the `--render=` flag.
- To change the state/action space, specify a path to a JSON config with `--cfg_path=`; a sample invocation follows the config below.

The configuration with reduced feature space used to achieve some of the results above is:
```json
{
  "feats": {
    "screen": ["visibility_map", "player_relative", "unit_type", "selected", "unit_hit_points_ratio", "unit_density"],
    "minimap": ["visibility_map", "camera", "player_relative", "selected"],
    "non_spatial": ["player", "available_actions"]
  }
}
```
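For example, assuming the config above is saved as `config.json` (a placeholder filename, not a file shipped with the repository), training with the reduced feature space could be launched with:

```
python main.py --map=MoveToBeacon --cfg_path=config.json
```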
- StarCraft II 3.17
- Python 3.x
- TensorFlow >= 1.3
- OpenAI baselines
- PySC2 1.2 with action spec fix - this is a relatively old version; the simplest way to install it is to clone my fork and run `pip install .` inside the cloned directory.
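A hypothetical install sequence for the PySC2 fork (the clone URL below is a placeholder; substitute the actual fork location):

```
git clone https://github.com/<fork-owner>/pysc2.git  # placeholder URL
cd pysc2
pip install .
```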
A good GPU and CPU are recommended, especially for the full state/action space.
The results below were gathered with the full feature/action config, using 32 agents x 16 n-steps.
Map | This Agent | DeepMind | Human |
---|---|---|---|
MoveToBeacon | 26.3 ± 0.5 | 26 | 28 |
CollectMineralShards | 106 ± 4.3 | 103 | 177 |
DefeatRoaches | 147 ± 38.7 | 100 | 215 |
DefeatZerglingsAndBanelings | 230 ± 106.4 | 62 | 727 |
FindAndDefeatZerglings | 43 ± 5 | 45 | 61 |
CollectMineralsAndGas | 3340 ± 185 | 3978 | 7566 |
BuildMarines | 0.55 ± 0.25 | 3 | 133 |
Below are screenshots of TensorBoard views of the agent's learning curves for each minigame. Each curve represents a run with a different random seed. The y-axis shows the cumulative episode score and the x-axis the number of updates, where each update contains 512 samples (32 agents x 16 n-steps).
The authors of xhujoy/pysc2-agents and pekaalto/sc2aibot were the first to attempt replicating [1], and their implementations served as general inspiration during the development of this project. However, their aim was more towards replicating the results than the architecture, and they miss key aspects such as full feature and action space support. The authors of simonmeister/pysc2-rl-agents also aim to replicate both the results and the architecture, though their end goals seem to lie in a different direction; their policy implementation was used as a loose reference for this project.
Work in this repository was done as part of a bachelor's thesis at the University of Tartu under the supervision of Ilya Kuzovkin and Tambet Matiisen.
[1] StarCraft II: A New Challenge for Reinforcement Learning
[2] Proximal Policy Optimization Algorithms
[3] Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
[4] Reinforcement Learning with Unsupervised Auxiliary Tasks