
Doom - Same Action Space Across Environments #157

Merged: 20 commits merged into openai:master on Jun 14, 2016

Conversation

ppaquette (Contributor)

This PR adds another commit to the previous PR.

  • It adds ALT_ATTACK as the 6th command for Doom
  • It makes a default action space for all Doom environments, with some controls disabled depending on the environment.

VizDoom is a series of missions that build on each other.
This PR creates a standard action space of 41 commands (similar to a keyboard for the human player) that is the same across environments (e.g. the first command is always ATTACK, the second command is always JUMP, etc.).
With this setup, it becomes possible to run an algorithm on all the Doom environments in sequence (which is likely required to beat the Deathmatch level).

The actions to be performed are submitted as a list of 41 integers, with 1 being active and 0 being inactive.

e.g.
actions = [0] * 41
actions[0] = 1 # ATTACK
actions[13] = 1 # MOVE_FORWARD

The first levels only allow certain commands; the disabled commands are ignored.

The full list of commands is in controls.md.
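For context, a minimal end-to-end sketch of this interface (a hypothetical usage example; the env id follows the gym.make('DoomBasic-v0') call shown later in this thread):

import gym

env = gym.make('DoomBasic-v0')
env.reset()

actions = [0] * 41      # one slot per command, all released
actions[0] = 1          # ATTACK (always the first command)
actions[13] = 1         # MOVE_FORWARD
observation, reward, done, info = env.step(actions)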

jietang (Contributor) commented Jun 3, 2016

I can't seem to find the ALTATTACK changes - can you point me to where I should look?

re: Doom action spaces: the original reason I changed to smaller action space sizes is that many of our existing RL algorithm implementations (TRPO, CEM, etc) fail badly when confronted with such a large discrete action space. It's possible to reduce the action space in agent code instead of in the env, but it does increase the barriers to getting started, and Doom is such a visually exciting environment that I'd like to bias in favor of making it as easy as possible to submit agents.

I'm with you on the benefits of having a single state-action space for all of Doom. What do you think about adding this as another set of "full-action" Doom envs e.g. 'DoomTakeCoverFull-v0'? People would start out on the small action space and move to the full action space once they're happy with their algorithms.

ppaquette (Contributor, Author)

The ALT_ATTACK change is just a renumbering of the actions after it (e.g. MOVE_RIGHT is now index 10 in all action spaces).

We can probably implement a hybrid solution where the user can submit either a list of available commands or a list of all 41 commands.

For instance, for doom-basic, users could send an action in either form:

action = [0, 1, 0]

where the first parameter is ATTACK, the second MOVE_RIGHT, and the third MOVE_LEFT
or

actions = [0] * 41
actions[0] = 0       # ATTACK
actions[10] = 1      # MOVE_RIGHT
actions[11] = 0      # MOVE_LEFT
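A minimal sketch (not the PR's actual code) of how such a hybrid scheme could expand the short form into the full 41-command list; the helper name and allowed-index list are illustrative:

NUM_COMMANDS = 41

def expand_action(action, allowed_indices):
    # allowed_indices for doom-basic would be [0, 10, 11]
    # (ATTACK, MOVE_RIGHT, MOVE_LEFT), per the example above
    if len(action) == NUM_COMMANDS:
        return list(action)            # already the full form
    full = [0] * NUM_COMMANDS
    for value, index in zip(action, allowed_indices):
        full[index] = value            # map short form onto the full layout
    return full

# Both calls describe the same doom-basic action (press MOVE_RIGHT only):
assert expand_action([0, 1, 0], [0, 10, 11])[10] == 1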

ppaquette (Contributor, Author)

Added 2 commits
#1 - Action can now be a short list of allowed actions, or the full list of 41 commands
#2 - A black observation (np.zeros) is returned when is_finished is true or on error, rather than an empty list
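A sketch of what the second commit describes, with a hypothetical helper and screen shape (the real observation shape depends on the configured VizDoom resolution, and the state accessor name may differ by VizDoom version):

import numpy as np

SCREEN_SHAPE = (480, 640, 3)   # hypothetical height x width x channels

def safe_observation(game, is_finished):
    if is_finished:
        # black frame instead of an empty list, so dtype and shape stay stable
        return np.zeros(SCREEN_SHAPE, dtype=np.uint8)
    return game.get_state().image_buffer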

jietang (Contributor) commented Jun 4, 2016

I was under the impression that ALTATTACK wasn't properly supported by VizDoom; what effect does adding it to controls.md have?

Adding the ability to use both small and large action spaces doesn't fully solve the problem, because agents need to do some introspection of the action space in order to know how large of an action they should provide to step().

What do you think about the solution of registering a small and large action space environment for each Doom task?
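Concretely, a sketch of that registration approach (the entry points and kwargs here are hypothetical; only the register call itself comes from gym):

from gym.envs.registration import register

register(
    id='DoomTakeCover-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',   # hypothetical path
    kwargs={'action_space': 'small'},
)
register(
    id='DoomTakeCoverFull-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',
    kwargs={'action_space': 'full'},
)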

jietang mentioned this pull request Jun 4, 2016
ppaquette (Contributor, Author) commented Jun 4, 2016

I thought ALTATTACK wasn't properly supported, but someone mentioned that it was a typo in VizDoom's source code; the misspelled action is "ALATTACK". So everything works fine with the misspelled word (in deathmatch.cfg).

Not sure exactly what method you mean by agent introspection.

  1. HighLow.sample() returns a list with all commands (i.e. 41 items), but we can easily make it return the small list by default
  2. HighLow.contains() returns True for both a small list and a large list if all values are between min and max
  3. HighLow.sample() returns the size of the initial matrix (e.g. (41, 3)) (which doesn't really make sense, and is not used anywhere else).

My issue with duplicating environments is that

  • it becomes harder to compare algorithms and evaluations for the same environment, because they would have different names (DoomBasic-v0, DoomBasic-Full-v0).
  • we would have to explain the difference between the two, which makes it less intuitive (i.e. do I have to use the regular or the full version?)

jietang (Contributor) commented Jun 4, 2016

Yes, I can see where you are coming from. I'd like to make the case that Doom with the "small" action space is a different (and much easier) environment compared to Doom with the full action space. Take for instance a "small" action space with 3 key actions: it's possible to write an agent that enumerates all 2^3 = 8 possibilities and trains a policy that maintains a probability distribution over those actions. For the full 41-dimensional action space, you can't even store a distribution over 2^41 possible actions in memory (it would take multiple terabytes). So you'd need to be clever about somehow factorizing your action space. So it's not appropriate to compare agents that are learning on the small vs. large action spaces; the large action space is much more difficult.
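A back-of-the-envelope check on that claim (float32 probabilities assumed):

n_joint_actions = 2 ** 41          # ~2.2 trillion button combinations
table_bytes = n_joint_actions * 4  # one float32 probability per joint action
print(table_bytes / 1e12)          # ~8.8 TB, far too large to tabulate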

Implementation-wise, I'm fine with having a single e.g. DoomDeathmatch class that supports both action spaces, but I'm pushing for a separate 'DoomDeathmatch-v0' and 'DoomDeathmatchFull-v0' as registered environments.

Let me know if you find this compelling.

ppaquette (Contributor, Author)

The only thing is that Deathmatch doesn't really have a "small" action space. All commands are enabled (except deltas), so it's not really possible to beat the level by just attacking and moving left and right.

I'm assuming the only way to beat the level is to train an algorithm on all other levels with all commands, to get used to gameplay and enemy detection, and then run it on Deathmatch.

For the other levels, the "full" action space doesn't really matter, because it will just be used as part of the training for Deathmatch.

I'll modify sample() to return the small list, but I don't think there is a need to split any levels between "simple" and "full".

ppaquette (Contributor, Author)

  • HighLow.sample() now returns the small list.
  • Removed the "small" action space from Deathmatch to remove confusion.
  • Added a description to each level explaining that the small action space is the recommended method, and that the full action space is for training an algorithm across multiple environments for the Deathmatch level.

jietang (Contributor) commented Jun 4, 2016

Yes, deathmatch would not have a small action space.

The use case of training on the simple environments and working your way up to deathmatch is a very interesting one. We've been talking about similar curriculum learning problems internally, and I'm not sure we have a great story for how to handle it in gym. One idea is to make a meta-doom environment which cycles through the different tasks from easiest to hardest (either based on number of episodes or reward). I'd be curious if you have concrete thoughts on how to approach this.

Re: small vs large doom environments, after thinking about it I really don't want to be comparing agents trained on small and large action spaces as if it was the same environment. So either we have two different environments and two different action spaces, or we stick with one action space.

I could be convinced that we should use the full action space instead of the small one - my main concern is that the full action space is too hard to make an interesting benchmark. One idea: if you can get a reinforcement learning algorithm to work on e.g. DoomTakeCover with the full action space and without doom specific tweaks I'd be more inclined to use the full space.

ppaquette (Contributor, Author)

So I'll just add a flag to the init and registration to specify small or large environment.

For the meta-doom, here are a couple of points:

(1) - VizDoom already has an order for the missions: 1-Basic, 2-Corridor, 3-DefendCenter, 4-DefendLine, 5-HealthGathering, 6-MyWayHome, 7-PredictPosition, 8-TakeCover, 9-Deathmatch
(2) - We should standardise the reward for each mission so that 0 is the minimum and 1,000 is the maximum, with 990 (99th percentile) being the reward_threshold to pass the level.
(3) - For some missions, the passing grade is not the 99th percentile (e.g. DoomBasic), so we might have to give 1,000 to any score past a certain point, and have a starting score > 0 (because the player loses points over time and for missing shots).
(4) - We add an option to the action_space to choose which mission to play (1 to 9), which only works on the very first _step and when is_finished is true.
(5) - The total reward is the sum of the averages of the last n episodes for each mission (e.g. Total = avg of last 10 scores for mission 1 + avg of last 10 scores for mission 2 + ... + avg of last 10 scores for mission 9; see the sketch after this list).
(6) - The meta-doom reward_threshold is therefore 990 * 9 = 8,910 (i.e. pass all 9 missions).
(7) - We add an option (minimum_threshold) that needs to be passed on a mission for the next one to be available (e.g. 0 means all missions are available from the start; 990 means the previous mission must be completed successfully before the next mission can be selected; 600 means 60% of the previous mission must be completed for the next one to be available).
(8) - If a locked mission is selected (because the minimum_threshold for the previous mission hasn't been reached), we start the unlocked mission with the highest order.
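A sketch of the scoring in points (5) to (8), under the normalization assumed in point (2); all names here are illustrative:

NUM_MISSIONS = 9
META_THRESHOLD = 990 * NUM_MISSIONS   # 8,910, per point (6)

def last_n_average(scores, n=10):
    recent = scores[-n:]
    return sum(recent) / len(recent) if recent else 0.0

def meta_score(episode_scores, n=10):
    # episode_scores: one list of episode scores per mission, easiest first
    return sum(last_n_average(scores, n) for scores in episode_scores)

def highest_unlocked(episode_scores, minimum_threshold, n=10):
    # mission i+1 unlocks once mission i's recent average passes the threshold
    unlocked = 1
    for scores in episode_scores[:NUM_MISSIONS - 1]:
        if last_n_average(scores, n) >= minimum_threshold:
            unlocked += 1
        else:
            break
    return unlocked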

ppaquette (Contributor, Author)

Updated difficulty for some missions. Here are the stats I have:

1- Doom Basic
Min Score: -460
Max Score: 90
Rew. Threshold: 10 (kill the enemy with one shot in less than 3 secs)
Human Score: 65 (very easy to achieve > 10)
Difficulty: Very Easy

2- Corridor
Min Score: -120
Max Score: 2,280
Rew. Threshold: 1,270 (reach the vest)
Human Score: 2,273 (you get 1,000 for touching the vest)
Difficulty: Very Easy
Note: You just need to run without shooting to beat the level.

3- DefendCenter
Min Score: -1
Max Score: 20
Rew. Threshold: 10 (kill 10 enemies)
Human Score: 16
Difficulty: Easy
Note: You just need to figure out that the red worms move faster than the robots.

4- DefendLine
Min Score: -1
Max Score: 30
Rew. Threshold: 15 (kill 15 enemies)
Human Score: 28
Difficulty: Medium
Note: Enemies are much tougher as you progress. You need to develop a strategy to kill the fireball-shooting enemies first, rather than just the red worms.

5- HealthGathering
Min Score: 270
Max Score: 2,100
Rew. Threshold: 1,000 (survive 30 seconds)
Human Score: 2,100 (very easy to achieve)
Difficulty: Medium
Note: You just need to create a path full of medkits and move forward. Some thinking and planning required.

6- MyWayHome
Min Score: -0.42
Max Score: 1
Rew. Threshold: 0.5 (find medkit)
Human Score: 0.96 (very easy to achieve)
Difficulty: Medium
Note: Takes 10 seconds to find the medkit the first time, very easy afterwards.

7- PredictPosition
Min Score: -0.07
Max Score: 1
Rew. Threshold: 0.5 (kill the enemy with one shot)
Human Score: 0.95
Difficulty: Hard
Note: The enemy moves in a "S" pattern at constant speed. You need to wait until it is in the middle of the map and shoot in front of it at a certain distance. You need strategy and patience to beat this level.

8- TakeCover
Min Score: 70
Max Score: 2,100
Rew. Threshold: 750 (survive 20 seconds)
Human Score: 1,565
Difficulty: Hard
Note: You need to have a strategy to avoid 10 enemies shooting fireballs at you. Best way is to go on one side, let them shoot, and then move to the other side.

9- Deathmatch
Min Score: 0
Max Score: 150
Rew. Threshold: 20 (kill 20 enemies)
Human Score: 40+
Difficulty: Very Hard
Note: The only way to beat this level is to get the medkits, body armors, and rocket launchers and then kill enemies in the center room. It is almost impossible to kill 20 enemies with the default pistol. Strategy is key here.

joschu (Contributor) commented Jun 4, 2016

I'm strongly in favor of using the full action space, which is fixed across doom environments.
That way, it becomes possible to do transfer learning, and it removes the arbitrary decision of what actions to include in each task.

We shouldn't be too concerned with how our current algorithm implementations do when building the environments, but FWIW, I don't think the large action space will affect TRPO and CEM that much -- just slow them down by a small constant factor. @jietang, did you find that the number of actions made a big difference?

ppaquette (Contributor, Author)

MOVE_UP and MOVE_DOWN were in deathmatch.cfg, but not in controls.md.

So the full action space now has 43 commands, which replicate all commands in VizDoom.

  • Commands with index 0 to 37 are binary (0 or 1) commands, where 1 represents a pushed button.
  • Commands with index 38 and 39 represent mouse movement, with integer values in the range -10 to +10. e.g. index 38 (LOOK_UP_DOWN_DELTA) with value +10 will make the player look up 10 degrees.
  • Commands with index 40 to 42 represent speed movement, with integer values in the range -100 to +100. e.g. index 40 (MOVE_FORWARD_BACKWARD_DELTA) with value +50 will make the player move forward at 50% of maximum speed; a value of -25 will make the player move backward at 25% of max speed.
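Under that layout, composing one full action could look like this sketch (index meanings from the list above; env stands for any Doom env from this PR):

action = [0] * 43
action[0] = 1     # ATTACK (binary command, indices 0-37)
action[38] = 10   # LOOK_UP_DOWN_DELTA: look up 10 degrees (range -10..+10)
action[40] = 50   # MOVE_FORWARD_BACKWARD_DELTA: 50% forward speed (range -100..+100)
observation, reward, done, info = env.step(action)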

jietang (Contributor) commented Jun 4, 2016

@ppaquette, the suggestions for meta-doom all look reasonable to me. I think we'll want to include something in the state indicating the current task if it's not already available.

@joschu, definitely agreed that the transfer learning task is interesting. Do you have thoughts on ppaquette's proposal?

Re: action space size, my thinking was that it's nice to have diverse environments that can be solved with the same policy class (e.g. a ffnn with softmax over discrete actions) to enable direct comparison of the learning algorithm itself (which small action spaces enable). Maybe that's unnecessary across domains - thoughts?

Re: running trpo on doom, training on the small action space finished overnight on my (reasonably new) laptop. I haven't spent much time playing with the large action space (it requires some tweaking of the policy parameterization) but I can try it out if you're interested.

jietang (Contributor) commented Jun 10, 2016

@ppaquette Had a chance to catch up with joschu IRL. The conclusion was that we should use the large action space for all environments. Could you make the appropriate changes to this PR? (sorry about the churn from my end)

Re: the meta-doom env, let's start a separate issue or PR to discuss it. One thing to think about: it's hard to tell whether it's set up correctly without an agent that is able to learn on it. So we might want to start small (e.g. with a small number of environments) and develop agent(s) in tandem.

ppaquette (Contributor, Author)

  • Added 'fast', 'normal' and 'human' mode (env = gym.make('DoomBasic-v0'); env.mode = 'human')
  • 'Fast' mode runs without sleep (~70 fps). It is the default mode.
  • 'Normal' mode runs with sleep at ~35 fps
  • 'Human' mode allows you to try to beat the mission, and displays reward and variables in real-time in console window
  • Removed small action space, only using the 43 key action space.
  • Properly returning game variables
  • Set non-deterministic to True

Remaining issues:

  • ViZDoomErrorException triggered by game.init() and game.make_action() during unit tests
  • env.seed() is not returning a deterministic environment (observations are different 50% of the time with the same seed)
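A hypothetical usage sketch of the new modes, based only on the bullets above:

import gym

env = gym.make('DoomBasic-v0')
env.mode = 'human'   # or 'fast' (default, ~70 fps) / 'normal' (~35 fps)
env.reset()          # in 'human' mode you play the mission yourself;
                     # reward and game variables are shown in the console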

@@ -1,5 +1,9 @@
from gym.envs.registration import registry, register, make, spec

# To be able to create new-style properties
class Env(object):

Contributor:

What's this for? Can we get rid of it?

ppaquette (Contributor, Author):

I added a property for mode, but I removed it afterwards

I'll remove and resubmit.

jietang (Contributor) commented Jun 14, 2016

Mind rebasing? Looks like there are now conflicts (likely in the scoreboard registration)

ppaquette and others added 19 commits June 14, 2016 16:31
…her than empty list (which was triggering an error)
- Added 'normal', 'fast' and 'human' mode
- Set non-deterministic to True
- Set video.frames_per_second to 35
- Properly returning game variables
jietang merged commit aff7a64 into openai:master Jun 14, 2016
jietang (Contributor) commented Jun 14, 2016

Thanks @ppaquette

ppaquette deleted the ppaquette-doom-20160602-002 branch June 14, 2016 23:19