
Doom - Same Action Space Across Environments #157

Merged: 20 commits merged into openai:master on Jun 14, 2016

Conversation

ppaquette (Contributor)

This PR adds another commit to the previous PR.

  • It adds ALT_ATTACK as the 6th command for Doom
  • It makes a default action space for all Doom environments, with some controls disabled depending on the environment.

VizDoom is a series of missions that build on each other.
This PR creates a standard action space of 41 commands (similar to a keyboard for the human player) that is the same across environments (e.g. the first command is always ATTACK, the second command is always JUMP, etc.).
With this setup, it becomes possible to run an algorithm on all the Doom environments in sequence (which is likely required to beat the Deathmatch level).

The actions to be performed are submitted as a list of 41 integers, with 1 being active and 0 being inactive.

e.g.
actions = [0] * 41
actions[0] = 1 # ATTACK
actions[13] = 1 # MOVE_FORWARD

The first levels only allow certain commands; the disabled commands are ignored.

The full list of commands is in controls.md.
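For context, a minimal end-to-end sketch of this interface (a hypothetical usage example; the env id follows the gym.make('DoomBasic-v0') call shown later in this thread):

import gym

env = gym.make('DoomBasic-v0')
env.reset()

actions = [0] * 41      # one slot per command, all released
actions[0] = 1          # ATTACK (always the first command)
actions[13] = 1         # MOVE_FORWARD
observation, reward, done, info = env.step(actions)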

jietang (Contributor) commented Jun 3, 2016

I can't seem to find the ALTATTACK changes - can you point me to where I should look?

re: Doom action spaces: the original reason I changed to smaller action space sizes is that many of our existing RL algorithm implementations (TRPO, CEM, etc) fail badly when confronted with such a large discrete action space. It's possible to reduce the action space in agent code instead of in the env, but it does increase the barriers to getting started, and Doom is such a visually exciting environment that I'd like to bias in favor of making it as easy as possible to submit agents.

I'm with you on the benefits of having a single state-action space for all of Doom. What do you think about adding this as another set of "full-action" Doom envs e.g. 'DoomTakeCoverFull-v0'? People would start out on the small action space and move to the full action space once they're happy with their algorithms.

ppaquette (Contributor, Author)

The ALT_ATTACK change is just a renumbering of the actions after it (e.g. MOVE_RIGHT is now index 10 in all action spaces).

We can probably implement a hybrid solution where the user can submit either a list of available commands or a list of all 41 commands.

For instance, for doom-basic, users could send an action in either form:

action = [0, 1, 0]

where the first parameter is ATTACK, the second MOVE_RIGHT, and the third MOVE_LEFT
or

actions = [0] * 41
actions[0] = 0       # ATTACK
actions[10] = 1      # MOVE_RIGHT
actions[11] = 0      # MOVE_LEFT
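A minimal sketch (not the PR's actual code) of how such a hybrid scheme could expand the short form into the full 41-command list; the helper name and allowed-index list are illustrative:

NUM_COMMANDS = 41

def expand_action(action, allowed_indices):
    # allowed_indices for doom-basic would be [0, 10, 11]
    # (ATTACK, MOVE_RIGHT, MOVE_LEFT), per the example above
    if len(action) == NUM_COMMANDS:
        return list(action)            # already the full form
    full = [0] * NUM_COMMANDS
    for value, index in zip(action, allowed_indices):
        full[index] = value            # map short form onto the full layout
    return full

# Both calls describe the same doom-basic action (press MOVE_RIGHT only):
assert expand_action([0, 1, 0], [0, 10, 11])[10] == 1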

ppaquette (Contributor, Author)

Added 2 commits
#1 - Action can now be a short list of allowed actions, or the full list of 41 commands
#2 - A black observation (np.zeros) is returned when is_finished is true or on error, rather than an empty list
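A sketch of what the second commit describes, with a hypothetical helper and screen shape (the real observation shape depends on the configured VizDoom resolution, and the state accessor name may differ by VizDoom version):

import numpy as np

SCREEN_SHAPE = (480, 640, 3)   # hypothetical height x width x channels

def safe_observation(game, is_finished):
    if is_finished:
        # black frame instead of an empty list, so dtype and shape stay stable
        return np.zeros(SCREEN_SHAPE, dtype=np.uint8)
    return game.get_state().image_buffer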

jietang (Contributor) commented Jun 4, 2016

I was under the impression that ALTATTACK wasn't properly supported by VizDoom; what effect does adding it to controls.md have?

Adding the ability to use both small and large action spaces doesn't fully solve the problem, because agents need to do some introspection of the action space in order to know how large of an action they should provide to step().

What do you think about the solution of registering a small and large action space environment for each Doom task?
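Concretely, a sketch of that registration approach (the entry points and kwargs here are hypothetical; only the register call itself comes from gym):

from gym.envs.registration import register

register(
    id='DoomTakeCover-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',   # hypothetical path
    kwargs={'action_space': 'small'},
)
register(
    id='DoomTakeCoverFull-v0',
    entry_point='gym.envs.doom:DoomTakeCoverEnv',
    kwargs={'action_space': 'full'},
)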

jietang mentioned this pull request Jun 4, 2016
ppaquette (Contributor, Author) commented Jun 4, 2016

I thought ALTATTACK wasn't properly supported, but someone mentioned that it was a typo in VizDoom's source code; the misspelled action is "ALATTACK". So everything works fine with the misspelled word (in deathmatch.cfg).

Not sure exactly what method you mean by agent introspection.

  1. HighLow.sample() returns a list with all commands (i.e. 41 items), but we can easily make it return the small list by default
  2. HighLow.contains() returns True for both a small list and a large list if all values are between min and max
  3. HighLow.sample() returns the size of the initial matrix (e.g. (41, 3)) (which doesn't really make sense, and is not used anywhere else).

My issue with duplicating environments is that

  • it becomes harder to compare algorithms and evaluations for the same environment, because they would have different names (DoomBasic-v0, DoomBasic-Full-v0).
  • we would have to explain the difference between the two, which makes it less intuitive (i.e. do I have to use the regular or the full version?)

jietang (Contributor) commented Jun 4, 2016

Yes, I can see where you are coming from. I'd like to make the case that Doom with the "small" action space is a different (and much easier) environment compared to Doom with the full action space. Take for instance a "small" action space with 3 key actions: it's possible to write an agent that enumerates all 2^3 = 8 possibilities and trains a policy that maintains a probability distribution over those actions. For the full 41-dimensional action space, you can't even store a distribution over 2^41 possible actions in memory (it would take multiple terabytes). So you'd need to be clever about somehow factorizing your action space. So it's not appropriate to compare agents that are learning on the small vs. large action spaces; the large action space is much more difficult.
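A back-of-the-envelope check on that claim (float32 probabilities assumed):

n_joint_actions = 2 ** 41          # ~2.2 trillion button combinations
table_bytes = n_joint_actions * 4  # one float32 probability per joint action
print(table_bytes / 1e12)          # ~8.8 TB, far too large to tabulate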

Implementation-wise, I'm fine with having a single e.g. DoomDeathmatch class that supports both action spaces, but I'm pushing for a separate 'DoomDeathmatch-v0' and 'DoomDeathmatchFull-v0' as registered environments.

Let me know if you find this compelling.

ppaquette (Contributor, Author)

The only thing is that Deathmatch doesn't really have a "small" action space. All commands are enabled (except deltas), so it's not really possible to beat the level by just attacking and moving left and right.

I'm assuming the only way to beat the level is to train an algorithm on all other levels with all commands, to get used to gameplay and enemy detection, and then run it on Deathmatch.

For the other levels, the "full" action space doesn't really matter, because it will just be used as part of the training for Deathmatch.

I'll modify sample() to return the small list, but I don't think there is a need to split any levels between "simple" and "full".

ppaquette (Contributor, Author)

  • HighLow.sample() now returns the small list.
  • Removed the "small" action space from Deathmatch to remove confusion.
  • Added a description to each level explaining that the small action space is the recommended method, and that the full action space is for training an algorithm across multiple environments for the Deathmatch level.

jietang (Contributor) commented Jun 4, 2016

Yes, deathmatch would not have a small action space.

The use case of training on the simple environments and working your way up to deathmatch is a very interesting one. We've been talking about similar curriculum learning problems internally, and I'm not sure we have a great story for how to handle it in gym. One idea is to make a meta-doom environment which cycles through the different tasks from easiest to hardest (either based on number of episodes or reward). I'd be curious if you have concrete thoughts on how to approach this.

Re: small vs large doom environments, after thinking about it I really don't want to be comparing agents trained on small and large action spaces as if it was the same environment. So either we have two different environments and two different action spaces, or we stick with one action space.

I could be convinced that we should use the full action space instead of the small one - my main concern is that the full action space is too hard to make an interesting benchmark. One idea: if you can get a reinforcement learning algorithm to work on e.g. DoomTakeCover with the full action space and without doom specific tweaks I'd be more inclined to use the full space.

ppaquette (Contributor, Author)

So I'll just add a flag to the init and registration to specify small or large environment.

For the meta-doom, here are a couple of points:

(1) - VizDoom already has an order for the missions: 1-Basic, 2-Corridor, 3-DefendCenter, 4-DefendLine, 5-HealthGathering, 6-MyWayHome, 7-PredictPosition, 8-TakeCover, 9-Deathmatch
(2) - We should standardise the reward for each mission so that 0 is the minimum and 1,000 is the maximum, with 990 (99th percentile) being the reward_threshold to pass the level.
(3) - For some missions, the passing grade is not the 99th percentile (e.g. DoomBasic), so we might have to give 1,000 to any score past a certain point, and have a starting score > 0 (because the player loses points over time and for missing shots).
(4) - We add an option to the action_space to choose which mission to play (1 to 9), which only works on the very first _step and when is_finished is true.
(5) - The total reward is the sum of the averages of the last n episodes for each mission (e.g. Total = avg of last 10 scores for mission 1 + avg of last 10 scores for mission 2 + ... + avg of last 10 scores for mission 9; see the sketch after this list).
(6) - The meta-doom reward_threshold is therefore 990 * 9 = 8,910 (i.e. pass all 9 missions).
(7) - We add an option (minimum_threshold) that needs to be passed on a mission for the next one to be available (e.g. 0 means all missions are available from the start; 990 means the previous mission must be completed successfully before the next mission can be selected; 600 means 60% of the previous mission must be completed for the next one to be available).
(8) - If a locked mission is selected (because the minimum_threshold for the previous mission hasn't been reached), we start the unlocked mission with the highest order.
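A sketch of the scoring in points (5) to (8), under the normalization assumed in point (2); all names here are illustrative:

NUM_MISSIONS = 9
META_THRESHOLD = 990 * NUM_MISSIONS   # 8,910, per point (6)

def last_n_average(scores, n=10):
    recent = scores[-n:]
    return sum(recent) / len(recent) if recent else 0.0

def meta_score(episode_scores, n=10):
    # episode_scores: one list of episode scores per mission, easiest first
    return sum(last_n_average(scores, n) for scores in episode_scores)

def highest_unlocked(episode_scores, minimum_threshold, n=10):
    # mission i+1 unlocks once mission i's recent average passes the threshold
    unlocked = 1
    for scores in episode_scores[:NUM_MISSIONS - 1]:
        if last_n_average(scores, n) >= minimum_threshold:
            unlocked += 1
        else:
            break
    return unlocked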

ppaquette (Contributor, Author)

Updated difficulty for some missions. Here are the stats I have:

1- Doom Basic
Min Score: -460
Max Score: 90
Rew. Threshold: 10 (kill the enemy with one shot in less than 3 secs)
Human Score: 65 (very easy to achieve > 10)
Difficulty: Very Easy

2- Corridor
Min Score: -120
Max Score: 2,280
Rew. Threshold: 1,270 (reach the vest)
Human Score: 2,273 (you get 1,000 for touching the vest)
Difficulty: Very Easy
Note: You just need to run without shooting to beat the level.

3- DefendCenter
Min Score: -1
Max Score: 20
Rew. Threshold: 10 (kill 10 enemies)
Human Score: 16
Difficulty: Easy
Note: You just need to figure out that the red worms move faster than the robots.

4- DefendLine
Min Score: -1
Max Score: 30
Rew. Threshold: 15 (kill 15 enemies)
Human Score: 28
Difficulty: Medium
Note: Enemies are much tougher as you progress. You need to develop a strategy to kill the fireball-shooting enemies first, rather than just the red worms.

5- HealthGathering
Min Score: 270
Max Score: 2,100
Rew. Threshold: 1,000 (survive 30 seconds)
Human Score: 2,100 (very easy to achieve)
Difficulty: Medium
Note: You just need to create a path full of medkits and move forward. Some thinking and planning required.

6- MyWayHome
Min Score: -0.42
Max Score: 1
Rew. Threshold: 0.5 (find medkit)
Human Score: 0.96 (very easy to achieve)
Difficulty: Medium
Note: Takes 10 seconds to find the medkit the first time, very easy afterwards.

7- PredictPosition
Min Score: -0.07
Max Score: 1
Rew. Threshold: 0.5 (kill the enemy with one shot)
Human Score: 0.95
Difficulty: Hard
Note: The enemy moves in a "S" pattern at constant speed. You need to wait until it is in the middle of the map and shoot in front of it at a certain distance. You need strategy and patience to beat this level.

8- TakeCover
Min Score: 70
Max Score: 2,100
Rew. Threshold: 750 (survive 20 seconds)
Human Score: 1,565
Difficulty: Hard
Note: You need to have a strategy to avoid 10 enemies shooting fireballs at you. Best way is to go on one side, let them shoot, and then move to the other side.

9- Deathmatch
Min Score: 0
Max Score: 150
Rew. Threshold: 20 (kill 20 enemies)
Human Score: 40+
Difficulty: Very Hard
Note: The only way to beat this level is to get the medkits, body armors, and rocket launchers and then kill enemies in the center room. It is almost impossible to kill 20 enemies with the default pistol. Strategy is key here.

joschu (Contributor) commented Jun 4, 2016

I'm strongly in favor of using the full action space, which is fixed across doom environments.
That way, it becomes possible to do transfer learning, and it removes the arbitrary decision of what actions to include in each task.

We shouldn't be too concerned with how our current algorithm implementations do when building the environments, but FWIW, I don't think the large action space will affect TRPO and CEM that much -- just slow them down by a small constant factor. @jietang, did you find that the number of actions made a big difference?

ppaquette (Contributor, Author)

MOVE_UP and MOVE_DOWN were in deathmatch.cfg, but not in controls.md.

So the full action space now has 43 commands, which replicate all commands in VizDoom.

  • Commands with index 0 to 37 are binary (0 or 1) commands, where 1 represents a pushed button.
  • Commands with index 38 and 39 represent mouse movement, with integer values in the range -10 to +10. e.g. index 38 (LOOK_UP_DOWN_DELTA) with value +10 will make the player look up 10 degrees.
  • Commands with index 40 to 42 represent speed movement, with integer values in the range -100 to +100. e.g. index 40 (MOVE_FORWARD_BACKWARD_DELTA) with value +50 will make the player move forward at 50% of maximum speed; a value of -25 will make the player move backward at 25% of max speed.
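Under that layout, composing one full action could look like this sketch (index meanings from the list above; env stands for any Doom env from this PR):

action = [0] * 43
action[0] = 1     # ATTACK (binary command, indices 0-37)
action[38] = 10   # LOOK_UP_DOWN_DELTA: look up 10 degrees (range -10..+10)
action[40] = 50   # MOVE_FORWARD_BACKWARD_DELTA: 50% forward speed (range -100..+100)
observation, reward, done, info = env.step(action)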

jietang (Contributor) commented Jun 4, 2016

@ppaquette, the suggestions for meta-doom all look reasonable to me. I think we'll want to include something in the state indicating the current task if it's not already available.

@joschu, definitely agreed that the transfer learning task is interesting. Do you have thoughts on ppaquette's proposal?

Re: action space size, my thinking was that it's nice to have diverse environments that can be solved with the same policy class (e.g. a ffnn with softmax over discrete actions) to enable direct comparison of the learning algorithm itself (which small action spaces enable). Maybe that's unnecessary across domains - thoughts?

Re: running trpo on doom, training on the small action space finished overnight on my (reasonably new) laptop. I haven't spent much time playing with the large action space (it requires some tweaking of the policy parameterization) but I can try it out if you're interested.

jietang (Contributor) commented Jun 10, 2016

@ppaquette Had a chance to catch up with joschu IRL. The conclusion was that we should use the large action space for all environments. Could you make the appropriate changes to this PR? (sorry about the churn from my end)

Re: the meta-doom env, let's start a separate issue or PR to discuss it. One thing to think about: it's hard to tell whether it's set up correctly without an agent that is able to learn on it. So we might want to start small (e.g. with a small number of environments) and develop agent(s) in tandem.

ppaquette (Contributor, Author)

  • Added 'fast', 'normal' and 'human' mode (env = gym.make('DoomBasic-v0'); env.mode = 'human')
  • 'Fast' mode runs without sleep (~70 fps). It is the default mode.
  • 'Normal' mode runs with sleep at ~35 fps
  • 'Human' mode allows you to try to beat the mission, and displays reward and variables in real-time in console window
  • Removed small action space, only using the 43 key action space.
  • Properly returning game variables
  • Set non-deterministic to True

Remaining issues:

  • ViZDoomErrorException triggered by game.init() and game.make_action() during unit tests
  • env.seed() is not returning a deterministic environment (observations are different 50% of the time with the same seed)
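A hypothetical usage sketch of the new modes, based only on the bullets above:

import gym

env = gym.make('DoomBasic-v0')
env.mode = 'human'   # or 'fast' (default, ~70 fps) / 'normal' (~35 fps)
env.reset()          # in 'human' mode you play the mission yourself;
                     # reward and game variables are shown in the console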

@@ -1,5 +1,9 @@
from gym.envs.registration import registry, register, make, spec

# To be able to create new-style properties
class Env(object):

Contributor:

What's this for? Can we get rid of it?

ppaquette (Contributor, Author):

I added a property for mode, but I removed it afterwards

I'll remove and resubmit.

jietang (Contributor) commented Jun 14, 2016

Mind rebasing? Looks like there are now conflicts (likely in the scoreboard registration)

ppaquette and others added 19 commits June 14, 2016 16:31
…her than empty list (which was triggering an error)
- Added 'normal', 'fast' and 'human' mode
- Set non-deterministic to True
- Set video.frames_per_second to 35
- Properly returning game variables
jietang merged commit aff7a64 into openai:master Jun 14, 2016
jietang (Contributor) commented Jun 14, 2016

Thanks @ppaquette

ppaquette deleted the ppaquette-doom-20160602-002 branch June 14, 2016 23:19