feature(ekiefl): add pooltool env and related configs (#227)

* Add SumToThree pooltool env
* Woops
* Update datatypes and add single inference mode
* Move core into pooltool
* Add some speed and memory profiling for env debug
* Trying to get CNNs working
* Patch #172
* Setup first experiment
* Fix up sumtothreeimage
* Update obs space to be float
* Move image_representation into fork
  - It was in pooltool ai-framework branch
  - By moving it here, main branch of pooltool can be used
* Start a README
* Begin test suite for sum_to_three_env
* Add tests for datatypes
* Finish test suite for sum_to_three_env
* rename tests -> characterize
* Delete
* Increase to 300,000 replay buffer
* Finish README
* Fix image link
* Link the discussion page
* Update pooltool API calls to 0.3.0
* Switch to dataclasses
  - attrs is not standard library, best not to impose my standards
  - Also had some docs
* Progress on documentation and variable naming
* Finish docs for datatypes.py
* Data structure changes
  - Additionally, move reward function into reward module and add options to select different rewards via cfg
* Parameterize action space bounds
  - Remove clunky class methods
* Add a module docstring
* Finish docstrings for sum_to_three coordinate environment
* rm pooltool __init__.py
  - LSP was getting confused with the `import pooltool` statement
* Add pytest
* Add pooltool-billiards
* Add docs for reward space
* Add tests for grayscale conversion, add docs
* Add module doc for reward.py
* Add docs for image_representation
* Fix image env
* Update info about px parameter
* Add serialie/deserialize methods for RenderConfig
* Three things:
  - move px to RenderConfig
  - serialize/deserialization methods for RenderConfig
  - Mimic the refactor in cts env to the image env
* Use channels in renderconfig
* Buff image_representation visualization
  - Add an animation
* Start consolidation
* More consolidation between observation types
* consolidate image and coordinate observation types
* Remove old file
* Add default config
* Single source state setting
* Add tests
* Unused
* Add default render config option
  - Store as attribute
* Add speed test script
* Small changes
* Add sum to three to feature table
* Update pooltool README
* Move observation/ and reward.py into utils.py
* polish(pu): polish sum_to_three configs
* feature(pu): add sum_to_three_vector_obs_sac_config.py and polish related config names
* polish(pu): polish sum_to_three configs
* polish(pu): polish pooltool configs

---------

Co-authored-by: dyyoungg <[email protected]>
Co-authored-by: 蒲源 <[email protected]>
Co-authored-by: 蒲源 <[email protected]>
opendilab · Jul 4, 2024 · 39dfa3c
1 parent 540bdcb
Showing 28 changed files with 2,375 additions and 4 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1445,4 +1445,7 @@ events.*
 **/tb/*
 **/mcts/ctree/tests_cpp/*
 **/*tmp*
-lzero/mcts/ctree/ctree_alphazero/pybind11
+
+# pooltool-specific stuff
+!/assets/pooltool/**
+lzero/mcts/ctree/ctree_alphazero/pybind11
diff --git a/README.md b/README.md
@@ -140,6 +140,7 @@ The environments and algorithms currently supported by LightZero are shown in th
 | MiniGrid      | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
 | Bsuite        | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
 | Memory        | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
+| SumToThree (billiards) | ---      | 🔒     | 🔒          | ✔               | 🔒         | 🔒             |🔒|🔒             |
 
 
 <sup>(1): "✔" means that the corresponding item is finished and well-tested.</sup>

diff --git a/README.zh.md b/README.zh.md
@@ -127,6 +127,7 @@ LightZero 目前支持的环境及算法如下表所示：
 | MiniGrid      | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
 | Bsuite        | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
 | Memory        | ---      | ✔     | ✔          | ✔               | 🔒         | 🔒             |✔|🔒             |
+| SumToThree (billiards) | ---      | 🔒     | 🔒          | ✔               | 🔒         | 🔒             |🔒|🔒             |
 
 <sup>(1): "✔" 表示对应的项目已经完成并经过良好的测试。</sup>
 

diff --git a/assets/pooltool/3hits.gif b/assets/pooltool/3hits.gif
diff --git a/assets/pooltool/4hits.gif b/assets/pooltool/4hits.gif
diff --git a/assets/pooltool/discrete.png b/assets/pooltool/discrete.png
diff --git a/assets/pooltool/feature_planes.png b/assets/pooltool/feature_planes.png
diff --git a/assets/pooltool/largecut.gif b/assets/pooltool/largecut.gif
diff --git a/assets/pooltool/nocut.gif b/assets/pooltool/nocut.gif
diff --git a/lzero/model/common.py b/lzero/model/common.py
@@ -337,6 +337,7 @@ def __init__(
 
         self.sim_norm = SimNorm(simnorm_dim=group_size)
 
+
     def forward(self, x: torch.Tensor) -> torch.Tensor:
         """
         Shapes:

diff --git a/lzero/policy/sampled_efficientzero.py b/lzero/policy/sampled_efficientzero.py
@@ -248,8 +248,8 @@ def _init_learn(self) -> None:
             init_w = self._cfg.init_w
             self._model.prediction_network.fc_policy_head.mu.weight.data.uniform_(-init_w, init_w)
             self._model.prediction_network.fc_policy_head.mu.bias.data.uniform_(-init_w, init_w)
-            self._model.prediction_network.fc_policy_head.log_sigma_layer.weight.data.uniform_(-init_w, init_w)
             try:
+                self._model.prediction_network.fc_policy_head.log_sigma_layer.weight.data.uniform_(-init_w, init_w)
                 self._model.prediction_network.fc_policy_head.log_sigma_layer.bias.data.uniform_(-init_w, init_w)
             except Exception as exception:
                 logging.warning(exception)

diff --git a/requirements.txt b/requirements.txt
@@ -6,4 +6,6 @@ bsuite
 minigrid
 moviepy
 pycolab
-line_profiler
+pytest
+pooltool-billiards>=0.3.1
+line_profiler
diff --git a/zoo/atari/config/atari_muzero_config.py b/zoo/atari/config/atari_muzero_config.py
@@ -16,7 +16,6 @@
 batch_size = 256
 max_env_step = int(5e5)
 reanalyze_ratio = 0.
-eps_greedy_exploration_in_collect = True
 
 # =========== for debug ===========
 # collector_env_num = 1

diff --git a/zoo/pooltool/README.md b/zoo/pooltool/README.md
@@ -0,0 +1,123 @@
+# Billiards RL
+
+Welcome to the documentation for billiards simulation within the LightZero framework. Billiards offers an intriguing learning environment for reinforcement learning due to its continuous action space, turn-based play, and the need for long-term planning and strategy formulation.
+
+## Pooltool
+
+Pooltool is a general purpose billiards simulator crafted specifically for science and engineering applications (learn more [here](https://github.com/ekiefl/pooltool)). It has been incorporated into LightZero to create diverse learning environments for billiards games.
+
+## Testing your installation
+
+Pooltool comes pre-installed with LightZero. If you are using a custom setup, follow the _pip_ install instructions [here](https://pooltool.readthedocs.io/en/latest/getting_started/install.html#install-option-1-pip).
+
+Verify that pooltool can be imported from your Python environment:
+
+```bash
+python -c "import pooltool; print(pooltool.__version__)"
+```
+
+Further test your installation by opening the interactive interface:
+
+```bash
+# Unix
+run_pooltool
+
+# Windows
+run_pooltool.bat
+```
+
+(For instructions on how to play, check out the [Getting Started tutorial](https://pooltool.readthedocs.io/en/latest/getting_started/interface.html))
+
+## Supported Games
+
+The following games are currently supported or planned:
+
+1. **Sum to Three**: A simplified billiards game designed to make learning easier for agents.
+2. **Standard Billiards Games** (planned for future updates): Including 8-ball, 9-ball, and snooker.
+
+The rest of the document provides details for each supported game.
+
+## Game 1: Sum to Three
+
+Standard billiards games like 8-ball, 9-ball, and snooker have complex rulesets which make learning more difficult.
+
+In contrast, _sum to three_ is a fictitious billiards game with a simple ruleset.
+
+### Rules
+
+1. The game is played on a table with no pockets
+1. There are 2 balls: a cue ball and an object ball
+1. The player must hit the object ball with the cue ball
+1. The player scores a point if the total number of ball-cushion collisions during the shot is exactly 3
+1. The player takes 10 shots, and their final score is the number of points they achieve
+
+For example, this is a successful shot because there are three ball-cushion collisions:
+
+<img src="../../assets/pooltool/3hits.gif" width="600" />
+
+This is an unsuccessful shot because there are four ball-cushion collisions:
+
+<img src="../../assets/pooltool/4hits.gif" width="600" />
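The scoring rules above can be sketched in a few lines of Python. This is only an illustration, assuming each shot has already been summarized as a list of event labels. The labels and the function names `shot_reward` and `episode_score` are hypothetical and are not part of pooltool's API. It also assumes that a shot in which the cue ball never contacts the object ball scores nothing, which is one reading of rule 3.

```python
from typing import List

# Hypothetical event labels used only for this illustration.
BALL_BALL = "ball-ball"
BALL_CUSHION = "ball-cushion"


def shot_reward(events: List[str]) -> int:
    """Return 1 for a scoring shot, 0 otherwise.

    A shot scores when the cue ball contacted the object ball and the
    total number of ball-cushion collisions is exactly 3.
    """
    hit_object_ball = BALL_BALL in events
    cushion_hits = sum(1 for event in events if event == BALL_CUSHION)
    return int(hit_object_ball and cushion_hits == 3)


def episode_score(shots: List[List[str]]) -> int:
    """Sum the per-shot rewards over an episode (10 shots per the rules)."""
    return sum(shot_reward(events) for events in shots)


three_hits = [BALL_CUSHION, BALL_BALL, BALL_CUSHION, BALL_CUSHION]
four_hits = three_hits + [BALL_CUSHION]
print(shot_reward(three_hits), shot_reward(four_hits))
```

The two sample shots mirror the animations above: three cushion contacts score a point, four do not.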
+
+### Observation / Action Spaces
+
+Continuous and discrete observation spaces are supported. The continuous observation space uses the coordinates of the two balls as the observation. The discrete observation space is based on configurable image-based feature planes.
+
+In general, when an agent strikes a cue ball, the cue stick is described by 5 continuous parameters:
+
+```
+V0 : positive float
+    With what initial velocity does the cue strike the ball?
+phi : float (degrees)
+    The direction in which you strike the ball
+theta : float (degrees)
+    How elevated is the cue from the playing surface, in degrees?
+a : float
+    How much side english should be put on? -1 being rightmost side of ball, +1 being
+    leftmost side of ball
+b : float
+    How much vertical english should be put on? -1 being bottom-most side of ball, +1 being
+    topmost side of ball
+```
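As a rough illustration of the five parameters listed above, they could be grouped into a small validated container. This is a minimal sketch, not pooltool's own cue class: the name `CueStrike` is hypothetical, and the default values (level cue, center-ball hit) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueStrike:
    """Hypothetical container for the five cue parameters described above."""

    V0: float           # initial cue velocity, must be positive
    phi: float          # strike direction, in degrees
    theta: float = 0.0  # cue elevation in degrees (default is an assumption)
    a: float = 0.0      # side english, from -1 (rightmost) to +1 (leftmost)
    b: float = 0.0      # vertical english, from -1 (bottom) to +1 (top)

    def __post_init__(self) -> None:
        if self.V0 <= 0:
            raise ValueError(f"V0 must be positive, got {self.V0}")
        for name in ("a", "b"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [-1, 1], got {value}")


strike = CueStrike(V0=1.5, phi=45.0, a=0.5)
print(strike)
```

Validating in `__post_init__` keeps out-of-range english values from reaching a simulation unnoticed.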
+
+Since sum to three is a simple game, only a reduced action space with 2 parameters is supported:
+
+1. V0: The speed of the cue stick. Increasing this means the cue ball travels further
+1. cut angle: The angle at which the cue ball hits the object ball
+
+For example, in this shot, the cut angle is -70 (hitting the left side of the object ball):
+
+<img src="../../assets/pooltool/largecut.gif" width="600" />
+
+In this shot, the cut angle is 0 (head-on collision):
+
+<img src="../../assets/pooltool/nocut.gif" width="600" />
+
+The action parameters are bounded based on the game dimensions: [0.3, 3] for speed and [-70, 70] for cut angle.
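One way these bounds might be applied is to rescale a normalized policy output into game units. The sketch below assumes the policy emits each action component in [-1, 1]. That range is an assumption, not something stated in this README, and the helper names `rescale` and `to_game_action` are hypothetical.

```python
from typing import Sequence, Tuple

# Bounds quoted in the text above.
V0_BOUNDS = (0.3, 3.0)
CUT_ANGLE_BOUNDS = (-70.0, 70.0)


def rescale(x: float, low: float, high: float) -> float:
    """Clip x to [-1, 1], then map it linearly onto [low, high]."""
    x = max(-1.0, min(1.0, x))
    return low + (x + 1.0) * 0.5 * (high - low)


def to_game_action(normalized: Sequence[float]) -> Tuple[float, float]:
    """Convert a normalized (speed, cut angle) pair into bounded game values."""
    v0 = rescale(normalized[0], *V0_BOUNDS)
    cut_angle = rescale(normalized[1], *CUT_ANGLE_BOUNDS)
    return v0, cut_angle


print(to_game_action((0.0, 0.0)))   # midpoint of both ranges
print(to_game_action((5.0, -5.0)))  # out-of-range inputs are clipped first
```

Clipping before rescaling guarantees that the simulator never receives a speed or cut angle outside the stated bounds.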
+
+### Experiments
+
+You can conduct experiments using different observation spaces:
+
+1. **Continuous Observation Space Experiment**:
+    - Run the experiment with:
+      ```bash
+      python ./zoo/pooltool/sum_to_three/config/sum_to_three_config.py
+      ```
+    - Results will be saved in `./data_pooltool_sampled_efficientzero/vector-obs`.
+
+2. **Discrete Observation Space Experiment**:
+    - Run the experiment with:
+      ```bash
+      python ./zoo/pooltool/sum_to_three/config/sum_to_three_image_config.py
+      ```
+    - Modify the feature plane information by editing `./zoo/pooltool/sum_to_three/config/feature_plane_config.json`. View the usage example in `./zoo/pooltool/image_representation.py` for details about the feature plane content.
+    - Results will be saved in `./data_pooltool_sampled_efficientzero/image-obs`.
+
+### Results
+
+TODO(puyuan1996)
+
+## Game 2: 8-ball / 9-ball / 3-cushion / snooker
+
+What billiards game would you like to see next?