Commit

Final update for this cleanup phase
ceshine committed Oct 31, 2019
1 parent 99b6ddc commit 2827ec2
Showing 10 changed files with 48 additions and 28 deletions.
Binary file added Lee2019.pdf
26 changes: 20 additions & 6 deletions README.md
@@ -1,8 +1,26 @@
# 7th place solution to The 3rd YouTube-8M Video Understanding Challenge

(WIP) This is the final state of the codebase at the end of the competition. Code cleanup and documentation are under way.
A brief model summary can be found [here](https://www.kaggle.com/c/youtube8m-2019/discussion/112349). Please refer to the [workshop paper](Lee2019.pdf) for more details.

A brief model summary can be found [here](https://www.kaggle.com/c/youtube8m-2019/discussion/112349). A more detailed summary will be added later as a paper.
## 20191031 Update

- Redundant functions and classes have been removed.
- Some minor refactoring.
- **Manage the models using YAML config files**: a YAML config file now specifies the model architecture and training parameters. An exported model consists of a YAML file and a pickled state dictionary.
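
The export format described above can be sketched as follows (a minimal illustration, not the repository's actual code; the config keys are stand-ins, and an in-memory buffer takes the place of the file on disk):

```python
import io
import pickle

# Stand-in for the YAML config that describes the architecture and
# training parameters (in the repo this lives in a .yaml file).
config = {
    "model": {"fcn_dim": 2048, "drop": 0.5},
    "training": {"lr": "2e-4", "batch_size": 128},
}

# Stand-in for a PyTorch state dict: parameter name -> tensor values.
state_dict = {"fc.weight": [[0.1, 0.2], [0.3, 0.4]], "fc.bias": [0.0, 0.0]}

# Export: only the state dict is pickled; the config stays human-readable.
buffer = io.BytesIO()
pickle.dump(state_dict, buffer)

# Restore: rebuild the model from the config, then load the pickled weights.
buffer.seek(0)
restored = pickle.load(buffer)
assert restored == state_dict
```

The benefit over pickling the whole model object is that the weights file no longer depends on the exact class definition at pickling time; the config alone is enough to rebuild the architecture.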

**Correction to the paper** (and potential bugs): During the code cleanup, I found that near the end of the competition I had set `num_workers=1` for the train data loader when training segment classifiers. In the paper I wrote that I used `num_workers>1` to add more randomness. That was a mistake. In fact, using `num_workers>1` caused some convergence issues when I tried to reproduce the results. There might be some undiscovered bugs in the data loader. Using only one worker, although slower, should reproduce the results correctly.
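
One well-known pitfall that could account for this (an assumption on my part, not something confirmed in the repository): with a PyTorch `IterableDataset`, every DataLoader worker receives its own full copy of the dataset, so `num_workers>1` duplicates and interleaves the stream unless per-worker sharding is implemented. A pure-Python simulation of that failure mode:

```python
def iterable_dataset():
    # Stand-in for an IterableDataset yielding training examples.
    yield from range(5)

def simulate_loader(num_workers):
    # Each worker independently iterates its own full copy of the dataset,
    # which is what happens when an IterableDataset does no worker sharding.
    streams = [list(iterable_dataset()) for _ in range(num_workers)]
    return [example for stream in streams for example in stream]

print(len(simulate_loader(1)))  # 5 examples, as intended
print(len(simulate_loader(2)))  # 10 examples: every item is seen twice
```

With one worker the epoch has the intended size; with two, every example appears twice per epoch, which changes the effective training distribution and could plausibly cause convergence issues.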

### Model Reproduction

I've managed to reproduce the results with the cleaned codebase and Docker image (using some of the remaining GCP credit). Two base models and seven segment classifiers are enough to obtain the 7th place:

![reproduction results](images/reproduction_results.png)

Notes:

1. Because the data loader is reshuffled after each resumption from an instance preemption, the base model cannot be exactly reproduced. The base model performs slightly worse this time (in terms of local CV results), which affected the downstream models.
2. The Dockerized version seems to be slower, but your mileage may vary.
3. The training scripts under the `/scripts` folder have been updated.

## System Environment

@@ -99,7 +117,3 @@ The submission file will be created as `sub.csv` at the project root folder.
## Troubleshooting

- **RuntimeError: received 0 items of ancdata**: [Increasing ulimit and file descriptors limit on Linux](https://glassonionblog.wordpress.com/2013/01/27/increase-ulimit-and-file-descriptors-limit/).
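
As an alternative to raising the limit shell-wide, the soft file-descriptor limit can also be raised from inside the training process itself (a sketch; the target value of 4096 is illustrative, and the `resource` module is Unix-only):

```python
import resource

# "received 0 items of ancdata" usually surfaces when the process runs out
# of file descriptors while data-loader workers pass data over Unix sockets.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward 4096, but never above the hard limit.
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (max(soft, target), hard))
```

Raising the limit persistently still requires the `ulimit`/`limits.conf` route described in the linked post; this in-process change only lasts for the lifetime of the process.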

## Potential Improvements

1. **Config-file-based model creation**: currently the entire PyTorch model object is pickled into a file on disk. This avoids having to remember the hyper-parameters when restoring models, and thus accelerates model iteration. However, it is not considered best practice. Storing the hyper-parameters in a config file is a better solution. I'll have to do some research to find out how to implement this properly.
Binary file added images/reproduction_results.png
13 changes: 7 additions & 6 deletions scripts/context-agnostic.bash
@@ -1,6 +1,7 @@
SEED=31537 python -m yt8m.train_pure_segment data/cache/video/ dbof-3.pth --name dbof-3 --steps 8000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fold 2 --batch-size 128
SEED=31537 python -m yt8m.train_pure_segment data/cache/video/ dbof-3.pth --name dbof-3 --steps 8000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fold 1 --batch-size 128
SEED=1822 python -m yt8m.train_pure_segment data/cache/video/ dbof-3.pth --name dbof-3 --steps 9000 --ckpt-interval 3000 --offset 3 --lr 2e-4 --fold 0 --batch-size 128
SEED=1423 python -m yt8m.train_pure_segment data/cache/video/ nxvlad-2.pth --name nxvlad-2 --steps 12000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fold 0 --batch-size 128
SEED=423 python -m yt8m.train_pure_segment data/cache/video/ nxvlad-2.pth --name nxvlad-2 --steps 8000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fold 1 --batch-size 128
SEED=33537 python -m yt8m.train_pure_segment data/cache/video/ nxvlad-2.pth --name nxvlad-2 --steps 12000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fold 2 --batch-size 128
SEED=1213 python -m yt8m.train_pure_segment scripts/pure_segment_dbof.yaml data/cache/video/dbof-3/ --fold 0 --name dbof-3
SEED=1216 python -m yt8m.train_pure_segment scripts/pure_segment_dbof.yaml data/cache/video/dbof-3/ --fold 1 --name dbof-3
SEED=1351 python -m yt8m.train_pure_segment scripts/pure_segment_dbof.yaml data/cache/video/dbof-3/ --fold 2 --name dbof-3

SEED=5696 python -m yt8m.train_pure_segment scripts/pure_segment_nextvlad.yaml data/cache/video/nextvlad-2/ --fold 0 --name nextvlad-2
SEED=1696 python -m yt8m.train_pure_segment scripts/pure_segment_nextvlad.yaml data/cache/video/nextvlad-2/ --fold 1 --name nextvlad-2 --steps 12000
SEED=2396 python -m yt8m.train_pure_segment scripts/pure_segment_nextvlad.yaml data/cache/video/nextvlad-2/ --fold 2 --name nextvlad-2
11 changes: 5 additions & 6 deletions scripts/context-aware.bash
@@ -1,6 +1,5 @@
SEED=98998 python -m yt8m.train_segment_w_context data/cache/video/ dbof-1.pth nxvlad-2.pth --name dbof-1_nxvlad-2 --steps 12000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 6 --se-reduction 4 --max-len 150 --batch-size 128
SEED=93498 python -m yt8m.train_segment_w_context data/cache/video/ dbof-1.pth nxvlad-2.pth --name dbof-1_nxvlad-2 --steps 12000 --ckpt-interval 4000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 7 --se-reduction 4 --max-len 150 --batch-size 128
SEED=23498 python -m yt8m.train_segment_w_context data/cache/video/ dbof-2.pth nxvlad-2.pth --name dbof-2_nxvlad-2 --steps 12000 --ckpt-interval 3000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 5 --se-reduction 4 --max-len 150 --batch-size 128
SEED=18448 python -m yt8m.train_segment_w_context data/cache/video/ dbof-3.pth nxvlad-2.pth --name dbof-3_nxvlad-2 --steps 9000 --ckpt-interval 3000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 3 --se-reduction 4 --max-len 150 --batch-size 128
SEED=28448 python -m yt8m.train_segment_w_context data/cache/video/ dbof-3.pth nxvlad-2.pth --name dbof-3_nxvlad-2 --steps 9000 --ckpt-interval 3000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 4 --se-reduction 4 --max-len 150 --batch-size 128
SEED=7498 python -m yt8m.train_segment_w_context data/cache/video/ nxvlad-2.pth nxvlad-2.pth --name nxvlad-2_nxvlad-2 --steps 15000 --ckpt-interval 5000 --offset 3 --lr 2e-4 --fcn-dim 2048 --drop 0.5 --fold 4 --se-reduction 4 --max-len 150 --batch-size 128
SEED=4055 python -m yt8m.train_segment_w_context scripts/segment_with_context.yaml data/cache/video/dbof-3 data/cache/video/nextvlad-2 --fold 3 --name dbof-3_nextvlad-2 --steps 12000
SEED=5055 python -m yt8m.train_segment_w_context scripts/segment_with_context.yaml data/cache/video/dbof-3 data/cache/video/nextvlad-2 --fold 4 --name dbof-3_nextvlad-2

SEED=5455 python -m yt8m.train_segment_w_context scripts/segment_with_context.yaml data/cache/video/nextvlad-2 data/cache/video/nextvlad-2 --fold 5 --name nextvlad-2_x2
SEED=3055 python -m yt8m.train_segment_w_context scripts/segment_with_context.yaml data/cache/video/dbof-3 data/cache/video/dbof-3 --fold 4 --name dbof-3_x2
12 changes: 4 additions & 8 deletions scripts/pretraining.bash
@@ -1,8 +1,4 @@
SEED=27805 python -m yt8m.train_video nextvlad --steps 200000 --ckpt-interval 10000 --lr 3e-4 --groups 16 --batch-size 48 --n-clusters 64 --max-len 150
mv data/cache/video/baseline_model.pth data/cache/video/nxvlad-2.pth
SEED=17805 python -m yt8m.train_video dbof --steps 100000 --ckpt-interval 10000 --lr 3e-4 --batch-size 128 --max-len 150
mv data/cache/video/baseline_model.pth data/cache/video/dbof-3.pth
SEED=4827 python -m yt8m.train_video dbof --steps 120000 --ckpt-interval 10000 --lr 3e-4 --batch-size 32 --n-mixtures 5
mv data/cache/video/baseline_model.pth data/cache/video/dbof-1.pth
SEED=1635 python -m yt8m.train_video dbof --steps 100000 --ckpt-interval 10000 --lr 4e-4 --batch-size 32 --max-len 200
mv data/cache/video/baseline_model.pth data/cache/video/dbof-2.pth
SEED=4827 python -m yt8m.train_video scripts/video_gated_dbof.yaml
mv $(find data/cache/video/ -name "20*" | head -1) data/cache/video/dbof-3
SEED=1635 python -m yt8m.train_video scripts/video_nextvlad.yaml
mv $(find data/cache/video/ -name "20*" | head -1) data/cache/video/nextvlad-2
9 changes: 9 additions & 0 deletions scripts/pure_segment_nextvlad.yaml
@@ -0,0 +1,9 @@
pure_segment:
  training:
    lr: 2e-4
    batch_size: 128
    steps: 8000
    ckpt_interval: 4000
    offset: 3
    weight_decay: 0.02
    eps: 1e-7
2 changes: 1 addition & 1 deletion scripts/segment_with_context.yaml
@@ -8,7 +8,7 @@ segment_w_context:
    n_mixture: 4
  training:
    lr: 2e-4
    batch_size: 64
    batch_size: 128
    steps: 9000
    ckpt_interval: 3000
    offset: 3
1 change: 1 addition & 0 deletions yt8m/dataloader.py
@@ -37,6 +37,7 @@ def __init__(self, file_paths, seed=939, debug=False,
                 vocab_path="./data/segment_vocabulary.csv",
                 epochs=1, max_examples=None, offset=0):
        super(YoutubeSegmentDataset).__init__()
        print("Offset:", offset)
        self.file_paths = file_paths
        self.seed = seed
        self.debug = debug
2 changes: 1 addition & 1 deletion yt8m/train_pure_segment.py
Expand Up @@ -183,7 +183,7 @@ def main():
        torch.optim.Adam(
            optimizer_grouped_parameters,
            lr=lr, eps=float(training_config["eps"])),
        [training_config["weight_decay"], 0]
        [float(training_config["weight_decay"]), 0]
    )
    # optimizer = torch.optim.Adam(
    #     optimizer_grouped_parameters, lr=lr, eps=1e-7)
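
The `float(...)` wrappers in this hunk are needed because YAML 1.1 parsers such as PyYAML read scientific notation without a decimal point (`1e-7`, `2e-4`) as strings rather than floats, so some values from the config files above arrive as `str`. A stdlib-only illustration of the conversion (the dict mimics what the parser typically returns; no PyYAML call is made here):

```python
# What a YAML 1.1 parser typically produces for the training config:
# "1e-7" lacks a decimal point, so it stays a string; 0.02 becomes a float.
training_config = {"lr": "2e-4", "eps": "1e-7", "weight_decay": 0.02}

# Explicit conversion makes the values safe to hand to an optimizer,
# regardless of whether the parser returned str or float.
lr = float(training_config["lr"])
eps = float(training_config["eps"])
weight_decay = float(training_config["weight_decay"])

print(lr, eps, weight_decay)  # 0.0002 1e-07 0.02
```

Writing the values as `2.0e-4` and `1.0e-7` in the YAML would also make the parser return floats directly, but the defensive `float()` call handles both spellings.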
