Name		Name	Last commit message	Last commit date
Latest commit History 125 Commits
PyTorchHelperBot		PyTorchHelperBot
data		data
scripts		scripts
yt8m		yt8m
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
gcp-shutdown-script.py		gcp-shutdown-script.py
pylintrc		pylintrc

Repository files navigation

7th place solution to The 3rd YouTube-8M Video Understanding Challenge

(WIP) This is the final states of the codebase at the end of the competition. Code cleanup and documentation are under way.

System Environment

A conda environment.yml generated from running conda env export command is provided.

Special care: you need to install PyTorchHelperBot manually via pip install PyTorchHelperBot/.. (A copy of PyTorch Helper Bot is included in this repo via Git subtree.)

Highlights:

conda
- python=3.7.4=h265db76_1
- pytorch=1.3.0.dev20190909=py3.7_cuda10.0.130_cudnn7.6.2_0
pip
- tensorflow==2.0.0rc0
- joblib==0.13.2
- pandas==0.25.1
- python-telegram-bot==12.0.0

TODO: provide a Dockerfile.

Folder Structure

data
- segment — Put the data from YouTube-8M segment-rated frame-level features dataset here.
  - train
  - test
- video — Put the data from YouTube-8M frame-level features dataset here.
  - train
    - Please check train_file_list.txt for the list of shards used.
  - valid
    - Please check valid_file_list.txt for the list of shards used.
- cache — generated files will be stored in this folder.
  - predictions — dumped numpy.memmap files during inference will be stored here.
  - video — pre-trained video-level models will be stored here.
  - segment — both context-aware and context-agnostic models will be stored here.
  - inference — when making competition submissions, logs will be stored here.
yt8m — this is where the main code, include code that performs model pre-training, finetuning, and inference, resides.
PyTorchHelperBot — a copy of PyTorchHelperBot. A simple high-level PyTorch wrapper I wrote for my personal use.

Hardwares

I used both local comoputer and instances from Google Cloud Compute to train models and make inference.

My local computer:

1 Intel i7-7700K CPU
1 NVIDIA GTX 1070 GPU
16 GB RAM
Linux Mint 19.2

Google Cloud Compute instance:

8 vCPU
20 GB RAM
1 NVIDIA Tesla T4 or 1 NVIDIA Tesla P100
Debian 9

In addition to the disk space for the datasets, 100 GB extra space is needed for the cache folder.

Local computer can only be used to train context-agnostic models, make inference, and create submission files. Video-level model pre-training and context-aware models requries at least T4 to run.

Reproduction Instructions

pretraining: bash scripts/pretraning.bash
finetuning context-agnostic models: bash scripts/context-agnostic.bash
finetuning context-aware models: bash scripts/context-aware.bash
prepare the metadata for the test set: python -m yt8m.prepare_test_meta
creating and dumping predictions to disk: python -m yt8m.inference_memmap
create submission file: python -m yt8m.create_submission_from_memmaps

The submission file will be create as sub.csv at the project root folder.

(if you have trained segment classifiers at hand, put them into data/cache/segment folder, and run step 4 to 6.)

Troubleshooting

RuntimeError: received 0 items of ancdata: Increasing ulimit and file descriptors limit on Linux.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

7th place solution to The 3rd YouTube-8M Video Understanding Challenge

System Environment

Folder Structure

Hardwares

Reproduction Instructions

Troubleshooting

About

Releases

Packages

Languages

License

ceshine/yt8m-2019

Folders and files

Latest commit

History

Repository files navigation

7th place solution to The 3rd YouTube-8M Video Understanding Challenge

System Environment

Folder Structure

Hardwares

Reproduction Instructions

Troubleshooting

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages