Update README.md
binli123 authored Dec 16, 2020
1 parent 1c82123 commit 6a65271
Showing 1 changed file with 17 additions and 12 deletions.
Precomputed features for the [TCGA Lung Cancer dataset](https://portal.gdc.cancer.gov/) can be downloaded:
```
$ python download.py --dataset=tcga
```
This dataset requires 20GB of free disk space.

## Process WSI data
If you are processing WSIs from raw images, you will need to download the WSIs first.
1. **Download WSIs.**
Navigate to './tcga-download/' and download the WSIs from the [TCGA data portal](https://docs.gdc.cancer.gov/Data_Transfer_Tool/Users_Guide/Getting_Started/) using the manifest and configuration files.
The example assumes a Windows operating system. The WSIs will be saved in './WSI/TCGA-lung/LUAD' and './WSI/TCGA-lung/LUSC'.
The raw WSIs take about 1 TB of disk space and may take several days to download. Open a command-line tool (*Command Prompt* on Windows), navigate to './tcga-download', and use the following commands:
```
$ cd tcga-download
```
```
$ gdc-client -m gdc_manifest.2020-09-06-TCGA-LUAD.txt --config config-LUAD.dtt
$ gdc-client -m gdc_manifest.2020-09-06-TCGA-LUSC.txt --config config-LUSC.dtt
```
2. **Prepare the patches.**
We will be using [OpenSlide](https://openslide.org/), a C library with a [Python API](https://pypi.org/project/openslide-python/), which provides a simple interface for reading WSI data. We refer users to the [OpenSlide Python API documentation](https://openslide.org/api/python/) for details on using this tool.
The patches will be saved in './WSI/TCGA-lung/pyramid' in a pyramidal structure for 20x and 5x magnifications. Navigate to './tcga-download/OpenSlide/bin' and run the script 'TCGA-pre-crop.py' (a minimal OpenSlide sketch follows the command below):
```
$ python TCGA-pre-crop.py
```
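The actual cropping logic lives in 'TCGA-pre-crop.py'; the snippet below is only a minimal sketch of how OpenSlide can tile a single WSI into fixed-size patches at roughly 20x and 5x. The slide path, patch size, and output layout are illustrative assumptions, not the script's real settings.
```python
# Minimal sketch (assumptions: illustrative paths and a 224x224 patch size).
import os
import openslide

slide = openslide.OpenSlide("WSI/TCGA-lung/LUAD/example.svs")  # hypothetical file name
patch_size = 224

# Level 0 of TCGA slides is usually 40x, so a downsample of 2 gives ~20x and 8 gives ~5x.
for tag, downsample in [("20x", 2), ("5x", 8)]:
    level = slide.get_best_level_for_downsample(downsample)
    width, height = slide.level_dimensions[level]
    scale = round(slide.level_downsamples[level])
    out_dir = os.path.join("WSI", "TCGA-lung", "pyramid", tag)
    os.makedirs(out_dir, exist_ok=True)
    for x in range(0, width - patch_size, patch_size):
        for y in range(0, height - patch_size, patch_size):
            # read_region expects level-0 coordinates, so scale (x, y) back up.
            patch = slide.read_region((x * scale, y * scale), level, (patch_size, patch_size)).convert("RGB")
            patch.save(os.path.join(out_dir, f"{x}_{y}.jpeg"))
```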
3. **Train the embedder.**
For training the embedder, we provide a script modified from the [PyTorch implementation of SimCLR](https://github.com/sthalles/SimCLR).
Navigate to './simclr' and edit the attributes in the configuration file 'config.yaml'. You will need to determine a batch size that fits your GPU(s); we recommend a batch size of at least 512 to obtain good SimCLR features (see the check after the command below). The trained model weights and the loss log are saved in the folder './simclr/runs'.
```
$ python run.py
```
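Before launching the run, it can help to inspect the configuration you just edited. The snippet below is a small illustrative check, not part of the repository; the 'batch_size' key name is an assumption about how 'config.yaml' is organized.
```python
# Minimal sketch (assumption: 'config.yaml' exposes a top-level 'batch_size' key).
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg)  # inspect all attributes before starting training
if cfg.get("batch_size", 0) < 512:
    print("Warning: a batch size of at least 512 is recommended for good SimCLR features.")
```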

## Training on default datasets
To train DSMIL on a standard MIL benchmark dataset:
```
$ python train_mil.py
```
To switch between MIL benchmark datasets, use the option:
```
[--datasets] # musk1, musk2, elephant, fox, tiger
```
Other options are available for the learning rate (0.0002), the number of cross-validation folds (5), weight decay (5e-3), and the number of epochs (40).

To train DSMIL on the TCGA Lung Cancer dataset (precomputed features):
```
$ python train_tcga.py
```