Cookiecutter DR DS

Data science project template. Based on cookiecutter-data-science.

Prerequisites

  • Python 3.5 or 2.7
  • Cookiecutter Python package >= 1.4.0: This can be installed with pip or conda, depending on how you manage your Python packages:
$ pip install cookiecutter

or

$ conda config --add channels conda-forge
$ conda install cookiecutter
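
You can verify the installation with:

$ cookiecutter --version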

Starting a new project

cookiecutter https://github.com/mmakowski/cookiecutter-dr-ds
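
Cookiecutter clones the template and prompts for the project settings declared in its cookiecutter.json before rendering the new project directory. The exact questions are defined by the template; a hypothetical session might begin like this (the prompt names and defaults shown are illustrative, not the template's actual ones):

$ cookiecutter https://github.com/mmakowski/cookiecutter-dr-ds
project_name [project_name]: my-analysis
author_name [Your name]: Jane Doe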

The directory structure

The directory structure of your new project looks like this:

├── Makefile             <- Makefile with commands like `make data` or `make train`
├── README.md            <- The top-level README for data scientists using this project.
├── data
│   ├── external         <- Data from third party sources.
│   ├── processed        <- The final, canonical data sets for modeling.
│   └── raw              <- The original, immutable data dump.
│
├── docs                 <- A default Sphinx project; see sphinx-doc.org for details
│
├── models               <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks            <- Jupyter notebooks. Naming convention is a number (for ordering),
│                           the creator's initials, and a short `-` delimited description, e.g.
│                           `01.0-mm-initial-data-exploration`.
│
├── references           <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports              <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures          <- Generated graphics and figures to be used in reporting
│
├── requirements.txt     <- The requirements file for reproducing the analysis environment, e.g.
│                           generated with `pip freeze > requirements.txt`
│
└── src                  <- Source code for use in this project.
    ├── __init__.py      <- Makes src a Python package
    │
    ├── data             <- Scripts to download, generate and transform data
    │   └── process.py   <- [template] transforms the raw data into the canonical format and
    │                       splits it into training and holdout sets.
    │
    ├── models           <- Scripts to train, evaluate and export the pipeline
    │   ├── evaluate.py  <- [template] evaluates the trained pipeline on the holdout set.
    │   ├── export.py    <- [template] exports the trained pipeline to PMML.
    │   └── train.py     <- [template] trains the pipeline.
    │
    └── visualization    <- Scripts to create exploratory and results oriented visualizations

First steps

To set up the development sandbox, run:

make create_environment
source activate <project name>
make requirements

Then `make evaluate` will train and evaluate a dummy model, and `make export` will export it to PMML.
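
Putting the above together with the evaluation and export targets, a first session in a freshly generated project might look like this (assuming conda, since `source activate` is conda syntax):

$ make create_environment
$ source activate <project name>
$ make requirements
$ make evaluate
$ make export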

Next steps

  1. Put the raw data in data/raw, or create a script to download the data from its source repository.
  2. Edit src/data/process.py to transform the raw data into TSV files (a sketch of what this might look like follows this list).
  3. Edit src/models/train.py to specify how the model should be trained (likewise sketched below).
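
As an illustration of step 2, here is a minimal sketch of what src/data/process.py might end up looking like. Everything in it is an assumption for the sake of example: the raw file name, the 80/20 split and the use of pandas and scikit-learn are not prescribed by the template.

"""Transform the raw data into the canonical TSV format and split it
into training and holdout sets.

Minimal sketch; data/raw/data.csv and the 80/20 split are assumptions.
"""
import pandas as pd
from sklearn.model_selection import train_test_split

RAW_PATH = "data/raw/data.csv"              # assumed name of the raw dump
TRAIN_PATH = "data/processed/train.tsv"
HOLDOUT_PATH = "data/processed/holdout.tsv"


def main():
    raw = pd.read_csv(RAW_PATH)

    # Canonicalise column names; real cleaning/derivation would go here.
    raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]

    # Reserve 20% of the rows as the holdout set used by evaluate.py.
    train, holdout = train_test_split(raw, test_size=0.2, random_state=42)

    train.to_csv(TRAIN_PATH, sep="\t", index=False)
    holdout.to_csv(HOLDOUT_PATH, sep="\t", index=False)


if __name__ == "__main__":
    main()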

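Similarly for step 3, a hedged sketch of src/models/train.py built around a scikit-learn pipeline. Since export.py targets PMML, a scikit-learn pipeline is a plausible fit (e.g. via the sklearn2pmml package), but the estimator, the target column and the pickle persistence below are all illustrative assumptions, not the template's actual choices.

"""Train the pipeline on the processed training set.

Minimal sketch; the 'target' column, numeric-only features and the
pickle persistence are assumptions.
"""
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

TRAIN_PATH = "data/processed/train.tsv"
MODEL_PATH = "models/pipeline.pkl"


def main():
    train = pd.read_csv(TRAIN_PATH, sep="\t")
    X = train.drop(columns=["target"])      # assumes a 'target' label column
    y = train["target"]

    pipeline = Pipeline([
        ("scale", StandardScaler()),        # assumes numeric features
        ("model", LogisticRegression()),
    ])
    pipeline.fit(X, y)

    # Persist the fitted pipeline for evaluate.py / export.py to pick up.
    with open(MODEL_PATH, "wb") as f:
        pickle.dump(pipeline, f)


if __name__ == "__main__":
    main()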